How do I calculate the probability that a data storage device breaks?

There are lots of different storage media. To name a few:

  • DVDs
  • CDs
  • Normal hard drives
  • SSDs (solid-state drives)
  • USB flash sticks

Let’s say that I have saved some files to a certain medium. How would I calculate the probability that the medium or device will break within X time units, leaving me unable to access the contents?

Are there any good sources which provide such statistics and formulas for different devices and media types?

I want numbers and formulas if possible. “Use a son, father and grandfather type backup scheme where the grandfather is duplicated and stored in two different secure locations” may well be fine advice, but I want to be able to calculate the probability that a device or medium fails, based on real-world statistics.

Answer

Let’s start with hard drives. There are three good studies giving real-world statistics on a large enough number of drives to be interesting: Carnegie Mellon, Google, and NetApp. The statistic that matters is the annual failure rate (AFR): the probability that a drive fails within a given year. One unsurprising result of these studies is that manufacturer specs such as Mean Time Between Failures (MTBF) wildly understate the chance of disk failure in a year. The numbers vary with conditions, but the rule of thumb I’ve extracted from them is that under the best conditions with good equipment you might hit a 2% AFR, while you should expect a worst-case AFR closer to 10%.
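To turn an AFR into “the probability of failure within X time units”, one common simplification is to assume the failure rate stays constant over the drive’s life. That is an assumption on my part, not something the studies above claim; real drives tend to fail more often when very new and when old. Under that assumption, a minimal sketch looks like this (the function names and the 1,000,000-hour MTBF are illustrative, not taken from any datasheet):

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def afr_from_mtbf(mtbf_hours: float) -> float:
    """AFR implied by an MTBF rating, ASSUMING a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def failure_probability(afr: float, years: float) -> float:
    """Chance of a failure within `years`, ASSUMING the same AFR every year."""
    return 1.0 - (1.0 - afr) ** years


# A hypothetical 1,000,000-hour MTBF implies an AFR below 1%,
# far lower than the 2% to 10% seen in the field studies.
print(f"Implied AFR: {afr_from_mtbf(1_000_000):.2%}")

# A drive with a 5% AFR, kept for 3 years.
print(f"3-year failure chance: {failure_probability(0.05, 3):.2%}")
```

Plugging in a field-measured AFR rather than the one implied by the manufacturer’s MTBF should give a more realistic answer.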

If you have two devices with copies of the same data and their odds of failure are statistically independent, you can just multiply the probabilities together to get the chance that both will fail. For example, given two hard drives that each have a 5% chance of failure (a reasonable middle-of-the-road value), the odds you’ll lose both of them in a given year are 5% * 5% = 0.25%. Now, if both drives are in the same system, the odds of something taking out both drives are much higher than that, because they are far from independent. The actual odds are then somewhere between 0.25% and 5%; it’s impossible to get closer than that without digging into the stats for things like controller and power supply failures. See Standard RAID levels for more examples and background here.
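The multiplication above generalizes to any number of copies. The sketch below is only an illustration with made-up helper names: it returns the independent-case figure together with the fully correlated case, where losing the most reliable copy means losing them all. It does not count a second drive failing while the first is being replaced.

```python
import math


def all_copies_fail_independent(afrs):
    """Chance every copy fails in the same year, ASSUMING independent failures."""
    return math.prod(afrs)


def all_copies_fail_range(afrs):
    """(independent case, fully correlated case) for losing every copy in a year."""
    return all_copies_fail_independent(afrs), min(afrs)


low, high = all_copies_fail_range([0.05, 0.05])
print(f"Two drives at 5% AFR: between {low:.2%} and {high:.2%}")

# A third independent copy, e.g. an off-site drive, shrinks the figure further.
print(f"Three independent copies: {all_copies_fail_independent([0.05] * 3):.4%}")
```

Copies kept in separate machines or locations should sit nearer the independent end of that range; drives sharing a controller and power supply sit somewhere above it.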

Returning to your original question, what about other types of media? Although MTBF has proven to be a very optimistic figure for hard drives, for many other media types it’s the best rating you’re going to find. You can combine the MTBF, or its cousin MTTF, of multiple devices using the Online Reliability Calculator. You might also be able to find real-world studies of the other media types you’re considering, which would give you more realistic MTBF figures than the manufacturer’s numbers.
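I can’t say exactly which formulas that calculator applies, so the following is only a sketch of one standard textbook case: a chain of components where any single failure makes the data unreadable (say, a drive plus its enclosure). With constant failure rates assumed, the rates add up, and the combined MTBF is the reciprocal of the sum. The function name and the hour figures are hypothetical.

```python
def series_mtbf(mtbf_hours):
    """Combined MTBF when ANY component failing breaks the whole chain.

    ASSUMES constant, independent failure rates; the rates (1 / MTBF) add up.
    """
    return 1.0 / sum(1.0 / m for m in mtbf_hours)


# Hypothetical ratings: a 1,000,000-hour drive in a 500,000-hour enclosure.
combined = series_mtbf([1_000_000, 500_000])
print(f"Combined MTBF: {combined:,.0f} hours")
```

The combined figure is always lower than the weakest component’s rating, which is one more reason to treat a single device’s MTBF as an optimistic upper bound.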

Attribution
Source: Link, Question Author: Deleted, Answer Author: Greg Smith
