What Does 11 Nines of Durability Really Mean?
Data storage is serious business. Customers entrust their data to their storage vendors with the understanding that it will be there when they want it. No excuses.
I was a founder and CEO of Carbonite, a well-known backup company. Carbonite backs up about half a billion computer files every day. So we know a few things about data loss in the cloud and what it can mean to customers. Here’s what I can tell you.
Cloud storage is more reliable than physical storage
People are inherently more comfortable with the notion of physical storage than they are with storing data in the cloud. That’s understandable. We live in a physical world where losing something like your car keys is tangible and easy to grasp. Perhaps that’s why some IT professionals (erroneously) think that having data stored in their own data centers is somehow inherently safer than storing it in a public cloud. When dealing with something as ethereal as millions of computer files, most people don’t have a good gut feel for how reliable data storage needs to be in order to avoid costly and embarrassing losses.
Even physical storage of paper documents or tapes is not foolproof.
From Computer Weekly…
“Data storage company Iron Mountain has admitted losing backup tapes containing the data of thousands of employees at one of its customers. This is the third major data breach to affect Iron Mountain.”
From The New York Times…
“Time Warner said the data, on 40 tapes in a container the size of a cooler, disappeared more than a month ago while being shipped to an offsite storage center. Iron Mountain issued a statement saying, ‘Iron Mountain performs upwards of five million pickups and deliveries of backup tapes each year, with greater than 99.999% reliability.’”
What does 5 nines of reliability mean?
Let’s do the math. If 99% reliability means that you will lose one object out of 100 every year, then 99.999% (5 nines) reliability means that you will lose one object out of 100,000 objects every year. Iron Mountain makes 5 million pickups and deliveries a year, so by their numbers you can expect them to lose 50 objects per year. That’s probably consistent with the losses we read about in the newspapers.
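Under this reading, durability is an annual per-object loss probability, so expected losses are the object count times the loss rate. Here is a minimal Python sketch of that arithmetic. The function name is hypothetical, and, like the text, it assumes losses are spread evenly across objects.

```python
def expected_annual_losses(num_objects: float, nines: int) -> float:
    """Expected number of objects lost per year.

    Treats a durability of `nines` nines (5 -> 99.999%) as a 10**-nines
    chance that any given object is lost in a year, as the text does.
    """
    annual_loss_rate = 10.0 ** -nines  # 5 nines -> 0.00001
    return num_objects * annual_loss_rate


# Iron Mountain: about 5 million pickups and deliveries a year at 5 nines.
print(round(expected_annual_losses(5_000_000, 5)))  # 50
```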
By contrast, top-tier cloud storage vendors, including Amazon S3, Microsoft Azure, and my company, Wasabi, offer 11 nines of reliability (or durability, as it’s called in the industry). That means such cloud storage loses objects at one-millionth the rate of Iron Mountain’s physical storage: 0.000000001% per year versus 0.001%. In other words, if you gave Amazon or Wasabi 1 million files to store, statistically they would lose about one file every 100,000 years. You are about 411 times more likely to get hit by a meteor.
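Both the ratio between 5 and 11 nines and the average interval between losses follow from the same per-object, per-year reading of durability. The Python sketch below checks them. Expressing durability as a count of nines (an illustrative choice) avoids the floating-point cancellation you would get from computing `1 - 0.99999999999` directly; the helper names are hypothetical.

```python
def loss_rate(nines: int) -> float:
    """Annual per-object loss probability at a durability of `nines` nines."""
    return 10.0 ** -nines


def years_between_losses(num_objects: int, nines: int) -> float:
    """Mean years between single-object losses: 1 / (expected losses per year)."""
    return 1.0 / (num_objects * loss_rate(nines))


improvement = loss_rate(5) / loss_rate(11)  # 5 nines vs 11 nines
print(f"{improvement:,.0f}x lower loss rate")               # 1,000,000x lower loss rate
print(f"{years_between_losses(1_000_000, 11):,.0f} years")  # 100,000 years
```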
Let’s look at this in a more tangible way. At Wasabi, we store billions of “objects,” or files that customers have sent us. On average, files are about 800 KB in size. So if your organization is storing 1 PB of data, it’s likely that you have something like 1.2 billion objects.
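The 1.2 billion figure is total capacity divided by average object size, and it only works out for objects in the hundreds of kilobytes. A quick Python check, assuming decimal units (1 PB = 10^15 bytes) and a hypothetical helper name:

```python
def estimated_object_count(total_bytes: float, avg_object_bytes: float) -> float:
    """Rough object count: total bytes stored divided by average object size."""
    return total_bytes / avg_object_bytes


PB = 10**15  # decimal petabyte
KB = 10**3   # decimal kilobyte
MB = 10**6   # decimal megabyte

count = estimated_object_count(1 * PB, 800 * KB)
print(f"{count:.3g}")  # 1.25e+09, i.e. roughly 1.2 billion objects

# Unit sanity check: 800 MB objects would mean only ~1.25 million objects.
print(f"{estimated_object_count(1 * PB, 800 * MB):.3g}")  # 1.25e+06
```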
If your storage were 99% reliable, you would lose one out of every 100 objects every year. The least durable commercial cloud storage is Amazon S3 Reduced Redundancy Storage (RRS), which is spec’d at 99.99%. Using RRS, you could expect to lose 0.01% of your files every year, or 0.0001 x 1.2B = 120,000 lost files per year. Here’s a table with some representative products and the expected data loss per year:
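For the durability levels quoted in this post, expected annual losses for roughly 1.2 billion objects can be recomputed with a few lines of Python. This is a sketch under the same per-object loss assumption; it deliberately covers only the figures stated here and assumes no specs for other products.

```python
def expected_annual_losses(num_objects: float, nines: int) -> float:
    """Objects expected to be lost per year at a durability of `nines` nines."""
    return num_objects * 10.0 ** -nines


OBJECTS = 1.2e9  # ~1 PB of data at ~800 KB per object

# Only durability levels stated in the text; other products are not assumed.
tiers = [
    ("99% (illustrative baseline)", 2),
    ("Amazon S3 RRS (99.99%)", 4),
    ("11 nines (S3, Azure, Wasabi)", 11),
]

losses = {label: expected_annual_losses(OBJECTS, nines) for label, nines in tiers}

print(f"{'Durability':<30}{'Lost objects/year':>20}")
for label, lost in losses.items():
    print(f"{label:<30}{lost:>20,.3f}")
```

The three rows come out to about 12 million, 120,000, and 0.012 lost objects per year. At 11 nines that is roughly one object every 83 years across the whole petabyte.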
Active Integrity Checking means extra protection
With S3 RRS or lower-durability services such as Backblaze B2, the problem is that you won’t know you’ve lost files until you try to use them. That’s different from losing everything at once, which you would notice right away and could fix by restoring from a backup. Let’s say you store your data for five years in Backblaze B2. After five years you would expect to accumulate 600 lost files (5 years x 120 files per year). By the time you discover the losses, backups from five years ago are probably gone, leaving you with permanent data loss. That’s why many IT managers resort to annual (or more frequent) testing of all their data: they compute checksums over what is actually in storage and compare them against known-good values. If they find a mismatch, they hope there is another copy somewhere that they can use to restore the corrupted or missing file.
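The checksum testing routine described here can be sketched with Python’s standard `hashlib` module. This is a minimal illustration under stated assumptions: the stored objects are plain files under a local directory (real cloud objects would have to be streamed back first), SHA-256 is the checksum, and `build_manifest` and `audit` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large objects never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a known-good checksum for every file under `root` at write time."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def audit(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Re-hash what is actually stored and report missing or mismatched files."""
    missing: List[str] = []
    corrupted: List[str] = []
    for rel_path, expected in manifest.items():
        p = root / rel_path
        if not p.is_file():
            missing.append(rel_path)
        elif sha256_of(p) != expected:
            corrupted.append(rel_path)
    return {"missing": missing, "corrupted": corrupted}


# Demo in a throwaway directory standing in for the storage bucket.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "a.txt").write_bytes(b"hello")
    (root / "b.txt").write_bytes(b"world")
    manifest = build_manifest(root)
    (root / "a.txt").write_bytes(b"HELLO")  # simulate silent corruption
    (root / "b.txt").unlink()               # simulate a lost file
    report = audit(root, manifest)

print(report)  # {'missing': ['b.txt'], 'corrupted': ['a.txt']}
```

The manifest has to be created while the data is known to be good; an audit can only tell you that something changed, not recover it.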
Wasabi runs a checksum comparison every 90 days, a process we call Active Integrity Checking. Because we effectively keep five copies of every piece of data to achieve 11 nines, any single copy that becomes corrupted or lost can be quickly and reliably restored from the others. With 11 nines of durability, you will most likely never experience data loss in your lifetime. So why replicate data to a second data center?
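Here is a toy model of why periodic checking plus redundancy works. Each scrub pass verifies every copy of an object against a reference checksum and overwrites bad copies from a healthy one. It assumes plain replication with five full copies; it is not Wasabi’s actual mechanism (the text only says “effectively five copies”), and `scrub` is a hypothetical name.

```python
import hashlib
from typing import List, Optional


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def scrub(replicas: List[Optional[bytes]], expected: str) -> int:
    """One integrity pass over all stored copies of a single object.

    `replicas` holds the copies (None marks a lost copy). Every copy whose
    checksum no longer matches `expected` is overwritten from a healthy copy.
    Returns the number of copies repaired; raises if no healthy copy remains.
    """
    healthy = next(
        (r for r in replicas if r is not None and checksum(r) == expected), None
    )
    if healthy is None:
        raise RuntimeError("every copy is lost or corrupted: unrecoverable")
    repaired = 0
    for i, r in enumerate(replicas):
        if r is None or checksum(r) != expected:
            replicas[i] = healthy
            repaired += 1
    return repaired


original = b"quarterly-report"
expected = checksum(original)        # recorded when the object was written
copies: List[Optional[bytes]] = [original] * 5
copies[1] = b"quarterly-rep0rt"      # simulate bit rot in one copy
copies[3] = None                     # simulate a lost copy

repaired = scrub(copies, expected)
print(repaired)                            # 2
print(all(c == original for c in copies))  # True
```

In this model data becomes unrecoverable only if every copy fails between two passes, so a fixed checking schedule keeps that window short.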
It’s all about availability
Replicating your data in a second data center at a different location gets you two things: insurance against a local disaster (flood, fire, earthquake) that could physically destroy one of the data centers, and increased availability. Data centers can and do go offline from time to time due to power or local Internet failures. If a data center guarantees 99.9% uptime, it can be offline 0.1% of the time, or about 9 hours per year. If a second data center with the same uptime fails independently of the first, both are offline at once only 0.1% x 0.1% = 0.0001% of the time, so geographic replication gives you 99.9999% uptime, or 1/1000th the amount of downtime. This level of availability may or may not be worth the extra money; it really depends on your application and what any amount of downtime means to your business.
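The replication arithmetic can be checked with a short Python sketch. It assumes, as the 99.9999% figure does, that the two sites’ outages are independent and that data is reachable while at least one site is up; the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def downtime_hours(availability: float) -> float:
    """Expected hours offline per year at a given availability (e.g. 0.999)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def replicated_availability(site_availability: float, sites: int = 2) -> float:
    """Availability when data is reachable as long as at least one site is up.

    Assumes site outages are statistically independent; correlated failures
    (shared power grid, same bad software rollout) would make this optimistic.
    """
    all_down = (1.0 - site_availability) ** sites
    return 1.0 - all_down


single = 0.999
paired = replicated_availability(single)
print(f"one site:  {single:.4%} available, {downtime_hours(single):.2f} hours down per year")
print(f"two sites: {paired:.4%} available, {downtime_hours(paired) * 60:.2f} minutes down per year")
```

For one site this gives about 8.76 hours of downtime a year; for the independent pair, roughly half a minute.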
No amount of nines can prevent data loss
There is one very important and inconvenient truth about reliability: Two-thirds of all data loss has nothing to do with hardware failure.
The real culprits are a combination of human error, viruses, bugs in application software, and malicious employees or intruders. Almost everyone has accidentally erased or overwritten a file. Even if your cloud storage had one million nines of durability, it couldn’t protect you from human error.
For this reason, Wasabi introduced the notion of the “immutable bucket”: storage that cannot be erased or modified by anyone, not even the admin or anyone at Wasabi. Once you write data, it stays there until the hold time you designated expires. If someone tries to erase or modify an immutable file, they just get an error message. I’ve written a whole blog post on immutability if you’d like to learn more.
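The retention rule behind an immutable bucket can be modeled with a small in-memory Python class. This is a hypothetical illustration only (`ImmutableStore` and `RetentionError` are invented names, not Wasabi’s API). Each write sets a retain-until time, and any overwrite or delete before that time raises an error, with no override path for an administrator. Time is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


class RetentionError(Exception):
    """Raised when an overwrite or delete targets an object still under hold."""


class ImmutableStore:
    """Hypothetical write-once store: objects are locked until a hold expires.

    Deliberately has no admin or override method, mirroring the rule that
    nobody can erase or modify data before its hold time expires.
    """

    def __init__(self, hold: timedelta) -> None:
        self.hold = hold
        self._objects: Dict[str, Tuple[bytes, datetime]] = {}  # key -> (data, retain_until)

    def _check_unlocked(self, key: str, now: datetime) -> None:
        if key in self._objects:
            retain_until = self._objects[key][1]
            if now < retain_until:
                raise RetentionError(f"{key!r} is immutable until {retain_until.isoformat()}")

    def put(self, key: str, data: bytes, now: datetime) -> None:
        self._check_unlocked(key, now)  # refuse to overwrite a locked object
        self._objects[key] = (data, now + self.hold)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, now: datetime) -> None:
        self._check_unlocked(key, now)  # refuse to delete a locked object
        del self._objects[key]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = ImmutableStore(hold=timedelta(days=30))
store.put("ledger.csv", b"v1", now=t0)

blocked = False
try:
    store.delete("ledger.csv", now=t0 + timedelta(days=1))  # still under hold
except RetentionError as err:
    blocked = True
    print("refused:", err)

store.delete("ledger.csv", now=t0 + timedelta(days=31))  # hold expired, allowed
```

Amazon S3 offers a comparable mechanism called S3 Object Lock. Consult a service’s own documentation for its actual retention settings.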