the bucket

Why Keep a Second Copy of your Data in the Cloud?

By Jim Donovan
Senior Vice President, Product

January 18, 2018

There’s an old maxim in the backup world that “you can’t have too many copies of your data.” While that’s clearly an exaggeration, the sentiment is borne out by bitter, real-world experience. And if the cost of storing a data set is a tiny fraction of what the data is worth, why not keep multiple copies just in case?

In this blog post, I’d like to review the reasons IT professionals want to maintain multiple copies of their data and how the emergence of cloud storage, especially from the new “cloud storage 2.0” vendors like Wasabi, fits into the picture.

Data Loss Probabilities

There are two primary reasons for keeping extra copies of data. The first, of course, is to prevent its loss. Any data storage device, whether on-premises or in the cloud, could malfunction and lose data. This possibility is captured by a value known as “durability,” which is usually expressed as a percentage. For example, a Dell EMC storage server might have a durability of “eight nines,” meaning 99.999999 percent of stored files (or objects) are expected to survive a full year intact. Put another way, eight nines of durability means that, on average, one out of every 100,000,000 objects will be lost each year.
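The durability arithmetic can be made concrete with a short calculation. This Python sketch converts a count of “nines” into an expected number of lost objects per year; the function names and object counts are illustrative, and it treats durability simply as an average annual loss rate per object.

```python
def annual_loss_probability(nines: int) -> float:
    """Per-object chance of loss in one year for a durability of `nines` nines."""
    # e.g. 8 nines = 99.999999% durability, so the loss rate is 1e-8
    return 10.0 ** -nines


def expected_annual_losses(nines: int, object_count: int) -> float:
    """Average number of objects expected to be lost per year."""
    return annual_loss_probability(nines) * object_count


# Illustrative object counts, not figures from any real deployment.
for count in (100_000_000, 1_000_000_000):
    print(f"{count:,} objects at eight nines: "
          f"{expected_annual_losses(8, count):.1f} expected losses per year")
```

The expected loss scales linearly with the number of objects, so a store ten times larger should expect ten times as many lost objects at the same durability.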

Losing one file, however, is not the usual failure mode of modern storage servers. Redundancy techniques, such as replication and erasure coding, store redundant information about the data across multiple disks, so random bit errors on individual disks are quickly detected and corrected. What does happen is the catastrophic loss of a lot of data all at once. This can be due to physical disasters such as fires and floods. But software failures can also cause the loss of an entire array of disk drives.
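The simplest form of erasure coding is a single XOR parity block, as used in RAID 5. The sketch below, with hypothetical function names and toy data, shows how one lost block can be rebuilt from the surviving blocks plus parity. Production erasure codes (Reed-Solomon, for example) tolerate several simultaneous losses; this is an illustration of the principle, not any vendor’s implementation.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Single parity block: the XOR of all data blocks (RAID 5 style)."""
    return reduce(xor_blocks, data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block by XOR-ing parity with every survivor."""
    return reduce(xor_blocks, surviving_blocks, parity)


# Toy example: three equal-length blocks spread across three disks.
blocks = [b"disk0data", b"disk1data", b"disk2data"]
parity = make_parity(blocks)

# Disk 1 fails; its contents are reconstructed from disks 0 and 2 plus parity.
recovered = rebuild_missing([blocks[0], blocks[2]], parity)
print(recovered)
```

Because XOR is its own inverse, combining the parity with every surviving block cancels them out and leaves exactly the missing block. A single parity block cannot survive two failures at once, which is why larger systems use codes with more redundancy.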

This is a well-known phenomenon with RAID arrays. One of the most common causes involves “soft errors,” meaning disk errors that go undetected by the RAID software. Once too many such errors accumulate, they can render an entire RAID array unusable. Such occurrences are rare, because they require multiple disks to fail at the same time, but they are disastrous when they do happen. So, depending on your tolerance for such risk, it's generally worth keeping a completely separate copy of the data in a different location.

Even more concerning would be the destruction of an entire data center due to fire, flood, terrorism, or earthquake. In this case, all your data at that site would be lost. If the lost data is a backup and not the original working copy of the data, the loss might not be catastrophic. But it leaves the production system with no backup, which is a precarious position: until you can make a complete new backup elsewhere, you have to hope that nothing goes wrong with the production system.
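The case for a copy in a different location rests on independence: if two copies fail independently, the chance of losing both in the same year is the product of their individual probabilities. The per-site loss probability below is a hypothetical figure, and the multiplication only holds when the copies share no common cause of failure (two copies in the same building share the same fire or flood risk).

```python
import math


def prob_all_copies_lost(per_copy_loss: list[float]) -> float:
    """Chance that every copy is lost in the same year, assuming independent failures."""
    return math.prod(per_copy_loss)


# Hypothetical: each site has a 1-in-1,000 chance per year of a catastrophic loss.
print(prob_all_copies_lost([1e-3]))        # a single copy
print(prob_all_copies_lost([1e-3, 1e-3]))  # two copies in separate locations
```

Under these assumptions, a second independent copy turns a 1-in-1,000 annual risk into roughly a 1-in-1,000,000 risk.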

Keeping Data Available

The other reason to have a second copy of the data is to ensure availability. Common service-level agreements (SLAs) allow an entire data center to be offline for several hours per year. Causes include power outages, maintenance, and interruptions in the building’s Internet service. For data such as backups, being offline a few hours a year may not be a problem. But if having data unavailable to users for several hours is unacceptable, then retaining a second copy of the data in another location makes sense: you can fail over automatically whenever the primary data center is offline. With two data centers and automatic replication between them, downtime is reduced to seconds per year on average.
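To put “several hours” and “seconds” in numbers, the sketch below assumes a hypothetical 99.9 percent availability SLA per data center, independent outages at the two sites, and instantaneous failover. Under those assumptions, data is unavailable only when both sites are down at once.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def annual_downtime_seconds(availability: float) -> float:
    """Expected seconds per year that a service with this availability is down."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def combined_availability(*site_availability: float) -> float:
    """Availability when data is unreachable only if every site is down at once.

    Assumes independent outages and instantaneous automatic failover.
    """
    all_down = 1.0
    for a in site_availability:
        all_down *= 1.0 - a
    return 1.0 - all_down


SLA = 0.999  # hypothetical 99.9 percent availability per data center
print(f"one site:  {annual_downtime_seconds(SLA):,.0f} s/year")
print(f"two sites: {annual_downtime_seconds(combined_availability(SLA, SLA)):.1f} s/year")
```

At 99.9 percent, a single site may be down for about 31,536 seconds (roughly 8.8 hours) a year; two such sites with failover leave roughly 31.5 seconds. Real-world figures will be higher, since failover is never instantaneous and outages are not perfectly independent.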

Replicating data across two data centers also provides the option of accessing data from whichever location offers the faster response time. For example, a common data replication strategy is to have data centers in separate regions of the country. Customers get faster response times from the data center that is closest to them. Distributing the query load across two data centers also keeps either one from slowing down under too much load.
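Routing each request to the faster of two replicas, and to whichever one is still reachable during an outage, can be sketched as below. The site names and latency figures are hypothetical; a real deployment would typically measure latency continuously or rely on DNS-based or load-balancer routing.

```python
def pick_site(latencies_ms: dict[str, float], online: set[str]) -> str:
    """Return the reachable data center with the lowest measured latency."""
    candidates = {site: ms for site, ms in latencies_ms.items() if site in online}
    if not candidates:
        raise RuntimeError("no data center is reachable")
    return min(candidates, key=lambda site: candidates[site])


# Hypothetical latencies as measured from a client on the West Coast.
latencies = {"us-east": 72.0, "us-west": 18.0}
print(pick_site(latencies, online={"us-east", "us-west"}))  # -> us-west
print(pick_site(latencies, online={"us-east"}))             # us-west down -> us-east
```

The same selection rule covers both benefits described above: nearby clients get the lower-latency site, and when that site is offline, requests fall back to the other one.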

Wasabi is an excellent choice as the destination for a second copy of data, whether your primary site is an in-house data center or Amazon S3. At $0.0039/GB/month, Wasabi storage costs less than one-fifth the price of Amazon S3 storage. In-house storage is generally estimated to have a total cost of ownership of $0.02 to $0.03/GB/month, roughly the same as Amazon S3. So for no more than about 20 percent of the cost of the first copy of the data, you could have a second copy in Wasabi’s cloud. In fact, when Wasabi’s West Coast data center is completed in Q1 2018, Wasabi will offer real-time replication to both sides of North America, providing both a second and a third copy of the data for those who need it.
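The cost comparison can be checked with the per-GB prices quoted above. The 100 TB data set size is a hypothetical example; only the ratios matter.

```python
WASABI_PER_GB_MONTH = 0.0039            # price quoted in the text
IN_HOUSE_PER_GB_MONTH = (0.02, 0.03)    # estimated in-house TCO range from the text


def second_copy_share(first_copy_price: float,
                      second_copy_price: float = WASABI_PER_GB_MONTH) -> float:
    """Cost of the second copy as a fraction of the first copy's cost."""
    return second_copy_price / first_copy_price


data_gb = 100_000  # hypothetical 100 TB data set
for price in IN_HOUSE_PER_GB_MONTH:
    first = data_gb * price
    second = data_gb * WASABI_PER_GB_MONTH
    print(f"first copy ${first:,.0f}/month, Wasabi copy ${second:,.0f}/month "
          f"({second_copy_share(price):.1%} of the first)")
```

With these prices, the Wasabi copy comes to 19.5 percent of the cost of a $0.02/GB/month first copy and 13 percent of a $0.03/GB/month one.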

Avoid Replicating ‘Bad’ Data

Some data replication strategies mirror every change made to the primary data at the secondary location. Dropbox is a good example: it replicates data across multiple machines, so if data is accidentally modified, deleted, or overwritten in one location, the accident propagates and permanently damages every copy of the data. Wasabi offers “immutable buckets,” meaning that data, once written, cannot be deleted or modified. Immutable buckets make such accidental data destruction impossible.
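The difference between mirroring and immutable storage can be sketched with two toy in-memory stores. These classes are hypothetical illustrations of the concept, not Wasabi’s API: the mirror propagates a delete to both copies, while the write-once store refuses to overwrite or delete an object once it has been written.

```python
class ImmutableObjectError(Exception):
    """Raised when something tries to change an object in a write-once store."""


class MirroredStore:
    """Toy mirror: every change, including deletes, is applied to both copies."""

    def __init__(self) -> None:
        self.primary: dict[str, bytes] = {}
        self.secondary: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.primary[key] = data
        self.secondary[key] = data  # an accidental overwrite is copied too

    def delete(self, key: str) -> None:
        self.primary.pop(key, None)
        self.secondary.pop(key, None)  # and so is an accidental delete


class WriteOnceStore:
    """Toy immutable bucket: an object can be written once and then only read."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise ImmutableObjectError(f"{key!r} already exists and cannot be overwritten")
        self._objects[key] = data

    def delete(self, key: str) -> None:
        raise ImmutableObjectError(f"{key!r} cannot be deleted")

    def get(self, key: str) -> bytes:
        return self._objects[key]
```

An accidental `delete("report")` on the mirror removes both copies, whereas the same call on the write-once store raises an error and the object stays readable.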

Wasabi also makes sure that when cloud storage is used for long-term retention, the integrity of files remains intact. Our Active Integrity Checking function reads all stored objects and checks their integrity every 90 days for as long as each object is stored.
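A periodic integrity check of this kind is commonly built by recording a cryptographic digest when an object is written and recomputing it on a schedule. The sketch below assumes SHA-256 digests and an in-memory object table with hypothetical names; it illustrates the idea of a 90-day check and is not how Wasabi’s Active Integrity Checking is actually implemented.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CHECK_INTERVAL = timedelta(days=90)  # re-verify each object at least this often


@dataclass
class StoredObject:
    data: bytes
    sha256: str             # digest recorded at write time
    last_checked: datetime  # when the digest was last verified


def store(data: bytes, now: datetime) -> StoredObject:
    """Write an object and remember its SHA-256 digest."""
    return StoredObject(data, hashlib.sha256(data).hexdigest(), now)


def integrity_pass(objects: dict[str, StoredObject], now: datetime) -> list[str]:
    """Re-hash every object that is due for a check; return keys that no longer match."""
    corrupted = []
    for key, obj in objects.items():
        if now - obj.last_checked < CHECK_INTERVAL:
            continue  # verified recently, skip until the next interval
        if hashlib.sha256(obj.data).hexdigest() != obj.sha256:
            corrupted.append(key)  # a real system would repair this from redundant data
        obj.last_checked = now
    return corrupted


# Usage: simulate silent corruption of one object, then scan after 91 days.
t0 = datetime(2018, 1, 18, tzinfo=timezone.utc)
objects = {"a": store(b"sample-1", t0), "b": store(b"sample-2", t0)}
objects["b"].data = b"sample-X"  # simulated bit rot
print(integrity_pass(objects, t0 + timedelta(days=91)))
```

Scanning on a fixed interval bounds how long silent corruption can go unnoticed, which matters most for data that is written once and rarely read.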

Because Wasabi is cheaper and faster than Amazon S3 (6x the speed, in fact), it provides a good way to give people access to stored data without anyone touching the primary copy. For example, a well-known university’s medical school generates very large amounts of genetic data. Scientists from around the world can access the second copy of that data, stored in Wasabi, without the privileges and credentials required to access the primary copy. And since the Wasabi copy is stored in immutable buckets, no one can accidentally or maliciously alter the data.
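Granting outside researchers read-only access to the second copy is usually a matter of access policy. Assuming the secondary store accepts AWS-style S3 bucket policies, as S3-compatible services generally do, a read-only policy can be generated as below. The bucket name and principal ARN are hypothetical placeholders; check the provider’s documentation for its exact principal and resource formats.

```python
import json


def read_only_bucket_policy(bucket: str, reader_arns: list[str]) -> dict:
    """Build a bucket policy that allows listing and reading, but no writes or deletes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSecondCopy",
                "Effect": "Allow",
                "Principal": {"AWS": reader_arns},
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],         # the bucket itself
            },
            {
                "Sid": "ReadSecondCopy",
                "Effect": "Allow",
                "Principal": {"AWS": reader_arns},
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],       # objects in the bucket
            },
        ],
    }


# Hypothetical bucket and researcher account, for illustration only.
policy = read_only_bucket_policy(
    "genomics-second-copy",
    ["arn:aws:iam::123456789012:user/visiting-researcher"],
)
print(json.dumps(policy, indent=2))
```

Because the policy grants only list and read actions, researchers never receive credentials that could modify either copy; immutability on the bucket adds a second layer of protection.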

We highly recommend retaining a second copy of any valuable data. The fact that you can now do so at a fraction of the price tilts the cost/benefit equation in favor of keeping that second copy.
