Digital Preservation & the Library of Congress
I recently had the privilege of addressing a meeting at the Library of Congress on the preservation of digital information. The participants included archivists and IT leaders responsible for digital archiving at universities, nonprofits, and government agencies across the country, as well as leading thinkers and solution providers in this space. As the day went on, it became clear to me that digital preservation, the ability to accurately render, store, and access authenticated content over time, is still very much in its infancy. There was much debate about the best strategies to pursue, and clearly there is still a lot of experimentation taking place.
A number of themes emerged. First, it’s a lot easier to create digital information than it is to create physical or paper-based information. Consequently, the volume of new digital-native content, and the velocity at which it is being generated, are creating challenges of cost and scale that few archivists are prepared to handle. Just think how much data would be involved in archiving every television news show ever recorded, for example.
So, it was no surprise that the high cost of highly reliable storage was front and center at this conference. IT is taking an increasingly large share of total budgets as the number of physical documents decreases relative to digital data. Even so, from my perspective, the amount of money that people are willing to spend on digital preservation is fairly flat over time. The budgets for institutions like the Library of Congress are constantly under pressure, and IT directors are struggling to reconcile the urgent need to preserve an ever-growing mountain of digital archives with the substantial cost of doing so. It would be great to save everything, but economically that’s just not possible. Given the cost of storage, the very people tasked with preserving our historical, cultural, and scientific heritage are forced to classify large volumes of work or information as not worth saving.
There were several presentations by IT folks who were doing their best to roll out and maintain their own storage systems in order to scale and lower overall costs. Then, of course, there were cloud storage conversations, and tactical discussions concerning what to archive in cold storage versus what was worth premium prices for the ability to make content readily available to the public. Both were music to my ears, as our Wasabi hot storage can help IT and archivists with both issues. The average cost of storing a petabyte of data with Wasabi is typically less than the maintenance costs alone for on-premises storage. And because Wasabi is a “single tier” cloud storage offering that costs less than cold storage but is six times faster than high-performance tiers, such as Amazon S3 Standard, archivists no longer need to decide what to make available and what to lock away . . . or throw away.
Data immutability, a feature of Wasabi about which I am quite passionate, is also right in the sweet spot for this audience. They are not into erasing things, so the idea that you can do a better job of protecting digital archives from accidental or malicious destruction makes products like Wasabi a perfect fit for digital archiving.
There’s still a lot of use of tape in the digital archiving world. It’s actually a very good medium for digital preservation because tapes last a long time. So, over the life of a tape, storage is cheap. However, unless you keep your tapes in a robotic library, they are very expensive and slow to access since they would have to be handled by humans. And the cost of a robotic library makes the whole tape proposition a little less appealing relative to disk-based systems.
There was also a lot of talk about checking your archives to make sure they were still readable and that data was not deteriorating. A show of hands indicated that most attendees were running checksums once a year or so to determine if their archives were still free of errors. Wasabi reads every file every 90 days, and automatically corrects any errors, so that’s another reason why I think Wasabi is a great fit for digital preservation.
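For readers who want to see what a periodic checksum run like the ones attendees described might look like, here is a minimal sketch in Python. It is not a description of Wasabi's internal process; the function names (sha256_of, build_manifest, verify_manifest) and the choice of SHA-256 are assumptions made purely for illustration. The idea is to record a checksum for every file once, then recompute and compare on each later run.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Recompute checksums and return {relative_path: problem} for any failures.

    Only files listed in the manifest are checked; files added to the
    archive later are not reported (a simplification for this sketch).
    """
    root = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

In use, build_manifest would be run once when material enters the archive and its result stored somewhere safe, and verify_manifest would be run on whatever schedule the institution can afford. Note that a check like this only detects damage; repairing it requires a second good copy or some form of redundancy.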
Finally, I have to report on a bit of serendipity. The speaker just before me was a product manager for Amazon S3. Perhaps knowing that Wasabi, the new cloud storage kid on the block, was speaking next, he started by explaining how Amazon S3 is now 11 years old and has a proven track record. That was a perfect setup for me, because my speech started by explaining that Amazon S3 is based on 11-year-old technology and a lot has changed to make next-generation Cloud 2.0 products like Wasabi hot storage possible. The audience was amused. 🙂