Data Protection: 20th Century Storage Architectures Won’t Solve 21st Century Storage Problems
Don’t be fooled by GUI Blankets
The data protection marketplace is undergoing a sea change right before our eyes. All the legacy players, including Symantec, IBM Tivoli, EMC NetWorker, and CommVault, are feverishly trying to bring their decades-old architectures into the 21st Century. For at least a decade, I have been saying that the old method of doing full and incremental backups has to go. It is archaic and smacks of the '70s. Why a full backup must be done every week, moving all the data across the application server, into the network, and then into the backup server before placing it on tape or disk, when 90%+ of that data is the same as last week's, has never made much sense to me. Ditto with incrementals. Why move an entire 2MB file across the network when only four bytes have changed? The inefficiencies have been mind-boggling. Regardless, that is how it has been for three decades.

Granted, there have been improvements over the past several years. Data deduplication has been a godsend. Now we dedupe the original full at a sub-file level and keep only one copy of each "chunk" in the backup. But we still lug all that data across the network and only dedupe it at the target, so network efficiency remains sub-optimal. Of course, with the onset of source-level deduplication, we have been able to positively impact network traffic too. But most of the world is still doing target-based deduplication. Regardless, all of these have been positive movements toward the goal of simplifying data protection and ultimately making it disappear entirely as a task to be performed by IT administrators. But wait. Just as we were making these positive strides, something else changed the nature of the problem.
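To make that arithmetic concrete, here is a minimal Python sketch of target-side, chunk-level deduplication. The fixed 4KB chunk size, the class name, and SHA-256 digests are illustrative assumptions; shipping products typically use variable, content-defined chunking and far more sophisticated indexing:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for illustration; real products often chunk by content

class DedupStore:
    """Keeps one copy of each unique chunk, keyed by its SHA-256 digest."""
    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes (stored once)
        self.backups = {}   # backup name -> ordered list of digests

    def ingest(self, name, data):
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if not seen before
            digests.append(digest)
        self.backups[name] = digests

    def restore(self, name):
        # Reassemble the original stream from the shared chunk pool
        return b"".join(self.chunks[d] for d in self.backups[name])

store = DedupStore()
week1 = b"A" * 40960                  # a 40 KB "full": 10 identical chunks
week2 = b"A" * 36864 + b"B" * 4096    # next week's full: 90% unchanged
store.ingest("full-week1", week1)
store.ingest("full-week2", week2)
print(len(store.chunks))                         # -> 2 unique chunks for two 40 KB fulls
print(store.restore("full-week2") == week2)      # -> True
```

Note that the savings here are only on the storage side: both fulls still crossed the "network" in their entirety before being deduped, which is exactly the residual inefficiency of target-based deduplication described above.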
Virtualization, Big Data, Cloud Play Havoc
As if from nowhere came the Big Data tsunami, the cloud, and server virtualization. The latter alone has played havoc with legacy data protection players. Their initial attempt to apply the same techniques they used on physical servers onto virtual machines backfired badly, with many IT shops going days without backing up their virtual infrastructures for fear of destroying application performance. This spawned an entirely new generation of startups that looked at the fundamental differences brought about by a hypervisor and developed products that operate at the hypervisor level rather than the OS level. Some of these new companies have established quite a beachhead and are growing rapidly, simply because legacy products have been slow to make fundamental changes to their product lines. We are finding enterprises of all sizes that have had strategic partnerships with a legacy vendor or two for decades buying these "specialty" products from relatively unknown players, simply because they felt they had no other options.
Onslaught of New Players
On the other end of the spectrum, a variety of new players have come on the scene, focusing entirely on delivering data protection in the cloud. Some give you a choice of local protection plus cloud protection, while others bypass local protection altogether and deliver protection only in the cloud. These offerings have appealed to small and mid-size companies because a semblance of DR comes along with them, something many of these companies could not afford before.
I consider all these attempts to be forward movement for the industry. But I still believe the basic way of looking at the data protection problem is wrong. In an ideal environment, I would move the original data, in the most condensed fashion, to "protection storage" only once. After that, only changes to the data would be moved to protection storage. The concept of a backup window completely disappears in this vision. Nothing is scheduled except how many application-consistent snapshots one wants to create and what level of protection one wants for each; in other words, each application's protection is SLA driven.

In this vision the protection server "mounts" the required image instantly and there is no requirement to "recover a volume." The required image is mountable just as a storage volume is. Recovery is instantaneous. In fact, there is no concept of recovery as we know it today. The method applies to virtual servers, physical servers, the cloud, or whatever else we can dream up next. Cloud would just be another type of "protection storage." This type of solution could be implemented as an enterprise solution, as an MSP solution (to serve many customers), or as a mix of the two (an enterprise IT shop acting like an MSP for its internal constituents). Replication would be initiated from the same "protection storage" and would use all the efficiency-enhancing techniques of WAN optimization. All copies of data required by the enterprise, whether for recovering data or for use in a test/development or support environment, would originate from one common pool of "protection storage," bringing incredible efficiencies to storage.
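The incremental-forever and instant-mount ideas above can be sketched in a few lines of Python. Everything here is a hypothetical illustration, not any vendor's implementation: the full moves into the pool once, each snapshot is merely a map of block digests, and "mounting" any point in time is a lookup rather than a recovery operation:

```python
import hashlib

BLOCK = 4096  # illustrative block size

class ProtectionStorage:
    """Incremental-forever sketch: after the first ingest, only changed
    blocks are stored. Every snapshot is a complete block map, so any
    point in time is 'mountable' instantly with no recovery step."""
    def __init__(self):
        self.blocks = {}     # digest -> block bytes (one shared pool)
        self.snapshots = []  # each snapshot: ordered list of digests

    def snapshot(self, volume):
        block_map = []
        for i in range(0, len(volume), BLOCK):
            b = volume[i:i + BLOCK]
            d = hashlib.sha256(b).hexdigest()
            if d not in self.blocks:   # only new/changed blocks land in the pool
                self.blocks[d] = b
            block_map.append(d)
        self.snapshots.append(block_map)
        return len(self.snapshots) - 1  # snapshot id

    def mount(self, snap_id):
        """Present a snapshot as a readable image immediately."""
        return b"".join(self.blocks[d] for d in self.snapshots[snap_id])

ps = ProtectionStorage()
vol = bytearray(b"x" * BLOCK * 8)      # a 32 KB "volume"
s0 = ps.snapshot(bytes(vol))           # the one-and-only full
vol[0:BLOCK] = b"y" * BLOCK            # 4 KB of the volume changes
s1 = ps.snapshot(bytes(vol))           # only the changed block is added
print(len(ps.blocks))                  # -> 2 unique blocks back two snapshots
print(ps.mount(s0)[:1])                # -> b'x'; the old image mounts instantly
```

Replication and test/dev copies in the vision would simply be additional readers of the same block pool, which is where the "one common pool" efficiency comes from.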
Taneja Group described this vision in a 2006 paper titled "Continuous Data Technologies: A New Paradigm." A few vendors tried but failed to implement such a vision, mostly because storage virtualization as a technology was still immature and the compute power needed to make the solution real was not quite there yet. Most recently, we updated this vision and described it in a paper titled "Today's Data Protection Business Dilemma and Associated Challenges," published in InfoStor in February 2012.
In my view, all data protection companies, legacy or new, are chasing this dream right now. The new virtual-server-focused players have applied many of these principles but solved the problem for only one environment. Others have solved the problem partially, but for both physical and virtual server environments. And I believe each legacy player is either re-architecting its entire data protection portfolio or eyeing one or more of the new players in order to stay relevant in the 21st Century. Dell buying AppAssure is one such example.
The Vendor Battle Is On
In the meantime, as I have stated many times in the past year, one company, Actifio, exemplifies the most complete implementation of our vision to date. And by all measures they are receiving a great reception in the marketplace. But another phenomenon is taking place as we speak. Some of the legacy players, most notably CommVault, are presenting themselves as an incarnation of this vision. Granted, relative to other legacy players, CommVault, as I see it, has done a great job of maintaining consistency between the various elements of its product portfolio. This has given it leverage against its larger competitors, who have assembled a potpourri of products (some developed in-house, others purchased, yet others OEM'd) to solve the backup, recovery, snapshot, replication, cloning, and related problems. CommVault has over the years resisted the temptation to buy products on the open market and has religiously developed its own. This has given it a common repository and a common method for feeding information into it and extracting information from it.
However, not for a second do I assume that CommVault meets the majority of the requirements we described in our vision. Neither do most of the other legacy vendors. CommVault's architecture is still almost two decades old. You still have to perform an act of "recovery" with their solution. Data deduplication, compression, and the rest are all tacked on, as opposed to being an inherent part of the architecture. Copy making is still at the heart of everything they do. No surprise here; that is how 20th Century architectures did it. And being relatively better than the other legacy players is not the same thing as meeting all the requirements of this decade and the next. For that, you need to look at architectures designed from scratch for the world of virtualization, Big Data, and the cloud.
For if you don't, your data protection across the enterprise will still look like a quilt that grandma knitted. Just as it does today. Even if it has a GUI blanket thrown on top of it.