Taneja Blog

Thoughts on Source-based Storage Capacity Optimization

I've recently spent a lot of time talking to vendors about their source-based storage capacity optimization (SCO) options. In short, these are generally data de-duplication and compression technologies aimed at secondary storage that run on host (i.e., backup client) resources, so that the backup is already in capacity-optimized form before it is sent across the WAN. The big benefit: ROBO (remote office/branch office) backups can usually be done in a fraction of the time, since you're sending a lot less data. Restores get similar advantages, although because most restores are at the object level, the technology shines operationally more for backups than for restores. The big knock: it takes host cycles, which can really affect application performance when backups are run while the application is online.
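To make the mechanics concrete, here is a minimal Python sketch of the idea, not any vendor's actual implementation. It assumes fixed-size 4 KiB chunks, SHA-256 fingerprints, and zlib compression, and it models the server's index as a plain in-memory set. The names `backup_from_source`, `bytes_on_wire`, and `CHUNK_SIZE` are made up for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; shipping products may chunk differently


def backup_from_source(data: bytes, server_index: set) -> dict:
    """Build what the backup client would send: a recipe plus only the new chunks."""
    recipe = []      # ordered chunk fingerprints needed to rebuild the data
    new_chunks = {}  # fingerprint -> compressed chunk, only for chunks the server lacks
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()  # hashing is paid for in host CPU
        recipe.append(digest)
        if digest not in server_index and digest not in new_chunks:
            new_chunks[digest] = zlib.compress(chunk)  # so is compression
    server_index.update(new_chunks)  # the server now knows these chunks
    return {"recipe": recipe, "chunks": new_chunks}


def bytes_on_wire(payload: dict) -> int:
    """Chunk bytes that would cross the WAN (the small recipe is ignored here)."""
    return sum(len(c) for c in payload["chunks"].values())


if __name__ == "__main__":
    index = set()
    data = bytes(range(256)) * 64          # 16 KiB of sample data
    first = backup_from_source(data, index)
    second = backup_from_source(data, index)  # unchanged data, nothing new to send
    print(len(data), bytes_on_wire(first), bytes_on_wire(second))
```

The second backup of unchanged data sends no chunk bytes at all, which is where the WAN savings come from. The hashing and compression inside the loop are the host cycles that make up the knock against the approach.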

Most of the majors have an offering in this space. EMC has integrated Avamar technology into the NetWorker client; Symantec offers PureDisk technology and can track data backed up by it in the NetBackup catalog; and IBM and CommVault recently announced source-based SCO technology for their respective flagship backup products (TSM and Simpana). VMware will also be entering the game this month with an interesting twist on the technology. Expect other enterprise backup software players to enter this space in the coming months. Target-based SCO options (e.g., storage target appliances such as Data Domain's DD appliances, FalconStor's VTL, IBM's TS7650 (Diligent), and Sepaton's S2100) are outselling source-based options by a little under 10:1, but choosing the technology that's right for you depends on what your problem is.

What has struck me about these source-based offerings is that there are significant differences between them that really affect how much backup client overhead they impose. The whole reason to look at SCO technology in the first place is generally to reduce the amount of data that has to be handled, whether for WAN savings or to better manage storage capacity in these times of explosive data growth, so the capacity optimization ratios achieved are important. The solutions that tend to achieve the highest ratios usually operate at the sub-file level and leverage a global de-duplication repository (index) of some kind. But if the problem you need to solve is caused by WAN bandwidth issues, that should drive you to look at source-based SCO options, and we'd encourage you to understand the overhead impact of the different options, since there are real differences. Obviously there are other issues you'll care about as well, such as application compatibility, reliability, cost, and how easily the product integrates with what you're already doing in data protection. Look under the covers of these products to understand how they do what they do, and how that will affect backup client performance in your environment.
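As a rough illustration of why sub-file granularity and a shared index tend to produce higher ratios, the Python sketch below computes a capacity optimization ratio (logical bytes divided by stored bytes) for the same two files, first de-duplicated as whole files and then as fixed-size chunks. The function name `optimization_ratio`, the 4 KiB chunk size, and the sample data are assumptions chosen for the example, not measurements from any product.

```python
import hashlib


def optimization_ratio(files, chunk_size=None):
    """Logical bytes / stored bytes, using one index shared across all files.

    chunk_size=None de-duplicates whole files; otherwise fixed-size sub-file chunks.
    """
    index = set()          # stands in for a global de-duplication index
    logical = stored = 0
    for data in files:
        logical += len(data)
        step = chunk_size or max(len(data), 1)
        for offset in range(0, len(data), step):
            piece = data[offset:offset + step]
            digest = hashlib.sha256(piece).digest()
            if digest not in index:   # only never-seen pieces consume capacity
                index.add(digest)
                stored += len(piece)
    return logical / stored if stored else 1.0


if __name__ == "__main__":
    # 16 KiB of non-repeating sample data, and a copy with only the last byte changed.
    base = b"".join(i.to_bytes(4, "big") for i in range(4096))
    edited = base[:-1] + b"X"
    print(optimization_ratio([base, edited]))                   # file level
    print(optimization_ratio([base, edited], chunk_size=4096))  # sub-file level
```

At the file level the one-byte edit makes the two files look entirely different, so nothing is saved. At the sub-file level only the final chunk has to be stored again. The finer granularity is not free: more chunks mean more hashing and more index lookups on the backup client, which is exactly the overhead worth asking vendors about.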

