Includes Backup/Recovery, Archiving, DPM, VTL, CDP, Data De-duplication, DRM.
Data is the lifeblood of an enterprise. And yet data has been protected in essentially the same fashion over the past two decades, i.e., by backing it up to tape and sending the tapes offsite. This method alone is no longer adequate, and a spate of new technologies has become available in the last five years. These new technologies are already transforming the way data is protected, how long it is kept online, and how it is archived. Recovery Management has emerged as a new discipline focused on recovering data rather than copying data. New compliance requirements are essentially forcing companies of all sizes to upgrade their data protection infrastructures or face huge fines. The pace of innovation in this space is torrid. Taneja Group covers this space from end to end and has defined many of the new categories that are now considered the norm. The analysts who cover this space have deep industry backgrounds in developing and marketing these technologies.
Over the past few years, backup has become a busy market. For the first time in many years, a new wave of energy hit this market as small innovators sprang up to tackle pressing challenges around virtual server backup. The market has taken off because of a distinctive set of challenges and simultaneous opportunities within the virtual infrastructure: with large amounts of highly similar data, rich APIs for automation, and a tightly constrained pool of IO and processing resources, the data behind the virtual server can be captured and protected in entirely new ways. As innovators attacked these opportunities in turn, backup was fundamentally changed. In many cases, backup has been put in the hands of the virtual infrastructure administrator, made lighter weight and vastly more accessible, and has become a powerful tool for data protection and data management.
In reality, the innovations in virtual backup have leveraged the unifying layer of virtualization to tackle several key backup challenges. These challenges have long stood in the practice of data protection, and include ever-tightening backup windows, ever more demanding recovery point objectives (RPO, or the amount of tolerable data loss when recovering), short recovery time objectives (RTO, or how long it takes to complete a recovery), recovery reliability, and complexity. Specialized data protection for the virtual infrastructure has made enormous progress in tackling these challenges, and in simplifying the practice of data protection to boot.
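Since RPO and RTO anchor so much of what follows, a quick worked example may help make the two metrics concrete. The short Python sketch below is purely illustrative and ours alone (the function names, timestamps, and failure scenario are hypothetical, not drawn from any product discussed here): achieved RPO is bounded by how stale the newest recoverable copy is at the moment of failure, while achieved RTO is simply the elapsed time from failure to restored service.

    from datetime import datetime, timedelta

    def achieved_rpo(last_backup: datetime, failure_time: datetime) -> timedelta:
        """Worst-case data loss: everything written after the last good backup."""
        return failure_time - last_backup

    def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
        """Downtime: elapsed time from the failure until service is back."""
        return service_restored - failure_time

    # Hypothetical incident: nightly backup at 02:00, failure at 14:30,
    # service restored at 18:00 the same day.
    last_backup      = datetime(2012, 11, 15, 2, 0)
    failure_time     = datetime(2012, 11, 15, 14, 30)
    service_restored = datetime(2012, 11, 15, 18, 0)

    print(achieved_rpo(last_backup, failure_time))       # 12:30:00 of data at risk
    print(achieved_rto(failure_time, service_restored))  # 3:30:00 of downtime

The example also shows why the two metrics are tuned independently: shrinking RPO means capturing recovery points more frequently, while shrinking RTO means restoring from those points faster.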
But we’ve often wondered what it would take to bring the innovation from virtual infrastructure protection to a full-fledged backup product that could tackle both physical and virtual systems. At the recent request of Dell, Taneja Group Labs had the opportunity to look at just such a product. That product is AppAssure, a set of technologies that seems destined to be the future architectural anchor for the many data protection technologies in Dell’s rapidly growing product portfolio. We jumped at the chance to put AppAssure through its paces in a hands-on exercise, as we wanted to see whether AppAssure had an architecture poised to change how datacenter-wide protection is typically done, perhaps by making it more agile and accessible.
Microsoft officially acquired StorSimple on November 15, 2012. StorSimple was a relatively young startup that had been shipping products for about 18 months. Why did Microsoft buy StorSimple? What is the strategy behind the purchase? Where will Microsoft take this newly acquired technology? These are some of the questions we are being asked at present. Here is our view...
Taneja Group and InfoStor jointly ran a survey asking IT managers about their big data experiences and roadmaps. We concluded that there is a great deal of uncertainty around big data: what it is, how to manage it, and whether it even belongs in the IT domain rather than with specialized application administrators.
Storing and managing large volumes of data certainly involves IT. However, “big data” is its own class: large data sets that are subjected to ongoing analytics and/or massive re-use. Some big data is structured into databases; most of it is unstructured. Big data operations continuously act upon large and growing volumes of data, which generates fast and frequent data movement between servers, networks and storage. Big data analytics in particular need fast and large feedback loops for decision-making, as specialized software tools analyze and transform data into a variety of views, reports and derived data sets.
IT is rarely involved at the analytics administration level, but it is very involved at the storage level. Big data needs both high capacity and high performance, which requires storage with high-capacity disk and the ability to process storage IO very quickly. It must also be highly available, since big data is by definition active and important data. And it should be cost-effective as well, though it will not be inexpensive.
[Taneja Group discusses scale-out storage as a best practice solution to big data analytics in our report: “Big Data, Big Storage: Scale-Out NAS for Big Data Environments.” (http://bit.ly/UGCVjm)]
Big data means different things to different people. A database administrator might insist that big data is large databases; a 100-server SharePoint administrator might classify content blobs as big data; a storage administrator in a hospital radiology lab may define big data as digitized x-rays multiplied across 100,000 yearly patients. In fact, they are all right: each administrator’s data is large and active, and must be kept protected and highly available to applications. In other words, big data.
It is the business units’ responsibility to decide how to use and analyze this data; it is IT’s job to store the data in a way that provides the required service levels of availability and performance. IT frequently turns to NAS to do this, citing its familiarity, file-based architecture and general ease of use. However, traditional NAS’s very simplicity can limit its usability in the face of big data’s growth and capacity needs. Given fast data growth and more active data than ever before, this model soon disintegrates into poorly managed storage sprawl and forced data migrations in the name of balancing workloads.
There are several storage choices for big data, depending on your big data environment: projected growth, data types, performance, capacity and scalability. One excellent option for many big data storage environments is scale-out NAS. This report will briefly discuss scale-out NAS and suggest important questions to ask when researching vendors.
This paper examines CTERA’s storage and data protection solution for large-scale remote and branch offices (ROBOs), and demonstrates its fundamental advantages over alternative approaches, including a real-world customer example and comparative cost assessment.
If you are evaluating big data storage solutions for your enterprise or mid-sized company, Taneja Group has identified five strategic questions that you should ask your vendors during the evaluation process. In this Technology-in-Depth, we’ll review these five questions and look at one specific solution in the market, DataDirect Networks’ (DDN’s) GRIDScaler, and its role in the democratization of Big Data.