New enterprise-grade file systems don’t come around very often. Over the last two decades we have seen very few show up: ZFS was introduced in 2004, Isilon’s OneFS in 2003, Lustre in 2001, and WAFL in 1992. There is a good reason behind this: creating a unique enterprise-grade file system is not a trivial undertaking and takes considerable resources and vision. During the last ten years, we have seen seismic changes in the data center and storage industry. Today’s data center runs a far different workload than what was prevalent when these first-generation file systems were developed. For example, today’s data center runs virtual machines, and a one-to-one correlation between a server and its storage is the exception. Databases span beyond the largest single disk drives. A huge amount of data is ingested by big data applications and social media. Data must be kept in order to meet the requirements of government and corporate policy. Technology has also changed dramatically over the last decade. For instance, flash memory has become prevalent, commodity x86 processors now rival ASIC chips in power and performance, and new software development and delivery methodologies such as “agile” have become mainstream. In the past, we were concerned with how to deal with the underlying storage, but now we are concerned with how to deal with the huge amount of data that we have stored.
What could be accomplished if a new file system were created from the ground up to take advantage of the latest advances in technology and, more importantly, were built by an experienced engineering team that had done this once before? That is, in fact, what Qumulo has done with the Qumulo Core data-aware scale-out NAS software, powered by its new file system, QSFS (Qumulo Scalable File System). Qumulo’s three co-founders, Peter Godman, Neal Fachan, and Aaron Passey, were the primary inventors of OneFS, and they assembled some of the brightest minds in the storage industry to create a modern file system designed to support the requirements of today’s data center, not the data center of decades ago.
From day one, Qumulo embraced an agile software development and release model to create its product. This allows the company to push out fully tested and certified releases every two weeks, so new features and bug fixes can be introduced seamlessly as soon as they are ready – not on an arbitrary 6-, 12-, or even 18-month release schedule.
Flash storage has radically changed the face of the storage industry. All of the major file systems in use today were designed to work with HDDs that could produce 150 IOPS; if you were willing to sacrifice capacity and short-stroke them, you might get twice that. Now flash is prevalent in the industry, and we have commodity flash devices that can produce up to 250,000 IOPS. Traditional file systems were optimized for slower HDDs – not to take advantage of the lower latency and higher performance of today’s solid-state drives. Many traditional file systems and storage arrays have devised ways to “bolt on” SSDs to boost their performance. However, their initial architecture was based around the capabilities of yesterday’s HDDs and not the capabilities of today’s flash technology.
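As a back-of-the-envelope illustration of the gap described above, the short sketch below works out how many HDDs it would take to match the random I/O rate of a single flash device. The IOPS figures are the ones quoted in the paragraph; the function and constant names are our own, and real-world results vary with workload and device.

```python
import math

HDD_IOPS = 150                 # typical HDD, per the text above
SHORT_STROKED_HDD_IOPS = 300   # "twice that" when short-stroked
FLASH_IOPS = 250_000           # upper figure quoted for commodity flash


def drives_needed(target_iops: int, per_drive_iops: int) -> int:
    """Smallest number of drives whose combined IOPS meets the target."""
    return math.ceil(target_iops / per_drive_iops)


if __name__ == "__main__":
    print("Standard HDDs needed:     ", drives_needed(FLASH_IOPS, HDD_IOPS))
    print("Short-stroked HDDs needed:", drives_needed(FLASH_IOPS, SHORT_STROKED_HDD_IOPS))
```

Even with short-stroking, which gives up capacity, several hundred spindles are needed to match one flash device on raw IOPS. That difference is why an architecture tuned around HDD behavior leaves so much flash performance unused.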
An explosion in scale-out, large-capacity file systems has empowered enterprises to do very interesting things, but has also come with some very interesting problems. Even one of the most trivial questions, how much space the files on a file system are consuming, is very complicated to answer on first-generation file systems. Other questions that are difficult to answer without being aware of the data on a file system include who is consuming the most space, and which clients, files, or applications are consuming the most bandwidth. Second-generation file systems need to be designed to be data-aware, not just storage-aware.
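To make the distinction concrete, here is a minimal sketch of one way a file system could answer the "how much space is under this directory?" question without scanning every file: it keeps a running total per directory and updates each ancestor whenever a file changes size. This is a toy in-memory model of our own devising, not a description of how QSFS or any other product is implemented; the class and method names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class AggregatingTree:
    """Toy namespace that keeps a running byte total for every directory."""

    def __init__(self):
        self.file_sizes = {}                # file path -> size in bytes
        self.dir_totals = defaultdict(int)  # directory path -> bytes beneath it

    def write_file(self, path, size):
        """Create or resize a file and push the size change up to every ancestor."""
        delta = size - self.file_sizes.get(path, 0)
        self.file_sizes[path] = size
        for ancestor in PurePosixPath(path).parents:
            self.dir_totals[str(ancestor)] += delta

    def usage(self, directory):
        """Data-aware answer: a single lookup, independent of file count."""
        return self.dir_totals.get(directory, 0)

    def usage_by_walk(self, directory):
        """Storage-aware baseline: visit every file and sum the matches."""
        prefix = directory.rstrip("/") + "/"
        return sum(size for path, size in self.file_sizes.items()
                   if path.startswith(prefix))


if __name__ == "__main__":
    tree = AggregatingTree()
    tree.write_file("/home/alice/a.dat", 100)
    tree.write_file("/home/bob/b.dat", 50)
    tree.write_file("/home/alice/a.dat", 30)   # file shrinks; totals follow
    print(tree.usage("/home"), tree.usage_by_walk("/home"))
```

The trade-off in this sketch is a small amount of extra work on every write in exchange for constant-time answers at query time. The walk-based baseline gets slower as the file count grows, which is the problem the paragraph above describes.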
In order to reach performance targets, traditional high-performance storage arrays were designed around an ASIC-optimized architecture. An ASIC architecture is good in the sense that it can speed up some storage-related operations; however, this benefit comes at a heavy price – both in dollars and in flexibility. It can take years and millions of dollars to embed new features in an ASIC. By using very powerful and relatively inexpensive x86 processors, new features can be introduced very quickly via software. The slight performance advantage of ASIC-based storage is disappearing fast as x86 processors gain more cores (the Intel E5-2600 V3 has up to 18 cores) and advanced features.
When Qumulo approached us to take a look at the world’s first data-aware, scale-out enterprise-grade storage system we welcomed the opportunity. Qumulo’s new storage system is not based on an academic project or designed around an existing storage system, but has been designed and built on entirely new code that the principals at Qumulo developed based on what they learned in interviews with more than 600 storage professionals. What they came up with after these conversations was a new data-aware, scale-out NAS file system designed to take advantage of the latest advances in technology. We were interested in finding out how this file system would work in today’s data center.
The IT industry is in the middle of a massive transition toward simplification and efficiency in managing on-premises infrastructure at today’s enterprise data centers. The past few years have seen a wave of technologies clearly focused on simplifying and radically changing the economics of traditional enterprise infrastructure. These technologies include Public/Private Clouds, Converged Infrastructure, and Integrated Systems, to name a few. All of them are geared to use resources more efficiently and take less time to administer, all at a reduced TCO. However, these technologies all rely on the efficiency and simplicity of the underlying Compute, Network, and Storage technologies. Oftentimes the overall solution is only as good as the weakest link in the chain. The storage tier of the traditional infrastructure stack is often considered the most complex to manage.
This technology validation focuses on measuring efficiency and management simplicity by comparing two industry-leading mid-range external storage arrays configured for the unified storage use case. Unified storage is a popular approach to storage subsystems that consolidates both file access and block access within a single external array, so the same precious drive capacity can be shared across both protocols simultaneously. Businesses value the ability to send server workloads down a high-performance, low-latency block protocol while still taking advantage of the simplicity and ease of sharing that file protocols offer to various clients. In the past, businesses would have either set up a separate file server in front of their block array or bought completely separate NAS devices, thus possibly overbuying storage resources and adding complexity. Unified storage addresses this by letting administrators manage one storage device for all business workload needs. In this study we compared the storage efficiency and the ease of managing and monitoring an EMC VNX unified array versus an HP 3PAR StoreServ unified array. Our approach was to set up the two arrays side by side and record the actual complexity of managing each array for file and block access, per the documents and guides provided for each product. We also went through the exercise of sizing various arrays via publicly available configuration guides to see what the expected storage density efficiency would be for some typically configured systems.
Our conclusions were nothing short of astonishing. In the case of the EMC VNX2 technology, the approach to unification more closely resembles a hardware packaging and management veneer than what would have been expected of a second-generation unified storage system. HP 3PAR StoreServ, on the other hand, in its second generation of unified storage, has transitioned the file protocol services from external controllers to completely converged block and file services within the common array controllers. In addition, all the data path and control plumbing is completely internal as well, with no need to wire loopback cables between controllers. HP has also made the investment to create a totally new management paradigm based on the HP OneView management architecture, which radically simplifies the administrative approach to managing infrastructure. After performing this technology validation we can state with confidence that HP 3PAR StoreServ 7400c is 2X easier to provision, 2X easier to monitor, and up to 2X more data density efficient than a similarly configured EMC VNX 5600.
Scale Computing was an early proponent of hyperconverged appliances and is one of the innovators in this marketplace. Since the release of Scale Computing’s first hyperconverged appliance, many others have come to embrace the elegance of having storage and compute functionality combined on a single server. Even the virtualization juggernaut VMware has seen the benefits of abstracting, pooling, and running storage and compute on shared commodity hardware. VMware’s current hyperconverged storage initiative, VMware Virtual SAN, seems to be gaining traction in the marketplace. We thought it would be an interesting exercise to compare and contrast Scale Computing’s hyperconverged appliance to a hyperconverged solution built around VMware Virtual SAN. Before we delve into this exercise, however, let’s go over a little background history on the topic.
Taneja Group defines hyperconvergence as the integration of multiple previously separate IT domains into one system in order to serve up an entire IT infrastructure from a single device or system. This means that hyperconverged systems contain all IT infrastructure (networking, compute, and storage) while promising to preserve the adaptability of the best traditional IT approaches. Such capability implies an architecture built for seamless and easy scaling over time, in a “grow as needed” fashion.
Scale Computing got its start with scale-out storage appliances and has since morphed these into a hyperconverged appliance, HC3. HC3, which includes both a hypervisor and a virtual infrastructure manager, was the natural evolution of the company’s well-regarded line of scale-out storage appliances. HC3’s strong suits are its ease of use and affordability. The product has seen tremendous growth and now has over 900 deployments.
VMware got its start with compute virtualization software and is by far the largest virtualization company in the world. VMware has always been a software company, and takes pride in its hardware agnosticism. VMware’s first attempt to combine shared direct-attached storage (DAS) and compute on the same server resulted in a product called “VMware vSphere Storage Appliance” (VSA), which was released in June of 2011. VSA had many limitations, didn’t seem to gain traction in the marketplace, and reached its end of availability (EOA) in June of 2014. VMware’s second attempt, VMware Virtual SAN (VSAN), which was announced at VMworld in 2013, shows a lot of promise and seems to be gaining acceptance, with over 300 paying customers using the product. We will be comparing VMware Virtual SAN to Scale Computing’s hyperconverged appliance, HC3, in this paper.
Here we have two companies: Scale Computing, which has transformed from an early innovator in scale-out storage to a company that provides a hyperconverged appliance; and VMware, which was an early innovator in compute virtualization and since has transformed into a company that provides the software needed to create build-your-own hyperconverged systems. We looked deeply into both systems (HC3 and VSAN) and walked both through a series of exercises to see how they compare. We aimed this review at what we consider a sweet spot for these products: small to medium-sized enterprises with limited dedicated IT staff and a limited budget. After spending time with these two solutions, and probing various facets of them, we came up with some strong conclusions about their ability to provide an affordable, easy to use, scalable solution for this market.
The observations we have made for both products are based on hands-on testing both in our lab and on-site at Scale Computing’s facility in Indianapolis, Indiana. Although we talk about performance in general terms, we do not, and you should not, construe this to be a benchmarking test. We have, in good faith, verified all conclusions made around any timing issues. Moreover, the numbers that we are using are generalities that we believe are widely known and accepted in the virtualization community.
Consolidation and enhanced management enabled by virtualization have revolutionized the practice of IT around the world over the past few years. By abstracting compute from the underlying hardware systems, and enabling oversubscription of physical systems by virtual workloads, IT has been able to pack more systems into the data center than before. Moreover, for the first time in seemingly decades, IT has also taken a serious leap ahead in management, as this same virtual infrastructure has wrapped the virtualized workload with better capabilities than ever before - capabilities like increased visibility, fast provisioning, enhanced cloning, and better data protection. The net result has been a serious increase in overall IT efficiency.
But not all is love and roses with the virtual infrastructure. In the face of serious benefits and consequent rampant adoption, virtualization continues to advance and bring about more capability. All too often, an increase in capability has come at the cost of complexity. Virtualization now promises to do everything from serving up compute instances, to providing network infrastructure and network security, to enabling private clouds.
For certain, much of this complexity exists between the individual physical infrastructures that IT must touch, and the simultaneous duplication that virtualization often brings into the picture. Virtual and physical networks must now be integrated, the relationship between virtual and physical servers must be tracked, and the administrator can barely answer with certainty whether key storage functions, like snapshots, should be managed on physical storage systems or in the virtual infrastructure.
In response to the complexity of managing a virtualized data center, Scale Computing, long a provider of scale-out storage, introduced a new line of hyperconverged appliances, HC3, in April 2012 and updated the appliances with the new HyperCore software in May 2014. HC3 is an integration of storage and virtualized compute within a scale-out building block architecture that couples all of the elements of a virtual data center together inside a hyperconverged appliance. The result is a system that is simple to use and does away with much of the complexity associated with virtualization in the data center. By virtualizing and intermingling compute and storage inside a system that is designed for scale-out, HC3 does away with the need to manage virtual networks, assemble complex compute clusters, provision and manage storage, and perform a bevy of other day-to-day administrative tasks. Provisioning additional resources - any resource - becomes one-click-easy, and adding more physical resources as the business grows is reduced to a simple 2-minute exercise.
While this sounds compelling on the surface, Taneja Group recently turned our Technology Validation service - our hands-on lab service - to the task of evaluating whether Scale Computing's HC3 could deliver on these promises in the real world. For this task, we put an HC3 cluster through its paces to see how well it deployed, how it held up under use, and what special features it delivered that might go beyond the features found in traditional integrations of discrete compute and storage systems.
Storage performance has long been the bane of the enterprise infrastructure. Fortunately, in the past couple of years, solid-state technologies have allowed newcomers as well as established storage vendors to start shaping clever, cost-effective, and highly efficient storage solutions that unlock greater storage performance. It is our opinion that the most innovative of these solutions are the ones that require no real alteration in the storage infrastructure, nor a change in data management and protection practices.
This is entirely possible with server-side caching solutions today. Server-side caching solutions typically use either PCIe solid-state NAND Flash or SAS/SATA SSDs installed in the server alongside a hardware or software IO handler component that mirrors commonly utilized data blocks onto the local high speed solid-state storage. Then the IO handler redirects server requests for data blocks to those local copies that are served up with lower latency (microseconds instead of milliseconds) and greater bandwidth than the original backend storage. Since data is simply cached, instead of moved, the solution is transparent to the infrastructure. Data remains consolidated on the same enterprise infrastructure, and all of the original data management practices – such as snapshots and backup – still work. Moreover, server-side caches can actually offload IO from the backend storage system, and can allow a single storage system to effectively serve many more clients. Clearly there’s tremendous potential value in a solution that can be transparently inserted into the infrastructure and address storage performance problems.
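The read path described above can be sketched in a few lines. The example below is a simplified model under our own assumptions: a write-through policy, least-recently-used eviction, and whole-block granularity, none of which is specified in the text. The class names are hypothetical and do not correspond to any vendor's product or API.

```python
from collections import OrderedDict


class BackendStorage:
    """Stand-in for the shared array; counts reads so the offload is visible."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0

    def read(self, lba):
        self.reads += 1
        return self.blocks.get(lba, b"\x00")

    def write(self, lba, data):
        self.blocks[lba] = data


class ServerSideCache:
    """IO handler that keeps copies of recently used blocks on local flash."""

    def __init__(self, backend, capacity):
        self.backend = backend
        self.capacity = capacity      # number of blocks the local flash can hold
        self.local = OrderedDict()    # lba -> data, oldest entry first

    def read(self, lba):
        if lba in self.local:                  # hit: serve the local copy
            self.local.move_to_end(lba)
            return self.local[lba]
        data = self.backend.read(lba)          # miss: fetch from the array
        self._remember(lba, data)
        return data

    def write(self, lba, data):
        # Assumed write-through: the array always holds the authoritative copy,
        # so array-side snapshots and backups keep working unchanged.
        self.backend.write(lba, data)
        self._remember(lba, data)

    def _remember(self, lba, data):
        self.local[lba] = data
        self.local.move_to_end(lba)
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)     # evict the least recently used block


if __name__ == "__main__":
    array = BackendStorage()
    cache = ServerSideCache(array, capacity=2)
    cache.write(1, b"hot block")
    for _ in range(1000):
        cache.read(1)
    print("reads that reached the array:", array.reads)
```

Because the data is copied instead of moved, removing the cache in this model loses nothing. Every repeated read it absorbs is one the backend array never has to serve, which is the offload effect noted above.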
The branch office has long been a critical dilemma for the IT organization. Branch offices for many organizations are a critical point of productivity and revenue generation, yet the branch has always come with a tremendous amount of operational overhead and risk. Worse yet, challenges are often exacerbated because the branch office too often looks like a carryover of outdated IT practices.
More often than not, the branch office is still a highly manual, human-effort-driven administration exercise. Physical equipment too often sits at a remote physical office, and requires significant human management and intervention for activities like data protection and recovery, or replacement of failed hardware. Given the remote nature of the branch office, such human intervention often comes with significant overhead in the form of telephone support, less than efficient over-the-wire system configuration, equipment build and ship processes, or even significant travel to remote locations. Moreover, in an attempt to avoid issues, the branch office is often over-provisioned with equipment in order to reduce the impact of outages, or is designed in such a way that it depends too heavily on services delivered across the Wide Area Network (WAN), which impairs user productivity and simply exchanges the risk of equipment failure for the risk of WAN outage. But while such practices come with significant operational cost, there’s a subtler cost lurking below the surface – any branch office outage is enmeshed in data consequences. Data protection may be a slower process for the branch office, subjecting the branch to greater risks with equipment failure or disaster, and restoring branch office data and productivity after a disaster can be a long, slow process compared to the capabilities of the modern datacenter.
When branch offices are a key part of a business, these practices that are routinely accepted as the standard can make the branch office one of the costliest and riskiest areas of the IT infrastructure. Worse yet, for many enterprises, the branch office has only increased its importance over time, and may generate more revenue and require more responsive and available IT systems than ever before. The branch office clearly requires better agility and efficiency than it receives today.
Riverbed Technology has long proven its mettle in helping enterprises optimize and better enable connectivity and data sharing for distributed work teams. Over the past decade, Riverbed has come to dominate the market for WAN optimization technologies that compress data and optimize the connection between branch or remote offices and the datacenter. But Riverbed rose to this position of dominance because its SteelHead appliances do far more than just optimize a connection – Riverbed’s dominance of this market sprang from deep collaboration and interaction optimization of CIFS/SMB and other protocols by way of intelligent interception and caching of the right data to make the remote experience feel like a local experience. Moreover, Riverbed SteelHead could do this while making that remote connection effectively stateless, and eliminating the need to protect or manage data in the branch office.
Almost two years ago, Riverbed announced a continuing evolution of its “location independent computing” focus with the introduction of the SteelFusion family of solutions. The vision behind SteelFusion was to deliver far more performance and capability in branch offices, while doing away with the complexity of multiple component parts and scattered data. SteelFusion does this by transforming the branch office into a stateless “projection” of data, applications, and VMs stored in the datacenter. Moreover, SteelFusion does this with a converged solution that combines storage, networking, and compute all in one device – the first comprehensive converged infrastructure solution purpose-built for the branch. This converged offering, though, is built on branch office “statelessness” that, as we’ll review, transparently stores data in the datacenter, and allows the business to configure, change, protect, and manage the branch office with enterprise tools, while eradicating the risk associated with traditional branch office infrastructure.
SteelFusion today does this by virtualizing VMware ESXi VMs on a stateless appliance that in essence “projects” data from the datacenter to a remote location, while maintaining localized speed of access and resilient availability that can tolerate even severe network outages. Three innovative technology components that make up Riverbed’s SteelFusion allow it to host virtual machines that access their primary data via the datacenter, from where it is cached on the SteelFusion appliance while maintaining a highly efficient but near synchronous connection back to the datacenter storage. In turn, SteelFusion makes it possible to run many local applications in a rich, complex branch office while requiring no other servers or devices. Riverbed promises that SteelFusion’s architecture can tolerate outages, but synchronize data so effectively that it will operate as a stateless appliance, enabling branch data to be completely protected by datacenter synchronization and backup, with more up to date protection and faster recovery regardless of whether there’s a loss of a single file, or the loss of an entire system. In short, this is a promise to comprehensively revolutionize the practice of branch office IT.
In January of 2014, Taneja Group took a deeper look at what Riverbed is doing with SteelFusion. While we’ve provided other written assessments on the use case and value of Riverbed SteelFusion, we also wanted to take a hands-on look at how the technology works, and whether in real world use it really delivers management effort reductions, availability improvements, and increased IT capabilities along with consequent improvements in the risks around branch office IT. To do this, we turned to a hands-on lab exercise – what we call a Technology Validation.
What did we find? We found that Riverbed SteelFusion does indeed deliver a transformation of branch office management and capabilities, by fundamentally reducing complexity, injecting a number of powerful capabilities (such as enterprise snapshots and access to all data, copies, and tools in the enterprise) and making the branch office resilient, constantly protected, and instantly recoverable. While the change in capabilities is significant, this also translates into a significant impact on time and effort, and we captured a number of metrics throughout our hands-on look at SteelFusion. For the details, we turn to the full report.