Includes Storage Arrays, NAS, File Systems, Clustered and Distributed File Systems, FC Switches/Directors, HBA, CNA, Routers, Components, Semiconductors, Server Blades.
Taneja Group analysts cover all form and manner of storage arrays, modular and monolithic, enterprise or SMB, large and small, general purpose or specialized. All components that make up the SAN, FC-based or iSCSI-based, and all forms of file servers, including NAS systems based on clustered or distributed file systems, are covered soup to nuts. Our analysts have deep backgrounds in file systems area in particular. Components such as Storage Network Processors, SAS Expanders, FC Controllers are covered here as well. Server Blades coverage straddles this section as well as the Infrastructure Management section above.
The healthcare industry continues to face tremendous cost challenges. The U.S. government estimates national health expenditures in the United States accounted for $3.2 trillion last year – nearly 18% of the country’s total GDP. There are many factors that drive up the cost of healthcare, such as the cost of new drug development and hospital readmissions. In addition, there’s compelling studies that show medical organizations will need to evolve their IT environment to curb healthcare costs and improve patient care in new ways, such as cloud-based healthcare models aimed at research community collaboration, coordinated care and remote healthcare delivery.
For example, Goldman Sachs recently predicted that the digital revolution can save $300 billion in spending in the healthcare sector by powering new patient options, such as home-based patient monitoring and patient self-management. Moreover, the most significant progress may come from a medical organization transforming their healthcare data infrastructure. Here’s why:
- Advancements in digital medical imaging has resulted in an explosion of data that sits in picture archiving and communications systems (PACS) and vendor neutral archives (VNAs).
- Patient care initiatives such as personalized medicine and genomics require storing, sharing and analyzing massive amounts of unstructured data.
- Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) require organizations to have policies for long term image retention and business continuity.
Unfortunately, traditional file storage approaches aren’t well-suited to manage vast amounts of unstructured data and present several barriers to modernizing healthcare infrastructure. A recent Taneja Group survey found the top three challenges to be:
- Lack of flexibility: Traditional file storage appliances require dedicated hardware and don’t offer tight integration with collaborative cloud storage environments.
- Poor utilization: Traditional file storage requires too much storage capacity for system fault tolerance, which reduces usable storage.
- Inability to scale: Traditional storage solutions such as RAID-based arrays are gated by controllers and simply aren’t designed to easily expand to petabyte storage levels.
As a result, healthcare organizations are moving to object storage solutions that offer an architecture inherently designed for web scale storage environments. Specifically, object storage offers healthcare organizations the following advantages:
- Simplified management, hardware independence and a choice of deployment options – private, public or hybrid cloud – lowers operational and hardware storage costs
- Web-scale storage platform provides scale as needed and enables a pay as you go model
- Efficient fault tolerance protects against site failures, node failures and multiple disk failures
- Built in security protects against digital and physical breeches
For hospitals and medical research institutes, the ability to interpret genomics data and identify relevant therapies is key to provide better patient care through personalized medicine. Many such organizations are racing forward, analyzing patients’ genomic profiles to match more clinically actionable treatments using artificial intelligence (AI).
These rapid advancements in genomic research and personalized medicine are very exciting, but they are creating enormous data challenges for healthcare and life sciences organizations. High-throughput DNA sequencing machines can now process a human genome in a matter of hours at a cost approaching one thousand dollars. This is a huge drop from a cost of ten million dollars ten years ago and means the decline in genome sequencing cost has outpaced Moore’s Law (see chart). The result is an explosion in genomic data – driving the need for solutions that can affordably and securely store, access, share, analyze and archive enormous amounts of data in a timely manner.
Challenges include moving large volumes of genomic data from cost-effective archival storage to low latency storage for analysis to reduce the time needed to analyze genetic data. Currently, it takes days to do a comprehensive DNA sequence analysis.
Sharing and interpreting vast amounts of unstructured data to find relationships between a patient’s genetic characteristics and potential therapies adds another layer of complexity. Determining connections requires evaluating data across numerous unstructured data sources, such as genomic sequencing data, medical articles, drug information and clinical trial data from multiple sources.
Unfortunately, the traditional file storage within most medical organizations doesn’t meet the needs of modern genomics. These systems can’t accommodate massive amounts of unstructured data and they don’t support both data archival and high-performance compute. They also don’t facilitate broad collaboration. Today, organizations require a new approach to genomics storage, one that enables:
- Scalable and convenient cloud storage to accommodate rapid unstructured data growth
- Seamless integration between affordable unstructured data storage, low latency storage, high performance compute, big data analytics and a cognitive healthcare platform to quickly analyze and find relationships among complex life science data types
- A multi-tenant hybrid cloud to share and collaborate on sensitive patient data and findings
- Privacy and protection to support regulatory compliance
These days the world operates in real-time all the time. Whether making airline reservations or getting the best deal from an online retailer, data is expected to be up to date with the best information at your fingertips. Businesses are expected to meet this requirement, whether they sell products or services. Having this real-time, actionable information can dictate whether a business survives or dies. In-memory databases have become popular in these environments. The world's 24X7 real-time demands cannot wait for legacy ERP and CRM application rewrites. Companies such as SAP devised ways to integrate disparate databases by building a single super-fast uber-database that could operate with legacy infrastructure while simultaneously creating a new environment where real-time analytics and applications can flourish. These capabilities enable businesses to succeed in the modern age, giving forward-thinking companies a real edge in innovation.
SAP HANA is an example of an application environment that uses in-memory database technology and allows the processing of massive amounts of real-time data in a short time. The in-memory computing engine allows HANA to process data stored in RAM as opposed to reading it from a disk. At the heart of SAP HANA is a database that operates on both
OLAP and OLTP database workloads simultaneously. SAP HANA can be deployed on-premises or in the cloud. Originally, on-premises HANA was available only as a dedicated appliance. Recently SAP has expanded support to best in class components through their SAP Tailored Datacenter Integration (TDI) program. In this solution profile, Taneja Group examined the storage requirements needed for HANA TDI environments and evaluated storage alternatives including the HPE 3PAR StoreServ All Flash. We will make a strong case as to why all-flash arrays like the HPE 3PAR version are a great fit for SAP HANA solutions.
Why discuss storage for an in-memory database? The reason is simple: RAM loses its mind when the power goes off. This volatility means that persistent shared storage is at the heart of the HANA architecture for scalability, disaster tolerance, and data protection. The performance attributes of your shared storage dictate how many nodes you can cluster into a SAP HANA environment which in turn affects your business outcomes. Greater scalability capability means more real-time information is processed. SAP HANA workload shared storage requirements are write intensive with low latency for small files and sequential throughput performance for large files. However, the overall storage capacity is not extreme which makes this workload an ideal fit for all-flash arrays that can meet performance requirements with the smallest quantity of SSDs. Typically you would need 10X the equivalent spinning media drives just to meet the performance requirements, which then leaves you with a massive amount of capacity that cannot be used for other purposes.
In this study, we examined five leading all-flash arrays including the HPE 3PAR StoreServ 8450 All Flash. We found that the unique architecture of the 3PAR array could meet HANA workload requirements with up to 73% fewer SSDs, 76% less power, and 60% less rack space than the alternative AFAs we evaluated.
Is object storage right for your organization? Many companies are asking this question as they seek out storage solutions that support vast unstructured data growth throughout their organizations. Object storage is ideal for large-scale unstructured data storage because it easily scales to several petabytes and beyond by simply adding storage nodes. Object storage also provides high fault tolerance, simplified storage management and hardware independence – core capabilities that are essential to cost-effectively manage large-scale storage environments. Add to this built-in support for geographically distributed environments and it’s easy to see why object storage solutions are the preferred storage approach for multiple use cases such as cloud-native applications, highly scalable file backup, secure enterprise collaboration, active archival, content repositories and increasingly cognitive computing workloads such as Big Data analytics.
To help you decide if object storage is right for your company and to help you understand how to apply various storage technologies, we have created a table below that positions object storage relative to block storage and file storage.
As the table shows, there are several factors that differentiate block, file and object storage. An easy way to think about the differences is the following; block storage is necessary for critical applications where storage performance is the key consideration, file storage is well-suited for highly scalable shared file systems and object storage is ideal when cloud-scale capacity and convenience as well as reliability and geographically distributed access are the major storage requirements.
The storage market is truly changing for the better with new storage architectures finally breaking the rusty chains long imposed on IT by traditional monolithic arrays. Vast increases in CPU power found in newer generations of servers (and supported by ever faster networks) have now freed key storage functionality to run wherever it can best serve applications. This freedom has led to the rise of all software-defined storage (SDS) solutions that power modular HyperConverged infrastructure (HCI). At the same time, increasingly affordable flash resources have enabled all-flash array options that promise both OPEX simplification and inherent performance gains. Now, we see a further evolution of storage that intelligently converges performance-oriented storage functions on each server while avoiding major problems with HyperConverged “single appliance” adoption.
Given the market demand for better, more efficient storage solutions, especially those capable of large scale, low latency and mixed use, we are seeing a new generation of vendors like Datrium emerge. Datrium studied the key benefits that hyperconvergence previously brought to market including the leverage of server-side flash for cost-effective IO performance, but wanted to avoid the all-in transition and the risky “monoculture” that can result from vendor-specific HCI. Their resulting design runs compute-intensive IO tasks scaled-out on each local application server (similar to parts of SDS), but persists and fully protects data on cost-efficient, persistent shared storage capacity. We have come to refer to this optimizing tiered design approach as “Server Powered Storage” (SPS), indicating that it can take advantage of the best of both shared and server-side resources.
Ultimately this results in an “Open Convergence” approach that helps virtualized IT environments transition off of aging storage arrays in an easier, flexible and more natural adoption path than with a fork-lift HyperConvergence migration. In this report we will briefly review the challenges and benefits of traditional convergence with SANs, the rise of SDS and HCI appliances, and now this newer “open convergence” SPS approach as pioneered by Datrium DVX. In particular, we’ll review how Datrium offers benefits ranging from elastic performance, greater efficiency (with independent scaling of performance vs. capacity), VM-centric management, enterprise scalability and mixed workload support while still delivering on enterprise requirements for data resiliency and availability.
DATA Challenges in Virtualized Environments
Virtualized environments present a number of unique challenges for user data. In physical server environments, islands of storage were mapped uniquely to server hosts. While at scale that becomes expensive, isolating resources and requiring a lot of configuration management (all reasons to virtualize servers), this at least provided directly mapped relationships to follow when troubleshooting, scaling capacity, handling IO growth or addressing performance.
However, in the virtual server environment, the layers of virtual abstraction that help pool and share real resources also obfuscate and “mix up” where IO actually originates or flows, making it difficult to understand who is doing what. Worse, the hypervisor platform aggregates IO from different workloads hindering optimization and preventing prioritization. Hypervisors also tend to dynamically move virtual machines around a cluster to load balance servers. Fundamentally, server virtualization makes it hard to meet application storage requirements with traditional storage approaches.
Current Virtualization Data Management Landscape
Let’s briefly review the three current trends in virtualization infrastructure used to ramp up data services to serve demanding and increasingly larger scale clusters:
- Converged Infrastructure - with hybrid/All-Flash Arrays (AFA)
- HyperConverged Infrastructure - with Software Defined Storage (SDS)
- Open Converged Infrastructure - with Server Powered Storage (SPS)
Converged Infrastructure - Hybrid and All-Flash Storage Arrays (AFA)
We first note that converged infrastructure solutions simply pre-package and rack traditional arrays with traditional virtualization cluster hosts. The traditional SAN provides well-proven and trusted enterprise storage. The primary added value of converged solutions is in a faster time-to-deploy for a new cluster or application. However, ongoing storage challenges and pain points remain the same as in un-converged clusters (despite claims of converged management as these tend to just aggregate dashboards into a single view).
The traditional array provides shared storage from which virtual machines draw for both images and data, either across Fibre Channel or IP network (NAS or iSCSI). While many SAN’s in the hands of an experienced storage admin can be highly configurable, they do require specific expertise to administer. Almost every traditional array has by now become effectively hybrid, capable of hosting various amounts of flash, but if the array isn’t fully engineered for flash it is not going to be an optimal choice for an expensive flash investment. Hybrid arrays can offer good performance for the portion of IO that receives flash acceleration, but network latencies are far larger than most gains. Worse, it is impossible for a remote SAN to know which IO coming from a virtualized host should be cached/prioritized (in flash)– it all looks the same and is blended together by the time it hits the array.
Some organizations deploy even more costly all-flash arrays, which can guarantee array-side performance for all IO and promise to simplify administration overhead. For a single key workload, a dedicated AFA can deliver great performance. However, we note that virtual clusters mostly host mixed workloads, many of which don’t or won’t benefit from the expensive cost of persisting all data on all flash array storage. Bottomline - from a financial perspective, SAN flash is always more expensive than server-side flash. And by placing flash remote across a network in the SAN, there is always a relatively large network latency which denigrates the benefit of that array side flash investment.
HyperConverged Infrastructures - Software Defined Storage (SDS)
As faster resources like flash, especially added to servers directly, came down in price, so-called Software Defined Storage (SDS) options proliferated. Because CPU power has continuously grown faster and denser over the years, many traditional arrays came to be actually built on plain servers running custom storage operating systems. The resulting storage “software” often now is packaged as a more cost-effective “software-defined” solution that can be run or converged directly on servers (although we note most IT shops prefer buying ready-to-run solutions, not software requiring on-site integration).
In most cases software-defined storage runs within virtual machines or containers such that storage services can be hosted on the same servers as compute workloads (e.g. VMware VSAN). An IO hungry application accessing local storage services can get excellent IO service (i.e. no network latency), but capacity planning and performance tuning in these co-hosted infrastructures can be exceedingly difficult. Acceptable solutions must provide tremendous insight or complex QoS facilities that can dynamically shift IO acceleration with workloads as they might move across a cluster (eg. to keep data access local). Additionally, there is often a huge increase in East-West traffic between servers.
Software Defined Storage enabled a new kind of HyperConverged Infrastructure (HCI). Hyperconvergence vendors produce modular appliances in which a hypervisor (or container management), networking and (software-defined) storage all are pre-integrated to run within the same server. Because of vendor-specific storage, network, and compute integration, HCI solutions can offer uniquely optimized IO paths with plug-and-play scalability for certain types of workloads (e.g. VDI).
For highly virtualized IT shops, HCI simplifies many infrastructure admin responsibilities. But HCI presents new challenges too, not least among them is that migration to HCI requires a complete forklift turnover of all infrastructure. Converting all of your IT infrastructure to a unique vendor appliance creates a “full stack” single vendor lock-in issue (and increased risk due to lowered infrastructure “diversity”).
As server-side flash is cheaper than other flash deployment options, and servers themselves are commodity resources, HCI does help optimize the total return on infrastructure CAPEX – especially as compared to traditional silo’d server and SAN architectures. But because of the locked-down vendor appliance modularity, it can be difficult to scale storage independently from compute when needed (or even just storage performance from storage capacity). Obviously, pre-configured HCI vendor SKU’s also preclude using existing hardware or taking advantage of blade-type solutions.
With HCI, every node is also a storage node which at scale can have big impacts on software licensing (e.g. if you need to add nodes just for capacity, you will also pay for compute licenses), overbearing “East-West” network traffic, and in some cases unacceptable data availability risks (e.g. when servers lock/crash/reboot for any reason, an HCI replication/rebuild can be a highly vulnerable window).
OPEN Converged Infrastructure - Server Powered Storage (SPS)
When it comes to performance, IO still may need to transit a network incurring a latency penalty. To help, there are several third party vendors of IO caching that can be layered in the IO path – integrated with the server or hypervisor driver stack or even placed in the network. These caching solutions take advantage of server memory or flash to help accelerate IO. However, layering in yet another vendor and product into the IO path incurs additional cost, and also complicates the end-to-end IO visibility. Multiple layers of caches (vm, hypervisor, server, network, storage) can disguise a multitude of ultimately degrading performance issues.
Ideally, end-to-end IO, from within each local server to shared capacity, should all fall into a single converged storage solution – one that is focused on providing the best IO service by distributing and coordinating storage functionality where it best serves the IO consuming applications. It should also optimize IT’s governance, cost, and data protection requirements. Some HCI solutions might claim this in total, but only by converging everything into a single vendor appliance. But what if you want a easier solution capable of simply replace aging arrays in your existing virtualized environments – especially enabling scalability in multiple directions at different times and delivering extremely low latency while still supporting a complex mix of diverse workloads?
This is where we’d look to a Server Powered Storage (SPS) design. For example, Datrium DVX still protects data with cost-efficient shared data servers on the back-end for enterprise quality data protection, yet all the compute-intensive, performance-impacting functionality is “pushed” up into each server to provide local, accelerated IO. As Datrium’s design leverages each application server instead of requiring dedicated storage controllers, the cost of Datrium compared to traditional arrays is quite favorable, and the performance is even better than (and as scalable as) a 3rd party cache layered over a remote SAN.
In the resulting Datrium “open converged” infrastructure stack, all IO is deduped and compressed (and locally served) server-side to optimize storage resources and IO performance, while management of storage is fully VM-centric (no LUN’s to manage). In this distributed, open and unlocked architecture, performance scales with each server added to naturally scale storage performance with application growth.
Datrium DVX makes great leverage for a given flash investment by using any “bring-your-own” SSDs, far cheaper to add than array-side flash (and can be added to specific servers as needed/desired). In fact, most vm’s and workloads won’t ever read from the shared capacity on the network – it is write-optimized persistent data protection and can be filled with cost-effective high-capacity drives.
Taneja Group Opinion
As just one of IT’s major concerns, all data bits must be persisted and fully managed and protected somewhere at the end of the day. Traditional arrays, converged or not, just don’t perform well in highly virtualized environments, and using SDS (powering HCI solutions) to farm all that critical data across fungible compute servers invokes some serious data protection challenges. It just makes sense to look for a solution that leverages the best aspects of both enterprise arrays (for data protection) and software/hyperconverged solutions (that localize data services for performance).
At the big picture level, Server Powered Storage can be seen as similar (although more cost-effective and performant) to a multi-vendor solution in which IT layers server-side IO acceleration functionality from one vendor over legacy or existing SANs from another vendor. But now we are seeing a convergence (yes, this is an overused word these days, but accurate here) of those IO path layers into a single vendor product. Of course, a single vendor solution that fully integrates distributed capabilities in one deployable solution will perform better and be naturally easier to manage and support (and likely cheaper).
There is no point in writing storage RFP’s today that get tangled up in terms like SDS or HCI. Ultimately the right answer for any scenario is to do what is best for applications and application owners while meeting IT responsibilities. For existing virtualization environments, new approaches like Server Powered Storage and Open Convergence offer considerable benefit in terms of performance and cost (both OPEX and CAPEX). We highly recommend that before one invests in expensive all-flash arrays, or takes on a full migration to HCI, that an Open Convergence option like Datrium DVX be considered as a potentially simpler, more cost-effective, and immediately rewarding solution.
NOTICE: The information and product recommendations made by the TANEJA GROUP are based upon public information and sources and may also include personal opinions both of the TANEJA GROUP and others, all of which we believe to be accurate and reliable. However, as market conditions change and not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. The TANEJA GROUP, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors that may appear in this document.
Today we are seeing big impacts on storage from the huge increase in the scale of an organization’s important data (e.g. Big Data, Internet Of Things) and the growing size of virtualization clusters (e.g. never-ending VM’s, VDI, cloud-building). In addition, virtualization adoption tends to increase the generalization of IT admins. In particular, IT groups are focusing more on servicing users and applications and no longer want to be just managing infrastructure for infrastructure’s sake. Everything that IT does is becoming interpreted, analyzed, and managed in application/business terms, including storage to optimize the return on their total IT investment. To move forward, an organization’s storage infrastructure not only needs to grow internally smarter, it also needs to become both VM and application aware.
While server virtualization made a lot of things better for the over-taxed IT shop, delivering quality storage services in hypervisor infrastructures with traditional storage created difficult challenges. In response Tintri pioneered per-VM storage infrastructure. The Tintri VMstore has eliminated multiple points of storage friction and pain. In fact, it’s now becoming a mandatory checkbox across the storage market for all arrays to claim some kind of VM-centricity. Unfortunately, traditional arrays are mainly focused on checking off rudimentary support for external hypervisor APIs that only serve to re-package the same old storage. The best fit to today’s (and tomorrow’s) virtual storage requirements will only come from fully engineered VM-centric storage and application-aware approaches as Tintri has done.
However, it’s not enough to simply drop in storage that automatically drives best practice policies and handles today’s needs. We all know change is constant, and key to preparing for both growth and change is having a detailed, properly focused view of today’s large scale environments, along with smart planning tools that help IT both optimize current resources and make the best IT investment decisions going forward. To meet those larger needs, Tintri has rolled out a Tintri Analytics SaaS-based offering that applies big data analytical power to the large scale of their customer’s VMstore VM-aware metrics.
In this report we will look briefly at Tintri’s overall “per-VM” storage approach and then take a deeper look at their new Tintri Analytics offering. The new Tintri Analytics management service further optimizes their app-aware VM storage with advanced VM-centric performance and capacity management. With this new service, Tintri is helping their customers receive greater visibility, insight and analysis over large, cloud-scale virtual operations. We’ll see how “big data” enhanced intelligence provides significant value and differentiation, and get a glimpse of the payback that a predictive approach provides both the virtual admin and application owners.