Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category. In order to better understand Spark’s growing role in big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven thousand (6900+) qualified technical and managerial people working with big data from around the world to explore their experiences with and intentions for Spark adoption and deployment, their current perceptions of the Spark marketplace and of the future of Spark itself.
We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over one-half (54%) of respondents are already actively using Spark. Spark is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing – 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon.
The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).
When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%).
Interestingly, while on-premise Spark deployments dominate today (more than 50%), there is a strong interest in transitioning many of those to cloud deployments going forward. Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36%, along with a corresponding increase in using Spark SaaS, from 3% to 9%.
The biggest challenge with Spark, similar to what has been previously noted across the broader big data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap within their organizations. Similarly, more than one-third mention complexity in learning/integrating Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous big data analytics platforms, Spark today offers a higher—and often already familiar—level of interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster operations, all of which no doubt contribute to its greatly increasing popularity and widespread adoption.
Overall, it’s clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment. The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years.
The race is on at full speed. What race? The race to bring public cloud agility and economics to a data center near you. Ever since the first integrated systems came onto the scene in 2010, vendors have been furiously engineering solutions to make on-premises infrastructure as cost effective and as easy to use as the public cloud, while also providing the security, availability, and control that enterprises demand. Fundamentally, two main architectures have evolved within the race to modernize data centers that will create a foundation enabling fully private and hybrid clouds. The first approach uses traditional compute, storage, and networking infrastructure components (traditional 3-tier) overlaid with varying degrees of virtualization and management software. The second more recent approach is to build a fully virtualized data center using industry standard servers and networking and then layer on top of that a full suite of software-based compute, network, and storage virtualization with management software. This approach is often termed a Software-Defined Data Center (SDDC).
The goal of an SDDC is to extend virtualization techniques across the entire data center to enable the abstraction, pooling, and automation of all data center resources. This would allow a business to dynamically reallocate any part of the infrastructure for various workload requirements without forklifting hardware or rewiring. VMware has taken SDDC to a new level with VMware Cloud Foundation. VMware Cloud Foundation is the only unified SDDC platform for the hybrid cloud, which brings together VMware’s compute, storage, and network virtualization into a natively integrated stack that can be deployed on-premises or run as a service from the public cloud. It establishes a common cloud infrastructure foundation that gives customers a unified and consistent operational model across the private and public cloud.
VMware Cloud Foundation delivers an industry-leading SDDC cloud infrastructure by combining VMware’s highly scalable hyper-converged software (vSphere and VSAN) with the industry leading network virtualization platform, NSX. VMware Cloud Foundation comes with unique lifecycle management capabilities (SDDC Manager) that eliminate the overhead of system operations of the cloud infrastructure stack by automating day 0 to day 2 processes such as bring-up, configuration, workload provisioning, and patching/upgrades. As a result, customers can significantly shorten application time to market, boost cloud admin productivity, reduce risk, and lower TCO. Customers consume VMware Cloud Foundation software in three ways: factory pre-loaded on integrated systems (VxRack 1000 SDDC); deployed on top qualified Ready Nodes from HPE, QCT, Fujitsu, and others in the future, with qualified networking; and run as a service from the public cloud through IBM, vCAN partners, vCloud Air, and more to come.
In this comparative study, Taneja Group performed an in-depth analysis of VMware Cloud Foundation deployed on qualified Ready Nodes and qualified networking versus several traditional 3-tier converged infrastructure (CI) integrated systems and traditional 3-tier do-it-yourself (DIY) systems. We analyzed the capabilities and contrasted key functional differences driven by the various architectural approaches. In addition, we evaluated the key CapEx and OpEx TCO cost components. Taneja Group configured each traditional 3-tier system's hardware capacity to be as close as possible to the VMware Cloud Foundation qualified hardware capacity. Further, since none of the 3-tier systems had a fully integrated SDDC software stack, Taneja Group added the missing SDDC software, making it as close as possible to the VMware Cloud Foundation software stack. The quantitative comparative results from the traditional 3-tier DIY and CI systems were averaged together into one scenario because the hardware and software components are very similar.
Our analysis concluded that both types of solutions are more than capable of handling a variety of virtualized workload requirements. However, VMware Cloud Foundation has demonstrated a new level of ease-of-use due to its modular scale-out architecture, native integration, and automatic lifecycle management, giving it a strong value proposition when building out modern next generation data centers. The following are the five key attributes that stood out during the analysis:
- Native Integration of the SDDC: VMware Cloud Foundation natively integrates vSphere, Virtual SAN (VSAN), and NSX network virtualization.
- Simplest operational experience: VMware SDDC Manager automates the life-cycle of the SDDC stack including bring up, configuration, workload provisioning, and patches/upgrades.
- Isolated workload domains: VMware Cloud Foundation provides unique administrator tools to flexibly provision subsets of the infrastructure for multi-tenant isolation and security.
- Modular linear scalability: VMware Cloud Foundation employs an architecture in which capacity can be scaled by the HCI node, by the rack, or by multiple racks.
- Seamless Hybrid Cloud: Deploy VMware Cloud Foundation for private cloud and consume on public clouds to create a seamless hybrid cloud with a consistent operational experience.
Taneja Group’s in-depth analysis indicates that VMware Cloud Foundation will enable enterprises to achieve significant cost savings. Hyper-converged infrastructure, used by many web-scale service providers, with natively integrated SDDC software significantly reduced server, storage, and networking costs. This hardware cost saving more than offset the incremental SDDC software costs needed to deliver the storage and networking capability that typically is provided in hardware from best of breed traditional 3-tier components. In this study, we measured the upfront CapEx and 3 years of support costs for the hardware and software components needed to build out a VMware Cloud Foundation private cloud on qualified Ready Nodes. In addition, Taneja Group validated a model that demonstrates the labor and time OpEx savings that can be achieved through the use of integrated end-to-end automatic lifecycle management in the VMware SDDC Manager software.
By investing in VMware Cloud Foundation, businesses can be assured that their data center infrastructure can be easily consumed, scaled, managed, upgraded and enhanced to provide the best private cloud at the lowest cost. Using a pre-engineered modular, scale-out approach to building at web-scale means infrastructure is added in hours, not days, and businesses can be assured that adding infrastructure scales linearly without complexity. VMware Cloud Foundation is the only platform that provides a natively integrated unified SDDC platform for the hybrid cloud with end-to-end management and with the flexibility to provision a wide variety of workloads at the push of a button.
In summary, VMware Cloud Foundation enables at least five unparalleled capabilities, generates a 45% lower 3-year TCO than the alternative traditional 3-tier approaches, and delivers a tremendous value proposition when building out a modern hybrid SDDC platform. Before blindly going down the traditional infrastructure approach, companies should take a close look at VMware Cloud Foundation, a unified SDDC platform for the hybrid cloud.
In this report, Taneja Group presents an evaluation of the current IT Cloud Management market landscape for enterprise customers. We look at this landscape as an evolution of IT operations management grown up into the cloud era. In addition to increasingly smart and capable operational monitoring and systems management, good cloud management also requires sophisticated capabilities in both automation and orchestration at scale to support end-user provisioning and agility, and detailed financial management services that reveal multi-cloud costs for analysis and chargeback or showback. Our objective is to evaluate cloud management offerings from leading vendors to enable senior business and technology leaders to decide which vendors offer the best overall solution.
In this study, we evaluated vendors with offerings in one or more of the three fundamental areas. Several well-known vendors (VMware, Microsoft, ServiceNow, HPE, IBM and BMC) have solutions in all three areas. Other vendors focus on only one or two areas, and because it’s possible to compose a broader solution from parts, we’ve evaluated popular niche solutions within each area. All companies were required to have solutions that were generally available as of April 2016. To fairly assess the offerings, we looked at a set of differentiating factors in each of the categories that we believe enterprise customers should use to qualify cloud management solutions. As a final step, to facilitate optimal enterprise selection, we also evaluated the full solution vendors at a higher level where we looked at additional value derived from integrations across areas and other important enterprise vendor engagement factors.
Within each of the three areas that we will refer to as Cloud Orchestration, Operations Management, and Financial Management, and at the vendor level for full-suite vendors, we’ve applied categories of factors for scoring as determined by our team of experts, based on customer buying criteria, technical innovation, and market drivers. The overall results of the evaluation revealed that VMware has a strong lead in today’s competitive cloud management landscape.
In this report, Taneja Group presents an evaluation of the current IT Cloud Management market landscape for enterprise customers. We look at this landscape as an evolution of IT operations management grown up into the cloud era. In addition to increasingly smart and capable operational monitoring and systems management, good cloud management also requires sophisticated capabilities in both automation and orchestration at scale to support end-user provisioning and agility, and detailed financial management services that reveal multi-cloud costs for analysis and chargeback or showback. Our objective is to evaluate cloud management offerings from leading vendors to help senior business and technology leaders decide which vendors offer the best solution. In this study, we evaluated vendors with offerings in one or more of the three fundamental areas. Several well-known vendors (VMware, Microsoft, ServiceNow, HPE, IBM and BMC) have solutions in all three areas. Other vendors focus on only one or two areas, and because it’s possible to compose a broader solution from parts, we’ve evaluated popular niche solutions within each area. All companies were required to have solutions that were generally available as of April 2016. To fairly assess the offerings, we looked at a set of differentiating factors in each of the categories that we believe enterprise customers should use to qualify cloud management solutions. As a final step, to facilitate optimal enterprise selection, we also evaluated the full solution vendors at a higher level where we looked at additional value derived from integrations across areas and other important enterprise vendor engagement factors. Within each of the three areas that we will refer to as Cloud Orchestration, Operations Management, and Financial Management, and at the vendor level for full-suite vendors, we’ve applied categories of factors for scoring as determined by our team of experts, based on customer buying criteria, technical innovation, and market drivers. The overall results of the evaluation revealed that VMware has a strong lead in today’s competitive cloud management landscape.
Virtual Instruments, the company created by the combination of the original Virtual Instruments and Load DynamiX, recently made available a free cloud-based service and community called WorkloadCentral. The service is designed to help storage professionals understand workload behavior and improve their knowledge of storage performance. Most will find valuable insights into storage performance with the simple use of this free service. For those who want to get a deeper understanding of workload behavior over time, or evaluate different storage products to determine which one is right for their specific application environment, or optimize their storage configurations for maximum efficiency, they can buy additional Load DynamiX Enterprise products available from the company.
The intent with WorkloadCentral is to create a web-based community that can share information about a variety of application workloads, perform workload analysis and create workload simulations. In an industry where workload sharing has been almost absent, this service will be well received by storage developers and IT users alike.
Read on to understand where WorkloadCentral fits into the overall application and storage performance spectrum...
Decades of constantly advancing computing solutions have changed the world in tremendous ways, but interestingly, the IT folks running the show have long been stuck with only piecemeal solutions for managing and optimizing all that blazing computing power. Sometimes it seems like IT is a pit crew servicing a modern racing car with nothing but axes and hammers – highly skilled but hampered by their legacy tools.
While that may be a slight exaggeration, there is a serious lack of interoperability or opportunity to create joint insight between the highly varied perspectives that individual IT tools produce (even if each is useful in its own purpose). There simply has never been a widely adopted standard for creating, storing or sharing system management data, much less a cross-vendor way to holistically merge heterogeneously collected or produced management data together – even for the beneficial use of harried and often frustrated IT owners that might own dozens or more differently sourced system management solutions. That is until now.
OpsDataStore has brought the IT management game to a new level with an easy to deploy, centralized, intelligent – and big data enabled – management data “service”. It readily sucks in all the lowest level, fastest streaming management data from a plethora of tools (several ready to go at GA, but easily extended to any data source), automatically and intelligently relates data from disparate sources into a single unified “agile” model, directly provides fundamental visualization and analysis, and then can serve that unified and related data back out to enlightened and newly comprehensive downstream management workflows. OpsDataStore drops in and serves as the new systems management “nexus” between formerly disparate vendor and domain management solutions.
If you have ever been in IT, you’ve no doubt written scripts, fiddled with logfiles, created massive spreadsheets, or otherwise attempted to stitch together some larger coherent picture by marrying and merging data from two (or 18) different management data sources. The more sources you might have, the more the problem (or opportunity) grows non-linearly. OpsDataStore promises to completely fill in this gap, enabling IT to automatically multiply the value of their existing management solutions.