Trusted Business Advisors, Expert Technology Analysts

Profiles/Reports

Recently Added

Free Reports

Apache Spark Market Survey: Cloudera Sponsored Research

Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category. In order to better understand Spark’s growing role in big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven thousand (6900+) qualified technical and managerial people working with big data from around the world to explore their experiences with and intentions for Spark adoption and deployment, their current perceptions of the Spark marketplace and of the future of Spark itself.

We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over one-half (54%) of respondents are already actively using Spark. Spark is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing – 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon.

The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).

When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%).

While on-premises Spark deployments dominate today (more than 50%), there is strong interest in moving many of them to the cloud. Overall, Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36%, along with a corresponding increase in the use of Spark SaaS, from 3% to 9%.

The biggest challenge with Spark, similar to what has been previously noted across the broader big data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap within their organizations. Similarly, more than one-third mention complexity in learning/integrating Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous big data analytics platforms, Spark today offers a higher—and often already familiar—level of interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster operations, all of which no doubt contribute to its greatly increasing popularity and widespread adoption.

Overall, it’s clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment. The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years.

Publish date: 11/07/16
Profile

Optimizing VM Storage Performance & Capacity - Tintri Customers Leverage New Predictive Analytics

Today we are seeing big impacts on storage from the huge increase in the scale of an organization’s important data (e.g. Big Data, Internet of Things) and the growing size of virtualization clusters (e.g. never-ending VMs, VDI, cloud-building). In addition, virtualization adoption tends to make IT admins more generalist. In particular, IT groups are focusing more on servicing users and applications and no longer want to manage infrastructure for infrastructure’s sake. Everything that IT does, including storage, is increasingly interpreted, analyzed, and managed in application/business terms in order to optimize the return on the total IT investment. To move forward, an organization’s storage infrastructure not only needs to grow internally smarter, it also needs to become both VM and application aware.

While server virtualization made a lot of things better for the over-taxed IT shop, delivering quality storage services in hypervisor infrastructures with traditional storage created difficult challenges. In response, Tintri pioneered per-VM storage infrastructure. The Tintri VMstore has eliminated multiple points of storage friction and pain. In fact, it’s now becoming a mandatory checkbox across the storage market for all arrays to claim some kind of VM-centricity. Unfortunately, traditional arrays are mainly focused on checking off rudimentary support for external hypervisor APIs that only serve to re-package the same old storage. The best fit to today’s (and tomorrow’s) virtual storage requirements will only come from fully engineered VM-centric storage and application-aware approaches such as Tintri’s.

However, it’s not enough to simply drop in storage that automatically drives best practice policies and handles today’s needs. We all know change is constant, and key to preparing for both growth and change is having a detailed, properly focused view of today’s large scale environments, along with smart planning tools that help IT both optimize current resources and make the best IT investment decisions going forward. To meet those larger needs, Tintri has rolled out a Tintri Analytics SaaS-based offering that applies big data analytical power to the large scale of its customers’ VMstore VM-aware metrics.

In this report we will look briefly at Tintri’s overall “per-VM” storage approach and then take a deeper look at its new Tintri Analytics offering. The new Tintri Analytics management service further optimizes Tintri’s app-aware VM storage with advanced VM-centric performance and capacity management. With this new service, Tintri is helping its customers gain greater visibility, insight and analysis over large, cloud-scale virtual operations. We’ll see how “big data” enhanced intelligence provides significant value and differentiation, and get a glimpse of the payback that a predictive approach provides both the virtual admin and application owners.

Publish date: 11/04/16
Report

Qumulo Tackles the Machine Data Challenge: Six Customers Explain How

We are moving into a new era of data storage. The traditional storage infrastructure that we know (and do not necessarily love) was designed to process and store input from human beings. People input emails, word processing documents and spreadsheets. They created databases and recorded business transactions. Data was stored on tape, workstation hard drives, and over the LAN.

In the second stage of data storage development, humans still produced most content but there was more and more of it, and file sizes got larger and larger. Video and audio, digital imaging, and websites streaming entertainment content to millions of users drove data growth with no end in sight. Storage capacity grew to encompass large data volumes and flash became more common in hybrid and all-flash storage systems.

Today, the storage environment has undergone another major change. The major content producers are no longer people, but machines. Storing and processing machine data offers tremendous opportunities: seismic and weather sensors that may lead to meaningful disaster warnings, social network diagnostics that display hard evidence of terrorist activity, connected cars that could slash automotive fatalities, and research breakthroughs around the human brain thanks to advances in microscopy.

However, building storage systems that can store raw machine data and process it is not for the faint of heart. The best solution today is massively scale-out, general purpose NAS. This type of storage system has a single namespace capable of storing billions of differently sized files, linearly scales performance and capacity, and offers data-awareness and real-time analytics using extended metadata.

Very few vendors in the world today offer this solution. One of them is Qumulo. Qumulo’s mission is to provide high volume storage to business and scientific environments that produce massive volumes of machine data.

To gauge how well Qumulo works in the real world of big data, we spoke with six customers from life sciences, media and entertainment, telco/cable/satellite, higher education and the automotive industries. Each customer deals with massive machine-generated data and uses Qumulo to store, manage, and curate mission-critical data volumes 24x7. Customers cited five major benefits to Qumulo: massive scalability, high performance, data-awareness and analytics, extreme reliability, and top-flight customer support.

Read on to see how Qumulo supports large-scale data storage and processing in these mission-critical, intensive machine data environments.

Publish date: 10/26/16
Profile

Petabyte-Scale Backup Storage Without Compromise: A Look at Scality RING for Enterprise Backup

Traditional backup storage is being challenged by the immense growth of data. These solutions, including tape, controller-gated RAID devices, and dedicated storage appliances, simply aren’t designed for today’s enterprise backup storage at petabyte levels, especially when that data lives in geographically distributed environments. This insufficiency is due in large part to inefficiency and limited data protection, as well as the limited scalability and the lack of flexibility of these traditional storage solutions.

These constraints can lead to multiple processes and many storage systems to manage. Storage silos develop as a result, creating complexity, increasing operational costs and adding risk. It is not unusual for companies to have 10-20 different storage systems to achieve petabyte storage capacity, which is inefficient from a management point of view. And if companies want to move data from one storage system to another, the migration process can take a lot of time and place even more demand on data center resources.

And the concerns go beyond management complexity. Companies face higher capital costs due to relatively high priced proprietary storage hardware, and worse, limited fault tolerance, which can lead to data loss if a system incurs simultaneous disk failures. Slow access speeds also present a major challenge if IT teams need to restore large amounts of data from tape while maintaining production environments. As a result, midsized companies, large enterprises and service providers that experience these issues have begun to shift to software-defined storage solutions and scale-out object storage technology that addresses the shortcomings of traditional backup storage.

Software-defined scale-out storage is attractive for large-scale data backup because these storage solutions offer linear performance and hardware independence – two core capabilities that drive tremendous scalability and enable cost-effective storage solutions. Add to this the high fault tolerance of object storage platforms, and it’s easy to see why software-defined object storage solutions are rapidly becoming the preferred backup storage approach for petabyte-scale data environments. A recent Taneja Group survey underscores the benefits of software-defined scale-out storage. IT professionals indicated that the top benefits of software-defined, scale-out architecture on industry standard servers are a high level of flexibility (34%), low cost of deployment (34%), modular scalability (32%), and ability to purchase hardware separate from software (32%).

Going a step further, the Scality backup storage solution built upon the Scality RING platform offers the rare combination of scalability, durability and affordability plus the flexibility to handle mixed workloads at petabyte-scale. Scality backup storage achieves this by supporting multiple file and object protocols so companies can back up files, objects and VMs; leveraging a scale-out file system that delivers linear performance as system capacity increases; offering advanced data protection for extreme fault tolerance; enabling hardware independence for better price performance; and providing auto balancing that enables migration-free hardware upgrades.

In this paper, we will look at the limitations of backup appliances and Network-Attached Storage (NAS) and the key requirements for backup storage at petabyte-scale. We will also study the Scality RING software-defined architecture and provide an overview of the Scality backup storage solution.

Publish date: 10/18/16