
Research Areas

Big Data

Includes Big Data Appliances, Scale-out Object Storage, On-Premises and Cloud Big Data, Data Management at Scale, Big Data Protection and Operations.

Our big data practice covers the entire big data storage and analytics space for enterprises, from on-premises big data storage to cloud-scale infrastructure. We analyze emerging and established technologies and trends in these active, growing categories, and communicate their impact both on enterprise IT and on business-focused big data analysts. Corporations are now actively looking for key transformative technologies to help them meet new big data challenges and the increasing demands of both data scientists and production application owners, all while controlling the exploding costs of storing, managing and protecting their growing big data stores.

Free Reports

Apache Spark Market Survey: Cloudera Sponsored Research

Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category. To better understand Spark’s growing role in big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven thousand (6,900+) qualified technical and managerial people working with big data around the world to explore their experiences with and intentions for Spark adoption and deployment, their current perceptions of the Spark marketplace, and their views on the future of Spark itself.

We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over half (54%) of respondents are already actively using Spark. Spark is proving invaluable: 64% of those currently using it plan to notably increase their usage within the next 12 months. And new Spark adoption is clearly growing – 4 out of 10 respondents who are already familiar with Spark but not yet using it plan to deploy it soon.

The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).

When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%).

Interestingly, while on-premises Spark deployments dominate today (more than 50%), there is strong interest in transitioning many of them to the cloud going forward. Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36%, along with a corresponding increase in Spark SaaS usage, from 3% to 9%.

The biggest challenge with Spark, similar to what has been previously noted across the broader big data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap within their organizations. Similarly, more than one-third mention complexity in learning/integrating Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous big data analytics platforms, Spark today offers a higher—and often already familiar—level of interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster operations, all of which no doubt contribute to its greatly increasing popularity and widespread adoption.

Overall, it’s clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment. The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years.

Publish date: 11/07/16
Profile

Now Big Data Works for Every Enterprise: Pepperdata Adds Missing Performance QoS to Hadoop

While a few well-publicized Web 2.0 companies are taking great advantage of foundational big data solutions they themselves created (e.g., Hadoop), most traditional enterprise IT shops are still thinking about how to practically deploy their first business-impacting big data applications – or have dived in and are now struggling mightily to effectively manage a large Hadoop cluster in the middle of their production data center. This has led to the common perception that realistic big data business value may yet be just out of reach for most organizations – especially those that need to run lean and mean on both staffing and resources.

This new big data ecosystem consists of scale-out platforms, cutting-edge open source solutions, and massive storage, all inherently difficult for traditional IT shops to manage optimally in production – especially with still-evolving ecosystem management capabilities. In addition, most organizations need to run large clusters supporting multiple users and applications to control both capital and operational costs. Yet there are no native ways to guarantee, control, or even gain visibility into workload-level performance within Hadoop. Even if there weren’t a real high-end skills and expertise gap in most shops, there still isn’t any practical way for even additional experts to tweak and tune mixed Hadoop workload environments to meet production performance SLAs.

At the same time, the competitive game of mining value from big data has moved from day-long batch ETL/ELT jobs feeding downstream BI systems to more interactive user queries and “real-time” business process applications. Live performance now matters as much in big data as it does in any other data center solution. Ensuring multi-tenant workload performance within Hadoop is why Pepperdata, a cluster performance optimization solution, is critical to the success of enterprise big data initiatives.

In this report we’ll look deeper into today’s Hadoop deployment challenges and learn how performance optimization capabilities are not only necessary for big data success in enterprise production environments, but can open up new opportunities to mine additional business value. We’ll look at Pepperdata’s unique performance solution that enables successful Hadoop adoption for the common enterprise. We’ll also examine how it inherently provides deep visibility and reporting into who is doing what/when for troubleshooting, chargeback and other management needs. Because Pepperdata’s function is essential and unique, not to mention its compelling net value, it should be a checklist item in any data center Hadoop implementation.


Publish date: 12/17/15
Report

HP Converges to Mine Big Value from Big Data

The promise of big data has engaged the imagination of corporations everywhere, even before they look to big data solutions to handle the accelerating pressures of proliferating new data sources and tremendously increasing volumes of raw and unstructured data. Corporations have long competed on analytically extracting value from their structured transactional data streams, but are now trying to differentiate with new big data applications that span multiple data types, run in business-interactive timeframes, and deliver more operationally focused (even transactional) value based on multiple types of processing.

This has led to some major rethinking about the best approach, or journey, to success with big data. As mainstream enterprises learn how and where their inevitable big data opportunities lie (and they all have them – ignoring them is simply not a viable strategy), they are also finding that wholesale adoption of a completely open source approach can lead to many unexpected pitfalls, such as data islands, batch-analytical timeframes, multiplying scope, and constrained application value. Most of all, IT simply cannot halt existing processes and transition overnight to a different core business model or data platform.

But big data is already here. Companies must figure out how to process different kinds of data, stay on top of their big data “deluge”, remain agile, mine value, and yet hopefully leverage existing staff, resources and analytical investments. Some of the important questions include:

1. How to build the really exciting and valuable applications that combine multiple analytical and machine learning processes across multiple big data types?

2. How to avoid setting up two, three, or more parallel environments that require many copies of big data, complex dataflows, and far too many new highly skilled experts?

We find that HP Haven presents an intriguing, proven, and enterprise-ready approach: it converges structured, unstructured, machine-generated and other kinds of analytical solutions – many of them already world-class and proven in their own right – into a single big data processing platform. This lets organizations leverage existing data, applications, and expertise while opening opportunities to analyze data sets in multiple ways. With this solution it’s possible to build applications that draw on multiple data sources and multiple proven solutions, and to easily “mash up” whatever might be envisioned. The HP Haven approach doesn’t force monolithic adoption; it can be deployed and built up as a customer’s big data journey progresses.

To help understand the IT challenges of big data and explore this new kind of enterprise data center platform opportunity, we’ve created this special vendor spotlight report. We start with a significant extract from the premium Taneja Group Enterprise Hadoop Infrastructure Market Landscape report to help understand the larger Hadoop market perspective. Then within that context we will review the HP Haven solution for Big Data and look at how it addresses key challenges while presenting a platform on which enterprises can develop their new big data opportunities.

Publish date: 03/16/15
Free Reports

Enterprise File Collaboration Market Landscape

Collaboration is a huge concept; even narrowed down to enterprise file collaboration (EFC), it is still a big undertaking. Many vendors use “collaboration” in their marketing materials, yet they mean many different things by it, ranging from simple business interaction to sophisticated groupware to wide-scale data sharing and syncing. The result is a good deal of market confusion.

Frankly, vendors selling file collaboration into the enterprise cannot afford massive customer confusion, because that sale is already an uphill battle. First, customers – business end users – resist changing their Dropbox and Dropbox-like file sharing applications. As far as the users are concerned, their sharing works just fine between their own devices and small teams.

IT is very concerned about this level of consumer-grade file sharing – and if it is not, it should be. But IT faces a battle when it attempts to wean thousands of end users off Dropbox on their personal devices. There must be a clear business advantage and good usability for users who are required to adopt a corporate file sharing application on their own devices.

IT must also have good reasons to deploy corporate file sharing using the cloud. From IT’s perspective, the Dropboxes of the world are fueling the BYOD (Bring Your Own Device) phenomenon. IT needs to replace consumer-level file collaboration applications with an enterprise-scale application and a robust management console. However, while IT may be anxious about BYOD and insecure file sharing, that is not usually the most pressing item on a full agenda. IT needs to understand how an EFC solution can solve a very large problem, and why it should take advantage of the solution now.

What is the solution? Enterprise file collaboration (EFC) with: 1) high scalability, 2) security, 3) control, 4) usability, and 5) compliance. In this landscape report we will discuss these five factors and the main customer drivers for this level of enterprise file collaboration.
 
Finally, we will discuss the leading vendors that offer enterprise file collaboration products and see how they stack up against our definition.

Publish date: 06/06/13
Report

Are You Making Money With Your Object Storage?

Object storage has long been pigeon-holed as a necessary overhead expense for long-term archive storage, a data purgatory one step before tape or deletion. In our experience, we have seen many IT shops view object storage more as something exotic they have to implement to meet government regulations rather than as a competitive strategic asset that can help their businesses make money.

Normally when companies invest in high-end IT assets like enterprise-class storage, they hope to recoup those investments in big ways like accelerating the performance of market-competitive applications or efficiently consolidating data centers. Maybe they are even starting to analyze big data to find better ways to run the business. There are far more opportunities to be sure, but these kinds of “money-making” initiatives have been mainly associated with “file” and “block” types of storage – the primary storage commonly used to power databases, host office productivity applications, and build pools of shared resources for virtualization projects. But that’s about to change. If you’ve intentionally dismissed or just overlooked object storage, it is time to take a deeper look. Today’s object storage provides brilliant capabilities for enhancing productivity, creating global platforms and developing new revenue streams.

Object storage has been evolving from its historical role as a second-tier data dumping ground into a value-building primary storage platform for content and collaboration. And the latest high-performance cloud storage solutions could transform the whole nature of enterprise data storage. To really exploit this new generation of object storage, it is important not only to understand what it is and how it has evolved, but also to start thinking about how to harness its emerging capabilities to build net new business.

Publish date: 05/03/13
Free Reports

The Move to the Cloud - A Taneja Group eBook

eBook on Cloud Storage -- 6 Critical Questions and Answers You Need to Know

Cloud storage can be a beast to wrangle. Deciding which applications to move into the cloud, understanding how to select and deal with a cloud storage provider, deciding on cloud storage solutions – none of these are easy. Is it worth it?

It's worth it, but it’s vital that you go into it with your eyes wide open. There are many, many questions you need to ask before entrusting your data to a cloud storage provider. Questions like:

#1. What is your system uptime?

#2. What data availability service levels do you support?

#3. How easy is it to move my data to another provider?

#4. What data protection service levels do you provide?

#5. What is your level of performance?

#6. What applications can I best host in the cloud? 


*This eBook is free with registration.

Publish date: 09/10/12