Taneja Group | Spark
Trusted Business Advisors, Expert Technology Analysts

Items Tagged: Spark

news / Blog

IoT Goes Real-Time, Gets Predictive - Glassbeam Launches Spark-based Machine Learning

In-memory processing was all the rage at Strata 2014 NY last month, and the hottest word was Spark! Spark is a big data scale-out cluster solution that provides a way to speedily analyze large data sets in-memory using a "resilient distributed dataset" design for fault-tolerance. It can deploy into its own optimized cluster, or ride on top of Hadoop 2.0 using YARN... I haven't done justice to Spark itself and perhaps its biggest onrushing use case - taming the real-time data from the Internet of Things (IoT)...

  • Premiered: 11/21/14
  • Author: Mike Matchett
Topic(s): Glassbeam, Big Data, Spark, In Memory, GridGain, Machine Learning
news

Navigate data lakes to manage big data

While the data lake concept appeals to business today, IT administrators must exercise caution prior to a full-scale implementation.

  • Premiered: 06/05/15
  • Author: Mike Matchett
  • Published: TechTarget: Search Storage
Topic(s): data lake, Storage, Big Data, storage infrastructure, Data protection, big data lake, analysis, HDFS, Hadoop, Hadoop virtualization, Virtualization, Hadoop Distributed File System, software-defined, software-defined storage, BI, Business Intelligence, Disaster Recovery, Business Continuity, BC, DR, analytics, Spark, HP, Vertica, HP Haven, Haven, OLAP, data-aware
news

Big data analytics applications impact storage systems

Analytics applications for big data have placed extensive demands on storage systems, which Mike Matchett says often require new or modified storage structures.

  • Premiered: 09/03/15
  • Author: Mike Matchett
  • Published: TechTarget: Search Storage
Topic(s): Mike Matchett, Big Data, analytics, Storage, Primary Storage, scalability, Business Intelligence, BI, AWS, Amazon AWS, S3, HPC, High Performance Computing, High Performance, ETL, HP Haven, HP, Hadoop, Vertica, convergence, converged, IOPS, Capacity, latency, scale-out, software-defined, software-defined storage, SDS, YARN, Spark
news / Blog

Visualizing (and Optimizing) Cluster Performance

Clusters are the scale-out way to go in today's data center. Why not try to architect an infrastructure that can grow linearly in capacity and/or performance? Well, one problem is that operations can get quite complex, especially when you start mixing workloads and tenants on the same cluster. In vanilla big data solutions everyone can compete, and not always fairly, for the same resources. This is a growing problem in production environments where big data apps are starting to underpin key business-impacting processes. Pepperdata was formed to help deliver consistent big data application performance. They layer their solution into big data clusters (e.g. YARN/Hadoop, Spark...) and then can dynamically optimize runtime operations - tuning and tweaking at sub-second intervals to help guarantee required QoS to key workloads....

  • Premiered: 09/29/15
  • Author: Mike Matchett
Topic(s): Pepperdata, Big Data, Cluster Management, Chargeback, Hadoop, Spark, Performance Optimization
news

Big Data Grows Up: APM Tools Emerging

The next emerging market for big data may be application performance monitoring. Concurrent has released a new tool in this space, and more tools are expected to hit the market in the next six months. The availability of such tools will drive more enterprise adoption of big data.

  • Premiered: 10/26/15
  • Author: Taneja Group
  • Published: Information Week
Topic(s): apm, Big Data, Mike Matchett, Concurrent, Hadoop, High Performance, application performance, DevOps, Apache, Apache Hive, MapReduce, Cascading, Scalding, Spark, Performance, Data Center
Profiles/Reports

Multiplying the Value of All Existing IT Solutions

Decades of constantly advancing computing solutions have changed the world in tremendous ways, but interestingly, the IT folks running the show have long been stuck with only piecemeal solutions for managing and optimizing all that blazing computing power. Sometimes it seems like IT is a pit crew servicing a modern racing car with nothing but axes and hammers – highly skilled but hampered by their legacy tools.

While that may be a slight exaggeration, there is a serious lack of interoperability or opportunity to create joint insight between the highly varied perspectives that individual IT tools produce (even if each is useful for its own purpose). There simply has never been a widely adopted standard for creating, storing or sharing system management data, much less a cross-vendor way to holistically merge heterogeneously collected or produced management data together – even for the beneficial use of harried and often frustrated IT owners that might own dozens or more differently sourced system management solutions. That is, until now.

OpsDataStore has brought the IT management game to a new level with an easy-to-deploy, centralized, intelligent – and big data enabled – management data “service”. It readily sucks in all the lowest level, fastest streaming management data from a plethora of tools (several ready to go at GA, but easily extended to any data source), automatically and intelligently relates data from disparate sources into a single unified “agile” model, directly provides fundamental visualization and analysis, and then can serve that unified and related data back out to enlightened and newly comprehensive downstream management workflows. OpsDataStore drops in and serves as the new systems management “nexus” between formerly disparate vendor and domain management solutions.

If you have ever been in IT, you’ve no doubt written scripts, fiddled with logfiles, created massive spreadsheets, or otherwise attempted to stitch together some larger coherent picture by marrying and merging data from two (or 18) different management data sources. The more sources you might have, the more the problem (or opportunity) grows non-linearly. OpsDataStore promises to completely fill in this gap, enabling IT to automatically multiply the value of their existing management solutions.

Publish date: 12/03/15
news

Can your cluster management tools pass muster?

The right designs and cluster management tools ensure your clusters don't become a cluster, er, failure.

  • Premiered: 11/17/15
  • Author: Mike Matchett
  • Published: TechTarget: Search Data Center
Topic(s): cluster, Cluster Management, Cluster Server, Storage, Cloud, Public Cloud, Private Cloud, Virtual Infrastructure, Virtualization, hyperconvergence, hyper-convergence, software-defined, software-defined storage, SDS, Big Data, scale-up, CAPEX, IT infrastructure, OPEX, Hypervisor, Migration, QoS, Virtual Machine, VM, VMWare, VMware VVOLs, VVOLs, Virtual Volumes, cloud infrastructure, OpenStack
Profiles/Reports

Now Big Data Works for Every Enterprise: Pepperdata Adds Missing Performance QoS to Hadoop

While a few well-publicized web 2.0 companies are taking great advantage of foundational big data solutions that they have themselves created (e.g. Hadoop), most traditional enterprise IT shops are still thinking about how to practically deploy their first business-impacting big data applications – or have dived in and are now struggling mightily to effectively manage a large Hadoop cluster in the middle of their production data center. This has led to the common perception that realistic big data business value may yet be just out of reach for most organizations – especially those that need to run lean and mean on both staffing and resources.

This new big data ecosystem consists of scale-out platforms, cutting-edge open source solutions, and massive storage that is inherently difficult for traditional IT shops to optimally manage in production – especially with still evolving ecosystem management capabilities. In addition, most organizations need to run large clusters supporting multiple users and applications to control both capital and operational costs. Yet there are no native ways to guarantee, control, or even gain visibility into workload-level performance within Hadoop. Even if there weren’t a real high-end skills and deep expertise gap for most, there still isn’t any practical way that additional experts could tweak and tune mixed Hadoop workload environments to meet production performance SLAs.

At the same time, the competitive game of mining value from big data has moved from day-long batch ELT/ETL jobs feeding downstream BI systems, to more user interactive queries and business process “real time” applications. Live performance matters as much now in big data as it does in any other data center solution. Ensuring multi-tenant workload performance within Hadoop is why Pepperdata, a cluster performance optimization solution, is critical to the success of enterprise big data initiatives.

In this report we’ll look deeper into today’s Hadoop deployment challenges and learn how performance optimization capabilities are not only necessary for big data success in enterprise production environments, but can open up new opportunities to mine additional business value. We’ll look at Pepperdata’s unique performance solution that enables successful Hadoop adoption for the common enterprise. We’ll also examine how it inherently provides deep visibility and reporting into who is doing what/when for troubleshooting, chargeback and other management needs. Because Pepperdata’s function is essential and unique, not to mention its compelling net value, it should be a checklist item in any data center Hadoop implementation.

Publish date: 12/17/15
news

Concurrent app management tools work on Hadoop and Spark

If Hadoop and Spark are to sneak into the enterprise, they will need to be manageable. With Driven, Concurrent Inc. takes a stab at the problem.

  • Premiered: 12/09/15
  • Author: Taneja Group
  • Published: TechTarget: Search Data Management
Topic(s): Hadoop, Spark, Driven, Concurrent, manageability, Big Data, Performance, Performance Management, Mike Matchett, Hive, MapReduce, SLA, service level agreement, software, high-fidelity, HiFi, cluster, Pepperdata, Oracle, IBM, CA
news

Mobile gaming company plays new Hadoop cluster management card

Chartboost, which operates a platform for mobile games, turned to new cluster management software in an effort to overcome problems in controlling the use of its Hadoop processing resources.

  • Premiered: 01/05/16
  • Author: Taneja Group
  • Published: TechTarget: Search Data Management
Topic(s): Chartboost, mobile, cluster, Cluster Management, Hadoop, processing, data processing, analytics, Big Data, MapReduce, Hive, Spark, Optimization, Cloudera, AWS, Amazon, Cloud, YARN, Pepperdata, Memory, CPU, Application, Concurrent, SLA, service-level agreement, HBase, application performance, application performance management, Mike Matchett
news

Making Sense of the Internet of Things with Converged Infrastructure

With its flexibility and scalability, converged infrastructure can be a good solution to the influx of IoT data.

  • Premiered: 03/22/16
  • Author: Taneja Group
  • Published: Windows IT Pro
Topic(s): Internet of Things, IoT, converged, Converged Infrastructure, convergence, IT infrastructure, Servers, Storage, network, flexibility, scalability, Data protection, storage architecture, Hadoop, Apache, Spark, Apache Spark, structured data, Mike Matchett
news

Galactic Exchange Launches Into Big Data Space With 5 Minute Set-Up Spark/Hadoop Powered Clusters

Galactic Exchange, Inc. officially came out of stealth mode this week to announce initial beta availability of ClusterGX™, an open source clustering solution which provides unprecedented simplicity of deployment and management of Spark/Hadoop clusters.

  • Premiered: 03/25/16
  • Author: Taneja Group
  • Published: Inside Big Data
Topic(s): Galactic Exchange, ClusterGX, cluster, clusters, Open Source, Spark, Hadoop, Cloud, managed cloud, cloud cluster, Docker, Storage, cluster scaling, VM, Virtual Machine, Big Data, Virtualization, hyperconverged, hyperconvergence, VM-centric, Mike Matchett
news

Galactic Exchange can get your Hadoop cluster up and running in just 5 minutes

Stealthy startup Galactic Exchange Inc. burst out of the shadows this weekend touting a new product that’s able to spin up a Hadoop or Spark cluster, ready to go, in just five minutes.

  • Premiered: 03/28/16
  • Author: Taneja Group
  • Published: Silicon Angle
Topic(s): Galactic Exchange, Hadoop, hadoop cluster, cluster, Spark, ClusterGX, strata, simplicity, Infrastructure, Big Data, Mike Matchett, Apache, Apache Mesos, Docker, hyperconverged, hyperconvergence, application performance, Backup, flexibility, Cloud, Storage, analysis, data lake, big data lake, IoT
news

Google enterprise cloud challenge unlikely to be solved soon

The Internet giant predicts a tipping point for adoption of its public cloud offering, despite lingering questions about the size of its enterprise customer base and maturity of the platform.

  • Premiered: 02/11/16
  • Author: Taneja Group
  • Published: TechTarget: Search Cloud Computing
Topic(s): Google, enterprise cloud, Enterprise, Cloud, Public Cloud, cloud adoption, Google Cloud Platform, container, Storage, Amazon, Microsoft, Oracle, Amazon AWS, AWS, Amazon Web Services, Big Data, SSD, Flash, Spark, NoSQL, Virtual Machine, VM, Mike Matchett
news

Galactic Exchange Delivers ClusterGX™ Full-Service, for On-Premise Managed Big Data

Galactic Exchange, Inc. today announced the availability of its Docker container powered clustering technology (ClusterGX™) as an on-premise managed service solution (ClusterGX™ Full-Service) for customers deploying Hadoop/Spark Big Data applications.

  • Premiered: 05/31/16
  • Author: Taneja Group
  • Published: MarketWired
Topic(s): Galactic Exchange, cluster, ClusterGX, managed big data, Big Data, Storage, Hadoop, Spark, Docker, Amazon AWS, AWS, Cloud, Hybrid, Security, HDFS, data security, Mike Matchett, Datacenter, Data Center
news

Spark speeds up adoption of big data clusters and clouds

Infrastructure that supports big data comes from both the cloud and clusters. Enterprises can mix and match these seven infrastructure choices to meet their needs.

  • Premiered: 07/19/16
  • Author: Mike Matchett
  • Published: TechTarget: Search IT Operations
Topic(s): Apache Spark, Spark, Mike Matchett, Cloud, cloud cluster, cluster, Big Data, big data analytics, MapReduce, Business Intelligence, BI, MLlib, High Performance, hadoop cluster, HDFS, Hadoop Distributed File System, IBM, Hortonworks, Cloudera, capacity management, Performance Management, API, SAN, storage area networks, CAPEX, DataDirect Networks, HPC, Lustre, Virtualization, VM
news

Big Data Storage Solutions: Options Abound

Hadoop, Spark and other big data analysis tools all have one thing in common: they need some form of big data storage to hold the vast quantities of data that they crunch through. The good news is that big data storage options are proliferating.

  • Premiered: 08/09/16
  • Author: Taneja Group
  • Published: InfoStor
Topic(s): Hadoop, Spark, Big Data, big data storage, DAS, Compute, cluster, flexibility, Mike Matchett, Hadoop Distributed File System, HDFS, NFS, MapReduce, API, SAN, NAS, TCO, DDN, EMC, EMC Isilon, Isilon, SDS, software-defined, software-defined storage, ViPR, DriveScale, hScaler, Cisco, HPE, IBM
news

When data storage infrastructure really has a brain

Big data analysis and the internet of things are helping produce more intelligent storage infrastructure.

  • Premiered: 09/06/16
  • Author: Mike Matchett
  • Published: TechTarget: Search Storage
Topic(s): Big Data, big data analytics, Internet of Things, IoT, storage infrastructure, Storage, Intelligent Storage, CPU, software-defined, software-defined storage, SDS, HPE, StoreVirtual, hyper-converged, hyper-converged architectures, HyperGrid, Nutanix, Pivot3, SimpliVity, Optimization, Datrium, Provisioning, Artificial Intelligence, Cloud, elastic cloud, data processing, Python, Spark, API, REST API
news

Cask Releases Preview of First Unified Integration Platform for Big Data

Cask (cask.co), the company that makes building and running big data solutions easy, today announced a public preview release of CDAP 4, the first unified integration platform for big data.

  • Premiered: 09/19/16
  • Author: Taneja Group
  • Published: Yahoo! Finance
Topic(s): Cask, Big Data, Spark, Hadoop, HBase, Oracle, Netezza, AWS, Amazon AWS, Amazon Web Services, Kinesis, API, Mike Matchett
Profiles/Reports

Apache Spark Market Survey: Cloudera Sponsored Research

Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category. In order to better understand Spark’s growing role in big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven thousand (6900+) qualified technical and managerial people working with big data from around the world to explore their experiences with and intentions for Spark adoption and deployment, their current perceptions of the Spark marketplace and of the future of Spark itself.

We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over one-half (54%) of respondents are already actively using Spark. Spark is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing – 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon.

The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).

When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%).

Interestingly, while on-premise Spark deployments dominate today (more than 50%), there is a strong interest in transitioning many of those to cloud deployments going forward. Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36%, along with a corresponding increase in using Spark SaaS, from 3% to 9%.

The biggest challenge with Spark, similar to what has been previously noted across the broader big data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap within their organizations. Similarly, more than one-third mention complexity in learning/integrating Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous big data analytics platforms, Spark today offers a higher—and often already familiar—level of interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster operations, all of which no doubt contribute to its greatly increasing popularity and widespread adoption.

Overall, it’s clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment. The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years.

Publish date: 11/07/16