Items Tagged: Spark
IoT Goes Real-Time, Gets Predictive - Glassbeam Launches Spark-based Machine Learning
In-memory processing was all the rage at Strata 2014 NY last month, and the hottest word was Spark! Spark is a big data scale-out cluster solution that provides a way to speedily analyze large data sets in memory using a "resilient distributed dataset" design for fault tolerance. It can deploy into its own optimized cluster, or ride on top of Hadoop 2.0 using YARN... I haven't done justice to Spark itself and perhaps its biggest onrushing use case - taming the real-time data from the Internet of Things (IoT)...
Navigate data lakes to manage big data
While the data lake concept appeals to business today, IT administrators must exercise caution prior to a full-scale implementation.
- Premiered: 06/05/15
- Author: Mike Matchett
- Published: TechTarget: Search Storage
Big data analytics applications impact storage systems
Analytics applications for big data have placed extensive demands on storage systems, which Mike Matchett says often require new or modified storage structures.
- Premiered: 09/03/15
- Author: Mike Matchett
- Published: TechTarget: Search Storage
Visualizing (and Optimizing) Cluster Performance
Clusters are the scale-out way to go in today's data center. Why not try to architect an infrastructure that can grow linearly in capacity and/or performance? Well, one problem is that operations can get quite complex, especially when you start mixing workloads and tenants on the same cluster. In vanilla big data solutions everyone can compete, and not always fairly, for the same resources. This is a growing problem in production environments where big data apps are starting to underpin key business-impacting processes. Pepperdata was formed to help deliver consistent big data application performance. They layer their solution into big data clusters (e.g., YARN/Hadoop, Spark...) and can then dynamically optimize runtime operations - tuning and tweaking at sub-second intervals to help guarantee required QoS to key workloads....
- Premiered: 09/29/15
- Author: Mike Matchett
Big Data Grows Up: APM Tools Emerging
The next emerging market for big data may be application performance monitoring. Concurrent has released a new tool in this space, and more tools are expected to hit the market in the next six months. The availability of such tools will drive more enterprise adoption of big data.
- Premiered: 10/26/15
- Author: Taneja Group
- Published: Information Week
Multiplying the Value of All Existing IT Solutions
Decades of constantly advancing computing solutions have changed the world in tremendous ways, but interestingly, the IT folks running the show have long been stuck with only piecemeal solutions for managing and optimizing all that blazing computing power. Sometimes it seems like IT is a pit crew servicing a modern racing car with nothing but axes and hammers – highly skilled but hampered by their legacy tools.
While that may be a slight exaggeration, there is a serious lack of interoperability or opportunity to create joint insight between the highly varied perspectives that individual IT tools produce (even if each is useful for its own purpose). There simply has never been a widely adopted standard for creating, storing or sharing system management data, much less a cross-vendor way to holistically merge heterogeneously collected or produced management data together – even for the beneficial use of harried and often frustrated IT owners that might own dozens or more differently sourced system management solutions. That is, until now.
OpsDataStore has brought the IT management game to a new level with an easy-to-deploy, centralized, intelligent – and big data enabled – management data “service”. It readily pulls in all the lowest-level, fastest-streaming management data from a plethora of tools (several ready to go at GA, but easily extended to any data source), automatically and intelligently relates data from disparate sources into a single unified “agile” model, directly provides fundamental visualization and analysis, and can then serve that unified and related data back out to enlightened and newly comprehensive downstream management workflows. OpsDataStore drops in and serves as the new systems management “nexus” between formerly disparate vendor and domain management solutions.
If you have ever been in IT, you’ve no doubt written scripts, fiddled with logfiles, created massive spreadsheets, or otherwise attempted to stitch together some larger coherent picture by marrying and merging data from two (or 18) different management data sources. The more sources you might have, the more the problem (or opportunity) grows non-linearly. OpsDataStore promises to completely fill in this gap, enabling IT to automatically multiply the value of their existing management solutions.
Can your cluster management tools pass muster?
The right designs and cluster management tools ensure your clusters don't become a cluster, er, failure.
- Premiered: 11/17/15
- Author: Mike Matchett
- Published: TechTarget: Search Data Center
Now Big Data Works for Every Enterprise: Pepperdata Adds Missing Performance QoS to Hadoop
While a few well-publicized web 2.0 companies are taking great advantage of foundational big data solutions that they themselves have created (e.g. Hadoop), most traditional enterprise IT shops are still thinking about how to practically deploy their first business-impacting big data applications – or have dived in and are now struggling mightily to effectively manage a large Hadoop cluster in the middle of their production data center. This has led to the common perception that realistic big data business value may yet be just out of reach for most organizations – especially those that need to run lean and mean on both staffing and resources.
This new big data ecosystem consists of scale-out platforms, cutting-edge open source solutions, and massive storage that is inherently difficult for traditional IT shops to optimally manage in production – especially with still-evolving ecosystem management capabilities. In addition, most organizations need to run large clusters supporting multiple users and applications to control both capital and operational costs. Yet there are no native ways to guarantee, control, or even gain visibility into workload-level performance within Hadoop. Even if there weren’t a real gap in high-end skills and deep expertise for most, there still isn’t any practical way that additional experts could tweak and tune mixed Hadoop workload environments to meet production performance SLAs.
At the same time, the competitive game of mining value from big data has moved from day-long batch ELT/ETL jobs feeding downstream BI systems to more user-interactive queries and business process “real time” applications. Live performance matters as much now in big data as it does in any other data center solution. Ensuring multi-tenant workload performance within Hadoop is why Pepperdata, a cluster performance optimization solution, is critical to the success of enterprise big data initiatives.
In this report we’ll look deeper into today’s Hadoop deployment challenges and learn how performance optimization capabilities are not only necessary for big data success in enterprise production environments, but can open up new opportunities to mine additional business value. We’ll look at Pepperdata’s unique performance solution that enables successful Hadoop adoption for the common enterprise. We’ll also examine how it inherently provides deep visibility and reporting into who is doing what/when for troubleshooting, chargeback and other management needs. Because Pepperdata’s function is essential and unique, not to mention its compelling net value, it should be a checklist item in any data center Hadoop implementation.
Concurrent app management tools work on Hadoop and Spark
If Hadoop and Spark are to sneak into the enterprise, they will need to be manageable. With Driven, Concurrent Inc. takes a stab at the problem.
- Premiered: 12/09/15
- Author: Taneja Group
- Published: TechTarget: Search Data Management
Mobile gaming company plays new Hadoop cluster management card
Chartboost, which operates a platform for mobile games, turned to new cluster management software in an effort to overcome problems in controlling the use of its Hadoop processing resources.
- Premiered: 01/05/16
- Author: Taneja Group
- Published: TechTarget: Search Data Management
Making Sense of the Internet of Things with Converged Infrastructure
With its flexibility and scalability, converged infrastructure can be a good solution to the influx of IoT data.
- Premiered: 03/22/16
- Author: Taneja Group
- Published: Windows IT Pro
Galactic Exchange Launches Into Big Data Space With 5 Minute Set-Up Spark/Hadoop Powered Clusters
Galactic Exchange, Inc. officially came out of stealth mode this week to announce initial beta availability of ClusterGX™, an open source clustering solution which provides unprecedented simplicity of deployment and management of Spark/Hadoop clusters.
- Premiered: 03/25/16
- Author: Taneja Group
- Published: Inside Big Data
Galactic Exchange can get your Hadoop cluster up and running in just 5 minutes
Stealthy startup Galactic Exchange Inc. burst out of the shadows this weekend touting a new product that’s able to spin up a Hadoop or Spark cluster, ready to go, in just five minutes.
- Premiered: 03/28/16
- Author: Taneja Group
- Published: Silicon Angle
Google enterprise cloud challenge unlikely to be solved soon
The Internet giant predicts a tipping point for adoption of its public cloud offering, despite lingering questions about the size of its enterprise customer base and maturity of the platform.
- Premiered: 02/11/16
- Author: Taneja Group
- Published: TechTarget: Search Cloud Computing
Galactic Exchange Delivers ClusterGX™ Full-Service, for On-Premise Managed Big Data
Galactic Exchange, Inc. today announced the availability of its Docker container powered clustering technology (ClusterGX™) as an on-premise managed service solution (ClusterGX™ Full-Service) for customers deploying Hadoop/Spark Big Data applications.
- Premiered: 05/31/16
- Author: Taneja Group
- Published: MarketWired
Spark speeds up adoption of big data clusters and clouds
Infrastructure that supports big data comes from both the cloud and clusters. Enterprises can mix and match these seven infrastructure choices to meet their needs.
- Premiered: 07/19/16
- Author: Mike Matchett
- Published: TechTarget: Search IT Operations
Big Data Storage Solutions: Options Abound
Hadoop, Spark and other big data analysis tools all have one thing in common: they need some form of big data storage to hold the vast quantities of data that they crunch through. The good news is that big data storage options are proliferating.
- Premiered: 08/09/16
- Author: Taneja Group
- Published: InfoStor
When data storage infrastructure really has a brain
Big data analysis and the internet of things are helping produce more intelligent storage infrastructure.
- Premiered: 09/06/16
- Author: Mike Matchett
- Published: TechTarget: Search Storage
Cask Releases Preview of First Unified Integration Platform for Big Data
Cask (cask.co), the company that makes building and running big data solutions easy, today announced a public preview release of CDAP 4, the first unified integration platform for big data.
- Premiered: 09/19/16
- Author: Taneja Group
- Published: Yahoo! Finance
Apache Spark Market Survey: Cloudera Sponsored Research
Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category. In order to better understand Spark’s growing role in big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven thousand (6900+) qualified technical and managerial people working with big data from around the world to explore their experiences with and intentions for Spark adoption and deployment, their current perceptions of the Spark marketplace and of the future of Spark itself.
We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over one-half (54%) of respondents are already actively using Spark. Spark is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing – 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon.
The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).
When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%).
Interestingly, while on-premise Spark deployments dominate today (more than 50%), there is a strong interest in transitioning many of those to cloud deployments going forward. Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36%, along with a corresponding increase in using Spark SaaS, from 3% to 9%.
The biggest challenge with Spark, similar to what has been previously noted across the broader big data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap within their organizations. Similarly, more than one-third mention complexity in learning/integrating Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous big data analytics platforms, Spark today offers a higher—and often already familiar—level of interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster operations, all of which no doubt contribute to its greatly increasing popularity and widespread adoption.
Overall, it’s clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment. The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years.