Taneja Group | Apache
Trusted Business Advisors, Expert Technology Analysts

Items Tagged: Apache

news / Blog

Enterprise IT Will Dive Into Big Data Solutions in 2013

If you are in IT, 2013 is going to be the year that you will want to dive into the "big data" pool if you haven't been pushed in already. But don't worry - it's no longer sink or swim. For one, we'll be here to help coach IT folks through it all. And while the concepts, terminology and hype have been all over the place, once you start floating around you'll find that under the surface much of what fills the big data pool is familiar IT infrastructure, data management, and services re-cast around a few easy-to-grasp innovations. For example, if you are in IT and asked to pick a Hadoop distro to stand up, you'd probably start with evaluating the three main distributions of Hadoop (other than getting it straight off Apache) followed by other downstream OEM'd and pre-integrated versions. The main distros are from Cloudera, Hortonworks, and MapR. I didn't really appreciate the differences until talking with all three individually (at 2012 NY Strata, see below).

  • Premiered: 01/15/13
  • Author: Mike Matchett
Topic(s): Big Data Hadoop Cloudera MapR Hortonworks Dell EMC strata Apache
Profiles/Reports

Extreme Applications in the Enterprise Drive Parallel File System Adoption

With the advent of big data and cloud-scale delivery, companies are racing to deploy cutting-edge services that include “extreme” applications like massive voice and image processing or complex financial analysis modeling that can push storage systems to their limits. Examples of high-visibility solutions with big market impact include applications based on image pattern recognition at large scale and financial risk management based on decision-making at high speed.

 
These ground-breaking solutions, made up of very different activities but with similar data storage challenges, create incredible new lines of business representing significant revenue potential. Every day here at Taneja Group we see more and more mainstream enterprises exploring similar “extreme service” opportunities. But when enterprise IT data centers take stock of what is required to host and deliver these new services, it quickly becomes apparent that traditional clustered and even scale-out file systems - of the kind that most enterprise data centers (or cloud providers) have racks and racks of - simply can’t handle the performance requirements.

 
There are already great enterprise storage solutions for applications that need raw throughput, high capacity, parallel access, low latency, or high availability – maybe even for two or three of those at a time. But when an “extreme” application needs all of those at the same time, only supercomputing-type storage in the form of parallel file systems provides a functional solution. The problem is that most commercial enterprises simply can’t afford or risk basing a line of business on an expensive research project.


The good news is that some storage vendors have been industrializing former supercomputing storage technologies, hardening massively parallel file systems into commercially viable solutions. This opens the door for revolutionary services creation, enabling mainstream enterprise datacenters to support the exploitation of new extreme applications.  
 

Publish date: 05/03/13
news

Virtualizing Hadoop Impacts Big Data Storage

Hadoop is soon coming to enterprise IT in a big way. VMware’s new vSphere Big Data Extensions (BDE) commercializes its open source Project Serengeti to make it dead easy for enterprise admins to spin virtual Hadoop clusters up and down at will.

  • Premiered: 07/17/13
  • Author: Mike Matchett
  • Published: Enterprise Storage Forum
Topic(s): Virtualization, VMWare, Hadoop, Project Serengeti, vSphere, Big Data, NAS, SAN, DAS, HDFS, HVE, Apache, scale-out, Hypervisor, EMC World 2013, EMC World, virtualizing Hadoop, Project Savannah, OpenStack, KVM
news

Myths Surrounding Big Data Technology

Big data technology is a big deal for storage shops, and a clear understanding of what it means -- and doesn't mean -- is required to successfully configure storage for big data apps.

  • Premiered: 08/08/13
  • Author: Mike Matchett
  • Published: Tech Target: Search Storage
Topic(s): Big Data, Storage, Cloudera, Apache, Hadoop, HDFS, MapR, NFS, CIFS, EMC, Isilon, DDN, DataDirect Networks, hScaler, Hortonworks
news

Hadoop Coming to Enterprise IT in Big Way – Taneja Group

Hadoop is coming to enterprise IT in a big way. The competitive advantage that can be gained from analyzing big data is just too 'big' to ignore. And the amount of data available to crunch is only growing bigger, whether from new sensors, capture of people, systems and process 'data exhaust', or just longer retention of available raw or low-level details.

  • Premiered: 10/16/13
  • Author: Taneja Group
  • Published: Storage Newsletter
Topic(s): Hadoop, Big Data, Storage, Apache, Virtualization, Dell, HP, Project Serengeti, VMWare, Mirantis, RedHat, Project Savanna, DDN, NetApp, Teradata, Oracle, EMC, Isilon
news

What does the next big thing in technology mean for the data center?

There are plenty of technologies touted as the next big thing. Big data, flash, high-performance computing, in-memory processing, NoSQL, virtualization, convergence, software-defined whatever all represent wild new forces that could bring real disruption but big opportunities to your local data center.

  • Premiered: 03/19/14
  • Author: Mike Matchett
  • Published: Tech Target: Search Data Center
Topic(s): data, Data Center, Big Data, Storage, Flash, SSD, HPC, High Performance Computing, NoSQL, Virtualization, convergence, software-defined, Hadoop, scale-out, Apache, analytics, scalability, Converged Infrastructure, hyper convergence, Platform as a Service, PaaS, Hypervisor, Hybrid
news

Data World Needs a Mature In-Memory Data Fabric

Much of what human beings experience as commonplace today - social networking, online gaming, mobile and wearable computing -- was impossible a decade ago. One thing is certain: we're going to see even more impressive advances in the next few years.

  • Premiered: 11/12/14
  • Author: Taneja Group
  • Published: Sys-Con Media
Topic(s): GridGain, RAM, In-Memory, Big Data, Saas, mobile computing, Apache
news

Apache Ignite v1.0 Release Candidate By GridGain

Today, GridGain announces the first code drop of Apache Ignite, Apache Ignite v1.0 RC (Release Candidate).

  • Premiered: 02/18/15
  • Author: Taneja Group
  • Published: Sys-con
Topic(s): GridGain, Apache, Apache Ignite, In-Memory, Storage, High Performance
news

New approaches to scalable storage

With all these scalable storage approaches, IT organizations must evaluate the options against their data storage and analytics needs, as well as future architectures.

  • Premiered: 03/16/15
  • Author: Mike Matchett
  • Published: TechTarget: Search Data Center
Topic(s): Mike Matchett, TechTarget, Storage, scalable, scalability, analytics, Data Storage, Big Data, Block Storage, File Storage, object storage, scale-out, scale-up, Performance, Capacity, HA, high availability, latency, IOPS, Flash, SSD, File System, Security, NetApp, Data ONTAP, ONTAP, EMC, Isilon, OneFS, Cloud
news

IT pros get a handle on machine learning and big data

Despite its benefits, machine learning can also go very wrong. Beginners need to understand their input data, project scope and purpose, and the machine learning algorithms at work.

  • Premiered: 07/15/15
  • Author: Mike Matchett
  • Published: TechTarget: Search Data Center
Topic(s): IT, Mike Matchett, Big Data, Machine Learning, predictive modeling, Optimization, scale-out, Apache, Apache Mahout, Apache Madlib, High Performance, High Performance Computing, HPC, storage architecture, Storage, In-Memory, Microsoft, Microsoft Azure, AI, Artificial Intelligence
news

Big Data Grows Up: APM Tools Emerging

The next emerging market for big data may be application performance monitoring. Concurrent has released a new tool in this space, and more tools are expected to hit the market in the next six months. The availability of such tools will drive more enterprise adoption of big data.

  • Premiered: 10/26/15
  • Author: Taneja Group
  • Published: Information Week
Topic(s): apm, Big Data, Mike Matchett, Concurrent, Hadoop, High Performance, application performance, DevOps, Apache, Apache Hive, MapReduce, Cascading, Scalding, Spark, Performance, Data Center
news

Making Sense of the Internet of Things with Converged Infrastructure

With its flexibility and scalability, converged infrastructure can be a good solution to the influx of IoT data.

  • Premiered: 03/22/16
  • Author: Taneja Group
  • Published: Windows IT Pro
Topic(s): Internet of Things, IoT, converged, Converged Infrastructure, convergence, IT infrastructure, Servers, Storage, network, flexibility, scalability, Data protection, storage architecture, Hadoop, Apache, Spark, Apache Spark, structured data, Mike Matchett
news

Galactic Exchange can get your Hadoop cluster up and running in just 5 minutes

Stealthy startup Galactic Exchange Inc. burst out of the shadows this weekend touting a new product that’s able to spin up a Hadoop or Spark cluster, ready to go, in just five minutes.

  • Premiered: 03/28/16
  • Author: Taneja Group
  • Published: Silicon Angle
Topic(s): Galactic Exchange, Hadoop, hadoop cluster, cluster, Spark, ClusterGX, strata, simplicity, Infrastructure, Big Data, Mike Matchett, Apache, Apache Mesos, Docker, hyperconverged, hyperconvergence, application performance, Backup, flexibility, Cloud, Storage, analysis, data lake, big data lake, IoT
news

Kinetica Unveils GPU-accelerated Database for Analyzing Streaming Data with Enhanced Performance

Kinetica today announced the newest release of its distributed, in-memory database accelerated by GPUs that simultaneously ingests, explores, and visualizes streaming data.

  • Premiered: 09/21/16
  • Author: Taneja Group
  • Published: Business Wire
Topic(s): high availability, Mike Matchett, Kinetica, In-Memory, Security, IoT, Internet of Things, Data Management, OLTP, CPU, GPU, NVIDIA, Data Center, scalability, Apache, Hadoop, Apache Hadoop, Apache Kafka, Apache Spark, Apache NiFi, High Performance, cluster, Big Data, scale-out
news

Hedvig storage update offers multicloud capabilities

Hedvig outlined its vision for a Universal Data Plane spanning public and private clouds, as it announced an updated version of its software-defined storage.

  • Premiered: 09/22/16
  • Author: Taneja Group
  • Published: TechTarget: Search Cloud Storage
Topic(s): Hedvig, Cloud, Jeff Kato, Public Cloud, Private Cloud, software-defined, software-defined storage, SDS, Software Defined Storage, software defined, Universal Data Plane, iSCSI, NFS, Amazon, Amazon S3, OpenStack, OpenStack Swift, Deduplication, Data Deduplication, Compression, Snapshots, Snapshot, tiering, Caching, VMWare, VMware vSphere, vSphere, Docker, Mirantis, scale-out
news

Tintri OS storage upgrade focuses on cloud, containers for DevOps

Tintri storage moves 'in lockstep' with VMware for cloud, container and DevOps support with a vRealize Orchestrator plug-in and vSphere Integrated Containers support.

  • Premiered: 11/01/16
  • Author: Taneja Group
  • Published: TechTarget: Search Storage
Topic(s): Tintri, DevOps, VMWare, Cloud, container, containers, vRealize, vSphere, VMware vSphere, Docker, IBM, cloud object storage, object storage, Storage, Public Cloud, S3, Backup, copy data management, copy data, CDM, VM, Virtual Machine, Flocker, Google Kubernetes, Docker Swarm, Apache, Apache Mesos, Snapshot, Snapshots, Mike Matchett
Profiles/Reports

Apache Spark Market Survey: Cloudera Sponsored Research

Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category. In order to better understand Spark’s growing role in big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven thousand (6900+) qualified technical and managerial people working with big data from around the world to explore their experiences with and intentions for Spark adoption and deployment, their current perceptions of the Spark marketplace and of the future of Spark itself.

We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over one-half (54%) of respondents are already actively using Spark. Spark is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing – 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon.

The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).

When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%).

Interestingly, while on-premises Spark deployments dominate today (more than 50%), there is a strong interest in transitioning many of those to cloud deployments going forward. Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36%, along with a corresponding increase in using Spark SaaS, from 3% to 9%.

The biggest challenge with Spark, similar to what has been previously noted across the broader big data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap within their organizations. Similarly, more than one-third mention complexity in learning/integrating Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous big data analytics platforms, Spark today offers a higher—and often already familiar—level of interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster operations, all of which no doubt contribute to its greatly increasing popularity and widespread adoption.

Overall, it’s clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment. The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years.

Publish date: 11/07/16
news

Apache Spark Survey Reveals Increased Growth in Users

In order to better understand Apache Spark’s growing role in big data, Taneja Group conducted a major market research project, surveying approximately 7,000 people.

  • Premiered: 11/08/16
  • Author: Taneja Group
  • Published: Satellite Press Releases
Topic(s): Apache, Apache Hadoop, Apache Spark, Hadoop, Storage, Big Data, Data Management, Cloudera, In-Memory, Mike Matchett
news

Machine learning and data science workloads ignite Apache Spark adoption

The use of Apache Spark is dramatically increasing as new workloads create more use cases.

  • Premiered: 11/08/16
  • Author: Taneja Group
  • Published: CBR Online
Topic(s): Apache, Apache Spark, Spark, Machine Learning, Big Data, Storage, Cloudera, Mike Matchett, analytics, Hadoop, Cloud, Public Cloud, Private Cloud, IBM, MapReduce
news

Four big data and AI trends to keep an eye on

AI is making a comeback - and it's going to affect your data center soon.

  • Premiered: 11/17/16
  • Author: Mike Matchett
  • Published: TechTarget: Search IT Operations
Topic(s): AI, Artificial Intelligence, Big Data, Data Center, Datacenter, Machine Learning, Apache, Apache Spark, Spark, Hadoop, MapReduce, latency, In-Memory, big data analytics, Business Intelligence, Python, Dataiku, Cask, ETL, data flow management, Virtualization, Storage, scale-up, scale-out, scalability, GPU, IBM, NVIDIA, Virtual Machine, VM