Items Tagged: Hadoop
According to Amazon, size doesn't really matter in their definition of Big Data. Instead, it's about the threshold where distributed processing solutions like Elastic MapReduce start to be cost-effective to develop and operate.
Cleversafe is announcing a platform integration with Hadoop as part of their upcoming Cleversafe 3.0 release scheduled for later this year.
Symantec Corp. today announced an Apache Hadoop add-on capability for its Veritas Cluster File System to help run "big data" analytics on storage area networks instead of scale-out, commodity servers using local storage.
Daegis Acumen, part of the Daegis eDiscovery Platform, offers a hosted solution that handles big data with Hadoop-based clustering and Probabilistic Latent Semantic Indexing.
If you are in IT, 2013 is going to be the year that you will want to dive into the "big data" pool if you haven't been pushed in already. But don't worry - it's no longer sink or swim. For one, we'll be here to help coach IT folks through it all. And while the concepts, terminology and hype have been all over the place, once you start floating around you'll find that under the surface much of what fills the big data pool is familiar IT infrastructure, data management, and services re-cast around a few easy-to-grasp innovations. For example, if you are in IT and asked to pick a Hadoop distro to stand up, you'd probably start with evaluating the three main distributions of Hadoop (other than getting it straight off Apache) followed by other downstream OEM'd and pre-integrated versions. The main distros are from Cloudera, Hortonworks, and MapR. I didn't really appreciate the differences until talking with all three individually (at 2012 NY Strata, see below).
Here we are in Santa Clara eagerly awaiting Strata.... Hadoop's R&D infant years are passing, and now it is of the age where vendors are truly adding value for the enterprise IT shop. Clearly the theme is to wrap up low-level complexities into higher-value solutions. One standout announcement is DDN's hScaler appliance - a monster of a Hadoop machine.
This morning we were able to attend EMC Greenplum's launch of their new Hadoop distro called Pivotal HD. Core to this distro is HAWQ, their new massively parallel processing analytical database built with Hadoop at its heart... consider that horizontal multi-PB scale-out, business-class interactive performance, and high-end, easily leveraged analytics are now available in one package from a trusted enterprise vendor. This is fully SQL-compliant analytical database stuff...
IT pros called in on big data projects are finding that the typical approach doesn’t play nice on enterprise-grade virtualized infrastructure.
There are a lot of HPC technologies coming soon to a data center near you! The latest offering from ScaleOut Software, known for their in-memory data grid solutions, is a customized in-memory data grid for Hadoop. This enables blisteringly fast, big-data-style real-time analysis of dynamically changing data. Solutions that use this are processing live operational data into actionable intelligence - financials, reservation systems, live customer experience,...
Extreme Applications in the Enterprise Drive Parallel File System Adoption
With the advent of big data and cloud-scale delivery, companies are racing to deploy cutting-edge services that include “extreme” applications like massive voice and image processing or complex financial analysis modeling that can push storage systems to their limits. Examples of some high-visibility and big market impacting solutions include applications based on image pattern recognition at large scale and financial risk management based on decision-making at high speed.
These ground-breaking solutions, made up of very different activities but with similar data storage challenges, create incredible new lines of business representing significant revenue potential. Every day here at Taneja Group we see more and more mainstream enterprises exploring similar “extreme service” opportunities. But when enterprise IT data centers take stock of what is required to host and deliver these new services, it quickly becomes apparent that traditional clustered and even scale-out file systems - of the kind that most enterprise data centers (or cloud providers) have racks and racks of - simply can’t handle the performance requirements.
There are already great enterprise storage solutions for applications that need either raw throughput, high capacity, parallel access, low latency, or high availability – maybe even for two or three of those at a time. But when an “extreme” application needs all of those requirements at the same time, only supercomputing type storage in the form of parallel file systems provides a functional solution. The problem is that most commercial enterprises simply can’t afford or risk basing a line of business on an expensive research project.
The good news is that some storage vendors have been industrializing former supercomputing storage technologies, hardening massively parallel file systems into commercially viable solutions. This opens the door for revolutionary services creation, enabling mainstream enterprise datacenters to support the exploitation of new extreme applications.
VMware today announced advancements that will allow vSphere to manage Hadoop clusters.
This 30 minute webcast will address the following:
-Why virtualize Hadoop? What are the benefits to IT and the user?
-Who are the players? VMware, Project Savanna (RedHat), Amazon EMR
-How does virtualizing Hadoop work technically with its scale-out computing and distributed storage models?
-What's the impact on performance?
-How virtualized Hadoop becomes a foundation of the datacenter as a unified platform for all kinds of workloads.
IT can now offer Big-Data-as-a-service.
About the speaker: Mike Matchett brings over 20 years experience in managing and marketing IT datacenter solutions particularly at the nexus of performance, capacity and virtualization. Currently he is focused on IT optimization for virtualization and convergence across servers, storage and networks, especially to handle the requirements of mission-critical applications, Big Data analysis, and the next generation data center. Mike has a deep understanding of systems management, IT operations, and solutions marketing to help drive architecture, messaging, and positioning initiatives. For more info visit: http://tanejagroup.com/about/who-we-are
- Premiered: 07/30/13 at 1:30pm ET (10:30am PT)
- Location: Live and OnDemand
- Speaker(s): Mike Matchett
- Sponsor(s): BrightTALK, Taneja Group
Hadoop is soon coming to enterprise IT in a big way. VMware’s new vSphere Big Data Extensions (BDE) commercializes its open source Project Serengeti to make it dead easy for enterprise admins to spin up and down virtual Hadoop clusters at will.
- Premiered: 07/17/13
- Author: Mike Matchett
- Published: Enterprise Storage Forum
With 358 sessions, time is money. Here are five sessions where your time will be well spent.
Big data technology is a big deal for storage shops, and a clear understanding of what it means -- and doesn't mean -- is required to successfully configure storage for big data apps.
Market Landscape Abstract: Enterprise Hadoop Infrastructure for Big Data IT
Hadoop is coming to enterprise IT in a big way. The competitive advantage that can be gained from analyzing big data is just too “big” to ignore. And the amount of data available to crunch is only growing bigger, whether from new sensors, capture of people, systems and process “data exhaust”, or just longer retention of available raw or low-level details. It’s clear that enterprise IT practitioners everywhere are soon going to have to operate scale-out computing platforms in the production data center, and as the first, most mature solution on the scene, Hadoop is the likely target. The good news is that there is now a plethora of Hadoop infrastructure options to choose from to fit almost every practical big data need – the challenge now for IT is to implement the best solutions for their business clients' needs.
Apache Hadoop as originally designed had a relatively narrow application: certain kinds of batch-mode parallel algorithms applied over unstructured (or semi-structured, depending on your definition) data. But because of its widely available open source nature, commodity architecture approach, and ability to extract new kinds of value out of previously discarded or ignored data sets, the Hadoop ecosystem is rapidly evolving and expanding. With recent new capabilities like YARN, which opens up the main execution platform to applications beyond batch MapReduce, the integration of structured data analysis, real-time streaming and query support, and the roll-out of virtualized enterprise hosting options, Hadoop is quickly becoming a mainstream data processing platform.
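To ground the "batch-mode parallel algorithms" Hadoop was originally built around, here is a minimal, purely illustrative sketch of the classic MapReduce word-count pattern as local Python functions. On a real cluster the mapper and reducer would run as separate processes (e.g. via Hadoop Streaming, exchanging tab-separated key/value pairs over stdin/stdout); the function names and local driver below are assumptions for illustration only.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input split."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum counts per word. On a real cluster the shuffle
    phase guarantees pairs arrive grouped by key; sorting simulates that."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

# Local simulation of the map -> shuffle -> reduce pipeline.
docs = ["big data big insight", "data is the new oil"]
counts = dict(reducer(mapper(docs)))
print(counts["big"], counts["data"])  # prints: 2 2
```

YARN's significance is that this map/shuffle/reduce pattern is no longer the only kind of job the cluster can schedule.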
There has been much talk that deriving top value from big data efforts requires rare and potentially expensive data scientist types to drive them. On the other hand, an abundance of higher-level analytical tools and pre-packaged applications is emerging to support the existing business analyst and user with familiar tools and interfaces. While completely new companies have been founded on the exciting information and operational intelligence gained from exploiting big data, we expect wider adoption by existing organizations based on augmenting traditional lines of business with new insight and revenue-enhancing opportunities. In addition, a Hadoop infrastructure serves as a great data capture and ETL base for extracting more structured data to feed downstream workflows, including traditional BI/DW solutions. No matter how you slice it, big data is becoming a common enterprise workload, and enterprise IT infrastructure folks will need to deploy, manage, and provide Hadoop services to their businesses.
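As a concrete sketch of that "capture and ETL base" role, the snippet below turns semi-structured log lines into structured CSV rows of the kind a downstream BI/DW loader could ingest. The log format, field names, and records are hypothetical, invented for illustration; in practice this extraction step would run as a map-side job over data landed in the cluster.

```python
import csv
import io

# Hypothetical raw "data exhaust" captured into the cluster.
RAW_LOGS = [
    "2013-10-16T10:30:00 user=alice action=login status=ok",
    "2013-10-16T10:31:05 user=bob action=purchase status=ok amount=19.99",
]

def extract(line):
    """Parse one semi-structured log line into a structured record."""
    ts, *fields = line.split()
    rec = {"timestamp": ts}
    for field in fields:
        key, _, value = field.partition("=")
        rec[key] = value
    return rec

# Emit CSV rows a traditional data warehouse loader could consume;
# missing fields (e.g. 'amount' on a login) are left blank.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["timestamp", "user", "action", "status", "amount"]
)
writer.writeheader()
for line in RAW_LOGS:
    writer.writerow(extract(line))
print(out.getvalue().strip())
```

The same pattern scales out trivially because each line is parsed independently, which is exactly what makes Hadoop a natural ETL front end for structured downstream systems.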
In the run-up to Strata/Hadoop NY at the end of October, we are hearing a lot about some exciting new ways to implement big data solutions. One of the most interesting is the recent release of ScaleOut's hServer V2, which evolves their high performance "in-memory data grid" (IMDG) to further support Hadoop workloads - gaining a reported 20x speedup on MapReduce jobs.
We've published a new market landscape on Enterprise Hadoop Infrastructure aimed at helping IT folks survey, evaluate and choose the right Hadoop distribution and supporting server and storage infrastructure...One of the big takeaways from this analysis is that Hadoop is coming in a big way to enterprise IT organizations, whether they are familiar with big data architectures or not... we aimed to address the first two big questions about supporting big data in IT: 1. Which Hadoop distribution makes the most sense? 2. What is the right infrastructure/deployment model given Hadoop is available in physical, cloud, and virtual forms, with appliance, converged, and external storage options?
Hadoop is coming to enterprise IT in a big way. The competitive advantage that can be gained from analyzing big data is just too 'big' to ignore. And the amount of data available to crunch is only growing bigger, whether from new sensors, capture of people, systems and process 'data exhaust', or just longer retention of available raw or low-level details.
- Premiered: 10/16/13
- Author: Taneja Group
- Published: Storage Newsletter
In this webcast, Sr. IT Analyst Mike Matchett from Taneja Group will briefly review the storage architecture of Hadoop and HDFS, and then examine some of the more prominent big data storage options for enterprises with data protection, integration, and governance concerns that might lead them to choose an advanced SAN/NAS solution over the default local DAS design.
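To make the capacity side of that SAN/NAS-versus-DAS decision concrete, here is a back-of-the-envelope sizing sketch. HDFS triple-replicates blocks by default; the 25% headroom for intermediate/shuffle data used below is an illustrative assumption, not a sizing recommendation.

```python
def raw_capacity_tb(usable_tb, replication=3, temp_overhead=0.25):
    """Raw local-disk capacity needed for a given usable dataset size,
    given HDFS's default 3x block replication plus assumed headroom
    for intermediate/shuffle data."""
    return usable_tb * replication * (1 + temp_overhead)

# 100 TB of data on default-configured DAS needs roughly:
print(raw_capacity_tb(100))  # prints: 375.0
```

Numbers like these are one reason enterprises with existing SAN/NAS investments weigh advanced external-storage options against the default local DAS design.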
- Premiered: 12/10/13 at 10 am PT/ 1 pm ET
- Location: OnDemand
- Speaker(s): Mike Matchett, Senior Analyst, Taneja Group