Is Hadoop the New Data Center Platform for All Data?

This morning we attended EMC Greenplum's launch of their new Hadoop distro, Pivotal HD. At the core of this distro is HAWQ, their new massively parallel processing (MPP) analytical database built with Hadoop at its heart. I'm not sure I can cover all the implications of this evolution in one short post, but consider this: horizontal multi-petabyte scale-out, business-class interactive performance, and high-end, easily leveraged analytics are now available in one package from a trusted enterprise vendor.

This is fully SQL-compliant analytical database stuff, integrated with and powered by a tailored Hadoop implementation. Not only does this let business users blast away with their favorite existing tools and expertise, but the performance they get from this distro, as demonstrated today, is blisteringly fast over billions of records. Part of the secret sauce here is something they call Dynamic Pipelining (tm), which we will no doubt dive into as more details become available. It appears to be a data-pipelining approach to running parallel SQL queries over a more fabric-like interconnect layer. And when I say parallel queries, I should clarify that these are not limited to embarrassingly parallel (i.e., simple) applications or queries. HAWQ was shown handling large multi-way joins, windowing functions, and other analytical SQL queries with stellar performance.

This does raise some philosophical questions about structured versus unstructured or semi-structured data, and about the applications and infrastructure one can leverage to extract value from it all. This distro can handle both, and we expect to see some new application frontiers open up. This kind of solution could be game-changing as the mainstream IT enterprise comes to see Hadoop and Hadoop-like solutions as the way all data could be processed in tomorrow's data center. Adoption of that broader vision may take a while, but we think HAWQ itself will take off immediately.

  • Premiered: 02/25/13
  • Author: Mike Matchett
  • Topic(s): EMC Pivotal HD Hadoop


