
Taneja Blog

Kudu Might Be Invasive: Cloudera Breaks Out Of HDFS

For the IT crowd just now getting used to the idea of big data's HDFS (Hadoop Distributed File System) and its peculiarities, there is an alternative open source big data file system coming from Cloudera called Kudu. Like HDFS, Kudu is designed to be hosted across a scale-out cluster of commodity systems, but it is specifically intended to support lower-latency analytics.

At its heart, Kudu sits between the capabilities of HDFS and HBase to meet the growing use of interactive drill-down analytics (e.g. Impala) and the faster time-to-response Spark platform. It combines on-disk column store technology (for low-latency queries) with an in-memory write layer in front (for low-latency updates and inserts), all fully distributed across the cluster.
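
To make that "column store fronted by an in-memory write layer" idea a bit more concrete, here is a minimal single-process Python sketch. It is not Kudu's code or API: the class `ToyColumnTable`, its methods, and the `flush_threshold` parameter are all hypothetical names I made up for illustration, and plain Python lists stand in for on-disk column files. It only shows the general pattern of buffering row writes in memory and flushing them into per-column storage that scans can read one column at a time.

```python
from collections import defaultdict


class ToyColumnTable:
    """Toy table: an in-memory row buffer in front of per-column storage.

    Illustrative only. Lists stand in for on-disk column files, and
    nothing here reflects how Kudu is actually implemented.
    """

    def __init__(self, columns, flush_threshold=3):
        self.columns = list(columns)
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.write_buffer = []                  # recent rows, row-oriented
        self.column_store = defaultdict(list)   # one list per column

    def insert(self, row):
        """Cheap write path: append to the buffer, flush once it fills up."""
        self.write_buffer.append(row)
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move buffered rows into the columnar layout."""
        for row in self.write_buffer:
            for name in self.columns:
                self.column_store[name].append(row[name])
        self.write_buffer.clear()

    def scan(self, column):
        """Analytic read path: one column, plus any rows not yet flushed."""
        flushed = self.column_store[column]
        pending = [row[column] for row in self.write_buffer]
        return flushed + pending


table = ToyColumnTable(["host", "latency_ms"])
for host, latency in [("a", 12), ("b", 7), ("a", 30), ("c", 5)]:
    table.insert({"host": host, "latency_ms": latency})

# Summing one column never touches the "host" values that were flushed.
total_latency = sum(table.scan("latency_ms"))
```

In this toy, the first three rows end up in the column lists and the fourth is still sitting in the write buffer, yet a scan sees all four. A real distributed system also has to handle durability, updates, and partitioning across nodes, none of which this sketch attempts.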

Kudu will probably take some time to mature, but the need for it was heralded by the rapid adoption of Spark (often at the expense of investment in core Hadoop MapReduce). I suspect machine learning use cases are at the heart of that trend - who doesn't want to be more intelligent, even if only artificially?

Kudu is not quite intended for OLTP, as it's not going to be optimized (at least not anytime soon) for single-row insert or multi-row commit speed. In that respect, MapR's proprietary POSIX-compliant read/write storage layer may still hold an advantage for combining and supporting a wider variety of workloads. However, you can bet the Teradatas and MapRs of the world are looking closely at this open source threat eroding their proprietary value propositions.

If I were a betting man, now that Cloudera has opened this door to one-upping even the core HDFS, I'd be looking for the "next" open source file system for big data. I'd expect this to be some containerized (and thereby software-defined) drop-in that provides the OLTP, NFS, and other enterprise features needed to convert those big data lakes into something more than a quiet lake-like repository. Think of it as converting the deep and mostly still Loch Ness into an Escher-like endless Niagara Falls.

  • Premiered: 01/11/16
  • Author: Mike Matchett
  • Topic(s): Big Data, Cloudera, Storage, Kudu, MapR, Teradata

