Application Performance Management (APM) For Big Data

Concurrent, the folks behind Cascading, have today announced the beta of "Driven" - an Application Performance Management (APM) solution for Hadoop. APM has been sorely missing from the Hadoop ecosystem, at least at a level where developers, IT ops, and even end users can quickly get to the bottom of an issue.

First, if you don't know about Cascading - one of the big impediments to using Hadoop on a wider scale is the need for programmers to write MapReduce algorithms at a low, detailed, and often mind-bending level. Cascading is a popular open source package that provides a higher level of abstraction on top of Hadoop. Instead of working with mappers and reducers, you work with more commonly understood higher-level objects like sources and sinks, functions, filters and joins. There are other options of course, like Pig, but Cascading is designed for high reliability in production at scale.
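To make the "sources and sinks, functions, filters" idea concrete, here is a minimal sketch in plain Python. It is not Cascading's API (Cascading is a Java library), and every name in it is a hypothetical stand-in; it only illustrates how a job reads when it is expressed as a pipeline of higher-level steps instead of hand-written mappers and reducers.

```python
from collections import Counter

# All names below are hypothetical illustrations, not Cascading classes.

def source(lines):
    """Source: yields raw records (here, lines of text)."""
    yield from lines

def split_words(records):
    """Function: turns each line into zero or more lowercase words."""
    for line in records:
        yield from line.lower().split()

def keep(predicate, records):
    """Filter: passes through only the records that satisfy the predicate."""
    return (r for r in records if predicate(r))

def count_by_key(records):
    """Grouping plus aggregation: counts occurrences of each word."""
    return dict(Counter(records))

def sink(result, out):
    """Sink: writes results somewhere (here, an in-memory list of sorted pairs)."""
    out.extend(sorted(result.items()))

def run_pipeline(lines):
    """Wires source -> function -> filter -> group/count -> sink."""
    out = []
    words = split_words(source(lines))
    long_words = keep(lambda w: len(w) > 3, words)  # drop words of 3 letters or fewer
    sink(count_by_key(long_words), out)
    return out
```

Calling `run_pipeline(["Hadoop at scale", "scale with Hadoop"])` counts the words longer than three letters. The point is that each step has a name a non-specialist can follow, which is roughly the kind of abstraction described above.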

Yet when anything goes wrong in Hadoop at scale, it can be hard to figure out (to say the least!), especially in production, where service level agreements start the clock ticking on resolution. Downtime or degradation for big data "batch" apps, and even more so for the newer big data "streaming" apps, can cost big bucks. So Concurrent saw an opportunity to further leverage their app platform with a plug-in that feeds detailed instrumentation into a management service.

Driven starts out as a free-for-development cloud service that deploys quickly into any Cascading implementation and helps track, and gain immediate insight into, "enterprise-grade" apps. Commercial production and on-prem versions should be available once it reaches GA.

As a service, Driven monitors all the running apps and processes, and tracks successes and failures with all the expected alerts and notifications. The visualizations are interesting maps that give a detailed "high fidelity" view into the app components, highlighted of course where failures occur, with direct drill-down into exceptions and stack traces. This is just a first cut, though; lots of advanced analysis could easily be layered on later.

According to Concurrent, the community collaboration will be a key value for folks too, with Driven being integrated into the Cascading community web site.

While "free-for-development" Driven seems a no-brainer for Cascading developers, and should help drive Cascading adoption even faster, we expect the big impact of Driven's APM to be with the IT and Dev operations folks who have to manage large data processing solutions in production. APM is sorely needed for big data in enterprise production, and we think Driven has a good chance of expanding over time to give Concurrent a real foothold in the enterprise management space. And on the back end, Concurrent will gain instant, broad, and deep visibility into how the platform is doing in the field, which should help improve Cascading.


By the way, I don't much enjoy programming in Java, but I love languages like Ruby (I suspect one could make a Myers-Briggs-like test that sorts the world into those who tend towards verbosely detailed, strictly typed languages and those who like meta-programming elegance). If you prefer programming in Ruby to tackle big data, there is a cascading.jruby project you should check out (and similar extensions for Scala, Clojure, Python...).

  • Premiered: 02/04/14
  • Author: Mike Matchett
Topic(s): Cascading Big Data Hadoop apm Performance Concurrent

