Taneja Blog

Project Myriad Will Become Your Next Data Center Platform

One of the big things bubbling around at Strata this week is talk about YARN, Mesos, and Project Myriad (initiated and sponsored by MapR). On one hand, this seems to be just an evolution of the Hadoop scheduling layer. But looking at it with a critical eye, I see the impending culmination of what I predicted years ago - that the Hadoop ecosystem will quickly evolve to bring high-powered HPC technologies right into the heart of the next-gen enterprise data center.

The whole data lake/hub/refinery marketing movement by various big data vendors certainly drives in this direction, but it is more of a re-visioning of how to tackle big data in a data center effectively and efficiently. Of course big data vendors want you to put more data on big data platforms, hence the data lake/hub concept. Supporting and powering this are new evolutions in big data security and in the integration of mixed workloads and data types (e.g. SQL and NoSQL!). I'll be writing about this trend in my "The Next Big Thing" column in next month's Modern Infrastructure.

But I propose that Project Myriad signifies a huge milestone in bringing HPC clustering value propositions right into enterprise IT. Myriad unifies and optimizes the value one can now extract from building large data center commodity clusters. By design Mesos and YARN work differently, with YARN aimed at Hadoop ecosystem workloads and Mesos built for data center apps. In particular, Mesos brings a dockerized container approach while YARN is more of an enhanced distributed job/task (i.e. process) scheduler. But through Project Myriad, Mesos can host YARN as one of its workloads, assembling the best of both worlds.

Now we finally have a data center level resource scheduling solution that promises to unify all the large clustering workloads we might have in the enterprise data center onto a single pooled commodity infrastructure cluster. Technically, Mesos becomes the master scheduler, turning a large infrastructure cluster (nodes, memory, disks, etc.) into essentially a pooled cloud that dynamically serves carved-out sub-clusters. This is different from Hadoop virtualization, in which cluster nodes run as virtual machines within a virtualized host environment: here there is no hypervisor abstraction or logical "sharing". Instead, Mesos dynamically assigns actual physical cluster resources to YARN clusters (through Project Myriad), including Hadoop MR and Spark, and to other types of data center cluster-based workloads (including long-running Java, Ruby, and web apps, databases, and possibly hypervisor clusters).
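To make the "pooled cloud that serves carved-out sub-clusters" idea concrete, here is a toy Python sketch. It is only an illustration under simplifying assumptions: the class and method names (PooledCluster, carve, release) are hypothetical and are not the Mesos or Myriad API, and it leases whole nodes, whereas a real scheduler would likely hand out finer-grained slices of CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: int
    mem_gb: int


@dataclass
class PooledCluster:
    """Toy stand-in for the master scheduler (hypothetical, not the Mesos API)."""
    free: list[Node]
    leased: dict[str, list[Node]] = field(default_factory=dict)

    def carve(self, framework: str, cpus_needed: int) -> list[Node]:
        """Lease whole physical nodes to a framework until its CPU demand is met."""
        granted, total = [], 0
        while self.free and total < cpus_needed:
            node = self.free.pop(0)
            granted.append(node)
            total += node.cpus
        if total < cpus_needed:
            # Not enough capacity: put the nodes back and grant nothing.
            self.free = granted + self.free
            return []
        self.leased.setdefault(framework, []).extend(granted)
        return granted

    def release(self, framework: str) -> None:
        """Return all of a framework's nodes to the shared pool."""
        self.free.extend(self.leased.pop(framework, []))


# Six identical commodity nodes form the shared pool (made-up sizes).
pool = PooledCluster(free=[Node(f"node{i}", cpus=8, mem_gb=64) for i in range(6)])

yarn_nodes = pool.carve("yarn", cpus_needed=24)     # sub-cluster for Hadoop MR / Spark
web_nodes = pool.carve("web-apps", cpus_needed=16)  # long-running services
pool.release("yarn")                                # YARN shrinks; nodes rejoin the pool
```

The point of the sketch is that the YARN sub-cluster is just one tenant of the pool: its nodes are real machines handed over directly, with no virtual machine layer in between, and they go back to the pool for other workloads once released.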

This is a big idea, and as we digest the full implications we'll no doubt have many more thoughts to post. In the meantime, keep an eye on Project Myriad.

  • Premiered: 02/19/15
  • Author: Mike Matchett
  • Topic(s): MapR, Big Data, YARN, Hadoop, Mesos, Scheduling, Project Myriad

