
Taneja Blog


Unifying Big Data Through Virtualized Data Services: Rewriting the Storage Stack

One of the more interesting new companies to arrive on the big data storage scene has designed a whole new, purpose-built storage stack that can store and serve the same master data in multiple formats, at high performance and at parallel streaming speeds, to many different kinds of big data applications. This promises to obliterate the spaghetti data flows with many moving parts, the numerous transformation and copy steps, and the Frankenstein architectures currently required to stitch together increasingly complex big data workflows. We've seen enterprises need to build environments that commonly span streaming ingest and real-time processing through interactive query and into larger data lake and historical-archive analysis, and end up making multiple data copies in multiple storage formats across multiple storage services.

This new unified, virtualized storage service has three fundamental layers. The top, virtualizing layer consists of containerized, stateless microservice API translators that present file, object, stream, and various NoSQL interfaces to storage clients. This is a great application of microservice architecture: scalable, fungible, and non-blocking.
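To make the idea concrete, here is a minimal sketch of what stateless protocol translators over one shared internal data-object service could look like. All class and method names are illustrative assumptions, not the vendor's actual API.

```python
class DataObjectService:
    """Common internal data-object interface the translators delegate to.
    (An in-memory dict stands in for the real core engine.)"""
    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


class FileTranslator:
    """Presents a file-style interface; holds no state of its own,
    so instances can be scaled out or replaced freely."""
    def __init__(self, core):
        self.core = core

    def write(self, path, data):
        self.core.put(("file", path), data)

    def read(self, path):
        return self.core.get(("file", path))


class KVTranslator:
    """Presents a NoSQL key/value interface over the same core."""
    def __init__(self, core):
        self.core = core

    def set(self, table, key, value):
        self.core.put(("kv", table, key), value)

    def get(self, table, key):
        return self.core.get(("kv", table, key))


# Two different client-facing protocols, one master copy of the data.
core = DataObjectService()
fs, kv = FileTranslator(core), KVTranslator(core)
fs.write("/logs/a.txt", b"hello")
kv.set("events", "e1", {"type": "click"})
```

Because the translators hold no state, any number of them can front the same core without coordination, which is what makes the layer scalable and fungible.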

The second layer in the stack is the "data container" based core data service engine. Here all data is stream-processed - indexed, compressed, buffered in memory (and/or NVMe) for protection and speed, and managed/tiered down to the third, "media" persistence layer. Security, QoS, and other storage functions are inserted into the data services pipeline at this core level as well. The resulting architecture looks like a pipeline for assembling (or serving) data objects to and from media, from and to the API users.
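The pipeline notion above can be sketched as a chain of pluggable stages that each write passes through on its way to the persistence tier. The stage names and record shape here are assumptions for illustration only.

```python
import zlib

class Stage:
    """Base class for one step in the data services pipeline."""
    def process(self, record):
        raise NotImplementedError

class IndexStage(Stage):
    """Records metadata so data can later be found by key."""
    def __init__(self):
        self.index = {}
    def process(self, record):
        self.index[record["key"]] = record.get("meta", {})
        return record

class CompressStage(Stage):
    """Compresses the payload before it hits the media layer."""
    def process(self, record):
        record["value"] = zlib.compress(record["value"])
        return record

class PersistStage(Stage):
    """Lands the record on a store standing in for the media tier."""
    def __init__(self, store):
        self.store = store
    def process(self, record):
        self.store[record["key"]] = record["value"]
        return record

class Pipeline:
    """Pushes each write through every stage in order; extra functions
    (security, QoS) would be inserted here as additional stages."""
    def __init__(self, stages):
        self.stages = stages
    def write(self, record):
        for stage in self.stages:
            record = stage.process(record)
        return record

tier = {}  # stands in for the media persistence layer
pipe = Pipeline([IndexStage(), CompressStage(), PersistStage(tier)])
pipe.write({"key": "obj1", "value": b"sensor reading", "meta": {"src": "iot"}})
```

The design point is that cross-cutting functions plug in as stages rather than being bolted onto each client protocol separately.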

The media layer can be assembled from file, block, object, and even cloud storage. Natively, the stack would find high affinity with high-performance object-oriented storage, as its data service engine maps all other kinds of end-user-visible data formats into internal "data objects". Future hard drives with native low-level key/value object services are going to be very interesting here, but we expect any storage that can be leveraged today could be turned into a big data storage server.
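Since everything below the core engine is just internal data objects, the media layer reduces to pluggable backends behind one key/value object contract, with a policy deciding where each object lands. This is a hypothetical sketch of that idea; the backend classes are illustrative stand-ins, not real storage SDKs.

```python
class MediaBackend:
    """Minimal object contract any persistence target must satisfy."""
    def put_object(self, key, data):
        raise NotImplementedError
    def get_object(self, key):
        raise NotImplementedError

class InMemoryBackend(MediaBackend):
    """Stand-in for a real target (filesystem, S3-style object store,
    or a future key/value drive)."""
    def __init__(self):
        self._data = {}
    def put_object(self, key, data):
        self._data[key] = data
    def get_object(self, key):
        return self._data[key]

class TieringMediaLayer:
    """Routes each object to a backend chosen by a simple size policy;
    a real engine would tier on access heat, age, or QoS instead."""
    def __init__(self, hot, cold, hot_max_bytes=1024):
        self.hot, self.cold = hot, cold
        self.hot_max = hot_max_bytes
    def put_object(self, key, data):
        target = self.hot if len(data) <= self.hot_max else self.cold
        target.put_object(key, data)
        return target
```

Swapping in a cloud bucket or a key/value drive then means writing one small backend class, not reworking the formats the clients see.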

What's not obvious from a few sentences here is just how much this can simplify a multi-stage big data workflow, and the kinds of TCO savings (capex and opex) it can deliver - not to mention the acceleration in time to value, increased agility, and newly unlocked big data opportunities. With IoT coming on strong, this may unlock the enterprise answer to big data storage.

  • Premiered: 06/14/16
  • Author: Mike Matchett



