Taneja Blog

Mass Data Fragmentation: A Problem to Be Reckoned With

On the short list of IT priorities in 2019, data management is becoming a white-hot issue. In the era of big data and multicloud, organizations’ data is growing in velocity, volume and variety as never before. Whether your data is at the edge, on-premises, in the cloud, or somewhere in between, it’s one of your organization’s most valuable assets, and yet one of the most difficult to control and manage. As a result, many IT leaders today are asking the same question: how can we manage our data more effectively, so that it is not only secure and protected but also visible and accessible, and can be analyzed and used productively to help improve and grow our business?

Unfortunately, if you’re like most companies, the answer is not so simple. The reality is that companies’ data tends to be stored in many different fragments, across multiple locations and use-case silos. These can include:

  • Traditional use-case silos: Data across numerous separate file shares and across different backup methods and archives.
  • Secondary use-case silos: Where multiple copies of the same data exist for test/dev, analytics and backup scenarios.
  • Geographically dispersed silos: Data silos exist on-premises, across regional offices, and in different public clouds.

And the problem is not just that your data is siloed; large chunks are likely also dark, invisible to users and apps. As a company grows, data accumulates faster than it can be processed and analyzed, and much of that data is buried away in storage systems scattered around the organization: untagged, untapped and unrecognized.

Once data becomes siloed and in many cases also dark, it becomes a digital boat anchor, a costly but largely invisible burden that can no longer benefit your organization. This problem is now becoming known as “mass data fragmentation” (or “MDF” for short).

Though many firms don’t realize it, the problem is already huge: think of all the backup and archival copies, test/dev and analytics clones, and DR replicas of data that are routinely made in your organization. Based on Taneja Group research, companies tend to make 5 or more copies of a majority of their data, resulting in additional data stores that are 4 or 5 times the size of their primary data stores. As data copies proliferate, the problem continues to get worse. Who can possibly be aware of all this data and keep track of it, let alone manage and protect it?
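To make the arithmetic concrete, here is a minimal sketch of how copy counts translate into secondary storage. The function name and the example inputs (100 TB of primary data, 80% of it copied, 5 full-size copies each) are illustrative assumptions, not figures from the Taneja Group research.

```python
def secondary_footprint_tb(primary_tb: float, copied_fraction: float, copies: int) -> float:
    """Estimate the storage consumed by copies of primary data, in TB.

    Assumes every copy is full-size (no deduplication or compression).
    """
    if not 0.0 <= copied_fraction <= 1.0:
        raise ValueError("copied_fraction must be between 0 and 1")
    if primary_tb < 0 or copies < 0:
        raise ValueError("primary_tb and copies must be non-negative")
    return primary_tb * copied_fraction * copies


# Hypothetical inputs, chosen only to illustrate the calculation.
primary_tb = 100.0
secondary_tb = secondary_footprint_tb(primary_tb, copied_fraction=0.8, copies=5)
ratio = secondary_tb / primary_tb
print(f"Secondary data: {secondary_tb:.0f} TB ({ratio:.1f}x primary)")
```

With these assumed inputs, the copies occupy about four times the space of the primary store, which is in line with the 4 to 5 times range cited above.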

Simply google “mass data fragmentation”, and you’ll get a sense for the scope and severity of the issue.

As companies move their IT assets from on-premises to one or more public clouds, data becomes even more fragmented. Data often becomes captive in the cloud, due to proprietary data formats and the egress tax that service providers impose. In a recent Taneja Group study, we learned that cross-cloud data portability and avoidance of lock-in are the two primary motivators for adopting multicloud storage, and yet a majority of companies are not achieving those benefits today.

MDF leads to several significant challenges, including a lack of data visibility, the risk of regulatory non-compliance, and exploding storage requirements. Together, these issues increase management complexity and cost and, worse yet, reduce a company’s competitiveness and ability to serve its customers.

Companies have attempted to rein in the problem of data fragmentation using various approaches, such as deduplicating incoming data and limiting the number and frequency of copies. But at the end of the day, each of these attempts has fallen short.
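As a rough illustration of the deduplication idea, the sketch below groups byte-identical items by content hash. It is a simplified, whole-object example under stated assumptions: the function name `find_duplicate_copies` and the sample data are hypothetical, and commercial deduplication typically works on blocks or chunks rather than whole files.

```python
import hashlib
from collections import defaultdict


def find_duplicate_copies(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose contents are byte-identical, using SHA-256 digests."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in blobs.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Keep only groups with more than one member, i.e. actual duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample: the same report sitting in three different silos.
sample = {
    "fileshare/report.csv": b"q1,q2\n10,12\n",
    "backup/report.csv": b"q1,q2\n10,12\n",
    "testdev/report_copy.csv": b"q1,q2\n10,12\n",
    "archive/notes.txt": b"unrelated content",
}
duplicates = find_duplicate_copies(sample)
print(duplicates)
```

A check like this only sees the data it is pointed at, which hints at why such measures struggle once copies are spread across silos and clouds.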

How Do We Rein In MDF, Before the Problem Gets Any Bigger?

How can companies overcome the challenges of mass data fragmentation, even as their data stores continue to grow exponentially?

Attempts to address MDF using conventional methods and technologies have failed. What’s needed is a new, breakthrough approach, based on innovative technology that supports existing data management processes but is not constrained by them.

Just as virtualization helped address the problem of underutilized islands of compute, a new paradigm is required to identify and reach the vast pockets of inaccessible, untapped and often invisible data trapped in data centers and beyond, and to let organizations benefit from them.

Here are seven characteristics we believe you should look for in a solution to address MDF:

  • First and foremost, look for a solution based on an architectural platform designed to make data visible and accessible across location and silo boundaries, no matter how or where it resides. In short, you want one platform for accessing your data and related services across on-prem and cloud environments.
  • Given the sensitive data an MDF solution will touch, prioritize built-in data protection, security and compliance capabilities.
  • Find a solution that brings compute to where your data sits, both simply and cost-effectively. This will enable apps to run directly on the data, whether to secure and protect it or to analyze it and extract value. To take advantage of continuing innovation, focus on solutions that are extensible, so that third-party apps fit into the architectural framework and your developers can write their own custom apps, all within the context of the same platform, UI and operating model.
  • Treat integrated machine learning capabilities as table stakes. They let users gain operational and business insights from the vast troves of data residing in different departments and functions throughout your organization.
  • Emphasize multi-protocol support, so users and apps can access data in its native format.
  • Look for offerings with proven data resiliency and integrity.
  • Last but not least, focus on MDF solutions that enable simple, SaaS-based management.

Some exciting new solutions aimed at overcoming MDF challenges are now emerging, and they don’t require you to change your data collection, curation or management processes. Take a test drive of one or more of these solutions in your own environment to see how effectively they will work for you.

  • Premiered: 05/30/19
  • Author: Jeff Byrne
  • Topic(s): Data protection, hyperconvergence, Cloud, Mass Data Fragmentation, secondary data