Taneja Group | MapReduce
Trusted Business Advisors, Expert Technology Analysts

Items Tagged: MapReduce

Resources

Big Data Storage Options for Enterprise Hadoop

In this webcast, Sr. IT Analyst Mike Matchett from Taneja Group will briefly review the storage architecture of Hadoop and HDFS, and then examine some of the more prominent big data storage options for enterprises with data protection, integration, and governance concerns that might lead them to choose an advanced SAN/NAS solution over the default local DAS design.

  • Premiered: 12/10/13 at 10 am PT / 1 pm ET
  • Location: OnDemand
  • Speaker(s): Mike Matchett, Senior Analyst, Taneja Group
Topic(s): BrightTALK, Mike Matchett, Hadoop, Storage, Enterprise Storage, SAN, NAS, DAS, HDFS, MapReduce
news

External storage might make sense for Hadoop

Using Hadoop to drive big data analytics doesn't necessarily mean building clusters of distributed storage -- good old external storage might be a better choice.

  • Premiered: 02/28/14
  • Author: Mike Matchett
  • Published: TechTarget: Search Storage
Topic(s): Hadoop, Big Data, analytics, SAN, NAS, scale-out, HDFS, MapReduce, DAS, RAID, replication, Sentry, Accumulo, scalability
news

Hadoop Storage Options: Time to Ditch DAS?

Hadoop is immensely popular today because it makes big data analysis cheap and simple: you get a cluster of commodity servers and use their processors as compute nodes to do the number crunching, while their internal direct attached storage (DAS) operates as very low cost storage nodes.

  • Premiered: 02/19/15
  • Author: Taneja Group
  • Published: InfoStor
Topic(s): Hadoop, Storage, DAS, Direct attached storage, Compute, SATA, HDFS, Hadoop Distributed File System, data, MapReduce, YARN, Hadoop 2, data lake, data refinery, Enterprise Storage, DR, Disaster Recovery, compliance, Security, Business Continuity, Performance, FC, Fibre Channel, SAN, NAS, Virtualization, Cloud, VM, Virtual Machine, MapR
news

Big data analytics applications impact storage systems

Analytics applications for big data have placed extensive demands on storage systems, which Mike Matchett says often require new or modified storage structures.

  • Premiered: 09/03/15
  • Author: Mike Matchett
  • Published: TechTarget: Search Storage
Topic(s): Mike Matchett, Big Data, analytics, Storage, Primary Storage, scalability, Business Intelligence, BI, AWS, Amazon AWS, S3, HPC, High Performance Computing, High Performance, ETL, HP Haven, HP, Hadoop, Vertica, convergence, converged, IOPS, Capacity, latency, scale-out, software-defined, software-defined storage, SDS, YARN, Spark
news

Big Data Grows Up: APM Tools Emerging

The next emerging market for big data may be application performance monitoring. Concurrent has released a new tool in this space, and more tools are expected to hit the market in the next six months. The availability of such tools will drive more enterprise adoption of big data.

  • Premiered: 10/26/15
  • Author: Taneja Group
  • Published: Information Week
Topic(s): apm, Big Data, Mike Matchett, Concurrent, Hadoop, High Performance, application performance, DevOps, Apache, Apache Hive, MapReduce, Cascading, Scalding, Spark, Performance, Data Center
Profiles/Reports

Now Big Data Works for Every Enterprise: Pepperdata Adds Missing Performance QoS to Hadoop

While a few well-publicized web 2.0 companies are taking great advantage of foundational big data solutions that they themselves created (e.g. Hadoop), most traditional enterprise IT shops are still thinking about how to practically deploy their first business-impacting big data applications – or have dived in and are now struggling mightily to effectively manage a large Hadoop cluster in the middle of their production data center. This has led to the common perception that realistic big data business value may yet be just out of reach for most organizations – especially those that need to run lean and mean on both staffing and resources.

This new big data ecosystem consists of scale-out platforms, cutting-edge open source solutions, and massive storage that is inherently difficult for traditional IT shops to optimally manage in production – especially with still-evolving ecosystem management capabilities. In addition, most organizations need to run large clusters supporting multiple users and applications to control both capital and operational costs. Yet there are no native ways to guarantee, control, or even gain visibility into workload-level performance within Hadoop. Even if most organizations did not face a real gap in high-end skills and deep expertise, there still would not be any practical way for additional experts to tweak and tune mixed Hadoop workload environments to meet production performance SLAs.

At the same time, the competitive game of mining value from big data has moved from day-long batch ELT/ETL jobs feeding downstream BI systems to more user-interactive queries and business process “real time” applications. Live performance matters as much now in big data as it does in any other data center solution. Ensuring multi-tenant workload performance within Hadoop is why Pepperdata, a cluster performance optimization solution, is critical to the success of enterprise big data initiatives.

In this report we’ll look deeper into today’s Hadoop deployment challenges and learn how performance optimization capabilities are not only necessary for big data success in enterprise production environments, but can open up new opportunities to mine additional business value. We’ll look at Pepperdata’s unique performance solution that enables successful Hadoop adoption for the common enterprise. We’ll also examine how it inherently provides deep visibility and reporting into who is doing what/when for troubleshooting, chargeback and other management needs. Because Pepperdata’s function is essential and unique, not to mention its compelling net value, it should be a checklist item in any data center Hadoop implementation.

Publish date: 12/17/15
news

Concurrent app management tools work on Hadoop and Spark

If Hadoop and Spark are to sneak into the enterprise, they will need to be manageable. With Driven, Concurrent Inc. takes a stab at the problem.

  • Premiered: 12/09/15
  • Author: Taneja Group
  • Published: TechTarget: Search Data Management
Topic(s): Hadoop, Spark, Driven, Concurrent, manageability, Big Data, Performance, Performance Management, Mike Matchett, Hive, MapReduce, SLA, service level agreement, software, high-fidelity, HiFi, cluster, Pepperdata, Oracle, IBM, CA
news

Mobile gaming company plays new Hadoop cluster management card

Chartboost, which operates a platform for mobile games, turned to new cluster management software in an effort to overcome problems in controlling the use of its Hadoop processing resources.

  • Premiered: 01/05/16
  • Author: Taneja Group
  • Published: TechTarget: Search Data Management
Topic(s): Chartboost, mobile, cluster, Cluster Management, Hadoop, processing, data processing, analytics, Big Data, MapReduce, Hive, Spark, Optimization, Cloudera, AWS, Amazon, Cloud, YARN, Pepperdata, Memory, CPU, Application, Concurrent, SLA, service-level agreement, HBase, application performance, application performance management, Mike Matchett
Profiles/Reports

Cohesity Data Platform: Hyperconverged Secondary Storage

Primary storage is often defined as storage hosting mission-critical applications with tight SLAs, requiring high performance. Secondary storage is where everything else typically ends up and, unfortunately, data stored there tends to accumulate without much oversight. Most of the improvements within the overall storage space, most recently driven by the move to hyperconverged infrastructure, have flowed into primary storage. By shifting the focus from individual hardware components to commoditized, clustered and virtualized storage, hyperconvergence has provided a highly available virtual platform on which to run applications. This has allowed IT to shift its focus from managing individual hardware components to running business applications, increasing productivity and reducing costs.

Companies adopting this new class of products certainly enjoyed the benefits, but were still nagged by a set of problems that it didn’t fully address. On the secondary storage side of things, they were still left dealing with too many separate use cases, each with its own point solution. This led to too many products to manage, too much duplication and too much waste. In truth, many hyperconvergence vendors have done a reasonable job of addressing primary storage use cases on their platforms, but there’s still more to be done there and more secondary storage use cases to address.

Now, however, a new category of storage has emerged. Hyperconverged Secondary Storage brings the same sort of distributed, scale-out file system to secondary storage that hyperconvergence brought to primary storage. But, given the disparate use cases that are embedded in secondary storage and the massive amount of data that resides there, it is an equally big problem to solve, and the solution has to go further than just abstracting and scaling the underlying physical storage devices. True Hyperconverged Secondary Storage also integrates the key secondary storage workflows - Data Protection, DR, Analytics and Test/Dev - as well as providing global deduplication for overall file storage efficiency, file indexing and searching services for more efficient storage management, and hooks into the cloud for efficient archiving.

Cohesity has taken this challenge head-on.

Before delving into the Cohesity Data Platform, the subject of this profile and one of the pioneering offerings in this new category, we’ll take a quick look at the state of secondary storage today and note how current products haven’t completely addressed these existing secondary storage problems, creating an opening for new competitors to step in.

Publish date: 03/30/16
news

Spark speeds up adoption of big data clusters and clouds

Infrastructure that supports big data comes from both the cloud and clusters. Enterprises can mix and match these seven infrastructure choices to meet their needs.

  • Premiered: 07/19/16
  • Author: Mike Matchett
  • Published: TechTarget: Search IT Operations
Topic(s): Apache Spark, Spark, Mike Matchett, Cloud, cloud cluster, cluster, Big Data, big data analytics, MapReduce, Business Intelligence, BI, MLlib, High Performance, hadoop cluster, HDFS, Hadoop Distributed File System, IBM, Hortonworks, Cloudera, capacity management, Performance Management, API, SAN, storage area networks, CAPEX, DataDirect Networks, HPC, Lustre, Virtualization, VM
news

Big Data Storage Solutions: Options Abound

Hadoop, Spark and other big data analysis tools all have one thing in common: they need some form of big data storage to hold the vast quantities of data that they crunch through. The good news is that big data storage options are proliferating.

  • Premiered: 08/09/16
  • Author: Taneja Group
  • Published: InfoStor
Topic(s): Hadoop, Spark, Big Data, big data storage, DAS, Compute, cluster, flexibility, Mike Matchett, Hadoop Distributed File System, HDFS, NFS, MapReduce, API, SAN, NAS, TCO, DDN, EMC, EMC Isilon, Isilon, SDS, software-defined, software-defined storage, ViPR, DriveScale, hScaler, Cisco, HPE, IBM
news

Machine learning and data science workloads ignite Apache Spark adoption

The use of Apache Spark is dramatically increasing as new workloads create more use cases.

  • Premiered: 11/08/16
  • Author: Taneja Group
  • Published: CBR Online
Topic(s): Apache, Apache Spark, Spark, Machine Learning, Big Data, Storage, Cloudera, Mike Matchett, analytics, Hadoop, Cloud, Public Cloud, Private Cloud, IBM, MapReduce
news

Four big data and AI trends to keep an eye on

AI is making a comeback - and it's going to affect your data center soon.

  • Premiered: 11/17/16
  • Author: Mike Matchett
  • Published: TechTarget: Search IT Operations
Topic(s): AI, Artificial Intelligence, Big Data, Data Center, Datacenter, Machine Learning, Apache, Apache Spark, Spark, Hadoop, MapReduce, latency, In-Memory, big data analytics, Business Intelligence, Python, Dataiku, Cask, ETL, data flow management, Virtualization, Storage, scale-up, scale-out, scalability, GPU, IBM, NVIDIA, Virtual Machine, VM
news

Open source strategies bring benefits, but don't rush in

Before your organization can reap the benefits of open source, it's important to understand your options and map out a plan that will guarantee success.

  • Premiered: 09/21/17
  • Author: Mike Matchett
  • Published: TechTarget: Search Data Center
Topic(s): Mike Matchett, Storage, Open Source, Big Data, big data analytics, analytics, ROI, DataWorks, Dataworks Summit, Hortonworks, Apache, Apache Hadoop, Hadoop, Security, Cloudera, MapR, HDFS, Hadoop Distributed File System, MapReduce, SQL, Microsoft, IBM, big data storage, converged, convergence