
Eco System

The Big Data ecosystem is evolving at a very rapid pace, and it is difficult to keep track of the changes. It offers many choices (open source vs. proprietary, free vs. commercial, batch vs. streaming). For a newbie, getting familiar with a framework takes a good amount of time and effort, and it is also perplexing to decide where to start.

Hadoop has received a lot of attention, and many people start with it, but Hadoop is not the solution for everything. Take graph processing: Hama and Giraph (though still in incubation) are better suited to it than Hadoop. This page attempts to give an idea of the ecosystem around Big Data.

Here are some useful articles and blogs for getting started with the Hadoop ecosystem.
 
Sqoop

HBase

Giraph

Oozie

Flume

Pig

Introduction to HDFS Erasure Coding in Apache Hadoop

Thanks to blog contributors from Cloudera. Erasure coding, a new feature in HDFS, can reduce storage overhead by approximately 50% compar...