
Eco System

The Big Data ecosystem is evolving rapidly, and it is difficult to keep track of the changes. It offers many choices (open source vs. proprietary, free vs. commercial, batch vs. streaming). For a newbie, getting familiar with even one framework takes considerable time and effort, and it is hard to know where to start.

Hadoop has received a lot of attention, and many people start with it, but Hadoop is not the solution for everything. Take graph processing: Hama and Giraph (though still incubating) are better suited to it than Hadoop. This page attempts to give an idea of the ecosystem around Big Data.

Here are some useful articles and blog posts for getting started with the Hadoop ecosystem.

How-to: Use Parquet with Impala, Hive, Pig, and MapReduce

Source: Cloudera Blog

The CDH software stack lets you use your tool of choice with the Parquet file format, offering the benefits of ...