Hadoop Resources



"Cluster Computing and MapReduce Lecture" series on YouTube


http://code.google.com/edu/parallel/mapreduce-tutorial.html 

What is Hadoop?

http://radar.oreilly.com/2012/02/what-is-apache-hadoop.html
http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop
http://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/  

What is HDFS?
 
The paper covers most HDFS features, except for HDFS Federation, which was introduced in the 0.23 release, and the HDFS High Availability feature, which will be included in the upcoming Hadoop 0.24 release.

HDFS explained as a comic for the young.

HDFS Federation was introduced in the 0.23 release to allow multiple NameNodes in a cluster.

About HDFS from `The Architecture of Open Source Applications`.

MapReduce Algorithms


Hadoop HelloWorld

http://hadoop.apache.org/common/docs/r0.20.205.0/mapred_tutorial.html

Setting up a Hadoop Cluster (Ubuntu)

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/  

Setting up Hadoop (Windows)

http://hortonworks.com/blog/hadoop-in-windows/
http://hortonworks.com/blog/installing-hadoop-on-windows/

http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://blogs.msdn.com/b/avkashchauhan/

Benchmarking and Stress Testing a Hadoop Cluster

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

http://web.ics.purdue.edu/~fahmad/benchmarks.htm 

Testing Hadoop Jobs

http://www.cloudera.com/blog/2009/07/advice-on-qa-testing-your-mapreduce-jobs/

Hadoop Tutorial

Books

Hadoop: The Definitive Guide (recommended; my review here)
Pro Hadoop (haven't had a chance to read it yet)

BSP vs MapReduce - http://arxiv.org/abs/1203.2081  

General (uncategorized)
