Hadoop Installation on Single Machine

To download and install Hadoop, the prerequisites are:

1. A 64-bit Linux-based OS, like
            Fedora ... etc
   I preferred to use Ubuntu 12.04 LTS, and later 14.04 LTS (the upcoming version).

2. Java JDK 1.6 or 1.7
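Before going further, the prerequisites above can be checked from a terminal; this is just a convenience sketch:

```shell
# Quick prerequisite check: confirm a 64-bit kernel and look for a JDK.
arch="$(uname -m)"
echo "Architecture: $arch"        # x86_64 means a 64-bit OS
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -1    # should report version 1.6.x or 1.7.x
else
  echo "java not found - install JDK 1.6 or 1.7 first"
fi
```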

Go to the Downloads folder
>cd Downloads

Extract the Hadoop tar file
>sudo tar xzf hadoop-1.2.1.tar.gz

I created a folder in /home/hduser/
>mkdir Installations

Move the extracted Hadoop folder to the Installations directory
>sudo mv /home/hduser/Downloads/hadoop-1.2.1 /home/hduser/Installations/

Give ownership of the Hadoop folder to hduser
>sudo addgroup hadoop
>sudo chown -R hduser:hadoop /home/hduser/Installations/hadoop-1.2.1
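To confirm the ownership change took effect, a tiny helper like this can be used (owner_of is just an illustrative name, not a Hadoop command):

```shell
# owner_of: print the user:group owning a path (GNU stat).
owner_of() { stat -c '%U:%G' "$1"; }

# Usage on the folder above:
#   owner_of /home/hduser/Installations/hadoop-1.2.1   (expect hduser:hadoop)
```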

Now open the .bashrc file from your home directory in an editor:

>gksudo gedit .bashrc

Add the following lines to the end of the file:

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45/

# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/Installations/hadoop-1.2.1
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
# Requires installed 'lzop' command.
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Save the file and exit the terminal.
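After reopening the terminal (or running `source ~/.bashrc`), it is worth confirming that the new variables resolve to real directories; check_env below is only an illustrative helper:

```shell
# check_env: verify that an environment variable is set and names
# an existing directory (uses bash indirect expansion).
check_env() {
  local var="$1" dir="${!1}"
  if [ -z "$dir" ]; then
    echo "$var is not set"
  elif [ ! -d "$dir" ]; then
    echo "$var: missing directory $dir"
  else
    echo "$var OK"
  fi
}

# Usage:
#   check_env JAVA_HOME
#   check_env HADOOP_HOME
```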

Now it's time to modify the configurations in core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.sh (all in the conf/ directory of the Hadoop install).

In hadoop-env.sh, set JAVA_HOME (the same path used in .bashrc):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45
In the core-site.xml file, add the text below between the <configuration> tags (the values shown are typical single-node settings; adjust the path and port to your setup):

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class.  The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>

In hdfs-site.xml, we need to set the replication factor as follows (1 is the usual value for a single-node setup):

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.</description>
  </property>

In mapred-site.xml, configure the JobTracker as follows (port 54311 is a common choice for a single-node setup):

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at.  If "local", then jobs are run in-process as a single map
    and reduce task.</description>
  </property>
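A broken XML edit is a common cause of startup failures, so a quick sanity check on each file helps; check_conf_xml below is only a rough illustrative check (xmllint, if installed, is more thorough):

```shell
# check_conf_xml: flag obviously unbalanced <configuration> tags
# in a Hadoop config file.
check_conf_xml() {
  local open close
  open=$(grep -c '<configuration>' "$1")
  close=$(grep -c '</configuration>' "$1")
  if [ "$open" -ge 1 ] && [ "$open" -eq "$close" ]; then
    echo "$1 looks balanced"
  else
    echo "$1 has unbalanced <configuration> tags"
  fi
}

# Usage (conf directory of the Hadoop install above):
#   cd /home/hduser/Installations/hadoop-1.2.1/conf
#   check_conf_xml core-site.xml
```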

Save all the files and close them.

Open the terminal and follow these steps for first use.

Format the HDFS filesystem (only needed once):
>bin/hadoop namenode -format

Start all the daemons:
>bin/start-all.sh

After running jps, you will find 5 daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker), which means the cluster is ready.
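Once the daemons are up, their presence can be checked by filtering the jps output; count_hadoop_daemons is just an illustrative helper (on a healthy single node it should print 5):

```shell
# count_hadoop_daemons: count the Hadoop 1.x daemons expected on a
# single-node cluster, reading jps-style output from stdin.
count_hadoop_daemons() {
  grep -cE 'NameNode|DataNode|SecondaryNameNode|JobTracker|TaskTracker'
}

# Usage on a running cluster:
#   jps | count_hadoop_daemons    # expect 5
```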

Go to the web browser for the Hadoop GUI: the NameNode UI is at http://localhost:50070 and the JobTracker UI at http://localhost:50030.


In an upcoming blog, I am going to cover the Multi-Node Cluster Setup.

Happy Hadooping !!

