Friday, October 4, 2013

HADOOP - Installation setup

Prerequisites

Hadoop requires a working Java 1.5+ (aka Java 5) installation; however, Java 1.6 or 1.7 (aka Java 6/7) is recommended for running Hadoop. Please refer to the JDK installation instructions here.


Dedicated user for Hadoop system

A dedicated Hadoop user helps to separate the Hadoop installation from other software applications and user accounts running on the same machine.


umasarath@ubuntu:~$ sudo addgroup hadoop
umasarath@ubuntu:~$ sudo adduser --ingroup hadoop hduser
The above two commands add the "hadoop" group and the "hduser" user, with hduser placed in the hadoop group.
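If you want to confirm that the account was set up as intended (names taken from the commands above), the standard account tools will show it:

```shell
# Check the new group and user; "hduser" should appear as a
# member of the "hadoop" group in both outputs.
getent group hadoop
id hduser
```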

Hadoop Installation

Download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. The folder I chose was the hduser home folder. Extract the downloaded file in the /home/hduser folder, and make sure the extraction is done while logged in as hduser.


hduser@ubuntu:~$ sudo tar xzf hadoop-1.2.1.tar.gz
hduser@ubuntu:~$ sudo mv hadoop-1.2.1 hadoop
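Because the archive was extracted with sudo, the extracted files end up owned by root. A common follow-up step (assuming the hduser account and hadoop group created earlier) is to hand ownership to the Hadoop user:

```shell
# Give hduser ownership of the extracted tree so it can write
# logs and pid files under it later.
sudo chown -R hduser:hadoop /home/hduser/hadoop
```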


Configuration of Hadoop

Once the Hadoop installation is complete, the Hadoop environment file must be configured with Java by adding the Java path to it. The file is modified as shown below.


hduser@ubuntu:~$ cd hadoop/conf
hduser@ubuntu:~/hadoop/conf$ vi hadoop-env.sh
# The java implementation to use.  Add the line below to the existing file.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
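The JDK directory varies by distribution and package, so it is worth confirming the path on your machine before editing hadoop-env.sh. One way to find it (the resolved path shown is only an example):

```shell
# Resolve the real location of the java binary; JAVA_HOME is the
# directory above its bin/ (or jre/bin/) component.
readlink -f "$(which java)"
# e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
```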


Now the Hadoop installation and configuration are done!

Let’s run it without any arguments to see its usage documentation.
hduser@ubuntu:~$ cd hadoop
hduser@ubuntu:/home/hduser/hadoop$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Bingo! Your installation setup is done. You are now ready to run your first example program on Hadoop.
