Prerequisites
Hadoop requires a working Java 1.5+ (aka Java 5) installation. However, using Java 1.6/1.7 (aka Java 6/7) is recommended for running Hadoop. Please refer to the JDK installation instructions here.
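Before going further, it can help to confirm which Java version is actually on the PATH. The snippet below is only a minimal sketch and is not part of the original instructions: `java_major` is a hypothetical helper name, and it assumes the version string has the usual `1.x.y_z` or `x.y.z` form that `java -version` reports.

```shell
#!/bin/sh
# Hypothetical helper: print the major Java version from a version string.
# "1.7.0_80" -> 7 (legacy 1.x scheme), "11.0.2" -> 11 (newer scheme).
java_major() {
    v=$1
    case "$v" in
        1.*) v=${v#1.} ;;   # drop the legacy "1." prefix
    esac
    echo "${v%%[._]*}"      # keep everything before the first "." or "_"
}

# If Java is installed, pull the quoted version string out of `java -version`
# (which prints to stderr) and report the major version.
if command -v java >/dev/null 2>&1; then
    ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "Detected Java major version: $(java_major "$ver")"
fi
```

A result of 6 or 7 matches the recommendation above; anything below 5 would not meet the stated minimum.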
Dedicated user for Hadoop system
A dedicated Hadoop user helps separate the Hadoop installation from other software applications and user accounts running on the same machine.
umasarath@ubuntu:~$ sudo addgroup hadoop
umasarath@ubuntu:~$ sudo adduser --ingroup hadoop hduser
The above two commands create the "hadoop" group and then add the "hduser" user to it.
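To double-check that the account ended up in the right group, something like the following could be used. It is a sketch under assumptions: `in_group` is a hypothetical helper name, and it relies only on the standard `id -nG` command, which lists a user's group names separated by spaces.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if user $1 belongs to group $2.
in_group() {
    # Put each group name on its own line, then look for an exact match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Example check for the account created above.
if in_group hduser hadoop; then
    echo "hduser is a member of hadoop"
else
    echo "hduser is not a member of hadoop (or does not exist yet)"
fi
```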
Hadoop Installation
Download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. I chose the hduser home folder. Extract the downloaded file into /home/hduser, and make sure you do so while logged in as hduser.
hduser@ubuntu:~$ sudo tar xzf hadoop-1.2.1.tar.gz
hduser@ubuntu:~$ sudo mv hadoop-1.2.1 hadoop
Configuration of Hadoop
Once Hadoop is installed, its environment file (conf/hadoop-env.sh) must be pointed at your Java installation by adding the Java path to it. Modify the file as shown below.
hduser@ubuntu:~$ cd hadoop/conf
hduser@ubuntu:~$ vi hadoop-env.sh
# The java implementation to use. Add the below line to the existing file.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
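If you would rather not edit the file by hand, the same change can be scripted. The following is a rough sketch, not the method used in this post: `set_java_home` is a hypothetical helper that appends the export line if no uncommented one exists and rewrites it otherwise, so running it twice does not create duplicates.

```shell
#!/bin/sh
# Hypothetical helper: make sure an env file exports JAVA_HOME exactly once.
# $1 = path to hadoop-env.sh, $2 = JDK directory (both supplied by the caller).
set_java_home() {
    env_file=$1
    jdk_dir=$2
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing uncommented line; "|" is used as the sed
        # delimiter so the "/" characters in the path need no escaping.
        sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file" > "$env_file.tmp" &&
            mv "$env_file.tmp" "$env_file"
    else
        # No active export yet (commented-out lines are ignored): append one.
        echo "export JAVA_HOME=$jdk_dir" >> "$env_file"
    fi
}
```

With the layout used in this post, it would be called as `set_java_home /home/hduser/hadoop/conf/hadoop-env.sh /usr/lib/jvm/java-7-openjdk-amd64`. The JDK path is the one from the example above and may differ on your machine.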
The Hadoop installation and configuration is now complete!
Let's run the hadoop script without any arguments to see its usage documentation.
hduser@ubuntu:~$ cd hadoop
hduser@ubuntu:/home/hduser/hadoop$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
Bingo! Your installation is done. You are now ready to run your first example program in Hadoop.