Configuration steps for Single-Node Hadoop Cluster

Steps required to set up a single-node Apache Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on an Ubuntu Linux (Ubuntu 12.04) desktop.

1. Sun Java 7 Installation

Hadoop requires a working Java 1.5 or later installation; here, Java 1.7 is used for running Hadoop.

(Figure: installing JDK 1.7)
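
As a rough sketch of what the figure showed, one common way to get Sun/Oracle Java 7 onto Ubuntu 12.04 at the time was the WebUpd8 PPA. The PPA and package name below are assumptions; the figure may have used a manual tarball install instead.

anyuser@ubuntu:~$ sudo add-apt-repository ppa:webupd8team/java
anyuser@ubuntu:~$ sudo apt-get update
anyuser@ubuntu:~$ sudo apt-get install oracle-java7-installer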

After installing, make a quick check whether Sun's JDK is correctly set up by running the following command:

anyuser@ubuntu:~$ java -version

2. Configuring SSH

Hadoop requires SSH access to manage its nodes, i.e. remote machines as well as the local machine. For a single-node setup it is necessary to configure SSH access to localhost for the user.

(Figure: configuring SSH)

Next, create an RSA key pair with an empty passphrase, so that Hadoop does not prompt for a password every time it interacts with its nodes.
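
A minimal sketch of that step (the default key location under $HOME/.ssh is assumed):

user@ubuntu:~$ ssh-keygen -t rsa -P ""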

Then enable SSH access to the local machine with the newly created key:

user@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The final step is to test the SSH setup by connecting from the local machine to localhost.

anyuser@ubuntu:~$ ssh localhost

3. Disabling IPv6

One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options results in Hadoop binding to an IPv6 address on the Ubuntu box. It is better to disable IPv6 if it is not in use.

(Figure: disabling IPv6)
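
As a sketch of what the figure showed, one common way to disable IPv6 system-wide on Ubuntu is to add the following lines to /etc/sysctl.conf and reboot (whether the figure used this method or a Hadoop-only HADOOP_OPTS workaround is an assumption):

# disable IPv6 on all interfaces
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1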

4. Hadoop Installation

Download Hadoop from the Apache download mirrors and extract the contents of the Hadoop package to a location of your choice. Make sure to change the owner of all the extracted files to the user who will run Hadoop, as sketched after the commands below.

$ sudo tar xzf  hadoop-1.0.3.tar.gz

$ sudo mv hadoop-1.0.3 hadoop
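
The ownership change mentioned above can then be done with chown; the user and group names here are only placeholders for whatever account will run Hadoop:

$ sudo chown -R user:hadoop hadoop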

 

5. Update $HOME/.bashrc

Add the lines shown in the figure to the end of the user's $HOME/.bashrc file.

(Figure: .bashrc additions)
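
As a sketch of the figure's contents, the additions typically set HADOOP_HOME and JAVA_HOME and put the Hadoop binaries on the PATH. The exact lines in the figure are not recoverable, so treat these as assumptions consistent with the paths used later in this post:

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (matches the value used in conf/hadoop-env.sh below)
export JAVA_HOME=/usr/lib/jvm/java-7-sun
# Add the Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin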

Configuration

hadoop-env.sh file

Change

conf/hadoop-env.sh

# The java implementation to use, required.
# export JAVA_HOME=/usr/lib/jdk1.7-sun

to

conf/hadoop-env.sh

# The java implementation to use, required.
export JAVA_HOME=/usr/lib/jvm/java-7-sun

conf/*-site.xml

Here you configure the directory in which Hadoop stores its data files, the network ports it listens on, and so on.

(Figure: conf/core-site.xml)
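
A minimal core-site.xml sketch for a single-node setup; the temporary directory and the NameNode port are assumptions, not necessarily what the figure showed:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>URI of the default file system (the NameNode).</description>
  </property>
</configuration>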

 

conf/mapred-site.xml

(Figure: conf/mapred-site.xml)
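
A minimal mapred-site.xml sketch; the JobTracker port is an assumption:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce JobTracker runs at.</description>
  </property>
</configuration>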

 

conf/hdfs-site.xml

(Figure: conf/hdfs-site.xml)
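
A minimal hdfs-site.xml sketch; a replication factor of 1 is enough because there is only one DataNode:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication for a single-node cluster.</description>
  </property>
</configuration>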

 

6. Formatting the HDFS file-system via the namenode

The first step in starting up the Hadoop installation is formatting the Hadoop file system, which is implemented on top of the local file system of the cluster. This only needs to be done the first time a Hadoop cluster is set up.

Do not format a running Hadoop file system, as this erases all data currently stored in the cluster: the NameNode metadata and every block on the DataNodes are wiped, and you will have to reload your data from scratch.

To format the file-system, run the following command.

user@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

The output will look similar to the following.

(Figure: formatting the NameNode)

 

7. Starting the single-node cluster

Run the following command:

user@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

(Figure: output of start-all.sh)

 

8. Stopping the single-node cluster

(Figure: stopping all daemons)
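
The cluster is stopped with the companion script to start-all.sh:

user@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh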

 

9. Running a MapReduce job

(Figure: running the MapReduce job)
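
As a sketch of the steps the figure walked through (the local input files and the HDFS paths are assumptions), the bundled WordCount example can be run like this: copy some text files into HDFS, run the examples jar, then read the result back.

user@ubuntu:~$ /usr/local/hadoop/bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/user/gutenberg
user@ubuntu:~$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-1.0.3.jar wordcount /user/user/gutenberg /user/user/gutenberg-output
user@ubuntu:~$ /usr/local/hadoop/bin/hadoop dfs -cat /user/user/gutenberg-output/part-*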

10. Hadoop Web Interfaces

Hadoop comes with several web interfaces, which are available by default at the following locations:

  1. http://localhost:50070/ – web UI of the NameNode daemon
  2. http://localhost:50030/ – web UI of the JobTracker daemon
  3. http://localhost:50060 – web UI of the TaskTracker daemon

(Figure: Hadoop web interfaces)

If you are doing research or a dissertation on Hadoop, go through the following link.

ijarcet.org/wp-content/uploads/IJARCET-VOL-3-ISSUE-2-354-356.pdf


Krishna Yadav

Krishna Yadav is an entrepreneur and the founder of Infibusiness Solution since 2015. He mostly engages in programming and writes technical blogs on startups, SEO, and Google tools. Subscribe for your daily dose of tech tips and website design guidance.
