
Installing Hadoop 3.0.0 Single Node Cluster on Ubuntu


Hi Hadoop enthusiasts,

     This article walks you through a simple single-node installation of Hadoop 3.0.0. Follow the steps carefully and you will quickly be able to perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).


 

Prerequisites


  • Ubuntu (this guide uses Ubuntu 17.10)
  • Java: Oracle JDK 8 (minimum), JDK 9, or OpenJDK 8
  • SSH and pdsh
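
Before starting Step 1, you can quickly check what is already present on your machine. These are only optional sanity checks; the exact output and package names may differ on your release:

$ lsb_release -a
$ java -version
$ ssh -V
$ dpkg -l ssh pdsh 2>/dev/null | grep ^ii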

Step 1: Installing Java

1 – Oracle Java JDK

First, check whether Java is already installed:

$ java -version

If you don’t have Java installed on your system, use one of the following options to install it.
First, add the Oracle PPA repository:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
Install Java 8 (JDK 8u161) on Ubuntu:
$ sudo apt-get install oracle-java8-installer
$ java -version
Install Java 9 (JDK 9) on Ubuntu:
$ sudo apt-get install oracle-java9-installer
$ java -version

2 – OpenJDK 8
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk
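
Whichever JDK you choose, take note of its installation path now; you will need it later for JAVA_HOME in .bashrc and hadoop-env.sh. One way to find it (the paths in the comments below are typical defaults and may differ on your system):

$ readlink -f $(which java)
# e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# so JAVA_HOME would be /usr/lib/jvm/java-8-openjdk-amd64
$ update-alternatives --list java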



Step 2: Configuring SSH

SSH is used for remote login. Hadoop requires SSH to manage its nodes, and it must be configured in the environment of the user who will run Hadoop.
$ sudo apt-get install ssh
$ sudo apt-get install pdsh
$ sudo service ssh status
 
Generate Key Pairs
$ ssh-keygen -t rsa -P ""
 
The output of the above command will be similar to:
Generating public/private rsa key pair.

Enter file in which to save the key (/home/elgindy/.ssh/id_rsa):

Created directory '/home/elgindy/.ssh'.

Your identification has been saved in /home/elgindy/.ssh/id_rsa.

Your public key has been saved in /home/elgindy/.ssh/id_rsa.pub.

The key fingerprint is:

9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 elgindy@ubuntu

The key's randomart image is:

[...snipp...]
 
Then configure passwordless SSH by appending the public key to the authorized keys, and change the permissions of the file that holds them:
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Finally, check that you can SSH to localhost:
$ ssh localhost
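
The first connection may ask you to confirm the host key; after that, the login should succeed without a password. A quick non-interactive check, using the same setup as above:

$ ssh localhost 'echo passwordless ssh is working'
# if you are still prompted for a password, recheck the permissions of ~/.ssh and ~/.ssh/authorized_keys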



Step 3: Configure Hadoop

First, download Hadoop 3.x from the Apache releases page:
https://hadoop.apache.org/releases.html

or download version 3.0.0 (the latest release at the time of writing) directly:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz

 
 
 
After the download finishes, go to the folder containing hadoop-3.0.0.tar.gz and extract its contents, either with a right click and "Extract Here" or from the terminal:
 $ tar -xzf hadoop-3.0.0.tar.gz
We will install Hadoop under /usr/local, so move the extracted folder there and make your user its owner (replace elgindy with your own username):
$ sudo mv hadoop-3.0.0/ /usr/local/Hadoop/
$ sudo chown -R elgindy:elgindy /usr/local/Hadoop/
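
As a quick check, the moved folder should contain the usual Hadoop layout and be owned by your user; for the 3.0.0 binary release you should see something like:

$ ls /usr/local/Hadoop
# bin  etc  include  lib  libexec  sbin  share  ...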
 

Step 4: Update $HOME/.bashrc

$ gedit ~/.bashrc
 
The .bashrc file is located in the user’s home directory; add the following lines to it:
 
#Elgindy HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/Hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
#HADOOP VARIABLES END




Save .bashrc, then reload it using:
$ source ~/.bashrc
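
To confirm that the variables were picked up and that the hadoop command is now on your PATH, you can run (assuming the paths above):

$ echo $HADOOP_HOME
$ which hadoop
$ hadoop version
# the last command should report version 3.0.0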

Step 5: Setup Hadoop Configuration

As we know, core Hadoop is made up of 5 daemons: NameNode (NN), SecondaryNameNode (SNN), DataNode (DN), ResourceManager (RM), and NodeManager (NM). We need to modify the configuration files in the etc/hadoop folder. Below are the files used by the respective daemons.

Daemon                 Configuration file(s)
NameNode               core-site.xml
ResourceManager        mapred-site.xml
SecondaryNameNode      –
DataNode               slaves (renamed to workers in Hadoop 3)
NodeManager            slaves & yarn-site.xml

Ports used by Hadoop Daemons

Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located on another computer in a network, without having to understand the network details.
The Web UI column in the table below is the port of the daemon's web interface.

Hadoop Daemon          RPC Port    Web UI Port
NameNode               9000        9870
SecondaryNameNode      –           9868
DataNode               9866        9864
ResourceManager        8030        8088
NodeManager            8040        8042

(Note: Hadoop 3 moved the HDFS daemon ports out of the old 50xxx range; 9868, 9864, and 9866 replace the 50090, 50075, and 50010 defaults used in Hadoop 2.)
 
First, we need to create a temp folder for Hadoop operations 
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown -R elgindy:elgindy /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
Then move to the following path to configure Hadoop:
$ cd /usr/local/Hadoop/etc/hadoop/
Then open hadoop-env.sh
$ gedit hadoop-env.sh
Set the JAVA_HOME environment variable. Change the Java path to match the installation on your system:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle




$ gedit core-site.xml

Then add the following properties between the <configuration> tags:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

$ gedit hdfs-site.xml

Then add the following properties between the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

$ gedit mapred-site.xml

Then add the following properties between the <configuration> tags:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

$ gedit yarn-site.xml

Then add the following properties between the <configuration> tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
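
Once the four files are edited, you can ask Hadoop to read the configuration back as a quick check. This is only a sanity check using the values configured above:

$ hdfs getconf -confKey fs.defaultFS
# should print hdfs://localhost:9000
$ hdfs getconf -confKey dfs.replication
# should print 1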

Step 6: Formatting the HDFS filesystem via the NameNode

The first step when running Hadoop for the first time is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your “cluster”.
First, move to the Hadoop path:
$ cd /usr/local/Hadoop
Then run the following command (do not format a running Hadoop filesystem, as you will lose all the data currently stored in HDFS):
$ bin/hdfs namenode -format
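
With the default settings, the NameNode stores its metadata under hadoop.tmp.dir, so a simple way to confirm that the format succeeded is to look inside the new storage directory (exact file names will vary):

$ ls /app/hadoop/tmp/dfs/name/current
# you should see a VERSION file and an fsimage file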





Step 7: Starting your single-node cluster

Start all Hadoop-related services.
(Starting daemons for NN, DN, and SNN)
$ sbin/start-dfs.sh
(Starting daemons for RM and NM)
$ sbin/start-yarn.sh
Or you can start all Hadoop daemons with a single command:
$ sbin/start-all.sh

Hint:
If the commands above fail when starting the HDFS services (a common pdsh issue on Ubuntu), tell pdsh to use ssh as its default remote command:
$ echo "ssh" | sudo tee /etc/pdsh/rcmd_default
Then run the start commands again.
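
An equivalent workaround, if you prefer not to touch /etc/pdsh, is to tell pdsh to use ssh through an environment variable in your .bashrc:

$ echo 'export PDSH_RCMD_TYPE=ssh' >> ~/.bashrc
$ source ~/.bashrc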



To check that Hadoop is running, simply run:
$ jps
The output will be similar to this (process IDs will differ):

18096 NodeManager
17440 DataNode
17952 ResourceManager
17649 SecondaryNameNode
18451 Jps
17294 NameNode




The Hadoop NameNode web UI starts on port 9870 by default. Access it in your web browser:
http://localhost:9870/





Now access port 8088 to get information about the cluster and all running applications:
http://localhost:8088/




Access port 9868 to get details about the SecondaryNameNode:
http://localhost:9868/
Access port 9864 to get details about the DataNode:
http://localhost:9864/
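
Before stopping the cluster, you can run a small smoke test: create an HDFS directory, copy a few files into it, and run the bundled MapReduce pi example. The commands below assume you are still in /usr/local/Hadoop and reuse the elgindy username from earlier; adjust both to your setup:

$ bin/hdfs dfs -mkdir -p /user/elgindy
$ bin/hdfs dfs -put etc/hadoop/*.xml /user/elgindy
$ bin/hdfs dfs -ls /user/elgindy
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar pi 2 10
# the pi job should finish by printing an estimated value of Pi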



Step 8: Stopping your single-node cluster

Stop all Hadoop-related services.
(Stopping daemons for NN, DN, and SNN)

$ sbin/stop-dfs.sh
(Stopping daemons for RM and NM)

$ sbin/stop-yarn.sh

Or you can stop all Hadoop daemons with a single command:

$ sbin/stop-all.sh
  


The next lesson will cover installing a Hadoop 3.0.0 multi-node cluster.
