How to Install and Configure Apache Hadoop on Ubuntu 20.04

Apache Hadoop is an open-source, Java-based software platform that manages data processing and storage for big data applications. Hadoop works by distributing large data sets and analytics jobs across nodes in a computing cluster, breaking them down into smaller workloads that can be run in parallel. Hadoop can process structured and unstructured data and scale up reliably from a single server to thousands of machines.

Update the system

Update the package index and upgrade the installed packages to their latest versions with the following commands, then reboot the system once the upgrade completes.

  apt-get update -y
  apt-get upgrade -y

Installing Java

Apache Hadoop is a Java-based application, so install Java with the following command.

  apt-get install default-jdk default-jre -y


root@crowncloud:~# apt-get install default-jdk default-jre -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  alsa-topology-conf alsa-ucm-conf ca-certificates-java
  default-jdk-headless default-jre-headless
  fonts-dejavu-extra java-common libasound2 libasound2-data
  libatk-wrapper-java libatk-wrapper-java-jni libgif7
  libice-dev libpcsclite1 libpthread-stubs0-dev libsm-dev
  libx11-dev libxau-dev libxcb1-dev libxdmcp-dev libxt-dev
  openjdk-11-jdk openjdk-11-jdk-headless openjdk-11-jre
  openjdk-11-jre-headless x11proto-dev xorg-sgml-doctools

Verify the Java version once the installation is done.

  java -version


root@crowncloud:~# java -version
openjdk version "11.0.12" 2021-07-20
OpenJDK Runtime Environment (build 11.0.12+7-Ubuntu-0ubuntu3)
OpenJDK 64-Bit Server VM (build 11.0.12+7-Ubuntu-0ubuntu3, mixed mode, sharing)
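
Hadoop 3.3 runs on Java 8 or Java 11, so provisioning scripts often confirm the major version programmatically. A minimal sketch, assuming nothing beyond a POSIX shell; the java_major helper name is ours:

```shell
# Extract the major version from a Java version string.
# Handles both legacy "1.8.0_292" and modern "11.0.12" formats.
java_major() {
  ver="$1"
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.8.0_292 -> 8
    *)   echo "$ver" | cut -d. -f1 ;;   # modern scheme: 11.0.12   -> 11
  esac
}

# Live usage would feed it the string printed by "java -version":
#   java_major "$(java -version 2>&1 | awk -F'"' 'NR==1{print $2}')"
```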

Creating Hadoop user

Create a dedicated Hadoop user and set up passwordless SSH for it. First, run the following command to create the user.

  adduser hadoop


root@crowncloud:~# adduser hadoop
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...

Switch to Hadoop user once the user has been created.

  su - hadoop

Run the following command to generate the SSH key.

  ssh-keygen -t rsa


hadoop@crowncloud:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
The key's randomart image is:
+---[RSA 3072]----+
|            .    |
|           . .   |
|    .   . .   .  |
|   o o . . ..+   |
|    *   S . ++oo |
|   + o + = . =@.=|
|  o + + +   *+.X.|
|   o o o  . +++  |
|      o....+ Eo. |
+----[SHA256]-----+

Append the public key to the authorized_keys file and restrict the file's permissions so that SSH will accept it.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Verify the passwordless SSH connection with the following command, using localhost or the server's IP address.

  ssh localhost
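
If you re-run these steps, the append above will add duplicate lines to authorized_keys; a small guard can check whether the key is already authorized first. A sketch; the key_authorized helper name is ours:

```shell
# Succeed if the public key (arg 1) already appears verbatim as a line
# in the authorized_keys file (arg 2).
key_authorized() {
  pubkey_file="$1"
  auth_file="$2"
  grep -qxF "$(cat "$pubkey_file")" "$auth_file" 2>/dev/null
}

# A live, non-interactive login check would look like (not run here):
#   ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true && echo "passwordless OK"
```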

Install Hadoop

Switch to the Hadoop user and download Hadoop 3.3.1 from the Apache release archive using the following "wget" command.

  su - hadoop
  wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz


hadoop@crowncloud:~$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
HTTP request sent, awaiting response... 200 OK
Length: 605187279 (577M) [application/x-gzip]
Saving to: ‘hadoop-3.3.1.tar.gz’

Extract the downloaded "tar" file with the following command.

  tar -xvzf hadoop-3.3.1.tar.gz

Next, switch back to the root user for the commands below and move the extracted files to a dedicated directory.

  su root
  cd /home/hadoop
  mv hadoop-3.3.1 /usr/local/hadoop

The /home/hadoop path will differ if you created the user with a different name.

Create the log directory to store the "Apache Hadoop" logs.

  mkdir /usr/local/hadoop/logs

Change the ownership of the /usr/local/hadoop directory to the hadoop user and switch back to that user.

  chown -R hadoop:hadoop /usr/local/hadoop
  su hadoop

Open ".bashrc" in an editor to define the Hadoop environment variables.

vi ~/.bashrc

And add the following configuration to the end of the file. The PATH entry makes the Hadoop commands in bin and sbin available without typing their full paths.

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Run the following command to activate the added environment variables.

  source ~/.bashrc
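
At this point HADOOP_HOME should point at a directory containing bin and sbin; a quick layout check can catch a wrong path before a later Hadoop command fails confusingly. A sketch; the check_hadoop_home helper name is ours:

```shell
# Verify that a HADOOP_HOME candidate contains the expected bin/ and
# sbin/ subdirectories; print whatever is missing and return non-zero.
check_hadoop_home() {
  home="$1"
  ok=0
  for sub in bin sbin; do
    [ -d "$home/$sub" ] || { echo "missing: $home/$sub"; ok=1; }
  done
  return "$ok"
}

# After sourcing ~/.bashrc:
#   check_hadoop_home "$HADOOP_HOME" && echo "layout OK"
```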

Configure Hadoop

If you are new to Hadoop and want to explore basic commands or test applications, you can configure Hadoop on a single node.

Configure Java Environment Variables

Next, you will need to define the Java environment variables in hadoop-env.sh to configure YARN, HDFS, MapReduce, and Hadoop-related project settings.

Locate the Java compiler with the following command.

  which javac


hadoop@crowncloud:~$ which javac
/usr/bin/javac

Next, find the OpenJDK directory with the following command.

  readlink -f /usr/bin/javac


hadoop@crowncloud:~$ readlink -f /usr/bin/javac

Next, edit the hadoop-env.sh file and define the Java path.

vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

And add the following configuration to the end of the file.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"

Download the javax.activation JAR, which is required on Java 11 because it is no longer bundled with the JDK, by running the following commands. The URL below is the standard Maven Central location for this artifact.

cd /usr/local/hadoop/lib
sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar


root@crowncloud:/usr/local/hadoop/lib# sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar
HTTP request sent, awaiting response... 200 OK
Length: 56674 (55K) [application/java-archive]
Saving to: ‘javax.activation-api-1.2.0.jar’

javax.activatio 100%[======>]  55.35K  --.-KB/s    in 0.002s  

2021-09-13 12:56:33 (23.1 MB/s) - ‘javax.activation-api-1.2.0.jar’ saved [56674/56674]

Next, verify the Hadoop version.

hadoop version


root@crowncloud:/usr/local/hadoop/lib# hadoop version
Hadoop 3.3.1
Source code repository -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar
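
In scripts it is handy to compare the installed version against the one you intended to deploy; the first line of "hadoop version" output has the form "Hadoop 3.3.1". A sketch; the hadoop_version helper name is ours:

```shell
# Read "hadoop version" output on stdin and print just the version number.
hadoop_version() {
  awk 'NR==1 && $1 == "Hadoop" {print $2}'
}

# Live usage:
#   hadoop version | hadoop_version
```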

Configure core-site.xml File

To set up Hadoop, you need to specify the URL for your NameNode as follows.

vi $HADOOP_HOME/etc/hadoop/core-site.xml

And add the following configuration, replacing the empty <configuration> element at the end of the file. For a single-node setup the NameNode listens on the local machine.

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://127.0.0.1:9000</value>
      <description>The default file system URI</description>
   </property>
</configuration>

Configure hdfs-site.xml File

Next, define a location for storing node metadata, the fsimage file, and the edit log file by configuring the NameNode and DataNode storage directories.

Before configuring, create the directories for storing node metadata and give them to the hadoop user.

mkdir -p /home/hadoop/hdfs/{namenode,datanode}
chown -R hadoop:hadoop /home/hadoop/hdfs

Edit the hdfs-site.xml file and define the location of the directories as follows.

vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml

And add the following configuration, replacing the empty <configuration> element at the end of the file. A replication factor of 1 suits a single-node cluster.

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hdfs/datanode</value>
   </property>
</configuration>
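
A mis-owned or missing storage directory is a common cause of NameNode or DataNode start-up failures, so a quick pre-flight check is worthwhile. A sketch; the check_storage_dirs helper name is ours:

```shell
# Check that each storage directory exists and is owned by the expected
# user (first argument); report problems and return non-zero if any.
check_storage_dirs() {
  owner="$1"; shift
  rc=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing dir: $d"; rc=1; continue
    fi
    actual=$(ls -ld "$d" | awk '{print $3}')
    [ "$actual" = "$owner" ] || { echo "wrong owner on $d: $actual"; rc=1; }
  done
  return "$rc"
}

# On this setup:
#   check_storage_dirs hadoop /home/hadoop/hdfs/namenode /home/hadoop/hdfs/datanode
```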

Configure mapred-site.xml File

Use the following command to access the mapred-site.xml file and define MapReduce values.

vi $HADOOP_HOME/etc/hadoop/mapred-site.xml

And add the following configuration, replacing the empty <configuration> element at the end of the file, to run MapReduce jobs on YARN.

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>

Configure yarn-site.xml File

You would need to edit the yarn-site.xml file and define YARN related settings.

vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

And add the following configuration, replacing the empty <configuration> element at the end of the file, to enable the shuffle service that MapReduce needs.

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>

Format HDFS NameNode

It is important to format the NameNode before starting Hadoop services for the first time.

hdfs namenode -format


root@crowncloud:/usr/local/hadoop/lib# hdfs namenode -format
2021-09-13 13:05:08,749 INFO namenode.NameNode: STARTUP_MSG: 
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host =
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.3.1
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.6.jar:/usr/local/hadoop/

Start the Hadoop Cluster

First, start the NameNode and DataNode with the following command (the start-up scripts live in $HADOOP_HOME/sbin).

  start-dfs.sh

Next, start the YARN resource and node managers with the following command.

  start-yarn.sh

Starting resourcemanager
Starting nodemanagers

Verify that all the daemons are active and running as Java processes with the "jps" command.

  jps
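
On a healthy single-node cluster, jps should list five Hadoop daemons alongside Jps itself. A small filter can turn that into a pass/fail check; the check_daemons helper name is ours:

```shell
# Read jps output on stdin and verify every daemon expected on a
# single-node cluster is present; print any that are missing.
check_daemons() {
  input=$(cat)
  rc=0
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$input" | grep -qw "$d" || { echo "not running: $d"; rc=1; }
  done
  return "$rc"
}

# Live usage:
#   jps | check_daemons && echo "all daemons up"
```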

Access Hadoop Web Interface

Open your server's IP address in a browser to access the Hadoop NameNode web UI: http://your-server-ip:9870


Open your server's IP address in a browser to access the individual DataNodes: http://your-server-ip:9864


Open your server's IP address in a browser to access the YARN Resource Manager: http://your-server-ip:8088
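
The three ports are easy to mix up, so a tiny helper that prints the monitoring endpoints for a given address can be handy in notes or scripts. A sketch; the hadoop_ui_urls helper name is ours:

```shell
# Print the three Hadoop web UI endpoints for a server address.
hadoop_ui_urls() {
  host="$1"
  printf 'NameNode          http://%s:9870\n' "$host"
  printf 'DataNode          http://%s:9864\n' "$host"
  printf 'ResourceManager   http://%s:8088\n' "$host"
}

# To probe whether each UI is actually answering, something like:
#   for url in $(hadoop_ui_urls your-server-ip | awk '{print $2}'); do
#     curl -s -o /dev/null -w "%{http_code} $url\n" "$url"
#   done
```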