Add nodes to an HDFS/YARN cluster

In a previous post I created an HDFS/YARN cluster with only 4 nodes. Now that I've been granted more resources, it's time to extend the capacity.

Here I plan to add another 4 hosts to the cluster, for both HDFS and YARN. Please note that if the host profile (mostly RAM) is not the same, you should adjust yarn.nodemanager.resource.memory-mb in etc/hadoop/yarn-site.xml accordingly.
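For example, on a hypothetical new host with 64 GB of RAM, the property might look like this (the value is illustrative; tune it for your own hardware and workload):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- example value for a 64 GB host, leaving headroom for the OS and daemons -->
  <value>57344</value>
</property>
```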

1. environment setup

First of all, let's set up the environment on each new host.

a. enable passwordless SSH access from the master to the newly added hosts;
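A minimal sketch of this step (the key path and hostnames below are examples, not from my cluster): generate a key pair on the master if one doesn't exist yet, then install the public key on each new host.

```shell
# create an RSA key pair with no passphrase at an example path
KEY=./id_rsa_demo
ssh-keygen -q -t rsa -N "" -f "$KEY"
# then, for each new host (hypothetical names), install the public key:
#   ssh-copy-id -i "$KEY.pub" stack@node5
#   ssh-copy-id -i "$KEY.pub" stack@node6   ... and so on
```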

b. copy hadoop-2.7.4.tar.gz, jdk-7u71-linux-x64.tar.gz and jdk-8u144-linux-x64.tar.gz to all new hosts, and extract them;

tar -zxf hadoop-2.7.4.tar.gz
tar -zxf jdk-7u71-linux-x64.tar.gz
tar -zxf jdk-8u144-linux-x64.tar.gz

c. create local directories

mkdir -p /mnt/dfs/namenode /mnt/dfs/data /mnt/yarn/nm-local-dir /mnt/yarn/nm-log /mnt/dfs/journal
chown -R stack:stack /mnt/dfs /mnt/yarn

d. add a profile script /etc/profile.d/hadoop.sh

export HADOOP_HOME=/home/stack/hadoop-2.7.4
export HADOOP_PREFIX=/home/stack/hadoop-2.7.4
export HADOOP_CONF_DIR=/home/stack/hadoop-2.7.4/etc/hadoop
export YARN_CONF_DIR=/home/stack/hadoop-2.7.4/etc/hadoop

2. DFS/YARN configuration

Since all my hosts have the same hardware profile, I only need to add the new hosts to etc/hadoop/slaves and copy all configuration files to the new hosts.
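As a sketch, assuming the existing hosts are node1-node4 and the new ones are node5-node8 (hostnames are made up for illustration), the slaves file simply grows by four lines. The snippet below works on an example copy so you can see the result:

```shell
# build an example copy of etc/hadoop/slaves
SLAVES=slaves.example
printf '%s\n' node1 node2 node3 node4 > "$SLAVES"    # existing hosts
printf '%s\n' node5 node6 node7 node8 >> "$SLAVES"   # newly added hosts
cat "$SLAVES"
```

After editing the real etc/hadoop/slaves on the master, the configuration directory can be pushed out with scp, e.g. `scp -r $HADOOP_CONF_DIR stack@node5:hadoop-2.7.4/etc/`.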

3. enable DataNode/NodeManager services

Run the following on each new host to start the daemons:

$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

$HADOOP_HOME/sbin/yarn-daemon.sh --config $HADOOP_HOME/etc/hadoop start nodemanager

Now you should see the new nodes in both the HDFS and YARN clusters.
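To double-check from the master, the standard Hadoop report commands (paths assume the layout above; they require the cluster to be running) list the live nodes:

```shell
# list DataNodes known to the NameNode
$HADOOP_PREFIX/bin/hdfs dfsadmin -report
# list NodeManagers registered with the ResourceManager
$HADOOP_PREFIX/bin/yarn node -list
```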