Assumes:-
Starting from a clean, basic Ubuntu AMI.
If running free-tier instances from AWS, the public IP addresses/hostnames may change on every stop/start; they need to be updated in the files below each time (a quick lookup command is shown after this list).
Using PuTTY from Windows for connecting to the AWS instances.
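If a hostname has changed after a restart, one convenience (not in the original post) is to query the standard EC2 instance metadata endpoint from inside the running instance:
curl http://169.254.169.254/latest/meta-data/public-hostname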
For convenience, the host names used throughout this post are listed below:
Master
MasterID :- ec2-54-174-46-166.compute-1.amazonaws.com
Slaves
Slave1 :- ec2-54-165-137-226.compute-1.amazonaws.com
Slave2 :- ec2-54-173-232-142.compute-1.amazonaws.com
PEM file :-
awssecuritykey.pem
PPK file :-
PuttyKey.ppk
Connecting to an AWS instance using PuTTY:-
1) Convert awssecuritykey.pem into PuttyKey.ppk using PuTTYgen.
2) Open PuTTY
a) Go to Session; in "Host Name" enter ubuntu@<ID of master>, port 22.
b) Go to Connection --> SSH --> Auth; in the "Private key file for authentication" box click "Browse" and select the PPK key generated in the first step.
PuTTY may ask whether to store the server's host key in its cache of known hosts; select "Yes" after verifying it.
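As an alternative to the PuTTY GUI, the same connection can be opened from a Windows command prompt with plink (part of the PuTTY suite); the paths below match the ones used later in this post:
plink -i D:\Downloads\PuttyKey.ppk ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com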
# Run the following commands on each node (master and both slaves)
sudo apt-get update
sudo apt-get upgrade
#install JDK
sudo apt-get install openjdk-7-jdk
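To verify the JDK installed correctly (the exact version string will vary):
java -version
# expect output similar to: java version "1.7.0_xx" ... OpenJDK ...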
# Copy your PEM file from the local folder to all of your instances. This ensures SSH connections between the servers can be authenticated.
# Use the pscp command, available in the PuTTY folder; WinSCP, which provides a GUI on Windows, can also be used.
pscp -i D:\Downloads\PuttyKey.ppk D:\Downloads\awssecuritykey.pem ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com:/home/ubuntu/.ssh
pscp -i D:\Downloads\PuttyKey.ppk D:\Downloads\awssecuritykey.pem ubuntu@ec2-54-165-137-226.compute-1.amazonaws.com:/home/ubuntu/.ssh
pscp -i D:\Downloads\PuttyKey.ppk D:\Downloads\awssecuritykey.pem ubuntu@ec2-54-173-232-142.compute-1.amazonaws.com:/home/ubuntu/.ssh
# SSH is picky about key-file permissions; set them as follows in /home/ubuntu/.ssh on each instance:
chmod 644 authorized_keys
# Quick Tip: with 'chmod 644' you get a file that can be written by you, but can only be read by the rest of the world.
chmod 400 awssecuritykey.pem
# Quick Tip: chmod 400 is a very restrictive setting, giving only the file owner read-only access: no write/execute capabilities for the owner, and no permissions whatsoever for anyone else.
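To confirm the permissions took effect (assuming both files are in /home/ubuntu/.ssh, where pscp placed the .pem):
ls -l ~/.ssh/authorized_keys ~/.ssh/awssecuritykey.pem
# expect -rw-r--r-- (644) for authorized_keys and -r-------- (400) for the .pem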
#To use ssh-agent and ssh-add, follow the steps below:
#go to the directory where the PEM file is stored and enter the following commands one after another. Make sure you use the backquote ( ` ), located under the tilde ( ~ ), rather than the single quote ( ' ). These commands should be repeated each time a terminal is opened (a sketch for automating this is shown after the SSH checks below).
eval `ssh-agent`
ssh-add awssecuritykey.pem
#checking localhost for ssh
ssh localhost # if this connects without errors, local SSH is working
#Note that the .pem file now has "read-only" permission (from the chmod 400 above), which is exactly what ssh-add requires, so this time it works for us.
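Since the agent commands must be repeated in every new terminal, one optional convenience (an assumption, not part of the original post) is to append them to ~/.bashrc; note that this starts a fresh agent per shell:
echo 'eval `ssh-agent` > /dev/null' >> ~/.bashrc
echo 'ssh-add ~/.ssh/awssecuritykey.pem 2> /dev/null' >> ~/.bashrc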
#checking remote SSH
# ssh ubuntu@<your-amazon-public-hostname>; if you are on the master, try connecting to slave1 or slave2, and vice versa.
ssh ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com
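Optionally (not in the original post), a ~/.ssh/config file on each node avoids typing full hostnames and removes the need for ssh-agent, since the key is picked up automatically; the aliases master/slave1/slave2 are illustrative:
Host master
    HostName ec2-54-174-46-166.compute-1.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/awssecuritykey.pem
Host slave1
    HostName ec2-54-165-137-226.compute-1.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/awssecuritykey.pem
Host slave2
    HostName ec2-54-173-232-142.compute-1.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/awssecuritykey.pem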
# Download and install Hadoop on each node. The following command downloads into the home folder (/home/ubuntu).
wget http://apache.mirror.gtcomm.net/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
# extracting
tar -xvf hadoop-1.2.1.tar.gz
# renaming for convenience
mv hadoop-1.2.1 hadoop
# optionally remove
rm hadoop-1.2.1.tar.gz
# update the PATH (via ~/.bashrc on Ubuntu)
# append the following lines to ~/.bashrc, then reload it with: source ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/home/ubuntu/hadoop
export HADOOP_HOME=/home/ubuntu/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
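After appending the lines, reload the shell configuration and sanity-check the setup:
source ~/.bashrc
echo $JAVA_HOME   # should print /usr/lib/jvm/java-7-openjdk-amd64
hadoop version    # should report Hadoop 1.2.1 if the PATH update worked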
# Update the following files on the master node only, or update them on your local PC and transfer them to the master and slaves via PSCP/WinSCP.
# hadoop-env.sh
# search for the JAVA_HOME parameter and update it with the Java home path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
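A one-line alternative for this edit, assuming the stock hadoop-env.sh ships with JAVA_HOME commented out (as the 1.2.1 release does):
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64|' ~/hadoop/conf/hadoop-env.sh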
# core-site.xml -- fs.default.name should point at the master (namenode) host:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-54-174-46-166.compute-1.amazonaws.com:8020</value>
    <final>true</final>
  </property>
</configuration>
#hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
# mapred-site.xml -- mapred.job.tracker is a host:port pair (no hdfs:// scheme) pointing at the master (jobtracker) host:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ec2-54-174-46-166.compute-1.amazonaws.com:8021</value>
    <final>true</final>
  </property>
</configuration>
#========================================================
# transfer the above configuration files from the master to each slave's hadoop/conf directory
cd hadoop/conf
scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml ubuntu@ec2-54-165-137-226.compute-1.amazonaws.com:/home/ubuntu/hadoop/conf
scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml ubuntu@ec2-54-173-232-142.compute-1.amazonaws.com:/home/ubuntu/hadoop/conf
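The same transfer as a loop over both slave hostnames (a convenience rewrite of the two commands above):
for host in ec2-54-165-137-226.compute-1.amazonaws.com \
            ec2-54-173-232-142.compute-1.amazonaws.com; do
  scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml \
      ubuntu@$host:/home/ubuntu/hadoop/conf
done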
# masters file: if the secondary namenode should start on another node, enter that node's hostname (ID) in this file; otherwise leave it blank.
# on the slave machines, keep the masters file blank.
# slaves file
# on the master machine, list both slave hostnames (IDs) in the slaves file:
ec2-54-165-137-226.compute-1.amazonaws.com
ec2-54-173-232-142.compute-1.amazonaws.com
# on each slave machine, only that machine's own hostname should be entered:
# on slave1, the slaves file contains only the line ec2-54-165-137-226.compute-1.amazonaws.com
# on slave2, the slaves file contains only the line ec2-54-173-232-142.compute-1.amazonaws.com
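One way to write these files on the master from the shell (a sketch; adjust per node as described above):
# on the master: empty masters file, both slaves listed in the slaves file
> ~/hadoop/conf/masters
printf '%s\n' \
  ec2-54-165-137-226.compute-1.amazonaws.com \
  ec2-54-173-232-142.compute-1.amazonaws.com > ~/hadoop/conf/slaves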
# format the file system (run on the master)
hadoop namenode -format
# on the master node: go to the bin directory; on Ubuntu, simply typing the script name and pressing Enter runs it.
cd hadoop/bin
start-all.sh
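To confirm the daemons actually started, jps (shipped with the JDK) lists the running Java processes:
jps
# on the master, expect processes like NameNode, SecondaryNameNode, JobTracker
# on each slave (run jps there), expect DataNode and TaskTracker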
Check cluster health status via the built-in web UIs:
namenode
http://ec2-54-174-46-166.compute-1.amazonaws.com:50070/dfshealth.jsp
jobtracker
http://ec2-54-174-46-166.compute-1.amazonaws.com:50030/jobtracker.jsp
slave (tasktracker) node status
http://ec2-54-165-137-226.compute-1.amazonaws.com:50060/tasktracker.jsp
http://ec2-54-173-232-142.compute-1.amazonaws.com:50060/tasktracker.jsp
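The same health information is available from the command line on the master:
hadoop dfsadmin -report   # summarizes capacity and lists the live datanodes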