Monday, November 24, 2014

Installing Hadoop on AWS free instances


Assumptions:-
You start from a clean, basic Ubuntu AMI.
You run free instances on AWS, so the IP addresses (and public host names) may change; each time they do, they must be updated in every file that mentions them.
You connect to the AWS instances from Windows using PuTTY.


For convenience, the host names used in this post are listed below.
Master
MasterID :- ec2-54-174-46-166.compute-1.amazonaws.com
Slaves
Slave1 :- ec2-54-165-137-226.compute-1.amazonaws.com
Slave2 :- ec2-54-173-232-142.compute-1.amazonaws.com

PEM file :-
awssecuritykey.pem

PPK file :-
PuttyKey.ppk
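
Because the host names change over time, it can help to keep them in one small file and source it before running the later commands. This is only a minimal sketch: the file name cluster-hosts.sh, the variable names MASTER, SLAVE1 and SLAVE2, and the helper all_nodes are hypothetical and are not part of the original steps.

```shell
# cluster-hosts.sh (hypothetical file name): edit these values whenever the
# AWS host names change, then run ". cluster-hosts.sh" in the terminal.
MASTER=ec2-54-174-46-166.compute-1.amazonaws.com
SLAVE1=ec2-54-165-137-226.compute-1.amazonaws.com
SLAVE2=ec2-54-173-232-142.compute-1.amazonaws.com

# Print every node, one per line, so that loops can iterate over the cluster.
all_nodes() {
    printf '%s\n' "$MASTER" "$SLAVE1" "$SLAVE2"
}
```

For example, "for node in $(all_nodes); do echo "$node"; done" lists the three machines.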

Connecting to an AWS instance using PuTTY:-
1) Convert awssecuritykey.pem into PuttyKey.ppk using PuTTYgen.
2) Open PuTTY.
a) Go to Session and, in the Host Name box, enter username@<host name of the master> (for example ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com), with port 22.
b) Go to Connection --> SSH --> Auth, click "Browse" next to the private key box, and select the PPK key generated in step 1.
On the first connection PuTTY may ask whether to store the server's host key. Check the key, then select "Yes".


# Run the following commands on each node (master and slaves)
sudo apt-get update
sudo apt-get upgrade

#install JDK
sudo apt-get install openjdk-7-jdk

# Copy your PEM file from the local folder to all your instances, so that SSH connections between the servers can be authenticated.
# Use the PSCP command, available in the PuTTY folder. WinSCP, which has a GUI, can also be used from Windows.

pscp -i D:\Downloads\PuttyKey.ppk D:\Downloads\awssecuritykey.pem ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com:/home/ubuntu/.ssh

pscp -i D:\Downloads\PuttyKey.ppk D:\Downloads\awssecuritykey.pem ubuntu@ec2-54-165-137-226.compute-1.amazonaws.com:/home/ubuntu/.ssh

pscp -i D:\Downloads\PuttyKey.ppk D:\Downloads\awssecuritykey.pem ubuntu@ec2-54-173-232-142.compute-1.amazonaws.com:/home/ubuntu/.ssh


# To avoid SSH permission errors on the key files, run the following commands in /home/ubuntu/.ssh on each node

chmod 644 authorized_keys

# Quick Tip: With 'chmod 644', the file can be read and written by you, but only read by everyone else.

chmod 400 awssecuritykey.pem

# Quick Tip: 'chmod 400' is a very restrictive setting that gives only the file owner read-only access: no write or execute permission for the owner, and no permissions whatsoever for anyone else.
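
To confirm that the permissions were applied, the octal mode of each file can be printed. This is a small sketch, assuming GNU stat as shipped with Ubuntu; the helper name perm_of is hypothetical.

```shell
# Print the octal permission bits of a file, for example 400 or 644.
# Uses the GNU form of stat, which is what Ubuntu provides.
perm_of() {
    stat -c '%a' "$1"
}

# Usage on a node, from /home/ubuntu/.ssh:
#   perm_of awssecuritykey.pem   # 400 is expected after the chmod above
#   perm_of authorized_keys      # 644 is expected after the chmod above
```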

# To use ssh-agent and ssh-add, follow the steps below.
# Go to the directory where the PEM file is stored and enter the following commands one after another. They must be repeated each time a new terminal is opened.

eval `ssh-agent`
ssh-add awssecuritykey.pem

# Checking SSH to localhost
ssh localhost # if no error is shown, SSH is working


# Note: in eval `ssh-agent`, make sure you use the backquote ( ` ), located under the tilde ( ~ ), rather than the single quote ( ' ).
# The .pem file now has "read-only" permission, which is why ssh-add accepts it this time.

# Checking remote SSH: use the public host name of another node. If you are on the master, try connecting to slave1 or slave2, and vice versa.
ssh ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com
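
As an alternative to repeating ssh-agent and ssh-add in every terminal, the key can be named in an ssh config file so that ssh picks it up by itself. The sketch below is not part of the original procedure; the helper name add_ssh_hosts is hypothetical, while Host, User and IdentityFile are standard OpenSSH client options.

```shell
# Append one Host entry per host name to an ssh config file.
# $1 = config file (normally ~/.ssh/config), $2 = path to the PEM key,
# remaining arguments = host names of the other nodes.
add_ssh_hosts() {
    config=$1
    key=$2
    shift 2
    for host in "$@"; do
        printf 'Host %s\n    User ubuntu\n    IdentityFile %s\n\n' "$host" "$key" >> "$config"
    done
    # Keep the config file private to its owner.
    chmod 600 "$config"
}

# Usage on the master:
#   add_ssh_hosts ~/.ssh/config ~/.ssh/awssecuritykey.pem \
#       ec2-54-165-137-226.compute-1.amazonaws.com \
#       ec2-54-173-232-142.compute-1.amazonaws.com
```

The entries have to be regenerated whenever the host names change.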


# Download and install Hadoop on each node. The following command downloads the archive into the current directory (the home folder /home/ubuntu right after login).
wget http://apache.mirror.gtcomm.net/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

# extracting
tar -xvf hadoop-1.2.1.tar.gz

# renaming for convenience
mv hadoop-1.2.1 hadoop

# optionally remove the downloaded archive
rm hadoop-1.2.1.tar.gz


# Updating the path: add the following lines to .bashrc (on Ubuntu), then reload it with "source ~/.bashrc"
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/home/ubuntu/hadoop
export HADOOP_HOME=/home/ubuntu/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
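
If these steps are repeated, the same export lines can end up in .bashrc several times. A small sketch of one way to avoid that, using a hypothetical helper named append_once:

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = line to add, $2 = target file.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage (single quotes keep $PATH and $HADOOP_INSTALL unexpanded in the file):
#   append_once 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' ~/.bashrc
#   append_once 'export HADOOP_INSTALL=/home/ubuntu/hadoop' ~/.bashrc
#   append_once 'export HADOOP_HOME=/home/ubuntu/hadoop' ~/.bashrc
#   append_once 'export PATH=$PATH:$HADOOP_INSTALL/bin' ~/.bashrc
```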


# Update the following files on the master node only, or edit them on the local PC. If they are edited on the local PC, they must then be transferred to the master and the slaves using PSCP or WinSCP.

#hadoop-env.sh
# Search for the JAVA_HOME parameter and set it to the Java home path.

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

#core-site.xml (use the master's current host name in fs.default.name)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ec2-54-173-232-142.compute-1.amazonaws.com:8020</value>
<final>true</final>
</property>
</configuration>
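
Since the host name in this file has to be edited every time the addresses change, the file can also be generated from a single value. This is a minimal sketch, assuming a hypothetical helper named write_core_site; the XML it writes is the same as above, with the host name passed in as an argument.

```shell
# Write core-site.xml for the given master host into the given conf directory.
# $1 = master host name, $2 = conf directory (for example /home/ubuntu/hadoop/conf).
write_core_site() {
    master=$1
    conf_dir=$2
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://$master:8020</value>
<final>true</final>
</property>
</configuration>
EOF
}

# Usage:
#   write_core_site ec2-54-173-232-142.compute-1.amazonaws.com /home/ubuntu/hadoop/conf
```

The same pattern could be applied to mapred-site.xml.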


#hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

</configuration>




#mapred-site.xml (use the master's current host name in mapred.job.tracker)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>


<property>
<name>mapred.job.tracker</name>
<value>ec2-54-172-198-182.compute-1.amazonaws.com:8021</value>
<final>true</final>
</property>

</configuration>

#========================================================

# Transfer the above configuration files from the master to the hadoop/conf folder on each slave

cd hadoop/conf

scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml ubuntu@ec2-54-165-137-226.compute-1.amazonaws.com:/home/ubuntu/hadoop/conf

scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml ubuntu@ec2-54-174-46-166.compute-1.amazonaws.com:/home/ubuntu/hadoop/conf
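
With more slaves, the same scp command can be wrapped in a loop. A small sketch, run from hadoop/conf on the master; the helper name push_conf and the DRY_RUN switch are hypothetical additions for illustration.

```shell
# Copy the four configuration files to each slave given as an argument.
# Set DRY_RUN=1 to print the scp commands instead of running them.
push_conf() {
    for slave in "$@"; do
        ${DRY_RUN:+echo} scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml \
            "ubuntu@$slave:/home/ubuntu/hadoop/conf"
    done
}

# Usage, with the two hosts from the commands above:
#   push_conf ec2-54-165-137-226.compute-1.amazonaws.com \
#             ec2-54-174-46-166.compute-1.amazonaws.com
```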


# masters file: if the secondary namenode should start on another node, enter that node's host name in this file; otherwise leave it blank.
# On the slave machines, keep the masters file blank.

#slaves file
# On the master machine, list the host names of both slaves in the slaves file.
ec2-54-165-137-226.compute-1.amazonaws.com
ec2-54-173-232-142.compute-1.amazonaws.com

# On each slave machine, the slaves file should contain only that machine's own host name.
# On slave 1, the slaves file has only this line: ec2-54-165-137-226.compute-1.amazonaws.com
# On slave 2, the slaves file has only this line: ec2-54-173-232-142.compute-1.amazonaws.com
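
The slaves file is plain text with one host name per line, so it can be written from the command line instead of an editor. A minimal sketch with a hypothetical helper named write_slaves:

```shell
# Overwrite a slaves file with the given host names, one per line.
# $1 = path of the slaves file, remaining arguments = host names.
write_slaves() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Usage on the master (both slaves):
#   write_slaves /home/ubuntu/hadoop/conf/slaves \
#       ec2-54-165-137-226.compute-1.amazonaws.com \
#       ec2-54-173-232-142.compute-1.amazonaws.com
# Usage on slave 1 (its own host name only):
#   write_slaves /home/ubuntu/hadoop/conf/slaves ec2-54-165-137-226.compute-1.amazonaws.com
```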

# Format the file system (on the master node)
hadoop namenode -format

# On the master node: go to the bin directory. On Ubuntu, simply typing the script name and pressing Enter works.

cd hadoop/bin
start-all.sh
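
After start-all.sh finishes, the jps tool from the JDK lists the running Java processes, which gives a quick check that the daemons came up. The sketch below is an addition to the original steps: check_daemons is a hypothetical helper that reads jps output and reports any expected daemon that is missing. The daemon names are the usual Hadoop 1.x ones; which of them run on which node depends on the masters and slaves files.

```shell
# Read `jps` output on stdin and report every expected daemon that is missing.
# Arguments are the daemon names to look for. Returns 0 only if all are present.
check_daemons() {
    listing=$(cat)
    rc=0
    for daemon in "$@"; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$listing" | grep -qw -- "$daemon"; then
            echo "missing: $daemon"
            rc=1
        fi
    done
    return $rc
}

# Usage on the master:  jps | check_daemons NameNode JobTracker
# Usage on a slave:     jps | check_daemons DataNode TaskTracker
```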



# Check the health status by opening the following pages in a browser

# name node
http://ec2-54-173-232-142.compute-1.amazonaws.com:50070/dfshealth.jsp

# jobtracker
http://ec2-54-173-232-142.compute-1.amazonaws.com:50030/jobtracker.jsp

# slave node status
http://ec2-54-174-46-166.compute-1.amazonaws.com:50060/tasktracker.jsp
http://ec2-54-165-137-226.compute-1.amazonaws.com:50060/tasktracker.jsp
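
The same pages can be probed from a terminal instead of a browser. This sketch assumes a hypothetical helper named status_urls, which only builds the URLs from the host names using the ports and pages listed above; the curl loop in the usage comment prints the HTTP status code for each one.

```shell
# Print the Hadoop 1.x status page URLs.
# $1 = master host name, remaining arguments = hosts running a tasktracker.
status_urls() {
    master=$1
    shift
    echo "http://$master:50070/dfshealth.jsp"
    echo "http://$master:50030/jobtracker.jsp"
    for slave in "$@"; do
        echo "http://$slave:50060/tasktracker.jsp"
    done
}

# Usage (a status code of 200 means the page answered):
#   status_urls ec2-54-173-232-142.compute-1.amazonaws.com \
#       ec2-54-174-46-166.compute-1.amazonaws.com \
#       ec2-54-165-137-226.compute-1.amazonaws.com |
#   while read -r url; do
#       curl -s -o /dev/null -w "%{http_code} $url\n" "$url"
#   done
```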
