Install Hadoop on AWS Free Tier


As an aside, Hadoop can also be deployed on Amazon EC2 using Apache Whirr: cd whirr-0.8.1/ and run bin/whirr launch-cluster --config hadoop-ec2.properties. This launches two micro instances on EC2 and installs Java and Hadoop; one instance runs the Hadoop NameNode and JobTracker, and the other runs the DataNode and TaskTracker. Also note that AWS "free tier" services are divided into three segments: 1. short-term free trials, 2. always-free offers, and 3. 12 months free. There are 170+ services, and about 85 of them are offered in the free tier.
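For reference, here is a minimal sketch of what a Whirr hadoop-ec2.properties can look like. The cluster name, instance type and credential variables are illustrative assumptions, not values taken from this tutorial:

    # Illustrative Whirr config for a small Hadoop cluster on EC2
    whirr.cluster-name=hadoopcluster
    # one NameNode+JobTracker instance, one DataNode+TaskTracker instance
    whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
    whirr.hardware-id=t1.micro
    whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
    whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub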

The goal of this blog is to automate Hadoop multi-node cluster installation and configuration on Amazon EC2 instances. If that's what you are looking for, then read on.

WARNING!!! This cluster is for practice purposes only! It is not highly secure. If you want a highly secure cluster, you will have to apply stricter security settings. The security settings are purposely kept loose here so that the cluster deploys smoothly without errors.

We will be building a 5-node cluster on Amazon EC2 t2.micro instances. The Hadoop version we will be using is 1.2.1, and the operating system will be Ubuntu Server 14.04 LTS (HVM). The username is ubuntu on all 5 machines.

The hostnames of the 5 machines will be nn for the NameNode, 2nn for the Secondary NameNode, and d1, d2 and d3 for DataNodes 1, 2 and 3.
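To make these names resolvable, the /etc/hosts file on every machine will map each hostname to the corresponding instance's private IP, along the lines of this sketch (the IP addresses shown are placeholders; use your own instances' private IPs):

    # /etc/hosts entries (placeholder private IPs)
    172.31.0.11  nn
    172.31.0.12  2nn
    172.31.0.13  d1
    172.31.0.14  d2
    172.31.0.15  d3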


This is how the Hadoop Daemons will run on the 5 instances:

The NameNode and JobTracker will run on nn.

The Secondary NameNode will run on 2nn.

The TaskTracker and DataNode daemons will run on d1, d2 and d3.
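This layout maps directly onto Hadoop 1.x's conf/masters and conf/slaves files on nn: conf/masters (despite its name) lists the host that runs the Secondary NameNode, and conf/slaves lists the hosts that run the DataNode and TaskTracker daemons:

    # conf/masters on nn -- host running the Secondary NameNode
    2nn

    # conf/slaves on nn -- hosts running DataNode and TaskTracker
    d1
    d2
    d3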

Before starting, you will have to download two tools (PuTTY and PuTTYgen), the Hadoop bash script and the EC2 hosts config file:

1. PuTTY: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe

2. PuTTYGEN: http://the.earth.li/~sgtatham/putty/latest/x86/puttygen.exe

3. Hadoop Bash Script: https://drive.google.com/file/d/0B2T8Pye0P7e5Qm93cWwwN3ZOUUk/view?usp=sharing

4. Hadoop hosts Config: https://drive.google.com/file/d/0B2T8Pye0P7e5bUVPdlFoWXhUNjQ/view?usp=sharing

Let's start with the multi-node cluster.

STEP 1: Logging in to your Amazon Web Services account

If you do not have an AWS account, create one. You will have to add a credit/debit card and fax a copy of a government ID to Amazon, and activation takes 1 or 2 days. After logging into your AWS account, you need to do two things:

A. Create a Keypair for logging into your instances.

B. Create a security group.

Important!! Name your Amazon private key "Key1" so that everything goes smoothly throughout the tutorial. You can name it whatever you like, but then carefully edit the bash script according to the name of your key.
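If you prefer the command line to the console, a rough AWS CLI equivalent is sketched below. The security group name and the wide-open 0.0.0.0/0 rules are illustrative assumptions in the spirit of this tutorial's deliberately loose security; tighten them for anything beyond practice:

    # Create the key pair and save the private key as Key1.pem
    aws ec2 create-key-pair --key-name Key1 \
        --query 'KeyMaterial' --output text > Key1.pem

    # Create a security group and open SSH plus the Hadoop web UI ports
    aws ec2 create-security-group --group-name hadoop-cluster \
        --description "Hadoop practice cluster"
    for port in 22 50030 50070; do
        aws ec2 authorize-security-group-ingress --group-name hadoop-cluster \
            --protocol tcp --port $port --cidr 0.0.0.0/0
    done

    # Let the cluster nodes talk to each other on all ports
    aws ec2 authorize-security-group-ingress --group-name hadoop-cluster \
        --protocol -1 --source-group hadoop-cluster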

Below is the video on how to create a key pair and a security group:

STEP 2: Creating the EC2 Instances


We require 5 instances. We will choose the t2.micro instance type with the default 8GB EBS (Elastic Block Store) volume for each instance. The operating system on all the instances will be Ubuntu Server 14.04 LTS; the bash script may not work if you choose any other operating system. The total EBS usage will be 40GB, of which 30GB is free under the free tier scheme. The remaining 10GB is charged at a negligible rate of $0.10 per GB per month, i.e. about $1 per month. In my experience, Amazon EBS is billed hourly rather than monthly; these two discussions also seem to confirm it: http://stackoverflow.com/questions/5468535/amazon-ebs-pricing-monthly-daily-hourly and http://serverfault.com/questions/197379/amazon-ebs-charges-calculation

If you don't know much about the free tier, please carefully look at everything that is free under the free tier scheme (link: http://aws.amazon.com/free/). As far as this tutorial is concerned, the t2.micro instance up to 750 hrs/month, 30GB of EBS, Ubuntu Server 14.04 and 5GB of S3 storage are free (as of January 2015).

Upload the Key1.pem file to an S3 bucket, and replace the private-key-link and private-key-name entries in the EC2-Launch-Config-Multinode file, as shown in the video.
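If you would rather upload from a shell than from the S3 console, a one-line sketch with the AWS CLI (the bucket name is a placeholder):

    # Upload the private key to your own S3 bucket
    aws s3 cp Key1.pem s3://your-bucket-name/Key1.pem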

Here is the video on how to create the 5 instances:

Important!!! Do not forget to paste the bash script into User Data under Advanced Details while configuring the t2.micro instances.
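The script to paste is the one downloaded at the start of this post. Purely to illustrate the kind of work such a user-data script performs, a trimmed sketch might look like the following; every path and URL here is a placeholder, not the tutorial's actual script:

    #!/bin/bash
    # Illustrative only -- use the real script linked earlier in this post.
    apt-get update
    apt-get install -y openjdk-7-jdk    # Hadoop 1.2.1 runs fine on Java 7
    # Fetch and unpack Hadoop 1.2.1
    wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    tar -xzf hadoop-1.2.1.tar.gz -C /home/ubuntu
    # Fetch the private key uploaded to S3 so the nodes can SSH to each other
    wget -O /home/ubuntu/.ssh/Key1.pem "<private-key-link>"
    chmod 600 /home/ubuntu/.ssh/Key1.pem
    chown -R ubuntu:ubuntu /home/ubuntu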

Wait till you get 2/2 checks for all 5 instances in the Status Checks column. It will take around 3-5 mins.

STEP 3: Logging in to the machines and configuring them


In this step we will log in to all 5 instances and add and edit the hosts, hostname, masters and slaves config files. Here is the video for steps 3 and 4:

Tip!!! Right-clicking on the PuTTY terminal pastes whatever is in the clipboard. While copy-pasting the commands, select one extra line so that an "Enter" is sent along with the commands. Carefully follow all the steps in the same order for a smooth deployment.
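As a sketch of what the per-machine configuration involves (shown for nn; the other machines are analogous with their own names, and the file name hadoop-hosts-config is a placeholder for the hosts config downloaded at the start):

    # On the nn instance: set the hostname (takes full effect after a reboot)
    echo nn | sudo tee /etc/hostname
    sudo hostname nn

    # Append the cluster's host mappings to /etc/hosts on every machine
    cat hadoop-hosts-config | sudo tee -a /etc/hosts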

You can check the cluster summary from your browser over here:


1. Namenode summary at [Namenode(nn) Public DNS]:50070

2. Jobtracker summary at [Namenode(nn) Public DNS]:50030

STEP 4: Starting the cluster and exploring Hadoop

In this step we will fire up the Hadoop daemons and start exploring.
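With Hadoop 1.2.1, firing up the cluster from the NameNode boils down to commands along these lines (run as the ubuntu user on nn; the Hadoop install path is an assumption):

    # On nn: format HDFS once, then start the HDFS and MapReduce daemons
    cd /home/ubuntu/hadoop-1.2.1
    bin/hadoop namenode -format
    bin/start-dfs.sh      # starts NameNode, Secondary NameNode, DataNodes
    bin/start-mapred.sh   # starts JobTracker and TaskTrackers
    jps                   # lists the Java daemons running on this machine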


Hadoop Shell commands: http://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html#cat
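For example, a first session on HDFS might look like this (file and directory names are placeholders):

    # Create a directory in HDFS, copy a local file in, and read it back
    bin/hadoop fs -mkdir /user/ubuntu/input
    bin/hadoop fs -put mylocalfile.txt /user/ubuntu/input/
    bin/hadoop fs -ls /user/ubuntu/input
    bin/hadoop fs -cat /user/ubuntu/input/mylocalfile.txt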

Google for use cases of a multi-node Hadoop cluster. Let me know in the comments if you get stuck anywhere in the tutorial; I will help you resolve the issues.


I hope this blog was informative for you, and thank you for reading it.


-Mohammad Yusuf Ghazi

