Hadoop 2.6.1 HDFS, YARN and MapReduce working successfully #4

Open
wants to merge 24 commits into
base: master
c1caf37
Deleted unnecessary files
hemenkapadia Apr 9, 2015
0c4e114
Working vagrantfile without provisioning. Moved directories under puppet
hemenkapadia Apr 9, 2015
95a77fa
Working librarian-puppet
hemenkapadia Apr 10, 2015
315a81d
Rename base-hadoop.pp file to main.pp
hemenkapadia Apr 10, 2015
20955b6
Working puppet provisioning. Hadoop user, group and home dir creation…
hemenkapadia Apr 11, 2015
3270788
Refactor vagrantfile. Working VM setup and puppet site provisioning
hemenkapadia Apr 15, 2015
a6391cf
Working Hadoop. Still need to work on conf files per 2.6.0 standard
hemenkapadia Apr 16, 2015
9144f7d
Commeting Hadoop module. All set for installation using Ambari
hemenkapadia Jun 17, 2015
7740aa0
Merge pull request #1 from hemenkapadia/hemen_initial
hemenkapadia Jun 17, 2015
519cdb0
Working ssh communication and Java install. Fixes issue #4 and issue #4
hemenkapadia Oct 19, 2015
aed6834
Merge branch 'master' of https://github.com/hemenkapadia/vagrant-hado…
hemenkapadia Oct 19, 2015
cd33ddc
Merge pull request #5 from hemenkapadia/hemen_initial
hemenkapadia Oct 19, 2015
dd8bca0
Organizing branch
hemenkapadia Oct 19, 2015
9e30125
Completed Hadoop code, minor issues pending
hemenkapadia Oct 20, 2015
e7be244
working HDFS
hemenkapadia Oct 21, 2015
a8f6bd1
Working Hadoop MapReduce
hemenkapadia Oct 22, 2015
312a7d2
Fixes #7. Hadoop 2.6.1 successfully running on cluster
hemenkapadia Oct 22, 2015
99cac8d
Merge pull request #9 from hemenkapadia/hadoop
hemenkapadia Oct 22, 2015
3377c56
Fixes issue #8 and some of the gaps mentioned in issue #7
hemenkapadia Nov 4, 2015
88a5e12
minor typo in readme
hemenkapadia Nov 4, 2015
6bd4a80
Minor chnages to the Readme file
hemenkapadia Nov 4, 2015
9b3d8ab
Fixes #15 - ubuntu 14.04 LTS, upgraded puppet 4.x that comes with it.
hemenkapadia May 14, 2016
1c7386e
Fixes #14 - Upgrade Java and hadoop to latest versions
hemenkapadia May 14, 2016
cbb6cce
Readme update
hemenkapadia May 14, 2016
4 changes: 3 additions & 1 deletion .gitignore
@@ -1 +1,3 @@
.vagrant/
.vagrant/
*.zip
*.log
1 change: 0 additions & 1 deletion .ruby-version

This file was deleted.

1 change: 0 additions & 1 deletion .vagrant.v1.1401182991

This file was deleted.

130 changes: 0 additions & 130 deletions AWS/Vagrantfile

This file was deleted.

103 changes: 90 additions & 13 deletions README.md
@@ -1,30 +1,107 @@
vagrant-hadoop-cluster
======================

Deploying hadoop in a virtualized cluster in simple steps
Motivation
----------

These are the files that support the blogpost http://cscarioni.blogspot.co.uk/2012/09/setting-up-hadoop-virtual-cluster-with.html
* Hassle-free setup of a 4-node VM Hadoop cluster
* to understand the workings of hadoop in an actual cluster environment
* to provide a platform to quickly evaluate projects in the Hadoop ecosystem
* and to be able to do this on a single machine

For using them.
System Requirements
-------------------

Simply clone the repository, then
At a minimum you need a system with good processing power and 8 GB of RAM; 16 GB or more is recommended. If you have sufficient RAM, increase the RAM allotted to each node in the Vagrantfile.

`gem install vagrant `
Installation
------------

`vagrant box add base-hadoop http://files.vagrantup.com/lucid64.box`
### Step 1. System Set-up

Install the following tools on your system (referred to as `HOST` throughout this document):

Maybe generate your own ssh-keygen pair of keys.. and replace them in the files id_rsa and id_rsa.pub in the modules/hadoop/files directory. Or for testing copy the provided `id_rsa` and `id_rsa.pub` into your `.ssh` directory.
1. [Git](http://git-scm.com/downloads)
2. [Virtualbox](https://www.virtualbox.org/wiki/Downloads)
3. [Vagrant](https://www.vagrantup.com/downloads.html)


`vagrant up`
### Step 2. Clone the GitHub repository

then
Run this command to clone the vagrant-hadoop-cluster repository on the `HOST`:

`vagrant ssh master`
    git clone https://github.com/hemenkapadia/vagrant-hadoop-cluster

`cd /opt/hadoop-xxx/bin`
### Step 3. Start the cluster

`./hadoop namenode -format`
Run these commands to provision the Hadoop cluster:

`./start-all`
    cd vagrant-hadoop-cluster
    vagrant up

This will take some time to provision 4 nodes.

### Step 4. Format namenode and start services

    vagrant ssh master.local

Once you are in the VM guest master.local, enter the following commands

    sudo su - hadoop
    hdfs namenode -format
    start-dfs.sh
    start-yarn.sh

### Step 5. Validate that the services below are running

    hekapadi@HEKAPADI-W7-3 ~/Workspaces/Personal/vagrant-hadoop-cluster (master)
    $ vagrant ssh master.local
    Welcome to Ubuntu 12.04.5 LTS (GNU/Linux 3.13.0-32-generic x86_64)

    * Documentation: https://help.ubuntu.com/
    New release '14.04.2 LTS' available.
    Run 'do-release-upgrade' to upgrade to it.

    Last login: Tue Nov 3 14:22:50 2015 from 10.0.2.2
    vagrant@master:~$ sudo su - hadoop
    hadoop@master:~$ jps
    18951 ResourceManager
    15454 Jps
    14634 SecondaryNameNode
    14447 NameNode
    hadoop@master:~$ ssh hadoop1.local jps
    12671 DataNode
    12773 NodeManager
    13206 Jps
    hadoop@master:~$ ssh hadoop2.local jps
    12778 NodeManager
    13211 Jps
    12676 DataNode
    hadoop@master:~$ ssh hadoop3.local jps
    12670 DataNode
    12772 NodeManager
    13206 Jps
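
The check in Step 5 can also be scripted. The following is a minimal sketch, not part of the repository: it assumes each `jps` line has the form `<pid> <ClassName>` as in the transcript above, and that the master and slave daemons are exactly the ones shown there. The constant `EXPECTED_DAEMONS` and the helper `missing_daemons` are hypothetical names introduced for illustration.

```ruby
# Expected daemons per node role, taken from the jps transcript above.
# (Assumption: no other daemons are required on either role.)
EXPECTED_DAEMONS = {
  master: %w[NameNode SecondaryNameNode ResourceManager],
  slave:  %w[DataNode NodeManager]
}.freeze

# Returns the expected daemons that do NOT appear in the given jps output.
# Each jps line is assumed to look like "<pid> <ClassName>".
def missing_daemons(jps_output, role)
  running = jps_output.lines.map { |line| line.split[1] }.compact
  EXPECTED_DAEMONS.fetch(role) - running
end

# Sample input copied from the master section of the transcript.
master_jps = <<~OUT
  18951 ResourceManager
  15454 Jps
  14634 SecondaryNameNode
  14447 NameNode
OUT

# An empty list means every expected daemon was found.
puts missing_daemons(master_jps, :master).inspect
```

On a live cluster the input could come from something like `` `ssh hadoop1.local jps` `` run as the `hadoop` user on the master, with the role set to `:slave`.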

Web Consoles
------------
Once the services are up and running on the master as well as the slaves, as verified by the check above, the following web consoles are available:

* HDFS - http://192.168.48.10:50070/dfshealth.html#tab-overview
* YARN - http://192.168.48.10:8088/cluster/cluster
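
Whether the two consoles respond can be probed from the `HOST` without a browser. The sketch below is an illustration under stated assumptions, not part of the repository: it uses Ruby's standard `net/http` library, takes the IP and ports from the URLs listed above, and treats any 2xx or 3xx response as "up". The helper names `console_urls` and `console_up?` are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Console URLs as listed above; the default IP is the master's address
# in the Vagrantfile.
def console_urls(master_ip = '192.168.48.10')
  {
    hdfs: "http://#{master_ip}:50070/dfshealth.html#tab-overview",
    yarn: "http://#{master_ip}:8088/cluster/cluster"
  }
end

# Returns true when the URL answers with an HTTP 2xx or 3xx status
# within the timeout (seconds); any connection error counts as "down".
def console_up?(url, timeout: 3)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  false
end
```

With the cluster running, `console_urls.each { |name, url| puts "#{name}: #{console_up?(url) ? 'up' : 'down'}" }` prints one status line per console.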

System Details
--------------

Note: items that are not hyperlinked are not yet installed, but are planned.

1. [Ubuntu 14.04 64 bit](https://atlas.hashicorp.com/puppetlabs/boxes/ubuntu-14.04-64-puppet)
2. [OpenJDK 8](http://openjdk.java.net/install/index.html)
3. [Hadoop 2.7.2](http://hadoop.apache.org/docs/r2.7.2/)
4. CRAN R and RStudio Server Community Edition
5. Spark
6. Crunch/Cascading/CDAP Cask
7. Hbase
8. Pig
9. Flume and Kafka
10. Storm / Flink
11. Oozie
118 changes: 79 additions & 39 deletions Vagrantfile
@@ -1,41 +1,81 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant::Config.run do |config|
config.vm.box = "base-hadoop"
config.vm.customize [
"modifyvm", :id,
"--memory", "1024"
]
config.vm.provision :puppet do |puppet|
puppet.manifests_path = "manifests"
puppet.manifest_file = "base-hadoop.pp"
puppet.module_path = "modules"
end

config.vm.define :backup do |box|
box.vm.network :hostonly, "10.10.0.51"
box.vm.host_name = "backup"
end

config.vm.define :hadoop1 do |hadoop1_config|
hadoop1_config.vm.network :hostonly, "10.10.0.53"
hadoop1_config.vm.host_name = "hadoop1"
end

config.vm.define :hadoop2 do |hadoop2_config|
hadoop2_config.vm.network :hostonly, "10.10.0.54"
hadoop2_config.vm.host_name = "hadoop2"
end

config.vm.define :hadoop3 do |hadoop3_config|
hadoop3_config.vm.network :hostonly, "10.10.0.55"
hadoop3_config.vm.host_name = "hadoop3"
end

config.vm.define :master do |master_config|
master_config.vm.network :hostonly, "10.10.0.52"
master_config.vm.host_name = "master"
end
# Reference

# https://github.com/patrickdlee/vagrant-examples/blob/master/example7/Vagrantfile
# https://github.com/calo81/vagrant-hadoop-cluster
# http://www.highlyscalablesystems.com/3597/hadoop-installation-tutorial-hadoop-2-x/
# http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.15/bk_cluster-planning-guide/content/typical-hadoop-cluster-hardware.html
# https://github.com/Cascading/vagrant-cascading-hadoop-cluster/blob/2.7/Vagrantfile
# https://github.com/apache/bigtop/tree/master/bigtop-deploy/vm/vagrant-puppet-vm
# http://codiply.com/blog/hadoop-2-6-0-cluster-setup-on-ubuntu-14-04 - Good one for hadoop file config


# Warning

# VT-x/AMD-V is required to be enabled in your bios to boot a 64 bit guest machine
# To check if your CPU supports virtualization refer https://www.grc.com/securable.htm
# If it does, enable the same in BIOS


VAGRANTFILE_API_VERSION = "2"

# master runs HDFS NameNode, YARN ResourceManager, HBase Master (optional)
# slaves run HDFS DataNode, YARN NodeManager, HBase RegionServers (optional)

# increase ram as needed, add additional slave nodes if needed.

nodes = [
  { :type => 'master',
    :hostname => 'master.local',
    :ip => '192.168.48.10',
    :cpus => '1',
    :ram => '4096' },

  { :type => 'slave',
    :hostname => 'hadoop1.local',
    :ip => '192.168.48.11',
    :cpus => '1',
    :ram => '2048' },

  { :type => 'slave',
    :hostname => 'hadoop2.local',
    :ip => '192.168.48.12',
    :cpus => '1',
    :ram => '2048' },

  { :type => 'slave',
    :hostname => 'hadoop3.local',
    :ip => '192.168.48.13',
    :cpus => '1',
    :ram => '2048' },

]

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  nodes.each do |node|
    config.vm.define node[:hostname] do |node_config|
      node_config.vm.box = "puppetlabs/ubuntu-14.04-64-puppet"
      node_config.vm.box_version = "1.0.3" # optional, can be removed to default to latest version
      node_config.vm.box_check_update = false # disabling for now
      node_config.vm.host_name = node[:hostname]
      node_config.vm.network "private_network", ip: node[:ip]
      node_config.vm.provider :virtualbox do |vb|
        vb.customize ["modifyvm", :id, "--cpus", node[:cpus]]
        vb.customize ["modifyvm", :id, "--memory", node[:ram]]
        # vb.gui = true # uncomment this line to debug virtual machine startup issues
      end
      # Shell provisioning - bootstrap for puppet
      # install - git, ruby and librarian-puppet
      node_config.vm.provision :shell, :path => 'shell/main.sh'
      # Puppet provisioning
      node_config.vm.provision "puppet" do |puppet|
        #puppet.options = "--verbose --debug" # uncomment to enable verbose mode
        puppet.environment_path = "puppet/environments"
        puppet.environment = "dev"
        puppet.manifests_path = "puppet/environments/dev/manifests"
        puppet.manifest_file = node[:type] + ".pp"
        puppet.module_path = "puppet/modules"
      end
    end
  end
end
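
The per-node settings in this Vagrantfile determine both the total guest memory and which Puppet manifest each VM receives. The sketch below illustrates both, assuming the node list is copied by hand from the diff above (trimmed to the fields used here); `NODES`, `total_ram_mb` and `manifest_for` are hypothetical names that do not exist in the repository.

```ruby
# Node list copied from the Vagrantfile in this diff (only the fields used here).
NODES = [
  { type: 'master', hostname: 'master.local',  ram: '4096' },
  { type: 'slave',  hostname: 'hadoop1.local', ram: '2048' },
  { type: 'slave',  hostname: 'hadoop2.local', ram: '2048' },
  { type: 'slave',  hostname: 'hadoop3.local', ram: '2048' }
].freeze

# Sum of the memory (in MB) assigned to all guests.
# The Vagrantfile stores RAM as strings, so convert before adding.
def total_ram_mb(nodes)
  nodes.sum { |node| Integer(node[:ram]) }
end

# Mirrors the expression `node[:type] + ".pp"` used for puppet.manifest_file.
def manifest_for(node)
  node[:type] + '.pp'
end

puts "Total guest RAM: #{total_ram_mb(NODES)} MB"   # prints "Total guest RAM: 10240 MB"
NODES.each { |node| puts "#{node[:hostname]} -> #{manifest_for(node)}" }
```

With the defaults in this diff the guests are assigned 10240 MB in total, which is worth comparing against the RAM available on the `HOST` before running `vagrant up`.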