Hadoop

If the package option is not showing, mark that folder as a root folder (Mark Directory as) and the package option will appear.

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. Hadoop can be run using Docker.

set up hadoop cluster

mapreduce program link: youtube

questions: Link

/home/shreshthajit/docker-hadoop# docker cp ../Desktop/hadoop-mapreduce-examples-2.7.1-sources.jar 013d76107704:hadoop-mapreduce-examples-2.7.1-sources.jar

First switch to the root user:

sudo su

docker problems: Problem

To stop a running container:
docker kill containerID

To remove a container:
docker rm containerID

To stop and start the docker-compose services:
docker-compose stop
docker-compose start
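
The kill and rm steps above can be combined into one small helper. This is only a sketch: `remove_container` is a made-up name, and the `DOCKER` variable is an assumption added here so the commands can be previewed (for example with `DOCKER=echo`) before running them for real.

```shell
# DOCKER is an assumed override hook, not a Docker feature:
# set DOCKER=echo to print the commands instead of running them.
DOCKER=${DOCKER:-docker}

# Hypothetical helper: kill a running container, then remove it.
# $1 is the container ID or name shown by "docker ps".
remove_container() {
    $DOCKER kill "$1" && $DOCKER rm "$1"
}

# Usage (needs a running Docker daemon):
#   remove_container containerID
```

If the container is already stopped, `docker kill` fails and the helper does not go on to remove it; in that case `docker rm containerID` on its own is enough.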

First, we will set up Docker on Ubuntu:

Commands:

sudo apt install docker.io

docker --version

sudo systemctl status docker

sudo systemctl enable --now docker

sudo systemctl status docker

sudo docker run hello-world  # pulls the hello-world image if it is not present locally and runs a container from it

docker images
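
Before moving on, it can help to confirm that the tools are really on the PATH. A minimal sketch, assuming a POSIX shell; `have_cmd` is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used in these notes are installed.
for tool in docker docker-compose; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

This only checks that the binaries exist; `sudo systemctl status docker` is still the way to see whether the daemon is running.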

install hadoop:

docker-compose --version

docker-machine --version

docker run -d -p 80:80 --name myserver nginx

Visit http://localhost to view the homepage of your new server.

download this: link

next command:

docker-compose up -d

docker ps

Go to link to view the current status of the system from the namenode.
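
Instead of reading the `docker ps` table by eye, the container names can be checked with a small filter. This is a sketch under assumptions: `missing_containers` is a made-up helper, and `namenode` is the only container name taken from these notes; any other names passed to it depend on the compose file in use.

```shell
# Hypothetical helper: reads running container names on stdin (one per line)
# and prints every expected name (given as arguments) that is not among them.
missing_containers() {
    running=$(cat)
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
    done
}

# Usage (needs a running Docker daemon):
#   docker ps --format '{{.Names}}' | missing_containers namenode
```

No output means every expected container is up; any name printed is a container that did not start.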

Testing the Hadoop cluster.

docker exec -it namenode bash

mkdir input

echo "Hello World" >input/f1.txt

echo "Hello Docker" >input/f2.txt

hadoop fs -mkdir -p input

hdfs dfs -put ./input/* input

docker container ls  # run this on the host from the docker folder; it lists the running containers and their IDs

word count file link: link

docker cp ../hadoop-mapreduce-examples-2.7.1-sources.jar cb0c13085cd3:hadoop-mapreduce-examples-2.7.1-sources.jar

To run the word count job on the uploaded files, use this command:

hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output
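
To sanity-check what the job should produce, the same count can be done locally without Hadoop. This is a rough sketch, not how WordCount is implemented: `local_wordcount` is a hypothetical helper that splits on whitespace with awk, and the scratch directory is created only for this illustration.

```shell
# Hypothetical helper (not part of Hadoop): count whitespace-separated words
# across all files in directory $1 and print "word<TAB>count", sorted by word.
local_wordcount() {
    cat "$1"/* |
        awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
             END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
        LC_ALL=C sort
}

# Recreate the two sample files from the test above in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/input"
echo "Hello World" > "$workdir/input/f1.txt"
echo "Hello Docker" > "$workdir/input/f2.txt"

local_wordcount "$workdir/input"
```

For the two sample files this prints Docker 1, Hello 2 and World 1, one tab-separated pair per line. The job's own result should be comparable; with the default single reducer it is usually readable with `hdfs dfs -cat output/part-r-00000`.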