This repository contains a Dockerfile that sets up a single-node Hadoop HDFS container with Docker. The container runs a NameNode and a DataNode and exposes the HDFS web UI on port 9870. You can also use the provided Python script to upload a file to HDFS and read it back.
- Docker Desktop installed on your computer
Follow these steps to install Docker Desktop on your machine:
- Download Docker Desktop for Windows from Docker Hub.
- Run the installer and follow the on-screen instructions.
- After installation, launch Docker Desktop and make sure it is running.
- Install Docker by following the instructions in the Docker documentation.
git clone https://github.com/ragegen/hadoop-python-docker.git
cd hadoop-python-docker
docker build -t hadoop-python .
docker run -d --name hadoop-python -p 9870:9870 hadoop-python
- The -d flag runs the container in detached mode.
- The --name flag assigns a name (hadoop-python) to the container.
- The -p 9870:9870 flag maps port 9870 in the container to port 9870 on the host machine, making the HDFS web UI accessible at http://localhost:9870.
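The NameNode can take a little while to start after `docker run` returns, so the web UI may not answer immediately. The following is a minimal sketch, not part of this repository, of how you might poll the mapped port from the host until it responds. The function names `web_ui_responds` and `wait_until` and the timeout values are illustrative assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def web_ui_responds(url="http://localhost:9870", timeout=2.0):
    """Return True if anything answers HTTP at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or timed out: nothing is ready yet.
        return False


def wait_until(check, timeout=60.0, interval=2.0):
    """Call `check` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until(web_ui_responds)` on the host returns `True` once the port mapping from `-p 9870:9870` is reachable, and `False` if nothing answers within the timeout.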
docker exec -it hadoop-python bash -c 'echo "Hello, HDFS! This is a test file." > /example.txt'
docker exec -it hadoop-python python3 script.py
Uploading example.txt to /user/hadoop/example.txt in HDFS...
Upload completed successfully.
Reading the contents of /user/hadoop/example.txt from HDFS:
File content:
Hello, HDFS! This is a test file.
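The contents of script.py are not reproduced here. As a rough illustration only, a script that produces output like the above could shell out to the standard `hdfs dfs` commands. The sketch below assumes the `hdfs` CLI is on the `PATH` inside the container; the helper names (`upload_commands`, `read_command`, `upload_and_read`) are hypothetical, and the real script may work differently.

```python
import os
import posixpath
import subprocess

LOCAL_PATH = "/example.txt"
HDFS_PATH = "/user/hadoop/example.txt"


def upload_commands(local_path, hdfs_path):
    """Build the `hdfs dfs` commands that create the target directory and upload."""
    parent = posixpath.dirname(hdfs_path)
    return [
        ["hdfs", "dfs", "-mkdir", "-p", parent],
        # -f overwrites the destination if it already exists.
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_path],
    ]


def read_command(hdfs_path):
    """Build the `hdfs dfs` command that prints a file's contents."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def upload_and_read(local_path=LOCAL_PATH, hdfs_path=HDFS_PATH):
    """Upload `local_path` to HDFS, then read it back and return its text."""
    print(f"Uploading {os.path.basename(local_path)} to {hdfs_path} in HDFS...")
    for command in upload_commands(local_path, hdfs_path):
        subprocess.run(command, check=True)
    print("Upload completed successfully.")

    print(f"Reading the contents of {hdfs_path} from HDFS:")
    result = subprocess.run(
        read_command(hdfs_path), check=True, capture_output=True, text=True
    )
    print("File content:")
    print(result.stdout)
    return result.stdout
```

Keeping the command construction in separate functions makes it possible to check the exact CLI arguments without a running cluster.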
Navigate to http://localhost:9870 in your browser and browse to /user/hadoop/. You should see example.txt listed in the directory.
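The same check can be done without a browser through the WebHDFS REST API, which the NameNode normally serves on the web UI port. This is a sketch under the assumption that WebHDFS is enabled in this image (it is the Hadoop default); the function names are illustrative and not part of the repository.

```python
import json
import urllib.parse
import urllib.request


def liststatus_url(directory, base="http://localhost:9870"):
    """Build the WebHDFS LISTSTATUS URL for an HDFS directory."""
    return f"{base}/webhdfs/v1{urllib.parse.quote(directory)}?op=LISTSTATUS"


def file_names(liststatus_json):
    """Extract entry names from a WebHDFS LISTSTATUS JSON response."""
    payload = json.loads(liststatus_json)
    return [entry["pathSuffix"] for entry in payload["FileStatuses"]["FileStatus"]]


def list_directory(directory="/user/hadoop", base="http://localhost:9870"):
    """Fetch and return the names of the entries in an HDFS directory."""
    with urllib.request.urlopen(liststatus_url(directory, base), timeout=10) as response:
        return file_names(response.read().decode("utf-8"))
```

With the container running and the script already executed, `"example.txt" in list_directory()` should evaluate to `True`.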
docker stop hadoop-python
docker rm hadoop-python
docker rmi hadoop-python
This project is licensed under the MIT License - see the LICENSE file for details.