
Merge pull request #146 from big-data-europe/3.2.1-hadoop3.2
Add support for Spark 3.2.1-hadoop3.2
GezimSejdiu authored Jul 1, 2022
2 parents 67a8214 + d7838c2 commit 7050229
Showing 21 changed files with 31 additions and 34 deletions.
17 changes: 9 additions & 8 deletions README.md
@@ -10,6 +10,7 @@ Docker images to:
<details open>
<summary>Currently supported versions:</summary>

+* Spark 3.2.1 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
* Spark 3.2.0 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
* Spark 3.1.2 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
* Spark 3.1.1 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
@@ -51,15 +52,15 @@ Add the following services to your `docker-compose.yml` to integrate a Spark mas
version: '3'
services:
  spark-master:
-    image: bde2020/spark-master:3.2.0-hadoop3.2
+    image: bde2020/spark-master:3.2.1-hadoop3.2
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
-    image: bde2020/spark-worker:3.2.0-hadoop3.2
+    image: bde2020/spark-worker:3.2.1-hadoop3.2
    container_name: spark-worker-1
    depends_on:
      - spark-master
@@ -68,7 +69,7 @@ services:
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
  spark-worker-2:
-    image: bde2020/spark-worker:3.2.0-hadoop3.2
+    image: bde2020/spark-worker:3.2.1-hadoop3.2
    container_name: spark-worker-2
    depends_on:
      - spark-master
@@ -77,7 +78,7 @@ services:
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
  spark-history-server:
-    image: bde2020/spark-history-server:3.2.0-hadoop3.2
+    image: bde2020/spark-history-server:3.2.1-hadoop3.2
    container_name: spark-history-server
    depends_on:
      - spark-master
@@ -92,12 +93,12 @@ Make sure to fill in the `INIT_DAEMON_STEP` as configured in your pipeline.
### Spark Master
To start a Spark master:

-    docker run --name spark-master -h spark-master -d bde2020/spark-master:3.2.0-hadoop3.2
+    docker run --name spark-master -h spark-master -d bde2020/spark-master:3.2.1-hadoop3.2

### Spark Worker
To start a Spark worker:

-    docker run --name spark-worker-1 --link spark-master:spark-master -d bde2020/spark-worker:3.2.0-hadoop3.2
+    docker run --name spark-worker-1 --link spark-master:spark-master -d bde2020/spark-worker:3.2.1-hadoop3.2

## Launch a Spark application
Building and running your Spark application on top of the Spark cluster is as simple as extending a template Docker image. Check the template's README for further documentation.
@@ -117,11 +118,11 @@ It will also setup a headless service so spark clients can be reachable from the

Then, to use `spark-shell`, issue

-`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.2.0-hadoop3.2 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client`
+`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.2.1-hadoop3.2 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client`

To use `spark-submit`, issue for example

-`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.2.0-hadoop3.2 -- bash ./spark/bin/spark-submit --class CLASS_TO_RUN --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client URL_TO_YOUR_APP`
+`kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.2.1-hadoop3.2 -- bash ./spark/bin/spark-submit --class CLASS_TO_RUN --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client URL_TO_YOUR_APP`

You can use your own image packed with Spark and your application, but when deployed it must be reachable from the workers.
One way to achieve this is to create a headless service for your pod and then use `--conf spark.driver.host=YOUR_HEADLESS_SERVICE` whenever you submit your application.
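The kubectl commands above assume a headless service selecting the `app=spark-client` label, so the driver pod is resolvable by the workers. A minimal sketch of such a manifest, assuming the service name `spark-client` used in `--conf spark.driver.host` (illustrative, not copied from `k8s-spark-cluster.yaml`):

```
# Hypothetical headless service; the driver pod becomes reachable by name.
apiVersion: v1
kind: Service
metadata:
  name: spark-client
spec:
  clusterIP: None        # headless: DNS resolves directly to the pod IP
  selector:
    app: spark-client    # matches the --labels="app=spark-client" pod
```

Applying this before `kubectl run` lets `spark.driver.host=spark-client` resolve from worker pods.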
2 changes: 1 addition & 1 deletion base/Dockerfile
@@ -7,7 +7,7 @@ ENV INIT_DAEMON_BASE_URI http://identifier/init-daemon
ENV INIT_DAEMON_STEP spark_master_init

ENV BASE_URL=https://archive.apache.org/dist/spark/
-ENV SPARK_VERSION=3.2.0
+ENV SPARK_VERSION=3.2.1
ENV HADOOP_VERSION=3.2

COPY wait-for-step.sh /
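The hunk above is truncated before the download step, but the three `ENV` values typically combine into the tarball URL. A minimal sketch, assuming the usual Apache archive layout (the `SPARK_PACKAGE`/`SPARK_URL` names are illustrative, not taken from the Dockerfile):

```shell
# Illustrative only: reconstructs the download URL the Dockerfile
# presumably builds from its ENV values (variable names are hypothetical).
BASE_URL=https://archive.apache.org/dist/spark/
SPARK_VERSION=3.2.1
HADOOP_VERSION=3.2

SPARK_PACKAGE="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
SPARK_URL="${BASE_URL}spark-${SPARK_VERSION}/${SPARK_PACKAGE}.tgz"

echo "$SPARK_URL"
```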
2 changes: 1 addition & 1 deletion build.sh
@@ -2,7 +2,7 @@

set -e

-TAG=3.2.0-hadoop3.2
+TAG=3.2.1-hadoop3.2

build() {
NAME=$1
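The `build()` body is cut off by the diff, but the tag convention it relies on is visible throughout the commit. A hedged sketch of how image names are derived from `TAG` (the `image_name` helper is hypothetical, not part of `build.sh`):

```shell
TAG=3.2.1-hadoop3.2

# Hypothetical helper: derives the full image name for a component,
# mirroring the bde2020/spark-<component>:<TAG> convention seen above.
image_name() {
    printf 'bde2020/spark-%s:%s' "$1" "$TAG"
}

image_name master   # → bde2020/spark-master:3.2.1-hadoop3.2
```

Bumping the single `TAG` variable is what retags every image in one place.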
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -1,15 +1,15 @@
version: '3'
services:
  spark-master:
-    image: bde2020/spark-master:3.2.0-hadoop3.2
+    image: bde2020/spark-master:3.2.1-hadoop3.2
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
-    image: bde2020/spark-worker:3.2.0-hadoop3.2
+    image: bde2020/spark-worker:3.2.1-hadoop3.2
    container_name: spark-worker-1
    depends_on:
      - spark-master
2 changes: 1 addition & 1 deletion examples/maven/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-maven-template:3.2.0-hadoop3.2
+FROM bde2020/spark-maven-template:3.2.1-hadoop3.2

LABEL MAINTAINER="Gezim Sejdiu <[email protected]>"

2 changes: 1 addition & 1 deletion examples/maven/README.md
@@ -17,5 +17,5 @@ To run the application, execute the following steps:
```
3. Run the Docker container:
```bash
-docker run --rm --network dockerspark_default --name spark-maven-example bde2020/spark-maven-example:3.2.0-hadoop3.2
+docker run --rm --network dockerspark_default --name spark-maven-example bde2020/spark-maven-example:3.2.1-hadoop3.2
```
2 changes: 1 addition & 1 deletion examples/maven/pom.xml
@@ -14,7 +14,7 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<scala.version>2.12.13</scala.version>
<scala.binary.version>2.12</scala.binary.version>
-<spark.version>3.1.1</spark.version>
+<spark.version>3.2.1</spark.version>
</properties>

<dependencies>
2 changes: 1 addition & 1 deletion examples/python/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-python-template:3.2.0-hadoop3.2
+FROM bde2020/spark-python-template:3.2.1-hadoop3.2

COPY wordcount.py /app/

2 changes: 1 addition & 1 deletion examples/python/README.md
@@ -17,5 +17,5 @@ To run the application, execute the following steps:
```
3. Run the Docker container:
```bash
-docker run --rm --network dockerspark_default --name pyspark-example bde2020/spark-python-example:3.2.0-hadoop3.2
+docker run --rm --network dockerspark_default --name pyspark-example bde2020/spark-python-example:3.2.1-hadoop3.2
```
2 changes: 1 addition & 1 deletion history-server/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-base:3.2.0-hadoop3.2
+FROM bde2020/spark-base:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

4 changes: 2 additions & 2 deletions k8s-spark-cluster.yaml
@@ -46,7 +46,7 @@ spec:
spec:
containers:
- name: spark-master
-        image: bde2020/spark-master:3.2.0-hadoop3.2
+        image: bde2020/spark-master:3.2.1-hadoop3.2
imagePullPolicy: Always
ports:
- containerPort: 8080
@@ -70,7 +70,7 @@ spec:
spec:
containers:
- name: spark-worker
-        image: bde2020/spark-worker:3.2.0-hadoop3.2
+        image: bde2020/spark-worker:3.2.1-hadoop3.2
imagePullPolicy: Always
ports:
- containerPort: 8081
2 changes: 1 addition & 1 deletion master/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-base:3.2.0-hadoop3.2
+FROM bde2020/spark-base:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

2 changes: 1 addition & 1 deletion submit/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-base:3.2.0-hadoop3.2
+FROM bde2020/spark-base:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

2 changes: 1 addition & 1 deletion template/maven/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-submit:3.2.0-hadoop3.2
+FROM bde2020/spark-submit:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

2 changes: 1 addition & 1 deletion template/maven/README.md
@@ -34,7 +34,7 @@ If you overwrite the template's `CMD` in your Dockerfile, make sure to execute t

#### Example Dockerfile
```
-FROM bde2020/spark-maven-template:3.2.0-hadoop3.2
+FROM bde2020/spark-maven-template:3.2.1-hadoop3.2
MAINTAINER Erika Pauwels <[email protected]>
MAINTAINER Gezim Sejdiu <[email protected]>
2 changes: 1 addition & 1 deletion template/python/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-submit:3.2.0-hadoop3.2
+FROM bde2020/spark-submit:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

2 changes: 1 addition & 1 deletion template/python/README.md
@@ -30,7 +30,7 @@ If you overwrite the template's `CMD` in your Dockerfile, make sure to execute t

#### Example Dockerfile
```
-FROM bde2020/spark-python-template:3.2.0-hadoop3.2
+FROM bde2020/spark-python-template:3.2.1-hadoop3.2
MAINTAINER You <[email protected]>
2 changes: 1 addition & 1 deletion template/sbt/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-submit:3.2.0-hadoop3.2
+FROM bde2020/spark-submit:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

6 changes: 1 addition & 5 deletions template/sbt/README.md
@@ -62,11 +62,7 @@ the `/template.sh` script at the end.
#### Example Dockerfile

```
-<<<<<<< HEAD:template/sbt/README.md
-FROM bde2020/spark-sbt-template:3.2.0-hadoop3.2
-=======
-FROM bde2020/spark-scala-template:3.2.0-hadoop3.2
->>>>>>> cd4cab298d8e63ecaf488ffaf80ed5f6df5d5384:template/scala/README.md
+FROM bde2020/spark-sbt-template:3.2.1-hadoop3.2
MAINTAINER Cecile Tonglet <[email protected]>
2 changes: 1 addition & 1 deletion template/sbt/build.sbt
@@ -1,4 +1,4 @@
scalaVersion := "2.12.14"
libraryDependencies ++= Seq(
-  "org.apache.spark" %% "spark-sql" % "3.2.0" % "provided"
+  "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
)
2 changes: 1 addition & 1 deletion worker/Dockerfile
@@ -1,4 +1,4 @@
-FROM bde2020/spark-base:3.2.0-hadoop3.2
+FROM bde2020/spark-base:3.2.1-hadoop3.2

LABEL maintainer="Gezim Sejdiu <[email protected]>, Giannis Mouchakis <[email protected]>"

