Starting with version 0.3.0, Pipeline also supports managed Kubernetes clusters on Google Cloud (GKE).
Another change introduced in 0.3.0 is that Pipeline now requires GitHub OAuth authentication, just like the CI/CD flow. With that we have a single mechanism to authenticate both Pipeline and the CI/CD flow, whereas previously Pipeline required basic authentication while the CI/CD flow used GitHub OAuth.
For simplicity, the instructions are presented through an example: hooking a Spark application into a CI/CD workflow that runs it on managed Kubernetes on Google Cloud (GKE).
It is assumed that the source of the Spark application is stored in GitHub.
The Pipeline Control Plane takes care of creating a Kubernetes cluster and executing the steps of the CI/CD flow on one of the supported cloud providers (AWS, Azure, Google Cloud).
To hook your Spark application into the Banzai Cloud CI/CD flow, the following steps are required:
- Both Pipeline and the CI/CD flow require GitHub OAuth authentication, and for this an OAuth application must be set up on GitHub. Set up your Pipeline GitHub OAuth application according to this guide.
- Deploy the Control Plane using the Pipeline Control Plane Launcher to one of the supported cloud providers where you would like to run your CI/CD flow.
- Take note of the PublicIP of the host where the Control Plane was deployed. We refer to this as the PublicIP of the Control Plane.
- Go back to the GitHub OAuth application created earlier and modify it. Set the Authorization callback URL field according to OAuth Application Authorization Callback.
The steps of the workflow executed by the CI/CD flow are described in the .pipeline.yml file, which must be placed in the root directory of the Spark application's source code. The file has to be pushed into the GitHub repo along with the source files of the application.
There is an example Spark application, spark-pi-example, that can be used for trying out the CI/CD pipeline.
Note: fork this repository into your own repository for this purpose!
To set up your own Spark application for the workflow, you can start from one of the .pipeline.yml configuration file templates in spark-pi-example and customize it. Note that there are separate templates for the different cloud providers; pick the one that corresponds to the cloud provider you're using and create your .pipeline.yml from it.
The following sections may need to be customized (a consolidated sketch of how these pieces fit together follows the list):
- the cluster where your application will be executed

      create_cluster:
        ...
        cluster_name: "[[your-cluster-name]]"

  The cluster name must be a match of the regex '(?:[a-z](?:[-a-z0-9]{0,38}[a-z0-9])?)' (only alphanumerics and '-' are allowed; it must start with a letter and end with an alphanumeric, and must be no longer than 40 characters)
- the command for building your application

      remote_build:
        ...
        original_commands:
          - mvn clean package
- the Main class of your application

      run:
        ...
        spark_submit_options:
          ...
          class: banzaicloud.SparkPi
- the name of your Spark application

      run:
        ...
        spark_submit_configs:
          ...
          spark.app.name: sparkpi
- the application artifact

  This is the relative path to the jar of your Spark application. This is the jar generated by the build command.

      run:
        ...
        spark_submit_app_args:
          - target/spark-pi-1.0-SNAPSHOT.jar
- the application arguments

      run:
        ...
        spark_submit_app_args:
          - 1000
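To see how these customizations fit together, here is a trimmed-down sketch of a .pipeline.yml for the GKE case, assembled from the fragments above. It is only an illustration: the exact nesting and the surrounding steps may differ from the real templates, so treat the templates in spark-pi-example as the authoritative starting point.

```yaml
create_cluster:
  # Name of the GKE cluster to be created (must satisfy the naming rules above)
  cluster_name: "[[your-cluster-name]]"

remote_build:
  # Command(s) used to build your application
  original_commands:
    - mvn clean package

run:
  spark_submit_options:
    # Main class of your application
    class: banzaicloud.SparkPi
  spark_submit_configs:
    # Name of your Spark application
    spark.app.name: sparkpi
  spark_submit_app_args:
    # Relative path to the jar produced by the build, followed by the application arguments
    - target/spark-pi-1.0-SNAPSHOT.jar
    - 1000
```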
Navigate to http://{control_plane_public_ip}/auth/github/login in your web browser and grant access for the organizations that contain the GitHub repositories you want to hook into the CI/CD workflow, then click authorize access.
If the login and authorization succeed, the user is redirected to the CI/CD UI.
All the services of the Pipeline may take some time to fully initialize, so the page may not load at first. Please give it some time and retry.
Navigate to http://{control_plane_public_ip}/ - this will bring you to the CI/CD user interface. Select Repositories from the top left menu. This lists all the repositories that Pipeline has access to. If no repositories are listed, click the Synchronize menu item under the Repositories menu.
Select the repositories you want to hook into the CI/CD flow.
For the hooked repositories, set the following secrets:
- plugin_endpoint - specify http://{control_plane_public_ip}/pipeline/api/v1
- plugin_token - this is the token needed to access the Pipeline API. To obtain this token, see how to acquire the access token.
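As a point of reference, a step in the .pipeline.yml can consume these secrets roughly as sketched below, assuming a Drone-style step syntax for the CI/CD flow; the image name here is purely illustrative, so check the spark-pi-example template for the real step definition.

```yaml
run:
  # Illustrative plugin image name - use the one from the spark-pi-example template
  image: banzaicloud/plugin-pipeline-client
  # Expose the plugin_endpoint and plugin_token secrets to this step
  secrets: [plugin_endpoint, plugin_token]
```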
Modify the source code of your Spark application, commit the changes and push them to the repository on GitHub. Pipeline is notified about the commits through GitHub webhooks and triggers the flow described in the .pipeline.yml file of the watched repositories.
The running CI/CD jobs can be monitored and managed at http://{control_plane_public_ip}/account/repos
To check the logs of the individual CI/CD workflow steps, click on the desired commit message in the UI.
Once configured, the Spark application will be built, deployed and executed for every commit pushed to the project's repository. The progress of the workflow can be followed by clicking the small orange dot beside the commit on the GitHub UI.
Our git repos with example projects that contain pipeline workflow configurations: