This repository steps through the implementation of a CI/CD ML pipeline that uses AWS Glue for data processing, Amazon SageMaker for training, versioning, and hosting real-time endpoints, and Jenkins CI/CD pipelines for orchestrating the workflow. Using the AWS CLI APIs for Amazon SageMaker and AWS Glue, it shows how to implement CI/CD ML pipelines that process data with AWS Glue, train ML models with Amazon SageMaker Training, deploy ML models with Amazon SageMaker Hosting Services, or perform batch inference with Amazon SageMaker Batch Transform.
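The pipelines drive these services through plain AWS CLI calls. A minimal sketch of the two core steps follows; the Glue job name and the training-job JSON file are hypothetical placeholders, not names defined by this repository:

```shell
# Hypothetical job name -- the Jenkins pipelines invoke these same AWS CLI APIs.
GLUE_JOB="ml-processing-job"

# Start the AWS Glue job that processes the raw data.
aws glue start-job-run --job-name "$GLUE_JOB"

# Launch a SageMaker training job from a JSON job definition
# (training-job.json is a placeholder for your own definition file).
aws sagemaker create-training-job --cli-input-json file://training-job.json
```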
Everything can be tested using the following frameworks:
Set up the ML environment by deploying the CloudFormation templates described below:
1.00-ml-environment: This template deploys the resources required for Amazon SageMaker, such as an AWS KMS key and alias, an Amazon S3 bucket for storing code and ML model artifacts, an Amazon SageMaker Model Registry for versioning ML models, and IAM policies and roles for Amazon SageMaker and for the Jenkins AWS profile. Parameters:
- KMSAlias: Name of the KMS alias. Optional
- ModelPackageGroupDescription: Description for the Amazon SageMaker Model Package Group. Optional
- ModelPackageGroupName: Name for the Amazon SageMaker Model Package Group. Mandatory
- S3BucketName: Amazon S3 Bucket name. Mandatory
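As a hedged sketch, the template can be deployed with the AWS CLI as below; the template filename, stack name, and parameter values are assumptions to substitute with your own (IAM capabilities are required because the stack creates roles and policies):

```shell
# Hypothetical stack name and parameter values -- substitute your own.
STACK_NAME="ml-environment"
MODEL_PACKAGE_GROUP="ml-models"
BUCKET="my-ml-artifacts-bucket"

# Deploy the environment template; CAPABILITY_NAMED_IAM is needed
# because the stack creates IAM roles and policies.
aws cloudformation deploy \
  --template-file 1.00-ml-environment.yml \
  --stack-name "$STACK_NAME" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
      ModelPackageGroupName="$MODEL_PACKAGE_GROUP" \
      S3BucketName="$BUCKET"
```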
- ml-build-train/algorithms: This folder contains the scripts necessary for performing data processing
- ml-build-train/algorithms/processing-glue: This folder contains the Python scripts for processing data
- ml-build-train/algorithms/processing-glue-spark: This folder contains the Python scripts for processing data by using Spark with Python (PySpark)
- ml-build-train/mlpipelines
- ml-build-train/mlpipelines/training: A Jenkinsfile example for creating the Jenkins pipeline for training
- ml-inference-deploy/mlpipelines
- ml-inference-deploy/mlpipelines/deploy: A Jenkinsfile example for creating the Jenkins pipeline that deploys Amazon SageMaker endpoints using the latest approved model from the Amazon SageMaker Model Registry
- ml-inference-deploy/mlpipelines/inference: A Jenkinsfile example for creating the Jenkins pipeline that runs a batch inference job with Amazon SageMaker Batch Transform, using the latest approved model from the Amazon SageMaker Model Registry
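Both the deploy and inference pipelines need to resolve the latest approved model package before acting on it. A sketch of that lookup with the AWS CLI, assuming a hypothetical Model Package Group name (use the one created by the 1.00-ml-environment template):

```shell
# Hypothetical Model Package Group name -- substitute your own.
MODEL_PACKAGE_GROUP="ml-models"

# Return the ARN of the most recently approved model package in the group.
aws sagemaker list-model-packages \
  --model-package-group-name "$MODEL_PACKAGE_GROUP" \
  --model-approval-status Approved \
  --sort-by CreationTime \
  --sort-order Descending \
  --max-results 1 \
  --query "ModelPackageSummaryList[0].ModelPackageArn" \
  --output text
```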
In this section, we set up a local Jenkins environment for testing the ML pipelines. Please follow the README for running Jenkins in a container using the provided Dockerfile.
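As a rough sketch (the image name, port mapping, and volume are assumptions; the repository's Dockerfile and README may specify different values), building and running the Jenkins container could look like:

```shell
# Hypothetical image name -- adjust to match the repository's README.
IMAGE_TAG="jenkins-ml"

# Build the image from the provided Dockerfile in the current directory.
docker build -t "$IMAGE_TAG" .

# Run Jenkins in the background, exposing the UI on port 8080 and
# persisting the Jenkins home directory in a named volume.
docker run -d --name jenkins-ml \
  -p 8080:8080 \
  -v jenkins_home:/var/jenkins_home \
  "$IMAGE_TAG"
```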
For creating the Jenkins Pipeline:
Create a Jenkins pipeline for the specific purpose by copying the content from
- ml-build-train/mlpipelines/training
- ml-inference-deploy/mlpipelines/deploy
- ml-inference-deploy/mlpipelines/inference
Create a Jenkins pipeline by pointing to a Jenkinsfile directly from the Git repository:
In this example we showed how to implement end-to-end pipelines for Machine Learning workloads with Jenkins, using the AWS CLI APIs to interact with AWS Glue and Amazon SageMaker for processing data, training and versioning ML models, and creating real-time endpoints or performing batch inference with Amazon SageMaker.
If you have any comments, please contact:
Bruno Pistone [email protected]
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.