This repository steps through the implementation of a CI/CD ML pipeline that uses AWS Glue for data processing, Amazon SageMaker for training, versioning, and hosting real-time endpoints, and Jenkins CI/CD pipelines for orchestrating the workflow. Using the AWS CLI APIs for Amazon SageMaker and AWS Glue, it shows how to implement CI/CD ML pipelines that process data with AWS Glue, train ML models with Amazon SageMaker Training, deploy ML models with Amazon SageMaker Hosting Services, or perform batch inference with Amazon SageMaker Batch Transform.
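The pipelines drive these services through plain AWS CLI calls. A minimal sketch of the two core steps follows; the Glue job name and the training-job JSON file are hypothetical placeholders, not names defined by this repository:

```shell
# Hypothetical job name -- the Jenkins pipelines invoke these same AWS CLI APIs.
GLUE_JOB="ml-processing-job"

# Start the AWS Glue job that processes the raw data.
aws glue start-job-run --job-name "$GLUE_JOB"

# Launch a SageMaker training job from a JSON job definition
# (training-job.json is a placeholder for your own definition file).
aws sagemaker create-training-job --cli-input-json file://training-job.json
```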
Everything can be tested using the following frameworks:
Set up the ML environment by deploying the CloudFormation templates described below:
1.00-ml-environment: This template deploys the resources required for Amazon SageMaker, such as an AWS KMS key and alias, an Amazon S3 bucket for storing code and ML model artifacts, an Amazon SageMaker Model Registry for versioning ML models, and IAM policies and roles for Amazon SageMaker and for the Jenkins AWS profile. Parameters:
- KMSAlias: Name of the KMS alias. Optional
- ModelPackageGroupDescription: Description for the Amazon SageMaker Model Package Group. Optional
- ModelPackageGroupName: Name for the Amazon SageMaker Model Package Group. Mandatory
- S3BucketName: Amazon S3 Bucket name. Mandatory
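As a hedged sketch, the template can be deployed with the AWS CLI as below; the template filename, stack name, and parameter values are assumptions to substitute with your own (IAM capabilities are required because the stack creates roles and policies):

```shell
# Hypothetical stack name and parameter values -- substitute your own.
STACK_NAME="ml-environment"
MODEL_PACKAGE_GROUP="ml-models"
BUCKET="my-ml-artifacts-bucket"

# Deploy the environment template; CAPABILITY_NAMED_IAM is needed
# because the stack creates IAM roles and policies.
aws cloudformation deploy \
  --template-file 1.00-ml-environment.yml \
  --stack-name "$STACK_NAME" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
      ModelPackageGroupName="$MODEL_PACKAGE_GROUP" \
      S3BucketName="$BUCKET"
```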
- ml-build-train/algorithms: This folder contains the scripts necessary for performing data processing
- ml-build-train/algorithms/processing-glue: This folder contains the Python scripts for processing data
- ml-build-train/algorithms/processing-glue-spark: This folder contains the Python scripts for processing data by using Spark with Python (PySpark)
- ml-build-train/mlpipelines
- ml-build-train/mlpipelines/training: A Jenkinsfile example for creating the Jenkins pipeline for training
- ml-inference-deploy/mlpipelines
- ml-inference-deploy/mlpipelines/deploy: A Jenkinsfile example for creating the Jenkins pipeline that deploys Amazon SageMaker endpoints using the latest approved model from the Amazon SageMaker Model Registry
- ml-inference-deploy/mlpipelines/inference: A Jenkinsfile example for creating the Jenkins pipeline that runs a batch inference job with Amazon SageMaker Batch Transform, using the latest approved model from the Amazon SageMaker Model Registry
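Both the deploy and inference pipelines need to resolve the latest approved model package before acting on it. A sketch of that lookup with the AWS CLI, assuming a hypothetical Model Package Group name (use the one created by the 1.00-ml-environment template):

```shell
# Hypothetical Model Package Group name -- substitute your own.
MODEL_PACKAGE_GROUP="ml-models"

# Return the ARN of the most recently approved model package in the group.
aws sagemaker list-model-packages \
  --model-package-group-name "$MODEL_PACKAGE_GROUP" \
  --model-approval-status Approved \
  --sort-by CreationTime \
  --sort-order Descending \
  --max-results 1 \
  --query "ModelPackageSummaryList[0].ModelPackageArn" \
  --output text
```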
In this section, we set up a local Jenkins environment for testing the ML pipelines. Please follow the README for running Jenkins in a container using the provided Dockerfile.
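As a rough sketch (the image name, port mapping, and volume are assumptions; the repository's Dockerfile and README may specify different values), building and running the Jenkins container could look like:

```shell
# Hypothetical image name -- adjust to match the repository's README.
IMAGE_TAG="jenkins-ml"

# Build the image from the provided Dockerfile in the current directory.
docker build -t "$IMAGE_TAG" .

# Run Jenkins in the background, exposing the UI on port 8080 and
# persisting the Jenkins home directory in a named volume.
docker run -d --name jenkins-ml \
  -p 8080:8080 \
  -v jenkins_home:/var/jenkins_home \
  "$IMAGE_TAG"
```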
For creating the Jenkins Pipeline:
Create a Jenkins pipeline for the specific purpose by copying the content from
- ml-build-train/mlpipelines/training
- ml-inference-deploy/mlpipelines/deploy
- ml-inference-deploy/mlpipelines/inference
Create a Jenkins pipeline by pointing to a Jenkinsfile directly from the Git repository:
In this example we showed how to implement end-to-end pipelines for Machine Learning workloads with Jenkins, using the AWS CLI APIs to interact with AWS Glue and Amazon SageMaker for processing data, training and versioning ML models, and creating real-time endpoints or performing batch inference with Amazon SageMaker.
If you have any comments, please contact:
Bruno Pistone [email protected]
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.