AWS Image Builder implementation #2904

Merged
merged 17 commits into from
Dec 3, 2024
Changes from 5 commits
196 changes: 196 additions & 0 deletions docs/book/component-guide/image-builders/aws.md
@@ -0,0 +1,196 @@
---
description: Building container images with AWS CodeBuild
---

# AWS Image Builder

The AWS image builder is an [image builder](./image-builders.md) flavor provided by the ZenML `aws` integration that uses [AWS CodeBuild](https://aws.amazon.com/codebuild) to build container images.

### When to use it

You should use the AWS image builder if:

* you're **unable** to install or use [Docker](https://www.docker.com) on your client machine.
* you're already using AWS.
* your stack is mainly composed of other AWS components such as the [S3 Artifact Store](../artifact-stores/s3.md) or the [Sagemaker Orchestrator](../orchestrators/sagemaker.md).

### How to deploy it

{% hint style="info" %}
Would you like to skip ahead and deploy a full ZenML cloud stack already,
including the AWS image builder? Check out the
[in-browser stack deployment wizard](../../how-to/infrastructure-deployment/stack-deployment/deploy-a-cloud-stack.md),
the [stack registration wizard](../../how-to/infrastructure-deployment/stack-deployment/register-a-cloud-stack.md),
or [the ZenML AWS Terraform module](../../how-to/infrastructure-deployment/stack-deployment/deploy-a-cloud-stack-with-terraform.md)
for a shortcut on how to deploy & register this stack component.
{% endhint %}

### How to use it

To use the AWS image builder, you need:

* The ZenML `aws` integration installed. If you haven't done so, run:

```shell
zenml integration install aws
```
* An [S3 Artifact Store](../artifact-stores/s3.md) where the build context will be uploaded, so AWS CodeBuild can access it.
* Recommended: an [AWS container registry](../container-registries/aws.md) where the built image will be pushed. The AWS CodeBuild service can also work with other container registries, but [explicit authentication](#authentication-methods) must be enabled in this case.
* An [AWS CodeBuild project](https://aws.amazon.com/codebuild) created in the AWS account and region where you want to build the Docker images, preferably in the same region as the ECR container registry where images will be pushed (if applicable). The CodeBuild project configuration is largely irrelevant, as ZenML will override most of the default settings for each build, but the following are some recommended default configuration values:
  * **Source Type**: `Amazon S3`
  * **Bucket**: the same S3 bucket used by the ZenML S3 Artifact Store
  * **S3 folder**: any value (e.g. `codebuild`)
  * **Environment Type**: `Linux Container`
  * **Environment Image**: `bentolor/docker-dind-awscli`
  * **Privileged Mode**: `false`

Make sure that the **Service Role** attached to the CodeBuild project has the permissions needed to read objects from the S3 bucket and, if applicable, to push images to the ECR container registry:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<BUCKET_NAME>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:DescribeImages",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_ID>:repository/<REPOSITORY_NAME>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        }
    ]
}
```

* Recommended: grant ZenML access to trigger AWS CodeBuild builds by registering an [AWS Service Connector](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md) with the proper credentials and permissions, as covered in the [Authentication Methods](aws.md#authentication-methods) section. If not provided, then the AWS credentials will be inferred from the environment where the pipeline is triggered.
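
The service role policy shown earlier in this list contains three placeholders (bucket, region/account, and repository). As a rough illustration, the Python sketch below fills them in and emits valid JSON. The helper name `build_service_role_policy` and the sample values are hypothetical and not part of ZenML or AWS tooling; only the statements themselves are taken from the policy above.

```python
import json


def build_service_role_policy(
    bucket_name: str, region: str, account_id: str, repository_name: str
) -> dict:
    """Return the CodeBuild service role policy from the docs as a dict.

    Hypothetical helper: it only substitutes the placeholders
    <BUCKET_NAME>, <REGION>, <ACCOUNT_ID> and <REPOSITORY_NAME>.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to the uploaded build context.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Pull and push access to a single ECR repository.
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchGetImage",
                    "ecr:DescribeImages",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:InitiateLayerUpload",
                    "ecr:UploadLayerPart",
                    "ecr:CompleteLayerUpload",
                    "ecr:PutImage",
                ],
                "Resource": (
                    f"arn:aws:ecr:{region}:{account_id}"
                    f":repository/{repository_name}"
                ),
            },
            {
                # The ECR login token is not scoped to a repository.
                "Effect": "Allow",
                "Action": ["ecr:GetAuthorizationToken"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    policy = build_service_role_policy(
        "my-zenml-bucket", "eu-central-1", "123456789012", "zenml"
    )
    print(json.dumps(policy, indent=4))
```

Generating the document through `json.dumps` guarantees the output is syntactically valid JSON, which hand-edited policies often are not.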

We can register the image builder and use it in our active stack:

```shell
zenml image-builder register <IMAGE_BUILDER_NAME> \
--flavor=aws \
--code_build_project=<CODEBUILD_PROJECT_NAME>

# Register and activate a stack with the new image builder
zenml stack register <STACK_NAME> -i <IMAGE_BUILDER_NAME> ... --set
```

You also need to set up the [authentication](aws.md#authentication-methods) required to access the AWS CodeBuild service.

#### Authentication Methods

Integrating and using an AWS Image Builder in your pipelines is not possible without employing some form of authentication. If you're looking for a quick way to get started locally, you can use the _Implicit Authentication_ method. However, the recommended way to authenticate to the AWS cloud platform is through [an AWS Service Connector](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md). This is particularly useful if you are configuring ZenML stacks that combine the AWS Image Builder with other remote stack components also running in AWS.

{% tabs %}
{% tab title="Implicit Authentication" %}
This method uses the implicit AWS authentication available _in the environment where the ZenML code is running_. On your local machine, this is the quickest way to configure an AWS Image Builder. You don't need to supply credentials explicitly when you register the AWS Image Builder, as it leverages the local credentials and configuration that the AWS CLI stores on your local machine. However, you will need to install and set up the AWS CLI on your machine as a prerequisite, as covered in [the AWS CLI documentation](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html), before you register the AWS Image Builder.

{% hint style="warning" %}
Stacks using the AWS Image Builder set up with implicit authentication are not portable across environments. To make ZenML pipelines fully portable, it is recommended to use [an AWS Service Connector](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md) to authenticate your AWS Image Builder to the AWS cloud platform.
{% endhint %}
{% endtab %}

{% tab title="AWS Service Connector (recommended)" %}
To set up the AWS Image Builder to authenticate to AWS and access the AWS CodeBuild services, it is recommended to leverage the many features provided by [the AWS Service Connector](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md) such as auto-configuration, best security practices regarding long-lived credentials and reusing the same credentials across multiple stack components.

If you don't already have an AWS Service Connector configured in your ZenML deployment, you can register one using the interactive CLI command. You also have the option to configure an AWS Service Connector that can be used to access more than just the AWS CodeBuild service:

```sh
zenml service-connector register --type aws -i
```

The following non-interactive CLI example leverages [the AWS CLI configuration](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) on your local machine to auto-configure an AWS Service Connector for the AWS CodeBuild service:

```sh
zenml service-connector register <CONNECTOR_NAME> --type aws --resource-type aws-generic --auto-configure
```

{% code title="Example Command Output" %}
```
$ zenml service-connector register aws-generic --type aws --resource-type aws-generic --auto-configure
Successfully registered service connector `aws-generic` with access to the following resources:
┏━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓
┃ RESOURCE TYPE │ RESOURCE NAMES ┃
┠────────────────┼────────────────┨
┃ 🔵 aws-generic │ eu-central-1 ┃
┗━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛
```
{% endcode %}

> **Note**: Please remember to grant the entity associated with your AWS credentials permissions to access the CodeBuild API and to run CodeBuild builds. The AWS Service Connector supports [many different authentication methods](../../how-to/infrastructure-deployment/auth-management/aws-service-connector.md#authentication-methods) with different levels of security and convenience. You should pick the one that best fits your use case.

If you already have one or more AWS Service Connectors configured in your ZenML deployment, you can check which of them can be used to access generic AWS resources like the one required for your AWS Image Builder by running e.g.:

```sh
zenml service-connector list-resources --resource-type aws-generic
```

{% code title="Example Command Output" %}
```
The following 'aws-generic' resources can be accessed by service connectors configured in your workspace:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓
┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE │ RESOURCE TYPE │ RESOURCE NAMES ┃
┠──────────────────────────────────────┼────────────────┼────────────────┼────────────────┼────────────────┨
┃ 7113ba9b-efdd-4a0a-94dc-fb67926e58a1 │ aws-generic │ 🔶 aws │ 🔶 aws-generic │ eu-central-1 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛
```
{% endcode %}

After having set up or decided on an AWS Service Connector to use to authenticate to AWS, you can register the AWS Image Builder as follows:

```sh
zenml image-builder register <IMAGE_BUILDER_NAME> \
--flavor=aws \
--code_build_project=<CODEBUILD_PROJECT_NAME> \
--connector <CONNECTOR_ID>
```

To connect an AWS Image Builder to an AWS Service Connector at a later point, you can use the following command:

```sh
zenml image-builder connect <IMAGE_BUILDER_NAME> --connector <CONNECTOR_ID>
```

{% code title="Example Command Output" %}
```
$ zenml image-builder connect aws-image-builder --connector aws-generic
Successfully connected image builder `aws-image-builder` to the following resources:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓
┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE │ RESOURCE TYPE │ RESOURCE NAMES ┃
┠──────────────────────────────────────┼────────────────┼────────────────┼────────────────┼────────────────┨
┃ 7113ba9b-efdd-4a0a-94dc-fb67926e58a1 │ aws-generic │ 🔶 aws │ 🔶 aws-generic │ eu-central-1 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛
```
{% endcode %}

As a final step, you can use the AWS Image Builder in a ZenML Stack:

```sh
# Register and set a stack with the new image builder
zenml stack register <STACK_NAME> -i <IMAGE_BUILDER_NAME> ... --set
```
{% endtab %}
{% endtabs %}

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
1 change: 1 addition & 0 deletions docs/book/component-guide/image-builders/image-builders.md
@@ -26,6 +26,7 @@ image builders are provided by integrations:
| [LocalImageBuilder](local.md) | `local` | _built-in_ | Builds your Docker images locally. |
| [KanikoImageBuilder](kaniko.md) | `kaniko` | `kaniko` | Builds your Docker images in Kubernetes using Kaniko. |
| [GCPImageBuilder](gcp.md) | `gcp` | `gcp` | Builds your Docker images using Google Cloud Build. |
| [AWSImageBuilder](aws.md) | `aws` | `aws` | Builds your Docker images using AWS CodeBuild. |
| [Custom Implementation](custom.md) | _custom_ | | Extend the image builder abstraction and provide your own implementation |

If you would like to see the available flavors of image builders, you can use the command:
2 changes: 1 addition & 1 deletion examples/llm_finetuning/.copier-answers.yml
@@ -1,5 +1,5 @@
# Changes here will be overwritten by Copier
-_commit: 2024.11.08-2-gece1d46
+_commit: 2024.11.28
_src_path: gh:zenml-io/template-llm-finetuning
bf16: true
cuda_version: cuda11.8
7 changes: 5 additions & 2 deletions src/zenml/image_builders/base_image_builder.py
@@ -25,6 +25,7 @@
from zenml.logger import get_logger
from zenml.stack import Flavor, StackComponent
from zenml.stack.stack_component import StackComponentConfig
from zenml.utils.archivable import ArchiveType

if TYPE_CHECKING:
from zenml.container_registries import BaseContainerRegistry
@@ -100,6 +101,7 @@ def build(
def _upload_build_context(
build_context: "BuildContext",
parent_path_directory_name: str,
archive_type: ArchiveType = ArchiveType.TAR_GZ,
) -> str:
"""Uploads a Docker image build context to a remote location.

@@ -109,6 +111,7 @@ def _upload_build_context(
the build context to. It will be appended to the artifact
store path to create the parent path where the build context
will be uploaded to.
archive_type: The type of archive to create.

Returns:
The path to the uploaded build context.
@@ -119,15 +122,15 @@ def _upload_build_context(

hash_ = hashlib.sha1() # nosec
with tempfile.NamedTemporaryFile(mode="w+b", delete=False) as f:
-build_context.write_archive(f, use_gzip=True)
+build_context.write_archive(f, archive_type)

while True:
data = f.read(64 * 1024)
if not data:
break
hash_.update(data)

-filename = f"{hash_.hexdigest()}.tar.gz"
+filename = f"{hash_.hexdigest()}.{archive_type.value}"
filepath = f"{parent_path}/{filename}"
if not fileio.exists(filepath):
logger.info("Uploading build context to `%s`.", filepath)
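
The naming scheme in this hunk (hash the archive in 64 KiB chunks, then use the digest plus the archive extension as the file name) can be sketched in isolation. This is a minimal stand-alone illustration and not the ZenML implementation: `hashed_archive_name` is a hypothetical helper, the `ArchiveType` enum below is a stand-in whose member values are assumed, and the explicit `seek(0)` is an assumption about a step that falls outside the lines shown in the diff.

```python
import hashlib
import io
from enum import Enum
from typing import IO


class ArchiveType(str, Enum):
    """Stand-in for zenml.utils.archivable.ArchiveType (values assumed)."""

    TAR = "tar"
    TAR_GZ = "tar.gz"
    ZIP = "zip"


def hashed_archive_name(
    f: IO[bytes],
    archive_type: ArchiveType = ArchiveType.TAR_GZ,
    chunk_size: int = 64 * 1024,
) -> str:
    """Derive a content-addressed file name for an already written archive."""
    f.seek(0)  # assumption: rewind so the whole archive is hashed
    hash_ = hashlib.sha1()  # nosec - used for naming, not for security
    while True:
        data = f.read(chunk_size)
        if not data:
            break
        hash_.update(data)
    # The extension follows the archive type instead of a fixed ".tar.gz".
    return f"{hash_.hexdigest()}.{archive_type.value}"


if __name__ == "__main__":
    archive = io.BytesIO(b"pretend this is an archive")
    print(hashed_archive_name(archive))
    print(hashed_archive_name(archive, ArchiveType.ZIP))
```

Because the name depends only on the archive bytes, an identical build context maps to the same path, which is what lets the `fileio.exists(filepath)` check in the hunk skip redundant uploads.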
23 changes: 7 additions & 16 deletions src/zenml/image_builders/build_context.py
@@ -20,7 +20,7 @@
from zenml.io import fileio
from zenml.logger import get_logger
from zenml.utils import io_utils, string_utils
-from zenml.utils.archivable import Archivable
+from zenml.utils.archivable import Archivable, ArchiveType

logger = get_logger(__name__)

@@ -69,28 +69,19 @@ def dockerignore_file(self) -> Optional[str]:
return None

def write_archive(
-self, output_file: IO[bytes], use_gzip: bool = True
+self,
+output_file: IO[bytes],
+archive_type: ArchiveType = ArchiveType.TAR_GZ,
) -> None:
"""Writes an archive of the build context to the given file.

Args:
output_file: The file to write the archive to.
-use_gzip: Whether to use `gzip` to compress the file.
+archive_type: The type of archive to create.
"""
-from docker.utils import build as docker_build_utils
-
-files = self.get_files()
-extra_files = self.get_extra_files()
-
-context_archive = docker_build_utils.create_archive(
-fileobj=output_file,
-root=self._root,
-files=sorted(files.keys()),
-gzip=use_gzip,
-extra_files=list(extra_files.items()),
-)
+super().write_archive(output_file, archive_type)

-build_context_size = os.path.getsize(context_archive.name)
+build_context_size = os.path.getsize(output_file.name)
if (
self._root
and build_context_size > 50 * 1024 * 1024
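
This hunk replaces the Docker-specific `create_archive` call with a generic `write_archive(output_file, archive_type)` on the parent class. The parent implementation is not part of the diff shown here, so the following is only a sketch of what selecting an archive format by type could look like with the standard library. The function `write_archive`, its `files` mapping, and the `ArchiveType` values are all assumptions made for illustration.

```python
import io
import tarfile
import zipfile
from enum import Enum
from typing import IO, Dict


class ArchiveType(str, Enum):
    """Stand-in for zenml.utils.archivable.ArchiveType (values assumed)."""

    TAR = "tar"
    TAR_GZ = "tar.gz"
    ZIP = "zip"


def write_archive(
    files: Dict[str, bytes],
    output_file: IO[bytes],
    archive_type: ArchiveType = ArchiveType.TAR_GZ,
) -> None:
    """Write in-memory files to output_file in the requested format."""
    if archive_type is ArchiveType.ZIP:
        with zipfile.ZipFile(
            output_file, mode="w", compression=zipfile.ZIP_DEFLATED
        ) as zf:
            for name in sorted(files):
                zf.writestr(name, files[name])
    else:
        # "w:gz" produces a gzip-compressed tarball, "w" a plain one.
        mode = "w:gz" if archive_type is ArchiveType.TAR_GZ else "w"
        with tarfile.open(fileobj=output_file, mode=mode) as tf:
            for name in sorted(files):
                info = tarfile.TarInfo(name=name)
                info.size = len(files[name])
                tf.addfile(info, io.BytesIO(files[name]))
    # Rewind so callers can read (or hash) the archive from the start.
    output_file.seek(0)


if __name__ == "__main__":
    context = {"Dockerfile": b"FROM python:3.11\n", "run.py": b"print('hi')\n"}
    buffer = io.BytesIO()
    write_archive(context, buffer, ArchiveType.ZIP)
    with zipfile.ZipFile(buffer) as zf:
        print(zf.namelist())
```

Passing the archive type as an enum rather than a `use_gzip` boolean leaves room for formats beyond tarballs, such as a ZIP archive for a builder that reads its build context from S3.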
3 changes: 3 additions & 0 deletions src/zenml/integrations/aws/__init__.py
@@ -33,6 +33,7 @@
AWS_CONNECTOR_TYPE = "aws"
AWS_RESOURCE_TYPE = "aws-generic"
S3_RESOURCE_TYPE = "s3-bucket"
AWS_IMAGE_BUILDER_FLAVOR = "aws"

class AWSIntegration(Integration):
"""Definition of AWS integration for ZenML."""
@@ -59,12 +60,14 @@ def flavors(cls) -> List[Type[Flavor]]:
"""
from zenml.integrations.aws.flavors import (
AWSContainerRegistryFlavor,
AWSImageBuilderFlavor,
SagemakerOrchestratorFlavor,
SagemakerStepOperatorFlavor,
)

return [
AWSContainerRegistryFlavor,
AWSImageBuilderFlavor,
SagemakerStepOperatorFlavor,
SagemakerOrchestratorFlavor,
]
6 changes: 6 additions & 0 deletions src/zenml/integrations/aws/flavors/__init__.py
@@ -17,6 +17,10 @@
AWSContainerRegistryConfig,
AWSContainerRegistryFlavor,
)
from zenml.integrations.aws.flavors.aws_image_builder_flavor import (
AWSImageBuilderConfig,
AWSImageBuilderFlavor,
)
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import (
SagemakerOrchestratorConfig,
SagemakerOrchestratorFlavor,
@@ -29,6 +33,8 @@
__all__ = [
"AWSContainerRegistryFlavor",
"AWSContainerRegistryConfig",
"AWSImageBuilderConfig",
"AWSImageBuilderFlavor",
"SagemakerStepOperatorFlavor",
"SagemakerStepOperatorConfig",
"SagemakerOrchestratorFlavor",