An AWS implementation for hosting a Machine Learning (ML) Large Language Model (LLM), compatible with AWS GovCloud (US).
- An AWS account with:
  - An increased quota to deploy one `ml.g4dn.12xlarge` instance for endpoint usage
  - An IAM User or Role with the AdministratorAccess policy granted (we recommend restricting access as needed)
- The AWS CLI installed
- The `AWS_REGION` environment variable exported
- TypeScript, npm, and the AWS CDK CLI installed (required versions are found in the package.json file)
Note: The Amazon SageMaker endpoint can incur significant cost if left running. Be sure to monitor your billing, and destroy the stack via the Clean Up section when you are done experimenting.
Configure your AWS credentials, either by exporting your role credentials to the current terminal or by configuring an AWS CLI profile.
Next, you can run the following commands:

- `npm install`
- `npm run build`
- `cdk bootstrap` (only required the first time you run the CDK in the account)
- `cdk deploy`
Once the deployment is complete, you can navigate to SageMaker Notebook Instances and open the notebook Falcon40BNotebook-XXXXXXXXXX, where the X's are randomly generated. From there you can run the notebook cells.
If you have created additional Jupyter notebooks in SageMaker, you can download them from the SageMaker notebook instance's IDE before destroying the stack.
When you are finished, run `cdk destroy` to delete all of the resources you created.
To deploy this application, we leverage Hugging Face's prebuilt Text Generation Inference (TGI) Falcon-40B Docker image with the HuggingFaceSageMakerEndpoint construct (which deploys a Hugging Face model to Amazon SageMaker) from @cdklabs/generative-ai-cdk-constructs, sketched below.
AWS Deep Learning Containers (DLCs) provide the set of Docker images that can be deployed on Amazon SageMaker. This creates a scalable, secure, hosted endpoint for real-time inference.
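As an illustration, here is a minimal sketch of how the construct might be wired up in a CDK stack. The construct ID, model ID, instance-type and DLC image enum members, and TGI environment values shown are assumptions (enum member names vary by library version); the exact values this project uses live in its source:

```typescript
import {
  HuggingFaceSageMakerEndpoint,
  DeepLearningContainerImage,
  SageMakerInstanceType,
} from '@cdklabs/generative-ai-cdk-constructs';
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class FalconEndpointStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Deploy a Falcon-40B TGI image to a SageMaker real-time endpoint.
    // IDs, image tag, and environment values are illustrative.
    new HuggingFaceSageMakerEndpoint(this, 'FalconEndpoint', {
      modelId: 'tiiuae/falcon-40b-instruct',
      instanceType: SageMakerInstanceType.ML_G4DN_12XLARGE,
      container:
        DeepLearningContainerImage.HUGGINGFACE_PYTORCH_TGI_INFERENCE_2_0_1_TGI1_0_3_GPU_PY39_CU118_UBUNTU20_04,
      environment: {
        SM_NUM_GPUS: '4',         // shard across the 4 GPUs on ml.g4dn.12xlarge
        MAX_INPUT_LENGTH: '1024', // TGI request limits; tune for your use case
        MAX_TOTAL_TOKENS: '2048',
      },
    });
  }
}
```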
We deploy a SageMaker notebook instance in a private subnet, allowing outbound internet connectivity while controlling inbound connectivity. To enable communication between the notebook and AWS service endpoints, we then use VPC Endpoints powered by AWS PrivateLink. The benefit of AWS PrivateLink is that it allows the SageMaker notebook instance to access the SageMaker real-time inference endpoint over private network IP space.
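A sketch of that network layout in CDK terms follows; the construct IDs and subnet configuration here are assumptions, and the actual stack may name and size these differently:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

export class NotebookNetworkStack extends Stack {
  public readonly vpc: ec2.Vpc;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Private subnets with egress (via NAT) give the notebook outbound
    // internet access while blocking unsolicited inbound connections.
    this.vpc = new ec2.Vpc(this, 'NotebookVpc', {
      maxAzs: 2,
      subnetConfiguration: [
        { name: 'public', subnetType: ec2.SubnetType.PUBLIC, cidrMask: 24 },
        { name: 'private', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, cidrMask: 24 },
      ],
    });

    // Interface VPC endpoints (AWS PrivateLink) keep notebook-to-SageMaker
    // traffic on private IP space instead of the public internet.
    this.vpc.addInterfaceEndpoint('SageMakerApi', {
      service: ec2.InterfaceVpcEndpointAwsService.SAGEMAKER_API,
    });
    this.vpc.addInterfaceEndpoint('SageMakerRuntime', {
      service: ec2.InterfaceVpcEndpointAwsService.SAGEMAKER_RUNTIME,
    });
  }
}
```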
Hugging Face's TGI provides a seamless way to deploy LLMs for real-time text generation. It bundles prebuilt Docker containers that handle the hosting infrastructure so users can focus on their applications and use cases.
Falcon-40B features advanced text generation and comprehension capabilities. With 40 billion parameters, Falcon-40B was, at release, one of the largest openly available models. Trained on roughly one trillion text tokens spanning English, German, Spanish, French, and other languages, Falcon-40B can fluently generate, summarize, and translate text.
SageMaker real-time inference endpoints enable low-latency, high-throughput hosting of machine learning models. By using Amazon SageMaker, we take advantage of the operational efficiencies of AWS infrastructure and eliminate undifferentiated heavy lifting: Amazon SageMaker handles provisioning servers, scaling, monitoring, and availability, freeing data scientists to work with LLMs.
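For instance, once the endpoint is up it can be invoked from any AWS SDK. Below is a minimal sketch using the JavaScript SDK v3, assuming a hypothetical endpoint name and typical TGI request parameters:

```typescript
import {
  SageMakerRuntimeClient,
  InvokeEndpointCommand,
} from '@aws-sdk/client-sagemaker-runtime';

const client = new SageMakerRuntimeClient({});

async function generate(prompt: string): Promise<string> {
  // 'Falcon40bEndpoint' is a hypothetical name; use the endpoint name
  // emitted by your deployment.
  const response = await client.send(
    new InvokeEndpointCommand({
      EndpointName: 'Falcon40bEndpoint',
      ContentType: 'application/json',
      // TGI expects an `inputs` string plus optional generation parameters.
      Body: JSON.stringify({
        inputs: prompt,
        parameters: { max_new_tokens: 128, temperature: 0.7 },
      }),
    }),
  );
  // TGI returns JSON along the lines of [{ "generated_text": "..." }].
  const payload = JSON.parse(new TextDecoder().decode(response.Body));
  return payload[0].generated_text;
}

generate('Briefly explain Amazon SageMaker.').then(console.log);
```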
Amazon SageMaker notebook instances provide a managed and familiar environment, purpose-built for developing and evaluating ML models. Amazon SageMaker provides a painless, cost-effective sandbox for prototyping capabilities.
Multiple instance types give data scientists flexibility to test small demos or fine-tune LLMs on significant datasets.
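As a sketch, the notebook instance can be pinned to the private subnet from the networking example above. The IAM role, security group, and instance type here are illustrative assumptions, not the project's actual configuration:

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as sagemaker from 'aws-cdk-lib/aws-sagemaker';
import { Construct } from 'constructs';

// `scope` and `vpc` come from the surrounding stack; names are illustrative.
function addNotebook(scope: Construct, vpc: ec2.IVpc): void {
  const role = new iam.Role(scope, 'NotebookRole', {
    assumedBy: new iam.ServicePrincipal('sagemaker.amazonaws.com'),
  });

  const sg = new ec2.SecurityGroup(scope, 'NotebookSg', { vpc });

  new sagemaker.CfnNotebookInstance(scope, 'FalconNotebook', {
    instanceType: 'ml.t3.medium', // illustrative; pick per workload
    roleArn: role.roleArn,
    subnetId: vpc.privateSubnets[0].subnetId,
    securityGroupIds: [sg.securityGroupId],
    // Disable direct internet access; egress flows through the VPC's NAT
    // gateway and the PrivateLink endpoints instead.
    directInternetAccess: 'Disabled',
  });
}
```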
AWS networking features such as AWS PrivateLink allow administrators to securely control private connectivity between VPCs and AWS services without traffic traversing the public internet. This helps enable secure LLM experimentation with datasets.
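For example, inbound reachability of an interface endpoint can be scoped so that only the notebook's security group may reach it. A pattern along these lines, assuming the `endpoint` and `notebookSg` objects from the earlier sketches:

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// `endpoint` is an ec2.InterfaceVpcEndpoint and `notebookSg` an
// ec2.SecurityGroup from the earlier sketches (illustrative names).
declare const endpoint: ec2.InterfaceVpcEndpoint;
declare const notebookSg: ec2.SecurityGroup;

// Only HTTPS traffic originating from the notebook's security group may
// reach the PrivateLink endpoint; everything else is denied by default.
endpoint.connections.allowFrom(notebookSg, ec2.Port.tcp(443));
```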