Skip to content

marckarp/sagemaker-catboost-mme

Repository files navigation

SageMaker CatBoost Multi-Model Endpoint

This repo depicts how to make use of a custom container to host multiple CatBoost models on a SageMaker Multi-Model-Endpoint.

The catboost-mme.ipynb contains the steps to build and push the custom image to ECR, deploy the SageMaker Endpoint and make inference against the Multi-Model-Endpoint.

The container folder contains the files needed for the custom image.

├── container
│   ├── dockerd-entrypoint.py
│   ├── Dockerfile
│   └── model_handler.py
  • dockerd-entrypoint.py is the entry point script that will start the multi model server.
  • Dockerfile contains the container definition that will be used to assemble the image. This include the packages that need to be installed.
  • model_handler.py is the script that will contain the logic to load up the model and make inference.

Benchmarking and load testing:

Load tests

All tests conducted on a single ml.m5.xlarge.

1) Uncompressed 569KB model in memory test ~460TPS

End to end:

Response time percentiles (approximated)
 Type     Name                                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|----------------------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 custom_protocol_boto3 sagemaker_client_invoke_endpoint                                      30     32     34     35     39     43     50     56     85    280   2100 137879

Model and Overhead Latency (p99) and Invocations (Sum) - 1min: metric1

2) Uncompressed 70MB model in memory test ~238TPS

End to end:

Response time percentiles (approximated)
 Type     Name                                                                              50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|----------------------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 custom_protocol_boto3 sagemaker_client_invoke_endpoint                                     59     64     67     69     75     80     87     93    220    940   1000  71230

Model and Overhead Latency (p99) and Invocations (Sum) - 1min: metric1

Code profiling (Big model)

Function Initial run time (ms) Subsequent run time (ms)
perf __init__ 0.000953674 -
perf initialize 258.2206726 -
perf handle_out 0.001907349 0.00166893
perf preprocess 0.005245209 0.005483627
perf inference 20.75648308 3.942251205
perf postprocess 0.031471252 0.021219254
perf handle in 32.42993355 12.28523254

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published