Now that you have created the A/B Testing Deployment Pipeline, this operations manual provides instructions for running your A/B Testing experiment.
Successful A/B Testing for machine learning models requires measuring how effective predictions are for end users. It is important to identify users consistently and to attribute success actions on model predictions back to those users.
A/B Testing can be applied to any use case for which you have defined a success or conversion metric for predictions returned from an ML model.
Examples include:
- User clicks through on an advertisement predicted from browsing history and geo-location.
- User dwell time for personalized content exceeds a relevancy threshold.
- User opens a marketing email with a personalized subject line.
- User watches a recommended video, based on viewing history, for more than 30 seconds.
- User purchases a product upgrade after being offered a pricing discount.
Conversion rates will vary for each use case, so successful models will be measured as a percentage improvement (e.g. 5%) over a baseline, or previous best, model.
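Concretely, the relative lift of a challenger over the champion can be computed from the two conversion rates; the numbers below are made up for illustration:

```python
# Illustrative only: a challenger converting at 4.2% against a champion
# baseline of 4.0% is a 5% relative improvement.
champion_rate = 0.040    # conversions / invocations for the champion variant
challenger_rate = 0.042  # conversions / invocations for the challenger variant

lift = (challenger_rate - champion_rate) / champion_rate
print(f"Relative improvement: {lift:.1%}")  # Relative improvement: 5.0%
```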
The A/B Deployment Pipeline provides an additional stage after Endpoint deployment to register the endpoint for A/B Testing.
This Register stage invokes an AWS Lambda function, providing an event that includes the `endpoint_name` along with configuration to select models from the registry based on the testing strategy.
The following is a list of the testing strategies available:
- `WeightedSampling` - Random selection based on the initial variant weights. Also selected during the `warmup` phase.
- `EpsilonGreedy` - Simple strategy that picks a random variant a fraction of the time, based on `epsilon`.
- `UCB1` - Smart strategy that explores variants with upper confidence bounds until uncertainty drops.
- `ThompsonSampling` - Smart strategy that picks random points from beta distributions to exploit variants.
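To make the strategies concrete, here is a minimal sketch, not the pipeline's actual implementation, of how `ThompsonSampling` might select a variant from per-variant invocation and reward counts:

```python
import random

def thompson_sampling(variant_metrics):
    """Sample each variant from Beta(rewards + 1, failures + 1) and pick the
    variant with the highest draw, balancing exploration and exploitation."""
    best_variant, best_sample = None, -1.0
    for name, metrics in variant_metrics.items():
        successes = metrics["rewards"]
        failures = metrics["invocations"] - metrics["rewards"]
        sample = random.betavariate(successes + 1, failures + 1)
        if sample > best_sample:
            best_variant, best_sample = name, sample
    return best_variant

# The challenger has a higher observed conversion rate, so it is drawn more
# often, but the champion still receives some exploratory traffic.
print(thompson_sampling({
    "Champion": {"invocations": 1000, "rewards": 40},
    "Challenger1": {"invocations": 1000, "rewards": 55},
}))
```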
The configuration is stored in the CodeCommit source repository by stage name, e.g. `dev-config.json` for the `dev` stage, and has the following parameters:
- `stage_name` - The stage suffix for the SageMaker endpoint, e.g. `dev`.
- `instance_count` - The number of instances to deploy per variant.
- `instance_type` - The type of instance to deploy per variant.
- `strategy` - The algorithm strategy for selecting user model variants.
- `epsilon` - The epsilon parameter used by the `EpsilonGreedy` strategy.
- `warmup` - The number of invocations to warm up before applying the strategy.
In addition to the above, you must specify the `champion` and `challenger` model variants for the deployment.
These will be loaded from the two Model Package Groups in the registry that are named after the project and suffixed with `champion` or `challenger`; for example, for project name `ab-testing-pipeline`, the groups are `ab-testing-pipeline-champion` and `ab-testing-pipeline-challenger`. You can create these model package groups in the sample notebook.
Latest Approved Versions
To deploy the latest approved `champion` model, and the latest N approved `challenger` models, you can provide the single `challenger_variant_count` parameter, e.g.:
```json
{
  "stage_name": "dev",
  "strategy": "ThompsonSampling",
  "instance_count": 1,
  "instance_type": "ml.t2.large",
  "challenger_variant_count": 1
}
```
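Behind this configuration, the latest approved versions are resolved from the model registry. The following is a hedged boto3 sketch of such a lookup; the group names follow the `champion`/`challenger` convention described above, and the exact query the pipeline performs may differ:

```python
import boto3

sagemaker = boto3.client("sagemaker")

def latest_approved(model_package_group_name, max_results=1):
    """Return the most recently created approved package(s) in a group."""
    response = sagemaker.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=max_results,
    )
    return response["ModelPackageSummaryList"]

champion = latest_approved("ab-testing-pipeline-champion")
# challenger_variant_count controls how many challengers are deployed.
challengers = latest_approved("ab-testing-pipeline-challenger", max_results=1)
```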
Alternatively, such as in the case of production environments, you may prefer to select specific approved model package versions. In this case you can specify the `model_package_version` for both the `champion_variant_config` and for one or more `challenger_variant_config` configuration entries.
You also have the option of overriding one or both of the `instance_count` and `instance_type` parameters for each variant.
Specific Versions
```json
{
  "stage_name": "prod",
  "strategy": "ThompsonSampling",
  "warmup": 100,
  "instance_count": 2,
  "instance_type": "ml.c5.large",
  "champion_variant_config": {
    "model_package_version": 1,
    "variant_name": "Champion",
    "instance_count": 3,
    "instance_type": "ml.m5.xlarge"
  },
  "challenger_variant_config": [
    {
      "model_package_version": 1,
      "variant_name": "Challenger1",
      "instance_type": "ml.c5.xlarge"
    },
    {
      "model_package_version": 2,
      "variant_name": "Challenger2",
      "instance_count": 1
    }
  ]
}
```
The API has two endpoints, `invocation` and `conversion`, both of which take a JSON payload and return a JSON response.
The invocation API requires an `endpoint_name`. It also expects a `user_id` input parameter to identify the user; if none is provided, a new `user_id` in the form of a UUID will be generated and returned in the response.
```
curl -X POST -d '<<request>>' https://<<domain>>.execute-api.<<region>>.amazonaws.com/<<stage>>/invocation
```
Request:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "user_id": "user_1",
  "content_type": "application/json",
  "data": "{\"instances\": [\"Excellent Item This is the perfect media device\"]}"
}
```
The response will return the invoked `endpoint_variant` that returned the predictions, as well as the algorithm `strategy` and the `target_variant` selected.
Response:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "user_id": "user_1",
  "strategy": "ThompsonSampling",
  "target_variant": "Challenger1",
  "endpoint_variant": "Challenger1",
  "inference_id": "5aa61fe8-70d7-4eed-9419-8f4efc33662d",
  "predictions": [{"label": ["__label__NotHelpful"], "prob": [0.735004723072052]}]
}
```
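The same invocation can be made from Python with the `requests` library; the URL placeholders are the same assumptions as in the curl example above:

```python
import requests

API_BASE = "https://<<domain>>.execute-api.<<region>>.amazonaws.com/<<stage>>"

payload = {
    "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
    "user_id": "user_1",
    "content_type": "application/json",
    "data": "{\"instances\": [\"Excellent Item This is the perfect media device\"]}",
}

response = requests.post(f"{API_BASE}/invocation", json=payload).json()
# Keep the inference_id so a later conversion can be correlated in the logs.
inference_id = response["inference_id"]
print(response["endpoint_variant"], response["predictions"])
```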
You can provide a manual override for the `endpoint_variant` by specifying it in the request payload.
Request:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "endpoint_variant": "Champion",
  "user_id": "user_1",
  "content_type": "application/json",
  "data": "{\"instances\": [\"Excellent Item This is the perfect media device\"]}"
}
```
The response will output a `strategy` of "Manual", which will be logged.
Response:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "user_id": "user_1",
  "strategy": "Manual",
  "target_variant": "Champion",
  "endpoint_variant": "Champion",
  "inference_id": "5aa61fe8-70d7-4eed-9419-8f4efc33662d",
  "predictions": [{"label": ["__label__NotHelpful"], "prob": [0.735004723072052]}]
}
```
In the event of an error reaching the DynamoDB tables for user variant assignment or metrics, the API will still invoke the SageMaker endpoint.
The response will return a `strategy` of "Fallback", along with an empty `target_variant` and the actual invoked `endpoint_variant`, which will be logged if there are no issues writing to Kinesis.
Response:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "user_id": "user_1",
  "strategy": "Fallback",
  "target_variant": null,
  "endpoint_variant": "Challenger2",
  "inference_id": "5aa61fe8-70d7-4eed-9419-8f4efc33662d",
  "predictions": [{"label": ["__label__NotHelpful"], "prob": [0.735004723072052]}]
}
```
The conversion API requires an `endpoint_name` and a `user_id`.
You can optionally provide the `inference_id` that was returned from the original invocation request, to allow correlation when querying the logs in S3.
The `reward` parameter is a floating-point number that defaults to `1.0` if not provided.
```
curl -X POST -d '<<request>>' https://<<domain>>.execute-api.<<region>>.amazonaws.com/<<stage>>/conversion
```
Request:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "user_id": "user_1",
  "inference_id": "5aa61fe8-70d7-4eed-9419-8f4efc33662d",
  "reward": 1.0
}
```
The response will return the `endpoint_variant` assigned to the user.
Response:
```json
{
  "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
  "user_id": "user_1",
  "strategy": "ThompsonSampling",
  "target_variant": "Challenger1",
  "endpoint_variant": "Challenger1",
  "inference_id": "5aa61fe8-70d7-4eed-9419-8f4efc33662d",
  "reward": 1.0
}
```
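Continuing the Python sketch from the invocation example, the conversion for the same user can be reported once the success action occurs:

```python
import requests

API_BASE = "https://<<domain>>.execute-api.<<region>>.amazonaws.com/<<stage>>"

conversion = requests.post(
    f"{API_BASE}/conversion",
    json={
        "endpoint_name": "sagemaker-ab-testing-pipeline-dev",
        "user_id": "user_1",
        # Optional: the inference_id returned by the original invocation.
        "inference_id": "5aa61fe8-70d7-4eed-9419-8f4efc33662d",
        "reward": 1.0,  # defaults to 1.0 if omitted
    },
).json()
print(conversion["endpoint_variant"], conversion["reward"])
```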
Amazon CloudWatch metrics are published every time metrics are updated in Amazon DynamoDB.
The following metrics are recorded against the dimensions `EndpointName` and `VariantName` in the namespace `aws/sagemaker/Endpoints/ab-testing`:
- Invocations
- Conversions
- Rewards
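These metrics can be queried with boto3; the following sketch uses the dimensions and namespace documented above (the choice of the `Sum` statistic and hourly period is an assumption):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="aws/sagemaker/Endpoints/ab-testing",
    MetricName="Conversions",
    Dimensions=[
        {"Name": "EndpointName", "Value": "sagemaker-ab-testing-pipeline-dev"},
        {"Name": "VariantName", "Value": "Challenger1"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,  # hourly data points
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```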
The API Lambda functions are instrumented with AWS X-Ray, so you can inspect the latency for all downstream services, including:
- Amazon DynamoDB
- Amazon SageMaker
- Amazon Kinesis Data Firehose