Pipelines are assembled from pre-defined components, each of which is declared in configuration. All pipelines require, as a minimum, the components described below: Endpoints, an Endpoint Selector, and an Auth Provider.
Endpoints define the backend target server. The currently supported endpoint types are Azure Open AI and Open AI.
All endpoints are wrapped in a Polly policy (a sketch of an equivalent policy follows the list). We:
- Retry on 429 errors
- Circuit break if an endpoint consistently fails
- Set up a Bulkhead to limit concurrency to the endpoint (leave off MaxConcurrency if you don't want this)
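For illustration, here is a minimal sketch of an equivalent policy using Polly v7. The retry count, backoff, break threshold, and break duration shown are assumptions, not AICentral's actual values.

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Wrap;

// Sketch of the kind of policy AICentral wraps around each endpoint: retry on
// 429, circuit-break on persistent failure, and a bulkhead capped at
// MaxConcurrency. All numbers here are illustrative.
static class EndpointResilience
{
    public static AsyncPolicyWrap<HttpResponseMessage> Build(int maxConcurrency)
    {
        var retryOn429 = Policy
            .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.TooManyRequests)
            .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

        var circuitBreaker = Policy
            .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode >= 500)
            .Or<HttpRequestException>()
            .CircuitBreakerAsync(handledEventsAllowedBeforeBreaking: 5,
                                 durationOfBreak: TimeSpan.FromSeconds(30));

        var bulkhead = Policy.BulkheadAsync<HttpResponseMessage>(maxConcurrency);

        // Outermost first: retry wraps the breaker, which wraps the bulkhead.
        return Policy.WrapAsync(retryOn429, circuitBreaker, bulkhead);
    }
}
```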
Property | Description |
---|---|
LanguageEndpoint | Full URL to an Azure Open AI Endpoint |
ModelMappings | Maps incoming model names to backend model names. |
EnforceMappedModels | If true, only models in the ModelMappings will be allowed. |
AuthenticationType | The type of authentication to use: apikey, entra, or entrapassthrough. |
AuthenticationKey | The key to use for authentication (when AuthenticationType is apikey). |
MaxConcurrency | The maximum number of concurrent requests to the endpoint. |
AutoPopulateEmptyUserId | If true, the UserId will be populated with the incoming User Name if it is empty. |
If AuthenticationType is set to entra, AICentral will use DefaultAzureCredential to obtain a JWT scoped to https://cognitiveservices.azure.com. If AuthenticationType is set to entrapassthrough, AICentral will expect an incoming JWT Bearer token and will forward it straight through to Azure Open AI.
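For illustration, entra mode acquires its token roughly like this (a simplified sketch using the Azure.Identity package, not AICentral's literal code):

```csharp
using Azure.Core;
using Azure.Identity;

// Simplified sketch of what "entra" mode does: acquire a token for the
// Cognitive Services scope and forward it as a Bearer token downstream.
var credential = new DefaultAzureCredential();
AccessToken token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { "https://cognitiveservices.azure.com/.default" }));
// token.Token is sent in the Authorization header to the Azure Open AI endpoint.
```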
{
  "Type": "AzureOpenAIEndpoint",
  "Name": "name-to-refer-to-later",
  "Properties": {
    "LanguageEndpoint": "required-full-uri-to-azure-open-ai-service",
    "ModelMappings": {
      "incoming-model-name": "backend-model-name",
      "not-required": "default-to-pass-model-name-through"
    },
    "EnforceMappedModels": true,
    "AuthenticationType": "ApiKey|Entra|EntraPassThrough",
    "AuthenticationKey": "required-when-using-ApiKey",
    "MaxConcurrency": 5,
    "AutoPopulateEmptyUserId": true
  }
}
The Open AI endpoint type is configured similarly:
Property | Description |
---|---|
ModelMappings | Maps incoming model names to backend model names. |
EnforceMappedModels | If true, only models in the ModelMappings will be allowed. |
ApiKey | Open AI API Key. |
Organization | Optional Open AI Organization to send with requests. |
MaxConcurrency | The maximum number of concurrent requests to the endpoint. |
AutoPopulateEmptyUserId | If true, the UserId will be populated with the incoming User Name if it is empty. |
{
  "Type": "OpenAIEndpoint",
  "Name": "name-to-refer-to-later",
  "Properties": {
    "ModelMappings": {
      "incoming-model-name": "backend-model-name",
      "not-required": "default-to-pass-model-name-through"
    },
    "ApiKey": "required",
    "Organization": "optional",
    "MaxConcurrency": 5,
    "AutoPopulateEmptyUserId": true
  }
}
Endpoint Selectors define clusters of Endpoints, along with the logic for choosing which endpoint to use, and when.
We ship 4 Endpoint Selectors:
- Single Endpoint: a direct proxy through to a single existing endpoint
- Random Cluster: tries endpoints in a random order
- Prioritised: tries a priority group of endpoints first, falling back to a second group
- Lowest Latency: prefers the historically fastest endpoints
Single Endpoint is the only endpoint selector for Azure Open AI that supports image generation: Azure Open AI uses an asynchronous poll to wait for image generation, so we must guarantee affinity to a single Azure Open AI service.
Property | Description |
---|---|
Endpoint | An Endpoint name as declared in the Endpoint Configuration Collection. |
{
  "Type": "SingleEndpoint",
  "Name": "my-name",
  "Properties": {
    "Endpoint": "endpoint-name-from-earlier"
  }
}
The Random Cluster selector (sketched below):
- Picks an endpoint at random and tries it
- If that fails, picks from the remaining endpoints
- And so on, until we get a response, or fail
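Conceptually the strategy looks like this (an illustrative sketch, not AICentral's actual code; tryEndpoint is a hypothetical delegate that performs the real HTTP call):

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Illustrative sketch of the Random Cluster strategy: shuffle the endpoints,
// then try each in turn until one succeeds.
static async Task<HttpResponseMessage> PickRandomAsync(
    string[] endpoints,
    Func<string, Task<HttpResponseMessage>> tryEndpoint)
{
    foreach (var endpoint in endpoints.OrderBy(_ => Random.Shared.Next()))
    {
        try
        {
            var response = await tryEndpoint(endpoint);
            if (response.IsSuccessStatusCode) return response;
        }
        catch (HttpRequestException)
        {
            // fall through and try the next endpoint
        }
    }
    throw new InvalidOperationException("All endpoints failed.");
}
```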
Property | Description |
---|---|
Endpoints | An array of Endpoint names as declared in the Endpoint Configuration Collection. |
{
  "Type": "RandomCluster",
  "Name": "my-name",
  "Properties": {
    "Endpoints": [
      "endpoint-name-from-earlier",
      "another-endpoint-name-from-earlier",
      "yet-another-endpoint-name-from-earlier"
    ]
  }
}
The Prioritised selector (sketched below):
- For the Priority endpoints:
  - Picks an endpoint at random and tries it
  - If that fails, picks from the remaining endpoints
  - And so on, until we get a response, or fail
- If all the Priority endpoints failed, repeats the process for the Fallback endpoints
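Conceptually this is the random-until-success strategy applied twice (again an illustrative sketch; tryGroup is a hypothetical delegate that applies the Random Cluster sketch above to a group, returning null when every endpoint in the group failed):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Illustrative sketch of the Prioritised selector: try the priority group
// first, then the fallback group.
static async Task<HttpResponseMessage> PickPrioritisedAsync(
    string[] priorityEndpoints,
    string[] fallbackEndpoints,
    Func<string[], Task<HttpResponseMessage?>> tryGroup)
{
    return await tryGroup(priorityEndpoints)
        ?? await tryGroup(fallbackEndpoints)
        ?? throw new InvalidOperationException("All priority and fallback endpoints failed.");
}
```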
Property | Description |
---|---|
PriorityEndpoints | An array of Endpoint names, as declared in the Endpoint Configuration Collection, to try first. |
FallbackEndpoints | An array of Endpoint names, as declared in the Endpoint Configuration Collection, to fall back to. |
{
  "Type": "Prioritised",
  "Name": "my-name",
  "Properties": {
    "PriorityEndpoints": [
      "endpoint-name-from-earlier",
      "another-endpoint-name-from-earlier"
    ],
    "FallbackEndpoints": [
      "yet-another-endpoint-name-from-earlier",
      "and-yet-another-endpoint-name-from-earlier"
    ]
  }
}
The Lowest Latency selector keeps a rolling average of the duration of calls to the downstream Open AI endpoints, and over time prioritises the fastest. The implementation tracks the duration of the last 10 requests to each endpoint, and executes your request against the quickest endpoints first.
The strategy measures overall response time, so it works best when your request and response token counts are of a similar size.
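A sketch of the bookkeeping this implies (illustrative only; AICentral's real implementation may differ in the details):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of the Lowest Latency bookkeeping: keep the last 10 call
// durations per endpoint and order endpoints by their average before dispatch.
sealed class LatencyTracker
{
    private readonly Dictionary<string, Queue<TimeSpan>> _durations = new();

    public void Record(string endpoint, TimeSpan duration)
    {
        if (!_durations.TryGetValue(endpoint, out var window))
            _durations[endpoint] = window = new Queue<TimeSpan>();
        window.Enqueue(duration);
        if (window.Count > 10) window.Dequeue(); // rolling window of the last 10 calls
    }

    public IEnumerable<string> OrderByFastest(IEnumerable<string> endpoints) =>
        endpoints.OrderBy(e =>
            _durations.TryGetValue(e, out var w) && w.Count > 0
                ? w.Average(d => d.TotalMilliseconds)
                : 0d); // untried endpoints sort first so they get sampled
}
```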
Property | Description |
---|---|
Endpoints | An array of Endpoint names, as declared in the Endpoint Configuration Collection, to try. |
{
  "Type": "LowestLatency",
  "Name": "my-name",
  "Properties": {
    "Endpoints": [
      "endpoint-name-from-earlier",
      "another-endpoint-name-from-earlier"
    ]
  }
}
To support more complex scenarios, an Endpoint Selector can reference another Endpoint Selector.
The implementation relies on the order of your Selectors. You can only reference selectors that have been defined earlier. This sample shows a Lowest Latency endpoint used for the priority endpoints in a Prioritised endpoint selector.
{
  "AICentral": {
    "Endpoints": [ "... define endpoints" ],
    "EndpointSelectors": [
      {
        "Type": "LowestLatency",
        "Name": "lowest-latency-group",
        "Properties": {
          "Endpoints": [
            "endpoint-name-from-earlier",
            "another-endpoint-name-from-earlier"
          ]
        }
      },
      {
        "Type": "Prioritised",
        "Name": "prioritised-group",
        "Properties": {
          "PriorityEndpoints": [
            "lowest-latency-group" //references the lowest-latency-group defined before this
          ],
          "FallbackEndpoints": [
            "yet-another-endpoint-name-from-earlier",
            "and-yet-another-endpoint-name-from-earlier"
          ]
        }
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "prioritised-group"
      }
    ]
  }
}
Property | Required | Description |
---|---|---|
Name | Yes | Friendly Name of the pipeline |
Host | Yes | The HostName to listen on for incoming requests to this pipeline |
EndpointSelector | Yes | The Endpoint Selector strategy to use, as defined in your EndpointSelectors config section |
AuthProvider | Yes | Auth strategy to protect the Pipeline, as defined in your AuthProviders config section |
OpenTelemetryConfig.Transmit | No | True to emit additional Open Telemetry metrics (useful for scenarios such as ChargeBack) |
OpenTelemetryConfig.AddClientNameTag | No | True to add the Client Name tag to OTel telemetry |
Steps | No | An array of Step names to run before the request is forwarded to the backend. |
{
  "Name": "MyPipeline",
  "Host": "<host-name-we-listen-for-requests-on>",
  "EndpointSelector": "name-from-above",
  "AuthProvider": "name-from-above",
  "OpenTelemetryConfig": {
    "Transmit": true,
    "AddClientNameTag": true
  },
  "Steps": [
    "step-name-from-earlier",
    "another-step-name-from-earlier"
  ]
}
Using Endpoints and Endpoint Selectors we can create a pipeline like this:
{
  "AICentral": {
    "Endpoints": [ "... as above" ],
    "EndpointSelectors": [ "... as above" ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-EndpointSelectors-config-section",
        "AuthProvider": "name-from-AuthProviders-config-section"
      }
    ]
  }
}
AddClientNameTag adds the consumer's name to the OTel metrics, which enables chargeback scenarios across your Pipelines.
The examples shown capture telemetry and send it to Azure Monitor; use your Open Telemetry collector of choice for other destinations.
To enable OTel metrics on a pipeline, add this section:
{
  "AICentral": {
    "Endpoints": [ "... as above" ],
    "EndpointSelectors": [ "... as above" ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-above",
        "OpenTelemetryConfig": {
          "AddClientNameTag": true,
          "Transmit": true
        }
      }
    ]
  }
}
To export the metrics to Azure Monitor, add the exporter package and register the AICentral meter:
dotnet add package Azure.Monitor.OpenTelemetry.AspNetCore
builder.Services
    .AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddMeter(ActivitySource.AICentralTelemetryName);
    })
    .UseAzureMonitor();
Check out this dashboard for inspiration on how to visualise your metrics.
To enable additional AICentral traces in your Open Telemetry distributed tracing:
builder.Services
    .AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing.AddSource(ActivitySource.AICentralTelemetryName);
    });
We support adding authentication for incoming clients in 4 ways.
No auth is applied to incoming requests. This is useful if we use EntraPassThrough for our backend endpoints: the user presents a token issued for an Azure Open AI service, which is accepted or rejected by the backend service itself.
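For example, a client calling an anonymous pipeline backed by EntraPassThrough endpoints might authenticate like this (a hedged sketch; the pipeline host is whatever your pipeline listens on):

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using Azure.Core;
using Azure.Identity;

// Sketch of a client calling an AllowAnonymous pipeline whose backend endpoints
// use EntraPassThrough: the client's own Entra token rides through AICentral
// and is validated by the Azure Open AI service itself.
var credential = new DefaultAzureCredential();
AccessToken token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { "https://cognitiveservices.azure.com/.default" }));

using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", token.Token);
// ...then call the pipeline's Host exactly as you would call Azure Open AI.
```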
Property | Required | Description |
---|---|---|
Name | Yes | Name to refer to the step from a Pipeline |
Type | Yes | AllowAnonymous |
{
  "AICentral": {
    "AuthProviders": [
      {
        "Type": "AllowAnonymous",
        "Name": "no-auth"
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-above",
        "AuthProvider": "no-auth"
      }
    ]
  }
}
Uses standard Azure Active Directory Authentication to assert a valid JWT.
Currently we support authorisation using AAD Roles.
Property | Required | Description |
---|---|---|
Name | Yes | Name to refer to the step from a Pipeline |
Type | Yes | Entra |
Entra.xxx | Yes | Standard Microsoft.Identity.Web Configuration section |
Requirements.Roles | No | Role claim to assert on the incoming validated JWT |
{
  "AICentral": {
    "AuthProviders": [
      {
        "Type": "Entra",
        "Name": "simple-aad",
        "Properties": {
          "Entra": {
            "ClientId": "<my-client-id>",
            "TenantId": "<my-tenant-id>",
            "Instance": "https://login.microsoftonline.com/",
            "Audience": "<custom-audience>"
          },
          "Requirements": {
            "Roles": ["required-roles", "can-be-many"]
          }
        }
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-above",
        "AuthProvider": "simple-aad"
      }
    ]
  }
}
You can specify clients, each with a pair of keys, and authenticate your pipelines using them. The keys are sent in the api-key header and replace the provider's key.
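For example (a sketch; the host, deployment, api-version, and key values are placeholders):

```csharp
using System.Net.Http;
using System.Text;

// Sketch of a consumer calling an ApiKey-protected pipeline: the client sends
// its AICentral-issued key in the api-key header, just as it would send an
// Azure Open AI key. AICentral swaps it for the real provider key downstream.
using var http = new HttpClient();
http.DefaultRequestHeaders.Add("api-key", "<consumer-key-1-or-2>");
var result = await http.PostAsync(
    "https://<pipeline-host>/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01",
    new StringContent(
        """{ "messages": [{ "role": "user", "content": "Hello" }] }""",
        Encoding.UTF8, "application/json"));
```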
Property | Required | Description |
---|---|---|
Name | Yes | Name to refer to the step from a Pipeline |
Type | Yes | ApiKey |
Clients | Yes | Array of allowed Clients |
Client.ClientName | Yes | Name to assign to the incoming callee |
Client.Key1 | Yes | ApiKey valid for the consumer to pass |
Client.Key2 | Yes | Second ApiKey valid for the consumer to pass |
{
  "AICentral": {
    "AuthProviders": [
      {
        "Type": "ApiKey",
        "Name": "apikey",
        "Properties": {
          "Clients": [
            {
              "ClientName": "Consumer-1",
              "Key1": "dfhaskjhdfjkasdhfkjsdf",
              "Key2": "23sfdkjhcijshjkfhsdkjfsd"
            },
            {
              "ClientName": "Consumer-2",
              "Key1": "szcvjhkhkjhjkfsdf",
              "Key2": "vkjhsdfjkhkjnkjhjksdf"
            }
          ]
        }
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-above",
        "AuthProvider": "apikey"
      }
    ]
  }
}
AI Central can act as a Token Provider. The tokens are bound to a Consumer, Pipelines, and a time window.
Use this to facilitate a Hackathon without blowing your budget!
Property | Required | Description |
---|---|---|
Name | Yes | Name to refer to the step from a Pipeline |
Type | Yes | AICentralJWT |
TokenIssuer | Yes | Issuer to set / require on JWTs |
AdminKey | Yes | A secret that can be provided to create JWTs |
ValidPipelines | Yes | Dictionary of Pipeline Names the token is valid for, with the Deployments it is valid for (can be a wildcard) |
{
  "AICentral": {
    "AuthProviders": [
      {
        "Type": "AICentralJWT",
        "Name": "hackathon",
        "Properties": {
          "TokenIssuer": "https://hackathon.auth.graeme.com",
          "AdminKey": "<hard-to-guess-api-key>",
          "ValidPipelines": {
            "MyPipeline": ["Deployment1", "Deployment2"],
            "MyPipeline2": ["*"]
          }
        }
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-above",
        "AuthProvider": "hackathon"
      },
      {
        "Name": "MyPipeline2",
        "Host": "<host-name-we-listen-for-requests-on>",
        "EndpointSelector": "name-from-above",
        "AuthProvider": "hackathon"
      }
    ]
  }
}
# The above pipeline exposes an endpoint that can mint JWTs
curl -X POST https://<host-name-we-listen-for-requests-on>/aicentraljwt/<auth-provider-name>/tokens \
  -H "api-key: <hard-to-guess-api-key>" \
  -d "{ \"names\": [\"Consumer-1\", \"Consumer-2\", ...], \"ValidPipelines\": [\"MyPipeline\", ...], \"ValidFor\": \"00:24:00\" }"
A pipeline can run multiple steps. We currently provide steps for:
- Azure Monitor Logging
- Asp.Net Core Windowed Rate Limiting
- Token Based Rate Limiting
Property | Required | Description |
---|---|---|
Name | Yes | Name to refer to the step from a Pipeline |
Type | Yes | TokenBasedRateLimiting |
LimitType | Yes | PerConsumer to limit to each Consumer. PerAICentralEndpoint to protect the entire Endpoint |
MetricType | Yes | Tokens or Requests. |
Options.Window | Yes | How long to count for before resetting the counter |
Options.PermitLimit | Yes | How high to let the counter go before returning 429s to the Consumer |
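The semantics of a fixed window can be sketched like this (illustrative only, not AICentral's implementation):

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative fixed-window counter: each key (one per consumer, or a single
// shared key for the whole AICentral endpoint) may spend PermitLimit units per
// Window. Units are token counts for MetricType "Tokens", or 1 per request for
// MetricType "Requests".
sealed class FixedWindowCounter
{
    private readonly TimeSpan _window;
    private readonly int _permitLimit;
    private readonly ConcurrentDictionary<string, (DateTimeOffset Start, int Used)> _buckets = new();

    public FixedWindowCounter(TimeSpan window, int permitLimit)
        => (_window, _permitLimit) = (window, permitLimit);

    public bool TrySpend(string key, int units)
    {
        var now = DateTimeOffset.UtcNow;
        var bucket = _buckets.AddOrUpdate(
            key,
            _ => (now, units),
            (_, b) => now - b.Start >= _window
                ? (now, units)                 // window elapsed: reset the counter
                : (b.Start, b.Used + units));  // still in window: accumulate
        return bucket.Used <= _permitLimit;    // false => respond with a 429
    }
}
```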
{
  "AICentral": {
    "GenericSteps": [
      {
        "Type": "TokenBasedRateLimiting",
        "Name": "token-rate-limiter",
        "Properties": {
          "LimitType": "PerConsumer|PerAICentralEndpoint",
          "MetricType": "Tokens",
          "Options": {
            "Window": "00:00:10",
            "PermitLimit": 100
          }
        }
      },
      {
        "Type": "AspNetCoreFixedWindowRateLimiting",
        "Name": "window-rate-limiter",
        "Properties": {
          "LimitType": "PerConsumer|PerAICentralEndpoint",
          "MetricType": "Requests",
          "Options": {
            "Window": "00:00:10",
            "PermitLimit": 100
          }
        }
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "Steps": [
          "token-rate-limiter",
          "window-rate-limiter"
        ]
      }
    ]
  }
}
Requires the AICentral.Extensions.AzureMonitor package:
dotnet add package AICentral.Extensions.AzureMonitor
builder.Services.AddAICentral(
    builder.Configuration,
    startupLogger: new SerilogLoggerProvider(logger).CreateLogger("AICentralStartup"),
    additionalComponentAssemblies:
    [
        typeof(AzureMonitorLoggerFactory).Assembly //AI Central Azure Monitor extension assembly
    ]);
Property | Required | Description |
---|---|---|
Name | Yes | Name to refer to the step from a Pipeline |
Type | Yes | AzureMonitorLogger |
WorkspaceId | Yes | Id of the Workspace from Azure |
Key | Yes | Key to post data to the Workspace. |
LogPrompt | Yes | True to log the text from the prompt |
LogResponse | Yes | True to log the text from the response |
{
  "AICentral": {
    "GenericSteps": [
      {
        "Type": "AzureMonitorLogger",
        "Name": "azure-monitor-logger",
        "Properties": {
          "WorkspaceId": "<workspace-id>",
          "Key": "<key>",
          "LogPrompt": true,
          "LogResponse": true
        }
      }
    ],
    "Pipelines": [
      {
        "Name": "MyPipeline",
        "Host": "<host-name-we-listen-for-requests-on>",
        "Steps": [
          "azure-monitor-logger"
        ]
      }
    ]
  }
}
AI Central is extensible. You can bring your own implementations of Steps, Endpoints, Endpoint Selectors, and Auth Providers. The only things we mandate are that a pipeline must:
- Trigger based on an incoming Host Header
- Choose an Endpoint Selector
We default certain properties if you don't provide them:
- We default to expecting Azure Open AI type requests
- We default to Entra Pass Through auth for Azure Open AI backends
- We default to Anonymous Auth (we don't validate tokens for Azure Open AI)
TODO: We are working on adding an extensibility sample.