Merge branch 'main' into marvinbuss/add_ml_part
marvinbuss authored Feb 11, 2024
2 parents 7fbef18 + 9727d16 commit 3dcccae
Showing 15 changed files with 227 additions and 77 deletions.
1 change: 0 additions & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,6 @@ repos:
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-json
- id: check-yaml
- id: pretty-format-json
args: ["--indent", "2", "--autofix", "--no-sort-keys"]
- repo: https://github.com/PyCQA/isort
28 changes: 17 additions & 11 deletions README.md
@@ -1,36 +1,42 @@
# Azure Enterprise AI & ML

This repository showcases how to do MLOps with Azure Machine Learning and the latest Azure CLI v2. It shows how project teams can create a secure machine learning environment using Infrastructure as Code (Terraform), how to structure their machine learning projects in Git, how to build reusable components and pipelines, and how to productionize these projects. The repository covers the following project phases:

1. Data Discovery,
2. Experimentation,
3. Training,
4. Deployment and
5. Monitoring.
Repository to showcase how to implement enterprise-ready ML & AI use cases on Azure.

## Architecture

![Architecture](/docs/images/architecture_single_environment.png)
The following infrastructure is created by this repository:

![Architecture](/docs/architecture.png)

The image above depicts the architecture of a single environment (dev/test/prod) within a Cloud Scale Data Landing Zone. Core network resources such as VNets, NSGs and route tables are usually provided by a platform team and are used to connect the Azure PaaS services to the corporate network. Shared data lakes within the data landing zone are also used by the data science project team to consume data and to produce new data products that can be shared with other teams internally. A shared application layer within the data landing zone can be used by the team to make use of other tools such as Azure Databricks for data engineering purposes.

All environments of such a data science setup should reside within the production data landing zone. Different resource groups within the production Data Landing Zone should be used to isolate the environments from one another. This is necessary, as data science teams usually require access to real production data for their use cases. Copying production data into a lower environment is usually a problematic process, or the data has to be obfuscated so heavily that data scientists can no longer produce meaningful results.

## Workflow
## MLOps

This repository also showcases how to do MLOps with Azure Machine Learning and the latest Azure CLI v2. It shows how project teams can create a secure machine learning environment using Infrastructure as Code (Terraform), how to structure their machine learning projects in Git, how to build reusable components and pipelines, and how to productionize these projects. The repository covers the following project phases:

1. Data Discovery,
2. Experimentation,
3. Training,
4. Deployment and
5. Monitoring.
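
For illustration, a minimal sketch of how a training job could be submitted with the Azure ML SDK v2 is shown below; the subscription, workspace, compute and environment names are placeholders and not values taken from this repository:

```python
# Minimal sketch (not part of this repository): submit a training job with the Azure ML SDK v2.
from azure.ai.ml import MLClient, command, Input
from azure.identity import DefaultAzureCredential

# Connect to the workspace; all identifiers below are placeholders.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<aml-workspace>",
)

# Define a command job: training code, command, data input, environment and compute.
train_job = command(
    code="./src",  # local folder containing train.py
    command="python train.py --data ${{inputs.training_data}}",
    inputs={
        "training_data": Input(
            type="uri_folder",
            path="azureml://datastores/workspaceblobstore/paths/training-data/",  # assumed path
        )
    },
    environment="<environment-name>@latest",  # assumed registered environment
    compute="<compute-cluster>",              # assumed compute cluster
    experiment_name="experimentation",
)

returned_job = ml_client.jobs.create_or_update(train_job)  # submits the job
print(returned_job.studio_url)  # link to the run in Azure ML studio
```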

### Workflow

![Data Science Workflow](/docs/images/workflow.png)

The workflow diagram above shows the end-to-end lifecycle of a data science process as well as the iterative development processes that need to be followed.

## Model Promotion Scenarios
### Model Promotion Scenarios

![Model Promotion Process](/docs/images/model_promotion_scenarios.png)

In order to save cost and avoid retraining models constantly, machine learning models can be promoted from a lower into a higher environment (from dev to test to prod) after a successful training run. However, the promotion process can vary significantly, as it depends heavily on where and how the model is expected to run. In some scenarios, the model is expected to be promoted into an application environment, where the model can then be hosted on different compute environments such as Azure ML Online Endpoints or other services that can host containerized applications such as AKS, Azure Web Apps or Azure Container Apps.

Other scenarios require the model to be hosted in the analytical data platform. Within the data platform, the model might be promoted into a container platform to serve real-time requests via REST API calls, or be promoted as part of a batch process to score data based on a schedule or other kinds of triggers that kick off the processing of a larger dataset. Similar to the application scenarios, Azure ML Online Endpoints, AKS, Azure Web Apps or Azure Container Apps can be used for hosting the model to serve REST API calls. For batch scenarios, the model may be promoted as part of an Azure ML Batch Endpoint or to a database environment to score data in batches directly within the database server.
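
As a hedged sketch of the real-time hosting option, the snippet below deploys an already registered MLflow model to an Azure ML managed online endpoint using the SDK v2; the endpoint, deployment and model names are assumptions and not defined in this repository:

```python
# Hedged sketch (not part of this repository): host a registered MLflow model
# on an Azure ML managed online endpoint.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<aml-workspace>",
)

# Create or update the endpoint that will serve REST API calls.
endpoint = ManagedOnlineEndpoint(name="<endpoint-name>", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy an assumed, previously registered MLflow model behind the endpoint.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="<endpoint-name>",
    model="azureml:<model-name>:<model-version>",  # assumed registered model
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Send all traffic to the new deployment.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```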

## Model Promotion Architecture
### Model Promotion Architecture

In this repository, we focus on the scenario where a real-time endpoint is hosted in the data platform environment. Specifically, we look at how a model trained in an Azure Machine Learning workspace in the dev environment can be promoted and rolled out to a higher environment using GitHub Actions and the Azure ML SDK v2 in combination with Bicep IaC.
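
A minimal sketch of such a promotion step with the Azure ML SDK v2, assuming an MLflow model and placeholder workspace names rather than the exact pipeline used in this repository, could look as follows:

```python
# Minimal sketch (not part of this repository): promote a registered model from the
# dev workspace to a higher environment by downloading and re-registering it.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client_dev = MLClient(credential, "<subscription-id>", "<rg-dev>", "<aml-workspace-dev>")
ml_client_prd = MLClient(credential, "<subscription-id>", "<rg-prd>", "<aml-workspace-prd>")

# Download the trained model artifacts from the dev workspace.
model_name, model_version = "<model-name>", "<model-version>"
ml_client_dev.models.download(name=model_name, version=model_version, download_path="./promotion")

# Re-register the downloaded artifacts in the higher environment,
# e.g. as a step in a GitHub Actions workflow.
promoted_model = Model(
    name=model_name,
    version=model_version,
    path=f"./promotion/{model_name}",  # folder created by the download call above
    type="mlflow_model",               # assumes an MLflow-format model
)
ml_client_prd.models.create_or_update(promoted_model)
```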

19 changes: 15 additions & 4 deletions code/infra/locals.tf
@@ -110,7 +110,7 @@ locals {
"vscode006" = {
type = "FQDN"
category = "UserDefined"
destination = "raw.githubusercontent.com" // "/microsoft/vscode-tools-for-ai/master/azureml_remote_websocket_server/*"
destination = "raw.githubusercontent.com" # "/microsoft/vscode-tools-for-ai/master/azureml_remote_websocket_server/*"
status = "Active"
},
"vscode007" = {
@@ -272,7 +272,18 @@ locals {
sparkEnabled = true
sparkStatus = "Active"
}
}
},
# "${azurerm_monitor_private_link_scope.mpls.name}-queue" = { # Not supported in AML today
# type = "PrivateEndpoint"
# category = "UserDefined"
# status = "Active"
# destination = {
# serviceResourceId = azurerm_monitor_private_link_scope.mpls.id
# subresourceTarget = "azuremonitor"
# sparkEnabled = true
# sparkStatus = "Active"
# }
# }
}
search_service_machine_learning_workspace_outbound_rules = {
"${var.search_service_enabled ? azurerm_search_service.search_service[0].name : ""}-searchService" = {
@@ -288,12 +299,12 @@
}
}
open_ai_machine_learning_workspace_outbound_rules = {
"${var.open_ai_enabled ? azurerm_cognitive_account.cognitive_account[0].name : ""}-account" = {
"${var.open_ai_enabled ? azurerm_cognitive_account.cognitive_account_openai[0].name : ""}-account" = {
type = "PrivateEndpoint"
category = "UserDefined"
status = "Active"
destination = {
serviceResourceId = var.open_ai_enabled ? azurerm_cognitive_account.cognitive_account[0].id : ""
serviceResourceId = var.open_ai_enabled ? azurerm_cognitive_account.cognitive_account_openai[0].id : ""
subresourceTarget = "account"
sparkEnabled = true
sparkStatus = "Active"
8 changes: 4 additions & 4 deletions code/infra/machinelearningconnections.tf
@@ -20,25 +20,25 @@ resource "azapi_resource" "machine_learning_workspace_connection_search" {
})
}

resource "azapi_resource" "machine_learning_workspace_connection_open_ai" {
resource "azapi_resource" "machine_learning_workspace_connection_openai" {
count = var.open_ai_enabled ? 1 : 0

type = "Microsoft.MachineLearningServices/workspaces/connections@2023-06-01-preview"
name = azurerm_cognitive_account.cognitive_account[0].name
name = azurerm_cognitive_account.cognitive_account_openai[0].name
parent_id = azurerm_machine_learning_workspace.machine_learning_workspace.id

body = jsonencode({
properties = {
authType = "ApiKey"
category = "AzureOpenAI"
credentials = {
key = azurerm_cognitive_account.cognitive_account[0].primary_access_key
key = azurerm_cognitive_account.cognitive_account_openai[0].primary_access_key
}
metadata = {
ApiVersion = "2023-07-01-preview"
ApiType = "azure"
}
target = "https://${azurerm_cognitive_account.cognitive_account[0].name}.openai.azure.com/"
target = "https://${azurerm_cognitive_account.cognitive_account_openai[0].name}.openai.azure.com/"
}
})
}
45 changes: 45 additions & 0 deletions code/infra/monitorprivatelinkscope.tf
@@ -0,0 +1,45 @@
resource "azurerm_monitor_private_link_scope" "mpls" {
name = "${local.prefix}-ampls001"
resource_group_name = data.azurerm_resource_group.resource_group.name
tags = var.tags
}

resource "azurerm_monitor_private_link_scoped_service" "mpls_application_insights" {
name = "ampls-${azurerm_application_insights.application_insights.name}"
resource_group_name = azurerm_monitor_private_link_scope.mpls.resource_group_name
scope_name = azurerm_monitor_private_link_scope.mpls.name
linked_resource_id = azurerm_application_insights.application_insights.id
}

resource "azurerm_monitor_private_link_scoped_service" "mpls_log_analytics_workspace" {
name = "ampls-${azurerm_log_analytics_workspace.log_analytics_workspace.name}"
resource_group_name = azurerm_monitor_private_link_scope.mpls.resource_group_name
scope_name = azurerm_monitor_private_link_scope.mpls.name
linked_resource_id = azurerm_log_analytics_workspace.log_analytics_workspace.id
}

resource "azurerm_private_endpoint" "mpls_private_endpoint" {
name = "${azurerm_monitor_private_link_scope.mpls.name}-pe"
location = var.location
resource_group_name = azurerm_monitor_private_link_scope.mpls.resource_group_name
tags = var.tags

custom_network_interface_name = "${azurerm_monitor_private_link_scope.mpls.name}-nic"
private_service_connection {
name = "${azurerm_monitor_private_link_scope.mpls.name}-pe"
is_manual_connection = false
private_connection_resource_id = azurerm_monitor_private_link_scope.mpls.id
subresource_names = ["azuremonitor"]
}
subnet_id = data.azurerm_subnet.subnet.id
private_dns_zone_group {
name = "${azurerm_monitor_private_link_scope.mpls.name}-arecord"
private_dns_zone_ids = [
var.private_dns_zone_id_monitor,
var.private_dns_zone_id_oms_opinsights,
var.private_dns_zone_id_ods_opinsights,
var.private_dns_zone_id_automation_agents,
var.private_dns_zone_id_blob
]
}
}
66 changes: 28 additions & 38 deletions code/infra/openai.tf
@@ -1,8 +1,8 @@
resource "azurerm_cognitive_account" "cognitive_account" {
resource "azurerm_cognitive_account" "cognitive_account_openai" {
count = var.open_ai_enabled ? 1 : 0

name = "${local.prefix}-cog001"
location = var.location
name = "${local.prefix}-aoai001"
location = var.location_openai
resource_group_name = data.azurerm_resource_group.resource_group.name
tags = var.tags
identity {
@@ -11,8 +11,8 @@ resource "azurerm_cognitive_account" "cognitive_account" {

custom_subdomain_name = "${local.prefix}-cog001"
dynamic_throttling_enabled = false
fqdns = [
trimsuffix(replace(azurerm_storage_account.storage.primary_blob_endpoint, "https://", ""), "/")
fqdns = var.search_service_enabled ? [
trimsuffix(replace(azurerm_storage_account.storage.primary_blob_endpoint, "https://", ""), "/"),
"${azurerm_search_service.search_service[0].name}.search.windows.net"
] : [
trimsuffix(replace(azurerm_storage_account.storage.primary_blob_endpoint, "https://", ""), "/"),
]
kind = "OpenAI"
local_auth_enabled = true
@@ -21,40 +24,31 @@
ip_rules = []
}
outbound_network_access_restricted = true
public_network_access_enabled = false
public_network_access_enabled = true
sku_name = "S0"
}

resource "azapi_resource" "cognitive_service_open_ai_model_ada" {
resource "azapi_update_resource" "cognitive_account_update" {
count = var.open_ai_enabled ? 1 : 0

type = "Microsoft.CognitiveServices/accounts/deployments@2023-05-01"
name = "text-embedding-ada-002"
parent_id = azurerm_cognitive_account.cognitive_account[0].id
type = "Microsoft.CognitiveServices/accounts@2023-10-01-preview"
resource_id = azurerm_cognitive_account.cognitive_account_openai[0].id

body = jsonencode({
sku = {
name = "Standard"
capacity = 60
}
properties = {
model = {
format = "OpenAI"
name = "text-embedding-ada-002"
version = "2"
networkAcls = {
bypass = "AzureServices"
}
raiPolicyName = "Microsoft.Default"
versionUpgradeOption = "OnceNewDefaultVersionAvailable"
}
})
}

resource "azapi_resource" "cognitive_service_open_ai_model_gtt_35" {
resource "azapi_resource" "cognitive_service_open_ai_model_ada" {
count = var.open_ai_enabled ? 1 : 0

type = "Microsoft.CognitiveServices/accounts/deployments@2023-05-01"
name = "gpt-35-turbo"
parent_id = azurerm_cognitive_account.cognitive_account[0].id
type = "Microsoft.CognitiveServices/accounts/deployments@2023-10-01-preview"
name = "text-embedding-ada-002"
parent_id = azurerm_cognitive_account.cognitive_account_openai[0].id

body = jsonencode({
sku = {
@@ -64,30 +58,26 @@ resource "azapi_resource" "cognitive_service_open_ai_model_gtt_35" {
properties = {
model = {
format = "OpenAI"
name = "gpt-35-turbo"
version = "0301"
name = "text-embedding-ada-002"
version = "2"
}
raiPolicyName = "Microsoft.Default"
versionUpgradeOption = "OnceNewDefaultVersionAvailable"
}
})

depends_on = [
azapi_resource.cognitive_service_open_ai_model_ada
]
}

data "azurerm_monitor_diagnostic_categories" "diagnostic_categories_cognitive_service" {
count = var.open_ai_enabled ? 1 : 0

resource_id = azurerm_cognitive_account.cognitive_account[0].id
resource_id = azurerm_cognitive_account.cognitive_account_openai[0].id
}

resource "azurerm_monitor_diagnostic_setting" "diagnostic_setting_cognitive_service" {
count = var.open_ai_enabled ? 1 : 0

name = "logAnalytics"
target_resource_id = azurerm_cognitive_account.cognitive_account[0].id
target_resource_id = azurerm_cognitive_account.cognitive_account_openai[0].id
log_analytics_workspace_id = azurerm_log_analytics_workspace.log_analytics_workspace.id

dynamic "enabled_log" {
@@ -111,23 +101,23 @@ resource "azurerm_monitor_diagnostic_setting" "diagnostic_setting_cognitive_serv
resource "azurerm_private_endpoint" "cognitive_service_private_endpoint" {
count = var.open_ai_enabled ? 1 : 0

name = "${azurerm_cognitive_account.cognitive_account[0].name}-pe"
name = "${azurerm_cognitive_account.cognitive_account_openai[0].name}-pe"
location = var.location
resource_group_name = azurerm_cognitive_account.cognitive_account[0].resource_group_name
resource_group_name = azurerm_cognitive_account.cognitive_account_openai[0].resource_group_name
tags = var.tags

custom_network_interface_name = "${azurerm_cognitive_account.cognitive_account[0].name}-nic"
custom_network_interface_name = "${azurerm_cognitive_account.cognitive_account_openai[0].name}-nic"
private_service_connection {
name = "${azurerm_cognitive_account.cognitive_account[0].name}-pe"
name = "${azurerm_cognitive_account.cognitive_account_openai[0].name}-pe"
is_manual_connection = false
private_connection_resource_id = azurerm_cognitive_account.cognitive_account[0].id
private_connection_resource_id = azurerm_cognitive_account.cognitive_account_openai[0].id
subresource_names = ["account"]
}
subnet_id = var.subnet_id
dynamic "private_dns_zone_group" {
for_each = var.private_dns_zone_id_open_ai == "" ? [] : [1]
content {
name = "${azurerm_cognitive_account.cognitive_account[0].name}-arecord"
name = "${azurerm_cognitive_account.cognitive_account_openai[0].name}-arecord"
private_dns_zone_ids = [
var.private_dns_zone_id_open_ai
]
22 changes: 19 additions & 3 deletions code/infra/roleassignments_openai.tf
@@ -1,7 +1,23 @@
resource "azurerm_role_assignment" "uai_role_assignment_storage_blob_reader" {
resource "azurerm_role_assignment" "cognitive_account_openai_role_assignment_storage_blob_contributor" {
count = var.open_ai_enabled ? 1 : 0

scope = azurerm_storage_account.storage.id
role_definition_name = "Storage Blob Data Reader"
principal_id = azurerm_cognitive_account.cognitive_account[0].identity[0].principal_id
role_definition_name = "Storage Blob Data Contributor"
principal_id = azurerm_cognitive_account.cognitive_account_openai[0].identity[0].principal_id
}

resource "azurerm_role_assignment" "cognitive_account_openai_role_assignment_search_index_data_reader" {
count = var.open_ai_enabled && var.search_service_enabled ? 1 : 0

scope = azurerm_search_service.search_service[0].id
role_definition_name = "Search Index Data Reader"
principal_id = azurerm_cognitive_account.cognitive_account_openai[0].identity[0].principal_id
}

resource "azurerm_role_assignment" "cognitive_account_openai_role_assignment_search_service_contributor" {
count = var.open_ai_enabled && var.search_service_enabled ? 1 : 0

scope = azurerm_search_service.search_service[0].id
role_definition_name = "Search Service Contributor"
principal_id = azurerm_cognitive_account.cognitive_account_openai[0].identity[0].principal_id
}
23 changes: 23 additions & 0 deletions code/infra/roleassignments_search.tf
@@ -0,0 +1,23 @@
resource "azurerm_role_assignment" "search_role_assignment_storage_blob_contributor" {
count = var.search_service_enabled ? 1 : 0

scope = azurerm_storage_account.storage.id
role_definition_name = "Storage Blob Data Contributor"
principal_id = azurerm_search_service.search_service[0].identity[0].principal_id
}

resource "azurerm_role_assignment" "search_role_assignment_storage_reader_and_data_access" {
count = var.search_service_enabled ? 1 : 0

scope = azurerm_storage_account.storage.id
role_definition_name = "Reader and Data Access"
principal_id = azurerm_search_service.search_service[0].identity[0].principal_id
}

resource "azurerm_role_assignment" "search_role_assignment_openai_contributor" {
count = var.open_ai_enabled && var.search_service_enabled ? 1 : 0

scope = azurerm_cognitive_account.cognitive_account_openai[0].id
role_definition_name = "Cognitive Services OpenAI Contributor"
principal_id = azurerm_search_service.search_service[0].identity[0].principal_id
}
4 changes: 2 additions & 2 deletions code/infra/roleassignments_uai.tf
@@ -77,9 +77,9 @@ resource "azurerm_role_assignment" "uai_role_assignment_search_service_contribut
principal_id = azurerm_user_assigned_identity.user_assigned_identity.principal_id
}

resource "azurerm_role_assignment" "uai_role_assignment_cognitive_account_contributor" {
resource "azurerm_role_assignment" "uai_role_assignment_cognitive_account_openai_contributor" {
count = var.open_ai_enabled ? 1 : 0
scope = azurerm_cognitive_account.cognitive_account[0].id
scope = azurerm_cognitive_account.cognitive_account_openai[0].id
role_definition_name = "Contributor"
principal_id = azurerm_user_assigned_identity.user_assigned_identity.principal_id
}