Merge pull request #72 from mlinfra-io/add-mlflow-to-kubernetes
add-mlflow-to-kubernetes
aliabbasjaffri authored Mar 6, 2024
2 parents 762ff10 + 788563c commit db3d561
Showing 11 changed files with 641 additions and 0 deletions.
22 changes: 22 additions & 0 deletions docs/code/aws/kubernetes.md
@@ -1,6 +1,17 @@
`kubernetes` deploys the MLOps `stack` on top of the cloud provider's managed Kubernetes service. In the case of AWS, this is EKS.


#### Complete

===+ "Simple Deployment Configuration"
    ```yaml
    --8<-- "docs/examples/kubernetes/complete/aws-complete.yaml"
    ```
=== "Advanced Deployment Configuration"
    ```yaml
    --8<-- "docs/examples/kubernetes/complete/aws-complete-advanced.yaml"
    ```

#### lakefs

===+ "Simple Deployment Configuration"
@@ -11,3 +22,14 @@
    ```yaml
    --8<-- "docs/examples/kubernetes/lakefs/aws-lakefs-advanced.yaml"
    ```

#### mlflow

===+ "Simple Deployment Configuration"
    ```yaml
    --8<-- "docs/examples/kubernetes/mlflow/aws-mlflow.yaml"
    ```
=== "Advanced Deployment Configuration"
    ```yaml
    --8<-- "docs/examples/kubernetes/mlflow/aws-mlflow-advanced.yaml"
    ```
45 changes: 45 additions & 0 deletions examples/kubernetes/complete/aws-complete-advanced.yaml
@@ -0,0 +1,45 @@
name: aws-complete-k8s
provider:
  name: aws
  account_id: "793009824629"
  region: "eu-central-1"
deployment:
  type: kubernetes
  config:
    vpc:
      create_database_subnets: true
      enable_nat_gateway: true
      one_nat_gateway_per_az: false
    kubernetes:
      k8s_version: "1.28"
      cluster_endpoint_public_access: true
      spot_instance: false
      tags:
        data_versioning: "lakefs"
      node_groups:
        - name: k8s-node-group
          instance_types:
            - t3.medium
          desired_size: 1
          min_size: 1
          max_size: 3
          disk_size: 20
stack:
  - data_versioning:
      name: lakefs
      params:
        remote_tracking: true
        database_type: "postgres"
        tags:
          database_type: "postgres"
          data_versioning: "lakefs"
          remote_tracking: true
  - experiment_tracking:
      name: mlflow
      params:
        remote_tracking: true
        mlflow_data_bucket_name: "mlflow-bucket"
        tags:
          database_type: "postgres"
          experiment_tracking: "mlflow"
          remote_tracking: true
12 changes: 12 additions & 0 deletions examples/kubernetes/complete/aws-complete.yaml
@@ -0,0 +1,12 @@
name: aws-complete-k8s
provider:
  name: aws
  account_id: "793009824629"
  region: "eu-central-1"
deployment:
  type: kubernetes
stack:
  - data_versioning:
      name: lakefs
  - experiment_tracking:
      name: mlflow
2 changes: 2 additions & 0 deletions examples/kubernetes/mlflow/README.rst
@@ -0,0 +1,2 @@
* The VPC needs to have a `NAT gateway configured <https://repost.aws/questions/QU8XmyDQZOQkq9SSHoIM3tJg/setting-up-an-eks-node-group-on-a-private-subnet>`_ so that the node groups can find the EKS cluster.
* You can choose between creating a single NAT gateway or one NAT gateway per AZ.
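
The NAT gateway choice above can be sketched as follows. This is a minimal illustration that assumes the ``vpc`` keys used in this commit's ``aws-mlflow-advanced.yaml`` example; the nesting under ``deployment.config`` is inferred from that example, and the comment on ``one_nat_gateway_per_az`` is read off the key name rather than confirmed behaviour.

.. code-block:: yaml

    deployment:
      type: kubernetes
      config:
        vpc:
          # Required so that node groups in private subnets can reach the EKS cluster.
          enable_nat_gateway: true
          # Assumption: true requests one NAT gateway per availability zone.
          one_nat_gateway_per_az: false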
36 changes: 36 additions & 0 deletions examples/kubernetes/mlflow/aws-mlflow-advanced.yaml
@@ -0,0 +1,36 @@
name: aws-mlflow-k8s
provider:
  name: aws
  account_id: "793009824629"
  region: "eu-central-1"
deployment:
  type: kubernetes
  config:
    vpc:
      create_database_subnets: true
      enable_nat_gateway: true
      one_nat_gateway_per_az: false
    kubernetes:
      k8s_version: "1.28"
      cluster_endpoint_public_access: true
      spot_instance: false
      tags:
        experiment_tracking: "mlflow"
      node_groups:
        - name: mlflow-node-group
          instance_types:
            - t3.medium
          desired_size: 1
          min_size: 1
          max_size: 3
          disk_size: 20
stack:
  - experiment_tracking:
      name: mlflow
      params:
        remote_tracking: true
        mlflow_data_bucket_name: "mlflow-bucket"
        tags:
          database_type: "postgres"
          experiment_tracking: "mlflow"
          remote_tracking: true
10 changes: 10 additions & 0 deletions examples/kubernetes/mlflow/aws-mlflow.yaml
@@ -0,0 +1,10 @@
name: aws-mlflow-k8s
provider:
  name: aws
  account_id: "793009824629"
  region: "eu-central-1"
deployment:
  type: kubernetes
stack:
  - experiment_tracking:
      name: mlflow
@@ -0,0 +1,8 @@
mlflow does not provide an official Helm chart (see [here](https://github.com/mlflow/mlflow/issues/6118)).
The two candidates for deploying mlflow with a Helm chart are:
- [mlflow community chart](https://github.com/community-charts/helm-charts/tree/main/charts/mlflow)
- [bitnami helm chart](https://github.com/bitnami/charts/tree/main/bitnami/mlflow)

The mlflow community chart has not been maintained for over a year, but it has a better deployment API than the bitnami chart.
Deploying the bitnami chart was painful enough that I decided to go ahead with the community chart for now.
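
As a hedged illustration of how this chart choice surfaces in a deployment file: the module declares a user-facing input `mlflow_chart_version` with default `"1.0.8"`. Assuming user-facing inputs can be set under `params` in the same way this commit's examples set `remote_tracking` and `mlflow_data_bucket_name`, pinning the chart version might look like the sketch below.

```yaml
stack:
  - experiment_tracking:
      name: mlflow
      params:
        # Assumption: mlflow_chart_version is accepted here like the other
        # user-facing inputs; "1.0.8" is the module's declared default.
        mlflow_chart_version: "1.0.8"
```

Pinning the version explicitly keeps deployments reproducible if the default changes later.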
@@ -0,0 +1,58 @@
inputs:
  - name: vpc_id
    user_facing: false
    description: VPC id
    value: module.vpc.vpc_id
    default: None
  - name: vpc_cidr_block
    user_facing: false
    description: VPC CIDR block required for SG of RDS
    value: module.vpc.vpc_cidr_block
    default: None
  - name: db_subnet_group_name
    user_facing: false
    description: DB Subnet group name
    value: module.vpc.database_subnet_group
    default: None
  - name: oidc_provider_arn
    user_facing: false
    description: The ARN of the OIDC provider to use for authentication
    value: module.eks.oidc_provider_arn
    default: None
  - name: oidc_provider
    user_facing: false
    description: The OIDC provider to use for authentication
    value: module.eks.oidc_provider
    default: None
  - name: remote_tracking
    user_facing: true
    description: Deploys an external Postgres RDS server as backend store and S3 as artifact store for mlflow.
    default: true
  - name: rds_instance_class
    user_facing: true
    description: RDS instance class to deploy mlflow backend on
    default: "db.t4g.medium"
  - name: mlflow_chart_version
    user_facing: true
    description: mlflow Chart version. See here for more details; https://artifacthub.io/packages/helm/mlflow/mlflow
    default: "1.0.8"
  - name: service_account_namespace
    user_facing: true
    description: The namespace where the service account would be installed
    default: mlflow
  - name: service_account_name
    user_facing: true
    description: The name of the service account to use for mlflow
    default: mlflow-sa
  - name: mlflow_data_bucket_name
    user_facing: true
    description: mlflow S3 data bucket name
    default: "mlflow-data-bucket"
  - name: tags
    user_facing: true
    description: Tags for mlflow module
    default:
      data_versioning: "mlflow"
outputs:
clouds:
  - aws