v0.0.1 - 2023-11-02
Changelog
Features
64c7a89 feat: Add accessmode, custom image, and image secret (#98)
874df03 feat: add PodTemplate support in workspace (#96)
e912c57 feat: Added Falcon 40B Deployment (#90)
0223a50 feat: Include Falcon Deployment on KAITO (#80)
4df4266 feat: Auto image build (#67)
ce26b36 feat: Make sure machine name is unique (#84)
b067e76 feat: Add GPU plugins to chart (#83)
d1780be feat: Remove DADI code (#77)
575c97c feat: Automated Preset Docker Image Building (#57)
4801723 feat: Add Deployments for E2E Tests (#60)
a7eb78c feat: Benchmarking falcon inference (#52)
980c198 feat: Added Falcon Model Inference API (#51)
e030ed1 feat: Merge kdm-preset-models into presets folder (#50)
94a7509 feat: Change CRD to support model access mode (#49)
c84563d feat: Add statefulsets for distributed model inference (#48)
a7ea650 feat: Add preset inference struct and support storage (#47)
fd11734 feat: Add validation checks for immutable fields (#44)
87b69ec feat: Add validation webhook scaffolding code (#39)
2a5372d feat: generalize check resource status function (#8)
cfe49a9 revert: "feat: Add skaffold code for webhook (#35)" (#36)
3490bcc feat: Add skaffold code for webhook (#35)
b4aed40 feat: Add Inference conditions (#24)
bab4ec1 feat: Merge machine conditions (#18)
98ef0e2 feat: Add default value if count not set (#16)
dd21b97 feat: Add timeout for machine check status and return when SKU is not available (#13)
09ef00f feat: Implement inference deployments for llama2 (#12)
54b302e feat: Add more status for machine workflow (#8)
4c86b7a feat: Add workflow status (#7)
d23a5d7 feat: Implement Machine creation (#6)
Bug Fixes
098f032 fix: conditional run matrix (#129)
b556e6a fix: small nits (#128)
91a0de5 fix: GitHub runner name needs quotes (#127)
d0ba4d5 fix: inference fault tolerance (#108)
ab74683 fix: rename webhook secret name and remove leader election (#126)
0a5a9a2 fix: remove unused RBAC permission for workspace create/delete (#124)
9495158 fix: remove unused RBAC permissions and port (#122)
4e5c9b1 fix: change chart to use MCR and revise README (#121)
5356685 fix: revise README.md to add cluster name to chart (#120)
ee48f1e fix: ensure check inference workload status (#112)
1815baa fix: aimodelsregistry ACR Push on Dispatch (#110)
1d8a582 fix: Use runner 0 for git checkout (#109)
ffd96e6 fix: tag name (#105)
e81db11 fix: filepath for inference file (#104)
7447413 fix: change key to kaito from gpu-provisioner (#101)
1844db2 fix: change Preset and Template to pointer fields (#99)
47305bc fix: fix a bug in counting existing nodes (#88)
14377f2 fix: make sure workspace gc cleans machines (#87)
2cc68b8 fix: Update nvidia-device-plugin-ds.yaml (#86)
8fdb4b2 fix: Update e2e-preset-test.yml (#85)
b8ab612 fix: Update README.md (#82)
084d633 fix: Update README.md (#81)
530dfaf fix: Update the go package module (#58)
7a3c057 fix: rename the pipeline script (#43)
ea150fc fix: optimize the use of crd status (#41)
7a8932e fix: Only update status when not matching (#40)
a3e06b0 fix: include torch params in inference command (#38)
24f4433 fix: Update kdm.io_workspaces.yaml (#7)
3c69ae1 fix: add storage for llama 70b (#1)
Code Refactoring
Continuous Integration
3921d44 ci: Support multi-arch images (#115)
bf57498 ci: Add gpu-provisioner to the e2e pipeline (#103)
4957501 ci: Add KAITO workspace e2e pipeline (#89)
64e43d2 ci: Add Publish to ACR GitHub workflow (#42)
0f14f67 ci: Fix checkout action
6f02a50 ci: Add unit tests GitHub workflow (#37)
004417a ci: Update create tag and upload artifacts (#32)
e8df268 ci: Add goreleaser.yml
2a1b7fc ci: Add upload/download artifact to get pr context (#23)
469a5ad ci: Remove the workflow name from action-download-artifact action
2809d6b ci: Create helm chart pipeline and update release workflow (#5)
Maintenance
413de7a chore: revise README to include quick start (#123)
d42fa0b chore: bump google.golang.org/grpc from 1.55.0 to 1.56.3 (#106)
7f6db90 chore: update the latest gpu-provisioner chart (#114)
6a37070 chore: revise README.md to refine installation steps (#111)
969d773 chore: update README for new installation guidance (#107)
b1a1dc1 chore: follow Azure OSS codes (#102)
8da899b chore: fix update conflict and clean up logs (#100)
bb3fa2d chore: put everything in examples folder (#94)
1be4d68 chore: rename accessmode type in CRD (#95)
5eb9a59 chore: cleanup presets folder (#93)
82b7398 chore: change k8sresources to resources (#92)
71b59bb chore: Refactor calculating number of new machines (#91)
421cc6d chore: bump thehanimo/pr-title-checker from 1.4.0 to 1.4.1 (#66)
64daef4 chore: bump golang.org/x/net from 0.10.0 to 0.17.0 (#61)
05df63b chore: bump step-security/harden-runner from 2.5.1 to 2.6.0 (#59)
735226f chore: Organize preset code (#53)
b78ae6d chore: bump goreleaser/goreleaser-action from 4 to 5 (#45)
9e8082a chore: bump docker/login-action from 2.2.0 to 3.0.0 (#46)
5c17266 chore: bump actions/checkout from 3 to 4 (#3)
Revert Change
b6aa55d revert: "release: update manifest and helm charts for v0.0.1" (#33)
a3e8ce2 revert: "release: update manifest and helm charts for v0.0.1" (#29)
a3b70a5 revert: "release: update manifest and helm charts for v0.0.1" (#27)
a92ed79 revert: "release: update manifest and helm charts for v0.0.1" (#25)
Security Fix
Testing
df5aa6b test: added ut for createAndValidateNode (#125)