Cannot deploy to k8s on AArch64 nodes using manifests in repo #266

Open
alexcreasy opened this issue Aug 13, 2024 · 9 comments · May be fixed by #267
Labels
bug Something isn't working dependencies Pull requests that update a dependency file

Comments

@alexcreasy
Contributor

Describe the bug
I'm unable to deploy ModelRegistry to k8s on my Apple silicon MacBook when following the official instructions here. This appears to be because the mysql container image (mysql:8.0.3) defined in the db overlay kustomization is only published for the amd64 architecture.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy to a k8s cluster running on AArch64 (in my case I used a kind cluster on an apple silicon mac) by running:
    kubectl apply -k "https://github.com/kubeflow/model-registry/manifests/kustomize/overlays/db?ref=v0.2.3-alpha"
  2. Run: kubectl get pods -n kubeflow
  3. Observe that the pod is in a CrashLoopBackOff
  4. Run: kubectl logs model-registry-db-xxxxxx (where xxxxxx is the pod suffix shown in the output of the previous command)
  5. Observe an error message similar to:
    Error from server (BadRequest): container "db-container" in pod "model-registry-db-7c84c4cfc8-cn4cx" is waiting to start: trying and failing to pull image
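
To narrow down whether the failure is an architecture mismatch, a minimal sketch along these lines may help. It assumes `kubectl` is already pointed at the affected cluster; the `normalize_arch` helper is hypothetical and only maps `uname -m` style names (such as `aarch64`) to the names container registries use (such as `arm64`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -m`-style machine names to registry-style names.
normalize_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    *)             echo "$1" ;;
  esac
}

echo "host architecture: $(normalize_arch "$(uname -m)")"

# Only query the cluster when kubectl is installed; failures here are not fatal.
if command -v kubectl >/dev/null 2>&1; then
  # Architecture reported by each node (status.nodeInfo.architecture).
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}' || true
  # Namespace events usually describe an image pull failure in more detail
  # than `kubectl logs` can while the container has not started.
  kubectl get events -n kubeflow --sort-by=.lastTimestamp || true
fi
```

If the nodes report `arm64` and the events mention that no matching image could be pulled, that is consistent with the image lacking an arm64 build.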

Expected behavior
The DB pod should be able to pull its image and should not enter a CrashLoopBackOff.

Additional context
I've tried changing the version in the manifest to mysql:8.0.39, the closest version with an AArch64 image, and with that change I'm able to get MR running as expected. Would you consider upgrading the mysql image in the manifests to allow ModelRegistry to be run on Macs with Apple silicon?
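
For anyone who wants to try the same version bump without editing the upstream manifests, one possible approach is a small local kustomize overlay that pulls in the remote db overlay and overrides the image tag. This is only a sketch: the scratch directory and the `OVERLAY_DIR` variable are hypothetical, and it assumes the upstream manifest references the image under the plain name `mysql`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory for the local overlay.
overlay_dir="${OVERLAY_DIR:-$(mktemp -d)}"
mkdir -p "${overlay_dir}"

cat > "${overlay_dir}/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Same remote overlay as in the reproduction steps.
  - https://github.com/kubeflow/model-registry/manifests/kustomize/overlays/db?ref=v0.2.3-alpha
images:
  # Assumption: the upstream manifest refers to the image by the name "mysql".
  - name: mysql
    newTag: 8.0.39
EOF

echo "wrote ${overlay_dir}/kustomization.yaml"
echo "apply with: kubectl apply -k ${overlay_dir}"
```

The script only writes the file and prints the apply command, so the generated overlay can be reviewed before anything touches the cluster.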

@alexcreasy alexcreasy added the bug Something isn't working label Aug 13, 2024
@alexcreasy alexcreasy linked a pull request Aug 13, 2024 that will close this issue
3 tasks
@alexcreasy
Contributor Author

As it's only a one line change, I've created a PR for this here: #267

Obviously, this may not be an appropriate change, so no hard feelings if it needs to be closed!

@tarilabs tarilabs added the dependencies Pull requests that update a dependency file label Aug 14, 2024
@tarilabs
Member

This is a symptom of a broader issue: you cannot install Kubeflow on KinD/Minikube on a Mac. It impacts Model Registry because Google's MLMD image is only available for x86, and Model Registry wraps MLMD by design, here:

image: gcr.io/tfx-oss-public/ml_metadata_store_server:1.14.0

Here are some relevant discussions in the KF community:

- https://github.com/google/ml-metadata/issues/143#issuecomment-1860066218
- https://github.com/google/ml-metadata/pull/188
- https://github.com/kubeflow/pipelines/issues/10309
- https://github.com/google/ml-metadata/pull/166
- https://github.com/google/ml-metadata/issues/190

If you are looking for a way to develop with Model Registry locally on a Mac, I recommend using the docker-compose file in the root of the repository. I personally use it with Podman Desktop and it works; happy to share tips if you encounter issues. It works because the podman machine lets the container run under Rosetta emulation.

@alexcreasy
Contributor Author

Thanks for the references, it's useful to understand the ongoing discussions around Kubeflow / MLMD etc.

I'm specifically looking to develop against model registry deployed standalone (i.e. without Kubeflow) on k8s. This is necessary for the UI BFF work, as I need to be able to access the service, retrieve labels, annotations etc.

I found that with just a slight version bump to the MySQL image (i.e. to the closest version with an aarch64 image) I'm able to deploy locally to kind without issue.

I'm using the kustomize config on my own fork for now, which is no real hardship for me, but it would be useful to be able to write a guide for some other developers I'd like to onboard to this workflow without them having to use a manifest on my fork if possible.

@rareddy
Contributor

rareddy commented Aug 16, 2024

This is a symptom of a broader issue: you cannot install Kubeflow on KinD/Minikube on a Mac. It impacts Model Registry because Google's MLMD image is only available for x86, and Model Registry wraps MLMD by design

I would treat this as a separate, general issue with MLMD. Maybe we are reaching the end of our rope with MLMD and its idiosyncrasies. We need to decide soon whether we want to get out of it and simplify the architecture before we move away from the "alpha" release.

I know this will alienate the KFP team from integration with Model Registry, which was the original reason we chose MLMD in the first place. I really do think we need to do a quick study with the KFP team to assess the footprint of MLMD they are using, and see what it takes to either replace MLMD or drop the aspiration to support KFP and choose our own path.

BTW, what I am suggesting is to keep the schema side of MLMD but bring the DB access directly into the Model Registry REST server and remove the additional container for it. We can consider a new explicit layer to integrate with KFP, rather than implicitly weaving it into Model Registry as was our original intention. wdyt? @tarilabs @dhirajsb @rimolive

@tarilabs
Member

I'm able to deploy locally to kind without issue in this manner.

@alexcreasy this is interesting, can't recall if I ever tried with KinD. So can you kindly confirm you are using:

  • an M-series Mac
  • macOS
  • Podman
  • KinD

and then the only issue for you was allegedly the MySQL version?

@alexcreasy
Contributor Author

alexcreasy commented Aug 16, 2024

@tarilabs Yes, that's right, I'm using:

  • MacBook Pro 14" with M3 Pro CPU
  • macOS Sonoma 14.6.1
  • Podman desktop v1.12.0 (Podman v5.2.0)
  • KinD 0.23.0

MR is currently using the 8.0.3 MySQL image -- you can see there's no arm64 build: https://hub.docker.com/layers/library/mysql/8.3/images/sha256-f9097d95a4ba5451fff79f4110ea6d750ac17ca08840f1190a73320b84ca4c62?context=explore

The latest 8.0.x image does have one: https://hub.docker.com/layers/library/mysql/8.0.39/images/sha256-7b4902b99989615deaa12a3af4e32f21e9b32a862d6856d121dd44ca71c166ed?context=explore
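
As an alternative to checking Docker Hub in the browser, the published platforms of a tag can be checked from the command line. The sketch below is not part of the project: `has_arch` is a hypothetical helper that greps manifest-list JSON for an architecture, the sample JSON is hand-written for illustration rather than real registry output, and the `CONTAINER_CLI` variable is an assumption to allow swapping `docker` for `podman` (both provide a `manifest inspect` subcommand, though their output may differ in detail).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the manifest JSON on stdin lists the given architecture.
has_arch() {
  grep -Eq "\"architecture\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Illustrative, hand-written input (not real registry output).
sample='{"manifests":[{"platform":{"architecture":"amd64","os":"linux"}}]}'
if printf '%s' "$sample" | has_arch arm64; then
  echo "sample: arm64 available"
else
  echo "sample: arm64 missing"
fi

# Against a real registry. Defaults to docker; set CONTAINER_CLI=podman to use Podman.
cli="${CONTAINER_CLI:-docker}"
image="docker.io/library/mysql:8.0.39"
if command -v "$cli" >/dev/null 2>&1; then
  if "$cli" manifest inspect "$image" 2>/dev/null | has_arch arm64; then
    echo "$image: arm64 available"
  else
    echo "$image: arm64 not found (or the manifest could not be inspected)"
  fi
fi
```

For the hand-written sample, which lists only amd64, the first check prints `sample: arm64 missing`.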

I haven't done any deep testing yet, as I'm still finding my way around the project, but to confirm: I was able to deploy MR to KinD locally on this MacBook without any errors showing on the deployed pods. I was then able to smoke test it by curling the endpoint shown in the getting started guide, and received a 200 response.

@tarilabs
Member

This is awesome to hear, thank you @alexcreasy. I believe I only tried a similar combination before Podman wired in Rosetta support, so I'm happy to hear it is a lot simpler nowadays.

dhirajsb pushed a commit to dhirajsb/model-registry-kfp that referenced this issue Aug 30, 2024
As now covered as Epic in JIRA
@tarilabs
Member

I want to augment the comment in #267 (review)
with the information that the default vanilla installation uses a custom image of mysql from google container registry:

https://github.com/kubeflow/manifests/blob/a38c2be88fbafb0844c0231f0062e4b3719d4737/apps/pipeline/upstream/third-party/mysql/base/mysql-deployment.yaml#L51

👉 gcr.io/ml-pipeline/mysql:8.0.26

(source)

@dmartinol

I ran a successful experiment on a fresh M3 Mac and then defined a procedure here, HTH

@tarilabs tarilabs linked a pull request Nov 22, 2024 that will close this issue
3 tasks
4 participants