Ensure kfp-ui can show logs from Argo #582

Open · kimwnasptd opened this issue Nov 13, 2024 · 2 comments
Labels: enhancement (New feature or request)

Comments

@kimwnasptd (Contributor)

Context

This is in order to resolve canonical/bundle-kubeflow#1120

The KFP frontend has an environment variable, ARGO_ARCHIVE_LOGS, which it uses to know whether it should proxy logs from MinIO. More on this can be found in canonical/bundle-kubeflow#1120 (comment)

We'll need to introduce a new config option that sets this env var to True by default, to ensure the UI fetches logs from Argo by default.

We'll also need one more config option for disabling the GKE metadata, which was making the upstream container constantly restart, see canonical/bundle-kubeflow#1120 (comment)
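For reference, a minimal sketch (an assumption, not the final charm change) of how the two env vars could end up on the ml-pipeline-ui container, shown with the proposed defaults:

```yaml
# Sketch only: the two env vars the new config options would control
env:
  - name: ARGO_ARCHIVE_LOGS
    value: "true"
  - name: DISABLE_GKE_METADATA
    value: "true"
```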

What needs to get done

  1. Ensure we can set the ARGO_ARCHIVE_LOGS env var in kfp-ui
  2. Ensure we can set the DISABLE_GKE_METADATA env var in kfp-ui

Definition of Done

  1. The kfp-ui can fetch logs from MinIO, after applying any needed configuration
kimwnasptd added the enhancement (New feature or request) label on Nov 13, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6542.

This message was autogenerated

@NohaIhab (Contributor)

Reproduce the error

I was able to reproduce the error with the following steps:

  1. Deploy kfp bundle latest/edge + kubeflow dashboard + dex-auth + oidc
  2. Create an experiment and run from the example pipeline Data passing pipeline
  3. Wait for the run to finish and view the logs from the ui -> logs are there
  4. Delete the Argo workflow from the user namespace (this simulates the workflow being garbage collected; the same can be achieved by setting the TTL_SECONDS_AFTER_WORKFLOW_FINISH env of kfp-persistence to a short time, e.g. 60), then view the logs -> the logs cannot be viewed, with the error message shown below (a command sketch for this step follows the list):
    [Screenshot from 2024-11-20 14-43-01: error shown in the UI when viewing the logs]
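
For step 4, deleting the workflow could look like this (the workflow name and namespace are illustrative; pick them up from your own run):

```bash
# Illustrative only: find and delete the Argo Workflow backing the run
kubectl get workflows -n <user-namespace>
kubectl delete workflow <workflow-name> -n <user-namespace>
```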

Test the fix

To test the fix suggested in canonical/bundle-kubeflow#1120, I:

  1. Deployed kfp from feat: add envs to ensure kfp-ui can show logs #605 rebased on the branch from chore: Upgrade manifests to 2.3.0 #583. This way the bundle has the following changes:
  • kfp manifests and images upgraded to 2.3.0
  • new configs introduced to kfp-ui that control the following envs (a config sketch follows this list):
    • ARGO_ARCHIVE_LOGS defaulting to true
    • DISABLE_GKE_METADATA defaulting to true
  2. Followed steps 2-4 from the section above
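
For reference, the new options could be inspected or overridden with something like the following (the option names here are hypothetical; the real ones are defined in #605):

```bash
# Hypothetical option names, for illustration only
juju config kfp-ui                                                   # list the charm's config options
juju config kfp-ui argo-archive-logs=true disable-gke-metadata=true  # override if needed
```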

Results

logs cannot be viewed, with a different error message this time:
[Screenshot from 2024-11-20 15-09-46: error shown in the UI when viewing the logs]

The error is Could not get main container logs: S3Error: The specified key does not exist.

Debugging

From the error above, "Could not get main container logs: S3Error: The specified key does not exist", it looks like there is an issue getting the persisted logs from the S3 object storage, i.e. MinIO.

Looking at the logs of the kfp-ui pod, the ml-pipeline-ui container has the following log:

2024-11-20T15:27:01.852Z [ml-pipeline-ui] Getting logs for pod, tutorial-data-passing-tcvbx-system-container-impl-1663429819, from mlpipeline/artifacts/tutorial-data-passing-tcvbx/2024/11/20/tutorial-data-passing-tcvbx-system-container-impl-1663429819/main.log.

We can see that the request to MinIO is trying to fetch from the path mlpipeline/artifacts/tutorial-data-passing-tcvbx/2024/11/20/tutorial-data-passing-tcvbx-system-container-impl-1663429819/main.log
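
(For completeness, the log above can be retrieved with something like the following; the pod name assumes the usual <application>-0 naming of Juju sidecar charms, the container name matches the log prefix above.)

```bash
# Assumed pod name kfp-ui-0 in the kubeflow namespace
kubectl logs kfp-ui-0 -c ml-pipeline-ui -n kubeflow
```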

Now, let's get inside the minio container to see if we can find the persisted data, and if it's at the expected path in the bucket:

kubectl exec -it minio-0 -n kubeflow -- /bin/bash
Defaulted container "minio" out of: minio, juju-pod-init (init)
[root@minio-0 /]# ls data/
mlpipeline
[root@minio-0 /]# ls data/mlpipeline/
pipelines  tutorial-data-passing-qw857	tutorial-data-passing-skw8p  tutorial-data-passing-tcvbx  v2

We can observe that the persisted data is indeed in the bucket, but it's not at the expected path.
The logs from kfp-ui suggest it should be at mlpipeline/artifacts/tutorial-data-passing-tcvbx, while the data is actually located at mlpipeline/tutorial-data-passing-tcvbx. Additionally, the file structure under tutorial-data-passing-tcvbx is different.

Looking at the upstream changes from kfp 2.2 to 2.3, a new env was added in the frontend's config.ts: ARGO_KEYFORMAT. It is set to 'artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}', which matches the path structure the UI requested from MinIO above.

The ARGO_KEYFORMAT env tells the kfp frontend the format with which artifacts are stored; the UI then builds its request to MinIO based on this format.
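
As an example, substituting the values of the run from the reproduction above into this template gives exactly the key the UI requested earlier (the bucket name mlpipeline and the main.log file name are then added by the UI to form the full path seen in the log):

```text
{{workflow.name}}                -> tutorial-data-passing-tcvbx
{{workflow.creationTimestamp.*}} -> 2024/11/20
{{pod.name}}                     -> tutorial-data-passing-tcvbx-system-container-impl-1663429819

artifacts/{{workflow.name}}/.../{{pod.name}}
-> artifacts/tutorial-data-passing-tcvbx/2024/11/20/tutorial-data-passing-tcvbx-system-container-impl-1663429819
```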

Also seen upstream, in this comment it is mentioned that the value of the ARGO_KEYFORMAT env must match the value of keyFormat specified in Argo's workflow-controller-configmap ConfigMap. We can see it was modified in the pipelines manifests for Argo to match the env.

In our CKF, this ConfigMap is created by the argo-controller charm using this template. It does not set the keyFormat field, which causes it to fall back to the default. The default of the keyFormat field is documented in the upstream argo-workflows repo to be:

{{workflow.name}}/{{pod.name}}

So this is the format used by our argo-controller charm to organize the pipeline logs in the S3 storage.

Due to the upstream change, kfp-ui 2.3 is now configured to fetch the logs with the new format, causing a mismatch with the argo-controller configuration in CKF.
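
To illustrate what aligning the two sides could look like, here is a sketch of the workflow-controller-configmap with keyFormat set to match the frontend's ARGO_KEYFORMAT (the surrounding artifactRepository/s3 fields are assumed; the real change would go into the argo-controller charm's template):

```yaml
# Sketch only: align Argo's keyFormat with the frontend's ARGO_KEYFORMAT
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      bucket: mlpipeline
      keyFormat: "artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
```

Alternatively, the mismatch could be resolved from the other side, by pointing the frontend's ARGO_KEYFORMAT at the format Argo currently uses.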
