Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support model metrics for KServe (RHOAIENG-6560) (RHOAIENG-6561) #2913

Merged

Conversation

alexcreasy
Copy link
Contributor

@alexcreasy alexcreasy commented Jun 13, 2024

Closes: RHOAIENG-6560 RHOAIENG-6561

Description

Implements metrics graphs for supported Kserve runtimes. Implements empty state for when a specific serving runtime is not supported.

Graphs and queries are now defined by a config map created by the KServe operator.

odh-dashboard-opendatahub apps rosa vedant-metrics chbf p3 openshiftapps com_projects_models_metrics_model_mnist_performance (1)

odh-dashboard-opendatahub apps rosa vedant-metrics chbf p3 openshiftapps com_projects_models_metrics_model_mnist_performance (2)

odh-dashboard-opendatahub apps rosa vedant-metrics chbf p3 openshiftapps com_projects_models_metrics_model_mnist_performance (3)

odh-dashboard-opendatahub apps rosa vedant-metrics chbf p3 openshiftapps com_projects_models_metrics_model_mnist_performance

NOTE: There are some outstanding bugs in the backend:

  • The definition provided for the requests chart is invalid, right now we can only verify the dashboard handles this gracefully
  • Graph titles are defined on the backend, these need updating to fix case / match modelmesh charts.
  • There is an error in the Memory Usage query that returns an absolute value, instead of a percentage. As such you won't see any data on that chart.

How Has This Been Tested?

Environment setup:

  • Install latest ODH/RHOAI
  • Use the following DSC
  • Add the following SR for single model serving
  • Use the serving runtime to create a mnist model. Can use the rhods-public s3 bucket with your access+secret key
    model path is : kserve-samples/mnist/
    (can keep the require service account box unchecked for easier curls..)
    To perform requests against the model
    curl --insecure {isvc_url}/v2/models/mnist/infer -d @./input-onnx.json
    where,
    isvc_url=$(oc get isvc -n <your_project> mnist -o jsonpath='{.status.url}')
    and
    you can find input-onnx.json here

Instructions:

  • Navigate to kserve model mnist
  • Hit the model with some data (spam a number of requests as directed above)
  • Validate charts function correctly (use the timeframe / refresh interval controls.
  • Navigate to model sklearn-v2-iris metrics
  • Validate the unsupported runtime state is visible.

Test Impact

  • Cypress tests added
  • Unit tests of utility functions
  • Tests of new hooks to follow after.

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@alexcreasy alexcreasy added the do-not-merge/hold This PR is hold for some reason label Jun 13, 2024
@openshift-ci openshift-ci bot requested review from DaoDaoNoCode and lucferbux June 13, 2024 17:59
@alexcreasy
Copy link
Contributor Author

/retest-required

@alexcreasy alexcreasy removed the do-not-merge/hold This PR is hold for some reason label Jun 13, 2024
Copy link

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 92.59259% with 14 lines in your changes missing coverage. Please review.

Project coverage is 78.49%. Comparing base (78e21a0) to head (b0dde63).
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2913      +/-   ##
==========================================
+ Coverage   78.40%   78.49%   +0.09%     
==========================================
  Files        1127     1139      +12     
  Lines       23988    24170     +182     
  Branches     6066     6088      +22     
==========================================
+ Hits        18807    18973     +166     
- Misses       5181     5197      +16     
Files Coverage Δ
frontend/src/api/k8s/configMaps.ts 87.50% <100.00%> (+0.83%) ⬆️
frontend/src/api/prometheus/const.ts 100.00% <100.00%> (ø)
...nd/src/api/prometheus/useQueryRangeResourceData.ts 100.00% <ø> (ø)
frontend/src/concepts/metrics/kserve/const.ts 100.00% <100.00%> (ø)
...pts/metrics/kserve/content/KserveCpuUsageGraph.tsx 100.00% <100.00%> (ø)
.../metrics/kserve/content/KserveMeanLatencyGraph.tsx 100.00% <100.00%> (ø)
.../metrics/kserve/content/KserveMemoryUsageGraph.tsx 100.00% <100.00%> (ø)
...ts/metrics/kserve/content/KserveMetricsContent.tsx 100.00% <100.00%> (ø)
...metrics/kserve/content/KserveRequestCountGraph.tsx 100.00% <100.00%> (ø)
frontend/src/concepts/metrics/kserve/utils.ts 100.00% <100.00%> (ø)
... and 11 more

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78e21a0...b0dde63. Read the comment docs.

Comment on lines 7 to 11
export const isKserveMetricsConfigMapKind = (
configMapKind: ConfigMapKind,
): configMapKind is KserveMetricsConfigMapKind =>
(configMapKind as KserveMetricsConfigMapKind).data.supported === 'true' ||
(configMapKind as KserveMetricsConfigMapKind).data.supported === 'false';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure exactly what kind of values will be passed into this, but data is possibly undefined for a config map and there's no defensive checks here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the spot, this function is unfinished - I'll improve this and add a test for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can try to do this without using the keyword as that would be good as we're attempting to remove the usage of casting. You can do checks like 'data' in configMapKind

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh didn't see the followup, I've removed the as - is this up to standard now?

frontend/src/api/prometheus/kservePerformanceMetrics.ts Outdated Show resolved Hide resolved
frontend/src/api/prometheus/kservePerformanceMetrics.ts Outdated Show resolved Hide resolved
@alexcreasy alexcreasy force-pushed the RHOAIENG-6560-rebase1 branch 2 times, most recently from afcc422 to e5ba97d Compare June 14, 2024 12:07
@alexcreasy alexcreasy force-pushed the RHOAIENG-6560-rebase1 branch 3 times, most recently from 3d27c60 to ef9e30a Compare June 14, 2024 13:20
@alexcreasy alexcreasy force-pushed the RHOAIENG-6560-rebase1 branch from ef9e30a to b0dde63 Compare June 14, 2024 13:24
Copy link
Contributor

@mturley mturley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled and tested locally. LGTM.

Copy link
Contributor

openshift-ci bot commented Jun 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mturley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 6e657cd into opendatahub-io:main Jun 14, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants