🐛 Fix issue with sending metrics that are None to Prometheus #6951

Merged

@GitHK GitHK commented Dec 11, 2024

What do these changes do?

The Prometheus client does not accept None values when calling observe on a metric.

If a service is stopped without ever having started, its stop duration metric is None.
Metrics whose value is None are now skipped instead of being sent.
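
A minimal sketch of such a guard, not the code merged in this PR. The helper `observe_if_set` and the stand-in class `RecordingHistogram` are hypothetical names used for illustration; the stand-in only mimics the `observe(amount)` method seen in the traceback so that the example runs without `prometheus_client` installed.

```python
from typing import Optional, Protocol


class SupportsObserve(Protocol):
    """Anything exposing observe(amount), e.g. a Prometheus histogram."""

    def observe(self, amount: float) -> None: ...


class RecordingHistogram:
    """Stand-in metric (assumption: for illustration only, not the real client)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def observe(self, amount: float) -> None:
        # Like the real client, this raises TypeError when amount is None.
        self.total += amount
        self.count += 1


def observe_if_set(metric: SupportsObserve, value: Optional[float]) -> bool:
    """Send value to the metric only when it is not None.

    Returns True if the value was sent, False if it was skipped.
    """
    if value is None:
        return False
    metric.observe(value)
    return True


stop_histogram = RecordingHistogram()
sent_missing = observe_if_set(stop_histogram, None)  # service never started
sent_valid = observe_if_set(stop_histogram, 12.5)    # regular stop duration
```

Returning a boolean is a design choice of this sketch: it lets the caller log or count skipped observations if that is useful.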

Traceback

log_level=ERROR | log_timestamp=2024-12-11 15:17:42,480 | log_source=asyncio:run(118) | log_uid=None | log_msg=Task exception was never retrieved
future: <Task finished name='simcore_service_director_v2.modules.dynamic_sidecar.scheduler._core._scheduler.observe_dy-sidecar_44c49f9a-07f6-4eb1-8bb5-aef52e3dc76c' coro=<observing_single_service() done, defined at /home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/modules/dynamic_sidecar/scheduler/_core/_observer.py:86> exception=TypeError("unsupported operand type(s) for +=: 'float' and 'NoneType'")>
Traceback (most recent call last):
  File "/home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/modules/dynamic_sidecar/scheduler/_core/_observer.py", line 134, in observing_single_service
    await attempt_pod_removal_and_data_saving(app, scheduler_data)
  File "/home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/modules/dynamic_sidecar/scheduler/_core/_events_utils.py", line 391, in attempt_pod_removal_and_data_saving
    ).observe(stop_duration)
      ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scu/.venv/lib/python3.11/site-packages/prometheus_client/metrics.py", line 650, in observe
    self._sum.inc(amount)
  File "/home/scu/.venv/lib/python3.11/site-packages/prometheus_client/values.py", line 20, in inc
    self._value += amount
TypeError: unsupported operand type(s) for +=: 'float' and 'NoneType'
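
The last frame of the traceback can be reproduced in plain Python, without Prometheus. The sketch below assumes a hypothetical function `accumulate` that mirrors the `self._value += amount` step; adding None to a float accumulator raises the same kind of TypeError.

```python
def accumulate(total: float, amount):
    """Mirror the `self._value += amount` step shown in the traceback."""
    total += amount
    return total


try:
    accumulate(0.0, None)
    error_message = ""
except TypeError as exc:
    # Keep the message so it can be compared with the log above.
    error_message = str(exc)

print(error_message)
```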

@GitHK GitHK added this to the Event Horizon milestone Dec 11, 2024
@GitHK GitHK self-assigned this Dec 11, 2024
@GitHK GitHK added the a:director-v2 issue related with the director-v2 service label Dec 11, 2024
@GitHK GitHK marked this pull request as ready for review December 11, 2024 15:39

@pcrespov pcrespov left a comment

could you please double check other prometheus calls in case we have the same issue elsewhere? thx

GitHK commented Dec 11, 2024

could you please double check other prometheus calls in case we have the same issue elsewhere? thx

Very good point. This could also happen for the start metric. I've applied the same logic here as well.

codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.86%. Comparing base (c0df260) to head (5fc3ae0).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (c0df260) and HEAD (5fc3ae0).

HEAD has 29 uploads less than BASE

Flag        BASE (c0df260)   HEAD (5fc3ae0)
unittests   30               1

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6951       +/-   ##
===========================================
- Coverage   88.08%   67.86%   -20.22%     
===========================================
  Files        1589      632      -957     
  Lines       62243    30616    -31627     
  Branches     2012      262     -1750     
===========================================
- Hits        54825    20779    -34046     
- Misses       7082     9777     +2695     
+ Partials      336       60      -276     

Flag                           Coverage Δ
integrationtests               64.97% <100.00%> (-0.03%) ⬇️
unittests                      65.07% <25.00%> (-21.57%) ⬇️

Components                     Coverage Δ
api                            ∅ <ø> (∅)
pkg_aws_library                ∅ <ø> (∅)
pkg_dask_task_models_library   ∅ <ø> (∅)
pkg_models_library             ∅ <ø> (∅)
pkg_notifications_library      ∅ <ø> (∅)
pkg_postgres_database          ∅ <ø> (∅)
pkg_service_integration        ∅ <ø> (∅)
pkg_service_library            ∅ <ø> (∅)
pkg_settings_library           ∅ <ø> (∅)
pkg_simcore_sdk                77.37% <ø> (-8.02%) ⬇️
agent                          ∅ <ø> (∅)
api_server                     ∅ <ø> (∅)
autoscaling                    ∅ <ø> (∅)
catalog                        ∅ <ø> (∅)
clusters_keeper                ∅ <ø> (∅)
dask_sidecar                   ∅ <ø> (∅)
datcore_adapter                ∅ <ø> (∅)
director                       ∅ <ø> (∅)
director_v2                    91.38% <100.00%> (ø)
dynamic_scheduler              ∅ <ø> (∅)
dynamic_sidecar                59.86% <ø> (-29.89%) ⬇️
efs_guardian                   ∅ <ø> (∅)
invitations                    ∅ <ø> (∅)
osparc_gateway_server          ∅ <ø> (∅)
payments                       ∅ <ø> (∅)
resource_usage_tracker         ∅ <ø> (∅)
storage                        ∅ <ø> (∅)
webclient                      ∅ <ø> (∅)
webserver                      59.61% <ø> (-28.14%) ⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c0df260...5fc3ae0.

@GitHK GitHK enabled auto-merge (squash) December 12, 2024 08:04
@GitHK GitHK merged commit f29dc89 into ITISFoundation:master Dec 12, 2024
88 of 93 checks passed
@GitHK GitHK deleted the pr-osparc-fix-metric-raising-errors branch December 12, 2024 08:26