Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unify metrics ( cleanup and add missing metrics ) #2207

Merged
merged 17 commits into from
Nov 23, 2024

Conversation

adarsh0728
Copy link
Contributor

@adarsh0728 adarsh0728 commented Nov 6, 2024

Resolves #2218
The PR aims to have unified metrics.
Currently we are exposing metrics for different sources/sinks(kafka, http etc) in addition to forwarder metrics.
Cleanup of such metrics as well as addition of any missing metrics

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 94.76190% with 11 lines in your changes missing coverage. Please review.

Project coverage is 64.06%. Comparing base (c5afc90) to head (c7beafa).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pkg/udf/forward/forward.go 94.20% 4 Missing ⚠️
pkg/sinks/forward/forward.go 91.42% 3 Missing ⚠️
pkg/reduce/data_forward.go 94.28% 2 Missing ⚠️
pkg/sources/forward/data_forward.go 96.36% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2207      +/-   ##
==========================================
+ Coverage   63.91%   64.06%   +0.15%     
==========================================
  Files         338      338              
  Lines       41085    41002      -83     
==========================================
+ Hits        26259    26269      +10     
+ Misses      13756    13671      -85     
+ Partials     1070     1062       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@adarsh0728 adarsh0728 changed the title Unify metrics: cleanup and add missing metrics unify metrics for source: cleanup and add missing metrics Nov 11, 2024
@adarsh0728 adarsh0728 changed the title unify metrics for source: cleanup and add missing metrics feat: unify metrics for source ( cleanup and add missing metrics ) Nov 12, 2024
@adarsh0728 adarsh0728 marked this pull request as ready for review November 12, 2024 17:05
@vigith
Copy link
Member

vigith commented Nov 12, 2024

How have you tested it?

@adarsh0728
Copy link
Contributor Author

adarsh0728 commented Nov 12, 2024

How have you tested it?

For metrics that I removed: Ran pipelines locally with sources(http, generator and kafka).. compared the source specific metrics with forwarder metrics - Testing doc

@adarsh0728 adarsh0728 changed the title feat: unify metrics for source ( cleanup and add missing metrics ) feat: unify metrics ( cleanup and add missing metrics ) Nov 13, 2024
Signed-off-by: adarsh0728 <[email protected]>
@adarsh0728 adarsh0728 self-assigned this Nov 15, 2024
pkg/sources/udsource/grpc_udsource.go Outdated Show resolved Hide resolved
pkg/udf/forward/forward.go Outdated Show resolved Hide resolved
@whynowy
Copy link
Member

whynowy commented Nov 19, 2024

@KeranYang @kohlisid @yhl25 - please review.

pkg/sinks/forward/forward.go Show resolved Hide resolved
pkg/sinks/kafka/metrics.go Show resolved Hide resolved
pkg/sources/forward/data_forward.go Outdated Show resolved Hide resolved
pkg/sources/kafka/metrics.go Show resolved Hide resolved
pkg/sources/udsource/metrics.go Outdated Show resolved Hide resolved
Copy link
Member

@KeranYang KeranYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably noticed that one of the e2e tests is failing. It's because the vertex processing rate calculation stopped working. This is because in numaflow, for each vertex, we use forwarder_data_read_total metric to calculate processing rate. See here. Please make sure that we don't change this name for ALL type of vertices. @adarsh0728

@KeranYang
Copy link
Member

@adarsh0728 please take a look at https://github.com/numaproj/numaflow/blob/main/pkg/daemon/server/service/rater/rater_test.go#L63 to understand how the metric is used to calculate data processing rate and see if your PR somehow breaks the contract.

@adarsh0728
Copy link
Contributor Author

You probably noticed that one of the e2e tests is failing. It's because the vertex processing rate calculation stopped working. This is because in numaflow, for each vertex, we use forwarder_data_read_total metric to calculate processing rate. See here. Please make sure that we don't change this name for ALL type of vertices. @adarsh0728

Thanks, checking

@adarsh0728
Copy link
Contributor Author

adarsh0728 commented Nov 21, 2024

@adarsh0728 please take a look at https://github.com/numaproj/numaflow/blob/main/pkg/daemon/server/service/rater/rater_test.go#L63 to understand how the metric is used to calculate data processing rate and see if your PR somehow breaks the contract.

@KeranYang @whynowy: I think I got the issue.

  1. TestUserDefinedSourceSuite test case is failing which involves testing testSimpleSource. To calculate processing rates, client.GetVertexMetrics(context.Background(), pipelineName, vertexName) is called here

  2. Here, in GetVertexMetrics function we are assuming source to have single partition (vertex name itself) and subsequently calling getRates for this partition.

  3. With PR changes, we have changed source's partition name to 'source-partition-[partitionIdx]' which is causing the issue.

What should be the best possible way out here? Should I introduce a new label for partition idx (option 1) @KeranYang
Screenshot 2024-11-21 at 6 02 39 PM
Screenshot 2024-11-21 at 6 04 16 PM

@KeranYang
Copy link
Member

@adarsh0728 thank you for diving deep into the issue. The problem is introduced because we move Kafka per-partition metrics to source forwarder. I think we can simplify it by keeping Kafka per-partition metric in Kafka source and continue using vertex name as source partition name. cc: @whynowy

pkg/sources/forward/data_forward.go Outdated Show resolved Hide resolved
pkg/sources/forward/data_forward.go Outdated Show resolved Hide resolved
@whynowy
Copy link
Member

whynowy commented Nov 21, 2024

@adarsh0728 thank you for diving deep into the issue. The problem is introduced because we move Kafka per-partition metrics to source forwarder. I think we can simplify it by keeping Kafka per-partition metric in Kafka source and continue using vertex name as source partition name. cc: @whynowy

sounds good to me.

@adarsh0728
Copy link
Contributor Author

@adarsh0728 thank you for diving deep into the issue. The problem is introduced because we move Kafka per-partition metrics to source forwarder. I think we can simplify it by keeping Kafka per-partition metric in Kafka source and continue using vertex name as source partition name. cc: @whynowy

sounds good to me.

thanks @whynowy and @KeranYang . Will do the necessary changes and update.

Copy link
Contributor

@yhl25 yhl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@KeranYang KeranYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please double check metrics.md in sync with your changes.

@adarsh0728 adarsh0728 merged commit f5a79bf into numaproj:main Nov 23, 2024
25 checks passed
@adarsh0728 adarsh0728 deleted the unify-metrics branch November 23, 2024 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unify metrics to avoid duplicate metrics
5 participants