
fix the proxy server backend metric error #295

Closed
wants to merge 1 commit

Conversation

@YRXING commented Nov 24, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

The backend metric should be recorded by the proxy server.

Which issue(s) this PR fixes:

Fixes #294

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 24, 2021
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 24, 2021

CLA Signed

The committers are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @YRXING!

It looks like this is your first PR to kubernetes-sigs/apiserver-network-proxy 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/apiserver-network-proxy has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @YRXING. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 24, 2021
@@ -164,6 +167,16 @@ func (a *ServerMetrics) SetBackendCount(count int) {
	a.backend.WithLabelValues().Set(float64(count))
}

// BackendCountInc increments the backend connection count.
func (a *ServerMetrics) BackendCountInc(manager string, idType string) {
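
For context, a minimal sketch of how such a pair of counters could sit on a Prometheus GaugeVec, assuming "manager" and "idType" labels as in this PR; the ServerMetrics field and the BackendCountDec counterpart are assumptions, not code from the diff:

package metrics

import "github.com/prometheus/client_golang/prometheus"

// Assumed shape of ServerMetrics: only the field this sketch needs.
type ServerMetrics struct {
	backend *prometheus.GaugeVec // assumed: labeled with "manager" and "idType"
}

// BackendCountInc increments the backend connection count for one
// manager/idType pair.
func (a *ServerMetrics) BackendCountInc(manager, idType string) {
	a.backend.WithLabelValues(manager, idType).Inc()
}

// BackendCountDec decrements the count when a backend connection closes.
func (a *ServerMetrics) BackendCountDec(manager, idType string) {
	a.backend.WithLabelValues(manager, idType).Dec()
}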
@mainred (Contributor) commented Nov 27, 2021

Thanks for your contribution.
We only have one manager supported by the server at a time, so I guess it's not necessary to record metrics for each backend manager. The introduction of idType is valuable, but for the current use cases I'd suggest we keep the code as it is.

Contributor

Multiple backend managers are supported for one proxy server; closing this comment.

Author

Is there anything else I need to do about this issue?

@@ -200,14 +200,17 @@ func (s *ProxyServer) addBackend(agentID string, conn agent.AgentService_Connect
	for _, ipv4 := range agentIdentifiers.IPv4 {
		klog.V(5).InfoS("Add the agent to DestHostBackendManager", "agent address", ipv4)
		s.BackendManagers[i].AddBackend(ipv4, pkgagent.IPv4, conn)
		metrics.Metrics.BackendCountInc("DestHostBackendManager", "ipv4")
Contributor

I think leaving this logic in AddBackend is preferable; it makes the backend-adding logic clearer.
How about adding a field to the backend storage to differentiate the backend count for different kinds of backend managers?
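
A sketch of that suggestion, assuming a simplified DefaultBackendStorage; the field names and map shape are illustrative, not the repo's:

package server

import "sync"

// DefaultBackendStorage, simplified: the storage carries its manager kind
// so the shared AddBackend path can label the metric itself.
type DefaultBackendStorage struct {
	mu          sync.RWMutex
	backends    map[string][]string // identifier -> connection IDs (simplified)
	backendType string              // e.g. "DestHostBackendManager" (assumed field)
}

// AddBackend records a backend and updates the labeled count in one place,
// instead of each manager repeating the metrics bookkeeping.
func (s *DefaultBackendStorage) AddBackend(identifier, idType, conn string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.backends[identifier] = append(s.backends[identifier], conn)
	// hypothetical call matching the suggestion later in this thread:
	// metrics.Metrics.SetBackendCount(s.backendType, idType, len(s.backends))
}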

Author

Please review the code.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 2, 2021
	dibm.mu.RLock()
	defer dibm.mu.RUnlock()
	if len(dibm.backends) == 0 {

func (drbm *DefaultRouteBackendManager) Backend(ctx context.Context) (Backend, error) {
Contributor

👍

@@ -204,7 +214,6 @@ func (s *DefaultBackendStorage) AddBackend(identifier string, idType pkgagent.Id
		return addedBackend
	}
	s.backends[identifier] = []*backend{addedBackend}
	metrics.Metrics.SetBackendCount(len(s.backends))
Contributor

Thanks for your contribution, @YRXING.
We are complicating the logic of the backend count metrics; we don't have to rewrite AddBackend for each manager just to collect them. Code maintenance and reusability should be taken into consideration.

How about we keep SetBackendCount here and add the manager name and idType to it? The manager name can be defined as a member of the backend storage.

Author

Does DefaultBackendStorage need to record a separate count metric per idType, or just the total backend count?

Contributor

Following the approach you already took in this PR, record separate backend counts per idType and backend manager. After the other changes are included, SetBackendCount will end up looking like this:

Suggested change:
-	metrics.Metrics.SetBackendCount(len(s.backends))
+	metrics.Metrics.SetBackendCount(s.backendType, idType, len(s.backends))

cc @cheftako and @jkh52, who will have the final say on this PR.
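
On the metrics side, that suggested call implies roughly the following shape for SetBackendCount; a sketch assuming the existing gauge becomes a GaugeVec, not code from the PR:

package metrics

import "github.com/prometheus/client_golang/prometheus"

type ServerMetrics struct {
	backend *prometheus.GaugeVec // assumed: labeled with "manager" and "idType"
}

// SetBackendCount sets the backend count for one manager/idType pair,
// matching the suggested call above.
func (a *ServerMetrics) SetBackendCount(backendType, idType string, count int) {
	a.backend.WithLabelValues(backendType, idType).Set(float64(count))
}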

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 21, 2021
@jkh52
Contributor

jkh52 commented Jan 14, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2022
@jkh52
Contributor

jkh52 commented Jan 14, 2022

/lgtm

I like depending on len() only, compared with Inc() / Dec().
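
To make that trade-off concrete, a small sketch; the onAdd/onRemove/onChange helpers are hypothetical, not from the PR:

package metrics

import "github.com/prometheus/client_golang/prometheus"

// Incremental style: every Inc must be paired with a Dec on every exit
// path, or the gauge drifts permanently.
func onAdd(g prometheus.Gauge)    { g.Inc() }
func onRemove(g prometheus.Gauge) { g.Dec() }

// Absolute style: recompute from the source of truth after each change;
// a missed update is corrected by the next one.
func onChange(g prometheus.Gauge, backends map[string][]string) {
	g.Set(float64(len(backends)))
}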

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2022
	[]string{
		"manager",
		"idType",
	},
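
For context, that label slice would sit in a GaugeVec definition roughly like the sketch below; the metric name, namespace, and help text are placeholders, not the repo's actual values:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var backendGauge = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "konnectivity_network_proxy", // placeholder
		Subsystem: "server",                     // placeholder
		Name:      "ready_backend_connections",  // placeholder
		Help:      "Number of registered backend connections.",
	},
	[]string{
		"manager",
		"idType",
	},
)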
@jkh52 (Contributor) commented May 13, 2022

@logicalhan Is a change like this backward compatible, and if not is it more appropriate to create a new metric and deprecate the old?

https://kubernetes.io/blog/2021/04/23/kubernetes-release-1.21-metrics-stability-ga/


This is definitely not backwards compatible. This isn't in k8s-proper, so the same rules of API conformance do not necessarily hold here. I would at the very least add a release note clearly denoting the API change to the metric.


Also, what's the expected cardinality of manager and idType?

Contributor

I think cardinality is likely not a concern. (manager is currently 3, idType is about 3 or 4).

Contributor

> This is definitely not backwards compatible. This isn't in k8s-proper, so the same rules of API conformance do not necessarily hold here. I would at the very least add a release note clearly denoting the API change to the metric.

This is a binary we include in some distros of K8s (i.e. it's an optional binary in K8s), so we should probably consider it part of K8s.

Contributor

> I think cardinality is likely not a concern. (manager is currently 3, idType is about 3 or 4).

Technically, the idType comes from a CLI flag on the agent (e.g. https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/addons/konnectivity-agent/konnectivity-agent-ds.yaml#L43), so in theory it's not bounded.
Practically, the number of managers and idTypes is under the cloud provider's control, so I would expect this number to be small.

Contributor

Agreed. However, that is on the agent side, and the agent is at some level a reference implementation.
If we had similar code on the server side to limit the cardinality of idType, then I think we could reasonably claim it was limited.
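
A sketch of what such server-side bounding could look like; the normalizeIDType helper and the accepted value set are hypothetical:

package server

// normalizeIDType collapses unexpected agent-supplied identifier types
// into "other", so the metric's label cardinality stays bounded no matter
// what agents send. The accepted values here are illustrative.
func normalizeIDType(idType string) string {
	switch idType {
	case "ipv4", "ipv6", "host", "uid", "default-route":
		return idType
	default:
		return "other"
	}
}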

@jkh52
Contributor

jkh52 commented May 13, 2022

/remove-lgtm

I think we should keep this metric as-is, and add a new one. That way we avoid breaking current integrations with metrics readers.
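
A sketch of that approach: the existing unlabeled gauge keeps its name and behavior, and a new labeled gauge is registered alongside it. Both metric names below are illustrative:

package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Existing metric, left untouched so current dashboards keep working.
	backendTotal = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ready_backend_connections", // illustrative
		Help: "Total number of registered backends.",
	})
	// New metric carrying the extra labels.
	backendByType = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "ready_backend_connections_by_type", // illustrative
		Help: "Registered backends by manager and idType.",
	}, []string{"manager", "idType"})
)

// setBackendCount updates both gauges with the storage-level count, as the
// PR's SetBackendCount call would.
func setBackendCount(manager, idType string, count int) {
	backendTotal.Set(float64(count))
	backendByType.WithLabelValues(manager, idType).Set(float64(count))
}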

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cheftako
Contributor

cheftako commented Sep 9, 2022

/remove-lifecycle rotten

@cheftako cheftako reopened this Sep 9, 2022
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 9, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: YRXING
Once this PR has been reviewed and has the lgtm label, please assign andrewsykim for approval by writing /assign @andrewsykim in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2022
@k8s-ci-robot
Contributor

@YRXING: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-apiserver-network-proxy-docker-build-arm64 | 1a8d66f | link | true | /test pull-apiserver-network-proxy-docker-build-arm64
pull-apiserver-network-proxy-make-lint | 1a8d66f | link | true | /test pull-apiserver-network-proxy-make-lint
pull-apiserver-network-proxy-test | 1a8d66f | link | true | /test pull-apiserver-network-proxy-test
pull-apiserver-network-proxy-docker-build-amd64 | 1a8d66f | link | true | /test pull-apiserver-network-proxy-docker-build-amd64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 8, 2022
@k8s-ci-robot
Contributor

@YRXING: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
  • size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[BUG] The backend ServerMetric is not correct
7 participants