
Benchmarks Will Now Be Tested During CI #44486

Merged

Merged 29 commits into master from gus/benchmarks-will-run on Aug 15, 2024

Conversation

doggydogworld
Contributor

@doggydogworld commented on Jul 20, 2024

Closes #8401

Purpose

Benchmarks aren't included in CI unit tests, so potentially breaking changes to them are not caught. This change runs benchmarks as part of CI.

This won't catch performance regressions; it simply ensures that benchmarks run successfully.

Implementation

  • Created a new Makefile target test-go-bench that runs each benchmark only once. It had to be a separate target since the race detector is enabled for unit tests; the race detector slows down benchmarks far too much, adding ~30 minutes. Unit tests should catch any races anyway.
  • gotestsum is not currently compatible with benchmarks and will always flag them as failed regardless of their actual outcome (see "Successful benchmark run is marked as failed" gotestyourself/gotestsum#332).
  • Since gotestsum isn't compatible, the benchmarks are run in a separate step so that the output isn't noisy.
  • To speed things up, only packages that contain benchmarks (found via grep -l testing.B) are passed to the go test command.
  • sed processes benchmark failures and sets a workflow error message, which marks the failed benchmark in the run summary. A rough sketch of the full target is shown after this list.
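
As a rough illustration of how these pieces can fit together, here is a minimal sketch of such a target. The PACKAGES line mirrors the diff reviewed later in this thread; the recipe, the ::error:: annotation format, and the exact sed expression are illustrative assumptions rather than the PR's actual contents (recipe lines would be tab-indented in a real Makefile):

SHELL := /bin/bash
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench:
	# Run every benchmark exactly once, skip regular tests, and leave the race detector off.
	# pipefail preserves go test's exit status; sed turns failed-benchmark lines into
	# GitHub workflow error annotations, which also show up in the run summary.
	set -o pipefail; \
	go test -run '^$$' -bench=. -benchtime=1x $(PACKAGES) 2>&1 \
		| tee /dev/stderr \
		| sed -n 's/^--- FAIL: \([^ ]*\).*/::error title=Benchmark failed::\1/p'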


The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.
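
(For illustration, a changelog line in the PR body would look something like the following; the wording is made up, and this PR ended up using the no-changelog label instead.)

changelog: Benchmarks are now compiled and run once during CI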

@doggydogworld changed the title from "Adding flags for benchmarks in unit test CI" to "Benchmarks Will Now Be Tested During Unit Tests CI" on Jul 23, 2024
@doggydogworld added the no-changelog label (indicates that a PR does not require a changelog entry) on Jul 23, 2024
@rosstimothy
Contributor

It looks like BenchmarkMux_ProxyV2Signature is broken, which is a sign that this PR is doing its job, but the test needs to be fixed before this can land.

Contributor

@camscale left a comment

Do you have any info on how much longer this workflow job will take now? I'd guess less than 5 minutes given the timeout, but hopefully not close to that; this is already one of the slower jobs on a PR, ISTR.

.github/workflows/unit-tests-code.yaml (outdated; review thread resolved)
Makefile (outdated; review thread resolved)
Collaborator

@r0mant left a comment

@doggydogworld Since we're not going to be testing performance regressions by automatically comparing this with previous runs for now, can we output the summary of the run as a PR comment or something, so we can at least potentially compare results on different PRs manually if need be?

@rosstimothy What do you think? Would that be useful?

@doggydogworld
Contributor Author

@doggydogworld Since we're not going to be testing performance regressions by automatically comparing this with previous runs for now, can we output the summary of the run as a PR comment or something, so we can at least potentially compare results on different PRs manually if need be?

@rosstimothy What do you think? Would that be useful?

So right now it actually does output failed jobs in the run summary:
[screenshot: run summary listing the failed benchmarks]

I can update the script to also put out a notice for every benchmark, which should show up there too.

I think I might separate this into its own job so that it doesn't slow down the unit-testing workflow and provides more focused reporting on benchmarks specifically. That would probably also fit the eventual direction of this effort. For now I wanted to make sure benchmarks aren't breaking anymore. Once that's done, I was imagining a nightly benchmark run could serve as a "baseline" that CI benchmarks are checked against, and provide some view of benchmark performance over time.

@fheinecke
Contributor

FYI because we collect GHA job metrics now, we could probably create a dashboard to show the change in benchmark time for nightly runs (if nightly runs are added).

@rosstimothy
Contributor

@rosstimothy What do you think? Would that be useful?

It might be easier or less work to ship the results to the summary like we do for the Bloat Check. That allows you to get information without digging through the logs.

I think I might separate this into its own job

@doggydogworld I think in addition to not slowing down the test job it would be easier to find results for benchmark tests if they were in their own job instead of bundled in with unit tests.
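
(For reference, a step can append Markdown to the job summary by writing to the $GITHUB_STEP_SUMMARY file, which is how that kind of report usually gets there. A minimal bash sketch, where bench.log stands in for wherever the benchmark output ends up:)

# Append the benchmark output to the GitHub Actions job summary so it can be
# read without digging through the raw logs.
{
  echo "### Benchmark results"
  echo ""
  sed 's/^/    /' bench.log   # indent so the output renders as a code block
} >> "$GITHUB_STEP_SUMMARY"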

Contributor

@rosstimothy left a comment

I'm in the process of adding a benchmark test for a feature that requires running as root - #45164. Can we copy our integration test workflows here and have a Benchmark and Benchmark Root workflow?
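
(For scale, a standalone benchmark workflow can stay quite small. A rough sketch under assumed names; the trigger, runner, and action versions are guesses, not what this PR actually adds:)

name: Benchmarks
on:
  pull_request:
  merge_group:
jobs:
  benchmark:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Run benchmarks once
        run: make test-go-bench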

Makefile Outdated
@@ -871,8 +871,9 @@ endif
# todo: Use gotestsum when it is compatible with benchmark output. Currently will consider all benchmarks failed.
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_PATTERN = "[^(?:(?:BenchmarkRoot|BenchmarkMux_ProxyV2Signature).*)].*"
Contributor

Feel free to rename BenchmarkMux_ProxyV2Signature to BenchmarkRootMuxProxyV2Signature so we don't need to special case this pattern.

Contributor Author

Ah, I added that just to get a passing job so I could see the output. Unfortunately one of the cache benchmarks fails as well, which is a bit odd because it works on my machine. Looks like the testing code for that one might be flaky.

I already added a new workflow for the root tests, so it would fail those anyway. I was planning to fix that benchmark and include the fix here as well, and then remove it from the pattern.

@doggydogworld force-pushed the gus/benchmarks-will-run branch from 55c3691 to 9b9a7c4 on August 7, 2024 19:56
# todo: Use gotestsum when it is compatible with benchmark output. Currently will consider all benchmarks failed.
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_SKIP_PATTERN = "^BenchmarkRoot|^BenchmarkGetMaxNodes$$"
Collaborator

Why are we skipping BenchmarkGetMaxNodes?

Contributor Author

It's a bit of a flaky benchmark and I couldn't decide how to deal with it. It adds 2 million server objects to a cache and allows 200ms for each update. If the cache eviction and/or GC kick in at the wrong time, the added latency is enough to push it past that limit. I bumped it to 300ms and then 400ms and it still sometimes fails.
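
(This PR ends up excluding it via the Makefile skip pattern instead, but for comparison a benchmark can also opt out of CI in code. A sketch only; the environment variable name is hypothetical, and "os"/"testing" imports are assumed.)

func BenchmarkGetMaxNodes(b *testing.B) {
	// Hypothetical guard: skip in CI, where GC and cache-eviction jitter can
	// blow the per-update deadline, while still allowing local runs on demand.
	if os.Getenv("RUN_FLAKY_BENCHMARKS") == "" {
		b.Skip("flaky in CI: eviction/GC jitter can exceed the update deadline")
	}
	// ... existing benchmark body ...
}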

Contributor

@fspmarshall is there anything we can do to make BenchmarkGetMaxNodes less flaky without sacrificing what the test is trying to capture?

listener4, err := net.Listen("tcp", "127.0.0.1:")
require.NoError(b, err)

startServing := func(muxListener net.Listener, cluster string) (*Mux, *httptest.Server) {
Collaborator

This is basically the same as in the unit test:

startServing := func(muxListener net.Listener, cluster string) (*Mux, *httptest.Server) {
    mux, err := New(Config{
        Listener:            muxListener,
        PROXYProtocolMode:   PROXYProtocolUnspecified,
        CertAuthorityGetter: casGetter,
        Clock:               clockwork.NewFakeClockAt(time.Now()),
        LocalClusterName:    cluster,
    })
    require.NoError(t, err)
    muxTLSListener := mux.TLS()
    go mux.Serve()
    backend := &httptest.Server{
        Listener: muxTLSListener,
        Config: &http.Server{
            Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintf(w, r.RemoteAddr)
            }),
        },
    }
    backend.StartTLS()
    return mux, backend
}

The only difference I can see is that here we're not using the TLS listener. Is there a reason? Can we unify, factor this out, and reuse it in both tests? Then both tests become more compact.

Contributor Author

Ah yeah, I can just move that function out. I wanted to keep any refactors to the absolute minimum, but considering this is just setup code it shouldn't affect much.
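
(A possible shape for the shared helper, sketched with testing.TB so both the test and the benchmark can call it. Repo-internal types such as Mux, Config, PROXYProtocolUnspecified, and the casGetter argument are assumed from the surrounding multiplexer package, and the final refactor may look different.)

// startServing builds a Mux in front of an httptest backend; taking testing.TB
// lets both TestMux and the BenchmarkMux_* benchmarks reuse the same setup.
func startServing(tb testing.TB, muxListener net.Listener, cluster string, casGetter CertAuthorityGetter) (*Mux, *httptest.Server) {
    mux, err := New(Config{
        Listener:            muxListener,
        PROXYProtocolMode:   PROXYProtocolUnspecified,
        CertAuthorityGetter: casGetter,
        Clock:               clockwork.NewFakeClockAt(time.Now()),
        LocalClusterName:    cluster,
    })
    require.NoError(tb, err)

    go mux.Serve()
    backend := &httptest.Server{
        Listener: mux.TLS(),
        Config: &http.Server{
            Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprint(w, r.RemoteAddr) // Fprint avoids treating the address as a format string
            }),
        },
    }
    backend.StartTLS()
    return mux, backend
}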

@doggydogworld changed the title from "Benchmarks Will Now Be Tested During Unit Tests CI" to "Benchmarks Will Now Be Tested During CI" on Aug 14, 2024
Collaborator

@r0mant left a comment

@doggydogworld Let's ship this and make further changes/tweaks in followups if needed.

@rosstimothy
Contributor

I noticed today that BenchmarkRootExecCommand was recently broken. Feel free to exclude it in this PR and I'll remove the exclusion once I'm able to figure out how it broke.

@doggydogworld force-pushed the gus/benchmarks-will-run branch from 61f3f81 to b640e0a on August 15, 2024 00:01
@doggydogworld force-pushed the gus/benchmarks-will-run branch from 0bee17a to 5d2ef84 on August 15, 2024 15:33
@doggydogworld added this pull request to the merge queue on Aug 15, 2024
Merged via the queue into master with commit c1c08a5 Aug 15, 2024
40 checks passed
@doggydogworld deleted the gus/benchmarks-will-run branch on August 15, 2024 16:12
@rosstimothy
Contributor

@doggydogworld can we backport this to active release branches?

@doggydogworld
Contributor Author

@doggydogworld can we backport this to active release branches?

Yup, will do.

doggydogworld added a commit that referenced this pull request Aug 26, 2024
* Adding flags for benchmarks in unit test CI

* Upping timeout for unit tests to accomodate benchmarks

* Fixing a typo

* Run benchmarks once

* Adding a makefile target for ci bench test

* Removing a stray quotation

* Compacting workflow

* Benchmarks now with nicer output

* Benchmarks back to their own step

* With error messages

* Adding pipefail to preserve exit code

* Using bash

* Simpler command and including e

* Separating benchmarks into its own workflow

* Fixing benchmarks workflow name

* Ignoring benchmarks that require root

* Updating to account for root and nonroot bench

* Cleaning up the benchmark workflows

* Bench requires test log dir to be created

* Excluding all BenchmarkRoot benchmarks from running

* Bumping benchGetNodes timeout duration to account for some jitter

* Fixing multiplexer benchmark by mirroring unit tests

* Remembered that skip exists

* Skipping BenchmarkGetMaxNodes due to flake

* Fixing formatting of summary

* Actually fixing summary

* Adding another newline

* Simplifying setup for mux listener

* Excluding BenchmarkRootExecCommand
doggydogworld added a commit that referenced this pull request Aug 26, 2024 (same commit message as above)

doggydogworld added a commit that referenced this pull request Aug 26, 2024 (same commit message as above)
github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024 (backport; same commit message as above, plus "Fixing bench pattern selectors to include all tests")

github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024 (backport; same commit message as above, plus "Removing some lines that got mistakenly pulled in cherrypick" and "Fixing bench pattern selectors to include all tests")

github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024 (backport; same commit message as above, plus "Fixing bench pattern selectors to include all tests")