
Benchmarks Will Now Be Tested During CI #44486

Merged

Merged 29 commits into master from gus/benchmarks-will-run on Aug 15, 2024

Conversation

doggydogworld
Contributor

@doggydogworld commented on Jul 20, 2024

Closes #8401

Purpose

Benchmarks aren't included in CI unit tests, so potentially breaking changes to them are not caught. This change runs benchmarks as part of CI.

This won't catch performance regressions; it simply ensures that benchmarks run successfully.

Implementation

  • Created a new Makefile target test-go-bench that runs each benchmark only once. It had to be a separate target since the race detector is enabled for unit tests; the race detector slows down benchmarks far too much, adding ~30 minutes. Unit tests should catch any races anyway.
  • gotestsum is not currently compatible with benchmarks and will always flag them as failed regardless of their actual outcome (see "Successful benchmark run is marked as failed" gotestyourself/gotestsum#332).
  • Since gotestsum isn't compatible, the benchmarks are run in a separate step so that the output isn't noisy.
  • To speed things up, only packages that contain benchmarks (found via grep -l testing.B) are passed to the go test command.
  • sed processes benchmark failures and sets a workflow error message, which marks the failed benchmark in the run summary. A rough sketch of the full target is shown after this list.
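
As a rough illustration of how these pieces can fit together, here is a minimal sketch of such a target. The PACKAGES line mirrors the diff reviewed later in this thread; the recipe, the ::error:: annotation format, and the exact sed expression are illustrative assumptions rather than the PR's actual contents (recipe lines would be tab-indented in a real Makefile):

SHELL := /bin/bash
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench:
	# Run every benchmark exactly once, skip regular tests, and leave the race detector off.
	# pipefail preserves go test's exit status; sed turns failed-benchmark lines into
	# GitHub workflow error annotations, which also show up in the run summary.
	set -o pipefail; \
	go test -run '^$$' -bench=. -benchtime=1x $(PACKAGES) 2>&1 \
		| tee /dev/stderr \
		| sed -n 's/^--- FAIL: \([^ ]*\).*/::error title=Benchmark failed::\1/p'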


The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.
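
(For illustration, a changelog line in the PR body would look something like the following; the wording is made up, and this PR ended up using the no-changelog label instead.)

changelog: Benchmarks are now compiled and run once during CI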

@doggydogworld changed the title from "Adding flags for benchmarks in unit test CI" to "Benchmarks Will Now Be Tested During Unit Tests CI" on Jul 23, 2024
@doggydogworld added the no-changelog label (indicates that a PR does not require a changelog entry) on Jul 23, 2024
@rosstimothy
Contributor

It looks like BenchmarkMux_ProxyV2Signature is broken, which is a sign that this PR is doing its job, but the test needs to be fixed before this can land.

Contributor

@camscale left a comment

Do you have any info on how much longer this workflow job will take now? I'd guess less than 5 minutes given the timeout, but hopefully not close to that; this is already one of the slower jobs on a PR, ISTR.

.github/workflows/unit-tests-code.yaml (outdated; review thread resolved)
Makefile (outdated; review thread resolved)
Collaborator

@r0mant left a comment

@doggydogworld Since we're not going to be testing performance regressions by automatically comparing this with previous runs for now, can we output the summary of the run as a PR comment or something, so we can at least potentially compare results on different PRs manually if need be?

@rosstimothy What do you think? Would that be useful?

@doggydogworld
Contributor Author

@doggydogworld Since we're not going to be testing performance regressions by automatically comparing this with previous runs for now, can we output the summary of the run as a PR comment or something, so we can at least potentially compare results on different PRs manually if need be?

@rosstimothy What do you think? Would that be useful?

So right now it actually does output failed jobs in the run summary:
[screenshot: run summary listing the failed benchmarks]

I can update the script to also put out a notice for every benchmark, which should show up there too.

I think I might separate this into its own job so that it doesn't slow down the unit-testing workflow and provides more focused reporting on benchmarks specifically. That would probably also fit the eventual direction of this effort. For now I wanted to make sure benchmarks aren't breaking anymore. Once that's done, I was imagining a nightly benchmark run could serve as a "baseline" that CI benchmarks are checked against, and provide some view of benchmark performance over time.

@fheinecke
Contributor

FYI because we collect GHA job metrics now, we could probably create a dashboard to show the change in benchmark time for nightly runs (if nightly runs are added).

@rosstimothy
Contributor

@rosstimothy What do you think? Would that be useful?

It might be easier or less work to ship the results to the summary like we do for the Bloat Check. That allows you to get information without digging through the logs.

I think I might separate this into its own job

@doggydogworld I think in addition to not slowing down the test job it would be easier to find results for benchmark tests if they were in their own job instead of bundled in with unit tests.
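
(For reference, a step can append Markdown to the job summary by writing to the $GITHUB_STEP_SUMMARY file, which is how that kind of report usually gets there. A minimal bash sketch, where bench.log stands in for wherever the benchmark output ends up:)

# Append the benchmark output to the GitHub Actions job summary so it can be
# read without digging through the raw logs.
{
  echo "### Benchmark results"
  echo ""
  sed 's/^/    /' bench.log   # indent so the output renders as a code block
} >> "$GITHUB_STEP_SUMMARY"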

Contributor

@rosstimothy left a comment

I'm in the process of adding a benchmark test for a feature that requires running as root - #45164. Can we copy our integration test workflows here and have a Benchmark and Benchmark Root workflow?
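
(For scale, a standalone benchmark workflow can stay quite small. A rough sketch under assumed names; the trigger, runner, and action versions are guesses, not what this PR actually adds:)

name: Benchmarks
on:
  pull_request:
  merge_group:
jobs:
  benchmark:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Run benchmarks once
        run: make test-go-bench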

Makefile Outdated
@@ -871,8 +871,9 @@ endif
# todo: Use gotestsum when it is compatible with benchmark output. Currently will consider all benchmarks failed.
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_PATTERN = "[^(?:(?:BenchmarkRoot|BenchmarkMux_ProxyV2Signature).*)].*"
Contributor

Feel free to rename BenchmarkMux_ProxyV2Signature to BenchmarkRootMuxProxyV2Signature so we don't need to special case this pattern.

Contributor Author

Ah, I added that just to get a passing job so I could see the output. Unfortunately one of the cache benchmarks fails as well, which is a bit odd because it works on my machine. Looks like the testing code for that one might be flaky.

I already added a new workflow for the root tests, so it would fail those anyway. I was planning to fix that benchmark and include the fix here as well, and then remove it from the pattern.

@doggydogworld force-pushed the gus/benchmarks-will-run branch from 55c3691 to 9b9a7c4 on August 7, 2024 19:56
# todo: Use gotestsum when it is compatible with benchmark output. Currently will consider all benchmarks failed.
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_SKIP_PATTERN = "^BenchmarkRoot|^BenchmarkGetMaxNodes$$"
Collaborator

Why are we skipping BenchmarkGetMaxNodes?

Contributor Author

It's a bit of a flaky benchmark and I couldn't decide how to deal with it. It adds 2 million server objects to a cache and allows 200ms for each update. If the cache eviction and/or GC kick in at the wrong time, the added latency is enough to push it past that limit. I bumped it to 300ms and then 400ms and it still sometimes fails.
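
(This PR ends up excluding it via the Makefile skip pattern instead, but for comparison a benchmark can also opt out of CI in code. A sketch only; the environment variable name is hypothetical, and "os"/"testing" imports are assumed.)

func BenchmarkGetMaxNodes(b *testing.B) {
	// Hypothetical guard: skip in CI, where GC and cache-eviction jitter can
	// blow the per-update deadline, while still allowing local runs on demand.
	if os.Getenv("RUN_FLAKY_BENCHMARKS") == "" {
		b.Skip("flaky in CI: eviction/GC jitter can exceed the update deadline")
	}
	// ... existing benchmark body ...
}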

Contributor

@fspmarshall is there anything we can do to make BenchmarkGetMaxNodes less flaky without sacrificing what the test is trying to capture?

listener4, err := net.Listen("tcp", "127.0.0.1:")
require.NoError(b, err)

startServing := func(muxListener net.Listener, cluster string) (*Mux, *httptest.Server) {
Collaborator

This is basically the same as in the unit test:

startServing := func(muxListener net.Listener, cluster string) (*Mux, *httptest.Server) {
    mux, err := New(Config{
        Listener:            muxListener,
        PROXYProtocolMode:   PROXYProtocolUnspecified,
        CertAuthorityGetter: casGetter,
        Clock:               clockwork.NewFakeClockAt(time.Now()),
        LocalClusterName:    cluster,
    })
    require.NoError(t, err)
    muxTLSListener := mux.TLS()
    go mux.Serve()
    backend := &httptest.Server{
        Listener: muxTLSListener,
        Config: &http.Server{
            Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintf(w, r.RemoteAddr)
            }),
        },
    }
    backend.StartTLS()
    return mux, backend
}

The only difference I can see is that here we're not using the TLS listener. Is there a reason? Can we unify, factor this out, and reuse it in both tests? Then both tests become more compact.

Contributor Author

Ah yeah, I can just move that function out. I wanted to keep any refactors to the absolute minimum, but considering this is just setup code it shouldn't affect much.
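
(A possible shape for the shared helper, sketched with testing.TB so both the test and the benchmark can call it. Repo-internal types such as Mux, Config, PROXYProtocolUnspecified, and the casGetter argument are assumed from the surrounding multiplexer package, and the final refactor may look different.)

// startServing builds a Mux in front of an httptest backend; taking testing.TB
// lets both TestMux and the BenchmarkMux_* benchmarks reuse the same setup.
func startServing(tb testing.TB, muxListener net.Listener, cluster string, casGetter CertAuthorityGetter) (*Mux, *httptest.Server) {
    mux, err := New(Config{
        Listener:            muxListener,
        PROXYProtocolMode:   PROXYProtocolUnspecified,
        CertAuthorityGetter: casGetter,
        Clock:               clockwork.NewFakeClockAt(time.Now()),
        LocalClusterName:    cluster,
    })
    require.NoError(tb, err)

    go mux.Serve()
    backend := &httptest.Server{
        Listener: mux.TLS(),
        Config: &http.Server{
            Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprint(w, r.RemoteAddr) // Fprint avoids treating the address as a format string
            }),
        },
    }
    backend.StartTLS()
    return mux, backend
}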

@doggydogworld changed the title from "Benchmarks Will Now Be Tested During Unit Tests CI" to "Benchmarks Will Now Be Tested During CI" on Aug 14, 2024
Collaborator

@r0mant left a comment

@doggydogworld Let's ship this and make further changes/tweaks in followups if needed.

@rosstimothy
Contributor

I noticed today that BenchmarkRootExecCommand was recently broken. Feel free to exclude it in this PR and I'll remove the exclusion once I'm able to figure out how it broke.

@doggydogworld force-pushed the gus/benchmarks-will-run branch from 61f3f81 to b640e0a on August 15, 2024 00:01
@doggydogworld force-pushed the gus/benchmarks-will-run branch from 0bee17a to 5d2ef84 on August 15, 2024 15:33
@doggydogworld added this pull request to the merge queue on Aug 15, 2024
Merged via the queue into master with commit c1c08a5 Aug 15, 2024
40 checks passed
@doggydogworld deleted the gus/benchmarks-will-run branch on August 15, 2024 16:12
@rosstimothy
Contributor

@doggydogworld can we backport this to active release branches?

@doggydogworld
Contributor Author

@doggydogworld can we backport this to active release branches?

Yup, will do.

doggydogworld added a commit that referenced this pull request Aug 26, 2024
* Adding flags for benchmarks in unit test CI

* Upping timeout for unit tests to accomodate benchmarks

* Fixing a typo

* Run benchmarks once

* Adding a makefile target for ci bench test

* Removing a stray quotation

* Compacting workflow

* Benchmarks now with nicer output

* Benchmarks back to their own step

* With error messages

* Adding pipefail to preserve exit code

* Using bash

* Simpler command and including e

* Separating benchmarks into its own workflow

* Fixing benchmarks workflow name

* Ignoring benchmarks that require root

* Updating to account for root and nonroot bench

* Cleaning up the benchmark workflows

* Bench requires test log dir to be created

* Excluding all BenchmarkRoot benchmarks from running

* Bumping benchGetNodes timeout duration to account for some jitter

* Fixing multiplexer benchmark by mirroring unit tests

* Remembered that skip exists

* Skipping BenchmarkGetMaxNodes due to flake

* Fixing formatting of summary

* Actually fixing summary

* Adding another newline

* Simplifying setup for mux listener

* Excluding BenchmarkRootExecCommand
doggydogworld added a commit that referenced this pull request Aug 26, 2024 (same commit message as above)

doggydogworld added a commit that referenced this pull request Aug 26, 2024 (same commit message as above)
github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024 (backport; same commit message as above, plus "Fixing bench pattern selectors to include all tests")

github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024 (backport; same commit message as above, plus "Removing some lines that got mistakenly pulled in cherrypick" and "Fixing bench pattern selectors to include all tests")

github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024 (backport; same commit message as above, plus "Fixing bench pattern selectors to include all tests")