Benchmarks Will Now Be Tested During CI #44486
Conversation
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with
It looks like BenchmarkMux_ProxyV2Signature is broken, which is a sign that this PR is doing its job, but the test needs to be fixed before this can land.
Do you have any info on how much longer this workflow job will take now? I guess less than 5 minutes given the timeout, but hopefully not close to that; this is already one of the slower jobs on a PR, ISTR.
@doggydogworld Since we're not going to be testing performance regressions by automatically comparing this with previous runs for now, can we output the summary of the run as a PR comment or something, so we can at least potentially compare results on different PRs manually if need be?
@rosstimothy What do you think? Would that be useful?
So right now it actually does output failed jobs in the run summary; I can update the script to also put out a notice there for all benchmarks. I think I might separate this into its own job so that it doesn't slow down the unit-testing workflow and provides more focused reporting on benchmarks specifically. It would probably also fit in with the eventual direction of this effort. For now I wanted to make sure benchmarks aren't breaking anymore. Once that's done, I was imagining that a nightly build could run benchmarks to serve as a "baseline" that CI benchmarks are checked against, and to provide some view of the performance of those benchmarks over time.
FYI, because we collect GHA job metrics now, we could probably create a dashboard to show the change in benchmark time for nightly runs (if nightly runs are added).
It might be easier or less work to ship the results to the summary like we do for the Bloat Check. That allows you to get information without digging through the logs.
@doggydogworld I think in addition to not slowing down the test job, it would be easier to find results for benchmark tests if they were in their own job instead of bundled in with unit tests.
I'm in the process of adding a benchmark test for a feature that requires running as root - #45164. Can we copy our integration test workflows here and have a Benchmark and Benchmark Root workflow?
Makefile
Outdated
@@ -871,8 +871,9 @@ endif
# todo: Use gotestsum when it is compatible with benchmark output. Currently will consider all benchmarks failed.
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_PATTERN = "[^(?:(?:BenchmarkRoot|BenchmarkMux_ProxyV2Signature).*)].*"
Feel free to rename BenchmarkMux_ProxyV2Signature to BenchmarkRootMuxProxyV2Signature so we don't need to special case this pattern.
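For illustration, a minimal sketch of what the rename buys: with the BenchmarkRoot* prefix, the generic `^BenchmarkRoot` skip pattern catches it with no special casing. The euid guard here is an assumption; how the repo actually gates root-only benchmarks may differ.

```go
package multiplexer_test

import (
	"os"
	"testing"
)

// BenchmarkRootMuxProxyV2Signature follows the BenchmarkRoot* naming
// convention, so the "^BenchmarkRoot" skip pattern excludes it from the
// non-root benchmark job without any pattern special case.
func BenchmarkRootMuxProxyV2Signature(b *testing.B) {
	// Hypothetical guard: skip when not running as root.
	if os.Geteuid() != 0 {
		b.Skip("benchmark requires root")
	}
	for i := 0; i < b.N; i++ {
		// ... PROXY v2 signature verification under test ...
	}
}
```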
Ah, I added that just to get a passing job so I could see the output. Unfortunately one of the cache benchmarks fails as well, which is a bit weird because it works on my machine. Looks like the testing code for that might be flaky.
I already added a new workflow for the root tests, so it would fail those anyway. I was planning to fix that benchmark, include the fix here as well, and then remove it from the pattern.
Force-pushed from 55c3691 to 9b9a7c4
# todo: Use gotestsum when it is compatible with benchmark output. Currently will consider all benchmarks failed.
.PHONY: test-go-bench
test-go-bench: PACKAGES = $(shell grep --exclude-dir api --include "*_test.go" -lr testing.B . | xargs dirname | xargs go list | sort -u)
test-go-bench: BENCHMARK_SKIP_PATTERN = "^BenchmarkRoot|^BenchmarkGetMaxNodes$$"
Why are we skipping BenchmarkGetMaxNodes?
It's a bit of a flaky benchmark and I couldn't decide how to deal with it. It adds 2 million server objects to a cache and allows 200ms for each update. If cache eviction and/or GC kick in at the wrong time, the added latency is enough to push it past that time limit. I pushed it to 300ms and 400ms and it still sometimes fails.
@fspmarshall is there anything we can do to make BenchmarkGetMaxNodes less flaky without sacrificing what the test is trying to capture?
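For illustration, one direction (a sketch only, not the actual BenchmarkGetMaxNodes code) is to stop failing on a hard per-update deadline and instead report the worst observed latency with b.ReportMetric, so GC pauses and eviction can't flip the result. The simulateCacheUpdate workload below is a hypothetical stand-in for the real cache code:

```go
package cache_test

import (
	"testing"
	"time"
)

// simulateCacheUpdate is a placeholder for the real cache read/update.
func simulateCacheUpdate() {
	m := make(map[int]struct{}, 1024)
	for i := 0; i < 1024; i++ {
		m[i] = struct{}{}
	}
}

// Rather than failing when a single update exceeds a fixed 200ms deadline,
// record the worst observed latency as a custom metric. Runs stay
// comparable across machines without a flake-prone hard limit.
func BenchmarkGetMaxNodesSketch(b *testing.B) {
	var worst time.Duration
	for i := 0; i < b.N; i++ {
		start := time.Now()
		simulateCacheUpdate()
		if d := time.Since(start); d > worst {
			worst = d
		}
	}
	b.ReportMetric(float64(worst.Milliseconds()), "worst-update-ms")
}
```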
lib/multiplexer/multiplexer_test.go
Outdated
listener4, err := net.Listen("tcp", "127.0.0.1:")
require.NoError(b, err)

startServing := func(muxListener net.Listener, cluster string) (*Mux, *httptest.Server) {
This is basically the same as in the unit test:
teleport/lib/multiplexer/multiplexer_test.go
Lines 835 to 861 in 9f08fc3
startServing := func(muxListener net.Listener, cluster string) (*Mux, *httptest.Server) {
	mux, err := New(Config{
		Listener:            muxListener,
		PROXYProtocolMode:   PROXYProtocolUnspecified,
		CertAuthorityGetter: casGetter,
		Clock:               clockwork.NewFakeClockAt(time.Now()),
		LocalClusterName:    cluster,
	})
	require.NoError(t, err)
	muxTLSListener := mux.TLS()
	go mux.Serve()
	backend := &httptest.Server{
		Listener: muxTLSListener,
		Config: &http.Server{
			Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
				fmt.Fprintf(w, r.RemoteAddr)
			}),
		},
	}
	backend.StartTLS()
	return mux, backend
}
The only difference I can see is that here we're not using the TLS listener. Is there a reason? Can we unify, factor this out, and reuse it in both tests? Then both tests become more compact.
Ah yeah, I can just move that function out. I wanted to keep any refactors to the absolute minimum, but since this is just setup it shouldn't affect much.
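A rough sketch of what the factored-out helper could look like in lib/multiplexer/multiplexer_test.go, using testing.TB so both *testing.T and *testing.B callers can share it. The imports mirror what the existing test version already uses; the CertAuthorityGetter parameter type name is an assumption and should match casGetter's actual type:

```go
import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"

	"github.com/jonboulle/clockwork"
	"github.com/stretchr/testify/require"
)

// startServing is a sketch of the shared setup helper. testing.TB lets both
// tests and benchmarks call it. The casGetter parameter type is assumed.
func startServing(tb testing.TB, muxListener net.Listener, cluster string, casGetter CertAuthorityGetter) (*Mux, *httptest.Server) {
	mux, err := New(Config{
		Listener:            muxListener,
		PROXYProtocolMode:   PROXYProtocolUnspecified,
		CertAuthorityGetter: casGetter,
		Clock:               clockwork.NewFakeClockAt(time.Now()),
		LocalClusterName:    cluster,
	})
	require.NoError(tb, err)
	muxTLSListener := mux.TLS()
	go mux.Serve()
	backend := &httptest.Server{
		Listener: muxTLSListener,
		Config: &http.Server{
			Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
				// Fprint instead of Fprintf: the remote address is
				// data, not a format string.
				fmt.Fprint(w, r.RemoteAddr)
			}),
		},
	}
	backend.StartTLS()
	return mux, backend
}
```

The benchmark's non-TLS difference could then be handled with an extra parameter, or by having the benchmark wrap the returned mux itself.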
@doggydogworld Let's ship this and make further changes/tweaks in followups if needed.
I noticed today that BenchmarkRootExecCommand was recently broken. Feel free to exclude it in this PR and I'll remove the exclusion once I figure out how it broke.
Force-pushed from 61f3f81 to b640e0a
Force-pushed from 0bee17a to 5d2ef84
@doggydogworld can we backport this to active release branches?
Yup, will do.
* Adding flags for benchmarks in unit test CI
* Upping timeout for unit tests to accommodate benchmarks
* Fixing a typo
* Run benchmarks once
* Adding a makefile target for ci bench test
* Removing a stray quotation
* Compacting workflow
* Benchmarks now with nicer output
* Benchmarks back to their own step
* With error messages
* Adding pipefail to preserve exit code
* Using bash
* Simpler command and including e
* Separating benchmarks into its own workflow
* Fixing benchmarks workflow name
* Ignoring benchmarks that require root
* Updating to account for root and nonroot bench
* Cleaning up the benchmark workflows
* Bench requires test log dir to be created
* Excluding all BenchmarkRoot benchmarks from running
* Bumping benchGetNodes timeout duration to account for some jitter
* Fixing multiplexer benchmark by mirroring unit tests
* Remembered that skip exists
* Skipping BenchmarkGetMaxNodes due to flake
* Fixing formatting of summary
* Actually fixing summary
* Adding another newline
* Simplifying setup for mux listener
* Excluding BenchmarkRootExecCommand
* Benchmarks Will Now Be Tested During CI (#44486)
* Adding flags for benchmarks in unit test CI
* Upping timeout for unit tests to accommodate benchmarks
* Fixing a typo
* Run benchmarks once
* Adding a makefile target for ci bench test
* Removing a stray quotation
* Compacting workflow
* Benchmarks now with nicer output
* Benchmarks back to their own step
* With error messages
* Adding pipefail to preserve exit code
* Using bash
* Simpler command and including e
* Separating benchmarks into its own workflow
* Fixing benchmarks workflow name
* Ignoring benchmarks that require root
* Updating to account for root and nonroot bench
* Cleaning up the benchmark workflows
* Bench requires test log dir to be created
* Excluding all BenchmarkRoot benchmarks from running
* Bumping benchGetNodes timeout duration to account for some jitter
* Fixing multiplexer benchmark by mirroring unit tests
* Remembered that skip exists
* Skipping BenchmarkGetMaxNodes due to flake
* Fixing formatting of summary
* Actually fixing summary
* Adding another newline
* Simplifying setup for mux listener
* Excluding BenchmarkRootExecCommand
* Removing some lines that got mistakenly pulled in cherrypick
* Fixing bench pattern selectors to include all tests
Closes #8401
Purpose
Benchmarks aren't included in CI unit tests, so potentially breaking changes to them go uncaught. This change runs benchmarks as part of CI.
This won't test for performance regressions; it simply ensures that benchmarks run successfully.
Implementation
* Adds a new Makefile target, test-go-bench, that runs benchmarks only once. It had to be a separate target since the race detector is enabled for unit tests, and the race detector slows benchmarks down far too much, adding ~30 minutes. Unit tests should catch any races anyway.
* gotestsum is not currently compatible with benchmark output and will always flag benchmarks as failed no matter their actual outcome (see "Successful benchmark run is marked as failed", gotestyourself/gotestsum#332).
* Because gotestsum isn't compatible, the benchmarks are run in a separate step so the output isn't noisy.
* Only packages that contain benchmarks (grep -l testing.B) are passed in to the go test command.
* sed processes benchmark failures and sets a workflow error message, which marks the failed benchmark in the summary.
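To make the "run benchmarks only once" point concrete: with -benchtime 1x the Go test harness sets b.N to 1, so each benchmark body executes a single iteration, a smoke test rather than a measurement. A minimal, hypothetical example (not from this PR):

```go
package bench_test

import (
	"strings"
	"testing"
)

// Under `go test -run '^$' -bench . -benchtime 1x`, b.N is 1, so this body
// runs once: enough to catch compile errors and runtime panics in benchmark
// code without spending time on measurement (and without the race detector,
// per the separate test-go-bench target).
func BenchmarkSmoke(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = strings.Repeat("x", 1024)
	}
}
```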