-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
testing: an improved router benchmark that only runs one router #4444
Conversation
This still has some issues but I don't like carrying unsub code on my laptop.
Only collects br_transit metrics for now. Test runs fast so metrics window is short only 8 seconds for 5M packets. This makes the pkt/s measurement flaky. Will have to try and increase the window size.
Mainly: * Get rid of the "pause" container. Use the prometheus container for the same purpose. * Add multi paralle streams and core-count support. * Slight cleanup of the addressing/numbering scheme and the begining of using it to simplify test cases.
brload no-longer makes sense as a stand-alone tool; it is tightly bound to the associated test topology, so move it in with router_newbenchmark which is in charge of bringing up that topology. That way, at least, all the inter-dependent pieces are in the same location.
Rely on address assignment conventions to simplify concrete addresses out of test cases.
Namely: * Contained all topology knowledge in brload: test.py is told what to do by the --showInterfaces option of brload. * Contained all interface names knowledge (mac addresses are next) in test.py: brload is told the real interface names via the -interface flag.
Moved knowledge of mac addresses outside brload. The test harness now tells brload what the interface mac addresses are and not the other way around. In the usual automated test context it makes no difference (the same naming and addressing scheme is used), but in case we ever run the test on a real self-contained machine, the test doesn't get to choose the interface names and macs.
We still check that forwarding works, but consume only 1 paket for that purpose. Let the others accumulate and spill.
4 is also 1/2 of the CI system cores which works-out nicely even with the test scaling absent.
Measured at ~311K pkts/machine*s so setting expectations at 250K.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 13 files at r1, 2 of 5 files at r2.
Reviewable status: 6 of 14 files reviewed, 11 unresolved discussions (waiting on @jiceatscion)
acceptance/router_newbenchmark/BUILD.bazel
line 74 at r2 (raw file):
go_binary( name = "router_newbenchmark",
I think this is redundant (same as brload
).
acceptance/router_newbenchmark/main.go
line 71 at r2 (raw file):
// initDevices inventories the available network interfaces, picks the ones that a case may inject // traffic into, and associates them with a AF Packet interface. func initDevices() error {
Nit: handles
does not need to be a gloabal, return it from here.
acceptance/router_newbenchmark/main.go
line 105 at r2 (raw file):
fmt.Println(listInterfaces()) return 0 }
The flags of this tool are sufficiently complex that the flags
package becomes ugly. I'd suggest to use cobra
, which we use in other places already, and which can handle subcommands etc.
acceptance/router_newbenchmark/main.go
line 122 at r2 (raw file):
} artifactsDir, err := os.MkdirTemp("", "brload_")
Nit: clean this up?
Also, it seems we only use this if dir
/ TEST_ARTIFACTS_DIR are not set, so only create when needed?
acceptance/router_newbenchmark/main.go
line 201 at r2 (raw file):
} } }()
Nit: move the main body of this go func()
into a separate function, passing in the readPktFrom
handle; something like func receivePkts(handle *afpacket.TPacket, expectedPayload string) int
.
acceptance/router_newbenchmark/main.go
line 223 at r2 (raw file):
close(packetChan) outcome := <-listenerChan
Perhaps a bit more elegant; instead of Sleep + close + read, read from the listenerChan with a timeout.
acceptance/router_newbenchmark/test.py
line 106 at r2 (raw file):
# Accepts an IntfReq records names and mac addresses into an Intf and associates it with # the request label. def create_interface(self, req: IntfReq, ns: str) -> Intf:
Nit: type, does not actually return anything?
acceptance/router_newbenchmark/test.py
line 165 at r2 (raw file):
brload = self.get_executable("brload") output = exec_sudo(f"{brload.executable} -artifacts {self.artifacts} " "-show_interfaces")
Nit: don't run as sudo.
acceptance/router_newbenchmark/test.py
line 183 at r2 (raw file):
"--network container:prometheus " "--name router " "bazel/acceptance/router_newbenchmark:router")
Is this the name of the docker image? If so, do we then not need this clunky "container-loader" argument passed to the test in the BUILD file?
acceptance/router_newbenchmark/topo.go
line 174 at r2 (raw file):
sb.WriteString(i.peerIP.String()) sb.WriteString("\n") }
This is quite hard to understand. Document what the expected format is?
acceptance/router_newbenchmark/conf/topology.json
line 14 at r2 (raw file):
"discovery_service": { "cs1-ff00_0_1-1": { "addr": "172.20.0.28:30252"
Nit; Is it legal to leave control_service/discovery_service out? We don't use them.
Moved test cases into home dir and implement a number of reviewer's suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 15 files reviewed, 6 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/BUILD.bazel
line 74 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
I think this is redundant (same as
brload
).
Done.
acceptance/router_newbenchmark/main.go
line 71 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit:
handles
does not need to be a gloabal, return it from here.
Done
acceptance/router_newbenchmark/main.go
line 122 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: clean this up?
Also, it seems we only use this ifdir
/ TEST_ARTIFACTS_DIR are not set, so only create when needed?
I was also wondering why that was created unconditionally. The only things that came to mind were that it could double as cheap mutual exclusion between parallel runs (can't see why that would be needed) or that it was just a nicer looking code flow. In addition, it seems that the only thing we want out of that dir is to read files from it. There's no alternative if the files don't exist. So, creating the dir serves no purpose. I think this is code left-over from something that used to be more complicated. Removed it entirely.
Done
acceptance/router_newbenchmark/main.go
line 201 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: move the main body of this
go func()
into a separate function, passing in thereadPktFrom
handle; something likefunc receivePkts(handle *afpacket.TPacket, expectedPayload string) int
.
done
Not sure it looks better, but no worse. I kinda like running go routines in the context of the function that creates them. Saves on boilerplate.
acceptance/router_newbenchmark/main.go
line 223 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Perhaps a bit more elegant; instead of Sleep + close + read, read from the listenerChan with a timeout.
Yeah. Here's the prettier version.
acceptance/router_newbenchmark/test.py
line 106 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: type, does not actually return anything?
Ah, yes, no more. It used to.
acceptance/router_newbenchmark/test.py
line 165 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: don't run as sudo.
done
acceptance/router_newbenchmark/test.py
line 183 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Is this the name of the docker image? If so, do we then not need this clunky "container-loader" argument passed to the test in the BUILD file?
Not sure if that's necessary or not. May be we don't have to build one specificly for the test? Will look tomorrow.
acceptance/router_newbenchmark/conf/topology.json
line 14 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit; Is it legal to leave control_service/discovery_service out? We don't use them.
I have been wondering. I haven't tried to remove them. I will.
acceptance/router_newbenchmark/topo.go
line 174 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
This is quite hard to understand. Document what the expected format is?
Done.
There's no reason to do that, we can just use the same image that's used for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 15 files reviewed, 6 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/test.py
line 183 at r2 (raw file):
Previously, jiceatscion wrote…
Not sure if that's necessary or not. May be we don't have to build one specificly for the test? Will look tomorrow.
I think I figured out how it's supposed to work. Yes the build file needs to pass container-loader. I also named just the same as everywhere else: posix-router:latest and made test.py run that. Deleted the custom router image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 15 files reviewed, 6 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/main.go
line 105 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
The flags of this tool are sufficiently complex that the
flags
package becomes ugly. I'd suggest to usecobra
, which we use in other places already, and which can handle subcommands etc.
I don't find cobra any prettier, but if it's our standard, I'll comply. Stay tuned.
Gazelle also reordered some stuff and I had to cater to impi's obsessive import order demands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 15 files reviewed, 5 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/main.go
line 105 at r2 (raw file):
Previously, jiceatscion wrote…
I don't find cobra any prettier, but if it's our standard, I'll comply. Stay tuned.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 15 files reviewed, 4 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/conf/topology.json
line 14 at r2 (raw file):
Previously, jiceatscion wrote…
I have been wondering. I haven't tried to remove them. I will.
done
Changes: * Reduce chance of mistake in test case by supplying the underlay UDP port automatically based on the interface type. * Add a comment in the test harness to explain why we shouldn't configure the host side interface addresses.
Reason: Though these functions aren't actually needed outside the package, they are semantically exportable and some of them are only used by test cases that I haven't submitted yet, which makes them unused in the eyes of the linter.
Changes: * Added warmup test run to trigger cpu frequency scaling. * Fixed some docstrings and comments.
Move the TTL setting out of create_interface and improve comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r5, 1 of 4 files at r7, 1 of 3 files at r8, all commit messages.
Reviewable status: 13 of 15 files reviewed, 1 unresolved discussion (waiting on @jiceatscion)
acceptance/router_newbenchmark/BUILD.bazel
line 74 at r2 (raw file):
Previously, jiceatscion wrote…
Done.
This unused router_newbenchmark
go binary is still here though.
acceptance/router_newbenchmark/topo.go
line 174 at r2 (raw file):
Previously, jiceatscion wrote…
Meh, it has to be supplied to ip addr add. Since the test is supplying the IPs, it should decide on the subnets. We could hard-code it in test.py, but why break the model?
Ok, I see.
acceptance/router_newbenchmark/cases/topo.go
line 125 at r6 (raw file):
Previously, jiceatscion wrote…
I disagree. I did this deliberately. Crashing in this case is the most helpful thing we can do. It attracts attention and provides the cause.
Ok. I disagree with that, too. 😉 How is this better than returning an error explaining that the argument was malformed? If I see a panic, I assume that the program is broken, not my invocation of it. Never mind though, leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 19 files reviewed, 1 unresolved discussion (waiting on @matzf)
acceptance/router_newbenchmark/cases/topo.go
line 125 at r6 (raw file):
Previously, matzf (Matthias Frei) wrote…
Ok. I disagree with that, too. 😉 How is this better than returning an error explaining that the argument was malformed? If I see a panic, I assume that the program is broken, not my invocation of it. Never mind though, leave it.
It's a common and defensible opinion. In general I would agree with the distinction b/w internal error and usage error. However, this isn't an ordinary cmd line tool. It isn't really meant for human consumption: the arguments come from another component of the test. So, if the args are broken, it can be regarded as an internal error. I do not feel strongly about that. If it does offend your sense of quality, I will change it.
The main motivation is to stop Gazelle from adding a redundant target router_newbenchmark, which is brload with another name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 20 files reviewed, 1 unresolved discussion (waiting on @matzf)
acceptance/router_newbenchmark/BUILD.bazel
line 74 at r2 (raw file):
Previously, matzf (Matthias Frei) wrote…
This unused
router_newbenchmark
go binary is still here though.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 7 files at r9, 2 of 2 files at r10, 4 of 4 files at r11, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jiceatscion)
acceptance/router_newbenchmark/cases/in.go
line 106 at r10 (raw file):
// Calculate MACs... // Seg0: Hops are in non-consdir. sp.HopFields[1].Mac = path.MAC(mac, sp.InfoFields[0], sp.HopFields[1], nil)
Here and all other cases; this creates a bogus MAC because it uses the key of AS 1 for a hop field of AS 2. Instead, rather leave it empty, or initialize with or otherwise obviously arbitrary value.
acceptance/router_newbenchmark/cases/out_transit.go
line 156 at r10 (raw file):
panic(err) } return input.Bytes()
There is a relatively large fraction of these cases that is just boilerplate. Setting up the underlay ethernet, IP and UDP layers, the L4 payload layer, and serializing the layers, always looks identical. What about a minimal helper function function to wrap this. I'd suggest to pass all the addresses explicitly -- no magic, just a shorthand.
func SerializeSCIONPacketWithUnderlay(srcMac net.HardwareAddr, srcIPPort netip.AddrPort, dstMac net.HardwareAddr, dstIPPort netip.AddrPort, scionL *slayers.SCION, payload []byte) []byte
acceptance/router_newbenchmark/cases/topo.go
line 94 at r11 (raw file):
// Such names are those used when responding to the show-interfaces command and when translating // the --interface option. func InterfaceLabel(AS int, intf int) string {
Nit: unexport this, it seems strictly internal (only used to initialize intfMap
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 19 of 20 files reviewed, 2 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/cases/topo.go
line 94 at r11 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: unexport this, it seems strictly internal (only used to initialize
intfMap
).
done
(and the Subnet = max() thing too).
Changes: * Add helper function for boiler plate underlay layer construction. * Fix Fake MacAddresses to make them more deliberately fake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 13 of 20 files reviewed, 2 unresolved discussions (waiting on @matzf)
acceptance/router_newbenchmark/cases/in.go
line 106 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Here and all other cases; this creates a bogus MAC because it uses the key of AS 1 for a hop field of AS 2. Instead, rather leave it empty, or initialize with or otherwise obviously arbitrary value.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 13 of 20 files reviewed, 1 unresolved discussion (waiting on @matzf)
acceptance/router_newbenchmark/cases/out_transit.go
line 156 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
There is a relatively large fraction of these cases that is just boilerplate. Setting up the underlay ethernet, IP and UDP layers, the L4 payload layer, and serializing the layers, always looks identical. What about a minimal helper function function to wrap this. I'd suggest to pass all the addresses explicitly -- no magic, just a shorthand.
func SerializeSCIONPacketWithUnderlay(srcMac net.HardwareAddr, srcIPPort netip.AddrPort, dstMac net.HardwareAddr, dstIPPort netip.AddrPort, scionL *slayers.SCION, payload []byte) []byte
done
…hoke on max(). Also fold an extra long line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 7 files at r13, 3 of 3 files at r14, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jiceatscion)
go.mod
line 133 at r14 (raw file):
) go 1.18
Can you please mention this in the PR description, so we won't forget to mention this in the next release notes (affects downstream).
acceptance/router_newbenchmark/cases/topo.go
line 70 at r14 (raw file):
// the local AS. This works if there are no cycles. Else there could be subnet number collisions. func PublicIP(localAS byte, remoteAS byte) netip.Addr { subnetNr := byte(max(int(remoteAS), int(localAS)))
The type conversions are not necessary, max
is generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 21 of 22 files reviewed, 1 unresolved discussion (waiting on @matzf)
go.mod
line 133 at r14 (raw file):
Previously, matzf (Matthias Frei) wrote…
Can you please mention this in the PR description, so we won't forget to mention this in the next release notes (affects downstream).
done
acceptance/router_newbenchmark/cases/topo.go
line 70 at r14 (raw file):
Previously, matzf (Matthias Frei) wrote…
The type conversions are not necessary,
max
is generic.
Yeah, some of the random things I tried when the linter was screaming at me.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r15, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion)
…nproto#4444) This is an alternative to router_benchmark and meant to eventually replace it. It's main feature is that it only runs a single router. As a result it's measured performance is far less dependent on the number of cores available on the test system. It is heavily inspired by the braccept test, in that: * it relies on the test harness to create an isolated network and veth interfaces to connect the test driver to the router. * it manufactures packets that are fed directly into the router's interfaces and it optionally captures packets for verification in the same manner. * it assumes a custom topology and cannot be usefully run on any other topology. It differs from braccept in that: * It does far less verification of the forwarded packets (it's braccept's job to do that). * It is considerably simplified, mainly because of the lighter verifications. * It relies on a simple address-assignment scheme so code maintainers don't have to keep tens of addresses in mind. * It does not assume that the router runs in a container network; it could also talk to a real router if given the names and mac addresses of the relevant interfaces. The test harness (or the user) is responsible for supplying the interface names and mac addresses to the test driver. * It does not require the test harness (or the user) to have any understanding of the topology (only to configure the router with it). The specifics of the captive network to be configured are supplied by the test driver. * The test harness doesn't need the "pause" container anymore. The similarity between this and braccept means that, in the future, we could re-converge them. Notably, the benefits of the simple addressing scheme would make the braccept test cases easier to maintain or expand. Collateral: the go version required was bumped to 1.21 Fixes scionproto#4442
This is an alternative to router_benchmark and meant to eventually replace it.
It's main feature is that it only runs a single router. As a result it's measured performance is far less dependent on the number of cores available on the test system.
It is heavily inspired by the braccept test, in that:
It differs from braccept in that:
The similarity between this and braccept means that, in the future, we could re-converge them. Notably, the benefits of the simple addressing scheme would make the braccept test cases easier to maintain or expand.
Collateral: the go version required was bumped to 1.21
Fixes #4442
This change is