-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
router: extend packet processing metrics #4422
Conversation
… metrics. Packet counters are labeled with a "crossing" flag, which denotes whether a packet is crossing the local AS boundary or not. Combined the existing separate ingress/egress counters and the existing interface labels (from which we can tell internal from external interfaces), this allows to counts packets according to 4 types of processing: incoming: from another AS to the local AS outgoing: from the local AS to another AS local: from the local AS to the local AS transit: from another AS to another AS It should be noted that these counters are about a single router, so a packet that transits through an AS via two routers will be counted by each router independently: once as incoming and once as outgoing. Not as transit.
A bit of naming cleanup ensued.
* Added error logging. * Added preemption count reporting.
The latter provides the time spend in runnable-not-running state which is what we truly need to do scheduling-aware performance observations.
* Massage (interface=internal/not-internal, crossing=t/f) into a traffic type, so we don't have to do it with interminably long queries. * This required updating prometheus to the latest, which is just as well.
The collected metrics are originally per-threads. Collecting them means knowing what threads exist. The procfs API isn't smart about it and generates work for the GC. To prevent it, we stat /proc/pid/tasks ourselves and check if the count of files has changed. Go never kills threads; only adds them. So any set of changes is a change of count. Added a metric to get an idea of how often we actually have to pay the price: process_metrics_tasklist_updates_total
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 15 files at r2.
Reviewable status: 7 of 16 files reviewed, 6 unresolved discussions (waiting on @jiceatscion)
pkg/private/processmetrics/BUILD.bazel~
line 0 at r4 (raw file):
Accidentally commited backup file? Same for processmetrics.go~
. Btw, feel free to add a gitignore pattern for emacs.
pkg/private/processmetrics/processmetrics_linux.go
line 43 at r4 (raw file):
// runningTime should be the same as uTime+sTime reported in a variety of other ways, // but when doing calculations, better use the two data from the same source. So, collect them // both.
Since go 1.20, the runtime/metrics package has a bunch of CPU metrics. In particular /cpu/classes/idle:cpu-seconds
and /cpu/classes/total
appear to be the value that we're after. It appears to be tracked by the go runtime and thus is likely not exactly the same value as the one exposed by the OS scheduler.
I believe this data should be exported by prometheus as go_cpu_classes_idle_cpu_seconds_total
and go_cpu_classes_total_cpu_seconds_total
. For some reason, I couldn't make this happen though yet, I'll have another look. If this works, it seems that we don't even need this custom collector here.
pkg/private/processmetrics/processmetrics_linux.go
line 70 at r4 (raw file):
// FIXME (if we can): AllThreads builds a list and it // ends-up on the garbage pile. myProcs, err := procfs.AllThreads(c.myPid)
I think it would be safe to get this list of threads during the initialization and then keep using it. As far as I understand, the go runtime spins up threads during the initialization
pkg/private/processmetrics/processmetrics_linux.go
line 80 at r4 (raw file):
// and remove only the unupdated ones, but Go makes map values un-addressible. We'd need // to use pointers instead, which would probably make things worse. TBD. for pid := range c.lastSchedstats {
Could use clear(c.lastSchedstats)
(new in go 1.21).
If garbage is a concern, couldn't we just some over the threads already here, instead of keeping the map and then summing later?
pkg/private/processmetrics/processmetrics_linux.go
line 133 at r4 (raw file):
// It is safe to ignore errors from this but prometheus may lack some // metrics. func NewProcStatCollector() error {
I'd expect that New...
returns something. If this directly registers the collector, perhapsInit..
or Register...
would be clearer.
pkg/private/processmetrics/processmetrics_linux.go
line 149 at r4 (raw file):
err = prometheus.Register(c) if err != nil { log.Error("NewProcStatCollector", "registration failed", err)
IMO, ideally we'd move the logging outside the library in the calling code.
Per reviewer's suggestion: * Rename Newxyz to Init(), since not returning anything. * Moved logging to caller side. * Normalized errors a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 16 files reviewed, 3 unresolved discussions (waiting on @matzf)
pkg/private/processmetrics/BUILD.bazel~
line at r4 (raw file):
Previously, matzf (Matthias Frei) wrote…
Accidentally commited backup file? Same for
processmetrics.go~
. Btw, feel free to add a gitignore pattern for emacs.
Done.
pkg/private/processmetrics/processmetrics_linux.go
line 43 at r4 (raw file):
Previously, matzf (Matthias Frei) wrote…
Since go 1.20, the runtime/metrics package has a bunch of CPU metrics. In particular
/cpu/classes/idle:cpu-seconds
and/cpu/classes/total
appear to be the value that we're after. It appears to be tracked by the go runtime and thus is likely not exactly the same value as the one exposed by the OS scheduler.
I believe this data should be exported by prometheus asgo_cpu_classes_idle_cpu_seconds_total
andgo_cpu_classes_total_cpu_seconds_total
. For some reason, I couldn't make this happen though yet, I'll have another look. If this works, it seems that we don't even need this custom collector here.
/cpu/classes/total:cpu-seconds seemed promising, but it is defined as "GOMAXPROCS integrated over the wall-clock duration this process has been executing for." Which means that includes all periods of "Runnable" state. Where the process is kept off the CPU by preemption. Every existing metric I looked at, other than "schedstats" has that problem. The Go doc is even brutally honest about the disconnection with reality: "Compare only with other /cpu/classes metrics". So, I'd rather use something the meaning of which I'm pretty sure is what I need.
pkg/private/processmetrics/processmetrics_linux.go
line 70 at r4 (raw file):
Previously, matzf (Matthias Frei) wrote…
I think it would be safe to get this list of threads during the initialization and then keep using it. As far as I understand, the go runtime spins up threads during the initialization
I have found differently. The line-up increases in 2 or 3 increments before stabilizing. The number of thread never goes down, though. That is confirmed by several feature requests since 2016, never addressed. As you see in my latest revision. I now refresh the list only when needed. That's 2 or 3 times total.
pkg/private/processmetrics/processmetrics_linux.go
line 80 at r4 (raw file):
Previously, matzf (Matthias Frei) wrote…
Could use
clear(c.lastSchedstats)
(new in go 1.21).If garbage is a concern, couldn't we just some over the threads already here, instead of keeping the map and then summing later?
You're right. I was trying to stick to a more general pattern of gathering everything and be free to report what we want later. However, the cost greatly exceeds the benefits. Simplified the map out.
done
pkg/private/processmetrics/processmetrics_linux.go
line 133 at r4 (raw file):
Previously, matzf (Matthias Frei) wrote…
I'd expect that
New...
returns something. If this directly registers the collector, perhapsInit..
orRegister...
would be clearer.
Done.
pkg/private/processmetrics/processmetrics_linux.go
line 149 at r4 (raw file):
Previously, matzf (Matthias Frei) wrote…
IMO, ideally we'd move the logging outside the library in the calling code.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r2, 6 of 7 files at r6, all commit messages.
Reviewable status: 11 of 17 files reviewed, 4 unresolved discussions (waiting on @jiceatscion)
a discussion (no related file):
As discussed offline, it seems clearer to additionally record the "ingress" interface only for output packets, instead of this crossing label. This makes the label "schemas" different for processed/output packets vs input and dropped packets though (but this does seem to make conceptual sense).
pkg/private/processmetrics/processmetrics_linux.go
line 43 at r4 (raw file):
Previously, jiceatscion wrote…
/cpu/classes/total:cpu-seconds seemed promising, but it is defined as "GOMAXPROCS integrated over the wall-clock duration this process has been executing for." Which means that includes all periods of "Runnable" state. Where the process is kept off the CPU by preemption. Every existing metric I looked at, other than "schedstats" has that problem. The Go doc is even brutally honest about the disconnection with reality: "Compare only with other /cpu/classes metrics". So, I'd rather use something the meaning of which I'm pretty sure is what I need.
💯
router/dataplane.go
line 986 at r6 (raw file):
writeMsgs[i].Buffers[0] = writeMsgs[i+written+1].Buffers[0] writeMsgs[i].Addr = writeMsgs[i+written+1].Addr }
This seems to miss the "transport" of the corresponding ingressID
slice, so the accounting will be wrong.
This code was "optimized" for least amount of mental overhead, that's why only the IPv4 messages are tracked here. If we now want to track packet metadata, it's probably much better to reorganize this so that:
readUpTo
fills a list ofpacket
(instead ofipv4.Message
and new alsoingressIDs
)- we only keep track of the
packet
s to be written - translate
packet
toipv4.Message
immediately before the WriteBatch call and dont use theipv4.Message
otherwise
router/dataplane.go
line 2465 at r6 (raw file):
// packet's size is <128k. Just in case that would change, larger packets are simply put in the // last class. const maxSizeClass = 18
The largest supported packet size is private/comnmon.SupportedMTU == 9188
, so maxSizeClass
can be 14
.
router/dataplane.go
line 2488 at r6 (raw file):
// Returns the sizeClass of the given size. All classes above // maxSizeClass - 1 are conflated with maxSizeClass - 1.
Would it make sense to also introduce a minSizeClass
or so? Distinguishing the tiny packets probably does not give any benefit, but they make up the bulk of the different size classes. Everything below 128 bytes or so is just "very small".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r2, 6 of 7 files at r6, all commit messages.
Reviewable status: 11 of 17 files reviewed, 4 unresolved discussions (waiting on @jiceatscion)
The "crossing" label had dubious semantics with the goal of keeping the same schema for all packet metrics. Instead, as reviewer suggested, accepted that only the output packets really deserved a traffic type and gave up on the same-schema requirements (joining input and output metrics wouldn't be that meaningful anyway). This greatly simplifies things. Now, the real traffic type (not just the "crossing" flag) can be computed by process(), where it's simple, and kept until the packets are output and the metrics captured. The traffic type is also richer than what could be derived from the crossing flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 17 files reviewed, 3 unresolved discussions (waiting on @matzf)
a discussion (no related file):
Previously, matzf (Matthias Frei) wrote…
As discussed offline, it seems clearer to additionally record the "ingress" interface only for output packets, instead of this crossing label. This makes the label "schemas" different for processed/output packets vs input and dropped packets though (but this does seem to make conceptual sense).
I have refactored accordingly and then some. There's now a fine grain traffic type label for output packets/bytes metrics only.
router/dataplane.go
line 986 at r6 (raw file):
Previously, matzf (Matthias Frei) wrote…
This seems to miss the "transport" of the corresponding
ingressID
slice, so the accounting will be wrong.This code was "optimized" for least amount of mental overhead, that's why only the IPv4 messages are tracked here. If we now want to track packet metadata, it's probably much better to reorganize this so that:
readUpTo
fills a list ofpacket
(instead ofipv4.Message
and new alsoingressIDs
)- we only keep track of the
packet
s to be written- translate
packet
toipv4.Message
immediately before the WriteBatch call and dont use theipv4.Message
otherwise
I have fixed the bug. I have yet to implement the refactoring you suggest....tomorrow.
router/dataplane.go
line 2465 at r6 (raw file):
Previously, matzf (Matthias Frei) wrote…
The largest supported packet size is
private/comnmon.SupportedMTU == 9188
, somaxSizeClass
can be14
.
If I do that, then I have to keep track of SupportedMTU. Which, unlike the maximum supported by the protocol, could change easily.
router/dataplane.go
line 2488 at r6 (raw file):
Previously, matzf (Matthias Frei) wrote…
Would it make sense to also introduce a
minSizeClass
or so? Distinguishing the tiny packets probably does not give any benefit, but they make up the bulk of the different size classes. Everything below 128 bytes or so is just "very small".
done
…upported by the rest of the code. Previously the largest size per the spec was supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 17 files reviewed, 3 unresolved discussions (waiting on @matzf)
router/dataplane.go
line 2465 at r6 (raw file):
Previously, jiceatscion wrote…
If I do that, then I have to keep track of SupportedMTU. Which, unlike the maximum supported by the protocol, could change easily.
Ok. I added an init function to assert for the size. maxSizeClass is now 15 (it is the first not-supported size).
Done.
This is to avoid remove the code that supports it and having to reinvent it later if we need it, which is likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 17 files reviewed, 3 unresolved discussions (waiting on @matzf)
router/dataplane.go
line 986 at r6 (raw file):
Previously, jiceatscion wrote…
I have fixed the bug. I have yet to implement the refactoring you suggest....tomorrow.
Done.
I'll let you judge if it's really better than before. As long as we can't pass []Packet to writeBatch, there isn't a whole lot we can do to avoid managing Packets and Messages in parallel (without sacrificing performance).
The cost ends-up being the same: unsent packets yield to repeated copying to writeMsg. In one case by shifting the writeMsg buffer. In the other be recreating writeMsg elements from pkts. The latter is just simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 17 files reviewed, 7 unresolved discussions (waiting on @jiceatscion)
a discussion (no related file):
Previously, jiceatscion wrote…
I have refactored accordingly and then some. There's now a fine grain traffic type label for output packets/bytes metrics only.
👍
router/dataplane.go
line 986 at r6 (raw file):
I think it would help if we get rid of updating the writeMsgs
in this loop here and just stupidly "copy" the entire readPkts
into writeMsgs
everytime. I doubt this will hurt.
// Note: we could use a disposable array for that, but we'd have to redo the copies for the packets that we retry.
Also, can we name it just pkts
or so? "read" being this stupid verb without proper participle makes this a bit confusing...
router/dataplane.go
line 2465 at r6 (raw file):
Previously, jiceatscion wrote…
Ok. I added an init function to assert for the size. maxSizeClass is now 15 (it is the first not-supported size).
Done.
Nice. There's a (somewhat cryptic) way to make this a compile time check instead:
const _ = uint(1 << (maxSizeClass-1) - bufSize) // compile time assert: ensure that 2 ^ (maxSizeClass-1) >= bufSize
Checking at compile time avoids that this shows up only very late in the refactoring process, after someone has already declared success when things compile.
Not blocking, if you think it's too obscure just leave it as is.
router/dataplane.go
line 953 at r7 (raw file):
} // We need to collect stats by traffic type and size class.
This accounting got a bit verbose now. Would it be possible to move this into a separate function? accountWrittenPkts(metrics, pkts[:written])
or so?
The call to returnPacketToPool
should perhaps still be here though for clarity.
router/dataplane.go
line 958 at r7 (raw file):
writtenPkts := [ttMax][maxSizeClass]int{} writtenBytes := [ttMax][maxSizeClass]int{} for i := range writeMsgs[:written] {
Nit Replace writeMsgs
here with readPkts
in the loop.
router/dataplane.go
line 1001 at r7 (raw file):
func readUpTo(c <-chan packet, n int, needsBlocking bool, pkts []packet) int {
Nit: styleguide for multiline function declarations is: "If a function declaration uses more than 1 line, each parameter should be declared on a separate line and the first line of the function body should be empty"
This one probably still fits on a single line though :)
router/dataplane.go
line 1078 at r7 (raw file):
pld := p.lastLayer.LayerPayload() minResult := processResult{}
No longer useful, please revert to returning processResult{}
.
router/dataplane.go
line 2424 at r7 (raw file):
} // interfaceMetrics is the set of metrics that are relevant for one given interface.
We could move all of metrics stuff (~things below this line) into metrics.go
. I think the (*Dataplane).initMetrics
should stay, but for the rest this would seem fairly appropriate, and we'd avoid further growing this gargantuan dataplane.go
thing. What do you think?
router/dataplane.go
line 2523 at r7 (raw file):
return "inTransit" case ttOutTransit: return "outTransit"
We use snake_case
for the metric labels and values.
In passing, apply reviewer's suggestion of replacing the init() function with a static assertion.
* Improved variables naming * move metrics updates out of runForwarder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 17 files reviewed, 4 unresolved discussions (waiting on @matzf)
router/dataplane.go
line 986 at r6 (raw file):
Previously, matzf (Matthias Frei) wrote…
I think it would help if we get rid of updating the
writeMsgs
in this loop here and just stupidly "copy" the entirereadPkts
intowriteMsgs
everytime. I doubt this will hurt.// Note: we could use a disposable array for that, but we'd have to redo the copies for the packets that we retry.
Also, can we name it just
pkts
or so? "read" being this stupid verb without proper participle makes this a bit confusing...
Done
I came to the same conclusion. Did some more variable renaming.
router/dataplane.go
line 2465 at r6 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nice. There's a (somewhat cryptic) way to make this a compile time check instead:
const _ = uint(1 << (maxSizeClass-1) - bufSize) // compile time assert: ensure that 2 ^ (maxSizeClass-1) >= bufSize
Checking at compile time avoids that this shows up only very late in the refactoring process, after someone has already declared success when things compile.
Not blocking, if you think it's too obscure just leave it as is.
Done
router/dataplane.go
line 953 at r7 (raw file):
Previously, matzf (Matthias Frei) wrote…
This accounting got a bit verbose now. Would it be possible to move this into a separate function?
accountWrittenPkts(metrics, pkts[:written])
or so?
The call toreturnPacketToPool
should perhaps still be here though for clarity.
Done
router/dataplane.go
line 958 at r7 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit Replace
writeMsgs
here withreadPkts
in the loop.
Done
router/dataplane.go
line 1001 at r7 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: styleguide for multiline function declarations is: "If a function declaration uses more than 1 line, each parameter should be declared on a separate line and the first line of the function body should be empty"
This one probably still fits on a single line though :)
Done
fits one line again now.
router/dataplane.go
line 1078 at r7 (raw file):
Previously, matzf (Matthias Frei) wrote…
No longer useful, please revert to returning
processResult{}
.
Done
router/dataplane.go
line 2424 at r7 (raw file):
Previously, matzf (Matthias Frei) wrote…
We could move all of metrics stuff (~things below this line) into
metrics.go
. I think the(*Dataplane).initMetrics
should stay, but for the rest this would seem fairly appropriate, and we'd avoid further growing this gargantuandataplane.go
thing. What do you think?
Done
I agree.
router/dataplane.go
line 2523 at r7 (raw file):
Previously, matzf (Matthias Frei) wrote…
We use
snake_case
for the metric labels and values.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r2.
Reviewable status: 11 of 17 files reviewed, 10 unresolved discussions (waiting on @jiceatscion)
router/dataplane.go
line 626 at r10 (raw file):
// For non-broken packets, we defer the ingress-side metrics accounting // until we have a chance to figure the traffic type. // That's in runProcessor and processPkt.
This comment is no longer accurate.
router/dataplane.go
line 927 at r10 (raw file):
} for t := ttOther; t < ttMax; t++ { for sc := sizeClass(0); sc < maxSizeClass; sc++ {
for sc := minSizeClass; ...
router/dataplane.go
line 930 at r10 (raw file):
if writtenPkts[t][sc] > 0 { metrics[sc].Output[t].OutputPacketsTotal.Add( float64(writtenPkts[t][sc]))
Nit: fits on one line
router/dataplane.go
line 941 at r10 (raw file):
cfg *RunConfig, c <-chan packet) { fmt.Println("Initialize forwarder for", "interface")
Leftover printf
router/dataplane.go
line 959 at r10 (raw file):
for d.running { toWrite += readUpTo(c, cfg.BatchSize-toWrite, toWrite == 0, pkts[toWrite:])
Nit: also seems to fit onto a single line now
router/export_test.go
line 70 at r10 (raw file):
func InitDPMetrics(dp *DataPlane) { dp.initMetrics()
Move this into NewDP
? It seems we always call this now
router/metrics.go
line 220 at r10 (raw file):
func (sc sizeClass) String() string { return strings.Join( []string{strconv.Itoa((1 << sc) >> 1), strconv.Itoa(1<<sc - 1)},
Nit: this is not accurate for minSizeClass
which would now show as 32_63
(or so.. off by some), instead of 0_63
.
Btw, Sprintf()
might be simpler here.
router/metrics.go
line 227 at r10 (raw file):
// trafficMetrics groups all the metrics instances that all share the same interface AND // sizeClass label values (but have different names - i.e. they count different things). type trafficMetrics struct {
Nit: I think it would help readability if we keep the type definitions of these metrics structs close together:
type interfaceMetrics map[sizeClass]trafficMetrics
struct trafficMetrics { ... }
struct outputMetrics { ... }
... size class, traffic type, ...
router/metrics.go
line 235 at r10 (raw file):
DroppedPacketsBusySlowPath prometheus.Counter ProcessedPackets prometheus.Counter Output map[trafficType]outputMetrics
Not sure if it's worth optimizing this, but this could be an array instead of a map, Output [ttMax]outputMetrics
.
Same for the size classes, actually, but there it's a bit more awkward because of the min offset.
I think this would allow to turn outputMetrics
into an anonymous inline struct definition, which potentially makes this easier to navigate.
router/metrics.go
line 285 at r10 (raw file):
// counters are already instantiated for all the relevant interfaces so this // will not have to be repeated during packet forwarding. func (d *DataPlane) initMetrics() {
My preference would be to keep this function together with all the otherDataplane
methods in the dataplane.go
file.
If we slightly amend the interface of initInterfaceMetrics
so that it takes id uint16, localIA addr.IA, neighbors map[uint16]addr.IA
instead of labels
and does the call to interfaceToMetricsLabels
internally, this can be the only function that is exposed from this metrics.go file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 17 files reviewed, 11 unresolved discussions (waiting on @jiceatscion)
tools/topology/monitoring.py
line 140 at r10 (raw file):
if relabels is not None: job_scrape_config['metric_relabel_configs'] = relabels scrape_configs.append(job_scrape_config)
Left over?
* deleted InitDPMetrics and moved what it does to NewDP. * moved initMetrics back to dataplane.go because it is a dataplane method. * shifted some of the label manipulation in initMetrics into initInterfaceMetrics to minimize dataplane's exposure to metrics mechanics. * sorted declarations in metrics.go to keep important structs together. * fixed string representation of size class label values * implement trafficMetrics.Output as an array instead of a map. * renamed some init* functions to new*, to better eflect what they do. * reformated to 100 columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 17 files reviewed, 8 unresolved discussions (waiting on @matzf)
router/dataplane.go
line 626 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
This comment is no longer accurate.
Done
router/dataplane.go
line 927 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
for sc := minSizeClass; ...
Done
router/dataplane.go
line 930 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: fits on one line
Done
router/dataplane.go
line 941 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Leftover printf
Done
router/dataplane.go
line 959 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: also seems to fit onto a single line now
Done
router/export_test.go
line 70 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Move this into
NewDP
? It seems we always call this now
Done
(Yeah, looks that way, now)
router/metrics.go
line 220 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: this is not accurate for
minSizeClass
which would now show as32_63
(or so.. off by some), instead of0_63
.
Btw,Sprintf()
might be simpler here.
Done
Ah, right. Here you are. Upper bound also promoted to "inf".
router/metrics.go
line 227 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Nit: I think it would help readability if we keep the type definitions of these metrics structs close together:
type interfaceMetrics map[sizeClass]trafficMetrics struct trafficMetrics { ... } struct outputMetrics { ... } ... size class, traffic type, ...
K. Sorted differently.
router/metrics.go
line 235 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Not sure if it's worth optimizing this, but this could be an array instead of a map,
Output [ttMax]outputMetrics
.
Same for the size classes, actually, but there it's a bit more awkward because of the min offset.I think this would allow to turn
outputMetrics
into an anonymous inline struct definition, which potentially makes this easier to navigate.
- Done
- We'll talk.
router/metrics.go
line 285 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
My preference would be to keep this function together with all the other
Dataplane
methods in thedataplane.go
file.If we slightly amend the interface of
initInterfaceMetrics
so that it takesid uint16, localIA addr.IA, neighbors map[uint16]addr.IA
instead oflabels
and does the call tointerfaceToMetricsLabels
internally, this can be the only function that is exposed from this metrics.go file.
Done. Good idea. Worked out nicely. I took the opportunity to rename the init(.*)Metrics to new(\1)Metrics since that's what these functions ~are. Unfortunately your wish isn't entirely granted: serviceMetricLabels() is still referenced directly somewhere in dataplane.go
tools/topology/monitoring.py
line 140 at r10 (raw file):
Previously, matzf (Matthias Frei) wrote…
Left over?
No. I left it there deliberately as I predict that relabeling will be back soon, just for other reasons. I don't want to throw away support for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r7, 4 of 4 files at r11, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion)
tools/topology/monitoring.py
line 140 at r10 (raw file):
Previously, jiceatscion wrote…
No. I left it there deliberately as I predict that relabeling will be back soon, just for other reasons. I don't want to throw away support for it.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r12, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion)
The interface label value of every metric instance had been lost to a refactoring in #4422.
Improve metrics for router packet processing: * Classify packets by size bucket: log2(size) * Classify packets by traffic type: incoming, outgoing, as-transit-in/out, br-transit. * Add process-level on-cpu and sleep time accounting
The interface label value of every metric instance had been lost to a refactoring in scionproto#4422.
Improve metrics for router packet processing:
closes #4409
This change is