-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ethtool monitoring support to Flent #302
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -85,34 +85,40 @@ behave. These are: | |
|
||
.. envvar:: cpu_stats_hosts | ||
.. envvar:: netstat_hosts | ||
.. envvar:: ethtool_hosts | ||
.. envvar:: qdisc_stats_hosts | ||
.. envvar:: wifi_stats_hosts | ||
|
||
These set hostnames to gather statistics from from during the test. The | ||
hostnames are passed to SSH, so can be anything understood by SSH (including | ||
using ``username@host`` syntax, or using hosts defined in ``~/.ssh/config``). | ||
This will attempt to run remote commands on these hosts to gather the | ||
required statistics, so passwordless login has to be enabled for. Multiple | ||
hostnames can be specified, separated by commas. | ||
|
||
CPU stats and netstat output is global to the machine being connected to. The | ||
qdisc and WiFi stats need extra parameters to work. These are | ||
``qdisc_stats_interfaces``, ``wifi_stats_interfaces`` and | ||
``wifi_stats_stations``. The two former specify which interfaces to gather | ||
statistics from. These are paired with the hostnames, and so must contain the | ||
same number of elements (also comma-separated) as the ``_hosts`` variables. | ||
To specify multiple interfaces on the same host, duplicate the hostname. The | ||
``wifi_stats_stations`` parameter specifies MAC addresses of stations to | ||
gather statistics for. This list is the same for all hosts, but only stations | ||
present in debugfs on each host are actually captured. | ||
|
||
The qdisc stats gather statistics output from ``tc -s``, while the WiFi stats | ||
gather statistics from debugfs. These are gathered by looping in a shell | ||
script; however, for better performance, the ``tc_iterate`` and | ||
``wifistats_iterate`` programmes available in the ``misc/`` directory of the | ||
source code tarball can be installed. On low-powered systems this can be | ||
critical to get correct statistics. The helper programmes are packaged for | ||
LEDE/OpenWrt in the ``flent-tools`` package. | ||
These specify the hostnames from which to gather statistics during the test. | ||
Flent passes the hostnames to SSH; therefore, the hostnames follow all the | ||
traditional SSH hostname declarations, including using the ``username@host`` | ||
syntax or hosts defined in ``~/.ssh/config``. Flent will attempt to run | ||
remote commands on these hosts to gather the required statistics. For this to | ||
work, the hosts must have passwordless login enabled. You can specify | ||
multiple hostnames by separating them by commas. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's not change the style of the text; keep the passive voice. |
||
|
||
While CPU stats, ethtool, and netstat output are global to the machine being | ||
connected to, the qdisc and WiFi stats are more specific and require extra | ||
parameters to work effectively. These parameters, namely | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Drop 'effectively' |
||
``qdisc_stats_interfaces``, ``wifi_stats_interfaces``, and | ||
``wifi_stats_stations``, play a crucial role in specifying which interfaces | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Drop the value-laden language "play a crucial role in" |
||
to gather statistics from and which MAC addresses of stations to gather | ||
statistics for. Remember, these parameters are paired with the hostnames, so | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overly pedagogical "remember...". Just describe what the options do |
||
they must contain the same number of elements as the ``_hosts`` variables. To | ||
specify multiple interfaces on the same host, simply duplicate the hostname. | ||
The ``wifi_stats_stations`` parameter specifies the MAC addresses of stations | ||
for which statistics are to be gathered. This list is the same for all hosts, | ||
but only stations present in debugfs on each host are actually captured. | ||
The ``ethtool_hosts`` parameter lets you finetune which devices and fields to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not "lets you finetune". Passive voice description of what the parameters do, please. |
||
monitor using the ``ethtool_devices`` and ``ethtool_fields`` parameters. By | ||
default, Flent will monitor all network devices from which it can get values. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is a good default; that will gather way too much data on a machine with many interfaces, and besides it's inconsistent with the other parameters. Let's keep the interfaces explicitly specified, like for the other runners. |
||
However, the ``ethtool_devices`` parameter lets you filter which devices to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Drop "you", use passive voice |
||
monitor. If no fields are specified, Flent will monitor the ``rx_packets`` | ||
and ``tx_packets`` fields unless you specify other fields in the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should default to capturing both packet and byte counts |
||
``ethtool_fields`` parameter. You can create a comma-separated list of fields | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Drop the last half of this sentence (from "unless you...") |
||
to monitor; however, if you prefix the field with a network device name | ||
separated by a colon, Flent will only monitor that field for that particular | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This bit seems over-engineered. Let's just keep this to capturing the same fields for all interfaces. |
||
device. Example: ``ethtool_fields=tx_bytes,eth0:rx_packets,eth1:tx_packets`` | ||
|
||
.. envvar:: ping_hosts | ||
.. envvar:: ping_local_binds | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2399,6 +2399,86 @@ def find_binary(self, interval, length, host='localhost'): | |
host=host) | ||
|
||
|
||
class EthtoolStatsRunner(ProcessRunner): | ||
time_re = re.compile(r"^Time: (?P<timestamp>\d+\.\d+)", re.MULTILINE) | ||
value_re = re.compile(r"^(?P<ifname>[^:]+):(?P<field>[^:]+): (?P<value>\d+)", re.MULTILINE) | ||
|
||
def __init__(self, interval, length, host='localhost', | ||
devices=None, fields=None, **kwargs): | ||
self.interval = interval | ||
self.length = length | ||
self.host = normalise_host(host) | ||
self.devices = devices | ||
self.fields = fields | ||
super(EthtoolStatsRunner, self).__init__(**kwargs) | ||
|
||
def parse(self, output, error): | ||
results = {} | ||
raw_values = [] | ||
metadata = {} | ||
for part in self.split_stream(output): | ||
timestamp = self.time_re.search(part) | ||
if timestamp is None: | ||
continue | ||
timestamp = float(timestamp.group('timestamp')) | ||
value = self.value_re.search(part) | ||
|
||
if value is None: | ||
continue | ||
|
||
matches = {} | ||
|
||
for m in self.value_re.finditer(part): | ||
ifname = m.group("ifname") | ||
field = m.group("field") | ||
value = m.group("value") | ||
k = f'{ifname}-{field}' | ||
v = float(value) | ||
if k not in matches: | ||
matches[k] = v | ||
else: | ||
matches[k] += v | ||
|
||
for k, v in matches.items(): | ||
if not isinstance(v, float): | ||
continue | ||
if k not in results: | ||
results[k] = [[timestamp, v]] | ||
else: | ||
results[k].append([timestamp, v]) | ||
matches['t'] = timestamp | ||
raw_values.append(matches) | ||
return results, raw_values, metadata | ||
|
||
def check(self): | ||
self.command = self.find_binary(self.interval, | ||
self.length, self.host) | ||
super(EthtoolStatsRunner, self).check() | ||
|
||
def find_binary(self, interval, length, host='localhost'): | ||
script = os.path.join(DATA_DIR, 'scripts', 'ethtool_iterate.sh') | ||
if not os.path.exists(script): | ||
raise RunnerCheckError("Cannot find ethtool_iterate.sh.") | ||
|
||
bash = util.which('bash') | ||
if not bash: | ||
raise RunnerCheckError("Ethtool stats requires a Bash shell.") | ||
|
||
devices = f"-d {self.devices}" if self.devices else "" | ||
fields = f"-f {self.fields}" if self.fields else "" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should set the default field list (that should not be in the script) |
||
|
||
return "{bash} {script} -I {interval:.2f} " \ | ||
"-c {count:.0f} -H {host} " \ | ||
"{devices} {fields}".format( | ||
bash=bash, | ||
script=script, | ||
interval=interval, | ||
count=length // interval + 1, | ||
devices=devices, | ||
fields=fields, | ||
host=host) | ||
|
||
|
||
class WifiStatsRunner(ProcessRunner): | ||
"""Runner for getting WiFi debug stats from /sys/kernel/debug. Expects | ||
iterations to be separated by '\n---\n and a timestamp to be present in the | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
#!/bin/bash | ||
# SPDX-License-Identifier: GPL-3.0 | ||
# SPDX-FileContributor: Freysteinn Alfredsson <[email protected]> | ||
# SPDX-FileType: SOURCE | ||
|
||
count=10 | ||
interval=0.1 | ||
host=localhost | ||
exact_values=0 | ||
|
||
while getopts "c:I:H:d:f:e" opt; do | ||
case $opt in | ||
c) count=$OPTARG ;; | ||
I) interval=$OPTARG ;; | ||
H) host=$OPTARG ;; | ||
d) devices=$OPTARG ;; | ||
f) fields=$OPTARG ;; | ||
e) exact_values=1 ;; | ||
esac | ||
done | ||
|
||
command_string=$(cat <<EOF | ||
awk -v COUNT="$count" \ | ||
-v INTERVAL="$interval" \ | ||
-v DEVICES="$devices" \ | ||
-v FIELDS="$fields" \ | ||
-v EXACT_VALUES="$exact_values" ' \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This script is doing way too much work on the captured host. The data gathering scripts are deliberately simple, so they can work on constrained devices, and so the data parting and smarts are implemented in Python in Flent. Also, looping through all the interfaces every iteration can take way too long on a constrained device, making the data points "late" relative to the polling interval. This is the reason the other runners spawn a separate runner instance per interface, for instance; we should do the same here. So basically, I would change this script to be something like: patterns=""
for f in fields; do
patterns="-e '$f' $patterns"
done
command_string=$(cat <<EOF
for i in \$(seq $count); do
date '+Time: %s.%N';
ethtool -S $interface | grep -w $patterns
echo "---";
sleep $interval || exit 1;
done
EOF
) and then do all the field parsing in the Flent code; and simply spawn a helper for each interface, like the qdisc and wifi stats do. |
||
function add_device(device) | ||
{ | ||
ethtool_cmd = ETHTOOL_CMD_TEMPLATE " " device; | ||
for (device_id in devices) | ||
if (device == devices[device_id]) | ||
return; | ||
|
||
if (((ethtool_cmd) | getline) > 0) | ||
devices[device_id_count++] = device; | ||
close(ethtool_cmd); | ||
} | ||
|
||
function add_field(device, field) | ||
{ | ||
if (entries[device, field] != 1) | ||
device_entry_count[device]++; | ||
entries[device, field] = 1; | ||
|
||
for (field_id in fields) | ||
if (field == fields[field_id]) | ||
return; | ||
fields[field_id_count++] = field; | ||
} | ||
|
||
# Startup | ||
BEGIN { | ||
ETHTOOL_CMD_TEMPLATE = "ethtool -S " | ||
DEVICE_LIST_CMD = "ls /sys/class/net"; | ||
device_id_count = 0; | ||
field_id_count = 0; | ||
|
||
# Set default fields if not specified | ||
if (FIELDS == "") { | ||
FIELDS = "rx_packets,tx_packets"; | ||
} | ||
|
||
split(DEVICES, device_args, ","); | ||
split(FIELDS, field_params, ","); | ||
|
||
# Get devices | ||
while (((DEVICE_LIST_CMD) | getline) > 0) { | ||
if (\$1 == "lo") | ||
continue; | ||
# Only add devices specified in DEVICES | ||
if (DEVICES != "") { | ||
for (device_arg_id in device_args) { | ||
if (\$1 == device_args[device_arg_id]) | ||
add_device(\$1); | ||
} | ||
} else { | ||
# Add all devices if DEVICES is not specified | ||
add_device(\$1); | ||
} | ||
} | ||
close(DEVICE_LIST_CMD); | ||
|
||
# Get fields | ||
for (field_param_id in field_params) { | ||
field_count = 0; | ||
split(field_params[field_param_id], field_pair, ":"); | ||
for (pair_id in field_pair) | ||
field_count++; | ||
field = field_pair[field_count]; | ||
|
||
# Add device specific fields | ||
if (field_count == 2) { | ||
device = field_pair[1]; | ||
add_device(device); | ||
add_field(device, field); | ||
continue | ||
} | ||
|
||
# Add global fields to all devices | ||
for (device_id in devices) { | ||
device = devices[device_id]; | ||
add_field(device, field); | ||
} | ||
} | ||
} | ||
|
||
function update_ethtool_stat(device) | ||
{ | ||
FS = ":"; | ||
ethtool_cmd = ETHTOOL_CMD_TEMPLATE " " device; | ||
found_fields = 0; | ||
|
||
while (((ethtool_cmd) | getline) > 0) { | ||
for (field_id in fields) { | ||
field = fields[field_id]; | ||
field_regexp = "^ *" field "\$"; | ||
if (\$1 !~ field_regexp) | ||
continue; | ||
value = \$2; | ||
|
||
entry_value_prev[device, field] = entry_value[device, field]; | ||
entry_value[device, field] = value; | ||
|
||
found_fields++; | ||
if (found_fields == device_entry_count[device]) | ||
break; | ||
} | ||
} | ||
close(ethtool_cmd); | ||
} | ||
|
||
function print_values(device) | ||
{ | ||
for (field_id in fields) { | ||
field = fields[field_id]; | ||
if (entries[device, field] != 1) | ||
continue; | ||
|
||
value_prev = entry_value_prev[device, field]; | ||
value = entry_value[device, field]; | ||
|
||
result = value - value_prev; | ||
result = EXACT_VALUES == 1 ? result : result / INTERVAL; | ||
|
||
print(device ":" field ": " result); | ||
} | ||
} | ||
|
||
# Main loop | ||
BEGIN { | ||
DATE_CMD = "date \"+Time: %s.%N\"" | ||
|
||
# Update initial values | ||
for (device_id in devices) { | ||
device = devices[device_id]; | ||
update_ethtool_stat(device); | ||
} | ||
|
||
# Print interval values | ||
for (i = 0; i < COUNT; i++) { | ||
print("---"); | ||
(DATE_CMD) | getline date; | ||
print(date); | ||
close(DATE_CMD); | ||
|
||
for (device_id in devices) { | ||
device = devices[device_id]; | ||
update_ethtool_stat(device); | ||
print_values(device); | ||
} | ||
system("sleep " INTERVAL); | ||
} | ||
}' | ||
EOF | ||
) | ||
|
||
if [ "$host" == "localhost" ]; then | ||
eval "$command_string" | ||
else | ||
echo "$command_string" | ssh "$host" sh | ||
fi |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
## -*- mode: python; coding: utf-8 -*- | ||
|
||
# Mixin include file to add cpu stats to a test | ||
|
||
|
||
ETHTOOL_HOSTS=get_test_parameter('ethtool_hosts', default=[], split=True) | ||
ETHTOOL_DEVICES=get_test_parameter('ethtool_devices', default=None) | ||
ETHTOOL_FIELDS=get_test_parameter('ethtool_fields', default=None) | ||
|
||
for host in ETHTOOL_HOSTS: | ||
DATA_SETS['ethtool_stats_%s' % host] = {'interval': STEP_SIZE, | ||
'length': TOTAL_LENGTH, | ||
'host': host, | ||
'units': 'misc', | ||
'id': host, | ||
'devices': ETHTOOL_DEVICES, | ||
'fields': ETHTOOL_FIELDS, | ||
'runner': 'ethtool_stats'} | ||
|
||
if ETHTOOL_HOSTS: | ||
PLOTS['ethtool'] = {'description': 'Per ethtool field stats', | ||
'type': 'timeseries', | ||
'axis_labels': ['value'], | ||
'series': [ | ||
{'data': glob('ethtool_stats_*'), | ||
'raw_key': glob('*', exclude=["t"]), | ||
'label': 'Ethtool field stats'}, | ||
]} | ||
|
||
PLOTS['ethtool_box'] = {'description': 'Per ethtool stats (box plot)', | ||
'type': 'box', | ||
'parent': 'ethtool'} | ||
|
||
PLOTS['ethtool_bar'] = {'description': 'Per ethtool stats (bar plot)', | ||
'type': 'bar', | ||
'parent': 'ethtool'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to this paragraph?