Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ethtool monitoring support to Flent #302

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 31 additions & 25 deletions doc/tests.rst
Original file line number Diff line number Diff line change
Expand Up @@ -85,34 +85,40 @@ behave. These are:

.. envvar:: cpu_stats_hosts
.. envvar:: netstat_hosts
.. envvar:: ethtool_hosts
.. envvar:: qdisc_stats_hosts
.. envvar:: wifi_stats_hosts

These set hostnames to gather statistics from from during the test. The
hostnames are passed to SSH, so can be anything understood by SSH (including
using ``username@host`` syntax, or using hosts defined in ``~/.ssh/config``).
This will attempt to run remote commands on these hosts to gather the
required statistics, so passwordless login has to be enabled for. Multiple
hostnames can be specified, separated by commas.

CPU stats and netstat output is global to the machine being connected to. The
qdisc and WiFi stats need extra parameters to work. These are
``qdisc_stats_interfaces``, ``wifi_stats_interfaces`` and
``wifi_stats_stations``. The two former specify which interfaces to gather
statistics from. These are paired with the hostnames, and so must contain the
same number of elements (also comma-separated) as the ``_hosts`` variables.
To specify multiple interfaces on the same host, duplicate the hostname. The
``wifi_stats_stations`` parameter specifies MAC addresses of stations to
gather statistics for. This list is the same for all hosts, but only stations
present in debugfs on each host are actually captured.

The qdisc stats gather statistics output from ``tc -s``, while the WiFi stats
gather statistics from debugfs. These are gathered by looping in a shell
script; however, for better performance, the ``tc_iterate`` and
``wifistats_iterate`` programmes available in the ``misc/`` directory of the
source code tarball can be installed. On low-powered systems this can be
critical to get correct statistics. The helper programmes are packaged for
LEDE/OpenWrt in the ``flent-tools`` package.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to this paragraph?

These specify the hostnames from which to gather statistics during the test.
Flent passes the hostnames to SSH; therefore, the hostnames follow all the
traditional SSH hostname declarations, including using the ``username@host``
syntax or hosts defined in ``~/.ssh/config``. Flent will attempt to run
remote commands on these hosts to gather the required statistics. For this to
work, the hosts must have passwordless login enabled. You can specify
multiple hostnames by separating them by commas.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not change the style of the text; keep the passive voice.


While CPU stats, ethtool, and netstat output are global to the machine being
connected to, the qdisc and WiFi stats are more specific and require extra
parameters to work effectively. These parameters, namely
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop 'effectively'

``qdisc_stats_interfaces``, ``wifi_stats_interfaces``, and
``wifi_stats_stations``, play a crucial role in specifying which interfaces
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the value-laden language "play a crucial role in"

to gather statistics from and which MAC addresses of stations to gather
statistics for. Remember, these parameters are paired with the hostnames, so
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overly pedagogical "remember...". Just describe what the options do

they must contain the same number of elements as the ``_hosts`` variables. To
specify multiple interfaces on the same host, simply duplicate the hostname.
The ``wifi_stats_stations`` parameter specifies the MAC addresses of stations
for which statistics are to be gathered. This list is the same for all hosts,
but only stations present in debugfs on each host are actually captured.
The ``ethtool_hosts`` parameter lets you finetune which devices and fields to
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not "lets you finetune". Passive voice description of what the parameters do, please.

monitor using the ``ethtool_devices`` and ``ethtool_fields`` parameters. By
default, Flent will monitor all network devices from which it can get values.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good default; that will gather way too much data on a machine with many interfaces, and besides it's inconsistent with the other parameters. Let's keep the interfaces explicitly specified, like for the other runners.

However, the ``ethtool_devices`` parameter lets you filter which devices to
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop "you", use passive voice

monitor. If no fields are specified, Flent will monitor the ``rx_packets``
and ``tx_packets`` fields unless you specify other fields in the
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should default to capturing both packet and byte counts

``ethtool_fields`` parameter. You can create a comma-separated list of fields
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the last half of this sentence (from "unless you...")

to monitor; however, if you prefix the field with a network device name
separated by a colon, Flent will only monitor that field for that particular
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit seems over-engineered. Let's just keep this to capturing the same fields for all interfaces.

device. Example: ``ethtool_fields=tx_bytes,eth0:rx_packets,eth1:tx_packets``

.. envvar:: ping_hosts
.. envvar:: ping_local_binds
Expand Down
80 changes: 80 additions & 0 deletions flent/runners.py
Original file line number Diff line number Diff line change
Expand Up @@ -2399,6 +2399,86 @@ def find_binary(self, interval, length, host='localhost'):
host=host)


class EthtoolStatsRunner(ProcessRunner):
time_re = re.compile(r"^Time: (?P<timestamp>\d+\.\d+)", re.MULTILINE)
value_re = re.compile(r"^(?P<ifname>[^:]+):(?P<field>[^:]+): (?P<value>\d+)", re.MULTILINE)

def __init__(self, interval, length, host='localhost',
devices=None, fields=None, **kwargs):
self.interval = interval
self.length = length
self.host = normalise_host(host)
self.devices = devices
self.fields = fields
super(EthtoolStatsRunner, self).__init__(**kwargs)

def parse(self, output, error):
results = {}
raw_values = []
metadata = {}
for part in self.split_stream(output):
timestamp = self.time_re.search(part)
if timestamp is None:
continue
timestamp = float(timestamp.group('timestamp'))
value = self.value_re.search(part)

if value is None:
continue

matches = {}

for m in self.value_re.finditer(part):
ifname = m.group("ifname")
field = m.group("field")
value = m.group("value")
k = f'{ifname}-{field}'
v = float(value)
if k not in matches:
matches[k] = v
else:
matches[k] += v

for k, v in matches.items():
if not isinstance(v, float):
continue
if k not in results:
results[k] = [[timestamp, v]]
else:
results[k].append([timestamp, v])
matches['t'] = timestamp
raw_values.append(matches)
return results, raw_values, metadata

def check(self):
self.command = self.find_binary(self.interval,
self.length, self.host)
super(EthtoolStatsRunner, self).check()

def find_binary(self, interval, length, host='localhost'):
script = os.path.join(DATA_DIR, 'scripts', 'ethtool_iterate.sh')
if not os.path.exists(script):
raise RunnerCheckError("Cannot find ethtool_iterate.sh.")

bash = util.which('bash')
if not bash:
raise RunnerCheckError("Ethtool stats requires a Bash shell.")

devices = f"-d {self.devices}" if self.devices else ""
fields = f"-f {self.fields}" if self.fields else ""
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should set the default field list (that should not be in the script)


return "{bash} {script} -I {interval:.2f} " \
"-c {count:.0f} -H {host} " \
"{devices} {fields}".format(
bash=bash,
script=script,
interval=interval,
count=length // interval + 1,
devices=devices,
fields=fields,
host=host)


class WifiStatsRunner(ProcessRunner):
"""Runner for getting WiFi debug stats from /sys/kernel/debug. Expects
iterations to be separated by '\n---\n and a timestamp to be present in the
Expand Down
182 changes: 182 additions & 0 deletions flent/scripts/ethtool_iterate.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-3.0
# SPDX-FileContributor: Freysteinn Alfredsson <[email protected]>
# SPDX-FileType: SOURCE

count=10
interval=0.1
host=localhost
exact_values=0

while getopts "c:I:H:d:f:e" opt; do
case $opt in
c) count=$OPTARG ;;
I) interval=$OPTARG ;;
H) host=$OPTARG ;;
d) devices=$OPTARG ;;
f) fields=$OPTARG ;;
e) exact_values=1 ;;
esac
done

command_string=$(cat <<EOF
awk -v COUNT="$count" \
-v INTERVAL="$interval" \
-v DEVICES="$devices" \
-v FIELDS="$fields" \
-v EXACT_VALUES="$exact_values" ' \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is doing way too much work on the captured host. The data gathering scripts are deliberately simple, so they can work on constrained devices, and so the data parting and smarts are implemented in Python in Flent. Also, looping through all the interfaces every iteration can take way too long on a constrained device, making the data points "late" relative to the polling interval. This is the reason the other runners spawn a separate runner instance per interface, for instance; we should do the same here.

So basically, I would change this script to be something like:

patterns=""
for f in fields; do
    patterns="-e '$f' $patterns"
done

command_string=$(cat <<EOF
for i in \$(seq $count); do
    date '+Time: %s.%N';
    ethtool -S $interface | grep -w $patterns
    echo "---";
    sleep $interval || exit 1;
done
EOF
)

and then do all the field parsing in the Flent code; and simply spawn a helper for each interface, like the qdisc and wifi stats do.

function add_device(device)
{
ethtool_cmd = ETHTOOL_CMD_TEMPLATE " " device;
for (device_id in devices)
if (device == devices[device_id])
return;

if (((ethtool_cmd) | getline) > 0)
devices[device_id_count++] = device;
close(ethtool_cmd);
}

function add_field(device, field)
{
if (entries[device, field] != 1)
device_entry_count[device]++;
entries[device, field] = 1;

for (field_id in fields)
if (field == fields[field_id])
return;
fields[field_id_count++] = field;
}

# Startup
BEGIN {
ETHTOOL_CMD_TEMPLATE = "ethtool -S "
DEVICE_LIST_CMD = "ls /sys/class/net";
device_id_count = 0;
field_id_count = 0;

# Set default fields if not specified
if (FIELDS == "") {
FIELDS = "rx_packets,tx_packets";
}

split(DEVICES, device_args, ",");
split(FIELDS, field_params, ",");

# Get devices
while (((DEVICE_LIST_CMD) | getline) > 0) {
if (\$1 == "lo")
continue;
# Only add devices specified in DEVICES
if (DEVICES != "") {
for (device_arg_id in device_args) {
if (\$1 == device_args[device_arg_id])
add_device(\$1);
}
} else {
# Add all devices if DEVICES is not specified
add_device(\$1);
}
}
close(DEVICE_LIST_CMD);

# Get fields
for (field_param_id in field_params) {
field_count = 0;
split(field_params[field_param_id], field_pair, ":");
for (pair_id in field_pair)
field_count++;
field = field_pair[field_count];

# Add device specific fields
if (field_count == 2) {
device = field_pair[1];
add_device(device);
add_field(device, field);
continue
}

# Add global fields to all devices
for (device_id in devices) {
device = devices[device_id];
add_field(device, field);
}
}
}

function update_ethtool_stat(device)
{
FS = ":";
ethtool_cmd = ETHTOOL_CMD_TEMPLATE " " device;
found_fields = 0;

while (((ethtool_cmd) | getline) > 0) {
for (field_id in fields) {
field = fields[field_id];
field_regexp = "^ *" field "\$";
if (\$1 !~ field_regexp)
continue;
value = \$2;

entry_value_prev[device, field] = entry_value[device, field];
entry_value[device, field] = value;

found_fields++;
if (found_fields == device_entry_count[device])
break;
}
}
close(ethtool_cmd);
}

function print_values(device)
{
for (field_id in fields) {
field = fields[field_id];
if (entries[device, field] != 1)
continue;

value_prev = entry_value_prev[device, field];
value = entry_value[device, field];

result = value - value_prev;
result = EXACT_VALUES == 1 ? result : result / INTERVAL;

print(device ":" field ": " result);
}
}

# Main loop
BEGIN {
DATE_CMD = "date \"+Time: %s.%N\""

# Update initial values
for (device_id in devices) {
device = devices[device_id];
update_ethtool_stat(device);
}

# Print interval values
for (i = 0; i < COUNT; i++) {
print("---");
(DATE_CMD) | getline date;
print(date);
close(DATE_CMD);

for (device_id in devices) {
device = devices[device_id];
update_ethtool_stat(device);
print_values(device);
}
system("sleep " INTERVAL);
}
}'
EOF
)

if [ "$host" == "localhost" ]; then
eval "$command_string"
else
echo "$command_string" | ssh "$host" sh
fi
1 change: 1 addition & 0 deletions flent/tests/common.inc
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
include("qdisc_stats.inc")
include("cpu_stats.inc")
include("ethtool_stats.inc")
include("wifi_stats.inc")
include("ping.inc")
include("voip_mixin.inc")
Expand Down
36 changes: 36 additions & 0 deletions flent/tests/ethtool_stats.inc
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
## -*- mode: python; coding: utf-8 -*-

# Mixin include file to add cpu stats to a test


ETHTOOL_HOSTS=get_test_parameter('ethtool_hosts', default=[], split=True)
ETHTOOL_DEVICES=get_test_parameter('ethtool_devices', default=None)
ETHTOOL_FIELDS=get_test_parameter('ethtool_fields', default=None)

for host in ETHTOOL_HOSTS:
DATA_SETS['ethtool_stats_%s' % host] = {'interval': STEP_SIZE,
'length': TOTAL_LENGTH,
'host': host,
'units': 'misc',
'id': host,
'devices': ETHTOOL_DEVICES,
'fields': ETHTOOL_FIELDS,
'runner': 'ethtool_stats'}

if ETHTOOL_HOSTS:
PLOTS['ethtool'] = {'description': 'Per ethtool field stats',
'type': 'timeseries',
'axis_labels': ['value'],
'series': [
{'data': glob('ethtool_stats_*'),
'raw_key': glob('*', exclude=["t"]),
'label': 'Ethtool field stats'},
]}

PLOTS['ethtool_box'] = {'description': 'Per ethtool stats (box plot)',
'type': 'box',
'parent': 'ethtool'}

PLOTS['ethtool_bar'] = {'description': 'Per ethtool stats (bar plot)',
'type': 'bar',
'parent': 'ethtool'}
Binary file added unittests/test_data/test-ping-ethtool.flent.gz
Binary file not shown.
8 changes: 8 additions & 0 deletions unittests/test_plotters.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,14 @@
'tcp_rtt_box_combine',
'tcp_rtt_cdf',
)),
'test-ping-ethtool.flent.gz': set((
'tcp_cwnd',
'tcp_pacing',
'tcp_rtt',
'tcp_rtt_bar_combine',
'tcp_rtt_box_combine',
'tcp_rtt_cdf',
)),
'test-rrul-icmp.flent.gz': set((
'tcp_cwnd',
'tcp_delivery_rate',
Expand Down
Loading