Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore IDPF devices for renaming rules #95

Merged
merged 4 commits into from
Jan 7, 2025
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/lib/udev/rules.d/75-gce-network.rules
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,7 @@ IMPORT{builtin}="net_id"

# Rule to rename Mellanox devices
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="mlx5_core", PROGRAM="/usr/bin/gce-nic-naming", NAME="%c"

# Rule to rename RDMA devices
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="idpf", PROGRAM="/usr/bin/gce-nic-naming infiniband", NAME="%c"
SUBSYSTEM=="auxiliary", ACTION=="add", DRIVERS=="*irdma*", PROGRAM="/usr/bin/gce-nic-naming"
41 changes: 36 additions & 5 deletions src/usr/bin/gce-nic-naming
Original file line number Diff line number Diff line change
Expand Up @@ -14,15 +14,19 @@
#
# This script is used to generate the network device name for a given device.


# Input to test if a directory in the PCI tree exists under the device
declare PCI_TEST_ITEM="$1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needlessly confusing IMO, it isn't actually dynamic. All the cases are static, this only serves to obscure the behavior. Can you instead check for infiniband when "${DRIVER}" == "idpf"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver is not directly passed to the script, I thought this was more flexible for future cases but I can pass the driver instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# Generate a instance specific tag for logging to journal
readonly LOG_TAG=${LOG_TAG:-"gce-nic-naming_$$"}

# Path to the PCI bus devices
declare PCI_BUS_PATH='/sys/bus/pci/devices'
declare SYS_PREPEND_PATH='/sys'
# 0x15b3:0x101e is the vendor and device ID for Mellanox CX7

readonly ETHERNET_DEVICES_VENDORS=('15b3:101e')
# 0x8086:0x145c is the vendor and device ID for Intel IDPF VF
readonly ETHERNET_DEVICES_VENDORS=('15b3:101e' '8086:145c')
# 0x10de is the vendor ID for Nvidia
readonly GPU_DEVICES_VENDORS=('10de' '10de')
# PCI BUS ID path is in the format of 0000:00:04.0
Expand Down Expand Up @@ -457,7 +461,14 @@ function generate_name() {
if [[ ${accelerator_devices[*]} == "" ]] ; then
if [[ " ${ETHERNET_DEVICES_VENDORS[*]} " =~ \
[[:space:]]${eth_device_vendor}[[:space:]] ]]; then
name_builder="rdma${eth_index}"

if [[ "${SUBSYSTEM}" == "net" ]] && [[ ! -d "${SYS_PREPEND_PATH}${DEVPATH}/device/${PCI_TEST_ITEM}" ]]; then
# If this is a VF device and not an RDMA the race condition is addressed by naming to position based name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the race condition, for readers that aren't google internal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

notice "Non RDMA NIC. Setting name to path name"
name_builder=$ID_NET_NAME_PATH
else
name_builder="rdma${eth_index}"
fi
else
error_and_exit "Device is not for intent based name: "\
"Device vendors: eth:${eth_device_vendor}"
Expand All @@ -478,8 +489,13 @@ function generate_name() {
echo ${name_builder}
}


# Give the ability to test the script without running the main logic
if [[ ! $TEST == 'test' ]]; then
if [[ ! $TEST == 'test' ]] && [[ ! "${GCE_NIC_NAMING}" == "disable" ]]; then

if [[ "$SUBSYSTEM" != "net" ]]; then
notice "Triggered for non Net Device"
fi
# Note can not use "" around ls path here or it errors out
list_devices ethernet_devices accelerator_devices "$(ls -d "${PCI_BUS_PATH}"/*)"

Expand All @@ -492,5 +508,20 @@ if [[ ! $TEST == 'test' ]]; then
determine_index_ratios
fi

generate_name "${DEVPATH}"
generated_name=$(generate_name "${DEVPATH}")

# handle the 2nd load order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. net
  2. irdma
    right? Can you say this explicitly? Assumptions about load order should be documented for anyone using a custom kernel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if [[ "$SUBSYSTEM" == "net" ]]; then
echo ${generated_name}
else
dev_path="${SYS_PREPEND_PATH}${DEVPATH}/.."
# Check that device has loaded as a net dev before applying the name.
if [[ -d "$dev_path/net/" ]]; then
current_net_iface="$(ls $dev_path/net/)"
notice "Renaming iface ${current_net_iface} to ${generated_name}"
/usr/sbin/ip link set dev ${current_net_iface} down
/usr/sbin/ip link set dev ${current_net_iface} name ${generated_name}
/usr/sbin/ip link set dev ${generated_name} up
fi
fi
fi