Skip to content
This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Add hacky iSCSI implementation #21

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

keszybz
Copy link
Member

@keszybz keszybz commented Nov 29, 2022

This reverts commit a5618aa.

@keszybz keszybz marked this pull request as draft December 3, 2022 09:59
@keszybz keszybz force-pushed the iscsi-support branch 2 times, most recently from c8513aa to 9eb7b6a Compare December 3, 2022 11:22
@keszybz
Copy link
Member Author

keszybz commented Dec 3, 2022

+ pgrep tgtd
+ /usr/sbin/tgtd
+ tgtadm --lld iscsi --op new --mode target --tid=1 --targetname iqn.2022-01.com.example:iscsi.initrd.test
tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected
Error: Process completed with exit code 107.

Hmm, this worked before.

@keszybz
Copy link
Member Author

keszybz commented Dec 3, 2022

[ 120.100615] iscsi-generator.sh[203]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable

@mrc0mmand
Copy link
Member

+ pgrep tgtd
+ /usr/sbin/tgtd
+ tgtadm --lld iscsi --op new --mode target --tid=1 --targetname iqn.2022-01.com.example:iscsi.initrd.test
tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected
Error: Process completed with exit code 107.

Hmm, this worked before.

Yeah, that's a race - I worked around it locally by adding sleep after /usr/sbin/tgtd. There's no other way to avoid this (that I know of), even the systemd unit file in Fedora does this: https://src.fedoraproject.org/rpms/scsi-target-utils/blob/rawhide/f/tgtd.service#_10

@keszybz
Copy link
Member Author

keszybz commented Dec 3, 2022

So… NetworkManager-wait-online.service finishes successfully, and then tgtadm says:
[ 87.075055] iscsi-generator.sh[250]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable

I don't think I want to deal with that. It seems that the tools for this are just so bad it doesn't make sense to try to make this work.

@mrc0mmand
Copy link
Member

So… NetworkManager-wait-online.service finishes successfully, and then tgtadm says: [ 87.075055] iscsi-generator.sh[250]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable

I don't think I want to deal with that. It seems that the tools for this are just so bad it doesn't make sense to try to make this work.

I'll take a look, maybe I f-up the network setup somewhere along the way.

@mrc0mmand
Copy link
Member

Interesting. I tried this locally and it seems to working fine (meaning that the iSCSI device gets correctly discovered and the block device pops up). However, that was without KVM (only plain TCG), so maybe it's a timing issue. I'll try it with KVM next.

@mrc0mmand mrc0mmand mentioned this pull request Dec 3, 2022
@mrc0mmand
Copy link
Member

So, this is definitely a timing issue, but I'm not sure on which end, because:

[  OK  ] Finished NetworkManager-wait-online.service.
[  OK  ] Started NetworkManager-dispatcher.service.
[  OK  ] Reached target network-online.target.
         Starting initrd-iscsi-auto.service...
[   19.565110] iscsi-generator.sh[238]: iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test
         Starting iscsi-init.service...
[  OK  ] Finished iscsi-init.service.
         Starting iscsid.service...
[   20.604910] iscsi-generator.sh[252]: iscsiadm: Could not scan /sys/class/iscsi_transport.
[   21.096906] Loading iSCSI transport class v2.0-870.
[  OK  ] Started iscsid.service.
[   21.605155] iscsi: registered transport (tcp)
[   21.155663] iscsi-generator.sh[252]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   22.033157] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
[   22.035947] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[   22.042602] e1000 0000:00:03.0 ens3: renamed from eth0
ens3: Processing udev action (SEQNUM=1730, ACTION=add)
ens3: Device busy: device is renaming
ens3: Device busy: device is renaming
ens3: Device busy: device is renaming
ens3: Processing udev action (SEQNUM=1736, ACTION=move)
sys-subsystem-net-devices-ens3.device: Changed dead -> plugged
sys-devices-pci0000:00-0000:00:03.0-net-ens3.device: Changed dead -> plugged
[   22.157171] iscsi-generator.sh[252]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   23.157414] iscsi-generator.sh[252]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   24.082504] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   24.086881] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready
[   24.157753] iscsi-generator.sh[252]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   25.174272] iscsi-generator.sh[252]: 10.10.10.1:3260,1 iqn.2022-01.com.example:iscsi.initrd.test
[   25.663688] scsi host2: iSCSI Initiator over TCP/IP
[   25.255839] iscsid[255]: iscsid: Connection1:0 to [target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] through [iface: default] is operational now
[   25.723203] scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[   25.731850] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[   25.738314] scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   25.751858] sd 2:0:0:1: Attached scsi generic sg2 type 0
[   25.751952] sd 2:0:0:1: Power-on or device reset occurred
initrd-iscsi-auto.service: Deactivated successfully.

It looks like the network is fully configured after the network-online.target is reached?

@mrc0mmand
Copy link
Member

Yeah, if I add nmcli into the "iscsi generator", it shows:

         Starting initrd-iscsi-auto.service...
[   20.591050] iscsi-generator.sh[238]: iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test
[   20.670206] iscsi-generator.sh[252]: lo: connected (externally) to lo
[   20.676492] iscsi-generator.sh[252]: 	"lo"
[   20.679673] iscsi-generator.sh[252]: 	loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
[   20.685995] iscsi-generator.sh[252]: 	inet4 127.0.0.1/8
[   20.690565] iscsi-generator.sh[252]: 	inet6 ::1/128
[   20.694661] iscsi-generator.sh[252]: 	route6 ::1/128 metric 256
[   20.699576] iscsi-generator.sh[252]: eth0: unmanaged
[   20.704250] iscsi-generator.sh[252]: 	"eth0"
[   20.708249] iscsi-generator.sh[252]: 	ethernet (e1000), 52:54:00:12:34:56, hw, mtu 1500
[   20.715470] iscsi-generator.sh[252]: Use "nmcli device show" to get complete information about known devices and
[   20.724134] iscsi-generator.sh[252]: "nmcli connection show" to get an overview on active connection profiles.
[   20.733834] iscsi-generator.sh[252]: Consult nmcli(1) and nmcli-examples(7) manual pages for complete usage details.
[   22.064037] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   22.068158] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready
[   22.277530] Loading iSCSI transport class v2.0-870.
[   21.937465] iscsi-generator.sh[257]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
...
[   23.396786] iscsi-generator.sh[257]: 10.10.10.1:3260,1 iqn.2022-01.com.example:iscsi.initrd.test
[   24.929381] scsi host2: iSCSI Initiator over TCP/IP
[   24.436209] iscsid[261]: iscsid: Connection1:0 to [target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] through [iface: default] is operational now
[   24.976908] scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[   24.990439] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[   24.994702] scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   25.002228] sd 2:0:0:1: Power-on or device reset occurred
[   25.002948] sd 2:0:0:1: Attached scsi generic sg2 type 0
[   25.011642] sd 2:0:0:1: [sda] 6291536 512-byte logical blocks: (3.22 GB/3.00 GiB)
[   25.013804] sd 2:0:0:1: [sda] 4096-byte physical blocks
[   25.016221] sd 2:0:0:1: [sda] Write Protect is off

or even:

[  OK  ] Reached target network-online.target.
         Starting initrd-iscsi-auto.service...
[   26.933782] iscsi-generator.sh[240]: iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test
[   27.004100] iscsi-generator.sh[254]: lo: connected (externally) to lo
[   27.011581] iscsi-generator.sh[254]: 	"lo"
[   27.012941] iscsi-generator.sh[254]: 	loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
[   27.017434] iscsi-generator.sh[254]: 	inet4 127.0.0.1/8
[   27.019071] iscsi-generator.sh[254]: 	inet6 ::1/128
[   27.030237] iscsi-generator.sh[254]: 	route6 ::1/128 metric 256
[   27.068183] iscsi-generator.sh[254]: Use "nmcli device show" to get complete information about known devices and
[   27.070696] iscsi-generator.sh[254]: "nmcli connection show" to get an overview on active connection profiles.
[   27.093223] iscsi-generator.sh[254]: Consult nmcli(1) and nmcli-examples(7) manual pages for complete usage details.
[   29.423881] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
[   29.429596] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[   29.439521] e1000 0000:00:03.0 ens3: renamed from eth0
[   29.447275] iscsi: registered transport (tcp)
[   29.047358] iscsi-generator.sh[258]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
sys-devices-pci0000:00-0000:00:03.0-net-ens3.device: Changed dead -> plugged
sys-subsystem-net-devices-ens3.device: Changed dead -> plugged
[   30.048928] iscsi-generator.sh[258]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   31.049107] iscsi-generator.sh[258]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   31.588895] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   31.601276] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready
[   32.049762] iscsi-generator.sh[258]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   33.053191] iscsi-generator.sh[258]: 10.10.10.1:3260,1 iqn.2022-01.com.example:iscsi.initrd.test
[   33.470424] scsi host2: iSCSI Initiator over TCP/IP
[   33.480323] scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[   33.075331] iscsid[261]: iscsid: Connection1:0 to [target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] through [iface: default] is operational now
[   33.501915] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[   33.509323] scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   33.522489] sd 2:0:0:1: Power-on or device reset occurred
[   33.525692] sd 2:0:0:1: Attached scsi generic sg2 type 0
[   33.527355] sd 2:0:0:1: [sda] 6291536 512-byte logical blocks: (3.22 GB/3.00 GiB)
[   33.532696] sd 2:0:0:1: [sda] 4096-byte physical blocks
[   33.535061] sd 2:0:0:1: [sda] Write Protect is off
[   33.536942] sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   33.135617] iscsi-generator.sh[276]: Logging in to [iface: default, target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260]
[   33.135696] iscsi-generator.sh[276]: Login to [iface: default, target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] successful.

i.e. NM doesn't even know about the ens3 interface when the network-online.target is reached. It yields the same result even if I change the kernel cmdline from ip=dhcp to ip=ens3:dhcp.

@mrc0mmand
Copy link
Member

And if I explicitly call nm-online in the generator, it seems to be working fine:

[  OK  ] Started NetworkManager-dispatcher.service.
[  OK  ] Finished NetworkManager-wait-online.service.
[  OK  ] Reached target network-online.target.
         Starting initrd-iscsi-auto.service...
[   21.582532] iscsi-generator.sh[243]: + for spec in "${specs[@]}"
[   21.584399] iscsi-generator.sh[246]: ++ cut -d: -f2
[   21.593227] iscsi-generator.sh[245]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   21.599950] iscsi-generator.sh[243]: + server=10.10.10.1
[   21.605214] iscsi-generator.sh[249]: ++ cut -d: -f4
[   21.613994] iscsi-generator.sh[248]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   21.622097] iscsi-generator.sh[243]: + port=
[   21.630042] iscsi-generator.sh[252]: ++ cut -d: -f5
[   21.646717] iscsi-generator.sh[251]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   21.649732] iscsi-generator.sh[243]: + lun=
[   21.655970] iscsi-generator.sh[255]: ++ cut -d: -f6-
[   21.665364] iscsi-generator.sh[254]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   22.018919] iscsi-generator.sh[243]: + target=iqn.2022-01.com.example:iscsi.initrd.test
[   22.019229] iscsi-generator.sh[243]: + echo 'iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test'
[   22.020003] iscsi-generator.sh[243]: iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test
[   22.020369] iscsi-generator.sh[243]: + nm-online
[   23.264805] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
[   23.268915] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[   23.277703] e1000 0000:00:03.0 ens3: renamed from eth0
ens3: Processing udev action (SEQNUM=1717, ACTION=add)
ens3: Device busy: device is renaming
ens3: Device busy: device is renaming
ens3: Device busy: device is renaming
ens3: Processing udev action (SEQNUM=1723, ACTION=move)
sys-subsystem-net-devices-ens3.device: Changed dead -> plugged
sys-devices-pci0000:00-0000:00:03.0-net-ens3.device: Changed dead -> plugged
[   25.348380] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   25.366887] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready
Connecting...............   26s [online]
[   26.083727] iscsi-generator.sh[243]: + iscsiadm -m discovery -t st -p 10.10.10.1
...
[   27.657738] iscsi-generator.sh[267]: 10.10.10.1:3260,1 iqn.2022-01.com.example:iscsi.initrd.test
[   27.660282] iscsi-generator.sh[243]: + iscsiadm -m node -T iqn.2022-01.com.example:iscsi.initrd.test -l
[   28.041303] scsi host2: iSCSI Initiator over TCP/IP
[   27.672895] iscsid[271]: iscsid: Connection1:0 to [target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] through [iface: default] is operational now
[   28.061139] scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[   28.069884] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[   28.076081] scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   28.086057] sd 2:0:0:1: Attached scsi generic sg2 type 0
[   28.086535] sd 2:0:0:1: Power-on or device reset occurred
[   27.767460] iscsi-generator.sh[277]: Logging in to [iface: default, target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260]
[   27.767513] iscsi-generator.sh[277]: Login to [iface: default, target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] successful.
[   28.287247] sd 2:0:0:1: [sda] 6291536 512-byte logical blocks: (3.22 GB/3.00 GiB)
[   28.288722] sd 2:0:0:1: [sda] 4096-byte physical blocks
[   28.290355] sd 2:0:0:1: [sda] Write Protect is off
[   28.292054] sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   28.309666]  sda: sda1
[   28.311890] sd 2:0:0:1: [sda] Attached SCSI disk
...

I think this happens due to the mode in which nm-online is called from the NetworkManager-wait-online.service, i.e. --wait-for-startup:

       -s | --wait-for-startup
           Wait for NetworkManager startup to complete, rather than waiting
           for network connectivity specifically. Startup is considered
           complete once NetworkManager has activated (or attempted to
           activate) every auto-activate connection which is available given
           the current network state. This corresponds to the moment when
           NetworkManager logs "startup complete". This mode is generally
           only useful at boot time. After startup has completed, nm-online
           -s will just return immediately, regardless of the current network
           state.

           There are various ways to affect when startup complete is reached.
           For details see NetworkManager-wait-online.service(8).

which means NM doesn't actually wait for the devices to come up during boot, it just probes them? 🤷

@mrc0mmand
Copy link
Member

which means NM doesn't actually wait for the devices to come up during boot, it just probes them? shrug

Ah, it looks like NM in dracut does a bit more magic, like calling nm-initrd-generator among other things: https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/nm-lib.sh which would explain the current behavior.

Looks like networking in initrd without dracut is going to be a lot of fun.

@mrc0mmand
Copy link
Member

With networkd the situations seems to be slightly better:

diff --git a/.github/workflows/build-fedora.sh b/.github/workflows/build-fedora.sh
index 2584e43..2e25d32 100755
--- a/.github/workflows/build-fedora.sh
+++ b/.github/workflows/build-fedora.sh
@@ -294,7 +294,7 @@ for phase in "${PHASES[@]}"; do
             INITRD="initrd_$KVER.cpio.zstd"
             mkosi --cache "$MKOSI_CACHE" \
                   --default fedora.mkosi \
-                  --package="NetworkManager,iscsi-initiator-utils" \
+                  --package="iscsi-initiator-utils,systemd-networkd" \
                   --extra-tree=mkosi.extra-iscsi \
                   --image-version="$KVER" \
                   --environment=KERNEL_VERSION="$KVER" \
@@ -334,7 +334,7 @@ for phase in "${PHASES[@]}"; do
             dnsmasq --interface=initrd0 --bind-interfaces --dhcp-range=10.10.10.10,10.10.10.100
             grep -q initrd0 /etc/qemu/bridge.conf || echo "allow initrd0" >>/etc/qemu/bridge.conf
 
-            iscsi_cmdline="ip=dhcp netroot=iscsi:10.10.10.1::::$target_name"
+            iscsi_cmdline="ip=dhcp netroot=iscsi:10.10.10.1::::$target_name rd.systemd.wants=systemd-networkd.service"
             timeout --foreground -k 10 5m \
                 qemu-kvm -m 1024 -smp "$(nproc)" -nographic -nic bridge,br=initrd0 \
                          -initrd "$INITRD" \
initrd-iscsi-auto.service: Passing 0 fds to service
initrd-iscsi-auto.service: About to execute /usr/lib/systemd/system-generators/iscsi-generator.sh --go
initrd-iscsi-auto.service: Forked /usr/lib/systemd/system-generators/iscsi-generator.sh as 221
initrd-iscsi-auto.service: Changed dead -> start[   18.162320] e1000: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   18.167202] IPv6: ADDRCONF(NETDEV_CHANGE): ens3: link becomes ready

         Starting initrd-iscsi-auto.service...
ens3: Processing udev action (SEQNUM=1720, ACTION=move)
sys-subsystem-net-devices-ens3.device: Changed dead -> plugged
sys-devices-pci0000:00-0000:00:03.0-net-ens3.device: Changed dead -> plugged
initrd-iscsi-auto.service: Executing: /usr/lib/systemd/system-generators/iscsi-generator.sh --go
[   17.790268] systemd-networkd[216]: sd-device: Failed to chase symlinks in "/sys/devices/pci0000:00/0000:00:03.0/physfn".
[   17.790380] systemd-networkd[216]: ens3: udev initialized link
[   17.795385] systemd-networkd[216]: ens3: State changed: pending -> initialized
...
[   18.149328] systemd-networkd[216]: ens3: DHCPv4 client is configured, acquiring DHCPv4 lease.
[   18.152605] systemd-networkd[216]: ens3: DHCPv6 client is configured.
[   18.155333] systemd-networkd[216]: ens3: IPv6 Router Discovery is configured.
[   18.158294] systemd-networkd[216]: ens3: DHCPv4 client: DISCOVER
[   18.160800] systemd-networkd[216]: Bus bus-api-network: changing state AUTHENTICATING → HELLO
...
[   18.847415] iscsi-generator.sh[226]: ++ cut -d: -f2
[   18.853958] iscsi-generator.sh[225]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   18.865111] iscsi-generator.sh[223]: + server=10.10.10.1
[   18.871423] iscsi-generator.sh[230]: ++ cut -d: -f4
[   18.876835] iscsi-generator.sh[229]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   18.888013] i[   19.293379] iscsi: registered transport (tcp)
scsi-generator.sh[223]: + port=
success'
[   18.900796] iscsi-generator.sh[232]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   18.911961] iscsi-generator.sh[233]: ++ cut -d: -f5
[   18.917423] iscsi-generator.sh[223]: + lun=
[   18.921915] iscsi-generator.sh[235]: ++ echo iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test
[   18.932636] iscsi-generator.sh[236]: ++ cut -d: -f6-
[   18.938476] iscsi-generator.sh[223]: + target=iqn.2022-01.com.example:iscsi.initrd.test
[   18.947715] iscsi-generator.sh[223]: + echo 'iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test'
[   18.966821] iscsi-generator.sh[223]: iscsi:10.10.10.1::::iqn.2022-01.com.example:iscsi.initrd.test: will start server=10.10.10.1 port= lun= target=iqn.2022-01.com.example:iscsi.initrd.test
[   18.985094] iscsi-generator.sh[223]: + iscsiadm -m discovery -t st -p 10.10.10.1
[   19.126130] iscsi-generator.sh[237]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   19.747310] systemd-networkd[216]: ens3: NDISC: Sent Router Solicitation, next solicitation in 3s
[   19.903402] iscsi-generator.sh[237]: iscsiadm: cannot make connection to 10.10.10.1: Network is unreachable
[   20.806736] systemd-networkd[216]: ens3: DHCPv4 client: Received message from RAW socket, processing.
[   20.807162] systemd-networkd[216]: ens3: DHCPv4 client: OFFER
[   20.814039] systemd-networkd[216]: ens3: DHCPv4 client: REQUEST (requesting)
[   20.838959] systemd-networkd[216]: ens3: DHCPv4 client: Received message from RAW socket, processing.
[   20.839198] systemd-networkd[216]: ens3: DHCPv4 client: ACK
[   20.859215] systemd-networkd[216]: ens3: DHCPv4 client: lease expires in 59min 59s
[   20.868480] systemd-networkd[216]: ens3: DHCPv4 client: T2 expires in 52min 29s
[   20.877610] systemd-networkd[216]: ens3: DHCPv4 client: T1 expires in 29min 59s
[   20.877891] systemd-networkd[216]: Setting transient hostname: 'fedora'
e=6 reply_cookie=0 signature=sb error-name=n/a error-message=n/a
[   20.885667] systemd-networkd[216]: ens3: DHCPv4 address 10.10.10.97/24, gateway 10.10.10.1 acquired from 10.10.10.1
...
[   21.007260] systemd-networkd[216]: ens3: State changed: configuring -> configured
[   22.611539] iscsi-generator.sh[237]: 10.10.10.1:3260,1 iqn.2022-01.com.example:iscsi.initrd.test
[   22.614762] iscsi-generator.sh[223]: + iscsiadm -m node -T iqn.2022-01.com.example:iscsi.initrd.test -l
[   22.790558] iscsid[240]: iscsid: Connection1:0 to [target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] through [iface: default] is operational now
[   23.214489] scsi 2:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
[   23.228745] scsi 2:0:0:0: Attached scsi generic sg1 type 12
[   23.236288] scsi 2:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
[   23.249268] sd 2:0:0:1: Attached scsi generic sg2 type 0
[   23.255336] sd 2:0:0:1: Power-on or device reset occurred
[   22.864001] iscsi-generator.sh[245]: Logging in to [iface: default, target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260]
[   22.871820] iscsi-generator.sh[245]: Login to [iface: default, target: iqn.2022-01.com.example:iscsi.initrd.test, portal: 10.10.10.1,3260] successful.
[  OK  ] Finished initrd-iscsi-auto.service.
[   23.518405] sd 2:0:0:1: [sda] 6291536 512-byte logical blocks: (3.22 GB/3.00 GiB)
[   23.527166] sd 2:0:0:1: [sda] 4096-byte physical blocks
[   23.535131] sd 2:0:0:1: [sda] Write Protect is off

but it still seems to start the initrd-iscsi-auto.service unit before the network stack is fully up?

Note: I use the term works quite loosely here, since the machine won't boot, as the iSCSI stack gets disassembled during switch root:

[  OK  ] Found device dev-disk-by\x2dlabel-root.device.
[  OK  ] Reached target initrd-root-device.target.
         Mounting sysroot.mount...
[   25.346245] BTRFS info (device sda1): using crc32c (crc32c-generic) checksum algorithm
[   25.349134] BTRFS info (device sda1): using free space tree
[  OK  ] Mounted sysroot.mount.
[  OK  ] Reached target initrd-root-fs.target.
...
[  OK  ] Reached target initrd.target.
         Starting initrd-cleanup.service...
iscsi-shutdown.service: Installed new job iscsi-shutdown.service/stop as 316
...
iscsi-shutdown.service: Will spawn child (service_enter_stop): /usr/sbin/iscsiadm
Failed to read pids.max attribute of root cgroup, ignoring: No data available
iscsi-shutdown.service: About to execute /usr/sbin/iscsiadm -m node --logoutall=all
iscsi-shutdown.service: Forked /usr/sbin/iscsiadm as 278
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/iscsi_2dshutdown_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=361 reply_cookie=0 s
ignature=sa{sv}as error-name=n/a error-message=n/a
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/iscsi_2dshutdown_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=362 reply_cookie=0 s
ignature=sa{sv}as error-name=n/a error-message=n/a
iscsi-shutdown.service: Changed exited -> stop
         Stopping iscsi-shutdown.service...
iscsid.service: stopping held back, waiting for: iscsi-shutdown.service
iscsi-shutdown.service: Executing: /usr/sbin/iscsiadm -m node --logoutall=all
...
[   35.085766] device offline error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
[   35.091928] device offline error, dev sda, sector 40 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
[   35.101201] device offline error, dev sda, sector 140136 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2
...
[  OK  ] Stopped target basic.target.
iscsi-shutdown.service: Got final SIGCHLD for state stop.
iscsi-shutdown.service: Deactivated successfully.
iscsi-shutdown.service: Service restart not allowed.
iscsi-shutdown.service: Changed stop -> dead
iscsi-shutdown.service: Job 316 iscsi-shutdown.service/stop finished, result=done
[  OK  ] Stopped iscsi-shutdown.service.
[   38.257796] iscsid[240]: iscsid shutting down.
[  OK  ] Stopped iscsid.service.
[  OK  ] Closed iscsid.socket.
[  OK  ] Unmounted sysroot.mount.
         Starting initrd-switch-root.service...
[   40.248660] systemctl[291]: Switching root - root: /sysroot; init: n/a
[   40.356778] systemctl[291]: Failed to switch root: Specified switch root path '/sysroot' does not seem to be an OS tree. os-release file is missing.
initrd-switch-root.service: Main process exited, code=exited, status=1/FAILURE
initrd-switch-root.service: Failed with result 'exit-code'.
[FAILED] Failed to start initrd-switch-root.service.

@keszybz
Copy link
Member Author

keszybz commented Dec 5, 2022

Thanks for looking into this. It is indeed a mess.

but it still seems to start the initrd-iscsi-auto.service unit before the network stack is fully up?

rd.systemd.wants=systemd-networkd.service just pulls in systemd-networkd.service, but it does not pull in systemd-networkd-wait-online.service. The unit pulls in network-online.target, but that doesn't do anything unless the implementing service is enabled. I think we'd need to add rd.systemd.wants=systemd-networkd-wait-online.service too.

iscsi-shutdown.service

Hmm, so this tries to kill the iscsi connection. We could mask this, but iscsid.service has Requires on it, so it'd fail. I think we would need to neuter it by replacing the ExecStop line.

@keszybz keszybz changed the title Revert "ci: temporarily disable the iSCSI test case" Add hacky iSCSI implementation Dec 6, 2022
@keszybz
Copy link
Member Author

keszybz commented Dec 6, 2022

systemd 252.2-591.fc38 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization qemu.
Detected architecture x86-64.
Detected first boot.

Welcome to Fedora Linux 38 (Rawhide Prerelease)!

Initializing machine ID from random generator.
[  139.965227] kauditd_printk_skb: 15 callbacks suppressed
[  139.971139] audit: type=1334 audit(1670335689.133:66): prog-id=19 op=LOAD
[  139.983579] audit: type=1334 audit(1670335689.135:67): prog-id=0 op=UNLOAD
[  139.984304] audit: type=1334 audit(1670335689.147:68): prog-id=20 op=LOAD
[  139.984979] audit: type=1334 audit(1670335689.148:69): prog-id=0 op=UNLOAD
Failed to open libbpf, cgroup BPF features disabled: Operation not supported
Failed to populate /etc with preset unit settings, ignoring: No such file or directory
initrd-switch-root.service: Current command vanished from the unit file, execution of the command list won't be resumed.

I tried to figure out where the error about presets comes from. But it doesn't reproduce when I can the rootfs.img via nspawn. It's annoying because unit_file_preset_all calls quite a lot of things without any logging.

@keszybz keszybz force-pushed the iscsi-support branch 2 times, most recently from 7a7206a to 0edb254 Compare December 6, 2022 16:16
keszybz and others added 3 commits December 6, 2022 17:40
This is not indended as a real solution. The generator should be moved to the
open-iscsi project. It is added here to show how things should be done using a
systemd generator and an initrd unit. The generator does not handle all the
initricasies of iscsi.
It seems that NM doesn't actually wait for the devices to come up
during boot, it just probes them. NM in dracut does a bit more magic,
like calling nm-initrd-generator among other things:
https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/nm-lib.sh.
We were failing like this:
2022-12-05T14:03:20.0045333Z Syncing filesystems and block devices.
2022-12-05T14:03:20.0521424Z Sending SIGTERM to remaining processes...
2022-12-05T14:03:20.1942427Z Sending SIGKILL to remaining processes...
2022-12-05T14:03:20.3466668Z Unmounting file systems.
2022-12-05T14:03:20.3688161Z All filesystems unmounted.
2022-12-05T14:03:20.3700551Z Deactivating swaps.
2022-12-05T14:03:20.3722192Z All swaps deactivated.
2022-12-05T14:03:20.3729382Z Detaching loop devices.
2022-12-05T14:03:20.3863371Z All loop devices detached.
2022-12-05T14:03:20.3880513Z Stopping MD devices.
2022-12-05T14:03:20.3945671Z All MD devices stopped.
2022-12-05T14:03:20.3968579Z Detaching DM devices.
2022-12-05T14:03:20.4043233Z All DM devices detached.
2022-12-05T14:03:20.4103781Z All filesystems, swaps, loop devices, MD devices and DM devices detached.
2022-12-05T14:03:20.5347085Z Syncing filesystems and block devices.
2022-12-05T14:03:20.5761222Z Powering off.
2022-12-05T14:03:20.5828227Z [  158.519517] kvm: exiting hardware virtualization
2022-12-05T14:03:20.5901373Z [  158.527392] sd 2:0:0:1: [sda] Synchronizing SCSI cache
2022-12-05T14:03:25.1142225Z [  163.054446] ACPI: PM: Preparing to enter system sleep state S5
2022-12-05T14:03:25.1183925Z [  163.058916] reboot: Power down
2022-12-05T14:03:25.1607472Z + tgtadm --lld iscsi --op delete --mode target --tid=1
2022-12-05T14:03:25.1614934Z tgtadm: this target is still active
2022-12-05T14:03:25.1666245Z ##[error]Process completed with exit code 22.

We want iscsi-shutdown.service to disable the device for us.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants