v255 batch #403

Merged
merged 68 commits into from
May 27, 2024

Conversation

bluca
Member

@bluca bluca commented May 26, 2024

No description provided.

YHNdnzj and others added 30 commits May 26, 2024 12:11
… is missing

Currently, SLEEP_NOT_ENOUGH_SWAP_SPACE (ENOSPC) is returned
on all sorts of error conditions. But one important case
that's worth differentiating from that is when the resume device
is manually specified yet missing.

Closes #32644

(cherry picked from commit 40eb83a)
Otherwise we might fail if PID 1 is currently accessing these files.

Fixes #32692 (hopefully)

(cherry picked from commit 65690de)
This can change between the calls to homectl inspect and userdbctl
user, so let's ignore it along with the other disk fields.

Fixes #32727

(cherry picked from commit 6c5d4f0)
This fixes the build with old toolchains from before Linux 4.2, which do
not have a definition for NFPROTO_NETDEV.

(cherry picked from commit 41a94ae)
…acquired prefix

Previously, even if a DNS server was in the acquired prefix, the route to the
server might have a gateway address.
With this change, the prefix route, which is always configured, is handled
in the same way as static routes, and no gateway is used if the prefix route
is the most suitable route to access the destination.
The same change is also applied to routes to NTP servers and semi-static
routes.

Fixes a regression introduced by 0ce86f5.

Fixes #32715.

(cherry picked from commit 0f3116f)
Also this makes several checks more strict.

(cherry picked from commit 24e3792)
This should be useful when the test runs as a service, e.g.
when running on a mkosi image.

(cherry picked from commit e92d7b7)
This adds checks for the kernel bug caused by
torvalds/linux@3ddc223,
which will be fixed by
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

(cherry picked from commit d22f2fb)
Follow-up for 9de324c.

(cherry picked from commit a937fa9)
The state might be "freezing-by-parent" as well, so let's take that
into account.

Fixes #32746

(cherry picked from commit 034e85c)
… destroy a curl context on exit

If we destroy both an event loop and a curl context object at the same
time, then we get into this weird situation where curl wants us to
reconfigure a timeout event source right before destruction, which
sd-event will refuse however, since it is already being shut down.

Hence, catch that and simply don't bother adjusting the timeout, since
we cannot get back from there anyway.

(cherry picked from commit c5ecf09)
The test-event test seems to be taking quite a bit more time than
the other 'simple tests', which usually complete in < 1s. In case
of a slower or loaded machine the default 30s timeout is not enough.

(cherry picked from commit 381c3b6)
We want to enable running tests as part of the build, but
our builds run in VMs with networking disabled.

(cherry picked from commit 19614a0)
If tests are run during build time without an already installed
systemd, they fail to resolve the sysusersdir and tpmfilesdir pkg-config variables.

(cherry picked from commit 2aee829)
.osrel is also optional, but sd-boot and bootctl require it.
So, let's keep the .osrel section, at least for now.

Fixes #32774.

(cherry picked from commit 2e93331)
Otherwise we log a noisy error when we get ECONNRESET.

(cherry picked from commit 2540036)
Avoid regressions like systemd/systemd#32856

Follow-up for 2ef7cdc

(cherry picked from commit 88e7911)
Previously, one of the test routes had the same address as destination
and gateway. Even for a test case, that's super spurious. Let's use a
different address.

(cherry picked from commit cd65075)
(cherry picked from commit 5573263)
Fixes #32695.

(cherry picked from commit 71f0487)
Otherwise, expected lines may not be processed or not sync()ed to disk.

Fixes #32712.

(cherry picked from commit c22a112)
Fixes #32731.

(cherry picked from commit 272aae3)
Fixes #32697.

(cherry picked from commit 0664c1c)
@bluca bluca requested a review from keszybz May 26, 2024 12:15
yuwata and others added 25 commits May 26, 2024 14:05
Due to the bug in kernel 6.9 caused by
torvalds/linux@8debcf5,
the net_id udev builtin does not work for netdevsim interfaces.
So, eni99np1 cannot be used with kernel 6.9 anymore.

Workaround for #32910.

(cherry picked from commit f1f1be7)
Makes it easier to switch for debugging

(cherry picked from commit 5002b57)
Helped track down issue with session tracking

(cherry picked from commit c275e01)
When running inside an LXC container the 'su' process will not be part of
any unit or slice.

manager_get_user_by_pid(), which was used up to and including v255, does not
fail if it cannot find a unit/slice, but simply returns 'not found'. Do the same
in manager_get_session_by_pidref().

This was not detected as Semaphore CI does not reboot the testbed before
the logind test, so the session is started by the old logind from the base
distro, instead of the one being tested.

Follow-up for 8494f56
Follow-up for 5099a50

Fixes systemd/systemd#32929

(cherry picked from commit eb56b56)
Otherwise, journal entries that arrive during the sleep may not be read.

Follow-up for c22a112.

(cherry picked from commit 123acb2)
Fixes #32936.

(cherry picked from commit 125cca1)
Coverity gets confused since the iterator change, so add an
assert to indicate that this is allocated if n_old_groups is > 0

CID#1545922

Follow-up for 125cca1

(cherry picked from commit 5e30e6e)
Follow-up for ade0789

The change in behavior was partly intentional, as I think
if both --wait and --pty are used, manually disconnecting
from PTY forwarder should not result in systemd-run exiting
with "Finished with ..." log. But we should check for
--wait here.

Closes #32953

(cherry picked from commit 2b4a691)
Fixes systemd/systemd#32680 (comment).
===
May 21 02:45:08 TEST-74-AUX-UTILS.sh[2475]: + mountpoint /tmp/tmp.eaRV7lSbX2/mnt
May 21 02:45:08 TEST-74-AUX-UTILS.sh[2476]: /tmp/tmp.eaRV7lSbX2/mnt is not a mountpoint
May 21 02:45:08 TEST-74-AUX-UTILS.sh[2449]: + systemd-mount /dev/loop0 /tmp/tmp.eaRV7lSbX2/mnt
May 21 02:45:08 systemd-mount[2477]: Failed to start transient mount unit: Unit tmp-tmp.eaRV7lSbX2-mnt.mount was already loaded or has a fragment file.
===

(cherry picked from commit 4a8ca3c)
Otherwise, when stopping the service, the last command may not have been
started yet, and the service manager may not send the SIGTERM signal to the
last command, but send SIGKILL on timeout instead.

===
May 21 08:23:24 test19-exit-cgroup.sh[437]: + disown
May 21 08:23:24 test19-exit-cgroup.sh[438]: + sleep infinity
May 21 08:23:24 test19-exit-cgroup.sh[437]: + systemd-notify --ready
May 21 08:23:24 test19-exit-cgroup.sh[437]: + sleep infinity
May 21 08:23:24 test19-exit-cgroup.sh[441]: + systemctl stop one
May 21 08:23:24 test19-exit-cgroup.sh[443]: + sleep infinity
(snip)
May 21 08:23:24 systemd[1]: one.service: Changed running -> stop-sigterm
May 21 08:23:24 systemd[1]: Stopping one.service - /tmp/test19-exit-cgroup.sh "systemctl stop one"...
May 21 08:23:24 systemd[1]: Received SIGCHLD from PID 441 (systemctl).
May 21 08:23:24 systemd[1]: Child 437 (bash) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 437 belongs to one.service.
May 21 08:23:24 systemd[1]: one.service: Main process exited, code=killed, status=15/TERM (success)
May 21 08:23:24 systemd[1]: Child 439 (bash) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 439 belongs to one.service.
May 21 08:23:24 systemd[1]: Child 441 (systemctl) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 441 belongs to one.service.
May 21 08:23:24 systemd[1]: Child 442 (bash) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 442 belongs to one.service.
(snip)
May 21 08:24:54 systemd[1]: one.service: State 'stop-sigterm' timed out. Killing.
May 21 08:24:54 systemd[1]: one.service: Killing process 443 (sleep) with signal SIGKILL.
May 21 08:24:54 systemd[1]: one.service: Changed stop-sigterm -> stop-sigkill
May 21 08:24:54 systemd[1]: Received SIGCHLD from PID 443 (sleep).
May 21 08:24:54 systemd[1]: Child 443 (sleep) died (code=killed, status=9/KILL)
May 21 08:24:54 systemd[1]: one.service: Child 443 belongs to one.service.
May 21 08:24:54 systemd[1]: one.service: Control group is empty.
May 21 08:24:54 systemd[1]: one.service: Failed with result 'timeout'.
May 21 08:24:54 systemd[1]: one.service: Service restart not allowed.
May 21 08:24:54 systemd[1]: one.service: Changed stop-sigkill -> failed
May 21 08:24:54 systemd[1]: one.service: Job 738 one.service/stop finished, result=done
May 21 08:24:54 systemd[1]: Stopped one.service - /tmp/test19-exit-cgroup.sh "systemctl stop one".
May 21 08:24:54 systemd[1]: one.service: Unit entered failed state.
May 21 08:24:54 systemd[1]: one.service: Releasing resources...
===

Fixes #32947.

(cherry picked from commit a5edb9b)
When cryptsetup runs, udevd detects two inotify events for the
underlying device. When the test runs on a fast enough host, the expected
symlinks based on UUID and disk label are created by the second event.

While processing a uevent for a device, udevd disables the inotify
watch for the device. If the test runs on a slow system, the second
inotify event may arrive while a udev worker is processing the synthesized
uevent triggered by the first inotify event. Hence, no synthesized
uevent for the second inotify event will be generated, and the expected
symlinks will never be created.

To prevent the issue, we need to lock the device while the cryptsetup
command is running.

Fixes #32913.

(cherry picked from commit be43c9b)
Follow-up for a610ba0.

Fixes #32890.

(cherry picked from commit 87ed87e)
As per the documentation, EACCES is only returned when F_SETLK is
used, and only on some platforms, which doesn't seem to include
Linux:

https://github.com/torvalds/linux/blob/master/fs/locks.c

F_OFD_SETLK is documented to only return EAGAIN, and F_SETLKW/F_OFD_SETLKW
are blocking operations so this logic doesn't apply to them in the
first place.

Hence, only automatically convert EACCES into EAGAIN for F_SETLK
operations, and propagate the original error in the other cases.

This is important because in some cases we catch permission errors
and gracefully fall back, which is not possible if the original error
is lost.
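
The rule described above can be sketched in C like this. It is a minimal illustration, not the code from the commit: `normalize_lock_errno()` and `lock_generic()` are hypothetical names. Only F_SETLK has EACCES folded into EAGAIN; every other command (F_OFD_SETLK, F_SETLKW, F_OFD_SETLKW) keeps its original error so that callers can still detect permission problems.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: map the errno of a failed fcntl() lock call.
 * Only F_SETLK is documented to report contention as EACCES on some
 * platforms, so only that case is converted to EAGAIN. */
static int normalize_lock_errno(int cmd, int err) {
        if (cmd == F_SETLK && err == EACCES)
                return EAGAIN;

        /* All other commands and errors are propagated unchanged. */
        return err;
}

/* Hypothetical wrapper: returns 0 on success, a negative errno on failure. */
static int lock_generic(int fd, int cmd, struct flock *fl) {
        assert(fl);

        if (fcntl(fd, cmd, fl) < 0)
                return -normalize_lock_errno(cmd, errno);

        return 0;
}
```

With this shape, a caller that sees -EACCES from a non-F_SETLK command knows it is a genuine permission error and can fall back gracefully, while -EAGAIN keeps meaning "lock is held by someone else".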

This is an issue in practice because, due to a kernel bug present
before v6.2, AppArmor denies locking on file descriptors to LXC
containers. We support all currently maintained LTS kernels,
including v6.1, where despite a lot of effort and attempts over almost
a year, the bugfix still hasn't been backported, as it is complex and
requires large changes to AppArmor.
On affected kernels, all services running with PrivateNetwork=yes
fail and do not recover, instead of the normal behaviour of gracefully
downgrading to PrivateNetwork=no.

The integration tests in the Debian CI fail due to this issue:

https://ci.debian.net/packages/s/systemd/testing/arm64/46828037/

(cherry picked from commit 06384eb)
When running in LXC with AppArmor we'll most likely get an error when creating
a network namespace due to a kernel regression in < v6.2 affecting AppArmor,
resulting in denials. Like other tests, avoid failing in case of permission
issues and handle it gracefully.

(cherry picked from commit 6ab21f2)
We want to avoid reinitialization of our global variables with static
storage duration in case we get dlopened multiple times by the same
application. This will avoid potential resource leaks that could have
happened otherwise (e.g. leaking journal socket fd).

(cherry picked from commit 9d8533b)
Before:
/etc/kernel/install.conf:6: Unknown key name 'asdf' in section '(null)', ignoring.
After:
/etc/kernel/install.conf:6: Unknown key 'asdf', ignoring.

Also make the message a bit better.
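
The before/after difference can be illustrated with a small formatter. This is a sketch under assumptions, not systemd's actual logging code: `format_unknown_key()` is a hypothetical name, and the wording used when a section is present is assumed here, since the commit message only shows the section-less case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: omit the section clause entirely when there is no
 * section, instead of letting a NULL pointer be printed as '(null)'.
 * Returns the snprintf() result. */
static int format_unknown_key(char *buf, size_t size,
                              const char *filename, unsigned line,
                              const char *key, const char *section) {
        assert(buf);
        assert(filename);
        assert(key);

        if (!section || strlen(section) == 0)
                return snprintf(buf, size, "%s:%u: Unknown key '%s', ignoring.",
                                filename, line, key);

        /* Assumed wording for files that do have sections. */
        return snprintf(buf, size, "%s:%u: Unknown key '%s' in section '%s', ignoring.",
                        filename, line, key, section);
}
```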

(cherry picked from commit 600a740)
So, we need to try to read the timezone several times.
Also, on failure, show the journal of timedated instead of hostnamed,
as the timezone is handled by timedated.
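
The "try several times" approach can be sketched as a bounded retry loop. The actual test is likely a shell script, so this C version is only an illustration of the pattern; `retry_check()`, `check_fn` and `succeeds_on_third_call()` are hypothetical names, and the attempt count and delay are arbitrary parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback type: returns true once the expected value is seen. */
typedef bool (*check_fn)(void *userdata);

/* Hypothetical helper: call check up to `attempts` times, sleeping
 * `delay_sec` seconds between attempts. Returns true on the first success,
 * false if every attempt failed. */
static bool retry_check(check_fn check, void *userdata,
                        unsigned attempts, unsigned delay_sec) {
        assert(check);

        for (unsigned i = 0; i < attempts; i++) {
                if (check(userdata))
                        return true;

                /* No need to wait after the final attempt. */
                if (i + 1 < attempts)
                        sleep(delay_sec);
        }

        return false;
}

/* Example check standing in for "the timezone now reads as expected":
 * it succeeds from the third call onwards. */
static bool succeeds_on_third_call(void *userdata) {
        unsigned *calls = userdata;
        return ++(*calls) >= 3;
}
```

On final failure, the caller would then dump the journal of the service that owns the value, which per the message above is timedated.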

Hopefully fixes #33007.

(cherry picked from commit 1ef586a)
@keszybz
Member

keszybz commented May 27, 2024

CI failures appear unrelated.

@keszybz keszybz merged commit 41fb19e into systemd:v255-stable May 27, 2024
42 of 44 checks passed
Labels
None yet
10 participants