systemd System and Service Manager
CHANGES WITH 254:
Announcements of Future Feature Removals and Incompatible Changes:
* The next release (v255) will remove support for split-usr (/usr/
mounted separately during late boot, instead of being mounted by the
initrd before switching to the rootfs) and unmerged-usr (parallel
directories /bin/ and /usr/bin/, /lib/ and /usr/lib/, …). For more
details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
* We intend to remove cgroup v1 support from a systemd release after
the end of 2023. If you run services that make explicit use of
cgroup v1 features (i.e. the "legacy hierarchy" with separate
hierarchies for each controller), please implement compatibility with
cgroup v2 (i.e. the "unified hierarchy") sooner rather than later.
Most of Linux userspace has been ported over already.
* Support for System V service scripts is now deprecated and will be
removed in a future release. Please make sure to update your software
*now* to include a native systemd unit file instead of a legacy
System V script to retain compatibility with future systemd releases.
* Support for the SystemdOptions EFI variable is deprecated.
'bootctl systemd-efi-options' will emit a warning when used. It seems
that this feature is little-used and it is better to use alternative
approaches like credentials and confexts. The plan is to drop support
altogether at a later point, but this might be revisited based on
user feedback.
* EnvironmentFile= now treats the line following a comment line that
ends with a backslash as a regular (non-comment) line. For details, see:
https://github.com/systemd/systemd/issues/27975
* PrivateNetwork=yes and NetworkNamespacePath= now imply
PrivateMounts=yes unless PrivateMounts=no is explicitly specified.
* Behaviour of sandboxing options for the per-user service manager
units has changed. They now imply PrivateUsers=yes, which means user
namespaces will be implicitly enabled when a sandboxing option is
enabled in a user unit. Enabling user namespaces has the drawback
that system users will no longer be visible (and processes/files will
appear as owned by 'nobody') in the user unit.
By definition a sandboxed user unit should run with reduced
privileges, so impact should be small. This will remove a great
source of confusion that has been reported by users over the years,
due to how these options require an extra setting to be manually
enabled when used in the per-user service manager, which is not
needed in the system service manager. For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html
* systemd-run's switch --expand-environment=, which is currently disabled
by default when combined with --scope, will be changed in a future
release to be enabled by default.
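For the cgroup v1 removal announced above, it can help to check which hierarchy a host currently uses. The following is a minimal Python sketch, not a systemd tool: the function name and the `root` parameter are illustrative. It relies on the fact that the cgroup.controllers file only exists in a cgroup v2 hierarchy, and assumes the usual systemd layout where a "hybrid" setup mounts the v2 tree at <root>/unified.

```python
import os


def cgroup_layout(root="/sys/fs/cgroup"):
    """Classify the cgroup setup below `root` (helper name is illustrative).

    Returns "unified" for cgroup v2, "hybrid" if a v2 tree is mounted at
    <root>/unified next to v1 controllers, and "legacy" otherwise.
    """
    # cgroup.controllers exists only at the top of a cgroup v2 hierarchy.
    if os.path.exists(os.path.join(root, "cgroup.controllers")):
        return "unified"
    if os.path.exists(os.path.join(root, "unified", "cgroup.controllers")):
        return "hybrid"
    return "legacy"
```

Calling cgroup_layout() with no arguments inspects the running system; anything other than "unified" means cgroup v1 controllers are probably still in use.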
Security Relevant Changes:
* pam_systemd will now by default pass the CAP_WAKE_ALARM ambient
process capability to invoked session processes of regular users on
local seats (as well as to systemd --user), unless configured
otherwise via data from JSON user records, or via the PAM module's
parameter list. This is useful in order to allow desktop tools such as
GNOME's Alarm Clock application to set a timer for
CLOCK_REALTIME_ALARM that wakes up the system when it elapses. A
per-user service unit file may thus use AmbientCapabilities= to pass
the capability to invoked processes. Note that this capability is
relatively narrow in focus (in particular compared to other process
capabilities such as CAP_SYS_ADMIN) and we already — by default —
permit more impactful operations such as system suspend to local
users.
Service Manager:
* Memory limits that apply while the unit is activating are now
supported. Previously IO and CPU settings were already supported via
StartupCPUWeight= and similar. The same logic has been added for the
various manager and unit memory settings (DefaultStartupMemoryLow=,
StartupMemoryLow=, StartupMemoryHigh=, StartupMemoryMax=,
StartupMemorySwapMax=, StartupMemoryZSwapMax=).
* The service manager gained support for enqueuing POSIX signals to
services that carry an additional integer value, exposing the
sigqueue() system call. This is accessible via new D-Bus calls
org.freedesktop.systemd1.Manager.QueueSignalUnit() and
org.freedesktop.systemd1.Unit.QueueSignal(), as well as in systemctl
via the new --kill-value= option.
* systemctl gained a new "list-paths" verb, which shows all currently
active .path units, similarly to how "systemctl list-timers" shows
active timers, and "systemctl list-sockets" shows active sockets.
* systemctl gained a new --when= switch which is honoured by the various
forms of shutdown (i.e. reboot, kexec, poweroff, halt) and allows
scheduling these operations by time, similar in fashion to how this
has been supported by SysV shutdown.
* If MemoryDenyWriteExecute= is enabled for a service and the kernel
supports the new PR_SET_MDWE prctl() call, it is used instead of the
seccomp()-based system call filter to achieve the same effect.
* A new set of kernel command line options is now understood:
systemd.tty.term.<name>=, systemd.tty.rows.<name>=,
systemd.tty.columns.<name>= allow configuring the TTY type and
dimensions for the tty specified via <name>. When systemd invokes a
service on a tty (via TTYName=) it will look for these and configure
the TTY accordingly. This is particularly useful in VM environments
to propagate host terminal settings into the appropriate TTYs of the
guest.
* A new RootEphemeral= setting is now understood in service units. It
takes a boolean argument. If enabled for services that use RootImage=
or RootDirectory= an ephemeral copy of the disk image or directory
tree is made when the service is started. It is removed automatically
when the service is stopped. That ephemeral copy is made using
btrfs/xfs reflinks or btrfs snapshots, if available.
* The service activation logic gained new settings RestartSteps= and
RestartMaxDelaySec= which allow exponentially-growing restart
intervals for Restart=.
* The service activation logic gained a new setting RestartMode= which
can be set to 'direct' to skip the inactive/failed states when
restarting, so that dependent units are not notified until the service
converges to a final (successful or failed) state. For example, this
means that OnSuccess=/OnFailure= units will not be triggered until the
service state has converged.
* PID 1 will now automatically load the virtio_console kernel module
during early initialization if running in a suitable VM. This is done
so that early-boot logging can be written to the console if available.
* Similarly, virtio-vsock support is loaded early in suitable VM
environments. PID 1 will send sd_notify() notifications via AF_VSOCK
to the VMM if configured, thus loading this early is beneficial.
* A new verb "fdstore" has been added to systemd-analyze to show the
current contents of the file descriptor store of a unit. This is
backed by a new D-Bus call DumpUnitFileDescriptorStore() provided by
the service manager.
* The service manager will now set a new $FDSTORE environment variable
when invoking processes for services that have the file descriptor
store enabled.
* A new service option FileDescriptorStorePreserve= has been added that
allows tuning the life-cycle of the per-service file descriptor
store. If set to "yes", the entries in the fd store are retained even
after the service has been fully stopped.
* The "systemctl clean" command may now be used to clear the fdstore of
a service.
* Unit *.preset files gained a new directive "ignore", in addition to
the existing "enable" and "disable". As the name suggests, matching
units are left unchanged, i.e. neither enabled nor disabled.
* Service units gained a new setting DelegateSubgroup=. It takes the
name of a sub-cgroup to place any processes the service manager forks
off in. Previously, the service manager would place all service
processes directly in the top-level cgroup it created for the
service. This usually meant that the main process in a service with
delegation enabled would first have to create a subgroup and move
itself down into it, in order to not conflict with the "no processes
in inner cgroups" rule of cgroup v2. With this option, this step is
now handled by PID 1.
* The service manager will now look for .upholds/ directories,
similarly to the existing support for .wants/ and .requires/
directories. Symlinks in this directory result in Upholds=
dependencies.
The [Install] section of unit files gained support for a new
UpheldBy= directive to generate .upholds/ symlinks automatically when
a unit is enabled.
* The service manager now supports a new kernel command line option
systemd.default_device_timeout_sec=, which may be used to override
the default timeout for .device units.
* A new "soft-reboot" mechanism has been added to the service manager.
A "soft reboot" is similar to a regular reboot, except that it
affects userspace only: the service manager shuts down any running
services and other units, then optionally switches into a new root
file system (mounted to /run/nextroot/), and then passes control to a
systemd instance in the new file system which then starts the system
up again. The kernel is not rebooted and neither is the hardware,
firmware or boot loader. This provides a fast, lightweight mechanism
to quickly reset or update userspace, without the latency that a full
system reset involves. Moreover, open file descriptors may be passed
across the soft reboot into the new system where they will be passed
back to the originating services. This allows pinning resources
across the reboot, thus minimizing grey-out time further. This new
reboot mechanism is accessible via the new "systemctl soft-reboot"
command.
* Services using RootDirectory= or RootImage= will now have read-only
access to a copy of the host's os-release file under
/run/host/os-release, which will be kept up-to-date on 'soft-reboot'.
This was already the case for Portable Services, and the feature has
now been extended to all services that do not run off the host's
root filesystem.
* A new service setting MemoryKSM= has been added to enable kernel
same-page merging individually for services.
* A new service setting ImportCredentials= has been added that augments
LoadCredential= and LoadCredentialEncrypted= and searches for
credentials to import from the system, and supports globbing.
* A new job mode "restart-dependencies" has been added to the service
manager (exposed via systemctl --job-mode=). It is only valid when
used with "start" jobs, and has the effect that the "start" job will
be propagated as "restart" jobs to currently running units that have
a BindsTo= or Requires= dependency on the started unit.
* A new verb "whoami" has been added to "systemctl" which determines as
part of which unit the command is being invoked. It writes the unit
name to standard output. If one or more PIDs are specified, it reports
the names of the units that the processes referenced by the PIDs belong to.
* The system and service credential logic has been improved: there's
now a clearly defined place where system provisioning tools running
in the initrd can place credentials that will be imported into the
system's set of credentials during the initrd → host transition: the
/run/credentials/@initrd/ directory. Once the credentials placed
there are imported into the system credential set they are deleted
from this directory, and the directory itself is deleted afterwards
too.
* A new kernel command line option systemd.set_credential_binary= has
been added, that is similar to the pre-existing
systemd.set_credential= but accepts arbitrary binary credential data,
encoded in Base64. Note that the kernel command line is not a
recommended way to transfer credentials into a system, since it is
world-readable from userspace.
* The default machine ID to use may now be configured via the
system.machine_id system credential. It will only be used if no
machine ID was set yet on the host.
* On Linux kernel 6.4 and newer system and service credentials will now
be placed in a tmpfs instance that has the "noswap" mount option
set. Previously, a "ramfs" instance was used. By switching to tmpfs
ACL support and overall size limits can now be enforced, without
compromising on security, as the memory is never paged out either
way.
* The service manager now can detect when it is running in a
'Confidential Virtual Machine', and a corresponding 'cvm' value is now
accepted by ConditionSecurity= for units that want to conditionalize
themselves on this. systemd-detect-virt gained new '--cvm' and
'--list-cvm' switches to respectively perform the detection or list
all known flavours of confidential VM, depending on the vendor. The
manager will publish a 'ConfidentialVirtualization' D-Bus property,
and will also set a SYSTEMD_CONFIDENTIAL_VIRTUALIZATION= environment
variable for unit generators. Finally, udev rules can match on a new
'cvm' key that will be set when in a confidential VM.
Additionally, when running in a 'Confidential Virtual Machine', SMBIOS
strings and QEMU's fw_cfg protocol will not be used to import
credentials and kernel command line parameters by the system manager,
systemd-boot and systemd-stub, because the hypervisor is considered
untrusted in this particular setting.
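To get a feel for the RestartSteps=/RestartMaxDelaySec= entry above, the sketch below models the restart delays in Python. It is an assumed model, not the service manager's code: the delay is taken to grow geometrically from RestartSec= to RestartMaxDelaySec= over RestartSteps= restarts and to stay at the maximum afterwards. The function and parameter names are illustrative.

```python
def restart_delays(restart_sec, max_delay_sec, steps, attempts):
    """Return the modelled delay (seconds) before each of `attempts` restarts.

    Assumed model: geometric growth from restart_sec to max_delay_sec
    over `steps` restarts, then constant at max_delay_sec.
    """
    if steps <= 0 or restart_sec <= 0 or restart_sec >= max_delay_sec:
        # No backoff configured: every restart waits restart_sec.
        return [float(restart_sec)] * attempts
    ratio = max_delay_sec / restart_sec
    delays = []
    for n in range(attempts):
        if n >= steps:
            delays.append(float(max_delay_sec))
        else:
            delays.append(restart_sec * ratio ** (n / steps))
    return delays
```

Under this model, a base delay of 1s, a maximum of 8s and 3 steps give delays of roughly 1, 2, 4, 8, 8 seconds for the first five restarts.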
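The systemd.tty.term.<name>=, systemd.tty.rows.<name>= and systemd.tty.columns.<name>= options described above are intended for propagating host terminal settings into a guest. A host-side launcher might generate them along these lines. This is a sketch only: the helper name is hypothetical, and "hvc0" in the usage note is just an example TTY name.

```python
import os
import shutil


def tty_cmdline_options(tty, term=None, size=None):
    """Build systemd.tty.* kernel command line options for guest TTY `tty`.

    `term` defaults to the host's $TERM and `size` (columns, rows) to the
    host terminal size; both can be passed explicitly.
    """
    term = term or os.environ.get("TERM", "dumb")
    columns, rows = size or shutil.get_terminal_size(fallback=(80, 24))
    return [
        f"systemd.tty.term.{tty}={term}",
        f"systemd.tty.rows.{tty}={rows}",
        f"systemd.tty.columns.{tty}={columns}",
    ]
```

For instance, " ".join(tty_cmdline_options("hvc0")) yields a string that could be appended to the guest's kernel command line.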
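For the systemd.set_credential_binary= entry above, the value has to be Base64-encoded before it is placed on the kernel command line. The Python sketch below assumes the option uses the same NAME:VALUE syntax as systemd.set_credential=. The helper name is illustrative.

```python
import base64


def credential_cmdline_option(name, data):
    """Format a systemd.set_credential_binary= option for credential `name`.

    Assumes the NAME:VALUE syntax of systemd.set_credential=, with VALUE
    being the Base64 encoding of the raw bytes in `data`.
    """
    if ":" in name:
        raise ValueError("credential names must not contain ':'")
    encoded = base64.b64encode(data).decode("ascii")
    return f"systemd.set_credential_binary={name}:{encoded}"
```

As noted in the entry, the kernel command line is world-readable from userspace, so this is only suitable for non-secret data.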
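The effect of the new "ignore" preset directive described above can be illustrated with a small evaluator. This Python sketch assumes the first matching line wins and that units matching no line are enabled, as documented in systemd.preset(5). It is not the parser systemd uses, and the names are illustrative.

```python
from fnmatch import fnmatchcase


def preset_action(unit, rules, default="enable"):
    """Return "enable", "disable" or "ignore" for `unit`.

    `rules` is a list of (action, glob pattern) pairs in file order;
    the first matching pattern decides (assumed semantics).
    """
    for action, pattern in rules:
        if fnmatchcase(unit, pattern):
            return action
    return default
```

With rules such as [("ignore", "foo-*.service"), ("disable", "*")], a unit named foo-bar.service is left unchanged while everything else is disabled.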
Journal:
* The sd-journal API gained a new call sd_journal_get_seqnum() to
retrieve the current log record's sequence number and sequence number
ID, which allows applications to order records the same way as
journal does internally. The sequence number is now also exported in
the JSON and "export" output of the journal.
* journalctl gained a new switch --truncate-newline. If specified,
multi-line log records will be truncated at the first newline,
i.e. only the first line of each log message will be shown.
* systemd-journal-upload gained support for --namespace=, similar to
the switch of the same name of journalctl.
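The effect of --truncate-newline described above can be pictured as a simple string operation on each message. The one-line Python sketch below illustrates the behaviour; it is not journalctl's implementation.

```python
def truncate_newline(message):
    """Return only the first line of a possibly multi-line log message."""
    return message.split("\n", 1)[0]
```

Single-line messages pass through unchanged.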
systemd-repart:
* systemd-repart's drop-in files gained a new ExcludeFiles= option which
may be used to exclude certain files from the effect of CopyFiles=.
* systemd-repart's Verity support now implements the Minimize= setting
to minimize the size of the resulting partition.
* systemd-repart gained a new --offline= switch, which may be used to
control whether images shall be built "online" or "offline",
i.e. whether to make use of kernel facilities such as loopback block
devices and device mapper or not.
* If systemd-repart is told to populate a newly created ESP or XBOOTLDR
partition with some files, it will now default to VFAT rather than
ext4.
* systemd-repart gained a new --architecture= switch. If specified, the
per-architecture GPT partition types (i.e. the root and /usr/
partitions) configured in the partition drop-in files are
automatically adjusted to match the specified CPU architecture, in
order to simplify cross-architecture DDI building.
* systemd-repart will now default to a minimum size of 300MB for XFS
filesystems if no size parameter is specified. This matches what the
XFS tools (xfsprogs) can support.
systemd-boot, systemd-stub, ukify, bootctl, kernel-install:
* gnu-efi is no longer required to build systemd-boot and systemd-stub.
Instead, pyelftools is now needed, and it will be used to perform the
ELF -> PE relocations at build time.
* bootctl gained a new switch --print-root-device/-R that prints the
block device the root file system is backed by. If specified twice,
it returns the whole disk block device (as opposed to partition block
device) the root file system is on. It's useful for invocations such
as "cfdisk $(bootctl -RR)" to quickly show the partition table of the
running OS.
* systemd-stub will now look for the SMBIOS Type 1 field
"io.systemd.stub.kernel-cmdline-extra" and append its value to the
kernel command line it invokes. This is useful for VMMs such as qemu
to pass additional kernel command lines into the system even when
booting via full UEFI. The contents of the field are measured into
TPM PCR 12.
* The KERNEL_INSTALL_LAYOUT= setting for kernel-install gained a new
value "auto". With this value, a kernel will be automatically
analyzed, and if it qualifies as UKI, it will be installed as if the
setting was set to "uki", otherwise as "bls".
* systemd-stub can now optionally load UEFI PE "add-on" images that may
contain additional kernel command line information. These "add-ons"
superficially look like a regular UEFI executable, and are expected
to be signed via SecureBoot/shim. However, they do not actually
contain code, but instead a subset of the PE sections that UKIs
support. They are supposed to provide a way to extend UKIs with
additional resources in a secure and authenticated way. Currently,
only the .cmdline PE section may be used in add-ons, in which case
any specified string is appended to the command line embedded into
the UKI itself. A new 'addon<EFI-ARCH>.efi.stub' is now provided that
can be used to trivially create addons, via 'ukify' or 'objcopy'. In
the future we expect other sections to be made extensible like this as
well.
* ukify has been updated to allow building these UEFI PE "add-on"
images, using the new 'addon<EFI-ARCH>.efi.stub'.
* ukify gained a new "genkey" verb for generating a set of key pairs
to sign UKIs and their PCR data with.
* ukify now accepts SBAT information to place in the .sbat PE section
of UKIs and addons. If a UKI is built the SBAT information from the
inner kernel is merged with any SBAT information associated with
systemd-stub and the SBAT data specified on the ukify command line.
* The kernel-install script has been rewritten in C, and reuses much of
the infrastructure of existing tools such as bootctl. It also gained
--esp-path= and --boot-path= options to override the path to the ESP,
and the $BOOT partition. Options --make-entry-directory= and
--entry-token= have been added as well, similar to bootctl's options
of the same name.
* A new kernel-install plugin 60-ukify has been added which will
combine kernel/initrd locally into a UKI and optionally sign them
with a local key. This may be used to switch to UKI mode even on
systems where a local kernel or initrd is used. (Typically UKIs are
built and signed by the vendor.)
* The ukify tool now supports "pesign" in addition to the pre-existing
"sbsign" for signing UKIs.
* systemd-measure and systemd-stub now look for the .uname PE section
that should contain the kernel's "uname -r" string.
* systemd-measure and ukify now calculate expected PCR hashes for a UKI
"offline", i.e. without access to a TPM (physical or
software-emulated).
Memory Pressure & Control:
* The sd-event API gained new calls sd_event_add_memory_pressure(),
sd_event_source_set_memory_pressure_type(),
sd_event_source_set_memory_pressure_period() to create and configure
an event source that is called whenever the OS signals memory
pressure. Another call sd_event_trim_memory() is provided that
compacts the process' memory use by releasing allocated but unused
malloc() memory back to the kernel. Services can also provide their
own custom callback to do memory trimming. This should improve system
behaviour under memory pressure, as Linux traditionally provided
no mechanism to return process memory back to the kernel if the
kernel was under memory pressure. This makes use of the kernel's PSI
interface. Most long-running services in systemd have been hooked up
with this, and in particular systems with low memory should benefit
from this.
* Service units gained new settings MemoryPressureWatch= and
MemoryPressureThresholdSec= to configure the PSI memory pressure
logic individually. If these options are used, the
$MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE environment
variables will be set for the invoked processes to inform them about
the requested memory pressure behaviour. (This is used by the
aforementioned sd-event API additions, if set.)
* systemd-analyze gained a new "malloc" verb that shows the output
generated by glibc's malloc_info() on services that support it. Right
now, only the service manager has been updated accordingly. This
call requires privileges.
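Services that do not use sd-event can consume the $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE variables from the entry above directly. The Python sketch below only parses the two variables. It assumes, following https://systemd.io/MEMORY_PRESSURE, that $MEMORY_PRESSURE_WRITE holds Base64 data to be written to the file named by $MEMORY_PRESSURE_WATCH, and that a watch path of /dev/null means monitoring is turned off. The helper name is illustrative.

```python
import base64
import os


def memory_pressure_setup(environ=os.environ):
    """Return (path, payload) for the pressure watch, or None if disabled.

    Assumptions: $MEMORY_PRESSURE_WRITE is Base64-encoded data destined
    for the file named by $MEMORY_PRESSURE_WATCH, and /dev/null as the
    watch path means monitoring was turned off.
    """
    path = environ.get("MEMORY_PRESSURE_WATCH")
    if not path or path == "/dev/null":
        return None
    payload = base64.b64decode(environ.get("MEMORY_PRESSURE_WRITE", ""))
    return path, payload
```

A service would then open the returned path, write the payload to it, and wait for the kernel to signal pressure events on that file descriptor.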
User & Session Management:
* The sd-login API gained a new call sd_session_get_username() to
return the user name of the owner of a login session. It also gained
a new call sd_session_get_start_time() to retrieve the time the login
session started. A new call sd_session_get_leader() has been added to
return the PID of the "leader" process of a session. A new call
sd_uid_get_login_time() returns the time since the specified user has
most recently been continuously logged in with at least one session.
* JSON user records gained a new set of fields capabilityAmbientSet and
capabilityBoundingSet which contain a list of POSIX capabilities to
set for the logged in users in the ambient and bounding sets,
respectively. homectl gained the ability to configure these two sets
for users via --capability-bounding-set=/--capability-ambient-set=.
* pam_systemd learnt two new module options
default-capability-bounding-set= and default-capability-ambient-set=,
which configure the default bounding and ambient sets for users as
they are logging in, if the JSON user record doesn't specify them
explicitly (see above). The built-in default for the ambient set now
contains CAP_WAKE_ALARM, thus allowing regular users who may log in
locally to resume from a system suspend via a timer.
* The Session D-Bus objects of systemd-logind gained a new SetTTY() method
call to update the TTY of a session after it has been allocated. This
is useful for SSH sessions which are typically allocated first, and
for which a TTY is added later.
* The sd-daemon API gained a new call sd_pid_notifyf_with_fds() which
combines the various other sd_pid_notify() flavours into one: it takes a
format string, an overriding PID, and a set of file descriptors to
send. It also gained a new call sd_pid_notify_barrier() which is
equivalent to sd_notify_barrier() but allows the originating PID to
be specified.
* "loginctl list-users" and "loginctl list-sessions" will now show the
state of each logged in user/session in their tabular output. It will
also show the current idle state of sessions.
DDIs:
* systemd-dissect will now show the intended CPU architecture of an
inspected DDI.
* systemd-dissect will now install itself as mount helper for the "ddi"
pseudo-file system type. This means you may now mount DDIs directly
via /bin/mount or /etc/fstab, making full use of embedded Verity
information and all other DDI features.
Example: mount -t ddi myimage.raw /some/where
* The systemd-dissect tool gained the new switches --attach/--detach to
attach/detach a DDI to a loopback block device without mounting it.
It will automatically derive the right sector size from the image
and set up Verity and similar, but not mount the file systems in it.
* When systemd-gpt-auto-generator or the DDI mounting logic mount an
ESP or XBOOTLDR partition the MS_NOSYMFOLLOW mount option is now
implied. Given that these file systems are typically untrusted, this
should make mounting them automatically have less of a security
impact.
* All tools that parse DDIs (such as systemd-nspawn, systemd-dissect,
systemd-tmpfiles, …) now understand a new switch --image-policy= which
takes a string encoding image dissection policy. With this mechanism
automatic discovery and use of specific partition types and the
cryptographic requirements on the partitions (Verity, LUKS, …) can be
restricted, permitting better control of the exposed attack surfaces
when mounting disk images. systemd-gpt-auto-generator will honour such
an image policy too, configurable via the systemd.image_policy= kernel
command line option. Unit files gained the RootImagePolicy=,
MountImagePolicy= and ExtensionImagePolicy= to configure the same for
disk images a service runs off.
* systemd-analyze gained a new verb "image-policy" to validate and
parse image policy strings.
* systemd-dissect gained support for a new --validate switch to
superficially validate DDI structure, and check whether a specific
image policy allows the DDI.
* systemd-dissect gained support for a new --mtree-hash switch to
optionally disable calculating mtree hashes, which can be slow on
large images.
* systemd-dissect --copy-to, --copy-from, --list and --mtree switches
are now able to operate on directories in addition to images.
Network Management:
* networkd's GENEVE support has gained a new .netdev option
InheritInnerProtocol=.
* The [Tunnel] section in .netdev files has gained a new setting
IgnoreDontFragment= for controlling the IPv4 "DF" flag of datagrams.
* A new global IPv6PrivacyExtensions= setting has been added that
selects the default value of the per-network setting of the same
name.
* The predictable network interface naming logic will now include
SR-IOV-R "representor" information in network interface names.
* The DHCPv4 + DHCPv6 + IPv6 RA logic in networkd gained support for
the RFC8910 captive portal option.
Device Management:
* udevadm gained the new "verify" verb for validating udev rules files
offline.
* udev gained a new tool "iocost" that can be used to configure QoS IO
cost data based on hwdb information onto suitable block devices. Also
see https://github.com/iocost-benchmark/iocost-benchmarks.
TPM2 Support + Disk Encryption & Authentication:
* systemd-cryptenroll/systemd-cryptsetup will now install a TPM2 SRK
("Storage Root Key") as the first step of TPM2 setup, and then bind
FDE to it, if TPM2 support is used. This matches the recommendations
of the TCG (see
https://trustedcomputinggroup.org/wp-content/uploads/TCG-TPM-v2.0-Provisioning-Guidance-Published-v1r1.pdf).
* systemd-cryptenroll and other tools that take TPM2 PCR parameters now
understand textual identifiers for these PCRs.
* systemd-veritysetup + /etc/veritytab gained support for a series of
new options: hash-offset=, superblock=, format=, data-block-size=,
hash-block-size=, data-blocks=, salt=, uuid=, hash=, fec-device=,
fec-offset=, fec-roots= to configure various aspects of a Verity
volume.
* systemd-cryptsetup + /etc/crypttab gained support for a new
veracrypt-pim= option for setting the Personal Iteration Multiplier
of veracrypt volumes.
* systemd-integritysetup + /etc/integritytab gained support for a new
mode= setting for controlling the dm-integrity mode (journal, bitmap,
direct) for the volume.
* systemd-analyze gained a new verb "pcrs" that shows the known TPM PCR
registers, their symbolic names and current values.
systemd-tmpfiles:
* The ACL support in tmpfiles.d/ has been updated: if an uppercase "X"
access right is specified this is equivalent to "x" but only if the
inode in question already has the executable bit set for at least
some user/group. Otherwise the "x" bit will be turned off.
* tmpfiles.d/'s C line type now understands a new modifier "+": a line
with C+ will result in a "merge" copy, i.e. all files of the source
tree are copied into the target tree, even if that tree already
exists, resulting in a combined tree of files already present in the
target tree and those copied in.
* systemd-tmpfiles gained a new --graceful switch. If specified, lines
with unknown users/groups will silently be skipped.
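The uppercase "X" ACL rule described above can be expressed as a small function. This Python sketch follows only the wording of that entry (execute is granted if some execute bit is already set on the inode, and is otherwise turned off). It is not the code systemd-tmpfiles uses, and the function name is illustrative.

```python
import stat


def resolve_acl_perms(perms, mode):
    """Resolve an ACL permission string such as "rwX" against file `mode`.

    "X" becomes "x" only if some execute bit is already set in `mode`;
    otherwise it becomes "-" (execute turned off).
    """
    has_exec = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    return perms.replace("X", "x" if has_exec else "-")
```

For a file with mode 0755, "rwX" resolves to "rwx"; for mode 0644 it resolves to "rw-".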
systemd-notify:
* systemd-notify gained two new options --fd= and --fdname= for sending
arbitrary file descriptors to the service manager (while specifying an
explicit name for it).
* systemd-notify gained a new --exec switch, which makes it execute the
specified command line after sending the requested messages. This is
useful for sending out READY=1 first, and then continuing invocation
without changing process ID, so that the tool can be nicely used
within an ExecStart= line of a unit file that uses Type=notify.
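What --exec does, as described above, is to send the notification first and then continue as the given command under the same PID. The same pattern can be sketched in Python using the sd_notify() datagram protocol over $NOTIFY_SOCKET. This is an illustrative re-implementation with hypothetical helper names, not the systemd-notify source.

```python
import os
import socket


def sd_notify(state, environ=os.environ):
    """Send `state` (e.g. "READY=1") to the socket named in $NOTIFY_SOCKET.

    Returns False if no notification socket is configured.
    """
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = path.encode()
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract namespace socket.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def notify_ready_then_exec(argv):
    """Report readiness, then replace this process with `argv` (same PID)."""
    sd_notify("READY=1")
    os.execvp(argv[0], argv)
```

Because os.execvp() replaces the current process image, the PID that sent READY=1 is the PID the service manager keeps tracking.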
sd-event + sd-bus APIs:
* The sd-event API gained a new call sd_event_source_leave_ratelimit()
which may be used to explicitly end a rate-limit state an event
source might be in, resetting all rate limiting counters.
* When the sd-bus library is used to make connections to AF_UNIX D-Bus
sockets, it will now encode the "description" set via
sd_bus_set_description() into the source socket address. It will also
look for this information when accepting a connection. This is useful
to track individual D-Bus connections on a D-Bus broker for debug
purposes.
systemd-resolved:
* systemd-resolved gained a new resolved.conf setting
StaleRetentionSec= which may be used to retain cached DNS records
even after their nominal TTL, and use them in case upstream DNS
servers cannot be reached. This can be used to make name resolution
more resilient in case of network problems.
* resolvectl gained a new verb "show-cache" to show the current cache
contents of systemd-resolved. This verb communicates with the
systemd-resolved daemon and requires privileges.
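The retention of cached DNS records past their nominal TTL, as described above, can be pictured with a toy cache. This is a simplified Python sketch, not systemd-resolved's logic: the class, its method names and the explicit "upstream_reachable" flag are all hypothetical, and times are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedRecord:
    value: str
    expires_at: float  # time at which the nominal TTL runs out


class StaleTolerantCache:
    """Toy cache: expired records are kept for `retention_sec` extra seconds."""

    def __init__(self, retention_sec: float) -> None:
        self.retention_sec = retention_sec
        self.records: dict[str, CachedRecord] = {}

    def put(self, name: str, value: str, ttl: float, now: float) -> None:
        self.records[name] = CachedRecord(value, now + ttl)

    def get(self, name: str, now: float, upstream_reachable: bool) -> Optional[str]:
        record = self.records.get(name)
        if record is None:
            return None
        if now < record.expires_at:
            return record.value  # still within its TTL
        within_retention = now < record.expires_at + self.retention_sec
        if within_retention and not upstream_reachable:
            return record.value  # stale, but usable while upstream is down
        return None  # caller should query upstream (or give up)
```

A record stored with a TTL of 30 seconds and a retention of 60 seconds is returned normally for the first 30 seconds, returned only when upstream is unreachable for the following 60 seconds, and dropped from consideration after that.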
Other:
* Meson >= 0.60.0 is now required to build systemd.
* The default keymap to apply may now be chosen at build-time via the
new -Ddefault-keymap= meson option.
* Most of systemd's long-running services now have a generic handler for
the SIGRTMIN+18 signal, which executes various operations depending on
the sigqueue() parameter sent along. For example, values
0x100…0x107 allow changing the maximum log level of such
services. 0x200…0x203 allow changing the log target of such
services. 0x300 makes the services trim their memory similarly to the
automatic PSI-triggered action, see above. 0x301 makes the services
output their malloc_info() data to the logs.
* machinectl gained new "edit" and "cat" verbs for editing .nspawn
files, inspired by systemctl's verbs of the same name which edit unit
files. Similarly, networkctl gained the same verbs for editing
.network, .netdev, .link files.
* A new syscall filter group "@sandbox" has been added that contains
system calls used for sandboxing, such as those for seccomp and
Landlock.
* New documentation has been added:
https://systemd.io/COREDUMP
https://systemd.io/MEMORY_PRESSURE
smbios-type-11(7)
* systemd-firstboot gained a new --reset option. If specified, the
settings in /etc/ it knows how to initialize are reset.
* systemd-sysext is now a multi-call binary and is also installed under
the systemd-confext alias name (via a symlink). When invoked that way
it will operate on /etc/ instead of /usr/ + /opt/. It thus becomes a
powerful, atomic, secure configuration management of sorts, that
locally can merge configuration from multiple confext configuration
images into a single immutable tree.
* The --network-macvlan=, --network-ipvlan=, --network-interface=
switches of systemd-nspawn may now optionally take the intended
network interface inside the container.
* All our programs will now send an sd_notify() message with their exit
status in the EXIT_STATUS= field when exiting, using the usual
protocol, including PID 1. This is useful for VMMs and container
managers to collect an exit status from a system as it shuts down, as
set via "systemctl exit …". This is particularly useful in test cases
and similar, as invocations via a VM can now nicely propagate an exit
status to the host, similar to local processes.
* systemd-run gained a new switch --expand-environment=no to disable
server-side environment variable expansion in specified command
lines. Expansion defaults to enabled for all execution types except
--scope, where it defaults to off (and prints a warning) for backward
compatibility reasons. --scope will be flipped to enabled by default
too in a future release. If you are using --scope and passing a '$'
character in the payload you should start explicitly using
--expand-environment=yes/no according to the use case.
* The systemd-system-update-generator has been updated to also look for
the special flag file /etc/system-update in addition to the existing
support for /system-update to decide whether to enter system update
mode.
* The /dev/hugepages/ file system is now mounted with nosuid + nodev
mount options by default.
* systemd-fstab-generator now understands two new kernel command line
options systemd.mount-extra= and systemd.swap-extra=, which configure
additional mounts or swaps in a format similar to /etc/fstab. 'fsck'
will be run on these block devices, like it already happens for
'root='. It also now supports the new fstab.extra and
fstab.extra.initrd credentials that may contain additional /etc/fstab
lines to apply at boot.
* systemd-getty-generator now understands two new credentials
getty.ttys.container and getty.ttys.serial. These credentials may
contain a list of TTY devices – one per line – to instantiate
container-getty@.service and serial-getty@.service on.
* The getty/serial-getty/container-getty units now import the 'agetty.*'
and 'login.*' credentials, which are consumed by the 'login' and
'agetty' programs starting from util-linux v2.40.
* systemd-sysupdate's sysupdate.d/ drop-ins gained a new setting
PathRelativeTo=, which can be set to "esp", "xbootldr", "boot", in
which case the Path= setting is taken relative to the ESP or XBOOTLDR
partitions, rather than the system's root directory /. The relevant
directories are automatically discovered.
* The systemd-ac-power tool gained a new switch --low, which reports
whether the battery charge is considered "low", similar to how the
s2h suspend logic checks this state to decide whether to enter system
suspend or hibernation.
* The /etc/os-release file can now have two new optional fields
VENDOR_NAME= and VENDOR_URL= to carry information about the vendor of
the OS.
* When the system hibernates, information about the device and offset
used is now written to a non-volatile EFI variable. On next boot the
system will attempt to resume from the location indicated in this EFI
variable. This should make hibernation a lot more robust, while
requiring no manual configuration of the resume location.
* The $XDG_STATE_HOME environment variable (added in more recent
versions of the XDG basedir specification) is now honoured to
implement the StateDirectory= setting in user services.
* A new component "systemd-battery-check" has been added. It may run
during early boot (usually in the initrd), and checks the battery
charge level of the system. In case the charge level is very low the
user is notified (graphically via Plymouth – if available – as well
as in text form on the console), and the system is turned off after a
10s delay. The feature can be disabled by passing
systemd.battery-check=0 through the kernel command line.
* The 'passwdqc' library is now supported as an alternative to the
'pwquality' library and can be selected at build time.
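The SIGRTMIN+18 value ranges listed earlier in this section can be summarised as a small lookup. The Python sketch below only classifies a sigqueue() parameter into the operations named in the text; the function name is hypothetical, and the "index" it reports is simply the offset into the range, since the text does not say which value selects which log level or log target.

```python
def describe_sigrtmin18_value(value: int) -> str:
    """Classify a sigqueue() parameter according to the ranges listed above."""
    if 0x100 <= value <= 0x107:
        # Offset within the range; its mapping to a level is an assumption.
        return f"set maximum log level (index {value - 0x100})"
    if 0x200 <= value <= 0x203:
        # Offset within the range; its mapping to a target is an assumption.
        return f"set log target (index {value - 0x200})"
    if value == 0x300:
        return "trim memory"
    if value == 0x301:
        return "log malloc_info() data"
    return "unknown"
```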
Contributions from: 김인수, 07416, Addison Snelling, Adrian Vovk,
Aidan Dang, Alexander Krabler, Alfred Klomp, Anatoli Babenia,
Andrei Stepanov, Andrew Baxter, Antonio Alvarez Feijoo,
Arian van Putten, Arthur Shau, A S Alam,
Asier Sarasua Garmendia, Balló György, Bastien Nocera,
Benjamin Herrenschmidt, Benjamin Raison, Bill Peterson,
Brad Fitzpatrick, Brett Holman, bri, Chen Qi, Chitoku,
Christian Hesse, Christoph Anton Mitterer, Christopher Gurnee,
Colin Walters, Cornelius Hoffmann, Cristian Rodríguez, cunshunxia,
cvlc12, Cyril Roelandt, Daan De Meyer, Daniele Medri,
Daniel P. Berrangé, Daniel Rusek, Dan Streetman, David Edmundson,
David Schroeder, David Tardon, dependabot[bot],
Dimitri John Ledkov, Dmitrii Fomchenkov, Dmitry V. Levin, dmkUK,
Dominique Martinet, don bright, drosdeck, Edson Juliano Drosdeck,
Egor Ignatov, EinBaum, Emanuele Giuseppe Esposito, Eric Curtin,
Erik Sjölund, Evgeny Vereshchagin, Florian Klink, Franck Bui,
François Rigault, Fran Diéguez, Franklin Yu, Frantisek Sumsal,
Fuminobu TAKEYAMA, Gaël PORTAY, Gerd Hoffmann, Gertalitec,
Gibeom Gwon, Gustavo Noronha Silva, Hannu Lounento,
Hans de Goede, Haochen Tong, HATAYAMA Daisuke, Henrik Holst,
Hoe Hao Cheng, Igor Tsiglyar, Ivan Vecera, James Hilliard,
Jan Engelhardt, Jan Janssen, Jan Luebbe, Jan Macku, Janne Sirén,
jcg, Jeidnx, Joan Bruguera, Joerg Behrmann, jonathanmetzman,
Jordan Rome, Josef Miegl, Joshua Goins, Joyce, Joyce Brum,
Juno Computers, Kai Lueke, Kevin P. Fleming, Kiran Vemula, Klaus,
Klaus Zipfel, Lawrence Thorpe, Lennart Poettering, licunlong,
Lily Foster, Luca Boccassi, Ludwig Nussel, Luna Jernberg,
maanyagoenka, Maanya Goenka, Maksim Kliazovich, Malte Poll,
Marko Korhonen, Masatake YAMATO, Mateusz Poliwczak, Matt Johnston,
Miao Wang, Micah Abbott, Michael A Cassaniti, Michal Koutný,
Michal Sekletár, Mike Yuan, mooo, Morten Linderud, msizanoen,
Nick Rosbrook, nikstur, Olivier Gayot, Omojola Joshua,
Paolo Velati, Paul Barker, Pavel Borecki, Petr Menšík,
Philipp Kern, Philip Withnall, Piotr Drąg, Quintin Hill,
Rene Hollander, Richard Phibel, Robert Meijers, Robert Scheck,
Roger Gammans, Romain Geissler, Ronan Pigott, Russell Harmon,
saikat0511, Samanta Navarro, Sam James, Sam Morris,
Simon Braunschmidt, Sjoerd Simons, Sorah Fukumori,
Stanislaw Gruszka, Stefan Roesch, Steven Luo, Steve Ramage,
Susant Sahani, taniishkaaa, Tanishka, Temuri Doghonadze,
Thierry Martin, Thomas Blume, Thomas Genty, Thomas Weißschuh,
Thorsten Kukuk, Times-Z, Tobias Powalowski, tofylion,
Topi Miettinen, Uwe Kleine-König, Velislav Ivanov,
Vitaly Kuznetsov, Vít Zikmund, Weblate, Will Fancher,
William Roberts, Winterhuman, Wolfgang Müller, Xeonacid,
Xiaotian Wu, Xi Ruoyao, Yuri Chornoivan, Yu Watanabe, Yuxiang Zhu,
Zbigniew Jędrzejewski-Szmek, zhmylove, ZjYwMj,
Дамјан Георгиевски, наб
— Edinburgh, 2023-07-28
CHANGES WITH 253:
Announcements of Future Feature Removals and Incompatible Changes:
* We intend to remove cgroup v1 support from a systemd release after the
end of 2023. If you run services that make explicit use of cgroup v1
features (i.e. the "legacy hierarchy" with separate hierarchies for
each controller), please implement compatibility with cgroup v2 (i.e.
the "unified hierarchy") sooner rather than later. Most of Linux
userspace has been ported over already.
* We intend to remove support for split-usr (/usr mounted separately
during boot) and unmerged-usr (parallel directories /bin and
/usr/bin, /lib and /usr/lib, etc). This will happen in the second
half of 2023, in the first release that falls into that time window.
For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html
* We intend to change behaviour w.r.t. units of the per-user service
manager and sandboxing options, so that they work without having to
manually enable PrivateUsers= as well, which is not required for
system units. To make this work, we will implicitly enable user
namespaces (PrivateUsers=yes) when a sandboxing option is enabled in a
user unit. The drawback is that system users will no longer be visible
(and appear as 'nobody') to the user unit when a sandboxing option is
enabled. By definition a sandboxed user unit should run with reduced
privileges, so impact should be small. This will remove a great source
of confusion that has been reported by users over the years, due to
how these options require an extra setting to be manually enabled when
used in the per-user service manager, as opposed to the system
service manager. We plan to enable this change in the next release
later this year. For more details, see:
https://lists.freedesktop.org/archives/systemd-devel/2022-December/048682.html
Deprecations and incompatible changes:
* systemctl will now warn when invoked without /proc/ mounted
(e.g. when invoked after chroot() into a directory tree without the
API mount points like /proc/ being set up). Operation in such an
environment is not fully supported.
* The return value of 'systemctl is-active|is-enabled|is-failed' for
unknown units is changed: previously 1 or 3 were returned, but now 4
(EXIT_PROGRAM_OR_SERVICES_STATUS_UNKNOWN) is used as documented.
* 'udevadm hwdb' subcommand is deprecated and will emit a warning.
systemd-hwdb (added in 2014) should be used instead.
* 'bootctl --json' now outputs a single JSON array, instead of a stream
of newline-separated JSON objects.
* Udev rules in 60-evdev.rules have been changed to load hwdb
properties for all modalias patterns. Previously only the first
matching pattern was used. This could change what properties are
assigned if the user has more and less specific patterns that could
match the same device, but it is expected that the change will have
no effect for most users.
* systemd-networkd-wait-online exits successfully when all interfaces
are ready or unmanaged. Previously, if neither '--any' nor
'--interface=' options were used, at least one interface had to be in
configured state. This change allows the case where systemd-networkd
is enabled, but no interfaces are configured, to be handled
gracefully. It may occur in particular when a different network
manager is also enabled and used.
* Some compatibility helpers were dropped: EmergencyAction= in the user
manager, as well as measuring kernel command line into PCR 8 in
systemd-stub, along with the -Defi-tpm-pcr-compat compile-time
option.
* The '-Dupdate-helper-user-timeout=' build-time option has been
renamed to '-Dupdate-helper-user-timeout-sec=', and now takes an
integer as parameter instead of a string.
* The DDI image dissection logic (which backs RootImage= in service
unit files, the --image= switch in various tools such as
systemd-nspawn, as well as systemd-dissect) will now only mount file
systems of types btrfs, ext4, xfs, erofs, squashfs, vfat. This list
can be overridden via the $SYSTEMD_DISSECT_FILE_SYSTEMS environment
variable. These file systems are fairly well supported and maintained
in current kernels, while others are usually more niche, exotic or
legacy and thus typically do not receive the same level of security
support and fixes.
* The default per-link multicast DNS mode has been changed from "no" to
"yes". As the default global multicast DNS mode has been "yes" (unless
changed via the build option), multicast DNS is now enabled on all
links by default. You can disable multicast DNS on all links by
setting MulticastDNS= in resolved.conf, or on an individual interface
by calling "resolvectl mdns INTERFACE no".
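Scripts that consume 'bootctl --json' output may need to cope with both the old stream of JSON objects and the new single JSON array during a transition. The following Python sketch accepts either form; the function name and the "id" field used in the usage note are hypothetical, and the actual fields emitted by bootctl are not shown here.

```python
import json


def parse_bootctl_json(text: str) -> list:
    """Return a list of entries from either a JSON array or a stream of objects."""
    decoder = json.JSONDecoder()
    entries = []
    pos = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            entries.extend(value)  # new format: one array holding all entries
        else:
            entries.append(value)  # old format: one object per document
    return entries
```

Both '[{"id": "a"}, {"id": "b"}]' and the two objects '{"id": "a"}' and '{"id": "b"}' on separate lines yield the same two-element list.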
New components:
* A tool 'ukify' to build, measure, and sign Unified Kernel Images
(UKIs) has been added. This replaces functionality provided by
'dracut --uefi' and extends it with automatic calculation of PE file
offsets, insertion of signed PCR policies generated by
systemd-measure, support for initrd concatenation, signing of the
embedded Linux image and the combined image with sbsign, and
heuristics to autodetect the kernel uname and verify the splash
image.
Changes in systemd and units:
* A new service type Type=notify-reload is defined. When such a unit is
reloaded a UNIX process signal (typically SIGHUP) is sent to the main
service process. The manager will then wait until it receives a
"RELOADING=1" followed by a "READY=1" notification from the unit as
response (via sd_notify()). Otherwise, this type is the same as
Type=notify. A new setting ReloadSignal= may be used to change the
signal to send from the default of SIGHUP.
user@.service, systemd-networkd.service, systemd-udevd.service, and
systemd-logind have been updated to this type.
* Initrd environments which are not on a pure memory file system (e.g.
an overlayfs combination as opposed to tmpfs) are now supported. With
this change, during the initrd → host transition ("switch root")
systemd will erase all files of the initrd only when the initrd is
backed by a memory file system such as tmpfs.
* New per-unit MemoryZSwapMax= option has been added to configure
memory.zswap.max cgroup properties (the maximum amount of zswap
used).
* A new LogFilterPatterns= option has been added for units. It may be
used to specify accept/deny regular expressions for log messages
generated by the unit, that shall be enforced by systemd-journald.
Rejected messages are neither stored in the journal nor forwarded.
This option may be used to suppress noisy or uninteresting messages
from units.
* The manager has a new
org.freedesktop.systemd1.Manager.GetUnitByPIDFD() D-Bus method to
query process ownership via a PIDFD, which is more resilient against
PID recycling issues.
* Scope units now support OOMPolicy=. Login session scopes default to
OOMPolicy=continue, allowing login scopes to survive the OOM killer
terminating some processes in the scope.
* systemd-fstab-generator now supports the x-systemd.makefs option for
/sysroot/ (in the initrd).
* The maximum rate at which daemon reloads are executed can now be
limited with the new ReloadLimitIntervalSec=/ReloadLimitBurst=
options. (Or the equivalent on the kernel command line:
systemd.reload_limit_interval_sec=/systemd.reload_limit_burst=). In
addition, systemd now logs the originating unit and PID when a reload
request is received over D-Bus.
* When enabling a swap device systemd will now reinitialize the device
when the page size of the swap space does not match the page size of
the running kernel. Note that this requires the 'swapon' utility to
provide the '--fixpgsz' option, as implemented by util-linux, and it
is not supported by busybox at the time of writing.
* systemd now executes generator programs in a mount namespace
"sandbox" with most of the file system read-only and write access
restricted to the output directories, and with a temporary /tmp/
mount provided. This provides a safeguard against programming errors
in the generators, but also fixes here-docs in shells, which
previously didn't work in early boot when /tmp/ wasn't available
yet. (This feature has no security implications, because the code is
still privileged and can trivially exit the sandbox.)
* The system manager will now parse a new "vmm.notify_socket"
system credential, which may be supplied to a VM via SMBIOS. If
found, the manager will send a "READY=1" notification on the
specified socket after boot is complete. This allows readiness
notification to be sent from a VM guest to the VM host over a VSOCK
socket.
* The sample PAM configuration file for user@.service now
includes a call to pam_namespace. This puts children of user@.service
in the expected namespace. (Many distributions replace their file
with something custom, so this change has limited effect.)
* A new environment variable $SYSTEMD_DEFAULT_MOUNT_RATE_LIMIT_BURST
can be used to override the mount units' burst rate limit for
parsing '/proc/self/mountinfo', which was introduced in v249.
Defaults to 5.
* Drop-ins for init.scope changing control group resource limits are
now applied, while they were previously ignored.
* New build-time configuration options '-Ddefault-timeout-sec=' and
'-Ddefault-user-timeout-sec=' have been added, to let distributions
choose the default timeout for starting/stopping/aborting system and
user units respectively.
* Service units gained a new setting OpenFile= which may be used to
open arbitrary files in the file system (or connect to arbitrary
AF_UNIX sockets in the file system), and pass the open file
descriptor to the invoked process via the usual file descriptor
passing protocol. This is useful to give unprivileged services access
to select files which have restrictive access modes that would
normally not allow this. It's also useful in case RootDirectory= or
RootImage= is used to allow access to files from the host environment
(which is after all not visible from the service if these two options
are used.)
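The Type=notify-reload handshake described earlier in this section can be sketched from the service's side. This is a minimal Python illustration, not a reference implementation: the function names are hypothetical, only AF_UNIX notification sockets are handled, and the MONOTONIC_USEC= field sent along with RELOADING=1 comes from the systemd.service(5) documentation rather than from the text above.

```python
import os
import socket
import time


def reload_messages(monotonic_usec: int) -> list[str]:
    """Build the two datagrams a notify-reload service sends during a reload."""
    return [
        f"RELOADING=1\nMONOTONIC_USEC={monotonic_usec}",  # reload has started
        "READY=1",  # the new configuration is in effect
    ]


def notify(message: str) -> bool:
    """Send one datagram to $NOTIFY_SOCKET; return False if it is not set."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    address = path.encode()
    if address.startswith(b"@"):
        address = b"\0" + address[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True


def handle_reload_signal(reload_config) -> None:
    """Hypothetical handler body: announce the reload, do it, report readiness."""
    starting, ready = reload_messages(time.monotonic_ns() // 1000)
    notify(starting)
    reload_config()
    notify(ready)
```

A service would call handle_reload_signal() from its SIGHUP handling path (or whichever signal ReloadSignal= selects), passing its own configuration reload routine.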
Changes in udev:
* The new net naming scheme "v253" has been introduced. In the new
scheme, ID_NET_NAME_PATH is also set for USB devices not connected via
a PCI bus. This extends the coverage of predictable interface names
in some embedded systems.
The "amba" bus path is now included in ID_NET_NAME_PATH, resulting in
a more informative path on some embedded systems.
* Partition block devices will now also get symlinks in
/dev/disk/by-diskseq/<seq>-part<n>, which may be used to reference
block device nodes via the kernel's "diskseq" value. Previously those
symlinks were only created for the main block device.