RESTART POLICY

Finally, the InitLoop of the systemctl replacement can check for failed units and restart them. Note first that you can disable the feature right away to stay backward compatible with older versions (before v1.5). Just say

systemctl.py -c RESTART_FAILED_UNITS=no

LimitBurst

The defaults for the LimitBurst implementation are just as in the standard SystemD:

StartLimitBurst = 5x
StartLimitIntervalSec = 10s

With the default of InitLoopSleep=5 seconds the LimitBurst will never be activated, because the loop does not run often enough to restart a unit 5 times within 10 seconds. If you have a lower InitLoopSleep (see below) then it may happen that a unit was restarted too often - and the unit gets an ActiveState=error. In that state it will never be restarted again - so the container is effectively dead in terms of that service unit.
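The arithmetic behind this can be sketched in a few lines. This is only an illustration, not code taken from systemctl.py: the function name `restarts_in_window` is hypothetical, and it assumes that a permanently failing unit is restarted at most once per InitLoop pass, with the window counted as half-open.

```python
def restarts_in_window(loop_sleep, interval=10):
    """Count how many restarts can fall into one StartLimitIntervalSec window.

    Assumption (not taken from systemctl.py): a failing unit is restarted
    once per loop pass, so restarts happen at t = 0, loop_sleep, 2*loop_sleep, ...
    The window is treated as half-open: 0 <= t < interval.
    """
    count = 0
    t = 0
    while t < interval:
        count += 1
        t += loop_sleep
    return count


# Compare the result against StartLimitBurst=5.
default_case = restarts_in_window(5)   # InitLoopSleep=5
short_case = restarts_in_window(2)     # a shortened InitLoopSleep of 2 seconds
```

Under these assumptions the default loop allows only 2 restarts per window, well below the burst limit of 5, while a 2 second loop already reaches 5.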

You can heal the situation explicitly by running 'systemctl.py reset-failed [unit]'. That works because the ActiveState is part of the unit.service.status file, which gets deleted by that command. This matches the behaviour defined by the original SystemD.

RestartSec

The DefaultRestartSec is set to 100ms just like it is in SystemD. However this has no effect on the systemctl.py behaviour as long as the InitLoopSleep is at 5 seconds. The Restart behaviour is implemented so that the loop checks for failed services - and then schedules them for a restart in the future. So effectively it is StartTime = now + RestartSec. But because of the InitLoop the next check may be far away - so instead of after 100ms the real restart of the failed service happens after 5000ms.
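The effect of the loop on the scheduled StartTime can be sketched as follows. This is a simplified model and not the code from systemctl.py: the function name is hypothetical, and it assumes that a restart is only carried out on the first loop pass at or after the scheduled StartTime.

```python
import math


def restart_delay_after_detection(restart_sec, loop_sleep):
    """Seconds between detecting a failure and actually restarting the unit.

    Assumption (not taken from systemctl.py): the failure is detected on a
    loop pass, StartTime = now + restart_sec is recorded, and the restart is
    done on the first later pass whose time is >= StartTime.
    """
    passes_to_wait = math.ceil(restart_sec / loop_sleep)
    return passes_to_wait * loop_sleep


# DefaultRestartSec=100ms with InitLoopSleep=5 seconds
delay = restart_delay_after_detection(0.1, 5)
```

With the defaults the scheduled 100ms are rounded up to one full loop pass, so `delay` comes out as 5 seconds.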

In order to help with the configuration of a shorter restart interval, the implementation will check for "RestartSec" in the unit service descriptors. If it is at least 1 second but lower than InitLoopSleep then InitLoopSleep is shortened to that time. For example if you have a service unit with "RestartSec=2s" then you can expect that the real restart from failure will happen about 2 seconds after the failed state was detected - and the detection is itself within a time frame of 2 seconds after the failure has occurred. As a result, a restart can happen as late as 4 seconds after a unit failure.

A "RestartSec=0" is a special value - it will be increased to an InitLoop interval of 1 second but it adds no further delay, so that a restart occurs within 1 second after a unit failure. The current InitLoop of the docker systemctl replacement code cannot do any better.
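The rules from the two paragraphs above can be put into a small sketch. It is not the code from systemctl.py: both function names are hypothetical, values are plain seconds, and values between 0 and 1 second are simply ignored because the text does not say how they are treated.

```python
import math


def effective_loop_sleep(restart_secs, default_sleep=5.0):
    """Pick the InitLoop interval from the RestartSec values of all units."""
    sleep = default_sleep
    for value in restart_secs:
        if value == 0:
            candidate = 1.0           # RestartSec=0 is raised to a 1 second loop
        elif value >= 1:
            candidate = float(value)  # at least 1 second: usable as loop interval
        else:
            continue                  # assumption: not covered by the text, ignored
        if candidate < sleep:
            sleep = candidate         # only ever shorten the interval
    return sleep


def worst_case_after_failure(restart_sec, loop_sleep):
    """Longest time from the failure itself to the restart.

    Assumption: detection may take one full loop pass, and the restart is
    done on the first pass at or after detection + restart_sec.
    """
    return loop_sleep + math.ceil(restart_sec / loop_sleep) * loop_sleep


two_seconds = worst_case_after_failure(2, effective_loop_sleep([2]))
zero_seconds = worst_case_after_failure(0, effective_loop_sleep([0]))
```

Under these assumptions `two_seconds` is 4.0 and `zero_seconds` is 1.0, which agrees with the numbers given in the text. A big value like 11min leaves the default interval of 5 seconds untouched.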

InitLoopSleep

Using "RestartSec" you can easily build docker containers with a shorter InitLoop interval - that comes from the EXTRA-CONFIGS feature provided by standard SystemD. Suppose you have a service unit "my.service" then you can create a "my.service.d/restart.conf" like this:

# the unit file is /usr/lib/systemd/system/my.service
mkdir -p /usr/lib/systemd/system/my.service.d
# use cat (not echo) so that the here-document is written to the file
cat >/usr/lib/systemd/system/my.service.d/restart.conf <<EOF
[Service]
RestartSec=2s
EOF

That will override the global default and any other RestartSec setting in the my.service descriptor.
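How such an override wins over the main descriptor can be illustrated with Python's standard configparser. This is only a sketch of the layering idea and not the parser used by systemctl.py; the ExecStart path and the "RestartSec=30s" in the main unit are made-up example values.

```python
import configparser

# Hypothetical content of my.service (example values only)
MAIN_UNIT = """\
[Service]
ExecStart=/usr/bin/my-daemon
Restart=always
RestartSec=30s
"""

# Content of my.service.d/restart.conf as created above
DROP_IN = """\
[Service]
RestartSec=2s
"""


def merged_setting(main_text, dropin_text, section, key):
    """Read the main unit first and the drop-in second, so the drop-in wins."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the CamelCase option names as written
    parser.read_string(main_text)
    parser.read_string(dropin_text)
    return parser.get(section, key)


restart_sec = merged_setting(MAIN_UNIT, DROP_IN, "Service", "RestartSec")
restart = merged_setting(MAIN_UNIT, DROP_IN, "Service", "Restart")
```

Here `restart_sec` is taken from the drop-in ("2s") while `restart` still comes from the main unit ("always"), because the drop-in only replaces the options it names.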

Existing Values

In existing unit files the RestartSec value is set to zero in the vast majority of cases - and where it is set to something else it is usually quite big. With zero there is no delay after the detection of a failed status. That's what you can use as well with systemctl.py but it will keep the InitLoop interval quite high.

/etc/systemd/system/dbus-org.freedesktop.network1.service:Restart=on-failure
/etc/systemd/system/dbus-org.freedesktop.network1.service:RestartSec=0
/usr/lib/systemd/system/[email protected]=always
/usr/lib/systemd/system/[email protected]:RestartSec=0
/usr/lib/systemd/system/[email protected]=always
/usr/lib/systemd/system/[email protected]:RestartSec=0
/usr/lib/systemd/system/dbus-org.freedesktop.login1.service-Restart=always
/usr/lib/systemd/system/dbus-org.freedesktop.login1.service:RestartSec=0
/usr/lib/systemd/system/[email protected]=always
/usr/lib/systemd/system/[email protected]:RestartSec=0
/usr/lib/systemd/system/ntpd.service:RestartSec=11min
/usr/lib/systemd/system/ntpd.service-Restart=always
/usr/lib/systemd/system/systemd-journald.service-Restart=always
/usr/lib/systemd/system/systemd-journald.service:RestartSec=0
/usr/lib/systemd/system/systemd-logind.service-Restart=always
/usr/lib/systemd/system/systemd-logind.service:RestartSec=0
/usr/lib/systemd/system/systemd-networkd.service-Restart=on-failure
/usr/lib/systemd/system/systemd-networkd.service:RestartSec=0
/usr/lib/systemd/system/systemd-udevd.service-Restart=always
/usr/lib/systemd/system/systemd-udevd.service:RestartSec=0

on-failure

Note that the implementation of the systemctl replacement script does not really check the restart policy - so that Restart=on-failure and Restart=always are actually the same thing. That's because the systemctl replacement script does not know the exit code of a service unit anyway.

On the other hand, there are two settings which disable the restarting behaviour for a service unit: either "Restart=no" or "Restart=on-success" will keep the service offline if it comes into a "failed" state. No restart is attempted. All other "Restart=x" settings will try to restart correspondingly.
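The resulting decision can be summarized in a small sketch. It is not the code from systemctl.py and the names `NO_RESTART` and `will_restart` are hypothetical. It only covers an explicitly given "Restart=" value, as the text above does not say what happens when the setting is missing.

```python
# The two settings that keep a failed service offline
NO_RESTART = {"no", "on-success"}


def will_restart(restart_setting):
    """Tell whether a failed unit with this explicit Restart= value is restarted.

    The exit code is not looked at, so every value outside NO_RESTART
    is handled the same way.
    """
    return restart_setting not in NO_RESTART


same_thing = will_restart("on-failure") == will_restart("always")
```

Both "on-failure" and "always" give True here, while "no" and "on-success" give False.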