Diagnosing boot failures
In general, if your Dinit-based system fails to boot, the first step is to check in with your distribution and their support forums. Boot failures are usually caused by services themselves, and these are established by the distribution. Please don't open bugs against Dinit for boot failures unless you have reason to believe there is a bug in Dinit itself.
That said, there are things you can do to diagnose and repair a failing boot; these are detailed here.
These instructions are intended as a basic guide only. It's impossible to cover the range of possible situations and services, so if you cannot fix the problem yourself, please gather as much information as you can and contact your distribution's support channel.
There are two basic types of failure:
- A critical service fails (i.e. Dinit recognises boot failure and offers option to use recovery mode)
- Boot gets stuck
These require slightly different approaches to diagnose.
Critical service failure
When there is a critical service failure, all services will stop and Dinit will display a message similar to:
All services have stopped with no shutdown issued; boot failure?
Choose: (r)eboot, r(e)covery, re(s)tart boot sequence, (p)ower off?
It's possible that, prior to this message, you will see failure messages including an error relating to the root cause of the problem. However, since one failure often leads to a cascade of failures (services which depend on the failed service will also fail), it's also possible that the original message will scroll off the screen.
If your services provide a "recovery" service, you can start it by pressing 'e'. Exactly what happens in this case is up to your service configuration (i.e. up to your distribution), but it may give you access to a (root) shell.
Once you have a root shell, you should run the `dinitcheck` command. This may identify the problem immediately, but if not it will at least list which services should start during boot. With any luck, you will be able to execute `dinitctl` commands and, in particular, you can start services one by one:
dinitctl start fsck
dinitctl start udevd
These are just examples - you should start services that are configured on your system.
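The one-by-one approach can also be scripted so that it stops at the first service that fails to start. The following is a minimal sketch, not part of Dinit: the function name `start_in_order` and the `DINITCTL` override variable are made up for illustration, and it assumes that `dinitctl start` exits with a non-zero status when the service does not start.

```shell
# Sketch only: start the named services in order, stopping at the first failure.
# DINITCTL is a hypothetical override (handy for a dry run); it defaults to dinitctl.
start_in_order() {
    for svc in "$@"; do
        printf 'starting %s... ' "$svc"
        if "${DINITCTL:-dinitctl}" start "$svc"; then
            echo "ok"
        else
            echo "FAILED"
            return 1
        fi
    done
}

# Example (replace the names with services configured on your system):
# start_in_order fsck udevd
```

The last service named before `FAILED` is the one to investigate. If a start appears to hang, the service is stuck rather than failed, and you will need to interrupt the command yourself.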
Try to start "earlier" services first and progress to later services. You may need to examine the service descriptions to decide which services to start and in what order; look for services that have no dependencies, and start them first. See the dinit(8) man page for information on the default service description directories - this will tell you where you can look to find the service descriptions available on your system (and once found, will allow you to inspect and/or edit them).
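As a rough aid for finding services to start first, the sketch below lists the service descriptions in one directory that declare no dependencies. It is an illustration rather than a Dinit tool: `list_no_deps` is a made-up name, you must supply the directory yourself (check dinit(8) for the directories used on your system), and it only looks for `depends-on`, `depends-ms` and `waits-for` settings (including their `.d` variants) in each file, so it can miss dependencies introduced in other ways.

```shell
# Sketch only: print the names of service descriptions in a directory
# that contain no depends-on / depends-ms / waits-for settings.
list_no_deps() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue    # skip subdirectories and unmatched globs
        if ! grep -Eq '^[[:space:]]*(depends-on|depends-ms|waits-for)(\.d)?[[:space:]]*[=:]' "$f"; then
            basename "$f"
        fi
    done
}

# Example (the directory is an assumption; yours may differ):
# list_no_deps /etc/dinit.d
```

Treat the output as a starting point only, and read the service descriptions themselves before deciding on an order.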
By starting services one at a time, you should eventually be able to identify which service is the one that is failing. Consult the log for the service (check the service description to see how it is configured to log output; some services will also log via the syslog facility, so check system logs also). The output may allow you to resolve the issue, or otherwise may be useful information to take to your support channel.
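To find out where a service sends its output, you can look for the logging settings in its description. The helper below is a minimal sketch (the name `show_log_settings` is made up); it assumes the `logfile` and `log-type` settings described in dinit-service(5) and simply prints the matching lines from a service description file whose path you supply.

```shell
# Sketch only: print the logfile / log-type settings from a service description.
# Prints nothing (and returns non-zero) if neither setting is present.
show_log_settings() {
    grep -E '^[[:space:]]*(logfile|log-type)[[:space:]]*=' "$1"
}

# Example (hypothetical path; use the description file of the failing service):
# show_log_settings /etc/dinit.d/udevd
```

If a service is configured with `log-type = buffer`, recent versions of Dinit may be able to print the buffered output with `dinitctl catlog`; whether this is available depends on your Dinit version.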
You may also be able to use your distribution's package manager at this point to roll back a package to an earlier version, or check for more recent updates that may resolve issues.
Note: if the `recovery` service has the `runs-on-console` option set, it won't be possible to start services which need the console from the recovery shell; the `dinitctl start ...` command will appear to hang. You can use `^C` (control-C) to terminate `dinitctl`.
Boot gets stuck
An apparently stuck boot can be caused by two things:
- a critical service gets stuck
- no login service (tty and/or graphical session) is enabled, that is, the boot service isn't configured with a direct or indirect dependency on a login service
The second is a configuration problem; the first is caused by a software issue, often in the program associated with the service that gets stuck.
To diagnose the issue properly, the first step is typically to obtain a shell. Since the recovery option is not presented (the boot is stuck rather than failed) you will need to use a boot-time option to boot to "single-user" or recovery mode. Support for this is dependent on your distribution / set of services, and how to do it depends on your bootloader.
If your system boots via GRUB, you can press 'e' at the boot menu to edit the selected boot option. Look for a line that starts with `linux`; this specifies the kernel command line. You can add `single` to the end of the command line in order to start the `single` service instead of the normal `boot` service ("single" refers to "single-user", an historical name for a limited or "safe" mode of system operation). Edits made this way apply only to the current boot; they are not permanent.
The `single` option is not recognised by the kernel itself, but will be passed through to the init system, and dinit recognises it specially. The name `single` is special; to start a different arbitrary service, put `-t` before the service name, for example use `-t recovery` to start the `recovery` service.
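After booting, you may want to confirm that the extra word actually reached the kernel command line. The sketch below (the function name `cmdline_has_word` is made up) checks a command line string for a whole word, reading `/proc/cmdline` on Linux when no string is given. It only shows what the kernel was given; it does not tell you how Dinit interpreted it.

```shell
# Sketch only: succeed if the given word appears as a whole word on the
# kernel command line (second argument, or /proc/cmdline if omitted).
cmdline_has_word() {
    word=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $word "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# cmdline_has_word single && echo "booted with 'single'"
```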
Depending on service configuration (as set up by your distribution), booting to the `single` or `recovery` service may give you a root shell. Then you can:
- Check that a login service, such as a terminal login (often called `ttyX` where `X` is 1-6) or display manager, is enabled. Running the `dinitcheck` command should list the services that will start during a normal boot, and as a bonus will check for and report various errors in the configuration.
- Assuming that a suitable login service is enabled, you can try to start services individually to identify which service is getting stuck. See the instructions under "Critical service failure" above.
Copyright (C) 2016-2024 Davin McCall
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at LICENSE file & http://www.apache.org/licenses/LICENSE-2.0