Get a report on Matrix or Slack when a drive (HDD or SSD) is failing.
For large infrastructures that require 24/7 availability, existing monitoring solutions keep systems up and running by analyzing numerous parameters in real time. For smaller setups, without high-availability requirements and/or with limited resources, the integrity of the data is the primary thing to monitor. In such setups, other failures (such as RAM, network, etc.) are dealt with as they arise. Failing Disk Reporter (FDR) is a simple tool that periodically checks that drives are functional and reports when a failing drive is detected. Reporting to a Matrix room and to a Slack channel is supported.
Failing drives are detected with Smartmontools through the S.M.A.R.T. interface. Smartmontools supports drives connected directly to the motherboard through the southbridge SATA ports, as well as drives connected to hardware RAID cards.
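To make the data source more concrete, here is a minimal Go sketch that decodes SMART attributes from JSON in the layout produced by smartctl --json -A. Whether FDR itself consumes this JSON output is an assumption; the report type, the rawValues function and the sample data are hypothetical and only model the few fields used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// report models only the fields of smartctl's JSON output used in this sketch.
type report struct {
	Device struct {
		Name     string `json:"name"`
		Protocol string `json:"protocol"`
	} `json:"device"`
	ATA struct {
		Table []struct {
			ID   int    `json:"id"`
			Name string `json:"name"`
			Raw  struct {
				Value int64 `json:"value"`
			} `json:"raw"`
		} `json:"table"`
	} `json:"ata_smart_attributes"`
}

// sample is a trimmed, made-up excerpt in the layout of `smartctl --json -A`.
const sample = `{
  "device": {"name": "/dev/sda", "protocol": "ATA"},
  "ata_smart_attributes": {"table": [
    {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
    {"id": 197, "name": "Current_Pending_Sector", "raw": {"value": 0}}
  ]}
}`

// rawValues decodes the JSON and returns the device protocol together with
// the raw value of each ATA attribute, keyed by attribute name.
func rawValues(data []byte) (string, map[string]int64, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return "", nil, err
	}
	vals := make(map[string]int64, len(r.ATA.Table))
	for _, a := range r.ATA.Table {
		vals[a.Name] = a.Raw.Value
	}
	return r.Device.Protocol, vals, nil
}

func main() {
	proto, vals, err := rawValues([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(proto, vals)
}
```

On a real system the input would come from running smartctl against a device instead of the embedded sample.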
To identify failing drives, the criteria defined by Backblaze are used, as translated in this post. Users can define different criteria.
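As an illustration of how threshold criteria of this kind can be applied, here is a minimal Go sketch. It assumes each criterion names a SMART attribute and a maximum raw value, mirroring the name, label and max fields of fdr.toml. The Criterion type, the failing function and the sample values are hypothetical and not taken from FDR's source; treating "strictly greater than max" as failure is also an assumption.

```go
package main

import "fmt"

// Criterion is a hypothetical stand-in for one [[smart.criteria]] entry.
type Criterion struct {
	Name  string // SMART attribute name
	Label string // label used in the report
	Max   int64  // a raw value above Max marks the drive as failing
}

// failing returns the labels of all criteria whose threshold is exceeded.
// attrs maps attribute names to raw values; missing attributes are skipped.
func failing(attrs map[string]int64, criteria []Criterion) []string {
	var labels []string
	for _, c := range criteria {
		if v, ok := attrs[c.Name]; ok && v > c.Max {
			labels = append(labels, c.Label)
		}
	}
	return labels
}

func main() {
	// Sample criteria and values, chosen for illustration only.
	criteria := []Criterion{
		{Name: "Reallocated_Sector_Ct", Label: "Reallocated sectors", Max: 0},
		{Name: "Current_Pending_Sector", Label: "Pending sectors", Max: 0},
	}
	attrs := map[string]int64{
		"Reallocated_Sector_Ct":  8,
		"Current_Pending_Sector": 0,
	}
	fmt.Println(failing(attrs, criteria))
}
```

A drive would then be reported whenever the returned list is non-empty.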
See the refs page for the tarball and executables.
Executables are statically linked binaries obtained with disabled cgo:
CGO_ENABLED=0 go build *go
Install the failing-disk-reporter package available on the AUR.
- Install fdr executable in /usr/bin (or /usr/local/bin, in that case change path to fdr in failing-disk-reporter.service)
- Edit FDR configuration file fdr.toml, then copy it to /etc
- Copy systemd failing-disk-reporter.service and failing-disk-reporter.timer to /etc/systemd/system
- Configure FDR in /etc/fdr.toml:
  - [smart]
    - ignored_protocols: List of protocols ignored
    - [[smart.criteria]]: List of criteria to identify failing drives
      - protocol: For example ATA, NVMe
      - key: SMART attribute
      - id: SMART attribute, optional
      - name: SMART attribute
      - label: Label for report
      - max: Threshold for failure
  - [[reporters]]: Configuration of reporters
- Enable and start the timer:
  systemctl enable failing-disk-reporter.timer
  systemctl start failing-disk-reporter.timer
- Get the access token from the Help & About tab in the user config (details in this post). Input this token in the TOKEN parameter of the Matrix reporter.
- Get the Internal room ID from the Advanced tab in the config page of the room where messages should be sent. Input this room ID in the ROOM parameter of the Matrix reporter.
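To show how an access token and a room ID fit together, here is a minimal Go sketch that builds (without sending) a text message request against the Matrix client-server API. It is not FDR's implementation: the newMatrixRequest function, the homeserver URL, the transaction ID and the message text are all hypothetical placeholders.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// newMatrixRequest builds a request that posts a text message to a room.
// The function and its parameter names are illustrative, not part of FDR.
func newMatrixRequest(homeserver, roomID, token, txnID, text string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"msgtype": "m.text", "body": text})
	if err != nil {
		return nil, err
	}
	// The room ID and transaction ID are escaped before being placed in the path.
	endpoint := fmt.Sprintf("%s/_matrix/client/v3/rooms/%s/send/m.room.message/%s",
		homeserver, url.PathEscape(roomID), url.PathEscape(txnID))
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// The access token is passed as a bearer token.
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder values; replace with a real homeserver, room ID and token.
	req, err := newMatrixRequest("https://matrix.example.org", "!roomid:example.org",
		"TOKEN", "fdr-1", "Drive /dev/sda is failing")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// Sending would be done with http.DefaultClient.Do(req).
}
```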
- Create a Webhook.
- Input the TOKENxxx/Bxxx/Gxxx in the url parameter of the Slack reporter.
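For context, a Slack incoming webhook accepts a JSON payload with a text field, sent by POST to the webhook URL. The Go sketch below builds such a request without sending it. It assumes the url parameter holds the part of the webhook URL after https://hooks.slack.com/services/; that assumption, the newSlackRequest function and the sample message are hypothetical and not taken from FDR's source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// slackBase is the common prefix of Slack incoming webhook URLs.
const slackBase = "https://hooks.slack.com/services/"

// newSlackRequest builds a webhook request carrying a plain text message.
// hookPath is assumed to look like TOKENxxx/Bxxx/Gxxx.
func newSlackRequest(hookPath, text string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"text": text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, slackBase+hookPath, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder webhook path; a real one comes from the created Webhook.
	req, err := newSlackRequest("TOKENxxx/Bxxx/Gxxx", "Drive /dev/sda is failing")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// Sending would be done with http.DefaultClient.Do(req).
}
```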
FDR can be tested with the following command (-debug increases verbosity and -report sends reports regardless of the intervals configured in fdr.toml):
fdr -config config/fdr.toml -debug -report
Failing Disk Reporter is distributed under the Mozilla Public License Version 2.0 (see /LICENSE).
Copyright (C) 2020-2022 Charles E. Vejnar