iwvelando/telegraf-exec-zpool-status


telegraf-exec-zpool-status

This is a simple tool that extracts zpool status and outputs Influx line protocol; it is designed to be used with the telegraf exec input plugin. It parses the output of zpool list -H -p and zpool status -s -p. It was developed against Ubuntu 20.04 with ZFS 0.8.3, and it generates line protocol compatible with InfluxDB 1.x.

Reference Output

This is sample zpool list -H -p output that this tool expects to parse:

testpool111     20401094656     18467381760     1933712896      -       -       72      90      1.00    SUSPENDED       -
testpool222     20401094656     18467233280     1933861376      -       -       68      90      1.00    DEGRADED        -
testpool333     10200547328     139776  10200407552     -       -       0       0       1.00    ONLINE  -

and sample zpool status -s -p output:

  pool: testpool111
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: scrub in progress since Fri Feb  5 20:00:59 2021
        17.2G scanned at 297K/s, 0B issued at 0B/s, 17.2G total
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                     STATE     READ WRITE CKSUM  SLOW
        testpool111              UNAVAIL      1     0     0     0  insufficient replicas
          /home/isaac/disk3.img  UNAVAIL      0     0     0     0  corrupted data
          /home/isaac/disk4.img  ONLINE       0     0     0     0

errors: No known data errors

  pool: testpool222
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 0 days 00:00:27 with 0 errors on Fri Feb  5 19:54:00 2021
config:

        NAME                       STATE     READ WRITE CKSUM  SLOW
        testpool222                DEGRADED     0     0     0     0
          mirror-0                 ONLINE       0     0     0     0
            /home/isaac/disk1.img  ONLINE       0     0     0     0
            /home/isaac/disk2.img  ONLINE       0     0     0     0
          mirror-1                 DEGRADED     0     0     0     0
            /home/isaac/disk5.img  UNAVAIL      0     0    18     0  corrupted data
            /home/isaac/disk6.img  ONLINE       0     0     0     0

errors: No known data errors

  pool: testpool333
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM  SLOW
        testpool333                ONLINE       0     0     0     0
          mirror-0                 ONLINE       0     0     0     0
            /home/isaac/disk7.img  ONLINE       0     0     0     0
            /home/isaac/disk8.img  ONLINE       0     0     0     0

errors: No known data errors
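
To illustrate the shape of the device rows in the config table above, here is a minimal Python sketch that pulls one row apart with a regular expression. It is not how the tool itself works (the tool relies on a TextFSM template); the names DEVICE_ROW and parse_device_row are hypothetical and exist only for this example. The pattern assumes the six columns shown in the sample (NAME, STATE, READ, WRITE, CKSUM, SLOW) followed by an optional free-text note.

```python
import re

# Hypothetical pattern for one indented row of the "config:" table.
# Assumption: counters are plain integers, as produced by `zpool status -s -p`.
DEVICE_ROW = re.compile(
    r"^\s+(?P<device>\S+)\s+(?P<state>[A-Z]+)"
    r"\s+(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)\s+(?P<slow>\d+)"
    r"(?:\s+(?P<notes>.*\S))?\s*$"
)


def parse_device_row(line):
    """Return a dict for a device row, or None for any other line."""
    match = DEVICE_ROW.match(line)
    if match is None:
        return None
    row = match.groupdict()
    for key in ("read", "write", "cksum", "slow"):
        row[key] = int(row[key])
    # Rows without a trailing note get an empty string, as in the tool's output.
    row["notes"] = row["notes"] or ""
    return row


sample = "            /home/isaac/disk5.img  UNAVAIL      0     0    18     0  corrupted data"
print(parse_device_row(sample))
```

Header lines such as the NAME/STATE row and unindented lines such as "errors: ..." do not match the pattern, so the function returns None for them.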

Interactive Run Example

The compiled tool can be run interactively. By default it looks for the TextFSM template in your current directory; use the --template CLI option to point it at a different path.

./telegraf-exec-zpool-status
zpool,alternative_root=-,pool=testpool111 allocated=18467381760i,capacity=90i,checkpoint=0i,dedup=1,expand=0i,fragmentation=72i,free=1933712896i,health=6i,size=20401094656i 1612641748237439221
zpool,alternative_root=-,pool=testpool222 allocated=18467233280i,capacity=90i,checkpoint=0i,dedup=1,expand=0i,fragmentation=68i,free=1933861376i,health=2i,size=20401094656i 1612641748237439221
zpool,alternative_root=-,pool=testpool333 allocated=139776i,capacity=0i,checkpoint=0i,dedup=1,expand=0i,fragmentation=0i,free=10200407552i,health=0i,size=10200547328i 1612641748237439221
zpool_device,device=testpool111,pool=testpool111 checksum_errors=0i,health=5i,notes="insufficient replicas",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk3.img,pool=testpool111 checksum_errors=0i,health=5i,notes="corrupted data",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk4.img,pool=testpool111 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_errors,pool=testpool111 errors="No known data errors",errors_found=0i 1612641748251082882
zpool_scrub,pool=testpool222 bytes_repaired=0u,errors_found=0i 1612641748251082882
zpool_device,device=testpool222,pool=testpool222 checksum_errors=0i,health=2i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=mirror-0,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk1.img,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk2.img,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=mirror-1,pool=testpool222 checksum_errors=0i,health=2i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk5.img,pool=testpool222 checksum_errors=18i,health=5i,notes="corrupted data",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk6.img,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_errors,pool=testpool222 errors="No known data errors",errors_found=0i 1612641748251082882
zpool_device,device=testpool333,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=mirror-0,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk7.img,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk8.img,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_errors,pool=testpool333 errors="No known data errors",errors_found=0i 1612641748251082882
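
As a rough illustration of how a zpool list -H -p row relates to a zpool line in the output above, the following Python sketch rebuilds one such line. It is an approximation written for this README, not the tool's actual code: the function names, the COLUMNS tuple, and the health_codes parameter are all hypothetical. It assumes the column order seen in the sample rows, treats "-" as 0 for numeric columns (as the sample output does), and takes the state-to-integer mapping as an argument.

```python
# Assumed column order of `zpool list -H -p`, named after the fields and tags
# that appear in the sample line protocol.
COLUMNS = ("pool", "size", "allocated", "free", "checkpoint", "expand",
           "fragmentation", "capacity", "dedup", "health", "alternative_root")
INT_COLUMNS = ("size", "allocated", "free", "checkpoint", "expand",
               "fragmentation", "capacity")


def escape_tag(value):
    # Line protocol requires commas, equals signs and spaces in tag values
    # to be backslash-escaped.
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def zpool_list_to_line(row, timestamp_ns, health_codes):
    """Turn one `zpool list -H -p` row into a line of Influx line protocol."""
    values = dict(zip(COLUMNS, row.split()))
    # Integer fields carry an "i" suffix; "-" (not applicable) becomes 0.
    fields = {c: "%di" % (0 if values[c] == "-" else int(values[c]))
              for c in INT_COLUMNS}
    fields["dedup"] = "%g" % float(values["dedup"])  # float field, no suffix
    fields["health"] = "%di" % health_codes.get(values["health"], 99)
    tags = {"alternative_root": values["alternative_root"],
            "pool": values["pool"]}
    tag_part = ",".join("%s=%s" % (k, escape_tag(v))
                        for k, v in sorted(tags.items()))
    field_part = ",".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
    return "zpool,%s %s %d" % (tag_part, field_part, timestamp_ns)


row = "testpool333\t10200547328\t139776\t10200407552\t-\t-\t0\t0\t1.00\tONLINE\t-"
print(zpool_list_to_line(row, 1612641748237439221, {"ONLINE": 0}))
```

Splitting on whitespace keeps the sketch short, but it would break on an alternative root path containing spaces; splitting on tabs would be more robust for real -H output.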

Telegraf Run Example

This is a sample telegraf exec input that assumes the binary has been installed to /usr/local/bin/telegraf-exec-zpool-status and the TextFSM template to /etc/telegraf/zpool_status_template.txt:

[[inputs.exec]]                                                                 
  commands = ["/usr/local/bin/telegraf-exec-zpool-status --template=/etc/telegraf/zpool_status_template.txt"]
  timeout = "5s"                                                                
  data_format = "influx"      

Then in InfluxDB:

> show field keys from zpool
name: zpool
fieldKey      fieldType
--------      ---------
allocated     integer
capacity      integer
checkpoint    integer
dedup         float
expand        integer
fragmentation integer
free          integer
health        integer
size          integer
> show tag keys from zpool
name: zpool
tagKey
------
alternative_root
host
pool
> show field keys from zpool_device
name: zpool_device
fieldKey        fieldType
--------        ---------
checksum_errors integer
health          integer
notes           string
read_errors     integer
slow_ios        integer
write_errors    integer
> show tag keys from zpool_device
name: zpool_device
tagKey
------
device
host
pool
> show field keys from zpool_scrub
name: zpool_scrub
fieldKey       fieldType
--------       ---------
bytes_repaired integer
errors_found   integer
> show tag keys from zpool_scrub
name: zpool_scrub
tagKey
------
host
pool
> show field keys from zpool_errors
name: zpool_errors
fieldKey     fieldType
--------     ---------
errors       string
errors_found integer
> show tag keys from zpool_errors
name: zpool_errors
tagKey
------
host
pool

Health Mapping

To facilitate graphing, I express the health as an integer. Based on the man page, I identified the following states to map:

State      Integer
-----      -------
ONLINE     0
OFFLINE    1
DEGRADED   2
FAULTED    3
REMOVED    4
UNAVAIL    5
SUSPENDED  6

The default value if a match isn't found is 99.
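
The mapping and its fallback can be sketched in Python as a dictionary lookup with a default. The names HEALTH_CODES, UNKNOWN_HEALTH, and health_to_int are hypothetical and are not taken from the tool's source. The lookup here is an exact, case-sensitive match, which is an assumption about how the tool compares states.

```python
# State-to-integer mapping from the table above.
HEALTH_CODES = {
    "ONLINE": 0,
    "OFFLINE": 1,
    "DEGRADED": 2,
    "FAULTED": 3,
    "REMOVED": 4,
    "UNAVAIL": 5,
    "SUSPENDED": 6,
}
UNKNOWN_HEALTH = 99  # value used when a state is not in the table


def health_to_int(state):
    """Map a zpool health string to its integer code, or 99 if unrecognized."""
    return HEALTH_CODES.get(state, UNKNOWN_HEALTH)


for state in ("ONLINE", "SUSPENDED", "SOMETHING_ELSE"):
    print(state, health_to_int(state))
```

Keeping 99 well away from the 0 to 6 range makes an unrecognized state easy to spot on a graph or to alert on.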

Future Work

Once influxdata/telegraf#6724 is merged, the zpool list -H -p functionality will be redundant and a native telegraf plugin could be used instead. However, the zpool status -s -p functionality is still outstanding and would be a useful addition.

Tests should be added, especially considering how sensitive the parsing is to the exact output format.
