Metrics
Bob uses its own metrics implementation, which sends data via the plaintext Graphite protocol. Three metric types are used:
- counter - cumulative value, always increments
- gauge - uses the last set value
- histogram (since metrics 0.13.0; timing was used before) - sends the average value (computed over all values collected since the previous metrics send)
It uses this implementation of the exporter.
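As an illustration of the plaintext Graphite protocol (one `metric.path value timestamp` line per sample), here is a minimal sketch that pushes a single counter value; the metric path and endpoint are hypothetical, not Bob's actual names:

```rust
use std::io::Write;
use std::net::TcpStream;
use std::time::{SystemTime, UNIX_EPOCH};

fn main() -> std::io::Result<()> {
    // Hypothetical carbon endpoint; 2003 is the default plaintext port.
    let mut stream = TcpStream::connect("127.0.0.1:2003")?;
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_secs();
    // Plaintext Graphite protocol: "<metric.path> <value> <timestamp>\n".
    // The metric path "node1.client.put_count" is made up for this example.
    writeln!(stream, "node1.client.put_count 50 {}", ts)?;
    Ok(())
}
```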
client - metrics related to FORCE operations (a FORCE operation is an operation requested by one node to send data to another node; such operations are performed during data replication)
- put_count
- put_error_count
- put_timer
- get_count
- get_error_count
- get_timer
- exist_count
- exist_error_count
- exist_timer
Imagine that you have a cluster with 2 nodes:
- node1
- node2
Data is not replicated (quorum = 1) but sharded (the shard is defined by a key % 2 operation). You use only node1 to put data.
You put 100 records (keys 1 to 100 inclusive), so node1 will put 50 records locally and send 50 requests to put data to node2. node2's put_count will be 50 due to these requests (if errors occur, put_count will be less, and put_error_count = 50 - put_count for this example).
All client metrics for node1 will be equal to 0.
For get/exist operations the metrics are counted in the same way.
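A small sketch of that accounting, assuming keys with key % 2 == 1 belong to node1 and the rest to node2 (the actual shard-to-node mapping depends on the cluster configuration):

```rust
fn main() {
    let (mut local_puts, mut force_puts) = (0u32, 0u32);
    for key in 1u64..=100 {
        // Assumed mapping: shard key % 2 == 1 -> node1 (local), otherwise -> node2.
        if key % 2 == 1 {
            local_puts += 1; // stored by node1 itself, no client metric involved
        } else {
            force_puts += 1; // FORCE put sent from node1 to node2
        }
    }
    // node2 counts the FORCE puts it serves; node1 only issued them,
    // so its own client metrics stay at 0.
    println!("node2 client.put_count = {force_puts} (errors would move some of these to put_error_count)");
    println!("node1 client.put_count = 0, node1 local puts = {local_puts}");
}
```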
grinder - cluster request metrics related to the current node (note: a put operation is counted as failed (put_error_count increments) only if the remote alien put operation failed AND the local alien put operation also failed)
- put_count
- put_error_count
- put_timer
- get_count
- get_error_count
- get_timer
- exist_count
- exist_error_count
- exist_timer
Imagine that you have a cluster with 2 nodes:
- node1
- node2
Data is not replicated (quorum = 1) but sharded (the shard is defined by a key % 2 operation). You use only node1 to put data.
You put 100 records (keys 1 to 100 inclusive), so node1 will handle 100 cluster put operations and its grinder.put_count will be equal to 100 (in case of errors it will be equal to 100 - [amount of errors]).
node2 will have grinder.put_count equal to 0.
For get/exist operations the metrics are counted in the same way.
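A rough sketch of how grinder.put_count and grinder.put_error_count add up on node1 under these assumptions; the number of failed records below is hypothetical:

```rust
fn main() {
    // All 100 cluster put requests go to node1; node2 receives none.
    let total_requests = 100u32;
    // Hypothetical: for 2 records both the remote alien put and the local
    // alien put failed, so only those 2 count as grinder put errors.
    let failed_both_alien_puts = 2u32;

    let put_error_count = failed_both_alien_puts;
    let put_count = total_requests - put_error_count;
    println!("node1: grinder.put_count = {put_count}, grinder.put_error_count = {put_error_count}");
    println!("node2: grinder.put_count = 0");
}
```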
pearl - counts ALL disk operations related to the current node
- put_count
- put_error_count
- put_timer
- get_count
- get_error_count
- get_timer
Imagine that you have a cluster with 2 nodes:
- node1
- node2
Data is not replicated (quorum = 1) but sharded (the shard is defined by a key % 2 operation), BUT node2 is not accessible. You use only node1 to put data.
You put 100 records (keys 1 to 100 inclusive), so node1 will put 50 keys locally and 50 keys in its local alien (due to node2 being inaccessible). There are 50 + 50 = 100 put operations on disk, so pearl.put_count = 100 (in case of errors it will be equal to 100 - [amount of errors]).
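The same 100 puts seen from the disk's point of view, as a small simulation (same assumed key % 2 mapping as above):

```rust
fn main() {
    let (mut normal_puts, mut alien_puts) = (0u32, 0u32);
    for key in 1u64..=100 {
        // Assumed mapping: shard key % 2 == 1 -> node1, otherwise -> node2.
        // node2 is unreachable, so its records land in node1's local alien.
        if key % 2 == 1 {
            normal_puts += 1;
        } else {
            alien_puts += 1;
        }
    }
    // Every record becomes exactly one disk put on node1, so pearl.put_count
    // sums both kinds of writes.
    println!("node1 pearl.put_count = {}", normal_puts + alien_puts); // 100
}
```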
For get operations these metrics are more complicated.
A holder in bob is an instance of Pearl storage responsible for one concrete timestamp. If you already have holders for 10 timestamps (10 holders), then a single get operation (for any key) will produce:
- get_count = 0, get_error_count = 10, if the data doesn't exist in bob;
- get_count = 1, get_error_count = e (e in [0, 9], depending on how many holders are scanned before the record is found), if the data exists in the storage.
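A sketch of that accounting for a node with 10 holders; the holder index at which the record is found is hypothetical:

```rust
/// Returns (get_count, get_error_count) for a single GET, given the index of
/// the holder that contains the record (None if no holder has it).
fn get_metrics(found_in_holder: Option<usize>, total_holders: usize) -> (u32, u32) {
    match found_in_holder {
        // The `i` holders scanned before the hit each count as one error.
        Some(i) => (1, i as u32),
        // Miss in every holder: each lookup counts as an error.
        None => (0, total_holders as u32),
    }
}

fn main() {
    let holders = 10;
    println!("record absent: {:?}", get_metrics(None, holders));           // (0, 10)
    println!("record in 4th holder: {:?}", get_metrics(Some(3), holders)); // (1, 3)
}
```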
There is also one gauge value (nodes_number), which at any moment shows the number of nodes accessible from the current node.
backend - metrics that describe the node backend state and stats
| Metric | Description |
|---|---|
| backend_state | 0 - starting, 1 - started |
| blob_count | blob files count (doesn't include aliens) |
| alien_count | alien blob files count |
| index_memory | RAM occupied by indexes |
| active_disks | number of active disks |
| disks.diskX | state of diskX (0 - not ready, 2 - initialized, 3 - works) |