This directory contains the collectors that gather data from a given MongoDB instance.
- CurrentOp Collector
- Oplog status Collector
- Replication status Collector
- Sharding status Collector
- Top command Collector
- Rollback status Collector
- LVM snapshot status Collector
- Instance status Collector
CurrentOp collector collects details of the operations currently running on the given MongoDB, using the currentOp command.
It can be used to track slow queries, long-running queries, and other in-progress operations.
Metrics ending in `_total`, such as `slow_query_count_total`, are totals for the whole server. All other metrics are exported with the `database` and `collection` labels.
The collector collects the following metrics:
- slow_query_count(_total): The number of running slow queries.
- longest_running_query_secs(_total): The longest running query in seconds.
- collscan_count(_total): The number of running collscan queries.
- waiting_for_lock_count(_total): The number of queries that are waiting for a lock.
- waiting_for_latch_count(_total): The number of queries that are waiting for a latch.
- waiting_for_flow_control_count(_total): The number of queries that are waiting for flow control.
- transaction_count(_total): The number of running transactions.
Query example:
db.adminCommand({aggregate: 1, pipeline: [{$currentOp: {}}, {$match: ...}], cursor: {}})
Result example:
{
  cursor: {
    firstBatch: [
      {
        op: "query",
        microsecs_running: 1000000,
        secs_running: 1,
        ns: "test.test",
        command: {
          find: "test",
          filter: {
            _id: 1
          }
        },
        msg: "some message",
        planSummary: "IDHACK",
        waitingForLock: false,
        waitingForLatch: null,
        waitingForFlowControl: false,
        transaction: null,
      },
      ...
    ]
  }
}
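As a rough illustration of how a result like the one above could be turned into per-namespace metrics, here is a minimal Go sketch. It is not the code in currentop.go: the `Op` and `Summary` types, the `Summarize` function, and the `slowSecs` threshold are hypothetical names chosen for this example, and the real collector may define "slow" differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Op mirrors a subset of the fields in the result example above.
// The type itself is hypothetical.
type Op struct {
	NS             string // "database.collection"
	SecsRunning    int64
	PlanSummary    string
	WaitingForLock bool
}

// Summary holds per-namespace values for a few of the listed metrics.
type Summary struct {
	SlowQueryCount      int
	LongestRunningSecs  int64
	CollscanCount       int
	WaitingForLockCount int
}

// Summarize groups operations by namespace. slowSecs is an assumed
// threshold; the collector's real threshold is not stated in this document.
func Summarize(ops []Op, slowSecs int64) map[string]Summary {
	out := make(map[string]Summary)
	for _, op := range ops {
		s := out[op.NS]
		if op.SecsRunning >= slowSecs {
			s.SlowQueryCount++
		}
		if op.SecsRunning > s.LongestRunningSecs {
			s.LongestRunningSecs = op.SecsRunning
		}
		if strings.HasPrefix(op.PlanSummary, "COLLSCAN") {
			s.CollscanCount++
		}
		if op.WaitingForLock {
			s.WaitingForLockCount++
		}
		out[op.NS] = s
	}
	return out
}

func main() {
	ops := []Op{
		{NS: "test.test", SecsRunning: 1, PlanSummary: "IDHACK"},
		{NS: "test.test", SecsRunning: 12, PlanSummary: "COLLSCAN"},
		{NS: "test.other", SecsRunning: 3, PlanSummary: "IXSCAN { a: 1 }", WaitingForLock: true},
	}
	for ns, s := range Summarize(ops, 10) {
		fmt.Printf("%s %+v\n", ns, s)
	}
}
```

Summing the per-namespace values would give the corresponding `_total` figures for the server.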
source code: currentop.go #L38
Oplog status collector collects the oplog status from local.oplog.rs. It will be automatically disabled if the given MongoDB is mongos.
The collector collects the following metrics:
- logSizeMB: The size of the oplog in megabytes.
- usedMB: The used size of the oplog in megabytes.
- firstTs: The timestamp of the first entry in the oplog.
- lastTs: The timestamp of the last entry in the oplog.
- timeDiff: The time difference between the first and last entry in the oplog.
The `timeDiff` metric is the time range that the oplog currently covers. It indicates how far back PITR (Point-in-Time Recovery) can reach and how much replication lag a member can tolerate before it falls off the oplog.
Query example:
use local
var firstElem = db.oplog.rs.find().sort({$natural: 1}).limit(1)
var lastElem = db.oplog.rs.find().sort({$natural: -1}).limit(1)
var firstTs = firstElem[0].ts
var lastTs = lastElem[0].ts
var timeDiff = lastTs.t - firstTs.t
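The same subtraction can be sketched in Go. This is only an illustration: the `Timestamp` type below is a local stand-in for a BSON timestamp and `OplogWindow` is a hypothetical helper, not a function from oplog.go.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamp is a local stand-in for a BSON timestamp: T is seconds since
// the Unix epoch, I is an ordinal that orders operations within one second.
type Timestamp struct {
	T uint32
	I uint32
}

// OplogWindow returns the time range covered by the oplog, the equivalent
// of lastTs.t - firstTs.t in the query example above.
func OplogWindow(first, last Timestamp) time.Duration {
	return time.Duration(int64(last.T)-int64(first.T)) * time.Second
}

func main() {
	first := Timestamp{T: 1700000000, I: 1}
	last := Timestamp{T: 1700086400, I: 5}
	fmt.Println(OplogWindow(first, last)) // prints 24h0m0s
}
```

Converting to `int64` before subtracting avoids unsigned wrap-around if the two values are ever passed in the wrong order.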
source code: oplog.go #L37
Replication status collector collects the replication status from the given MongoDB.
The collector collects the following metrics:
- heartbeat_delay: The delay of the heartbeat in seconds.
- lag: The replication lag in seconds.
- odd_state: Whether the server is in an odd state, meaning any state other than PRIMARY, SECONDARY, or STARTUP2.
- elected_before_secs: The time in seconds since the last election.
- version: The version of the replica set configuration.
- term: The term of the replica set.
- protocolVersion: The protocol version of the replica set.
- arbiterOnly: The arbiterOnly setting of the member.
- buildIndexes: The buildIndexes setting of the member.
- hidden: The hidden setting of the member.
- priority: The priority of the member.
- votes: The votes of the member.
- role: The server's role in the replica set. It is exported with the label `role`, whose value can be `primary`, `secondary`, or `other`.
Query example:
var replStatus = db.adminCommand({replSetGetStatus: 1, initialSync: 1})
var replConfig = db.adminCommand({replSetGetConfig: 1})
The output formats are described in the MongoDB documentation for replSetGetStatus and replSetGetConfig.
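To make the `odd_state` and `lag` definitions concrete, here is a minimal Go sketch. The `Member` type and both helpers are hypothetical, and the lag formula (primary `optimeDate` minus member `optimeDate`) is an assumption based on the usual definition; repl.go may compute it differently.

```go
package main

import (
	"fmt"
	"time"
)

// Member mirrors a subset of a members[] entry in replSetGetStatus output.
// The type is hypothetical and only carries what this sketch needs.
type Member struct {
	Name       string
	StateStr   string
	OptimeDate time.Time
}

// IsOddState applies the odd_state rule described above.
func IsOddState(stateStr string) bool {
	switch stateStr {
	case "PRIMARY", "SECONDARY", "STARTUP2":
		return false
	}
	return true
}

// LagSeconds returns how far m is behind the primary's last applied
// operation. The second result is false when no primary is present.
func LagSeconds(members []Member, m Member) (float64, bool) {
	for _, p := range members {
		if p.StateStr == "PRIMARY" {
			return p.OptimeDate.Sub(m.OptimeDate).Seconds(), true
		}
	}
	return 0, false
}

func main() {
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	members := []Member{
		{Name: "a:27017", StateStr: "PRIMARY", OptimeDate: now},
		{Name: "b:27017", StateStr: "SECONDARY", OptimeDate: now.Add(-3 * time.Second)},
		{Name: "c:27017", StateStr: "RECOVERING", OptimeDate: now.Add(-90 * time.Second)},
	}
	for _, m := range members {
		lag, ok := LagSeconds(members, m)
		fmt.Println(m.Name, m.StateStr, lag, ok, IsOddState(m.StateStr))
	}
}
```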
source code: repl.go #L35
Sharding status collector collects the sharding status from the config servers (configsvr) and gives an overview of the sharded cluster.
The collector collects the following metrics:
- sharded_databases: The number of sharded databases.
- unsharded_databases: The number of unsharded databases.
- balancer_enabled: Whether the balancer is enabled.
- shards: The number of shards.
- draining_shards: The number of draining shards.
- chunks: The number of chunks. Chunks are exported with the labels `database`, `collection`, and `shard`.
- last_24h_chunk_moves: The number of chunk moves in the last 24 hours. Chunk moves are exported with the labels `database` and `collection`.
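The labelling of the `chunks` metric can be pictured as a group-by over chunk documents. The Go sketch below is illustrative only: the `Chunk` and `ChunkKey` types and `CountChunks` are hypothetical, and it assumes each chunk document carries a `database.collection` namespace string and a shard name, which may not match how sharding.go reads the config database.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical, reduced view of a chunk document.
// It assumes a "database.collection" namespace string is available.
type Chunk struct {
	NS    string
	Shard string
}

// ChunkKey is the label set described above for the chunks metric.
type ChunkKey struct {
	Database   string
	Collection string
	Shard      string
}

// CountChunks counts chunks per (database, collection, shard).
func CountChunks(chunks []Chunk) map[ChunkKey]int {
	out := make(map[ChunkKey]int)
	for _, c := range chunks {
		// Split on the first dot only: collection names may contain dots.
		db, coll, _ := strings.Cut(c.NS, ".")
		out[ChunkKey{Database: db, Collection: coll, Shard: c.Shard}]++
	}
	return out
}

func main() {
	chunks := []Chunk{
		{NS: "app.users", Shard: "shard0"},
		{NS: "app.users", Shard: "shard0"},
		{NS: "app.users", Shard: "shard1"},
	}
	for k, n := range CountChunks(chunks) {
		fmt.Printf("%+v %d\n", k, n)
	}
}
```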
source code: sharding.go #L32
Top command collector collects usage statistics from the given MongoDB using the top command. It covers the operation types `query`, `insert`, `update`, `remove`, `getmore`, and `command`.
The collector collects the following metrics:
- insert_count: The number of insert operations.
- insert_time: The time spent on insert operations.
- queries_count: The number of query operations.
- queries_time: The time spent on query operations.
- update_count: The number of update operations.
- update_time: The time spent on update operations.
- remove_count: The number of remove operations.
- remove_time: The time spent on remove operations.
- getmore_count: The number of getmore operations.
- getmore_time: The time spent on getmore operations.
- commands_count: The number of command operations.
- commands_time: The time spent on command operations.
Query example:
db.adminCommand({top: 1})
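The top command returns a `totals` document keyed by namespace, where each operation type holds a `time` and a `count`. The Go sketch below decodes such a document into the pairs listed above. It is an illustration under assumptions: it uses JSON from the standard library as a stand-in for the BSON the driver would return, and `Usage`, `NSStats`, and `ParseTop` are hypothetical names, not taken from top.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage is one {time, count} pair from the top output.
type Usage struct {
	Time  int64 `json:"time"`
	Count int64 `json:"count"`
}

// NSStats holds the operation types listed above for one namespace.
type NSStats struct {
	Queries  Usage `json:"queries"`
	GetMore  Usage `json:"getmore"`
	Insert   Usage `json:"insert"`
	Update   Usage `json:"update"`
	Remove   Usage `json:"remove"`
	Commands Usage `json:"commands"`
}

// ParseTop decodes the "totals" document into per-namespace statistics.
func ParseTop(raw []byte) (map[string]NSStats, error) {
	var doc struct {
		Totals map[string]json.RawMessage `json:"totals"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	out := make(map[string]NSStats)
	for ns, msg := range doc.Totals {
		var st NSStats
		// Entries that are not documents (for example a note string)
		// fail to decode into the struct and are skipped.
		if err := json.Unmarshal(msg, &st); err != nil {
			continue
		}
		out[ns] = st
	}
	return out, nil
}

func main() {
	raw := []byte(`{"totals": {
		"note": "all times in microseconds",
		"test.test": {"insert": {"time": 10, "count": 2}, "queries": {"time": 40, "count": 5}}
	}}`)
	stats, err := ParseTop(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", stats["test.test"])
}
```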
source code: top.go #L34
Rollback status collector collects the rollback status from the given MongoDB by observing the rollback files of each collection.
A rollback reverts writes on a former primary that had not been replicated to the other members, restoring the data to a previous state.
MongoDB automatically writes rollback files to the {{dbpath}}/rollback directory when a rollback occurs. The collector checks the rollback files of each collection and exports the metric with the `database` and `collection` labels.
The collector collects the following metrics:
- rollback_directory: The directory of the rollback files for each collection.
source code: rollback.go #L36
LVM snapshot status collector collects the percentage of the used LVM snapshot space. If the LVM snapshot space reaches 100%, your backup will fail.
The collector collects the following metrics:
- snapshot_allocation: The percentage of the used LVM snapshot space.
Commands used:
df | awk '/<your snapshot area>$/'
sudo lvs | awk '$6!~/[^0-9.]/&&$6>0{print$6}'
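The awk filter on the `lvs` output keeps the sixth whitespace-separated field of a line when it is a plain positive number. A rough Go equivalent is sketched below. `SnapshotUsage` is a hypothetical helper, not the code in lvm.go, and the sample output in `main` is made up for illustration; the column layout of `lvs` depends on the installed version and options.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SnapshotUsage approximates the awk filter above: for every line it takes
// the sixth whitespace-separated field and keeps it when it consists only
// of digits and dots and is greater than zero.
func SnapshotUsage(lvsOutput string) []float64 {
	var out []float64
	for _, line := range strings.Split(lvsOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 6 {
			continue
		}
		// Trim leaves a non-empty string if any other character is present.
		if strings.Trim(fields[5], "0123456789.") != "" {
			continue
		}
		v, err := strconv.ParseFloat(fields[5], 64)
		if err != nil || v <= 0 {
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	// Made-up sample output, for illustration only.
	sample := `  LV   VG  Attr       LSize   Pool Origin Data%
  data vg0 owi-aos--- 100.00g
  snap vg0 swi-a-s---  10.00g      data   12.34`
	fmt.Println(SnapshotUsage(sample)) // prints [12.34]
}
```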
source code: lvm.go #L38
Instance status collector collects the binary version of the MongoDB instance.
The collector collects the following metrics:
- version: The binary version of the MongoDB instance.
Query example:
db.adminCommand({buildInfo: 1})
source code: instance.go #L39