Recipe: Rollback During Startup
This recipe shows that a former leader rolls back its stale entries when it crashed with uncommitted entries and its log was more advanced than the logs of all its followers. On restart, this leader is the last peer to start, so it cannot win the election. Once it is able to join, the cluster’s new term value must force the old leader to roll back its uncommitted entries from the (old) term in which it was the leader.
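The rollback can be illustrated with a toy log model. This is a minimal sketch of the general Raft log-matching idea, not the project's implementation: `Entry` and `reconcile` are hypothetical names, the terms 1, 2 and 3 are made up, and the leader's log is assumed to be authoritative and available in full.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    idx: int   # position in the log
    term: int  # term in which the entry was written


def reconcile(local: List[Entry], leader: List[Entry]) -> List[Entry]:
    """Return the local log after adopting the current leader's log.

    Entries are kept up to the first index where the terms differ.
    Everything from that index onward is rolled back and replaced
    with the leader's entries.
    """
    keep = 0
    for mine, theirs in zip(local, leader):
        if mine.term != theirs.term:
            break
        keep += 1
    return local[:keep] + leader[keep:]


# The old leader wrote idx 1 in term 2 but crashed before it was committed.
stale_log = [Entry(0, 1), Entry(1, 2)]
# The new leader wrote its own idx 1 in the newer term 3.
current_log = [Entry(0, 1), Entry(1, 3)]

rejoined_log = reconcile(stale_log, current_log)
```

After `reconcile`, the rejoined peer holds the new leader's entry at idx 1 and the uncommitted entry from the old term is gone.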
This recipe shares initial steps with Completing an Uncommitted Write Following a Reboot
This recipe begins with the preparation and the first 3 steps of the recipe Completing an Uncommitted Write Following a Reboot.
Be sure to capture the leader UUID from “Completing an Uncommitted Write Following a Reboot” and the default client request timeout (which should have been set to 1 second).
"pumice_db_test_client" : {
"pmdb-test-apps" : [
{
"app-user-id" : "0771f672-0748-11eb-a0df-90324b2d1e89:0:0:0:0",
"status" : "Connection timed out",
"pmdb-seqno" : 0,
"pmdb-write-pending" : false,
"last-request" : "Tue Oct 06 21:40:36 UTC 2020",
"last-request-duration-ms" : 3011,
"last-request-tag" : 1539832552,
"app-sync" : false,
"app-seqno" : 1,
"app-value" : 1983817102,
"app-validated-seqno" : 0
}
],
"pmdb-request-history" : [
{
"app-user-id" : "0771f672-0748-11eb-a0df-90324b2d1e89:0:0:0:0",
"op" : "write",
"status" : "Connection timed out",
"pmdb-req-seqno" : 0,
"pmdb-seqno" : 0,
"pmdb-write-pending" : false,
"submitted-time" : "Tue Oct 06 21:40:36 UTC 2020",
"duration-ms" : 1011,
"last-request-tag" : 1539832552,
"app-seqno" : 1,
"app-value" : 1983817102
}
],
Poll waiting for the completion of the election.
Note the new leader UUID and Term.
Verify that the cluster is in a sane state where the running followers and leader agree on these /raft_root_entry/ KVs:
- "term" :
- "commit-idx" : 1
- "last-applied" : 1
- "last-applied-cumulative-crc" :
- "newest-entry-idx" :
- "newest-entry-term" :
- "newest-entry-data-size" :
- "newest-entry-crc" :
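The agreement check on the KVs listed above could be scripted roughly as follows. This is a sketch only: it assumes each running peer's first `raft_root_entry` element has already been loaded into a Python dict, and `disagreements` and `is_sane` are hypothetical helper names. How the JSON is obtained from each peer is left out.

```python
from typing import Dict, List

# Keys on which the running followers and the leader must agree.
AGREED_KEYS = [
    "term",
    "commit-idx",
    "last-applied",
    "last-applied-cumulative-crc",
    "newest-entry-idx",
    "newest-entry-term",
    "newest-entry-data-size",
    "newest-entry-crc",
]


def disagreements(entries: List[dict]) -> Dict[str, list]:
    """Map each key on which the peers differ to the values observed."""
    diffs = {}
    for key in AGREED_KEYS:
        values = [entry.get(key) for entry in entries]
        if any(value != values[0] for value in values):
            diffs[key] = values
    return diffs


def is_sane(entries: List[dict]) -> bool:
    """True when all peers agree and commit-idx and last-applied are 1."""
    if not entries or disagreements(entries):
        return False
    return entries[0].get("commit-idx") == 1 and entries[0].get("last-applied") == 1
```

Returning the differing keys together with their values, rather than a bare boolean, makes a failed verification easier to diagnose.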
## 6. Start the Last Peer (which was the leader from Step #1)
Poll, waiting for it to become a follower and for its commit-idx to become 1.
"raft_root_entry" : [
{
"raft-uuid" : "2b310920-081c-11eb-811b-90324b2d1e89",
"peer-uuid" : "2b31df8a-081c-11eb-bc78-90324b2d1e89",
"voted-for-uuid" : "00000000-0000-0000-0000-000000000000",
"leader-uuid" : "2b3271b6-081c-11eb-890a-90324b2d1e89",
"state" : "follower",
"follower-reason" : "leader-already-present",
"client-requests" : "redirect-to-leader",
"term" : 18,
"commit-idx" : 1,
"last-applied" : 1,
"last-applied-cumulative-crc" : 2733010441,
"newest-entry-idx" : 1,
"newest-entry-term" : 18,
"newest-entry-data-size" : 0,
"newest-entry-crc" : 3109780162,
"dev-read-latency-usec" : {},
"dev-write-latency-usec" : {
"1024" : 1
}
}
],
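The polling in this step might look like the sketch below. `fetch` is a hypothetical callable, supplied by the test harness, that returns the restarted peer's parsed JSON status containing the `raft_root_entry` array shown above. The timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_follower(
    fetch: Callable[[], dict],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> Optional[dict]:
    """Poll until the peer is a follower with commit-idx 1.

    Returns the matching raft_root_entry element, or None on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        entry = fetch()["raft_root_entry"][0]
        if entry.get("state") == "follower" and entry.get("commit-idx") == 1:
            return entry
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

The status is fetched at least once even with a zero timeout, so a peer that has already caught up is reported immediately.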
No new object should have been written in this recipe since the request had timed out before the cluster could commit the write. Therefore, issuing a read operation for the object should result in an error of “No such file or directory”.
Execute Step #8 from “Completing an Uncommitted Write Following a Reboot” but use these verifications instead:
- "pmdb-seqno" : 0,
- "app-user-id" : RNCUI used in write request
- "op" : "read",
- "status" : "No such file or directory",
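These verifications could be checked against the newest `pmdb-request-history` element with something like the sketch below. `check_read_result` is a hypothetical helper, and the request dict is assumed to have been parsed from the client's JSON output already.

```python
from typing import List


def check_read_result(request: dict, rncui: str) -> List[str]:
    """Return a list of mismatches; an empty list means the read failed as expected."""
    expected = {
        "pmdb-seqno": 0,
        "app-user-id": rncui,  # RNCUI used in the write request
        "op": "read",
        "status": "No such file or directory",
    }
    return [
        f"{key}: expected {want!r}, got {request.get(key)!r}"
        for key, want in expected.items()
        if request.get(key) != want
    ]
```

An empty result confirms that the timed-out write left no object behind.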