Recipe: PMDB Coalesced Write Discard Old Leader Uncommitted Writes

Paul Nowoczynski edited this page Sep 29, 2021 · 1 revision

Default Parent Recipe: “Healthy Raft Server Cluster: Type 1”

Compatibility: pumicedb-server-test

Additional Requirements:

  • pumicedb-client-test

Objective:

Create a scenario in which the PMDB client's write requests expire because the leader is paused.

Preparation

Store the leader-uuid, term and commit-idx.

Select the leader that will be paused. Before pausing it, enable the coalesced_writes fault injection point on that leader so that the client's write requests expire. Apply the following fault to the leader that will be paused:

APPLY enabled@true
WHERE /fault_injection_points/name@coalesced_writes
OUTFILE /err.out

After applying the fault, verify that /fault_injection_points/coalesced_writes/enabled == true.
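The three-line APPLY / WHERE / OUTFILE layout recurs throughout this recipe, so a test harness may want to generate it from one place. The following is a minimal sketch; the helper name `build_apply_cmd` is hypothetical and not part of PumiceDB, and how the text is delivered to the server process is left out because this page does not describe it.

```python
def build_apply_cmd(apply: str, where: str, outfile: str) -> str:
    """Return the three-line command text in the layout used by this recipe.

    Hypothetical helper: it only formats text and does not talk to any process.
    """
    for field in (apply, where, outfile):
        # Each field must stay on its own line, so reject embedded newlines.
        if "\n" in field:
            raise ValueError("command fields must not contain newlines")
    return f"APPLY {apply}\nWHERE {where}\nOUTFILE {outfile}\n"


# The fault-injection command from the Preparation step.
fault_cmd = build_apply_cmd(
    "enabled@true",
    "/fault_injection_points/name@coalesced_writes",
    "/err.out",
)
```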

1. Start the PumiceDB Client

The pumicedb-client-test should be newly spawned in this recipe. The recipe should be willing to wait for a few seconds for the client to report "leader-viable" : true.

{
	"raft_client_root_entry" : [
                {
                        "raft-uuid" : "3f266232-fde4-11ea-86f8-90324b2d1e89",
                        "client-uuid" : "3f28d148-fde4-11ea-9c5f-90324b2d1e89",
                        "leader-uuid" : "3f27d9fa-fde4-11ea-a172-90324b2d1e89",
                        "state" : "client",
                        "commit-latency-msec" : {},
                        "read-latency-msec" : {},
                        "leader-viable" : true,
                        "leader-alive-cnt" : 82,
                        "last-request-sent" : "Thu Jan 01 00:00:00 UTC 1970",
                        "last-request-ack" : "Thu Jan 01 00:00:00 UTC 1970",
                        "recent-ops-wr" : [],
                        "recent-ops-rd" : []
                }
        ]
}

leader-viable tells the recipe that the client has been in recent contact with the leader; it is a general indicator of health.
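Waiting "a few seconds" for leader-viable can be expressed as a bounded polling loop. The sketch below assumes the recipe can obtain the client's JSON output as a string; the function names are hypothetical, and the caller supplies whatever callable actually fetches that output.

```python
import json
import time
from typing import Callable


def leader_viable(ctl_output: str) -> bool:
    """Return True if the first client entry reports "leader-viable" : true."""
    doc = json.loads(ctl_output)
    entries = doc.get("raft_client_root_entry", [])
    return bool(entries) and entries[0].get("leader-viable") is True


def wait_until(check: Callable[[], bool],
               timeout_sec: float = 5.0,
               interval_sec: float = 0.1) -> bool:
    """Call check() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_sec)
```

A recipe would pass something like `lambda: leader_viable(fetch_output())` to `wait_until`, where `fetch_output` stands in for however the harness reads the client's output.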

2. Modify the default request timeout on client

After the client has been started, lower the timeout from its default to a more test-friendly value such as 1 second:

APPLY default-request-timeout-sec@1
WHERE /raft_client_root_entry/default-request-timeout-sec
OUTFILE /err.out

Verify that the default-request-timeout-sec key has been modified accordingly.

{
        "raft_client_root_entry" : [
                {
                        "raft-uuid" : "7cbcf2fc-f522-11ea-8890-90324b2d1e89",
                        "client-uuid" : "7cbeb470-f522-11ea-bfde-90324b2d1e89",
                        "leader-uuid" : "7cbe3810-f522-11ea-bae4-90324b2d1e89",
                        "state" : "client",
                        "default-request-timeout-sec" : 1,
                        "commit-latency-msec" : {},
                        "read-latency-msec" : {},
                        "leader-viable" : false,
                        "leader-alive-cnt" : 0,
                        "last-request-sent" : "Thu Jan 01 00:00:00 UTC 1970",
                        "last-request-ack" : "Thu Jan 01 00:00:00 UTC 1970",
                        "recent-ops-wr" : [],
                        "recent-ops-rd" : []
                }
        ]
}

3. Issue the Writes from the Client with app-uuid1

Issue a write command from the client (the recipe should generate its own UUID to replace the one below):

APPLY input@00000000-ffff-ffff-ffff-ffffffffffff:0:0:0:0.write:1
WHERE /pumice_db_test_client/input
OUTFILE /pmdb-write.out

Wait at least 2 seconds (this should be enough time for the request to have expired).
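Since the recipe has to supply its own UUID, the write command can be generated rather than copied. This is a minimal sketch: the helper name is hypothetical, the `:0:0:0:0.write:1` suffix is copied verbatim from the command above without interpreting its fields, and the use of a version-1 UUID is an assumption.

```python
import uuid

# Copied verbatim from the example command; its fields are not interpreted here.
WRITE_SUFFIX = ":0:0:0:0.write:1"


def make_write_cmd(app_uuid: str, outfile: str = "/pmdb-write.out") -> str:
    """Format the write command for the given application UUID."""
    return (
        f"APPLY input@{app_uuid}{WRITE_SUFFIX}\n"
        "WHERE /pumice_db_test_client/input\n"
        f"OUTFILE {outfile}\n"
    )


# Generate a fresh UUID for this write and remember it for later verification.
app_uuid1 = str(uuid.uuid1())
write_cmd = make_write_cmd(app_uuid1)
```

Keeping `app_uuid1` around lets the later verification steps match this write against the entries in recent-ops-wr.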

4. Set the election-timeout on all followers

Set election-timeout-ms to 2 on each follower:

APPLY election-timeout-ms@2
WHERE /raft_net_info/election-timeout-ms
OUTFILE /err.out

5. Pause the Leader.

5a - Verification

  • /raft_client_root_entry/0/recent-ops-wr/*/status : "Connection timed out"

 "raft_client_root_entry" : [
                {
                        "raft-uuid" : "b4deecc6-1b82-11ec-bdbd-8761d6acdca6",
                        "client-uuid" : "b569722e-1b82-11ec-b8fa-cb98d01e6c03",
                        "leader-uuid" : "b54052ea-1b82-11ec-bb9d-139055d36d46",
                        "state" : "client",
                        "default-request-timeout-sec" : 1,
                        "commit-latency-msec" : {},
                        "read-latency-msec" : {},
                        "leader-viable" : true,
                        "leader-alive-cnt" : 7,
                        "last-request-sent" : "Wed Sep 22 08:56:19 UTC 2021",
                        "last-request-ack" : "Thu Jan 01 00:00:00 UTC 1970",
                        "recent-ops-wr" : [
                                {
                                        "sub-app-user-id" : "cea81092-1b82-11ec-aaa3-5fc18afac25c:0:0:0:0",
                                        "rpc-id" : 6468388818436227143,
                                        "rpc-user-tag" : 1111036264,
                                        "blocking" : false,
                                        "status" : "Connection timed out",
                                        "server" : "0.0.0.0:0",
                                        "submitted" : "Wed Sep 22 08:56:19 UTC 2021",
                                        "attempts" : 1,
                                        "completion-time-ms" : 0,
                                        "timeout-ms" : 1000,
                                        "reply-size" : 0,
                                        "op" : "write"
                                }
                        ],
                        "recent-ops-rd" : [],
                        "pending-ops" : []
                }
        ],
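The verification path above uses a `*` wildcard to cover every entry in recent-ops-wr. One way a recipe could evaluate such a path against parsed JSON is sketched below; `lookup` is a hypothetical helper, and the rule that `*` matches every list item or dictionary value is an assumption about the intended semantics.

```python
from typing import Any, List


def lookup(doc: Any, path: str) -> List[Any]:
    """Return every value matching a slash-separated path.

    A "*" component matches all items of a list or all values of a dict;
    a numeric component indexes into a list; anything else is a dict key.
    """
    nodes = [doc]
    for part in path.strip("/").split("/"):
        next_nodes: List[Any] = []
        for node in nodes:
            if part == "*":
                if isinstance(node, list):
                    next_nodes.extend(node)
                elif isinstance(node, dict):
                    next_nodes.extend(node.values())
            elif isinstance(node, list) and part.isdigit():
                if int(part) < len(node):
                    next_nodes.append(node[int(part)])
            elif isinstance(node, dict) and part in node:
                next_nodes.append(node[part])
        nodes = next_nodes
    return nodes


# Trimmed-down version of the sample output above.
sample = {
    "raft_client_root_entry": [
        {"recent-ops-wr": [{"status": "Connection timed out", "op": "write"}]}
    ]
}
statuses = lookup(sample, "/raft_client_root_entry/0/recent-ops-wr/*/status")
timed_out = bool(statuses) and all(s == "Connection timed out" for s in statuses)
```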

6. Wait until a new leader is elected

6a - Verification

  • /raft_root_entry/0/leader-uuid != <original-leader> (the leader paused in step 5)

7. Reset the client request timeout to its default, i.e. 60 seconds

APPLY default-request-timeout-sec@60
WHERE /raft_client_root_entry/default-request-timeout-sec
OUTFILE /err.out

8. Issue the Writes from the Client with New app-uuid2

Issue a write command from the client (the recipe should generate a new UUID to replace the one below):

APPLY input@00000000-ffff-ffff-ffff-ffffffffffff:0:0:0:0.write:1
WHERE /pumice_db_test_client/input
OUTFILE /pmdb-write.out

9. Resume the Leader (paused in step 5)

9a - Verification

The old leader (paused in step 5) should now become a “follower”:

  • /raft_root_entry/0/state : "follower"

10. Verify that the write with app-uuid2 reports “Success” and the write with app-uuid1 reports “Connection timed out”

 "raft_client_root_entry" : [
		{
			"raft-uuid" : "b4deecc6-1b82-11ec-bdbd-8761d6acdca6",
			"client-uuid" : "b569722e-1b82-11ec-b8fa-cb98d01e6c03",
			"leader-uuid" : "b54052ea-1b82-11ec-bb9d-139055d36d46",
			"state" : "client",
			"default-request-timeout-sec" : 60,
			"commit-latency-msec" : {
				"8" : 1
			},
			"read-latency-msec" : {},
			"leader-viable" : true,
			"leader-alive-cnt" : 5,
			"last-request-sent" : "Wed Sep 22 08:56:29 UTC 2021",
			"last-request-ack" : "Wed Sep 22 08:56:29 UTC 2021",
			"recent-ops-wr" : [
				{
					"sub-app-user-id" : "f9a8cac0-1b82-11ec-a328-c7f169c767bd:0:0:0:0",
					"rpc-id" : 6468388818436227172,
					"rpc-user-tag" : 1227719982,
					"blocking" : false,
					"status" : "Success",
					"server" : "127.0.0.1:12000",
					"submitted" : "Wed Sep 22 08:56:29 UTC 2021",
					"attempts" : 1,
					"completion-time-ms" : 12,
					"timeout-ms" : 60000,
					"reply-size" : 88,
					"op" : "write"
				},
				{
					"sub-app-user-id" : "cea81092-1b82-11ec-aaa3-5fc18afac25c:0:0:0:0",
					"rpc-id" : 6468388818436227143,
					"rpc-user-tag" : 1111036264,
					"blocking" : false,
					"status" : "Connection timed out",
					"server" : "0.0.0.0:0",
					"submitted" : "Wed Sep 22 08:56:19 UTC 2021",
					"attempts" : 1,
					"completion-time-ms" : 0,
					"timeout-ms" : 1000,
					"reply-size" : 0,
					"op" : "write"
				}
			],
			"recent-ops-rd" : [],
			"pending-ops" : []
		}
	],
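The final check pairs each application UUID with the status of its write. A minimal sketch of that comparison follows, using a trimmed copy of the sample output above; the helper name is hypothetical, and it assumes the application UUID is the part of sub-app-user-id before the first colon.

```python
from typing import Dict


def write_status_by_app_uuid(client_entry: dict) -> Dict[str, str]:
    """Map each application UUID in recent-ops-wr to its reported status."""
    statuses: Dict[str, str] = {}
    for op in client_entry.get("recent-ops-wr", []):
        # Assumption: the UUID is everything before the first ':'.
        app_uuid = op["sub-app-user-id"].split(":", 1)[0]
        statuses[app_uuid] = op["status"]
    return statuses


# Values taken from the sample output above.
app_uuid1 = "cea81092-1b82-11ec-aaa3-5fc18afac25c"
app_uuid2 = "f9a8cac0-1b82-11ec-a328-c7f169c767bd"
entry = {
    "recent-ops-wr": [
        {"sub-app-user-id": app_uuid2 + ":0:0:0:0", "status": "Success"},
        {"sub-app-user-id": app_uuid1 + ":0:0:0:0", "status": "Connection timed out"},
    ]
}

result = write_status_by_app_uuid(entry)
recipe_passed = (
    result.get(app_uuid2) == "Success"
    and result.get(app_uuid1) == "Connection timed out"
)
```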

11. Reset default election timeout for all followers.

Reset election-timeout-ms to its default value, i.e. 300:

APPLY election-timeout-ms@300
WHERE /raft_net_info/election-timeout-ms
OUTFILE /err.out