Test environment
3 nodes cluster with DB synchronous replication enabled.
The streaming replication configuration for the other two standby servers is set to "ANY 1", meaning that one of them may hold outdated data.
synchronous_standby_names
---------------------------------------------------------------------------------------------------------------------
ANY 1 (server_01,server_03)
Problem Statement
When the cluster is shut down and then restarted, during the pre-promote stage the LSN location reported for each node is the starting point of the last segment in its local pg_wal folder, which does not match the actual last LSN present in that node's pg_wal folder.
The standby's master score is only updated at a certain interval, creating a time window during which the master score cannot reflect the true lag between the standby and the primary. When the cluster starts, if the old primary does not start up promptly, the other two nodes participate in the election. Among the nodes with a master score of 1000, the one with the lowest node name (alphanumeric sort) is promoted first. However, at this point the reported LSN location does not reflect the true LSN location. This scenario can result in electing a stale copy as the new master, leading to data loss.
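The election order described above can be sketched as follows (an illustrative snippet, not the actual pgsqlms code; the function name is made up):

```python
# Illustrative sketch of the promotion choice described above: among the
# nodes whose master score is 1000, the lowest node name (alphanumeric
# sort) wins. "promotion_candidate" is a made-up name, not pgsqlms code.
def promotion_candidate(master_scores):
    candidates = sorted(n for n, s in master_scores.items() if s == 1000)
    return candidates[0] if candidates else None

print(promotion_candidate({"server_03": 1000, "server_02": 1000}))  # server_02
```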
Oct 08 09:16:22.002567 server_02 pgsqlms(test-db)[64293]: INFO: Promoting instance on node "server_02"
Oct 08 09:16:22.123181 server_02 pgsqlms(test-db)[64314]: INFO: Current node TL#LSN: 6#4362076160 <<<<<<< 0x104000000
Oct 08 09:16:22.129258 server_02 pacemaker-attrd[21903]: notice: Setting lsn_location-test-db[server_02]: (unset) -> 6#4362076160
Oct 08 09:16:22.141162 server_02 pacemaker-attrd[21903]: notice: Setting nodes-test-db[server_02]: (unset) -> server_01 server_02 server_03
Oct 08 09:16:22.158226 server_02 pacemaker-attrd[21903]: notice: Setting lsn_location-test-db[server_01]: (unset) -> 6#4362076160 <<<<<<< 0x104000000
Oct 08 09:16:22.426240 server_02 pacemaker-attrd[21903]: notice: Setting lsn_location-test-db[server_03]: (unset) -> 6#4362076160 <<<<<<< 0x104000000
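As a quick sanity check on the annotations above (a hypothetical snippet, not part of pgsqlms): the decimal LSN in the "TL#LSN" pairs converts to the annotated hex value, whose last six hex digits are zero, i.e. exactly the start of a (default 16 MB) WAL segment:

```python
# Hypothetical verification of the log annotations above: the decimal LSN
# from the "TL#LSN" pair converts to 0x104000000, which falls exactly on a
# (default 16 MB) WAL segment boundary.
tl, lsn = "6#4362076160".split("#")
print(hex(int(lsn)))                       # prints 0x104000000
print(int(lsn) % (16 * 1024 * 1024) == 0)  # prints True
```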
Root cause analysis
If the standby instance has had no primary instance to synchronize with since it started, the value of pg_last_wal_receive_lsn() remains its initial value, which is the starting point of the last segment in the local pg_wal folder.
Solution
In most cases, pg_last_wal_receive_lsn() accurately reports the last received LSN location.
We prefer pg_last_wal_receive_lsn() because a standby can lag in replaying WAL, e.g. because of its read-only activity. That means the standby that received more data from the primary than the others might have replayed less of it by the time of the monitor or promote action.
Therefore, we still use pg_last_wal_receive_lsn() to obtain the LSN location in most situations.
In the scenario where the entire cluster has just restarted:
If the last three bytes (i.e. the last six hexadecimal digits) of the last received LSN are zero, the LSN is the starting point of the last WAL segment in the local pg_wal folder and is therefore not accurate.
In that case, we query the last replayed LSN and compare it with the last received LSN. If the last replayed LSN is greater than the last received LSN, we use the last replayed LSN as the LSN location.
Note: the scope of the changes only covers the scenario where the cluster is restarting.
When the standby is restarted, it must replay the transaction log to bring the database tables back to their correct state.
So in this scenario, the last replayed LSN is accurate.
 pg_is_in_recovery() | pg_last_wal_receive_lsn() | pg_last_wal_replay_lsn()
---------------------+---------------------------+--------------------------
 t                   | 1/86000000                | 1/862B9CC0
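The fallback above can be sketched as follows, using the values from the psql output (a minimal illustration assuming the default 16 MB WAL segment size; parse_lsn and effective_lsn are made-up names, not the actual pgsqlms code):

```python
# Hypothetical sketch of the fallback described above; names are
# illustrative, not the real pgsqlms implementation.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default 16 MB WAL segments

def parse_lsn(lsn):
    """Convert a PostgreSQL LSN such as '1/862B9CC0' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def effective_lsn(received, replayed):
    rx, rp = parse_lsn(received), parse_lsn(replayed)
    # A received LSN sitting exactly on a segment boundary (last six hex
    # digits zero) right after a cluster restart is the bogus initial
    # value; fall back to the replayed LSN when it is ahead.
    if rx % WAL_SEGMENT_SIZE == 0 and rp > rx:
        return rp
    return rx

# Values from the psql output above: receive LSN is a segment start, so
# the greater replay LSN is used instead.
print(hex(effective_lsn("1/86000000", "1/862B9CC0")))  # prints 0x1862b9cc0
```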
@ioguix We've discussed this in issue #225 before, and I've now limited this change solely to the cluster-startup scenario. Please take another look at my PR #227; I'd really appreciate it.