score cycling #126
Hi, Master scores are delivered by the current master of your cluster based on its local data. Of course, you can have some lag between what the master knows and what the standby has really received. Moreover, this knowledge might be wrong an instant later. That's why, during a failover, the designated master rechecks its last LSN against the other existing standbys to make sure it is the best one.
Yes, that is what I was assuming: the failover as designed is doing the right thing. The reason I wrote the bug is the bid cycling and the noisy INFO-level spamming it produces. Certainly in the prioritized sync commit case, the highest-priority standby will always be the best choice, so any other scoring is just a distraction. Perhaps that log message is no longer needed, or alternatively it should just be lowered to debug level. Thoughts?
In fact, there are a lot of asynchronous pieces in this puzzle:
All in all, you could perfectly well have a non-synchronous standby that has synced all the WAL entries sent, ahead of the one being declared synchronous... Maybe you could try to specify that the master must be in sync with both of your standbys? https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES
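For reference, a minimal sketch of that suggestion, assuming the standbys connect with application_name pghost2 and pghost3 (the names and the 9.6 num_sync grammar are assumptions, not a confirmed setup):

```sql
-- Sketch of the "sync with both standbys" suggestion (names assumed).
-- With '2 (...)', a commit on the master waits for BOTH standbys to confirm.
ALTER SYSTEM SET synchronous_standby_names = '2 (pghost2, pghost3)';
SELECT pg_reload_conf();
```

The trade-off, as noted further down, is that commits on the master block whenever fewer than two of the listed standbys are connected.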
Thanks for the insight. Keep in mind I am interested in supporting minimal data loss; performance is secondary to that. I perceive the way to do that with Postgres is sync commit. Bringing in an agent to dynamically pick a standby in any general sort of way is problematic, but it seems to me that by using priority commit we can bring some order to our chaotic distributed world if we are willing to trade off some performance for it, which I am (and I would think anyone seriously considering sync commit would be too).

So let me frame this again. It's true there are lots of places for async behavior due to the nature of how the pieces fit together. However, if I enable sync commit with priority, then it seems we know exactly which online standby should be cut over to, because we have told the master which standby must be synced before a commit will complete (if that standby is online). I am suggesting that 1) this is a reasonable and helpful configuration option, and 2) an agent like PAF could take advantage of these constraints, since the information is all in the postgresql.conf file, and behave accordingly. This means all pg agents can reach consensus quickly on which standby to attempt to promote, without a lot of querying, because the priority of the standbys is static. I think this is what it means for an agent to support priority-sync-commit.
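As a sketch of what that priority configuration could look like (host names are taken from the report below; treat the exact values as assumptions): in 9.6, listing standbys without a num_sync prefix makes the first connected one the synchronous standby, with the rest as prioritized candidates.

```sql
-- Priority-based sync commit: equivalent to '1 (pghost2, pghost3)' in 9.6.
-- The first connected standby in the list is the synchronous one; the
-- others take over in listed order if it disconnects.
ALTER SYSTEM SET synchronous_commit = 'on';
ALTER SYSTEM SET synchronous_standby_names = 'pghost2, pghost3';
SELECT pg_reload_conf();
```

The property an agent could lean on is that this priority order is static configuration, readable by every node without querying the (possibly dead) master.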
To digress a little, I apologize if I'm stating the obvious: load balancing/performance and minimal data loss are competing requirements; you can't have both. But it seems like we can choose which one we want and let the agent act accordingly. Or maybe we need a different agent altogether, but it seems like there is a lot in common with the current PAF agent, and it seems like it is attempting to handle valid Postgres configurations. If we can agree on this use case and a general approach, and I can get a little guidance, as I'm not yet an agent or a PAF guru, I'm happy to implement and contribute back. Comments on your notes are below.
> Maybe you could try to specify that the master must be in sync with both of your standbys? https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES

Yes, I considered that, but it also seems to complicate things in other ways that worry me. I think it requires two standbys to be online at all times, or a commit on the master would hang. It also means waiting for each of them to complete before the master's commit completes. That seems like a big hit when all I really require is that each commit be synced to one standby (the first standby), providing reasonable protection from data loss in case of master failover.
Hi, Sorry for getting back to you so late; I just realized this answer has been sitting on my todo list for four weeks now :/
We agree.
No, I meant "simultaneous": they do not all sync WAL to disk at the same time. But anyway, here is the most important point in this discussion, I think:
My main concern here is race conditions. If another standby received more WAL from the master than the one with sync priority, you'll have to rebuild or rewind it. No concern about transactions, though, obviously. You would have to deal with this, but I feel like you are ready for that. So let's move on.

Next, the master scores are distributed to standbys by the current pgsql master during its recurring "monitor" action. If you want to set the highest score on the standby in sync with the best priority, whatever its lag, then should a failover happen, you will have to recheck the status across the cluster, because the standbys' priority/sync might have changed in the meantime between the last monitor action on the master and the master's failure. In short: we will probably not be able to shortcut the election process.

If we want to pay attention to the priority/status during a failover when synchronous replication is set up, we have to be able to find exactly which standby had the best priority and was in sync after the master disappeared. That means standbys would have to compare their last known status instead of their last received LSN/location. Right now, I'm not sure this is possible with the existing stats in core. We might have to implement an extension to report this from the standbys. And even if we consider this option, I'm not sure the information is available on the standby side :/

Discussion around this issue is appreciated; do not hesitate if you have more technical ideas about this. Moreover, if you have some time to dig into how we could gather the last known status from the standbys themselves with precision, do not hesitate to share your findings here. I might be able to find some limited time next week to experiment a little, though.
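To make that gap concrete: the sync status and priority only exist in a view on the master, so they disappear with it. A minimal sketch, assuming 9.6 catalog names:

```sql
-- On the MASTER only: per-standby position, priority and sync state.
-- Once the master is gone, nothing equivalent survives on the standbys,
-- which is why they cannot simply look up "who was in sync".
SELECT application_name, sent_location, flush_location,
       sync_priority, sync_state
  FROM pg_stat_replication;
```

Whether an equivalent of sync_state can be recovered from the standby side after the fact is exactly the open question above.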
PAF Version: 2.2.0-2, 3-node cluster, sync commit, fixed priority
First, I've implemented sync commit with the intention of reducing or eliminating data loss on master failover to a standby.
I implemented a fixed priority for my standbys (pghost1, pghost2, pghost3), thinking that would make things a bit more reliable on failover: one standby would always be the most current and become the master, and the second standby could then immediately catch up and become the new standby as soon as it reconnected to the new master. IOW, a failover would lead to only a few seconds where the cluster did not have a functional standby. That is pretty much the case from my testing so far with PAF and sync commit, and it's almost a thing of beauty. Not to digress too far on this, but I'm pretty sure it's conceptually feasible to have this same behavior without prioritized commit; I'm just trying things to see what actually works, and so far this seems to behave better than plain sync commit with Postgres arbitrarily picking which standby to write to. I really think it all comes down to how PAF picks the standby and the timing of that decision (yeah, I know distributed decisions are hard because facts change).
However, I've noticed something in the logging that concerns me. It turns out this happens with or without sync commit: PAF is apparently adjusting resource scores based purely on lag.
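A rough sketch of what lag-based ranking amounts to (my reading of the idea, not PAF's actual code): on the master, order the connected standbys by how far their replay trails what has been sent.

```sql
-- Rank connected standbys by replication lag, most advanced first
-- (9.6 function/column names; an illustration, not PAF's implementation).
SELECT application_name,
       pg_xlog_location_diff(sent_location, replay_location) AS lag_bytes
  FROM pg_stat_replication
 ORDER BY lag_bytes ASC;
```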
Below, the master is pghost1:
It's not yet clear to me that PAF is compatible with prioritized sync writes without avoidable data loss (e.g., picking a server that has lower lag but is in fact farther behind). It seems to me that if I want to minimize data loss, the score should be based on replication state (pg_last_xlog_receive_location?). Please advise. I'm not certain I am right about this; I'm mostly reacting to the log messages above and, of course, my desire to minimize data loss on failover.
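For comparison, the standby-side information an election can use today is just the WAL positions, a minimal sketch with the 9.6 function names (renamed to pg_last_wal_*_lsn in PostgreSQL 10):

```sql
-- On each STANDBY: the last WAL location received and the last replayed.
-- Comparing these across standbys shows who is furthest ahead, but says
-- nothing about who WAS the synchronous standby when the master died.
SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();
```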