Multi-datacenter deployment #809
-
I am in the process of evaluating and designing an OpenBao installation here at the Wikimedia Foundation. We have two main datacenters; at any given point in time, one is the primary and the other is the secondary. I would like to design an OpenBao installation that allows failing over from the OpenBao instance in the primary datacenter to the one in the secondary. Since Raft streaming replication is not present in the OSS fork, I have explored two options.
Both options have trade-offs, and I am suffering a bit from analysis paralysis in deciding which one to choose. Does anyone in the community have experience with similar setups, or advice on evaluation criteria I should consider?
Replies: 1 comment 4 replies
-
@lollipopman Some thoughts... Happy to collaborate more; if you want to chat about architecture, I'm willing to have a call.

Horizontal scalability is the biggest shortcoming I think we have at the moment that's not already in progress. We have the existing HA mode that I'd look to extend for this, but in this mode, presently, a single leader is active and the other nodes cannot service any requests.

For Raft in particular, @JanMa recently implemented non-voter node support. This theoretically allows you to do distributed Raft replication, whereby a local cluster (say, 3 voting nodes) could be mirrored by external non-voters in other DR zones. While they still contribute to bandwidth usage (to ship the updates), they don't participate in write-confirmation votes and so won't impact latency as much. A voter in a different replication zone, by contrast, counts toward quorum, so unless you keep a sufficient ratio of local to remote voters, you will need votes from remote nodes to make progress, and you risk having a remote node elected leader. With the non-voter approach, you would need a DR failover process in the event all voting nodes go down, promoting some non-voters to voters to restore service (both pieces are sketched below).

For PostgreSQL... there is HA support. What happens is nodes race to acquire a lock, and it is assumed the database itself handles replication (or all nodes are connected to the same instance). I'm not quite as familiar with the properties of PostgreSQL streaming replication and how they'd impact HA mode or performance. I think something similar to non-voters (but built on the PostgreSQL HA lock acquisition) might also be useful long-term, especially for horizontal scalability: if your node is talking to a secondary replica database, you probably don't want it to attempt to become leader; instead, only nodes talking to the primary would be considered for active-node status.

The thing I've liked about Raft is that it is relatively well known and supported upstream. PostgreSQL wasn't, and I suspect there are edge cases people just haven't run into. One that I've been told should theoretically affect us is that all data is stored in a single large, pseudo-KV table, which could result in poor performance as the number of entries grows. However, Raft also seems to run into performance problems north of 15 GB, depending on your hardware and who you talk to.

Long-term, I'm toying with the idea of splitting storage into segments based on namespaces, potentially even using different storage engines for different namespaces and distributing leadership across the cluster, segmented by namespace. Each namespace would still have a single writer (strong consistency via local locking), but you could get higher write throughput by adding nodes, assuming your writes are spread across different mounts in different namespaces.

I am starting to put together a working group for horizontal scaling support; if you're interested in participating, let me know.
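To make the non-voter layout concrete, here's a minimal sketch of a secondary-datacenter node, assuming OpenBao's non-voter support follows upstream's `retry_join` semantics; hostnames, paths, and node IDs are placeholders:

```hcl
# Secondary-DC node: receives the Raft log but should not vote.
storage "raft" {
  path    = "/opt/openbao/data"
  node_id = "dc2-bao-1"  # illustrative node name

  # Reconnect to the voting cluster in the primary DC on startup.
  retry_join {
    leader_api_addr = "https://dc1-bao-1.example.org:8200"
  }
}
```

The node would then be added as a non-voter at join time, e.g. `bao operator raft join -non-voter https://dc1-bao-1.example.org:8200`; I'm assuming the flag name matches upstream's, so double-check it against the OpenBao release you're running.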
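For the failover step (all voters lost, promote the non-voters), the standard Raft recovery mechanism is a `peers.json` file dropped into the Raft directory (e.g. `/opt/openbao/data/raft/peers.json`) on each surviving node before restart. Assuming OpenBao keeps upstream's recovery format, it would look roughly like this; IDs and addresses are placeholders, and note the address uses the cluster port (8201), not the API port:

```json
[
  {
    "id": "dc2-bao-1",
    "address": "dc2-bao-1.example.org:8201",
    "non_voter": false
  },
  {
    "id": "dc2-bao-2",
    "address": "dc2-bao-2.example.org:8201",
    "non_voter": false
  }
]
```

On restart, each listed node adopts this as the new voter set, which effectively promotes the former non-voters, and the file is removed once it has been ingested. It's a destructive, last-resort operation, which is why you'd want it wrapped in a rehearsed DR runbook.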
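For comparison, the PostgreSQL route is mostly a storage-stanza change plus external replication. A sketch, assuming the backend keeps the upstream parameters (`connection_url`, `table`, `ha_enabled`); the connection details and table name are placeholders:

```hcl
# Every node points at the writable primary; PostgreSQL streaming
# replication (managed outside OpenBao) moves data between DCs.
storage "postgresql" {
  connection_url = "postgres://openbao:CHANGEME@db-primary.dc1.example.org:5432/openbao?sslmode=verify-full"
  table          = "openbao_kv_store"  # the single pseudo-KV table mentioned above
  ha_enabled     = "true"              # nodes race to acquire a lock to become the active node
}
```

This is where the replica-awareness gap shows up: nothing in that stanza knows whether it's pointed at the primary or a replica, so a node talking to a replica could still try to grab the HA lock.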
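The namespace-splitting idea is just that, an idea; nothing like it exists in OpenBao today. Purely to illustrate the shape of it, a hypothetical config (every stanza and parameter here is invented):

```hcl
# HYPOTHETICAL sketch only: per-namespace storage segments, each with
# its own single writer. No such configuration exists in OpenBao.
storage_segment "payments" {
  backend = "raft"  # one engine per namespace...
  path    = "/opt/openbao/data/payments"
}

storage_segment "analytics" {
  backend        = "postgresql"  # ...or a different one entirely
  connection_url = "postgres://..."
}
```

Each segment would keep a single writer for strong consistency, while leadership for different segments could land on different nodes, which is where the extra write throughput would come from.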