Setting up an on-premise production environment for Polis (Recommended General Guide) #1319
My university is currently looking at deploying a local on-premise production environment for Polis. Initially we'll be running it for a couple of pilot research projects, but in the longer term we would also like to offer it as a common service that can be leveraged by any research group within the university that may want to use Polis. Looking at https://github.com/compdemocracy/polis, it appears that the currently available instructions are only suitable as a guide for development environments. I was wondering what additional considerations we should be looking at to make an appropriate production-level deployment?
Hi @lochnessgitmonster.

Thanks for asking about this. We get this question a lot, so it's great to have a public record here. The answer greatly depends on what scale you're hoping to achieve. Regardless of that, however, you'll need to be running:

- one or more server nodes
- a math worker
- a PostgreSQL database
- a static file server (or CDN) for client assets
For a relatively small deployment (only a single conversation at a time, with only a few hundred participants at a time or so), you can probably get by with a single server and a relatively small math worker. The server nodes can be pretty lightweight, but the math worker should have at least 1GB of memory (it's possible to run on 0.5GB, but you're likely to end up having to use swap, so performance will not be ideal). For larger conversations (or multiple conversations at a time), you'll need to be able to scale the number of server nodes and to increase the amount of memory on the math worker (8GB should get you pretty far). If you have multiple conversations running at once, it can help to have multiple cores available on the math worker, which can take advantage of threading to process conversations in parallel.

If your participant body is broadly distributed geographically, you might specifically consider a CDN and caching layer (like Cloudflare + AWS S3) for static assets to improve load time. But for something small and/or local, a basic static file server should be fine (though frankly, setting up a CDN is pretty easy, and may be the better way to go overall).

### Docker infrastructure

We do have Docker infrastructure available for setting up the necessary environments for each of these components, and are currently using it ourselves for the math worker and server components (via Heroku). Additionally, there are Dockerfiles for setting up PostgreSQL and file server images, which we don't currently use in production, but which could be used. The README instructions for setting up a development environment make clear that they are not intended for setting up a production deployment, but they are probably a bit strongly worded, as they may lead people to believe that the Docker infrastructure itself should not be trusted.
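To make the sizing guidance above concrete, here is a minimal sketch of how those memory allocations might be expressed as Compose resource limits. The service and image names here are illustrative assumptions, not the actual names used in the Polis repository:

```yaml
# Sketch only: service and image names are hypothetical.
services:
  server:
    image: polis-server        # hypothetical image name
    deploy:
      resources:
        limits:
          memory: 512M         # server nodes can stay lightweight
  math:
    image: polis-math          # hypothetical image name
    deploy:
      resources:
        limits:
          memory: 1G           # minimum for small deployments;
                               # raise toward 8G for larger ones
    # giving this service multiple CPUs lets the math worker
    # process several conversations in parallel via threading
```

The same limits could equally be enforced at the VM or orchestrator level; the point is simply that the math worker, not the server, is where memory headroom matters most.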
We should reorganize that a bit to include a "Production Deployment" section, which might highlight some of what I've described above (or even just point to this discussion); if someone wants to submit a PR for that, we'd be very grateful.

That having been said, the piece of the "docker infrastructure" writ large which is not fit for production (at the moment) is specifically the Docker Compose infrastructure: the main compose setup currently mixes development and production concerns. Right now, we're in the process of splitting development concerns out into a separate file.

### In conclusion

So for those looking to set up a production deployment, there are a few potential paths you might take, depending on your scale and how much infrastructure you want to manage yourself.
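One common way to do the kind of dev/production split described above (assuming hypothetical file names; the actual Polis file layout may differ) is Compose's multi-file merging, where a base file holds production-safe defaults and a second file layers development-only concerns on top:

```yaml
# Base file (e.g. docker-compose.yml): production-safe defaults only.
# Development-only additions (volume mounts, debug ports, hot reload)
# live in a second file (e.g. docker-compose.dev.yml) and are merged
# in explicitly for local work:
#
#   docker compose -f docker-compose.yml -f docker-compose.dev.yml up
#
# while production uses only the base file:
#
#   docker compose -f docker-compose.yml up -d
services:
  server:
    build: ./server            # hypothetical service definition
```

With this layout, production never sees the development overrides, while dev and production still share the bulk of the infrastructure definitions.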
For what it's worth, explicit support for production deployment with docker-compose is still somewhat new (just a year or two), so you may still find it a bit challenging to find good documentation on this. Our position has been for a while, though, that this is the right way to go for turnkey deployment support, since sharing as much infrastructure as possible between dev and production is generally a good thing.

This picture has gotten a little more complicated of late with Docker Desktop (necessary if you're on Mac or Windows; Linux is unaffected) switching its licensing model to require payment from large and/or well-funded companies. There's been some grumbling and discontent over this in the dev community, so we're a little more open to alternative approaches or technology, but for now enough of the work is already done that we're unlikely to make a switch ourselves any time soon unless the situation changes significantly (though if you're inspired to submit a PR, please open an issue first so we can discuss). There are some other approaches emerging to replace the need for Docker Desktop (e.g. see #1135), so there may be other ways around this as well. Again, for now the plan is to adapt the docker-compose infrastructure as described above into a turnkey-ready deployment option.

I hope this gives a good outline of what is needed across the full spectrum of cases. Obviously there's a bit of choose-your-own-adventure here, and again, the details depend quite a bit on your situation. Please respond with comments if you have any further questions, and feel free to open issues or submit PRs that might help us get closer to our ultimate goal of turnkey deployment as you go.