
Review MaRDI infrastructure #47

Open
1 task
physikerwelt opened this issue Oct 31, 2024 · 3 comments

Comments

@physikerwelt
Member

The main MaRDI server caused problems when migrating to the next OpenStack version.
We should review whether the current single-instance docker-compose setup is still suitable for the growing volume of data, users, and bot traffic.
Also, maybe we can establish a community of practice within NFDI (and EOSC) for running infrastructure like the MaRDI portal.

See also:

@eloiferrer
Member

These are some of the problems I could think of with the current setup:

  • Our instance boots from the single volume we use (3000 GB). If for some reason (e.g. maintenance) the volume cannot be properly reattached after being detached, the instance may be unable to boot.
  • DB backups are saved on that same single volume. Ideally they should be saved on a smaller, dedicated volume from which we can easily create a snapshot.
  • The deployment process results in (minutes of?) downtime.
  • Backup processes sometimes seem to block or even bring down the portal.
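The second point could be addressed with a small dedicated backup volume. A sketch using the standard `openstack` CLI follows; the volume name, size, device path, and instance name are hypothetical placeholders, not our actual configuration:

```shell
# Sketch: keep DB backups on a small dedicated volume so it can be
# snapshotted independently of the 3000 GB root volume.
# All names, sizes, and device paths below are placeholders.

# Create a small volume just for database dumps
openstack volume create --size 100 mardi-db-backups

# Attach it to the portal instance (device path may differ)
openstack server add volume mardi-portal mardi-db-backups --device /dev/vdb

# After a backup run, snapshot only that volume
# (--force is needed while the volume is attached and in use)
openstack volume snapshot create --volume mardi-db-backups --force \
  "db-backups-$(date +%F)"
```

Snapshotting a 100 GB backup volume is much faster than snapshotting the 3 TB root volume, and a failed reattach of the backup volume would not prevent the instance from booting.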

Given this situation, it would make sense to migrate to a solution in which different OpenStack instances run the different services. This could be managed with Kubernetes on a fixed set of OpenStack instances: one instance would act as the control plane for the cluster, and further worker nodes would host the individual services.

For instance, one node would run the databases (mysql, mongo, elasticsearch) with a volume attached to it. The other nodes would not require persistent storage, so services could be grouped onto instances according to the resources they need. Ideally, we would provision these instances with Terraform (or an equivalent) plus Ansible, so that we can quickly spin up a replacement if we lose one of them.
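As a minimal illustration of the layout described above, provisioning could look like the following with the plain `openstack` CLI. Flavor, image, network, and key names are assumptions; a real setup would encode this in Terraform/Ansible as suggested:

```shell
# Hypothetical sketch: one control-plane node, one DB node with a
# persistent volume, and stateless workers. All names, flavors, and
# images are placeholders, not the actual MaRDI configuration.

openstack server create --flavor m1.large  --image ubuntu-24.04 \
  --network mardi-net --key-name mardi-key k8s-control-plane

openstack server create --flavor m1.xlarge --image ubuntu-24.04 \
  --network mardi-net --key-name mardi-key k8s-db-node

# Only the DB node gets a persistent volume (mysql, mongo, elasticsearch)
openstack volume create --size 500 db-data
openstack server add volume k8s-db-node db-data --device /dev/vdb

# Stateless worker nodes for the remaining services
for i in 1 2; do
  openstack server create --flavor m1.large --image ubuntu-24.04 \
    --network mardi-net --key-name mardi-key "k8s-worker-$i"
done
```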

If we get to that point, we could then assess whether we want Kubernetes to automatically scale out new instances in OpenStack, but that would be a next step.

@physikerwelt
Member Author

@eloiferrer I think it becomes too complicated when we manage OpenStack and Kubernetes separately. I did experiment with a docker-compose stack (mostly to create a master-slave DB architecture, which would be especially helpful for the backup problem), but my feeling was that managing the network connections was too complicated. Thus I think we should aim for an integrated solution that manages the Kubernetes cluster from within OpenStack. We can test Magnum in our horizon.wikimedia.org cluster; see https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Magnum
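For reference, creating a Magnum-managed Kubernetes cluster typically looks like the sketch below. The template options (image, flavors, network) are illustrative assumptions, and this requires Magnum to be enabled in the cloud:

```shell
# Hypothetical Magnum sketch: OpenStack provisions and manages the
# Kubernetes cluster itself, avoiding hand-wired networking between
# the two layers. All names, images, and flavors are placeholders.

openstack coe cluster template create k8s-template \
  --coe kubernetes \
  --image fedora-coreos \
  --external-network public \
  --master-flavor m1.medium \
  --flavor m1.large

openstack coe cluster create mardi-k8s \
  --cluster-template k8s-template \
  --master-count 1 \
  --node-count 3

# Fetch a kubeconfig once the cluster is ready
openstack coe cluster config mardi-k8s
```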

@eloiferrer
Member

Sounds good, but we should also check whether Magnum is supported in our OpenStack environment, as it is currently not activated.


2 participants