
Review MaRDI infrastructure #47

Open
1 task
physikerwelt opened this issue Oct 31, 2024 · 3 comments

Comments

@physikerwelt
Member

The main MaRDI server caused problems when migrating to the next OpenStack version.
We should review whether the current single-instance docker-compose setup is still suitable for the growing volume of data, users, and bot traffic.
Also, maybe we can establish a community of practice within NFDI (and EOSC) for running infrastructure like the MaRDI portal.

See also:

@eloiferrer
Member

These are some of the problems I could think of with the current setup:

  • Our instance boots from the single volume we use (3000 GB). If for some reason (e.g. maintenance) the volume cannot be properly reattached after being detached, the instance may be unable to boot.
  • DB backups are saved on that same single volume. Ideally they should be saved on a smaller, dedicated volume from which we can easily create a snapshot.
  • The deployment process results in (minutes of?) downtime.
  • Backup processes sometimes seem to block or even bring down the portal.
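The second point could be addressed with a small dedicated backup volume. A sketch using the standard `openstack` CLI follows; the volume name, size, device path, and instance name are hypothetical placeholders, not our actual configuration:

```shell
# Sketch: keep DB backups on a small dedicated volume so it can be
# snapshotted independently of the 3000 GB root volume.
# All names, sizes, and device paths below are placeholders.

# Create a small volume just for database dumps
openstack volume create --size 100 mardi-db-backups

# Attach it to the portal instance (device path may differ)
openstack server add volume mardi-portal mardi-db-backups --device /dev/vdb

# After a backup run, snapshot only that volume
# (--force is needed while the volume is attached and in use)
openstack volume snapshot create --volume mardi-db-backups --force \
  "db-backups-$(date +%F)"
```

Snapshotting a 100 GB backup volume is much faster than snapshotting the 3 TB root volume, and a failed reattach of the backup volume would not prevent the instance from booting.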

Given this situation, it would make sense to migrate to a solution in which different OpenStack instances run the different services. This could be managed with Kubernetes on a fixed set of OpenStack instances: one instance would act as the control plane for the cluster, and further worker nodes would host the individual services.

For instance, one node would run the databases (mysql, mongo, elasticsearch) with a volume attached to it. The other nodes would not require persistent storage, so services could be grouped onto instances according to the resources they need. Ideally, we would provision these instances with Terraform (or an equivalent) plus Ansible, so that we can quickly spin up a replacement if we lose one of them.
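As a minimal illustration of the layout described above, provisioning could look like the following with the plain `openstack` CLI. Flavor, image, network, and key names are assumptions; a real setup would encode this in Terraform/Ansible as suggested:

```shell
# Hypothetical sketch: one control-plane node, one DB node with a
# persistent volume, and stateless workers. All names, flavors, and
# images are placeholders, not the actual MaRDI configuration.

openstack server create --flavor m1.large  --image ubuntu-24.04 \
  --network mardi-net --key-name mardi-key k8s-control-plane

openstack server create --flavor m1.xlarge --image ubuntu-24.04 \
  --network mardi-net --key-name mardi-key k8s-db-node

# Only the DB node gets a persistent volume (mysql, mongo, elasticsearch)
openstack volume create --size 500 db-data
openstack server add volume k8s-db-node db-data --device /dev/vdb

# Stateless worker nodes for the remaining services
for i in 1 2; do
  openstack server create --flavor m1.large --image ubuntu-24.04 \
    --network mardi-net --key-name mardi-key "k8s-worker-$i"
done
```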

If we get to that point, we could then assess whether we want Kubernetes to automatically scale out new instances in OpenStack, but that would be a next step.

@physikerwelt
Member Author

@eloiferrer I think it becomes too complicated when we manage OpenStack and Kubernetes separately. I did experiment with a docker-compose stack (mostly to create a master-slave DB architecture, which would be especially helpful for the backup problem), but my feeling was that managing the network connections was too complicated. Thus I think we should aim for an integrated solution that manages the Kubernetes cluster from within OpenStack. We can test Magnum in our horizon.wikimedia.org cluster; see https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Magnum
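For reference, creating a Magnum-managed Kubernetes cluster typically looks like the sketch below. The template options (image, flavors, network) are illustrative assumptions, and this requires Magnum to be enabled in the cloud:

```shell
# Hypothetical Magnum sketch: OpenStack provisions and manages the
# Kubernetes cluster itself, avoiding hand-wired networking between
# the two layers. All names, images, and flavors are placeholders.

openstack coe cluster template create k8s-template \
  --coe kubernetes \
  --image fedora-coreos \
  --external-network public \
  --master-flavor m1.medium \
  --flavor m1.large

openstack coe cluster create mardi-k8s \
  --cluster-template k8s-template \
  --master-count 1 \
  --node-count 3

# Fetch a kubeconfig once the cluster is ready
openstack coe cluster config mardi-k8s
```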

@eloiferrer
Member

Sounds good, but we should also check whether Magnum is supported in our OpenStack environment, as it is currently not activated.


2 participants