Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Celery worker concurrency setting #1010

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
- [Feature] Allow the user to configure Celery worker concurrency. (by @arbrandes)
5 changes: 5 additions & 0 deletions docs/configuration.rst
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,11 @@ This defines the version that will be pulled from just the Open edX platform git

By default, there are 2 `uwsgi worker processes <https://uwsgi-docs.readthedocs.io/en/latest/Options.html#processes>`__ to serve requests for the LMS and the CMS. However, each worker requires upwards of 500 Mb of RAM. You should reduce this value to 1 if your computer/server does not have enough memory.

- ``OPENEDX_LMS_CELERY_WORKERS`` (default: ``"0"``)
- ``OPENEDX_CMS_CELERY_WORKERS`` (default: ``"0"``)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding new configuration settings to Tutor core is a personal trigger of mine 🧨 Do we really want to make changes to the default production values? If yes, can we:

  1. propose better defaults?
  2. make these custom changes possible via a patch instead of two new configuration settings?

Copy link
Author

@arbrandes arbrandes Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand the reluctance to add new config items. It's just that in this case, it was the most straightforward way to achieve what I was after. There's precedent, too: OPENEDX_LMS_UWSGI_WORKERS is there for very similar reasons.

Regarding the defaults, I'm not actually changing them: I'm just making them explicit, where before they were implicit. (The implicit default is to scale the workers to however many CPUs you have, and that's what "0" means.)

As for using patches, I wouldn't mind except for the fact that, as mentioned above, this is just doing what OPENEDX_LMS_UWSGI_WORKERS does, except for Celery workers. If we have that configuration, I don't see why we shouldn't have this one.

All of this said, I really like the idea of changing certain things automatically for development environments, whether they have corresponding config items or not. For instance, after I issued this PR it came to my attention that Tutor's importing * from devstack.py for the development settings, and that means that we aren't using Celery workers at all! (See https://github.com/openedx/edx-platform/blob/master/lms/envs/devstack.py#L35.) So why is tutor dev even firing up workers?

In any case, the latter sounds like it warrants a separate PR. My question regarding this PR, though, is whether we do or do not want OPENEDX_LMS_CELERY_WORKERS. It might not make sense to change this in a Kubernetes setting, but I'm willing to defend that it does on any Docker deployment where you have more CPUs than you have RAM (so to speak).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this for not starting workers at all in dev mode? #1041

The question remains whether we still want to let people configure the number of Celery workers manually. (I say we let them.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the fact that we can disable workers in dev. I commented on #1041.

Let's now focus on the possibility to customize the number of celery runners. I agree that this would be a useful feature. If we really have to, we'll introduce new configuration values, but I'd like to see if we can avoid it. For instance, could we avoid that by creating a celery config file? This file would include a {{ patch("edx-platform-celery-config") }} statement. That way, we wouldn't have to create new configuration settings for every celery parameter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regisb
Agree! There should be an option to disable workers in dev, it could be disabled by default on dev mode. Personally, I like of having a config file with a patch. By default the config file should have the minimum config to start.
Nevertheless, I feel that won't resolve every additional configuration:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my answer here: #1126 (comment)
I propose that all remaining comments are made on issue #1126


This sets the number of Celery workers at startup. Defaults to the number of CPUs detected on the system (represented by "0").

- ``OPENEDX_CELERY_REDIS_DB`` (default: ``0``)
- ``OPENEDX_CACHE_REDIS_DB`` (default: ``1``)

Expand Down
2 changes: 2 additions & 0 deletions tutor/templates/config/defaults.yml
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,9 @@ OPENEDX_AWS_ACCESS_KEY: ""
OPENEDX_AWS_SECRET_ACCESS_KEY: ""
OPENEDX_CACHE_REDIS_DB: 1
OPENEDX_CELERY_REDIS_DB: 0
OPENEDX_CMS_CELERY_WORKERS: 0
OPENEDX_CMS_UWSGI_WORKERS: 2
OPENEDX_LMS_CELERY_WORKERS: 0
OPENEDX_LMS_UWSGI_WORKERS: 2
OPENEDX_MYSQL_DATABASE: "openedx"
OPENEDX_MYSQL_USERNAME: "openedx"
Expand Down
4 changes: 2 additions & 2 deletions tutor/templates/k8s/deployments.yml
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ spec:
containers:
- name: cms-worker
image: {{ DOCKER_IMAGE_OPENEDX }}
args: ["celery", "--app=cms.celery", "worker", "--loglevel=info", "--hostname=edx.cms.core.default.%%h", "--max-tasks-per-child", "100", "--exclude-queues=edx.lms.core.default"]
args: ["celery", "--app=cms.celery", "worker", "--loglevel=info", "--hostname=edx.cms.core.default.%%h", "--concurrency={{ OPENEDX_CMS_CELERY_WORKERS }}", "--max-tasks-per-child", "100", "--exclude-queues=edx.lms.core.default"]
Copy link

@dkaliberda dkaliberda Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --concurrency argument specifies the number of worker processes. This is not an ideal practice in Kubernetes environments because:

  1. Kubernetes prefers to manage scalability and replication at the container orchestration level, using the replicas field in a Deployment to manage the number of pod instances.
  2. Setting --concurrency inside a container limits the scalability to the process level inside the pod, rather than allowing Kubernetes to manage multiple pods across nodes for better fault tolerance and load distribution.
  3. It violate "one process per container" principle. This is important because with multiple processes in the same container, it is harder to troubleshoot the container because logs from different processes will be mixed together, and it is harder to manage the processes lifecycle, etc.

So, it's better to just make a hardcode--concurrency=1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all makes sense, but it might be counter-intuitive to have a configuration item that works for one deployment scenario but not another.

I mean, we have OPENEDX_CMS_UWSGI_WORKERS, and that's also configurable for Kubernetes. 🤷🏼

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it would be appropriate to add to the documentation a mention that setting --concurrency=1 for K8s is recommended not in the context of saving resources, but in the context of proper resource management. What do you think about this? It will be useful for DevOps to pay attention to this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that we should allow the operator to decide what is the size of its services/pods. For the LMS/Studio by defining the OPENEDX_CMS_UWSGI_WORKERS and on celery workers the same principle is applied by adding the OPENEDX_CMS_CELERY_WORKERS variable. Bigger pods could allow some installations to optimize for their case.
Nevertheless, my Kubernetes deployment uses --concurrency=1, with an Horizontal Pod Autoscaling configuration.

env:
- name: SERVICE_VARIANT
value: cms
Expand Down Expand Up @@ -250,7 +250,7 @@ spec:
containers:
- name: lms-worker
image: {{ DOCKER_IMAGE_OPENEDX }}
args: ["celery", "--app=lms.celery", "worker", "--loglevel=info", "--hostname=edx.lms.core.default.%%h", "--max-tasks-per-child=100", "--exclude-queues=edx.cms.core.default"]
args: ["celery", "--app=lms.celery", "worker", "--loglevel=info", "--hostname=edx.lms.core.default.%%h", "--concurrency={{ OPENEDX_LMS_CELERY_WORKERS }}", "--max-tasks-per-child=100", "--exclude-queues=edx.cms.core.default"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem

env:
- name: SERVICE_VARIANT
value: lms
Expand Down
4 changes: 2 additions & 2 deletions tutor/templates/local/docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ services:
environment:
SERVICE_VARIANT: lms
DJANGO_SETTINGS_MODULE: lms.envs.tutor.production
command: celery --app=lms.celery worker --loglevel=info --hostname=edx.lms.core.default.%%h --max-tasks-per-child=100 --exclude-queues=edx.cms.core.default
command: celery --app=lms.celery worker --loglevel=info --hostname=edx.lms.core.default.%%h --concurrency={{ OPENEDX_LMS_CELERY_WORKERS }} --max-tasks-per-child=100 --exclude-queues=edx.cms.core.default
Copy link

@dkaliberda dkaliberda Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker-compose also provides mechanisms for managing replicas. Therefore, it is also better to make --concurrency=1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, yes. I'd be glad to review a PR that does that instead. I just need a way to reduce RAM usage for development. ;)

restart: unless-stopped
volumes:
- ../apps/openedx/settings/lms:/openedx/edx-platform/lms/envs/tutor:ro
Expand All @@ -177,7 +177,7 @@ services:
environment:
SERVICE_VARIANT: cms
DJANGO_SETTINGS_MODULE: cms.envs.tutor.production
command: celery --app=cms.celery worker --loglevel=info --hostname=edx.cms.core.default.%%h --max-tasks-per-child 100 --exclude-queues=edx.lms.core.default
command: celery --app=cms.celery worker --loglevel=info --hostname=edx.cms.core.default.%%h --concurrency={{ OPENEDX_CMS_CELERY_WORKERS }} --max-tasks-per-child 100 --exclude-queues=edx.lms.core.default
Copy link

@dkaliberda dkaliberda Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker-compose also provides mechanisms for managing replicas. Therefore, it is also better to make --concurrency=1

restart: unless-stopped
volumes:
- ../apps/openedx/settings/lms:/openedx/edx-platform/lms/envs/tutor:ro
Expand Down
Loading