Ability to run 10,000 tasks #128

Open
mulongfu opened this issue Jun 9, 2020 · 1 comment
mulongfu commented Jun 9, 2020

Hi,

When I use pymesos to run 10, 100, or 1,000 tasks at the same time, everything runs perfectly.
However, with 10,000 tasks at the same time, some tasks end up with status TASK_LOST.

I'm not sure whether the problem is in pymesos or in my configuration.

Mesos Version: 1.9.0
Pymesos: git clone the latest (2020/6/9)
Total CPU 412, MEM 5.2TB, Disk 983.9
Each task needs 0.01 CPU and 1M of memory.

For the tasks that end up as TASK_LOST, the Mesos master shows:
Sending status update TASK_LOST for task task-xx of framework xxx 'Task launched with invalid offers: Offer xxx is no longer valid'

My guess is that two or more tasks are using the same offer ID. When one of those tasks finishes, the offer is released, and the other tasks that use the same offer ID can no longer use it.
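
To illustrate the guess above: in Mesos, an offer is consumed by the first launch call that references it, so launching a second batch against the same offer ID would be rejected as an invalid offer. A minimal sketch of one way to avoid that is to decide, for each offer, the full batch of tasks that fits in it, so that each offer ID appears in exactly one launch call. The `Offer` tuple, `plan_launches`, and the per-task defaults below are hypothetical names for illustration and are not part of pymesos; this is not necessarily how the scheduler in this issue is written.

```python
from collections import namedtuple

# Hypothetical plain-data stand-in for a Mesos offer (not a pymesos type).
Offer = namedtuple("Offer", ["offer_id", "cpus", "mem"])

TASK_CPUS = 0.01  # per-task CPU, taken from the numbers in this issue
TASK_MEM = 1      # per-task memory, same unit as Offer.mem


def plan_launches(offers, pending, task_cpus=TASK_CPUS, task_mem=TASK_MEM):
    """Assign pending tasks to offers so each offer ID is used only once.

    Returns (plan, remaining): plan maps offer_id -> list of tasks to launch
    in a single call for that offer; remaining holds tasks that did not fit.
    """
    pending = list(pending)
    plan = {}
    for offer in offers:
        # Number of tasks that fit; the small epsilon guards against
        # floating-point results such as 28.999999999999996.
        fit = min(int(offer.cpus / task_cpus + 1e-9),
                  int(offer.mem / task_mem + 1e-9))
        batch, pending = pending[:fit], pending[fit:]
        # An empty batch means the offer should be declined, not reused later.
        plan[offer.offer_id] = batch
    return plan, pending


plan, remaining = plan_launches(
    [Offer("o1", 0.05, 100), Offer("o2", 1.0, 3)],
    pending=list(range(10)),
)
```

In a real scheduler, each entry in `plan` would correspond to one `launchTasks` call for that offer (or a decline when the batch is empty), and `remaining` would wait for the next round of offers.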


ja8zyjits commented Dec 11, 2020

We recently ran into this issue too.

We found a rough correlation: when the scheduler's CPU usage gets high, i.e. 50-60% (verified via docker stats), the number of invalid-offer errors shoots up. Our scheduler runs in Docker with 1 CPU and 1 GB of RAM.

We isolated the code blocks with high CPU usage and either reworked them or moved them to an asynchronous task framework such as Celery. Most of these functions communicated with external microservices rather than with Mesos, so it was safe to rework them.

Soon afterwards the issue stopped appearing. Can you give it a try?
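
As a rough sketch of the idea described above (keeping slow, non-Mesos work out of the scheduler's callbacks so that offers are handled promptly), the example below hands a slow call to a worker pool and returns immediately. It uses the standard-library `ThreadPoolExecutor` as a stand-in for Celery, and `notify_external_service` and `on_status_update` are hypothetical names, not pymesos or Celery APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Worker pool for slow work that does not talk to Mesos (assumed size).
_pool = ThreadPoolExecutor(max_workers=4)


def notify_external_service(task_id, state):
    """Hypothetical slow call to an external microservice."""
    time.sleep(0.2)  # simulates network latency
    return (task_id, state)


def on_status_update(task_id, state, pool=_pool):
    """Body of a scheduler callback: hand off the slow call and return.

    The returned future can be ignored or checked later; the callback
    itself does not wait for the external service.
    """
    return pool.submit(notify_external_service, task_id, state)


start = time.monotonic()
future = on_status_update("task-1", "TASK_FINISHED")
elapsed = time.monotonic() - start  # time spent inside the callback
result = future.result(timeout=2)   # wait here only for demonstration
```

A thread pool mainly helps when the slow part is waiting on I/O. For work that is genuinely CPU-bound, a separate process or a task queue such as Celery is likely the better fit, since Python threads share one interpreter lock.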
