task scheduling and the packing problem #238

Open · dsschult opened this issue Jul 12, 2017 · 4 comments

@dsschult (Collaborator)

We're leaving a lot of resources idle right now because of packing inefficiency. In the GPU case, where it matters most, we have too many high-memory jobs.

Example:

pilot announces: 10 cpu, 8 gpu, 40GB memory
new task: 1 cpu, 1 gpu, 25GB memory
pilot announces: 9 cpu, 7 gpu, 15GB memory
new task: 1 cpu, 1 gpu, 12GB memory
pilot announces: 8 cpu, 6 gpu, 3GB memory
match failure - no new task

We just wasted 6 gpus.

A better packing might be:

pilot announces: 10 cpu, 8 gpu, 40GB memory
new task: 1 cpu, 1 gpu, 12GB memory
new task: 1 cpu, 1 gpu, 4GB memory
new task: 1 cpu, 1 gpu, 4GB memory
new task: 1 cpu, 1 gpu, 4GB memory
new task: 1 cpu, 1 gpu, 4GB memory
new task: 1 cpu, 1 gpu, 4GB memory
new task: 1 cpu, 1 gpu, 4GB memory
new task: 1 cpu, 1 gpu, 4GB memory
pilot announces: 2 cpu, 0 gpu, 0GB memory
match failure - no new task

This packing uses all the GPUs and memory, so it is much more efficient. The downside is that the 25GB task never runs, so it isn't a complete solution either.

One thing that may help significantly is if we can allocate multiple tasks at once, instead of one by one: #237. This would allow proper packing of a single pilot, though packing over multiple pilots and over time would still need work.
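To make that multi-task matching concrete, here is a minimal greedy-packing sketch against a single pilot announcement. The dict layout and the name pack_pilot are hypothetical illustrations, not the iceprod API.

```python
def pack_pilot(pilot, tasks):
    """Greedily pick a subset of tasks that fits in one pilot's resources.

    pilot and each task are dicts like {'cpu': 10, 'gpu': 8, 'memory': 40}.
    Smallest-memory-first keeps one large task from exhausting the memory.
    """
    free = dict(pilot)
    matched = []
    for task in sorted(tasks, key=lambda t: t['memory']):
        if all(task[r] <= free[r] for r in free):
            matched.append(task)
            for r in free:
                free[r] -= task[r]
    return matched

pilot = {'cpu': 10, 'gpu': 8, 'memory': 40}
tasks = ([{'cpu': 1, 'gpu': 1, 'memory': 25}, {'cpu': 1, 'gpu': 1, 'memory': 12}]
         + [{'cpu': 1, 'gpu': 1, 'memory': 4} for _ in range(7)])
print(len(pack_pilot(pilot, tasks)))  # 8 tasks: the seven 4GB tasks plus the 12GB one
```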

@gonzalomerino

I believe there are many possibilities for how to do this matching. One option I could see: start by assuming that our highest-value resource is GPUs. Then, for the example above, the pilot would know that the slot it is managing has 8 GPUs with an average of 5GB of memory per GPU.

In the matching algorithm, I think the pilot should be reluctant to carve out slots with much more memory than that average. The first carved slot could be allowed to go above average, but not the rest. That is pretty much what you have displayed in your example above.

The tricky part I see here is that we probably want some capability for "configuring" this partitioning policy... for instance, to ensure that high-memory jobs do get executed because we decide they are important, or the opposite.

With this "partitionable iceprod pilot" route you have chosen to take on the matchmaking job of the negotiator... tough job! :-)
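To make the carving rule above concrete, a rough illustrative sketch (the function can_carve and its arguments are hypothetical, not an iceprod interface):

```python
def can_carve(pilot_memory, pilot_gpus, carved, task_memory):
    """Decide whether to carve a slot of task_memory GB for a 1-GPU task.

    carved is the list of memory sizes already handed out from this pilot.
    Only the first carved slot may exceed the pilot's average memory per GPU.
    """
    avg = pilot_memory / pilot_gpus  # e.g. 40GB / 8 GPUs = 5GB per GPU
    return task_memory <= avg or len(carved) == 0

carved = []
for req in [12, 4, 4, 25]:
    if can_carve(40, 8, carved, req):
        carved.append(req)
print(carved)  # [12, 4, 4]: the 25GB request is refused because the
               # above-average allowance was already spent on the 12GB slot
```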

@dsschult (Collaborator, Author)

Yeah. I didn't really want to do the matchmaking job in iceprod. But I need to get around the limitations of condor, which led to partitionable pilots.

I'm currently thinking of implementing a variation on fair queueing, like the Linux scheduler uses. This is probably the easiest general way to solve the problem.
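For reference, a generic sketch of the fair-queueing idea (owners with the least accumulated resource usage get served first). This is only an illustration of the concept, not the Linux CFS algorithm and not existing iceprod code; FairQueue and its fields are made up.

```python
class FairQueue:
    """Serve tasks from whichever owner has consumed the least so far."""

    def __init__(self):
        self.vtime = {}    # owner -> accumulated resource usage
        self.pending = {}  # owner -> list of queued task dicts

    def push(self, owner, task):
        self.pending.setdefault(owner, []).append(task)
        self.vtime.setdefault(owner, 0.0)

    def pop(self):
        owners = [o for o, q in self.pending.items() if q]
        if not owners:
            return None
        owner = min(owners, key=lambda o: self.vtime[o])
        task = self.pending[owner].pop(0)
        # Charge the owner for the resources this task requests.
        self.vtime[owner] += (task.get('cpu', 0) + task.get('gpu', 0)
                              + task.get('memory', 0))
        return task
```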

@dsschult (Collaborator, Author)

Whiteboard solution:

multi-level queues:

  • gpu (for req_gpus >= 1)
  • mem (for req_memory >= 4GB)
  • default

scheduling priority within each queue (lowest priority wins):

priority = max(0, req_cpu + req_gpu + req_memory + req_disk + req_time - wait_sec//600)

where wait_sec is the time in seconds since a task was queued

So, when a new resource is available, choose tasks from queues like:

  • if gpus >= 1:
    • gpu, gpu..., default, default...
  • elif memory >= 4GB:
    • mem, default, default...
  • else:
    • default, default...
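A minimal sketch of this whiteboard scheme, assuming tasks are dicts with req_* fields and a queue_time timestamp; the function names here are hypothetical, not the actual iceprod implementation.

```python
import time

def priority(task, now=None):
    """Scheduling priority; the lowest value wins. Every 10 minutes
    (600 seconds) of waiting lowers the value by 1."""
    now = time.time() if now is None else now
    wait_sec = now - task['queue_time']
    return max(0, task['req_cpu'] + task['req_gpu'] + task['req_memory']
               + task['req_disk'] + task['req_time'] - wait_sec // 600)

def queue_name(task):
    """Which multi-level queue a task lands in when submitted."""
    if task['req_gpu'] >= 1:
        return 'gpu'
    if task['req_memory'] >= 4:
        return 'mem'
    return 'default'

def queue_order(resource):
    """Which queues to draw from, in order, for an available resource."""
    if resource.get('gpu', 0) >= 1:
        return ['gpu', 'default']
    if resource.get('memory', 0) >= 4:
        return ['mem', 'default']
    return ['default']

def next_task(queues, resource):
    """Pop the winning (lowest-priority-value) task from the allowed queues."""
    for name in queue_order(resource):
        if queues.get(name):
            task = min(queues[name], key=priority)
            queues[name].remove(task)
            return task
    return None
```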

@dsschult (Collaborator, Author)

This seems to perform "better", though the problem still isn't completely solved. Memory is still a limiting factor, especially memory fragmentation.

#239 - memory oversubscription - may be what is really needed, since we apparently suck at requirement predictions (or there are spikes).
