It would be useful if the idea of a job could be abstracted a little bit. To this end, I propose that a job be abstracted into a collection of "tasks". Each task must declare which other tasks it depends on, and is deferred until each of those has completed.
This ensures that shuffling jobs will not cause a dependent task to cross host boundaries. It also has the benefit of allowing some amount of shared memory between tasks, though of course some limitations will need to be imposed on usage.
One question: how does apply_async() actually work? If you have such calls nested, is that even supported? Or do they all need to be done from the main process?
If the former, we can keep using the map API fairly easily. If the latter, that might be a challenge.
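For context, this is what a plain (non-nested) `apply_async` call from the main process looks like. The nested case is the open question above: CPython's `Pool` workers are daemonic by default, daemonic processes are not allowed to have children, and `Pool` objects are not picklable, so submitting from inside a worker typically fails unless the pool is restructured.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        # apply_async returns an AsyncResult immediately; .get() blocks
        # until the worker has finished and returns the result.
        result = pool.apply_async(square, (4,))
        print(result.get())  # 16
```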
So a generic version of this might look like:
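As a minimal sketch of that idea (the `Task` and `run_job` names here are hypothetical, not an existing API): each task names its prerequisites, and the runner defers a task until every prerequisite has completed, passing the prerequisites' results in as arguments.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Task:
    name: str
    func: Callable
    deps: List[str] = field(default_factory=list)  # names of prerequisite tasks

def run_job(tasks: List[Task]) -> Dict[str, object]:
    """Run tasks in dependency order; a task is deferred until all its deps finish."""
    done: Dict[str, object] = {}
    pending = {t.name: t for t in tasks}
    while pending:
        # A task is ready once every dependency has a recorded result.
        ready = [t for t in pending.values() if all(d in done for d in t.deps)]
        if not ready:
            raise RuntimeError("circular or missing dependency")
        for t in ready:
            done[t.name] = t.func(*(done[d] for d in t.deps))
            del pending[t.name]
    return done

tasks = [
    Task("a", lambda: 2),
    Task("b", lambda: 3),
    Task("sum", lambda x, y: x + y, deps=["a", "b"]),
]
print(run_job(tasks)["sum"])  # 5
```

Because a task and all of its dependents live inside one job, a scheduler that shuffles whole jobs between hosts never splits a dependency edge across a host boundary, which is the property described above.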