Currently Pipeliner is greedy: as soon as the Router announces a new pipeline, Pipeliner takes it and starts taking care of it.
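For context, the greedy behaviour boils down to something like this (a minimal sketch, assuming the Router publishes announcements on a ZeroMQ PUB socket; the endpoint and message format here are hypothetical, not the actual Router protocol):

```python
# Minimal sketch of the current greedy behaviour -- endpoint, topic and
# message format are assumptions, not the real Router/Pipeliner protocol.
import zmq

context = zmq.Context()
announcements = context.socket(zmq.SUB)
announcements.connect("tcp://127.0.0.1:5560")  # hypothetical Router PUB endpoint
announcements.setsockopt_string(zmq.SUBSCRIBE, "")  # accept every announcement

pipelines = []  # grows without bound: the root of the problems below
while True:
    message = announcements.recv_string()
    pipelines.append(message)  # accepted unconditionally, no limit check
```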
At first glance this approach may seem to yield greater throughput, but on closer analysis it is not a good thing to do, because:
- Pipeliner's process will die with a `MemoryError` exception as soon as there are enough pipelines to exhaust the maximum amount of memory the operating system allows it;
- As Pipeliner's list of pipelines grows huge, its operation will slow down, which can delay starting new jobs, sending 'finished pipeline' messages etc.
- We may (maybe?) run into problems with huge ZeroMQ queues (think of 1M pipelines finishing: the amount of memory used by that queue and the time Pipeliner will 'stop' doing other things just to process it).
So, the new approach should be like Broker's: limit the number of pipelines Pipeliner takes care of at a time. This limit can be fixed or dynamic (dynamic is better, I think) and would probably be based on (a rough sketch follows the list):
- Number of workers in the cluster
- Rate of finished pipelines
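For illustration, here is a minimal sketch of what a dynamic limit combining those two inputs could look like, assuming Pipeliner knows the current number of workers and records when each pipeline finishes; every name, constant and the exact formula below is hypothetical, not existing Pipeliner code:

```python
# Sketch of a dynamic pipeline limit -- all names, constants and the exact
# formula are assumptions for illustration, not existing Pipeliner code.
import time

MIN_LIMIT = 10          # never accept fewer than this many at a time
PER_WORKER_FACTOR = 2   # concurrent pipelines each worker can keep busy

class PipelineLimiter:
    def __init__(self):
        self.active = set()        # pipelines currently being handled
        self.recent_finishes = []  # timestamps of recent completions

    def current_limit(self, worker_count):
        # Base capacity scales with cluster size...
        base = max(MIN_LIMIT, worker_count * PER_WORKER_FACTOR)
        # ...plus a bonus proportional to the finish rate over the last
        # minute, so a fast cluster gets more pipelines to stay busy.
        now = time.time()
        self.recent_finishes = [t for t in self.recent_finishes
                                if now - t < 60]
        return base + len(self.recent_finishes)

    def can_accept(self, worker_count):
        return len(self.active) < self.current_limit(worker_count)

    def accept(self, pipeline_id):
        self.active.add(pipeline_id)

    def finish(self, pipeline_id):
        self.active.discard(pipeline_id)
        self.recent_finishes.append(time.time())
```

With something like this, the main loop would check `can_accept()` before taking an announced pipeline and defer the announcement while at capacity, similar in spirit to Broker's limit.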