Run in parallel #8
base: dev
Conversation
Updates to use parallelizable & collect
Hi, @skrawcz
I have no clue why it needs to pickle.
Yes, multiprocessing has this issue. I'll take a look in a bit -- or if you have the stacktrace could you post that? Note: multiprocessing will do a better job if the files are explicit standalone modules -- it looks like they are in the example, but putting it here just to double check. Otherwise I would try installing Ray and using the RayTaskExecutor instead. It should do a better job of serialization. E.g. https://github.com/DAGWorks-Inc/hamilton/blob/main/examples/parallelism/file_processing/run.py#L36
e.g. we probably want to avoid this (building the module with create_temporary_module) and instead have this be a real python module -- multiprocessing might be able to handle that better.
Thanks! I read the docs on create_temporary_module; that's why I put the module outside. Ray is a good option, I will consider it :)
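For reference, a minimal sketch of what such a standalone module could look like -- the worker.py name and the double node here are hypothetical, inferred from the final_vars=['double'] used in the run script later in the thread:
```python
# worker.py -- hypothetical sketch of a standalone module, not the exact file
# from this PR. Defining nodes in a real importable file (instead of building
# one with hamilton.ad_hoc_utils.create_temporary_module) lets multiprocessing
# pickle the functions by module reference.
def double(a: int) -> int:
    """Doubles the input; 'a' is supplied at execution time."""
    return a * 2
```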
switching to ray works it seems -- I haven't tracked down what is causing the SERDE issue for multi-processing, but Ray does a better job there.
```python
from hamilton import driver
from hamilton.execution import executors
import worker
import mapper
import ray
from hamilton.plugins import h_ray

if __name__ == '__main__':
    drivers = []
    inputs = []
    for i in range(4):
        dr = driver.Builder().with_modules(worker).build()
        drivers.append(dr)
        inputs.append({'a': i})
    dr = (
        driver.Builder()
        .with_modules(mapper)
        .enable_dynamic_execution(allow_experimental_mode=True)
        # .with_local_executor(executors.SynchronousLocalTaskExecutor())
        # .with_remote_executor(executors.MultiProcessingExecutor(8))
        .with_remote_executor(h_ray.RayTaskExecutor(8))
        .build()
    )
    dr.execute(
        final_vars=["reducer"],
        inputs={"drivers": drivers, "inputs": inputs, "final_vars": ['double']},
    )
```
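For reference, a sketch of what the mapper module behind this script could look like -- only reducer appears in the script, so the other node names are hypothetical; the Parallelizable and Collect types come from hamilton.htypes:
```python
# mapper.py -- hypothetical sketch consistent with the run script above;
# the actual module in this repo may differ.
from typing import Any, Dict, List, Tuple

from hamilton.htypes import Collect, Parallelizable


def work_item(drivers: List, inputs: List[Dict[str, Any]]) -> Parallelizable[Tuple]:
    # Fan out: yield one (sub-driver, inputs) pair per parallel task.
    for dr, kwargs in zip(drivers, inputs):
        yield dr, kwargs


def result(work_item: Tuple, final_vars: List[str]) -> Dict[str, Any]:
    # Each task runs its own sub-driver against its inputs.
    dr, kwargs = work_item
    return dr.execute(final_vars=final_vars, inputs=kwargs)


def reducer(result: Collect[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Collect the per-task outputs back into a single list.
    return list(result)
```
Everything between the Parallelizable node and the Collect node runs once per yielded item on the configured task executor, which is why the executor choice (multiprocessing vs. Ray) matters for serialization here.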
Yes! I also got it working yesterday.