Restructure Parallel Runner #22
base: main
Conversation
Force-pushed from bed7752 to cefa1c1.
I think I know why this is happening. Could you try again with the latest version from GitHub? I don't want to work around bugs in irace. I'd rather fix irace and avoid having to do weird things in iracepy.
The downside is that irace will set defaults and do other things that change the scenario object but are crucial for correct behavior. Perhaps an intermediate solution is to convert the scenario returned by irace back to Python once and save it.
I see. Perhaps I could use …
Sure, that is its purpose.
Oh, but I noticed there's a FIXME: "FIXME: This function should only do checks and return TRUE/FALSE." Does this mean the function might return TRUE/FALSE in the future, and whatever relies on it will break?
That is the long-term plan, but the chances of this happening soon are very small. It will need to be split into two functions anyway, and one of them will set the defaults.
Force-pushed from a7be18b to 4c5b379.
Codecov Report
@@ Coverage Diff @@
## main #22 +/- ##
==========================================
- Coverage 92.37% 91.66% -0.71%
==========================================
Files 10 11 +1
Lines 354 384 +30
==========================================
+ Hits 327 352 +25
- Misses 27 32 +5
Force-pushed from 4c5b379 to c9bcba7.
There's a bug with this. Let me fix it first.
Force-pushed from 15f8cb6 to 0eb8401.
Force-pushed from 1a39749 to b3604a2.
Force-pushed from b3604a2 to 3c88734.
@@ -149,10 +187,15 @@ def set_initial(self, x):

    def run(self):
        """Returns a Pandas DataFrame, one column per parameter and the row index are the configuration ID."""
        self.scenario['targetRunner'] = self.r_target_runner
        self.r_target_runner_parallel = make_target_runner_parallel(self.target_aq, self.target_rq, self._pkg.check_output_target_runner, self.scenario.copy())
        self.scenario['targetRunnerParallel'] = self.r_target_runner_parallel
This seems to always use self.scenario['targetRunnerParallel'], but it should only be used if the user actually wants to run in parallel.
Good catch. I thought of just spawning a worker regardless, but that might create a problem on platforms where the number of processes a user can run is limited. I'll change it.
In the change, I just made the parallel target runner call the function directly if parallel is 0 or 1. Hope this is OK.
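For illustration only, here is a minimal sketch of that dispatch, assuming a hypothetical wrapper around the user's target runner; dispatch_runs, run_one, and submit_to_workers are made-up names, not the actual iracepy API:

```python
# Illustrative sketch: a parallel target runner that falls back to direct,
# in-process calls when irace's 'parallel' setting is 0 or 1.
def dispatch_runs(experiments, scenario, run_one, submit_to_workers):
    parallel = scenario.get('parallel', 0)
    if parallel <= 1:
        # No worker pool requested: evaluate each configuration directly.
        return [run_one(experiment, scenario) for experiment in experiments]
    # Otherwise hand the whole batch to the persistent workers.
    return submit_to_workers(experiments, scenario)
```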
If everything is otherwise the same, I would prefer to use targetRunnerParallel instead of targetRunner even when the user does not use parallel, because then I could use pure Python functions and pure Python objects through the queues.
I still think that your targetRunnerParallel is much more complex than the code it replaces. Is the main benefit to avoid conversions between R/Python and increase speed? Also, forcing users to use targetRunnerParallel means that they cannot use any of the other parallelization mechanisms in irace, such as MPI/qsub and their own targetRunnerParallel; they cannot use the automatic retries, and maybe other things that I have not even thought about.
Thus, I would rather prefer to add an option to irace.init(), for example python_parallel=True, that selects whether to use your Python-based targetRunnerParallel or simply use the R implementation without changing targetRunnerParallel as it is now.
To be honest, I think it's in part due to me not understanding how to use MPI and not seeing the benefit of it. I started this because I thought it would make things faster, and there was also a problem with garbage collection in the old code which I can't reproduce and don't know the cause of, but I thought spawning new Python threads from an embedded R might trigger some edge case in the GC.
It is possible to avoid conversion between R/Python without this targetRunnerParallel. But does MPI create a new thread for each target-runner run, or does it reuse the same thread like a worker pool? If it creates a new thread every time, it might have more overhead than the worker queue that I wrote here.
The only remaining benefit is that it fixes the crash on Windows (#16). I think the proper way to fix that is in the irace R code / MPI / rpy2, but the error seems to come from some low-level dynamically linked library / shared symbol, and I don't think I have the expertise to understand and fix it.
> Thus, I would rather prefer to add an option to irace.init(), for example, "python_parallel=True" that selects to either use your python-based targetRunnerParallel or simply uses the R implementation without changing targetRunnerParallel as it is now.
I would actually prefer not to do this. It creates some confusion for the user, and it also creates more work for us to maintain two separate code paths for things like #30. So I would propose that we either close this PR and declare it a failed experiment, or we try to replicate the irace functionality: using MPI/qsub, allowing users to set targetRunnerParallel, and implementing retries.
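For illustration, a minimal sketch of the worker-queue idea discussed above, under the assumption that plain Python objects travel through the queues; worker_loop, start_workers, and evaluate_configuration are hypothetical names, not the actual iracepy implementation:

```python
# Hypothetical sketch of a persistent worker pool fed by multiprocessing queues.
# Each worker process is started once and reused for every target-runner run,
# which is the overhead argument made above (no per-run process creation).
from multiprocessing import Process, Queue

def worker_loop(ask_queue, result_queue, evaluate_configuration):
    while True:
        task = ask_queue.get()
        if task is None:                      # sentinel value: shut the worker down
            break
        task_id, configuration = task
        result_queue.put((task_id, evaluate_configuration(configuration)))

def start_workers(n_workers, ask_queue, result_queue, evaluate_configuration):
    workers = [
        Process(target=worker_loop,
                args=(ask_queue, result_queue, evaluate_configuration))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()                        # deliberately not daemonic (see the daemon discussion below)
    return workers
```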
Force-pushed from 9b5c721 to 5bf567a.
Force-pushed from 5bf567a to 8b94a72.
if 'targetRunnerParallel' in scenario:
    raise NotImplementedError("targetRunnerParallel is not yet supported. If you need this feature, consider opening an issue to show us some people actually want to use this.")

def run_irace(irace, args, q: Queue):
What is this function doing? It does not seem to be used anywhere.
Sorry, I forgot to delete this.
self.target_aq = None
self.target_rq = None
self.workers: list[Process] = []
if self.worker_count != 1:
Why not move this code within the above if self.worker_count != 1?
@@ -4,16 +4,12 @@
from irace import irace
import pandas as pd
from multiprocessing import Process
import os

import json
Why is this needed here?
I discovered that daemonic processes cannot start child processes, meaning that the target runner could not spawn processes of its own. The Python multiprocessing.Pool uses daemonic processes, so this is here to prevent a regression.
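As a quick, self-contained illustration of that limitation (not code from this repository): a daemonic process may not start children, which is why plain, non-daemonic Process workers are used instead of multiprocessing.Pool.

```python
# Demonstrates why daemonic workers are avoided: a daemonic process is not
# allowed to start children, so a target runner inside it could not spawn
# its own processes.
from multiprocessing import Process

def child():
    print("child ran")

def daemonic_worker():
    try:
        grandchild = Process(target=child)
        grandchild.start()   # raises AssertionError: daemonic processes are not allowed to have children
    except AssertionError as error:
        print("failed:", error)

if __name__ == "__main__":
    worker = Process(target=daemonic_worker, daemon=True)
    worker.start()
    worker.join()
```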
from rpy2.rinterface_lib.sexp import NACharacterType
from multiprocessing import Queue, Process
import json
from rpy2.rinterface_lib.sexp import NACharacterType
Duplicated?
Will remove it.
from rpy2.robjects.conversion import localconverter
from rpy2 import rinterface as ri
from rpy2.rinterface_lib import na_values
from rpy2.rinterface_lib.sexp import NACharacterType
from multiprocessing import Queue, Process
import json
Is this unused?
Yeah, it belongs in #30.
self.scenario = r_to_python(self.r_scenario)
self.scenario.pop('targetRunnerParallel', None)
scenario_a[0] = self.scenario
scenario_a[1] = self.r_scenario
This may need a comment to explain what is going on: you pass scenario_a to make_target_runner_parallel using None values, but then assign values here? How does that work?
make_target_runner_parallel needs to know the scenario, but to build the scenario (pass checkScenario), targetRunnerParallel must already be set, which creates a chicken-and-egg problem. scenario_a acts kind of like a pointer: I can give it a null pointer first and then set the value later.
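A small, self-contained sketch of that "list as a pointer" pattern, with hypothetical names (make_runner, scenario_holder) rather than the actual iracepy code:

```python
# The closure captures the list object, not its current contents, so the
# scenario can be filled in after the closure has already been created.
def make_runner(scenario_holder):
    def runner(experiments):
        scenario = scenario_holder[0]           # read the value at call time
        if scenario is None:
            raise RuntimeError("scenario not set yet")
        return [(scenario['name'], experiment) for experiment in experiments]
    return runner

scenario_holder = [None]                        # "null pointer" for now
runner = make_runner(scenario_holder)           # created before the scenario exists
scenario_holder[0] = {'name': 'demo-scenario'}  # ...value set later
print(runner(['config-1', 'config-2']))
```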
Instead of letting irace launch new target runners, we let Python control the workers of the target runner. This improves performance and also fixes the parallel runner crashing on Windows (fixes #16). I ran tests/test_dual_annealing; excluding the import and setup, the run time of the tuning was reduced from 1.46s to 0.87s.

A few implementation details to explain:

- This no longer uses exec.target.runner, but instead uses check.output.target.runner (PR: Export check output target runner MLopez-Ibanez/irace#49) to check the output of the target runner. I did this because I noticed that passing an R function crashes the queue so hard that it just becomes empty forever without raising any exception. Not sure why; perhaps one day when I have more time I'll investigate it more and report the bug / submit a PR to rpy2 or Python. As a result, this does not benefit from the retrying function in exec.target.runner, but it doesn't seem to be missing anything else.
- The parallel field is removed from the scenario before passing it to irace, because otherwise irace seems to still read it and tries to do something that makes it crash on Windows.

Depends on MLopez-Ibanez/irace#49.
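To make the overall flow concrete, here is an illustrative sketch of the coordinator side, the counterpart of the worker loop sketched earlier in the thread; the names and the task/result layout are assumptions, not the actual iracepy code:

```python
# Illustrative coordinator: what a Python-side replacement for irace's parallel
# dispatch could do with the two queues. One task per experiment goes out on
# the ask queue; results come back on the result queue as workers finish.
from multiprocessing import Queue

def run_batch(experiments, ask_queue: Queue, result_queue: Queue):
    for task_id, experiment in enumerate(experiments):
        ask_queue.put((task_id, experiment))
    results = {}
    for _ in experiments:
        task_id, outcome = result_queue.get()   # blocks until a worker answers
        results[task_id] = outcome
    # Return one result per submitted experiment, in the original order.
    return [results[i] for i in range(len(experiments))]
```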