
Restructure Parallel Runner #22

Open
wants to merge 1 commit into main

Conversation


@DE0CH DE0CH commented Dec 13, 2022

Instead of letting irace launch new target runners, we let Python control the target-runner workers. This improves performance and also fixes the parallel runner crashing on Windows (fixes #16). I ran tests/test_dual_annealing: excluding import and setup, the run time of the tuning was reduced from 1.46s to 0.87s.

A few implementation details to explain:

  • This does not make use of exec.target.runner, but instead uses check.output.target.runner (PR: Export check output target runner MLopez-Ibanez/irace#49) to check the output of the target runner. I did this because I noticed that passing an R function crashes the queue so badly that it just becomes empty forever without raising any exception. I'm not sure why; perhaps one day when I have more time I'll investigate further and report the bug or submit a PR to rpy2 or Python. As a result, this does not benefit from the retrying feature in exec.target.runner, but it doesn't seem to be missing anything else.
  • Instead of passing the scenario object given by irace to the target runner, it keeps track of the scenario object (before it has been processed by irace). I think this is faster because it avoids R-to-Python conversion, and R objects sometimes crash a queue as well.
  • It removes the parallel field from scenario before passing it to irace, because otherwise irace seems to still read the parallel field and tries to do something that makes it crash on Windows.
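The restructuring can be sketched, very roughly, as a plain multiprocessing worker pool fed through queues (all names below are illustrative, not the actual iracepy code; squaring a number stands in for a target-runner evaluation):

```python
from multiprocessing import Process, Queue

def worker(task_q, result_q):
    # Each worker keeps evaluating jobs until it sees the None sentinel,
    # instead of irace launching a fresh process per evaluation.
    while True:
        job = task_q.get()
        if job is None:
            break
        job_id, x = job
        result_q.put((job_id, x * x))  # stand-in for the target runner

def run_parallel(jobs, n_workers=2):
    task_q, result_q = Queue(), Queue()
    workers = [Process(target=worker, args=(task_q, result_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for job in enumerate(jobs):
        task_q.put(job)
    for _ in workers:
        task_q.put(None)  # one sentinel per worker
    # Drain results before joining so workers never block on a full pipe.
    results = dict(result_q.get() for _ in jobs)
    for w in workers:
        w.join()
    return [results[i] for i in range(len(jobs))]
```

The point is that the pool is created once and reused across evaluations, which is where the reported speedup would come from.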

Depends on MLopez-Ibanez/irace#49.

@DE0CH DE0CH force-pushed the restructure-parallel branch 2 times, most recently from bed7752 to cefa1c1 Compare December 13, 2022 16:52
@MLopez-Ibanez

* It removes the `parallel` field from scenario before passing it to irace, because otherwise irace seems to still read the parallel field and tries to do something that makes it crash on Windows.

I think I know why this is happening. Could you try again with the latest version from github? I don't want to work-around bugs in irace. I'd rather fix irace and avoid having to do weird things in iracepy.

@MLopez-Ibanez

* Instead of passing the scenario object given by irace to the target runner, it keeps track of the scenario object (before it has been processed by irace). I think this is faster because it avoids R-to-Python conversion, and R objects sometimes crash a queue as well.

The downside is that irace will set defaults and do other things that change the scenario object and are crucial for correct behavior. Perhaps an intermediate solution is to convert the scenario returned by irace back to Python once and save it.

@DE0CH

DE0CH commented Dec 15, 2022

The downside is that irace will set defaults and do other things that change the scenario object and are crucial for correct behavior. Perhaps an intermediate solution is to convert the scenario returned by irace back to Python once and save it.

I see. Perhaps I could use checkScenario in irace?

@MLopez-Ibanez

I see. Perhaps I could use checkScenario in irace?

Sure, that is its purpose.

@DE0CH

DE0CH commented Dec 17, 2022

Oh, but I noticed there's a FIXME: "FIXME: This function should only do checks and return TRUE/FALSE." Does this mean the function might return TRUE/FALSE in the future, and whatever relies on it will break?

@MLopez-Ibanez

Oh, but I noticed there's a FIXME: "FIXME: This function should only do checks and return TRUE/FALSE." Does this mean the function might return TRUE/FALSE in the future, and whatever relies on it will break?

That is the long-term plan but the chances of this happening soon are very small. It will need to be split into two functions anyway and one of them will set the defaults.

@DE0CH DE0CH force-pushed the restructure-parallel branch from a7be18b to 4c5b379 Compare December 17, 2022 19:33
@codecov-commenter

codecov-commenter commented Dec 17, 2022

Codecov Report

Merging #22 (68ff4ae) into main (b2177a6) will decrease coverage by 0.70%.
The diff coverage is 87.03%.

❗ Current head 68ff4ae differs from pull request most recent head 8b94a72. Consider uploading reports for the commit 8b94a72 to get more accurate results

@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
- Coverage   92.37%   91.66%   -0.71%     
==========================================
  Files          10       11       +1     
  Lines         354      384      +30     
==========================================
+ Hits          327      352      +25     
- Misses         27       32       +5     
Impacted Files Coverage Δ
tests/utils.py 73.33% <73.33%> (ø)
src/irace/__init__.py 88.98% <86.95%> (-0.85%) ⬇️
tests/test_data_conversion.py 86.20% <100.00%> (+5.25%) ⬆️
tests/test_dual_annealing.py 94.87% <100.00%> (+1.12%) ⬆️


@DE0CH DE0CH changed the base branch from dev to main December 17, 2022 19:59
@DE0CH DE0CH force-pushed the restructure-parallel branch from 4c5b379 to c9bcba7 Compare December 19, 2022 01:44
@DE0CH DE0CH marked this pull request as draft December 19, 2022 15:11
@DE0CH

DE0CH commented Dec 19, 2022

There's a bug with this. Let me fix it first.

@DE0CH DE0CH force-pushed the restructure-parallel branch 3 times, most recently from 15f8cb6 to 0eb8401 Compare December 19, 2022 17:59
@DE0CH DE0CH force-pushed the restructure-parallel branch 2 times, most recently from 1a39749 to b3604a2 Compare December 21, 2022 03:22
@DE0CH DE0CH marked this pull request as ready for review December 21, 2022 03:23
@DE0CH DE0CH requested a review from MLopez-Ibanez December 21, 2022 03:23
@DE0CH DE0CH force-pushed the restructure-parallel branch from b3604a2 to 3c88734 Compare December 21, 2022 03:29
@@ -149,10 +187,15 @@ def set_initial(self, x):

def run(self):
"""Returns a Pandas DataFrame, one column per parameter and the row index are the configuration ID."""
self.scenario['targetRunner'] = self.r_target_runner
self.r_target_runner_parallel = make_target_runner_parallel(self.target_aq, self.target_rq, self._pkg.check_output_target_runner, self.scenario.copy())
self.scenario['targetRunnerParallel'] = self.r_target_runner_parallel
Contributor

This seems to always use self.scenario['targetRunnerParallel'] but it should only be used if the user actually wants to run in parallel.

Collaborator Author

Good catch. I thought about just spawning a worker regardless, but that might create a problem on platforms where the user's process count is limited. I'll change it.

Collaborator Author

In the change, I made targetRunnerParallel call the function directly if parallel is 0 or 1. I hope this is ok.
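The fallback described here amounts to a small dispatch check (hypothetical helper names, not the actual iracepy code):

```python
def dispatch(experiments, parallel, run_one, run_pool):
    """Run each experiment directly when there is no real parallelism,
    otherwise hand the whole batch to the worker pool."""
    if parallel <= 1:  # parallel == 0 or 1: no workers to spawn
        return [run_one(e) for e in experiments]
    return run_pool(experiments)
```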

Collaborator Author

If everything is otherwise the same, I would prefer to use targetRunnerParallel instead of targetRunner even when the user does not use parallel, because then I could pass pure Python functions and pure Python objects through the queues.

Contributor

I still think that your targetRunnerParallel is much more complex than the code it replaces. Is the main benefit avoiding conversions between R and Python to increase speed? Also, forcing users to use targetRunnerParallel means that they cannot use any of the other parallelization mechanisms in irace, such as MPI/qsub and their own targetRunnerParallel; they cannot use the automatic retries, and maybe other things that I have not even thought of.

Thus, I would rather prefer to add an option to irace.init(), for example "python_parallel=True", that selects either your Python-based targetRunnerParallel or simply the R implementation, without changing targetRunnerParallel as it is now.

Collaborator Author

To be honest, I think it's partly due to my not understanding how to use MPI and not seeing its benefit. I started this because I thought it would make things faster, and there was also a garbage-collection problem with the old code which I can't reproduce and whose cause I don't know, but I thought spawning new Python threads from an embedded R might trigger some edge case in the GC.

It is possible to avoid conversion between R and Python without this targetRunnerParallel. But does MPI create new threads for each target-runner run, or does it reuse the same thread like a worker pool? If it creates a new thread every time, it might have more overhead than the worker queue that I wrote here.

The only remaining benefit is that it fixes the crash on Windows (#16). I think the proper way to fix that is in the irace R code / MPI / rpy2, but the error seems to be caused by some low-level dynamically linked library / shared symbol issue, and I don't think I have the expertise to understand and fix it.

Thus, I would rather prefer to add an option to irace.init(), for example "python_parallel=True", that selects either your Python-based targetRunnerParallel or simply the R implementation, without changing targetRunnerParallel as it is now.

I would actually prefer not to do this. It creates some confusion for the user, and it also creates more work for us in maintaining two separate code paths for things like #30. So I would propose that we either close this PR and declare it a failed experiment, or we replicate the irace functionality: using MPI/qsub, allowing users to set targetRunnerParallel, and implementing retries.

@DE0CH DE0CH force-pushed the restructure-parallel branch 5 times, most recently from 9b5c721 to 5bf567a Compare December 23, 2022 08:51
@DE0CH DE0CH mentioned this pull request Dec 23, 2022
@DE0CH DE0CH requested a review from MLopez-Ibanez December 23, 2022 09:01
@DE0CH DE0CH force-pushed the restructure-parallel branch from 5bf567a to 8b94a72 Compare December 28, 2022 16:53
if 'targetRunnerParallel' in scenario:
raise NotImplementedError("targetRunnerParallel is not yet supported. If you need this feature, consider opening an issue to show us some people actually want to use this.")

def run_irace(irace, args, q: Queue):
Contributor

What is this function doing? It does not seem to be used anywhere.

Collaborator Author

Sorry, I forgot to delete this.

self.target_aq = None
self.target_rq = None
self.workers: list[Process] = []
if self.worker_count != 1:
Contributor

Why not move this code inside the above `if self.worker_count != 1:`?

@@ -4,16 +4,12 @@
from irace import irace
import pandas as pd
from multiprocessing import Process
import os

import json
Contributor

Why is this needed here?

Collaborator Author

I discovered that daemonic processes cannot start any child processes, meaning that the target runner cannot create new processes. Python's multiprocessing.Pool uses daemonic processes, so this is here to prevent a regression.
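The limitation is easy to demonstrate in isolation: `multiprocessing` refuses to start a child from a daemonic process (a standalone sketch, unrelated to the iracepy code itself):

```python
from multiprocessing import Process, Queue

def child():
    pass

def try_spawn(q):
    # Starting a child from a daemonic process raises an AssertionError:
    # "daemonic processes are not allowed to have children".
    try:
        p = Process(target=child)
        p.start()
        p.join()
        q.put("ok")
    except AssertionError:
        q.put("refused")

def spawn_from(daemon):
    q = Queue()
    w = Process(target=try_spawn, args=(q,), daemon=daemon)
    w.start()
    result = q.get()
    w.join()
    return result
```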

from rpy2.rinterface_lib.sexp import NACharacterType
from multiprocessing import Queue, Process
import json
from rpy2.rinterface_lib.sexp import NACharacterType
Contributor

Duplicated?

Collaborator Author

Will remove it.

from rpy2.robjects.conversion import localconverter
from rpy2 import rinterface as ri
from rpy2.rinterface_lib import na_values
from rpy2.rinterface_lib.sexp import NACharacterType
from multiprocessing import Queue, Process
import json
Contributor

Is this unused?

Collaborator Author

Yeah, that belongs in #30.


self.scenario = r_to_python(self.r_scenario)
self.scenario.pop('targetRunnerParallel', None)
scenario_a[0] = self.scenario
scenario_a[1] = self.r_scenario
Contributor

This may need a comment to explain what is going on: you pass scenario_a to make_target_runner_parallel with None values, but then assign values here? How does that work?

Collaborator Author

make_target_runner_parallel needs to know the scenario, but to build the scenario (pass checkScenario), targetRunnerParallel must already be set, creating a chicken-and-egg problem. scenario_a acts like a pointer: I can give it a null pointer first and then set the value later.
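The "pointer" here is just a one-element Python list used as a mutable cell: the closure captures the list object at creation time, and the real value is dropped in later once checkScenario has run (a generic illustration with a made-up field name, not the actual iracepy code):

```python
def make_runner(scenario_cell):
    # The closure holds a reference to the list itself, not to its
    # current contents, so updating scenario_cell[0] later is visible.
    def runner():
        scenario = scenario_cell[0]
        return scenario["maxExperiments"]
    return runner

scenario_cell = [None]                      # "null pointer" for now
runner = make_runner(scenario_cell)         # created before the scenario exists
scenario_cell[0] = {"maxExperiments": 180}  # filled in after checkScenario
```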

@DE0CH DE0CH mentioned this pull request Jan 19, 2023

Successfully merging this pull request may close these issues.

Parallel on Windows not Working