ENH: Parallel mode for monte-carlo simulations #619
base: develop
Conversation
Amazing feature; as the results show, the MonteCarlo class has great potential for parallelization.

The only blocking issue I see with this PR is the serialization code. It still does not support all of rocketpy's features and requires a lot of maintenance and updates on our end. Do you see any other option for performing the serialization of inputs?
@phmbressan we should make all the classes json serializable; it's an open issue at #522. In the meantime, maybe we could still use the _encoders module to serialize inputs. I agree with you that implementing flight class serialization within this PR may create maintenance issues for us. The simplest solution would be to delete the …
MNT: Refactor Parallel MonteCarlo and Stochastic Seeding
This is just a temporary review; I intend to finish it later.

The monte_carlo_class_usage notebook currently does not work with parallel. I did not have time to look into it, so I did not review the parallel part of the code.

Overall, there seem to be a lot of structural changes to the class that do not look necessary for parallel and might make maintaining it difficult. These aren't big changes, so it should be easy to adjust them.

The parallel structure seems good, but it is not working currently.

I will get back to this as soon as other priorities are dealt with.
rocketpy/simulation/monte_carlo.py
Outdated
# Create data files for inputs, outputs and error logging
open_mode = "r+" if append else "w+"
Why use r+ or w+? This changes the way the file is written to, since the stream is positioned at the start of the file instead of the end. This changes the behavior of the code in an undesired way, I believe.
The stream is positioned at the beginning of the file solely for the purpose of the readlines() method (if it were at the end, readlines() would return an empty list). I guess this check of id_i and id_o might be optional? What is your suggestion here?
The issue is that the MonteCarlo class only knows whether it should overwrite or append to the files when the simulate method is called, since it has the append parameter.
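For reference, a small standalone sketch (with a made-up file name, not RocketPy code) of why the open mode matters here: "r+" and "w+" position the stream at the start of the file, while "a+" positions it at the end, which is exactly what changes how readlines() behaves right after opening.

import os

# Hypothetical file name, just for illustration.
with open("monte_carlo_demo.inputs.txt", "w") as file:
    file.write("simulation 1\nsimulation 2\n")

with open("monte_carlo_demo.inputs.txt", "r+") as file:
    # Stream starts at offset 0, so existing content is readable immediately,
    # but writing now would overwrite it instead of appending.
    print(file.readlines())  # ['simulation 1\n', 'simulation 2\n']

with open("monte_carlo_demo.inputs.txt", "a+") as file:
    # Stream starts at the end, so readlines() returns [] until we seek back.
    print(file.readlines())  # []
    file.seek(0)
    print(file.readlines())  # ['simulation 1\n', 'simulation 2\n']

os.remove("monte_carlo_demo.inputs.txt")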
rocketpy/simulation/monte_carlo.py
Outdated
@staticmethod
def __sim_producer(
    sto_env,
    sto_rocket,
    sto_flight,
    sim_monitor,
    export_list,
    export_sample_time,
    export_queue,
    error_file,
    mutex,
    seed,
):
Why are all these functions @staticmethod? It's just unnecessary, I think.
The pickler for creating separate processes does not work on true methods, only staticmethods or common functions. Therefore every method relating to the Process building must be static.
Forget what I mentioned: the issue with the pickler is not with true methods, but with some nested functions that it cannot handle. After the refactor I made in the earlier commits, those functions were removed.

Thanks to your comment, I was motivated to take another look at it and it worked. Good job on reviewing. I refactored to remove the clutter of staticmethods in the latest commit. The main thing to keep in mind when dealing with those processes is that each one of them will have a different self.
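For context, a minimal standalone illustration (not RocketPy code) of the limitation mentioned above: the pickler used to spawn processes can serialize bound methods by reference, but not functions defined inside another function ("local objects"), which is exactly the kind of error shown in the traceback below.

import pickle


class Worker:
    def run(self):
        return "ok"

    def make_nested(self):
        # Defined at call time; pickle cannot locate it by a module-level name.
        def nested():
            return "ok"

        return nested


worker = Worker()

pickle.dumps(worker.run)  # works: bound methods are pickled by reference

try:
    pickle.dumps(worker.make_nested())
except (AttributeError, pickle.PicklingError) as error:
    print(error)  # Can't pickle local object 'Worker.make_nested.<locals>.nested'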
Seems these new changes now lead to this error when running parallel:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[20], line 1
----> 1 test_dispersion.simulate(
      2     number_of_simulations=100,
      3     append=False,
      4     parallel=True,
      5     n_workers=5,
      6 )

File c:\mateus\github\rocketpy\rocketpy\simulation\monte_carlo.py:235, in MonteCarlo.simulate(self, number_of_simulations, append, parallel, n_workers)
    233 # Run simulations
    234 if parallel:
--> 235     self.__run_in_parallel(n_workers)
    236 else:
    237     self.__run_in_serial()

File c:\mateus\github\rocketpy\rocketpy\simulation\monte_carlo.py:365, in MonteCarlo.__run_in_parallel(self, n_workers)
    362     processes.append(sim_producer)
    364 for sim_producer in processes:
--> 365     sim_producer.start()
    367 sim_consumer = mp.Process(
    368     target=self.__sim_consumer,
    369     args=(
        (...)
    373     ),
    374 )
    376 sim_consumer.start()

File c:\Users\mateu\.conda\envs\rpy\Lib\multiprocessing\process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File c:\Users\mateu\.conda\envs\rpy\Lib\multiprocessing\context.py:224, in Process._Popen(process_obj)
    222 @staticmethod
    223 def _Popen(process_obj):
--> 224     return _default_context.get_context().Process._Popen(process_obj)

File c:\Users\mateu\.conda\envs\rpy\Lib\multiprocessing\context.py:337, in SpawnProcess._Popen(process_obj)
    334 @staticmethod
    335 def _Popen(process_obj):
    336     from .popen_spawn_win32 import Popen
--> 337     return Popen(process_obj)

File c:\Users\mateu\.conda\envs\rpy\Lib\multiprocessing\popen_spawn_win32.py:94, in Popen.__init__(self, process_obj)
     92 try:
     93     reduction.dump(prep_data, to_child)
---> 94     reduction.dump(process_obj, to_child)
     95 finally:
     96     set_spawning_popen(None)

File c:\Users\mateu\.conda\envs\rpy\Lib\multiprocessing\reduction.py:60, in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)

AttributeError: Can't pickle local object 'Function.__set_interpolation_func.<locals>.spline_interpolation'
I have just noticed that, I am on it.
rocketpy/simulation/monte_carlo.py
Outdated
self,
filename,
environment,
rocket,
flight,
export_list=None,
batch_path=None,
export_sample_time=0.1,
What is batch_path for? It does not seem to do anything new.
rocketpy/simulation/monte_carlo.py
Outdated
inputs_dict = MonteCarlo.prepare_export_data(
    inputs_dict, export_sample_time, remove_functions=True
)
Why are the functions removed here? They are just ignored for the inputs?
This is something I did not change from the first Parallel implementation, since I was most concerned with the Parallel architecture itself. But now seems a good moment to discuss it:

- If this is set to False, the same discretization is applied to every lambda Function, which is not reasonable: it takes a huge time to process and generates large inputs (e.g. 100 MB).
- If this is set to True, the Functions are ignored.

This is quite related to the Encoders feature, so I don't think a robust solution is needed in this PR, but the False option is almost unusable right now.
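As a point of reference for the False case, a hypothetical sketch of what a cheaper export could look like: sample each callable on a fixed time grid and store plain arrays. The helper name and signature below are illustrative, not the actual prepare_export_data behavior.

import numpy as np


def discretize_for_export(func, t_final, sample_time=0.1):
    """Sample a callable on a uniform time grid so it can be serialized as arrays."""
    times = np.arange(0.0, t_final + sample_time, sample_time)
    values = np.array([func(t) for t in times])
    return {"time": times.tolist(), "value": values.tolist()}


# Example: a thrust curve given as a plain callable (stands in for a lambda Function).
thrust_curve = lambda t: 2000.0 * max(0.0, 1.0 - t / 3.0)  # burns out at t = 3 s
exported = discretize_for_export(thrust_curve, t_final=3.0)
print(len(exported["time"]), exported["value"][:3])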
I know your review was just temporary, but could you be a bit more specific about the parallel side not working? It might be an OS-related issue that we should fix, of course, but here things were working fine.
Open the … The parameter … After the sim is done, nothing is saved to the … If you set …
Co-authored-by: MateusStano <[email protected]>
rocketpy/simulation/monte_carlo.py
Outdated
@staticmethod
def _reprint(msg, end="\n", flush=False):
    """
    Prints a message on the same line as the previous one and replaces the
    previous message with the new one, deleting the extra characters from
    the previous message.

    Parameters
    ----------
    msg : str
        Message to be printed.
    end : str, optional
        String appended after the message. Default is a new line.
    flush : bool, optional
        If True, the output is flushed. Default is False.

    Returns
    -------
    None
    """
    padding = ""

    if len(msg) < _SimMonitor._last_print_len:
        padding = " " * (_SimMonitor._last_print_len - len(msg))

    _SimMonitor._last_print_len = len(msg)

    print(msg + padding, end=end, flush=flush)
- Does this method need to be a static method?
- Does it also need to be part of _SimMonitor? It seems to only be used outside of it.

I am guessing this structure is because of the parallelism. If not, I suggest we change this.
@phmbressan really good modifications to this PR. Great work.

Before merging, please run 1000 simulations so the example is better illustrated in the documentation.
I have pushed a fix for the issue on file writing when running on Windows (more accurately, on processes …).

Issues solved by this PR:
Points of Improvement:
Some of these points could become issues of the repository. Stating them here for proper PR documentation.

Future Considerations:
@phmbressan I like the way this PR was refactored. Many thanks for your effort. Please fix the pylint errors and resolve all the open conversations in this PR so we can approve and merge it onto develop! Optionally, try to rebase the PR to get the latest commits from develop.
if n_workers is None or n_workers > os.cpu_count():
    n_workers = os.cpu_count()

if n_workers < 2:
    raise ValueError("Number of workers must be at least 2 for parallel mode.")
We should print the number of workers being used with _SimMonitor.reprint here.
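A hedged sketch of how that could look, written as a standalone function rather than the actual method; the report callback stands in for _SimMonitor's print helper and is only illustrative.

import os


def resolve_worker_count(n_workers=None, report=print):
    """Clamp the requested worker count to the available CPUs and report it."""
    cpu_count = os.cpu_count()
    if n_workers is None or n_workers > cpu_count:
        n_workers = cpu_count
    if n_workers < 2:
        raise ValueError("Number of workers must be at least 2 for parallel mode.")
    # Reviewer's suggestion: surface the resolved value to the user.
    report(f"Running Monte Carlo simulations with {n_workers} workers.")
    return n_workers


resolve_worker_count(4)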
sim_consumer.start()

for seed in seeds:
Very minor, but I think the consumer should start after the producers
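To make the ordering concrete, a standalone producer/consumer sketch (generic multiprocessing, not the MonteCarlo internals): the producer processes are started first, then a single consumer drains the shared queue until it has seen one sentinel per producer.

import multiprocessing as mp


def produce(queue, worker_id):
    for i in range(3):
        queue.put((worker_id, i))
    queue.put(None)  # sentinel: this producer is done


def consume(queue, n_producers):
    finished = 0
    while finished < n_producers:
        item = queue.get()
        if item is None:
            finished += 1
        else:
            print("consumed", item)


if __name__ == "__main__":
    queue = mp.Queue()
    producers = [mp.Process(target=produce, args=(queue, i)) for i in range(2)]

    # Start all producers first, then the consumer.
    for producer in producers:
        producer.start()

    consumer = mp.Process(target=consume, args=(queue, len(producers)))
    consumer.start()

    for producer in producers:
        producer.join()
    consumer.join()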
    )
    processes.append(sim_producer)

for sim_producer in processes:
    sim_producer.start()
There is an extra for loop here
Suggested change:
-    )
-    processes.append(sim_producer)
-    for sim_producer in processes:
-        sim_producer.start()
+    )
+    processes.append(sim_producer)
+    sim_producer.start()
while sim_monitor.keep_simulating():
    sim_idx = sim_monitor.increment() - 1

    self.environment._set_stochastic(seed)
    self.rocket._set_stochastic(seed)
    self.flight._set_stochastic(seed)
Every single iteration needs to be re-seeded?
If this was done before the while loop, wouldn't it be enough?
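On the re-seeding question, a standalone sketch (generic NumPy, not the stochastic classes): if each producer receives its own seed, seeding a generator once before the loop is enough for every iteration to draw fresh, independent samples; re-seeding with the same value inside the loop would reproduce the same draw each time.

import numpy as np


def producer(seed, n_iterations=3):
    rng = np.random.default_rng(seed)  # seeded once, before the loop
    for _ in range(n_iterations):
        # Each iteration draws a new sample from the already-seeded generator.
        print(rng.normal(loc=0.0, scale=1.0))


producer(seed=42)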
@@ -253,114 +491,52 @@ def __run_single_simulation(self, input_file, output_file):
            ]
            for item in d.items()
        )
        inputs_dict["idx"] = sim_idx
Suggested change:
-        inputs_dict["idx"] = sim_idx
+        inputs_dict["index"] = sim_idx
For clarity in the files.
outputs_dict = {
    export_item: getattr(monte_carlo_flight, export_item)
    for export_item in self.export_list
}
Suggested change:
         outputs_dict = {
             export_item: getattr(monte_carlo_flight, export_item)
             for export_item in self.export_list
         }
+        outputs_dict["index"] = sim_idx
Really useful to have index on both input and output
Converted to draft until you solve the remaining issues, especially the random number generation problem.
This pull request implements the option to run simulations in parallel in the MonteCarlo class. The feature uses a context manager named MonteCarloManager to centralize all workers and shared objects, ensuring proper termination of the sub-processes.

A second feature is the possibility to export (close to) all simulation inputs and outputs to an .h5 file. The file can be visualized with HDF View (or similar) software. Since it is a not-so-conventional file, a method to read it and a structure to post-process multiple simulations were also added under rocketpy/stochastic/post_processing. There is a cache handling the data manipulation, where a 3D numpy array is returned with all simulations; the shape corresponds to (simulation_index, time_index, column). The column axis is reserved for vector data, so x, y and z, for example, may be available under the same entry. For example, under cache.read_inputs('motors/thrust_source') time and thrust will be found.
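A hedged sketch of inspecting such a batch file with plain h5py (not the post_processing cache API): the file name and the assumption that each simulation sits under its own top-level group are illustrative; the 'motors/thrust_source' path follows the example above.

import h5py
import numpy as np

with h5py.File("monte_carlo.inputs.h5", "r") as batch:
    for sim_group in batch:  # assumed layout: one group per simulation index
        dataset = batch[sim_group]["motors/thrust_source"]
        data = np.asarray(dataset)  # expected columns: time, thrust
        print(sim_group, data.shape)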
Pull request type

Checklist

- Code formatting (black rocketpy/ tests/) has passed locally
- All tests (pytest tests -m slow --runslow) have passed locally
- CHANGELOG.md has been updated (if relevant)

Current behavior
At the moment, Monte Carlo simulations can only run serially, and all outputs are saved to a txt file.
New behavior
Monte Carlo simulations may now be executed in parallel, and all outputs may be exported to a txt or an h5 file, saving either some key data or everything.
Breaking change
Additional information
None