add multiprocessing #92
base: developer
Conversation
- use pip install to make it available
- minor adaptations to the original script w.r.t. inputs
Fix Multiprocessing
src/move/data/perturbations.py
```python
target_idx = con_dataset_names.index(target_dataset_name)  # dataset index
splits = np.cumsum([0] + baseline_dataset.con_shapes)
slice_ = slice(*splits[target_idx : target_idx + 2])

num_features = baseline_dataset.con_shapes[target_idx]
#num_features = baseline_dataset.con_shapes[target_idx]
```
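As a sketch of how this slicing works (the dataset names and shapes below are hypothetical stand-ins, not the project's actual config):

```python
import numpy as np

# Hypothetical dataset names and per-dataset column counts
con_dataset_names = ["proteomics", "metagenomics", "metabolomics"]
con_shapes = [3, 5, 2]  # columns each dataset contributes to the concatenated array

target_dataset_name = "metagenomics"
target_idx = con_dataset_names.index(target_dataset_name)  # 1

# cumsum with a leading 0 yields the column boundaries [0, 3, 8, 10],
# so consecutive entries delimit each dataset's columns
splits = np.cumsum([0] + con_shapes)
slice_ = slice(*splits[target_idx : target_idx + 2])  # columns 3..8

num_features = con_shapes[target_idx]
assert num_features == slice_.stop - slice_.start  # both give the dataset width
```

So `num_features` and the width of `slice_` are two views of the same quantity, which is presumably why one of the two lines above could be dropped.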
Is this correct? Or does it still need to be changed?
I think it's correct now. I cannot check right now because I've been having problems connecting to Esrum all morning, but I'll check as soon as it works again (hopefully soon)
I still can't connect, it's very annoying :( . I'll let you know as soon as I can again, but I think the code should be fine
I was finally able to connect today at noon :). The file was correct, because those functions are not used at all for multiprocessing; I had just changed them to test some things with the previous functions. But it is true that it led to confusion, so I reverted the changes so that the unused functions have their original code.
- see if results between single and multi-run match.
- decide whether to log2 transform - default: do not, in order to allow negative features, which are then standard normalized
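One way to see why skipping the log2 transform matters, as a minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
x[0, 0] = -1.0  # a negative feature value

# log2 is undefined for negative inputs and would produce NaNs here
with np.errstate(invalid="ignore"):
    assert np.isnan(np.log2(x)).any()

# standard normalization (z-scoring) handles negative values fine
z = (x - x.mean(axis=0)) / x.std(axis=0)
assert np.allclose(z.mean(axis=0), 0.0, atol=1e-12)
assert np.allclose(z.std(axis=0), 1.0)
```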
```python
# This will flatten the array, so we get all bayes_abs for all perturbed
# features vs. all continuous features in one 1D array. Then we sort them
# and get the indexes in the flattened array. So we get a list of sorted
# indexes in the flattened array.
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D: N x C
```
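To illustrate what `axis=None` does here (the `bayes_abs` values below are made up), and how a flat index maps back to a (perturbed, continuous) pair:

```python
import numpy as np

# Hypothetical scores: N=2 perturbed features vs C=3 continuous features
bayes_abs = np.array([[0.2, 0.9, 0.1],
                      [0.5, 0.3, 0.8]])

# axis=None flattens before sorting; [::-1] makes the order descending
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D of length N * C
# sort_ids -> [1, 5, 3, 4, 0, 2]

# np.unravel_index maps flat indexes back to (perturbed, continuous) pairs
pairs = list(zip(*np.unravel_index(sort_ids, bayes_abs.shape)))
# pairs[0] is (0, 1): the position of the largest score, 0.9
```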
So the flattening here means that the probabilities and FDR are calculated across perturbations (meaning that perturbations actually increase the number of total probabilities)?
I think so, yes.
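A small numeric sketch of what that implies (shapes are hypothetical, and the cumulative-mean FDR formula here is the common form, not necessarily MOVE's exact code):

```python
import numpy as np

N, C = 4, 6  # hypothetical: 4 perturbed features x 6 continuous features
prob = np.random.default_rng(1).uniform(size=(N, C))

# Sorting over the flattened array means every (perturbed, continuous) pair
# competes in one ranking: N * C candidate associations, not C per perturbation
sort_ids = np.argsort(prob, axis=None)[::-1]
fdr = np.cumsum(1 - prob.flatten()[sort_ids]) / np.arange(1, N * C + 1)

assert fdr.size == N * C  # one FDR value per (perturbed, continuous) pair
```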
To solve: reloading trained models from a single process with a single process does not yield exactly the same results.
- use everywhere.
- need to switch dataloader constructor for type of perturbation (cat or cont)
I finally found the issue. The multiprocessing did not yet have categorical vs. continuous perturbations implemented. I moved the masking into the main function and updated the bayes_worker fct accordingly. I think it's ready to be checked now @ri-heme
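A rough sketch of that split (the name `bayes_worker` follows the PR, but the bodies here are illustrative, not the actual MOVE code):

```python
import numpy as np

def bayes_worker(task):
    """Illustrative worker: receives an already-perturbed batch, so it is
    agnostic to whether the perturbation was categorical or continuous."""
    i, baseline, perturbed = task
    diff = perturbed - baseline
    prob = np.mean(diff > 1e-8, axis=0)  # 1D: C
    return i, np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

def make_tasks(perturbation_type, baseline, values):
    """The masking / dataloader choice happens here in the main process."""
    tasks = []
    for i in range(values.shape[1]):
        perturbed = baseline.copy()
        if perturbation_type == "continuous":
            perturbed[:, i] = values[:, i]
        else:  # "categorical": illustrative stand-in for re-encoding a category
            perturbed[:, i] = 1.0
        tasks.append((i, baseline, perturbed))
    return tasks
```

With `multiprocessing.Pool`, the main process would build the tasks and `pool.map(bayes_worker, tasks)` would return `(i, bayes_k_row)` pairs for the parent to reassemble.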
Hi Henry. Left some comments here and there (some type hints were changed, some comments are still there, and some questions/suggestions).
Thanks for your help!
```python
)
if reconstruction_path.exists():
```
This is never True, right? Because saving the reconstructions is commented out.
Yes, but like this it is easier to compare bayes_approach with bayes_parallel.
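The pattern being kept is a simple load-or-compute cache; a hypothetical sketch (the helper name and the `.npy` format are assumptions):

```python
from pathlib import Path
import numpy as np

def load_or_compute(reconstruction_path: Path, compute):
    """If a saved reconstruction exists, reuse it so bayes_approach and
    bayes_parallel can be compared on identical inputs; otherwise compute
    and save it for the next run."""
    if reconstruction_path.exists():
        return np.load(reconstruction_path)
    result = compute()
    np.save(reconstruction_path, result)
    return result
```

While the save call is commented out in the PR, the `exists()` check stays `False` and the compute branch always runs, which matches the observation above.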
```diff
@@ -270,7 +259,8 @@ def _bayes_approach(
    mask = feature_mask[:, [i]] | nan_mask  # 2D: N x C
    diff = np.ma.masked_array(mean_diff[i, :, :], mask=mask)  # 2D: N x C
    prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C
    bayes_k[i, :] = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
    computed_bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
```
Why create this variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to show where a worker function could be called.
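In other words, the assignment target becomes the seam where a worker could return its row; a sketch (the loop runs serially here, standing in for `pool.map`, and the names are illustrative):

```python
import numpy as np

def compute_bayes_k_row(prob):
    # the expression bound to computed_bayes_k, returned by the worker
    # instead of being written into bayes_k[i, :] in place
    return np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

N, C = 3, 4
probs = np.random.default_rng(2).uniform(size=(N, C))

# serial stand-in for pool.map(worker, ...) returning (i, row) pairs
results = [(i, compute_bayes_k_row(probs[i])) for i in range(N)]

bayes_k = np.empty((N, C))
for i, row in results:  # the parent process reassembles the matrix
    bayes_k[i, :] = row
```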
@ri-heme just merge if you think it's fine now :)