add multiprocessing #92
base: developer
Conversation
- use pip install to make it available
- minor adaptations to the original script w.r.t. inputs
Fix Multiprocessing
src/move/data/perturbations.py
```python
target_idx = con_dataset_names.index(target_dataset_name)  # dataset index
splits = np.cumsum([0] + baseline_dataset.con_shapes)
slice_ = slice(*splits[target_idx : target_idx + 2])

num_features = baseline_dataset.con_shapes[target_idx]
#num_features = baseline_dataset.con_shapes[target_idx]
```
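As a sketch of how this slicing works (the dataset names and shapes below are hypothetical stand-ins, not the project's actual config):

```python
import numpy as np

# Hypothetical dataset names and per-dataset column counts
con_dataset_names = ["proteomics", "metagenomics", "metabolomics"]
con_shapes = [3, 5, 2]  # columns each dataset contributes to the concatenated array

target_dataset_name = "metagenomics"
target_idx = con_dataset_names.index(target_dataset_name)  # 1

# cumsum with a leading 0 yields the column boundaries [0, 3, 8, 10],
# so consecutive entries delimit each dataset's columns
splits = np.cumsum([0] + con_shapes)
slice_ = slice(*splits[target_idx : target_idx + 2])  # columns 3..8

num_features = con_shapes[target_idx]
assert num_features == slice_.stop - slice_.start  # both give the dataset width
```

So `num_features` and the width of `slice_` are two views of the same quantity, which is presumably why one of the two lines above could be dropped.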
Is this correct? Or does it still need to be changed?
I think it's correct now. I cannot check right now because I've been having problems connecting to Esrum all morning, but I'll check as soon as it works again (hopefully soon)
I still can't connect, it's very annoying :( . I'll let you know as soon as I can again, but I think the code should be fine
I was finally able to connect today at noon :). The file was correct, because those functions are not used at all for multiprocessing; I had just changed them to test some things with the previous functions. But it is true that it led to confusion, so I reverted the changes so that the unused functions have their original code.
- see if results between single and multi-run match.
- decide whether to log2 transform - default: do not, in order to allow negative features, which are then standard normalized
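One way to see why skipping the log2 transform matters, as a minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
x[0, 0] = -1.0  # a negative feature value

# log2 is undefined for negative inputs and would produce NaNs here
with np.errstate(invalid="ignore"):
    assert np.isnan(np.log2(x)).any()

# standard normalization (z-scoring) handles negative values fine
z = (x - x.mean(axis=0)) / x.std(axis=0)
assert np.allclose(z.mean(axis=0), 0.0, atol=1e-12)
assert np.allclose(z.std(axis=0), 1.0)
```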
```python
# This will flatten the array, so we get all bayes_abs for all perturbed
# features vs. all continuous features in one 1D array. Then we sort them
# and get the indexes in the flattened array. So we get a list of sorted
# indexes in the flattened array.
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D: N x C
```
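To illustrate what `axis=None` does here (the `bayes_abs` values below are made up), and how a flat index maps back to a (perturbed, continuous) pair:

```python
import numpy as np

# Hypothetical scores: N=2 perturbed features vs C=3 continuous features
bayes_abs = np.array([[0.2, 0.9, 0.1],
                      [0.5, 0.3, 0.8]])

# axis=None flattens before sorting; [::-1] makes the order descending
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D of length N * C
# sort_ids -> [1, 5, 3, 4, 0, 2]

# np.unravel_index maps flat indexes back to (perturbed, continuous) pairs
pairs = list(zip(*np.unravel_index(sort_ids, bayes_abs.shape)))
# pairs[0] is (0, 1): the position of the largest score, 0.9
```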
So the flattening here means that the probabilities and FDR are calculated across perturbations (meaning that perturbations actually increase the number of total probabilities)?
I think so, yes.
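A small numeric sketch of what that implies (shapes are hypothetical, and the cumulative-mean FDR formula here is the common form, not necessarily MOVE's exact code):

```python
import numpy as np

N, C = 4, 6  # hypothetical: 4 perturbed features x 6 continuous features
prob = np.random.default_rng(1).uniform(size=(N, C))

# Sorting over the flattened array means every (perturbed, continuous) pair
# competes in one ranking: N * C candidate associations, not C per perturbation
sort_ids = np.argsort(prob, axis=None)[::-1]
fdr = np.cumsum(1 - prob.flatten()[sort_ids]) / np.arange(1, N * C + 1)

assert fdr.size == N * C  # one FDR value per (perturbed, continuous) pair
```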
To solve: reloading trained models from a single process with a single process does not yield exactly the same results.
- use everywhere.
- need to switch dataloader constructor for type of perturbation (cat or cont)
I finally found the issue. The multiprocessing did not yet have categorical vs. continuous perturbations implemented. I moved the masking into the main function and updated the bayes_worker fct accordingly. I think it's ready to be checked now @ri-heme
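A rough sketch of that split (the name `bayes_worker` follows the PR, but the bodies here are illustrative, not the actual MOVE code):

```python
import numpy as np

def bayes_worker(task):
    """Illustrative worker: receives an already-perturbed batch, so it is
    agnostic to whether the perturbation was categorical or continuous."""
    i, baseline, perturbed = task
    diff = perturbed - baseline
    prob = np.mean(diff > 1e-8, axis=0)  # 1D: C
    return i, np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

def make_tasks(perturbation_type, baseline, values):
    """The masking / dataloader choice happens here in the main process."""
    tasks = []
    for i in range(values.shape[1]):
        perturbed = baseline.copy()
        if perturbation_type == "continuous":
            perturbed[:, i] = values[:, i]
        else:  # "categorical": illustrative stand-in for re-encoding a category
            perturbed[:, i] = 1.0
        tasks.append((i, baseline, perturbed))
    return tasks
```

With `multiprocessing.Pool`, the main process would build the tasks and `pool.map(bayes_worker, tasks)` would return `(i, bayes_k_row)` pairs for the parent to reassemble.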
Hi Henry. Left some comments here and there (some type hints were changed, some comments are still there, and some questions/suggestions).
Thanks for your help!
```python
)
if reconstruction_path.exists():
```
This is never True, right? Because saving the reconstructions is commented out.
Yes, but like this it is easier to compare bayes_approach with bayes_parallel.
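The pattern being kept is a simple load-or-compute cache; a hypothetical sketch (the helper name and the `.npy` format are assumptions):

```python
from pathlib import Path
import numpy as np

def load_or_compute(reconstruction_path: Path, compute):
    """If a saved reconstruction exists, reuse it so bayes_approach and
    bayes_parallel can be compared on identical inputs; otherwise compute
    and save it for the next run."""
    if reconstruction_path.exists():
        return np.load(reconstruction_path)
    result = compute()
    np.save(reconstruction_path, result)
    return result
```

While the save call is commented out in the PR, the `exists()` check stays `False` and the compute branch always runs, which matches the observation above.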
```diff
@@ -270,7 +259,8 @@ def _bayes_approach(
    mask = feature_mask[:, [i]] | nan_mask  # 2D: N x C
    diff = np.ma.masked_array(mean_diff[i, :, :], mask=mask)  # 2D: N x C
    prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C
    bayes_k[i, :] = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
    computed_bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
```
Why create this variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to show where a worker function could be called.
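In other words, the assignment target becomes the seam where a worker could return its row; a sketch (the loop runs serially here, standing in for `pool.map`, and the names are illustrative):

```python
import numpy as np

def compute_bayes_k_row(prob):
    # the expression bound to computed_bayes_k, returned by the worker
    # instead of being written into bayes_k[i, :] in place
    return np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

N, C = 3, 4
probs = np.random.default_rng(2).uniform(size=(N, C))

# serial stand-in for pool.map(worker, ...) returning (i, row) pairs
results = [(i, compute_bayes_k_row(probs[i])) for i in range(N)]

bayes_k = np.empty((N, C))
for i, row in results:  # the parent process reassembles the matrix
    bayes_k[i, :] = row
```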
@ri-heme just merge if you think it's fine now :)