Splitting the dataset definition to build and apply the DM algo separately #10

alainamstutz · 2025-01-06T13:45:18Z

Hi Venexia,
In the meantime, I have split my dataset definition into two parts, following our discussion before the holidays:

generate_dataset_dm_algo extracts the minimum dataset needed for the DM algo; data_process_dm_algo then processes it and runs the DM algo, resulting in a dataset with just 8 variables.
generate_dataset extracts the remaining variables; data_process then processes them and merges them with the output from data_process_dm_algo.

I can now generate a larger dummy dataset (10,000 instead of 5,000).

I used the base R merge function in data_process: data_processed <- merge(data_processed, data_processed_dm_algo, by = "patient_id", all.x = TRUE). I hope that's OK.
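For reference, here is a minimal sketch of what that left join does, using toy data rather than the real extracts. The columns age and dm_type and all values are hypothetical; only patient_id and the two data frame names come from the merge call. It assumes data_processed_dm_algo has one row per patient, which the sketch checks explicitly.

```r
# Toy stand-ins for the two processed datasets.
# Columns other than patient_id are hypothetical.
data_processed <- data.frame(
  patient_id = c(1, 2, 3, 4),
  age = c(54, 61, 47, 70)
)
data_processed_dm_algo <- data.frame(
  patient_id = c(1, 2, 4),
  dm_type = c("T2DM", "T1DM", "T2DM")
)

n_before <- nrow(data_processed)

# A duplicated patient_id on the right-hand side would add rows to the
# merged result, so check for that before merging.
stopifnot(anyDuplicated(data_processed_dm_algo$patient_id) == 0)

# Left join: keep every row of data_processed and add the DM algo
# columns where a matching patient_id exists (NA otherwise).
data_processed <- merge(
  data_processed, data_processed_dm_algo,
  by = "patient_id", all.x = TRUE
)

# The row count should be unchanged after a one-to-one left join.
stopifnot(nrow(data_processed) == n_before)
```

Patients without a match in data_processed_dm_algo keep their row and get NA in the added columns. Note that merge() sorts the result by the key column by default (sort = TRUE), so row order may differ from the input.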

Otherwise, I have not changed anything. If you have time to double-check it (especially the YAML), I would appreciate it!
Thank you very much,
-Alain

alainamstutz added 7 commits December 20, 2024 14:37

updated flow

ca86d67

Update project.yaml

6e62079

Update data_process_dm_algo.R

23a5daa

Update project.yaml

b887821

Update project.yaml

63d68d6

Update data_process.R

0a01393

adapted the yaml to include the split correctly in the data flow

8b08a48

alainamstutz requested a review from venexia January 6, 2025 13:45
