Splitting the dataset definition to build and apply the DM algo separately #10

Open
wants to merge 7 commits into
base: main

Conversation

alainamstutz
Contributor

Hi Venexia,
In the meantime, I have split my dataset definition into two parts, as we discussed before the holidays:

  1. generate_dataset_dm_algo extracts the minimal dataset needed for the DM algo; data_process_dm_algo then processes it and runs the DM algo, resulting in a dataset with just 8 variables.
  2. generate_dataset extracts the remaining variables; data_process then processes them and merges them with the output from data_process_dm_algo.

With this split, I can now generate a larger dummy dataset (10,000 instead of 5,000).

For the merge, I used the base R merge function in data_process: data_processed <- merge(data_processed, data_processed_dm_algo, by = "patient_id", all.x = TRUE). I hope that's ok.
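
For reference, here is a minimal sketch of how this merge behaves, run on made-up toy data. The age and dm_flag columns and all values are hypothetical stand-ins, not the real variables. It assumes patient_id is unique in the DM algo output, since otherwise a left join would duplicate rows:

```r
# Toy stand-ins for the two processed datasets (all values are made up).
data_processed <- data.frame(
  patient_id = c(1, 2, 3),
  age = c(54, 61, 47)             # hypothetical covariate
)
data_processed_dm_algo <- data.frame(
  patient_id = c(1, 3),
  dm_flag = c(TRUE, FALSE)        # hypothetical DM algo output column
)

# A left join only preserves the row count if the key is unique on the right.
stopifnot(anyDuplicated(data_processed_dm_algo$patient_id) == 0)

n_before <- nrow(data_processed)

# all.x = TRUE keeps every row of the left-hand data frame (left join).
data_processed <- merge(data_processed, data_processed_dm_algo,
                        by = "patient_id", all.x = TRUE)

# No rows gained or lost; patients without a DM algo row get NA.
stopifnot(nrow(data_processed) == n_before)
print(data_processed)
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the input.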

Otherwise, I have not changed anything. If you have time to double-check it (especially the yaml), I would appreciate it!
Thank you very much,
-Alain

@alainamstutz alainamstutz requested a review from venexia January 6, 2025 13:45