Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularising code and adding measures framework #3

Merged
merged 4 commits into from
Jan 7, 2025

Conversation

milanwiedemann
Copy link
Member

@milanwiedemann milanwiedemann commented Jan 6, 2025

The initial code looked like a great start! Here the main changes I tried to implement in this PR.

  1. breaking the code down into more modules to make it easier to reuse code
  2. Introduce the measures framework to run the same query for multiple time intervals, something you reloved with a workaround before in the dataset definition before

Some other things I noticed or added:

  • Currently you're only looking at whether someone has a coded ADHD event. There is also Attention Deficit Hyperactivity Disorder in remission Codelist that you could look at. For example, rule 2 of the LDOB091 indicator (see Learning Disabilities data 2019-2020 business rules version 3.0) is defined as:

    Select patients passed to this rule whose latest diagnosis of ADHD is not in remission up to and including the reporting period end date. Reject the remaining patients.

  • I removed your analysis script report.py for now as I wasnt sure the content was still needed. We can explore in our next session what analysis would be good to run as a first test

  • I categoried age into age bands so that this can be used as a group_by in the measures for the prevalence following this example in our docs.

My main aim with these changes was to make it easier for us to execute code in the secure environment soon and also to introduce ehrQL concepts you hadn't used before. See what you think and let's discuss later this week.

Creating a separate module will make it easier to share this code across different dataset definitions that we might be working on for this project. This also fixes the path to themethylphenidate_codelist, before the path was pointing to the same NHSD Primary Care Refset for ADHD.
The main change here is to take out looping through the years, there is another framework in ehrQL, measures, that does this for us. I also switched the naming of patient attributes to have the prefix has (instead of had), we often use this naming convention for our studies. To shorten the code I also removed some of the variables that were only used to define the population and explicitly mention them only inside the define_population function.
This is using the ehrQL measures framework to run the same query for multiple time intervals. I tried to match your original 7 years, although I'd recommend to reduce this to 3 years for the first run to keep the run times short.
@milanwiedemann milanwiedemann requested a review from quan14 January 6, 2025 17:56
@quan14
Copy link
Contributor

quan14 commented Jan 7, 2025

All the changes looks good

@quan14 quan14 merged commit 112f278 into main Jan 7, 2025
1 check passed
@quan14 quan14 deleted the milanwiedemann/dataset_def_review branch January 7, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants