add omop teva module (#61)

aphp · Jun 11, 2024 · 3a70b96
1 parent da3ed10

Showing 21 changed files with 30,670 additions and 2 deletions.
diff --git a/.gitignore b/.gitignore
@@ -120,3 +120,6 @@ ENV/
 Biology_summary/*
 my_custom_config.csv
 eds_scikit/biology/viz_other/
+
+# Plot test
+omop_teva/*
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -38,7 +38,7 @@ repos:
     hooks:
       - id: blacken-docs
         additional_dependencies: [black==20.8b1]
-        exclude: notebooks/
+        exclude: '(notebooks/|(.*)configuration-omop)'
   - repo: https://github.com/pycqa/flake8
     rev: 4.0.1
     hooks:

diff --git a/changelog.md b/changelog.md
@@ -6,6 +6,9 @@
 - Pyarrow fix now work on spark executors.
 - Fix OMOP _date columns issue
 
+### Added
+- omop teva module
+
 ## v0.1.7 (2024-04-12)
 ### Changed
 - Support for pyarrow > 0.17.0

diff --git a/docs/functionalities/omop-teva/configuration-omop.md b/docs/functionalities/omop-teva/configuration-omop.md
@@ -0,0 +1,86 @@
+# OMOP Teva - Config
+
+All plots generated by ```generate_omop_teva``` are based on the configuration dictionary ```default_omop_teva_config```, imported from ```eds_scikit.io.omop_teva_default_config```.
+
+## Table configuration
+
+A table configuration is defined by three parameters:
+
+- __category columns__ list
+- __date column__
+- category columns __mapping__
+
+Here are two possible configurations for the OMOP condition table:
+
+=== "Default condition teva configuration"
+
+    ```python
+    "condition_occurrence": {
+        "category_columns": [
+            "visit_occurrence_id",
+            "care_site_short_name",
+            "condition_source_value",
+            "stay_source_value",
+            "visit_source_value",
+            "admission_reason_source_value",
+            "visit_type_source_value",
+            "destination_source_value",
+            "cdm_source",
+        ],
+        "date_column": "condition_start_datetime",
+        "mapper": {
+            "visit_occurrence_id": {"not NaN": ".*"},
+            "condition_source_value": {"not NaN": ".*"},
+        },
+    },
+    ```
+
+=== "Custom diabetes condition teva configuration"
+
+    ```python
+    "condition_occurrence": {
+        # (1) Some columns were removed.
+        "category_columns": [
+            "visit_occurrence_id",
+            "care_site_short_name",
+            "condition_source_value",
+            "visit_source_value",
+            "visit_type_source_value",
+            "cdm_source",
+        ],
+        # (2) The date column remains the same.
+        "date_column": "condition_start_datetime",
+        "mapper": {
+            "visit_occurrence_id": {"not NaN": ".*"},
+            # (3) Mapping to diabetic conditions.
+            "condition_source_value": {"has_diabete": r"^E10|^E11|^E12|^E13|^E14|O24"},
+        },
+    },
+    ```
+
+
+## Specifying table configuration
+
+To specify a configuration, update ```default_omop_teva_config``` and pass it to ```generate_omop_teva```.
+
+```python
+from eds_scikit.plot import generate_omop_teva
+from eds_scikit.io.omop_teva_default_config import default_omop_teva_config
+
+omop_teva_config = default_omop_teva_config
+
+condition_mapper = {
+    "condition_source_value": {"has_diabete": r"^E10|^E11|^E12|^E13|^E14|O24"}
+}
+
+omop_teva_config["condition_occurrence"]["mapper"].update(condition_mapper)
+
+start_date, end_date = "2021-01-01", "2021-12-01"
+generate_omop_teva(data=data,
+                   start_date=start_date,
+                   end_date=end_date,
+                   teva_config=omop_teva_config)
+```
+
+!!! warning "Adding a new table to default_omop_teva_config"
+    You can add any new table to the configuration, provided it has a ```visit_occurrence_id``` column.
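
Review note on `configuration-omop.md` (not part of the commit): the usage example binds `omop_teva_config` to `default_omop_teva_config` itself, so the later `.update(...)` call also changes the imported default for the rest of the session. Below is a minimal sketch of one way to avoid that, assuming the configuration is a plain nested dict. `default_config` is a trimmed, hypothetical stand-in for the real `default_omop_teva_config`. How the library applies the mapper regexes is not shown in this diff, so the `re.search` check only illustrates what the pattern itself matches.

```python
import copy
import re

# Hypothetical stand-in for default_omop_teva_config, trimmed to one table.
default_config = {
    "condition_occurrence": {
        "category_columns": ["visit_occurrence_id", "condition_source_value"],
        "date_column": "condition_start_datetime",
        "mapper": {
            "visit_occurrence_id": {"not NaN": ".*"},
            "condition_source_value": {"not NaN": ".*"},
        },
    },
}

# Deep copy so the nested "mapper" dict is not shared with the default.
custom_config = copy.deepcopy(default_config)

diabetes_pattern = r"^E10|^E11|^E12|^E13|^E14|O24"
custom_config["condition_occurrence"]["mapper"].update(
    {"condition_source_value": {"has_diabete": diabetes_pattern}}
)

# The default configuration is left untouched by the update above.
default_mapper = default_config["condition_occurrence"]["mapper"]
assert default_mapper["condition_source_value"] == {"not NaN": ".*"}

# Illustration only: which sample codes the pattern matches under re.search.
codes = ["E10", "E119", "O24", "XO24", "I10"]
matched = [code for code in codes if re.search(diabetes_pattern, code)]
print(matched)
```

In this sketch the `O24` alternative has no `^` anchor, so it also matches a code such as `XO24`. Whether that is intended is not stated in the documentation; if only codes starting with `O24` are wanted, `^O24` would be the stricter form.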