MASLD Risk Prediction Project

Early warning tool for prioritizing individuals for screening based on their risk of MASLD-related liver complications

Documentation

Formulation

For every patient who

- has had at least one clinical visit in the past 3 years,
- is 18 or older (and alive), and
- has not yet been diagnosed with MASLD-related liver complications, hepatitis, or alcohol-related or other liver complications,

predict the top k individuals (with k set by intervention capacity) who are at risk of developing MASLD-related liver complications in the next 3 years.
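The eligibility criteria above can be sketched as a single predicate. This is only an illustration: the `Patient` record, its field names, and the approximation of "past 3 years" as 3 × 365 days are assumptions made for the example. The project's actual cohort is defined by the SQL file referenced in the config.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

LOOKBACK = timedelta(days=3 * 365)  # assumption: "past 3 years" approximated in days


@dataclass
class Patient:
    # Hypothetical record; field names are illustrative, not the project's schema.
    birth_date: date
    visit_dates: List[date] = field(default_factory=list)
    death_date: Optional[date] = None
    first_liver_dx_date: Optional[date] = None  # earliest excluded liver diagnosis


def age_on(birth_date: date, as_of_date: date) -> int:
    """Age in completed years on as_of_date."""
    had_birthday = (as_of_date.month, as_of_date.day) >= (birth_date.month, birth_date.day)
    return as_of_date.year - birth_date.year - (0 if had_birthday else 1)


def in_cohort(p: Patient, as_of_date: date) -> bool:
    """True if the patient meets all three formulation criteria as of as_of_date."""
    recent_visit = any(
        as_of_date - LOOKBACK <= v <= as_of_date for v in p.visit_dates
    )
    adult = age_on(p.birth_date, as_of_date) >= 18
    alive = p.death_date is None or p.death_date > as_of_date
    no_prior_liver_dx = (
        p.first_liver_dx_date is None or p.first_liver_dx_date > as_of_date
    )
    return recent_visit and adult and alive and no_prior_liver_dx
```

Only information dated on or before `as_of_date` is used, so the same predicate can be evaluated at each historical prediction date without leaking future data.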

Analysis to be done

- Predict the risk of developing MASLD-related liver complications in the next 3 years
- Define baselines:
  - age, most recent FIB-4, comorbidities
  - clinical guidelines: T2DM, obesity, high TG, glucose, HDL, BP, AST or ALT
- Metric(s):
  - Primary: precision (PPV) or recall (sensitivity) at top k (:warning: k still needs to be determined based on capacity)
  - AUC (if capacity is TBD)
- Fairness metric: TPR disparity by race and gender
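As a minimal sketch of the metrics listed above, the functions below compute precision and recall at top k and a per-group TPR disparity from plain Python lists. The function names are hypothetical, and expressing disparity as a ratio to a chosen reference group is an assumption for illustration. The project itself may rely on the evaluation and bias-audit tooling that comes with Triage.

```python
from collections import defaultdict


def top_k_indices(scores, k):
    """Indices of the k highest scores (ties keep their original order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


def precision_recall_at_k(scores, labels, k):
    """Precision (PPV) and recall (sensitivity) when the top k are flagged."""
    flagged = top_k_indices(scores, k)
    true_pos = sum(labels[i] for i in flagged)
    total_pos = sum(labels)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


def tpr_by_group(scores, labels, groups, k):
    """Recall (TPR) at top k, computed separately within each group."""
    flagged = set(top_k_indices(scores, k))
    positives = defaultdict(int)
    hits = defaultdict(int)
    for i, (y, g) in enumerate(zip(labels, groups)):
        if y == 1:
            positives[g] += 1
            hits[g] += i in flagged
    return {g: hits[g] / positives[g] for g in positives}


def tpr_disparity(tpr, reference_group):
    """Ratio of each group's TPR to the reference group's TPR."""
    ref = tpr[reference_group]
    return {g: (v / ref if ref else float("nan")) for g, v in tpr.items()}
```

A disparity ratio near 1 means that, among patients who go on to develop complications, the group is flagged about as often as the reference group.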

Methodology

- Define the cohort based on the formulation: all patients 18 or older with at least one outpatient visit in the past 3 years and no previous diagnosis of liver-related complications or other excluded liver-related diagnoses (SQL file used in config)
- Define the outcome/label based on the formulation (will get diagnosed with X in the next z months): liver complications (defined as development of cirrhosis or liver-related complications) that develop in the 3 years following the prediction date (SQL file used in config)
- Define training and validation sets over time
- Define and generate predictors: all features defined in [these config files](triage_config_files/feature_groups/)
- Train models on each training set and score all patients in the corresponding validation set
- Evaluate all models at each validation time according to each metric (e.g., PPV at top k)
- Select the "best" model based on results over time
- Explore the high-performing models to understand whom they rank highly, how those patients compare to the cohort, and which predictors are important
- Check for and, where needed, correct bias issues
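The "training and validation sets over time" step can be illustrated with a small sketch. Triage generates these splits itself from the temporal config, so the code below is not its implementation. It only shows the constraint that every training label window must close before the validation date. The function names, yearly spacing, and example dates are assumptions.

```python
from datetime import date


def shift_years(d: date, years: int) -> date:
    """Shift a date by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def temporal_splits(first_as_of, last_validation, label_years=3, step_years=1):
    """Return (train_as_of_dates, validation_as_of_date) pairs, oldest first.

    Training labels must be fully observed before validation starts, so the
    latest training as-of date is the validation date minus the label window.
    """
    splits = []
    validation = last_validation
    while True:
        latest_train = shift_years(validation, -label_years)
        if latest_train < first_as_of:
            break
        train_dates = []
        d = latest_train
        while d >= first_as_of:
            train_dates.append(d)
            d = shift_years(d, -step_years)
        splits.append((list(reversed(train_dates)), validation))
        validation = shift_years(validation, -step_years)
    return list(reversed(splits))


if __name__ == "__main__":
    for train_dates, validation_date in temporal_splits(date(2012, 1, 1), date(2020, 1, 1)):
        print(validation_date, "trained on", train_dates[0], "to", train_dates[-1])
```

The validation labels themselves need a further 3 years of follow-up after each validation date, so the most recent usable validation date is at least 3 years before the end of the data.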

Triage background

We are using Triage to build and select models. Some background and tutorials on Triage:

- Tutorial on Google Colab: Completely new to Triage? Run through a quick tutorial hosted on Google Colab (no setup necessary) to see what Triage can do.
- Dirty Duck Tutorial: Want a more in-depth walkthrough of Triage's functionality and concepts? Go through the Dirty Duck tutorial, which uses sample data.
- QuickStart Guide: Try Triage out with your own project and data
- Suggested workflow
- Understanding the configuration file
- Installation: install Triage in a Python virtual environment

Running models and triage

These steps assume Triage is installed and the data is in a PostgreSQL database. To run:

- Activate the virtual environment: `source env/bin/activate`
- Run `python run.py -c configfilename`
  - If running on the sample database, add the `--sampledb` flag

Running Triage: choices to make

- replace flag (keep set to false until we want to discard and rebuild everything)
- save predictions (leave off at the beginning)
- number of processors to use

Config files, Model Selection, and Bias Analysis

Current config file

The current one is here
File with design choices: Google Doc

Config file choices to make: example config file

- Cohort and label query: write a query that takes two parameters, {as_of_date} and {label_timespan}, and returns two columns (entity_id, outcome). It should list every patient in the cohort as of {as_of_date}, with outcome set to 1 (diagnosed with NASH/NAFLD-related liver complications within {label_timespan} of {as_of_date}), 0 (not diagnosed), or null (unknown or uncertain). We can later turn nulls into 0s or ignore them.
- Temporal config parameters
- Features and imputation
- Subsets to analyze: prior NASH or NAFLD diagnosis
- Attributes for bias audits: sex, race
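The three outcome values described for the cohort and label query can be made concrete with a small sketch. The real label is produced by the SQL query in the config. The function below only illustrates one possible reading of the 1 / 0 / null semantics. Treating a patient as null when follow-up ends before the label window closes, and treating the window as `(as_of_date, as_of_date + label_timespan]`, are both assumptions. The function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional


def outcome_label(
    as_of_date: date,
    label_timespan: timedelta,
    complication_dates: List[date],
    followed_until: Optional[date],
) -> Optional[int]:
    """Return 1, 0, or None for one cohort member.

    1    -> a complication was diagnosed within the label window
    0    -> no complication, and follow-up covers the whole window
    None -> no complication seen, but follow-up ended early (outcome unknown)
    """
    window_end = as_of_date + label_timespan
    if any(as_of_date < d <= window_end for d in complication_dates):
        return 1
    if followed_until is not None and followed_until >= window_end:
        return 0
    return None
```

Keeping the unknown case separate from 0 preserves the choice, noted above, between converting nulls to 0s and dropping them.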

FIB-4 related analysis

Of all the patients who have FIB-4 components, how many should be in our cohort (i.e., do not yet have NASH/NAFLD-related complications), and what percentage of them develop those complications in the next 3 years?
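For reference, FIB-4 is computed as (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The sketch below computes the index and then summarizes the question above over a list of patient records. The dictionary keys (`fib4`, `in_cohort`, `label`) and the function `fib4_cohort_summary` are hypothetical, and excluding unknown labels from the rate is an assumption.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    if alt_u_l <= 0 or platelets_1e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def fib4_cohort_summary(patients):
    """Summarize patients who have a computable FIB-4.

    patients: iterable of dicts with hypothetical keys
      'fib4' (float or None), 'in_cohort' (bool), 'label' (1, 0, or None).
    Returns (n with FIB-4, n of those in the cohort, complication rate among
    cohort members whose label is known).
    """
    with_fib4 = [p for p in patients if p["fib4"] is not None]
    in_cohort = [p for p in with_fib4 if p["in_cohort"]]
    known = [p for p in in_cohort if p["label"] is not None]
    positives = sum(p["label"] for p in known)
    rate = positives / len(known) if known else float("nan")
    return len(with_fib4), len(in_cohort), rate
```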

Variations Tested

The feature sets listed below were tested using manual_modeling.py, which takes four parameters:

- training matrix
- test matrix
- model to build and test
- feature set


Feature sets tested:

- All features
- Only features for 12 months and all history (without the 1-, 3-, and 6-month features)
- Features based on guidelines only
- Without liver-related labs and diagnoses
- Top x features from the XGBoost model
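One way to build a variation such as "12 months and all history only" is to filter the matrix columns by the time-window token in their names. The sketch below assumes each windowed feature name embeds a token like `_1month_`, `_12month_`, or `_all_`. That naming pattern, the example column names, and the function `select_features` are hypothetical and are not taken from manual_modeling.py.

```python
import re

# Assumption: windowed feature names embed their lookback as a token such as
# "_1month_", "_12month_", or "_all_". Adjust the pattern to the real names.
WINDOW_TOKEN = re.compile(r"_(\d+month|all)_")


def select_features(columns, keep_windows):
    """Keep columns whose window token is in keep_windows.

    Columns with no recognizable window token (e.g., static demographics)
    are kept unchanged.
    """
    selected = []
    for name in columns:
        match = WINDOW_TOKEN.search(name)
        if match is None or match.group(1) in keep_windows:
            selected.append(name)
    return selected


if __name__ == "__main__":
    columns = [
        "labs_1month_alt_max",
        "labs_12month_alt_max",
        "dx_all_t2dm_count",
        "age",
    ]
    print(select_features(columns, {"12month", "all"}))
```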
