Model for prediction of distant relapse #5

Open
plbenveniste opened this issue Jun 18, 2024 · 1 comment

Comments

@plbenveniste
Owner

This issue details the work done to build a model that predicts the occurrence of a distant (i.e. non-local) relapse.

For each participant, the values were averaged across nodules. A participant is considered to have had a distant relapse if any of the following relapses occurred:

  • rechute_homo
  • rechute_contro
  • rechute_horspoum
  • rechute_med

We computed the delay to distant relapse as the average of the delays of occurrence of the relapses listed above. Here is the distribution of these delays:

figure_distant_relapse
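
As an illustration of the labelling rule above, here is a minimal sketch in plain Python. The four `rechute_*` names come from this issue; the `delai_<column>` keys and the use of months as the unit are assumptions made for the example, not the actual column names of the dataset.

```python
from statistics import mean

# Relapse flags listed in this issue.
RELAPSE_COLUMNS = ["rechute_homo", "rechute_contro", "rechute_horspoum", "rechute_med"]


def distant_relapse(record):
    """Return (label, delay) for one participant.

    `record` maps each relapse column to 0/1 and each hypothetical
    `delai_<column>` key to a delay in months (None when no relapse).
    """
    occurred = [c for c in RELAPSE_COLUMNS if record[c] == 1]
    label = int(len(occurred) > 0)
    delays = [record[f"delai_{c}"] for c in occurred
              if record.get(f"delai_{c}") is not None]
    # Average the delays of the relapses that occurred, as described above.
    delay = mean(delays) if delays else None
    return label, delay


# Hypothetical participant with two distant relapses at 10 and 20 months.
participant = {
    "rechute_homo": 1, "delai_rechute_homo": 10,
    "rechute_contro": 0, "delai_rechute_contro": None,
    "rechute_horspoum": 1, "delai_rechute_horspoum": 20,
    "rechute_med": 0, "delai_rechute_med": None,
}
label, delay = distant_relapse(participant)  # label 1, delay 15 months
```

Note that with averaging, a participant with several distant relapses gets a delay later than the first one; taking the minimum instead would be a possible alternative.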

Number of subjects who had a distant relapse: 67
Number of subjects for training: 130
Number of subjects for testing: 33

Here is the performance of the three trained models:

  1. Model without any occurrence deadline:
    Model performance:

    • ROC AUC Score: 0.726923076923077
    • Brier score: 0.19839296981051371
    • Average precision: 0.7777777777777778
    • Average Recall: 0.5384615384615384
    • Accuracy Score: 0.7575757575757576
    • AUC-PR score: 0.749028749028749
  2. Model for prediction of distant relapse within 1 year:
    Number of subjects that had a distant relapse within 1 year: 23
    Number of subjects that had a distant relapse within 1 year (train): 21
    Number of subjects that had a distant relapse within 1 year (test): 2
    Model performance on the deadline of 1 year

    • ROC AUC Score: 0.11290322580645162
    • Brier score: 0.1145299780607971
    • Average precision: 0.0
    • Average Recall: 0.0
    • Accuracy Score: 0.8787878787878788
    • AUC-PR score: 0.030303030303030304
  3. Model for prediction of distant relapse within 3 years:
    Number of subjects that had a distant relapse within 3 years (train): 48
    Number of subjects that had a distant relapse within 3 years (test): 13
    Model performance on the deadline of 3 years

    • ROC AUC Score: 0.6269230769230769
    • Brier score: 0.2566913466094455
    • Average precision: 0.5454545454545454
    • Average Recall: 0.46153846153846156
    • Accuracy Score: 0.6363636363636364
    • AUC-PR score: 0.6095571095571095
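
The deadline targets used by models 2 and 3 can be sketched as follows. This is only an illustration: the delay unit (months) and the inclusive boundary are assumptions. The second helper gives the accuracy of a trivial classifier that always predicts "no relapse", which is a useful reference when positives are rare.

```python
def relapse_within(label, delay_months, deadline_years):
    """Return 1 if a distant relapse occurred within the deadline, else 0.

    Assumes the delay is expressed in months and that a relapse exactly
    at the deadline counts as positive (both are assumptions).
    """
    if label == 0 or delay_months is None:
        return 0
    return int(delay_months <= deadline_years * 12)


def all_negative_accuracy(n_subjects, n_positive):
    """Accuracy obtained by always predicting the negative class."""
    return (n_subjects - n_positive) / n_subjects


# 1-year test set reported above: 33 subjects, 2 positives.
baseline_1y = all_negative_accuracy(33, 2)   # 31/33, about 0.939
# 3-year test set reported above: 33 subjects, 13 positives.
baseline_3y = all_negative_accuracy(33, 13)  # 20/33, about 0.606
```

With only 2 positives in the 1-year test set, the reported accuracy of 0.879 is below this trivial baseline, so the 1-year scores should be read with caution.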

Hyperparameter tuning was not performed, since it proved to lower performance, as explained in this comment.

@plbenveniste
Owner Author

plbenveniste commented Jun 20, 2024

As in this comment, I added train/test splitting across sites, feature removal, and hyperparameter tuning.
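
One possible reading of "splitting across sites" is that whole sites are held out for testing, which would be consistent with the 136/27 split below not being a round fraction. This is an assumption, not something stated here. A minimal plain-Python sketch of such a split, with hypothetical subject IDs and site names:

```python
import random


def split_by_site(subject_sites, test_fraction=0.2, seed=0):
    """Hold out whole sites until roughly `test_fraction` of subjects are in test.

    `subject_sites` maps a subject ID to its site. No site ends up in both sets.
    """
    sites = sorted(set(subject_sites.values()))
    random.Random(seed).shuffle(sites)
    target = test_fraction * len(subject_sites)
    test_sites, n_test = set(), 0
    for site in sites:
        if n_test >= target:
            break
        test_sites.add(site)
        n_test += sum(1 for s in subject_sites.values() if s == site)
    train = sorted(sid for sid, s in subject_sites.items() if s not in test_sites)
    test = sorted(sid for sid, s in subject_sites.items() if s in test_sites)
    return train, test


# Hypothetical cohort: 12 subjects spread evenly over 3 sites.
subject_sites = {f"sub-{i:02d}": f"site-{i % 3}" for i in range(12)}
train_ids, test_ids = split_by_site(subject_sites)
```

Because site sizes are uneven in practice, the resulting test fraction is only approximately the requested one.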

Here is the model output:

Feature data shape: (163, 139)
Target data shape: (163, 2)
Number of subjects which had a distant relapse: 67

Number of subject for training: 136
Number of subject for testing: 27


Model performance without any occurence deadline
ROC AUC Score:  0.6172839506172839
Brier score: 0.2704013019646528
Average precision: 0.5
Average Recall: 0.5555555555555556
Accuracy Score:  0.6666666666666666
AUC-PR score: 0.6018518518518519
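
To help read these numbers, here is a plain-Python sketch of two of the reported metrics using their standard definitions. How the script that produced this output computes them is not shown in this issue, so the functions below are reference implementations rather than the actual code.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count for one half. Requires at least one positive and one negative.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not from the dataset).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)        # 3 of 4 positive/negative pairs ranked correctly
brier = brier_score(y_true, y_prob)
```

Two reference points: a ROC AUC of 0.5 corresponds to chance-level ranking, and always predicting a probability of 0.5 gives a Brier score of 0.25. The Brier score of 0.270 above is therefore slightly worse than that constant prediction.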

Number of subjects that had a distant relapse within 1 year: 23
Number of subjects that had a distant relapse within 1 year (train): 18
Number of subjects that had a distant relapse within 1 year (test): 5
Model performance on the deadline of 1 year
ROC AUC Score: 0.42727272727272725
Brier score: 0.18168762483647793
Average precision: 0.0
Average Recall: 0.0
Accuracy Score: 0.7777777777777778
AUC-PR score: 0.09259259259259259


Number of subjects that had a distant relapse within 3 year: 61

Number of subjects that had a distant relapse within 3 year (train): 52
Number of subjects that had a distant relapse within 3 year (test): 9
Model performance on the deadline of 3 year
ROC AUC Score: 0.6851851851851851
Brier score: 0.26541833623235794
Average precision: 0.4166666666666667
Average Recall: 0.5555555555555556
Accuracy Score: 0.5925925925925926
AUC-PR score: 0.5601851851851852


Initial number of features: 28
Number of subject for training: 136
Number of subject for testing: 27


Model performance without any occurence deadline
ROC AUC Score:  0.47530864197530864
Brier score: 0.3582369294948105
Average precision: 0.36363636363636365
Average Recall: 0.4444444444444444
Accuracy Score:  0.5555555555555556
AUC-PR score: 0.49663299663299665


Number of features after variance thresholding: 25
Number of features removed by variance thresholding: 3


Model performance after variance thresholding
ROC AUC Score:  0.5802469135802469
Brier score: 0.3231744471903118
Average precision: 0.4
Average Recall: 0.4444444444444444
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.5148148148148148
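
For reference, variance thresholding drops features whose values barely change across subjects. A minimal sketch follows; the threshold used for the run above is not given in this issue, so the default of 0.0 (drop constant features only) is an assumption, as are the toy feature values.

```python
from statistics import pvariance


def variance_threshold(columns, threshold=0.0):
    """Keep only features whose population variance exceeds `threshold`.

    `columns` maps a feature name to its list of values.
    """
    return {name: values for name, values in columns.items()
            if pvariance(values) > threshold}


# Toy data: 'constant' carries no information and is removed.
columns = {"constant": [1, 1, 1, 1], "age": [54, 61, 70, 66]}
kept = variance_threshold(columns)
```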


Number of features after correlation thresholding: 23
Number of features removed by correlation thresholding: 2


Model performance after feature selection based on correlation
ROC AUC Score:  0.617283950617284
Brier score: 0.32686233103625645
Average precision: 0.42857142857142855
Average Recall: 0.6666666666666666
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.6031746031746031
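
Correlation thresholding removes one feature of each highly correlated pair. The sketch below shows one common greedy variant; the actual threshold and the rule for choosing which feature of a pair to drop are not stated in this issue, so both are assumptions here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if it is not too correlated with any kept one.

    `columns` maps a feature name to its values; earlier features win ties.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: 'b' is almost a multiple of 'a' and is dropped; 'c' is kept.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
selected = drop_correlated(columns)
```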


Number of features after correlation with target thresholding: 13
Number of features removed by correlation with target thresholding: 10


Final features of the model:
['age', 'BMI', 'OMS', 'tabac_sevre', 'histo', 'T', 'etalement', 'MORPHOLOGICAL_Compacity', 'INTENSITY-BASED_IntensityInterquartileRange', 'INTENSITY-BASED_AreaUnderCurveCIVH', 'GLCM_ClusterProminence', 'GLRLM_RunLengthNonUniformity', 'NGTDM_Contrast']


Model performance after feature selection based on correlation with target
ROC AUC Score:  0.5802469135802469
Brier score: 0.3452606850303094
Average precision: 0.4
Average Recall: 0.4444444444444444
Accuracy Score:  0.5925925925925926
AUC-PR score: 0.5148148148148148
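
Selection based on correlation with the target keeps only features whose absolute correlation with the outcome reaches some minimum. The minimum used for the run above is not given, so the value in this sketch is a placeholder, and the data are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_by_target_correlation(columns, target, min_abs_corr=0.1):
    """Return the names of features with |corr(feature, target)| >= min_abs_corr."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= min_abs_corr]


# Toy data: 'x' tracks the target, 'noise' is uncorrelated with it.
columns = {"x": [0, 1, 2, 3], "noise": [1, 0, 0, 1]}
target = [0, 0, 1, 1]
selected = select_by_target_correlation(columns, target)
```

If the correlations are computed on all subjects rather than on the training set only, the test scores become optimistic; the output above does not say which was done.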


Model performance after hyperparameter tuning
ROC AUC Score:  0.691358024691358
Brier score: 0.30101305793088085
Average precision: 0.5
Average Recall: 0.7777777777777778
Accuracy Score:  0.6666666666666666
AUC-PR score: 0.6759259259259259
