Usage Information:

DSCI_train

Description:

Takes training patient features as input and outputs a trained model for privileged information prediction. The ML model used is KRVFL+, from P.-B. Zhang and Z.-X. Yang, “A new learning paradigm for random vector functional-link network: RVFL+,” Neural Networks, vol. 122, pp. 94–105, 2020.

Requirements:

  • All three input .csv or .xlsx files must have the same number of rows (i.e. same number of patients).
  • All three input .csv or .xlsx files must have column names indicating feature names.
  • All three input .csv or .xlsx files must have "patientId" as the first column and "group" as the second column, indicating patient names and labels respectively.
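The requirements above can be checked before training. The following is a minimal sketch, not part of DSCI_train itself: the helper names `read_csv_table` and `check_training_tables` are hypothetical, only .csv input is handled (reading .xlsx would need an extra library), and the sample data is made up for illustration.

```python
import csv
import io


def read_csv_table(handle):
    """Return (header, rows) from an open CSV file handle."""
    reader = csv.reader(handle)
    header = next(reader, [])
    rows = [row for row in reader if row]  # ignore blank lines
    return header, rows


def check_training_tables(tables):
    """Return a list of problems found in the training tables.

    `tables` maps a label such as "BL" to a (header, rows) pair.
    An empty list means the checks in this sketch passed.
    """
    problems = []
    for name, (header, _rows) in tables.items():
        # "patientId" must be the first column and "group" the second.
        if header[:2] != ["patientId", "group"]:
            problems.append(f"{name}: first two columns must be patientId, group")
    # Every file must describe the same number of patients.
    row_counts = {name: len(rows) for name, (_header, rows) in tables.items()}
    if len(set(row_counts.values())) > 1:
        problems.append(f"row counts differ: {row_counts}")
    return problems


# Illustrative in-memory data; real use would open the three input files.
bl = read_csv_table(io.StringIO("patientId,group,marker1\nP1,1,0.2\nP2,0,0.4\n"))
cl = read_csv_table(io.StringIO("patientId,group,age\nP1,1,30\nP2,0,41\n"))
ct = read_csv_table(io.StringIO("patientId,group,bone1\nP1,1,0.9\nP2,0,0.7\n"))
print(check_training_tables({"BL": bl, "CL": cl, "CT": ct}))
```

The sketch compares row counts only; it does not verify that the patients appear in the same order in each file.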

Inputs:

  • [BL_path]: .csv or .xlsx file containing Biological features. Name of file must contain the substring "train_bl".
  • [CL_path]: .csv or .xlsx file containing Clinical features. Name of file must contain the substring "train_cl".
  • [CT_path]: .csv or .xlsx file containing Condyle Trabecular features and 3D measurement features. Name of file must contain the substring "train_ct".
  • [folder_path]: directory to output trained model to.
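The file-name rules for the three training inputs can be verified with a short helper. This is a sketch under stated assumptions: `check_training_paths` is a hypothetical name, and the substring and extension matches are treated as case-sensitive, which this document does not specify.

```python
from pathlib import Path

# Required file-name substrings, taken from the input descriptions above.
REQUIRED_SUBSTRINGS = {"BL": "train_bl", "CL": "train_cl", "CT": "train_ct"}
ALLOWED_SUFFIXES = {".csv", ".xlsx"}


def check_training_paths(paths):
    """Return a list of naming problems for the BL, CL and CT input paths."""
    problems = []
    for key, substring in REQUIRED_SUBSTRINGS.items():
        name = Path(paths[key]).name
        # Assumption: matching is case-sensitive.
        if substring not in name:
            problems.append(f'{key}: file name must contain "{substring}"')
        if Path(name).suffix not in ALLOWED_SUFFIXES:
            problems.append(f"{key}: file must be .csv or .xlsx")
    return problems


# Illustrative paths only.
print(check_training_paths({
    "BL": "data/train_bl.csv",
    "CL": "data/cohort_train_cl.xlsx",
    "CT": "data/train_ct_v2.csv",
}))
```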

Output:

  • File named "TMJPI_train.mat" that contains the best-performing ensemble of models trained from the input features. The file is written to the [folder_path] directory.

  • File named "TMJPI_train_results.csv" that contains the score for each evaluation metric.

Example usage:

./DSCI_train [BL_path] [CL_path] [CT_path] [folder_path]


DSCI_predict

Description:

Takes testing patient features as input and outputs scores and predictions for each testing patient.

Requirements:

  • Both input .csv or .xlsx files must have the same number of rows (i.e. same number of patients).
  • Both input .csv or .xlsx files must have column names indicating feature names.
  • Both input .csv or .xlsx files must have "patientId" as the first column, indicating patient names. No "group" column or other patient label column should be present.
  • The order of features in the test files must be identical to that in the training inputs.
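The column requirements for a test file can be compared against the matching training file. The helper below is a hypothetical sketch (the name `feature_order_problems` and the example feature names are not from the tool); it works on header rows that have already been read into lists.

```python
def feature_order_problems(train_header, test_header):
    """Compare a test header row with the matching training header row.

    The training header is expected to start with "patientId", "group";
    the test header with "patientId" only. Returns a list of problems.
    """
    problems = []
    if not test_header or test_header[0] != "patientId":
        problems.append('first test column must be "patientId"')
    if "group" in test_header:
        problems.append('test file must not contain a "group" column')
    train_features = train_header[2:]  # skip patientId and group
    test_features = test_header[1:]    # skip patientId
    if test_features != train_features:
        problems.append("feature columns differ in name or order from training")
    return problems


# Illustrative headers with made-up feature names.
train_cl = ["patientId", "group", "age", "pain_score"]
test_cl = ["patientId", "age", "pain_score"]
print(feature_order_problems(train_cl, test_cl))
```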

Inputs:

  • [CL_path]: .csv or .xlsx file containing Clinical features. Name of file must contain the substring "test_cl".
  • [CT_path]: .csv or .xlsx file containing Condyle Trabecular features and 3D measurement features. Name of file must contain the substring "test_ct".
  • [model path]: path to previously trained "TMJPI_train.mat" file.
  • [folder_path]: directory to output prediction file to.

Outputs:

Files are written to the [folder_path] directory.

  • .mat file named "TMJPI_pred.mat" that contains the model scores and prediction (1 for OA, 0 for non-OA) for each testing patient.
  • .csv file named "TMJPI_pred.csv" that contains the model scores and prediction (1 for OA, 0 for non-OA) for each testing patient.

Example usage:

./DSCI_predict [CL_path] [CT_path] [model path] [folder_path]

Notes:

  • One more random seed for repeated cross-validation has been added to TMJPI compared to the published paper. We run five-fold CV 11 times to avoid the edge case in which the model votes are split evenly between negative and positive.
  • We calculate the final prediction by binarizing each model's prediction probability with that model's optimal training threshold. The most frequent category is taken as the final prediction.
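The voting rule in these notes can be illustrated with a small sketch. It is not the tool's implementation: `majority_vote` is a hypothetical helper, treating a probability equal to the threshold as positive is an assumption, and so is the premise that every fold of every repeat contributes one voting model (the saved ensemble may contain a different number of models).

```python
def majority_vote(probabilities, thresholds):
    """Binarize each model's probability with its own threshold, then vote.

    Returns 1 if most models vote positive, 0 if most vote negative.
    """
    if len(probabilities) != len(thresholds):
        raise ValueError("need one threshold per model")
    # Assumption: a probability equal to the threshold counts as positive.
    votes = [1 if p >= t else 0 for p, t in zip(probabilities, thresholds)]
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives == negatives:
        raise ValueError("tie: an odd number of models avoids this case")
    return 1 if positives > negatives else 0


# Assuming one model per fold: 10 repeats give an even count, 11 an odd one.
models_with_10_repeats = 10 * 5  # 50 voters, a 25/25 split is possible
models_with_11_repeats = 11 * 5  # 55 voters, a tie is impossible

# Three illustrative models, each with its own training threshold.
print(majority_vote([0.9, 0.4, 0.6], [0.5, 0.5, 0.55]))
```

With an odd number of voters, the tie branch can never be reached, which is the reason given above for the extra repeat.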