Continuing to work on Modelo

Modelo is the model I made when I participated in the AI data readiness challenge for the National Cancer Institute in April of 2024. It is a binary classifier that was trained on data from the National Lung Screening trial. It can predict whether Cancer is present or not in a CT lung cancer scan.

After the competition was over I revisited the model to see what I could improve. This is the result of that work.

The original model: Metric results

Accuracy 0.53055
Recall 0.58799
F-1 Score 0.60821
Precision 0.64713

After my changes: Metric results

Accuracy 0.73301
Recall 0.85185
F-1 Score 0.73290
Precision 0.77500

The original model can be found at the National Cancer Institute: https://computational.cancer.gov/model/aidr-challenge-tier-1-mcfarlin and this repo: https://github.com/oohtmeel1/AI-Data-Readiness-Challenge-for-the-NCI-Cancer-Research-Data-Commons

Requirements and usage: In order to be able to use the model you will need the following: Files will be in the folders of this repo if available.

The model takes about 800mb of space per forward pass.

pytorch 2.1.0 (requirements.txt should take care of that)

python 3.11.7

folder of JPEG images <- A single folder of JPEG images. Containing both positive and negative image files of lung cancer. I used DICOM2jpeg to convert the files. If you want to demo the model you can download all of the transformed jpeg images at this link (They require about 1GB of space). https://drive.google.com/drive/folders/1MLwxhcQmn7qXqy_iP2zDjLGyxa_G_jLS

train val test <- CSV files containing training labels. Demo files are located in the files folder of this repo.

loading_data_files.py <- Dataloader file, uses Pytorch.

defining_directories.py <- Defines file locations.

model_architecture.py <- File containing the various layers of my model.

begin_experiment.py <- File to run the experiment. Creates a tensorboard directory to log data, and saves models at several checkpoints. saves resulting metrics to a csv results.csv.

python3 -m venv <myenvname> <- Make sure to init that virtual enviornment.

Metrics

After the model makes a prediction it will compute the following metrics:

recall,accuracy,F1,precision and the predicted and true labels. In case other calculations are needed.

Additionally a csv file will be saved that will contain the names of the best models, and the metrics from above. I'll add more as I work on it.

Also here is the Citation for the images from the NCI I used from the IDC.

I downloaded everything using the IDC index package.It works really well now.

Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). http://dx.doi.org/10.1158/0008-5472.CAN-21-0950 https://portal.imaging.datacommons.cancer.gov/explore/filters/?collection_id=nlst

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
code		code
files		files
Contributors.md		Contributors.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Continuing to work on Modelo

About

Releases

Packages

Languages

License

oohtmeel1/NLST_continuing_work

Folders and files

Latest commit

History

Repository files navigation

Continuing to work on Modelo

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages