ImagingDataCommons/IDC-Tutorials


Welcome!

This repository contains tutorial materials (for the most part, as Python notebooks) that are developed to help you learn about NCI Imaging Data Commons and utilize it in your work.

If this is the first time you are hearing about IDC, you may want to check out our Getting Started documentation page. Here are some highlights of what IDC has to offer:

  • >85 TB of data: IDC contains radiology, brightfield (H&E) and fluorescence slide microscopy images, along with image-derived data (annotations, segmentations, quantitative measurements) and accompanying clinical data

  • free: all of the data in IDC is publicly available: no registration, no access requests

  • commercial-friendly: >95% of the data in IDC is covered by the permissive CC-BY license, which allows commercial reuse (a small subset of the data is covered by the CC-NC license); each file in IDC is tagged with its license to make it easier for you to understand and follow the rules

  • cloud-based: all of the data in IDC is available from both Google and AWS public buckets: fast and free to download, no out-of-cloud egress fees

  • harmonized: all of the images and image-derived data in IDC are harmonized into the standard DICOM representation

The tutorial notebooks are located in the notebooks folder and are organized into the following folders.

"Getting Started" Python notebooks are intended to introduce users to IDC.

  • Basics of using IDC data programmatically: learn how to use the idc-index Python package to programmatically search and download IDC data, visualize images and annotations, build cohorts, and check acknowledgments and licenses for the data included in your cohort.
  • Searching clinical data: identify clinical and other non-imaging data accompanying imaging collections in IDC using idc-index python package and duckdb.
  • Advanced searching using BigQuery: access all of the metadata to build comprehensive queries and detailed cohort selection criteria.
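
As a rough illustration of the programmatic search and license check described above, here is a minimal sketch and not the tutorials' own code. It assumes the idc-index package is installed (pip install idc-index), that its local index is queried as a table named index, and that the columns Modality, license_short_name, and series_size_MB are present in it. The helper names build_cohort_query and run_cohort_query, and the example values "nlst" and "CT", are hypothetical choices made for this sketch.

```python
def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_cohort_query(collection_id: str, modality: str) -> str:
    """Build SQL that selects the series of one modality in one collection.

    Assumption: the idc-index table is called `index` and has these columns.
    """
    return (
        "SELECT collection_id, PatientID, SeriesInstanceUID, "
        "license_short_name, series_size_MB "
        "FROM index "
        f"WHERE collection_id = {quote_literal(collection_id)} "
        f"AND Modality = {quote_literal(modality)}"
    )


def run_cohort_query(sql: str):
    """Run the query against the local IDC index and return a DataFrame."""
    # Imported lazily so the query builder above works without idc-index.
    from idc_index import IDCClient

    client = IDCClient()
    return client.sql_query(sql)


# Example usage (requires idc-index, so it is left commented out):
# cohort = run_cohort_query(build_cohort_query("nlst", "CT"))
# print(cohort["license_short_name"].value_counts())
# print(cohort["series_size_MB"].sum())
```

Summarizing license_short_name before downloading is one way to see which license terms apply to a cohort; the "Basics" notebook remains the reference for the recommended workflow.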

Notebooks in this folder cover topics that require an understanding of the basics and address narrower IDC use cases.

  • Searching DICOM private tags: all of the DICOM attributes for the imaging data in IDC are searchable using BigQuery. DICOM private tags often contain critical information, such as diffusion b-values, but are trickier to access from BigQuery. In this tutorial you will learn how to accomplish this.
  • Using BigQuery for searching IDC clinical data: BigQuery is an alternative to idc-index and duckdb for searching clinical data. This tutorial demonstrates more capabilities compared to the introductory clinical data usage tutorial.
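
For orientation, a BigQuery search over IDC metadata might look like the sketch below. It is an assumption-laden illustration and not taken from the notebooks: it assumes the google-cloud-bigquery package, a Google Cloud project you can bill queries to, and that IDC DICOM metadata is exposed in the public table bigquery-public-data.idc_current.dicom_all with collection_id, Modality, and SeriesInstanceUID columns. The function names are hypothetical. Accessing private tags needs additional steps that are covered in the tutorial itself and are not shown here.

```python
# Assumed name of the public IDC metadata table.
DICOM_ALL = "bigquery-public-data.idc_current.dicom_all"


def build_modality_count_query(table: str = DICOM_ALL) -> str:
    """Build SQL that counts distinct series per modality in one collection.

    The collection is supplied at run time via the @collection_id parameter.
    """
    return (
        "SELECT Modality, COUNT(DISTINCT SeriesInstanceUID) AS series_count\n"
        f"FROM `{table}`\n"
        "WHERE collection_id = @collection_id\n"
        "GROUP BY Modality\n"
        "ORDER BY series_count DESC"
    )


def run_modality_count(collection_id: str, project_id: str):
    """Run the query and return a list of (modality, series_count) tuples."""
    # Imported lazily so the query builder works without the BigQuery client.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("collection_id", "STRING", collection_id)
        ]
    )
    rows = client.query(build_modality_count_query(), job_config=job_config).result()
    return [(row["Modality"], row["series_count"]) for row in rows]
```

Passing the collection as a query parameter, instead of formatting it into the SQL text, keeps the query reusable across collections and avoids quoting problems.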

These notebooks can be used to deploy your own cloud-based instance of the OHIF or Slim viewers using Google Firebase, which you can use to visualize analysis results you generated for IDC data, or to work with your own images. These tutorials use the free tier of Firebase, so there is no cost to keep the deployed viewers available in the cloud.

This folder contains notebooks that demonstrate the usage of the data in specific IDC collections. For easier navigation, the notebooks in this folder are always prefixed with the collection_id they correspond to.

  • Using hiplot for exploring prostate MRI in IDC: this notebook demonstrates how to use hiplot, an open source package for high-dimensional parameter visualization, to examine various MRI acquisition parameters for the prostate MRI images available in IDC.
  • Visible Human Project exploration: demonstration of searching and visualizing images from the National Library of Medicine Visible Human Project available on IDC.
  • RMS-Mutation-Prediction collection exploration: notebooks in this folder demonstrate selecting images from the RMS-Mutation-Prediction collection based on various attributes of images and expert annotations.
  • NLST collection exploration: explanation of the content included in the IDC NLST collection, and how it is different from the NLST collection you will find in TCIA.
  • Working with NLST clinical data in IDC: demonstration of how to access and search clinical data tables accompanying the IDC NLST collection, and how to combine clinical data with imaging metadata.

This folder is dedicated to notebooks focused on digital pathology (pathomics) applications. The use of the DICOM standard is relatively new in digital pathology and the field is being actively developed, which is why these notebooks have a dedicated folder.

Demonstrations/examples of analyses of images from IDC.

  • MedSAM on IDC: learn how to experiment with MedSAM on the images available from IDC.
  • MHub.ai with IDC data: MHub.ai is a platform for Deep Learning models in medical imaging, which are interoperable with IDC and can be applied directly to the IDC DICOM images. Learn how to get started from this notebook!

Here you will find an archive of the notebooks that were used in tutorials, which at times may demonstrate experimental features. By design, the notebooks presented at specific events may not be updated after the event, and are stored in this folder for archival purposes.

IDC is an actively evolving resource. As we develop new and improved capabilities, we update our recommended usage practices and may deprecate notebooks that are no longer maintained and may no longer work. You will find these in the deprecated folder.

testing

This directory is used for the maintenance of the repository to support testing of the actively supported notebooks.

Support

If you have any questions about the notebooks in this repository, please open a discussion thread in the IDC user forum, or open an issue in this repository.