Downloading GDC Data

Zachary Heins edited this page Aug 8, 2019 · 1 revision

The pipeline currently requires the data being transformed to be available on the filesystem. To download data, use the GDC Portal to generate a manifest file listing all of the desired files, then use the GDC Data Transfer Tool to download them.

GDC Portal: https://portal.gdc.cancer.gov/

GDC Data Transfer Tool: https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Getting_Started/

To download data with the Data Transfer Tool from the command line, run:

gdc-client download -m <MANIFEST_FILE.txt>
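The download step can be wrapped in a small helper so the raw files land in a known directory. This is only a sketch: the function name `download_gdc_data` and the `GDC_CLIENT` override variable are hypothetical and not part of the pipeline. It assumes `gdc-client` is on the `PATH` and relies on the tool's `-d` option to choose the download directory.

```shell
# Hypothetical helper, not part of the pipeline.
# GDC_CLIENT may be set to the path of a specific gdc-client binary.
GDC_CLIENT="${GDC_CLIENT:-gdc-client}"

download_gdc_data() {
    manifest="$1"      # manifest file generated by the GDC Portal
    download_dir="$2"  # directory that will receive the raw files

    # Fail early if the manifest path is wrong.
    if [ ! -f "$manifest" ]; then
        echo "manifest not found: $manifest" >&2
        return 1
    fi

    # Create the target directory if it does not exist yet.
    mkdir -p "$download_dir" || return 1

    # -m: manifest file, -d: directory to download into
    "$GDC_CLIENT" download -m "$manifest" -d "$download_dir"
}
```

For example, `download_gdc_data <MANIFEST_FILE.txt> <DOWNLOADED_RAW_FILES_LOCATION>` would download into the directory that is later passed to the pipeline.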

After the data has been downloaded, you can run the pipeline as usual:

$JAVA_HOME/bin/java -jar target/gdcpipeline-0.0.1-SNAPSHOT.jar -c <CANCER_STUDY_NAME> -m <MANIFEST_FILE> -o <OUTPUT_DESTINATION> -s <DOWNLOADED_RAW_FILES_LOCATION>
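Before running the pipeline, it may help to confirm that every file listed in the manifest is actually on disk. The sketch below makes two assumptions: the manifest is a tab-separated file with a header row whose first two columns are `id` and `filename`, and the Data Transfer Tool has placed each file at `<download_dir>/<id>/<filename>`. The function name `check_manifest_files` is hypothetical, and the paths will need adjusting if your layout differs.

```shell
# Hypothetical helper, not part of the pipeline.
# Prints one line per manifest entry whose file is missing and
# returns non-zero if any file is missing.
check_manifest_files() {
    manifest="$1"      # tab-separated manifest (assumed columns: id, filename, ...)
    download_dir="$2"  # directory the files were downloaded into
    missing=0
    tab=$(printf '\t')

    {
        # Skip the header row.
        read -r _header
        # Read id and filename; the remaining columns are ignored.
        while IFS="$tab" read -r id filename _rest || [ -n "$id" ]; do
            [ -n "$id" ] || continue
            # Assumed layout: <download_dir>/<id>/<filename>
            if [ ! -f "$download_dir/$id/$filename" ]; then
                echo "missing: $id/$filename"
                missing=$((missing + 1))
            fi
        done
    } < "$manifest"

    [ "$missing" -eq 0 ]
}
```

A possible use is `check_manifest_files <MANIFEST_FILE> <DOWNLOADED_RAW_FILES_LOCATION> && echo "all files present"`, run just before the `java -jar` command above.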
