Downloading GDC Data
The pipeline currently requires the data being transformed to be available on the local filesystem. To download data, use the GDC Portal to generate a manifest file listing all of the desired files, then use the GDC Data Transfer Tool to download them.
GDC Portal: https://portal.gdc.cancer.gov/
GDC Data Transfer Tool: https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Getting_Started/
To download the data with the Data Transfer Tool, run the following on the command line:
gdc-client download -m <MANIFEST_FILE.txt>
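Before running the pipeline, it can help to confirm that every file listed in the manifest actually arrived. The following is a minimal sketch, not part of the pipeline: the function name `missing_from_manifest` is hypothetical, and it assumes the manifest is tab-separated with a header row whose first two columns are `id` and `filename`, and that the Data Transfer Tool stores each file as `<download dir>/<id>/<filename>`. Adjust it if your layout differs.

```shell
#!/bin/sh
# Sketch: print manifest entries whose file is missing from the download directory.
# Assumptions (not confirmed by this README): the manifest is tab-separated with a
# header row, its first two columns are "id" and "filename", and each file was
# downloaded to <download_dir>/<id>/<filename>.
missing_from_manifest() {
    manifest="$1"
    download_dir="$2"
    tab="$(printf '\t')"
    # Skip the header row, then check the expected path for each entry.
    # The "|| [ -n "$id" ]" clause handles a final line without a trailing newline.
    tail -n +2 "$manifest" | while IFS="$tab" read -r id filename _rest || [ -n "$id" ]; do
        [ -n "$id" ] || continue
        if [ ! -f "$download_dir/$id/$filename" ]; then
            printf '%s\t%s\n' "$id" "$filename"
        fi
    done
}
```

Calling `missing_from_manifest <MANIFEST_FILE.txt> <DOWNLOADED_RAW_FILES_LOCATION>` prints one `id` and `filename` pair per missing file, and prints nothing when the download is complete.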
After the data has been downloaded, run the pipeline as usual:
$JAVA_HOME/bin/java -jar target/gdcpipeline-0.0.1-SNAPSHOT.jar -c <CANCER_STUDY_NAME> -m <MANIFEST_FILE> -o <OUTPUT_DESTINATION> -s <DOWNLOADED_RAW_FILES_LOCATION>
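The two steps can be chained in a small wrapper. This is only a sketch under stated assumptions: the function name `run_gdc_pipeline` and the `RUN` variable are hypothetical, the `-d` option is used to tell `gdc-client` where to place the files, and the pipeline's `-s` argument is assumed to accept that same directory.

```shell
#!/bin/sh
# Sketch of a download-then-transform wrapper (hypothetical helper, not shipped
# with the pipeline). Arguments: study name, manifest file, raw download
# directory, output destination.
run_gdc_pipeline() {
    study="$1"
    manifest="$2"
    raw_dir="$3"
    out_dir="$4"
    # RUN is a hypothetical dry-run hook: set RUN=echo to print each command
    # instead of executing it. When RUN is unset, the commands run normally.
    ${RUN:-} mkdir -p "$raw_dir" || return 1
    # Download the files listed in the manifest into raw_dir.
    ${RUN:-} gdc-client download -m "$manifest" -d "$raw_dir" || return 1
    # Assumption: -s accepts the directory that gdc-client downloaded into.
    ${RUN:-} "$JAVA_HOME/bin/java" -jar target/gdcpipeline-0.0.1-SNAPSHOT.jar \
        -c "$study" -m "$manifest" -o "$out_dir" -s "$raw_dir"
}
```

Setting `RUN=echo` before calling the function prints the three commands without executing them, which is a cheap way to check the arguments before starting a large download.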