fetch-dbgap-files

Code to fetch dbGaP files using sra-toolkit.

Installation and setup

The Dockerfile can be used to build a Docker image for running the fetch.py script.

Alternatively, to run outside of the Docker image, you must install the SRAToolkit. The code currently uses v3.0.10; it may work with other versions, but this is not guaranteed.
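
For illustration, building the image and checking a local SRAToolkit install might look like the following minimal sketch (the image tag is an arbitrary choice, not a name defined by this repository):

```
# Build a Docker image from the repository root (the tag is an arbitrary choice)
docker build -t fetch-dbgap-files .

# If running outside Docker instead, confirm that the SRAToolkit prefetch
# binary is installed and on your PATH
prefetch --version
```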

Preparation

Before running the script, you will need to use the dbGaP File Selector to select which files to download. From the My Requests section of the dbGaP authorized access webpage, locate the Data Access Request (DAR) for which you would like to download data. Then click "Request Files" next to the DAR. On the new page, click the "dbGaP File Selector" link.

Once in the dbGaP File Selector, select the files you would like to download. After you have made your selection, toggle "Selected" in the "Select" pane. You will need to download two files to use as input for the workflow:

  • "Cart file": the cart file containing the list of files to download, in sratoolkit kart format.
  • "Files Table": the manifest file listing the files to be downloaded.

Local usage

The fetch.py Python script can be run locally to download dbGaP data.

Required inputs:

Argument     Description
--ngc        The path to the dbGaP project key for your dbGaP application
--cart       A cart file generated by the dbGaP File Selector
--manifest   A manifest file generated by the dbGaP File Selector
--outdir     The output directory where the data should be saved

Optional inputs:

Argument     Description
--prefetch   The path to the SRAToolkit prefetch binary
--untar      Flag that can be set if the script should untar any .tar or .tar.gz files into a directory with the same name as the archive (without the extension). If set, the original .tar or .tar.gz archive will be deleted.

Because prefetch sometimes exits without error but without downloading all requested files, the script will attempt to download the files and compare them against the manifest; if not all files were downloaded on the first attempt, it will retry up to 3 times. Once all files are successfully downloaded, it will copy them to the final requested outdir.

Note that if the fetch.py script crashes for some reason, you will have to restart from the beginning.
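
As an illustration, a local invocation might look like the following; all file paths, and the prefetch location, are placeholders rather than values from this repository:

```
python fetch.py \
    --ngc /secure/prj_12345.ngc \
    --cart /data/cart.krt \
    --manifest /data/manifest.csv \
    --outdir /data/dbgap_downloads \
    --prefetch /opt/sratoolkit/bin/prefetch \
    --untar
```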

Running the workflow

A WDL workflow is also provided to download the files. The WDL automatically untars the files and deletes the original archives (by passing the --untar argument to fetch.py under the hood). The inputs to the WDL are as follows:

Required inputs:

Argument           Description
ngc_file           The path to the dbGaP project key for your dbGaP application
cart_file          A cart file generated by the dbGaP File Selector
manifest_file      A manifest file generated by the dbGaP File Selector
output_directory   The output directory where the data should be saved

Optional inputs:

Argument   Description
disk_gb    The hard disk size, in GB, of the instance used for downloading and untarring. If downloading a large volume of files, you may need to increase this value. (Default: 50)

The workflow can be found on Dockstore.
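
As an illustration, preparing a JSON inputs file for the workflow might look like the sketch below; the workflow name prefix ("fetch_dbgap_files") and all file paths are assumptions for illustration only, not names confirmed by this repository:

```
# Write an example inputs file for the WDL workflow. The "fetch_dbgap_files"
# prefix and the gs:// paths are placeholders; substitute the actual workflow
# name and your own file locations.
cat > inputs.json <<'EOF'
{
  "fetch_dbgap_files.ngc_file": "gs://my-bucket/prj_12345.ngc",
  "fetch_dbgap_files.cart_file": "gs://my-bucket/cart.krt",
  "fetch_dbgap_files.manifest_file": "gs://my-bucket/manifest.csv",
  "fetch_dbgap_files.output_directory": "dbgap_downloads",
  "fetch_dbgap_files.disk_gb": 100
}
EOF
```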

Caveats

Note that the project key (--ngc or ngc_file) is sensitive; do not share it with people who are not covered by your dbGaP application as it will allow them to download data. We recommend that you do not put the project key file in a Terra/AnVIL workspace that you are planning to share with other people. Instead, store it in a more protected workspace that is only shared with people covered by the dbGaP application.