Parallel manifest.txt file download for rsync_from_ncbi.pl #890
Changed the rsync code block slightly to allow parallel file downloads, with the number of parallel downloads set by the environment variable KRAKEN2_THREAD_CT. First, manifest.txt is split into KRAKEN2_THREAD_CT temporary files; then rsync is called on each temporary file; finally, all temporary files are removed. The Linux command-line package 'parallel' must be installed.
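The split, fetch, and clean-up steps described above can be sketched roughly as follows. This is not the code from the PR: it is a minimal bash illustration that uses background jobs and `wait` in place of the `parallel` package, and the function names `split_manifest` and `fetch_chunks`, the chunk prefix, and the source and destination arguments are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only (assumed names, not the PR's implementation).

# split_manifest MANIFEST N PREFIX
# Writes the lines of MANIFEST round-robin into PREFIX.0 .. PREFIX.(N-1).
split_manifest() {
  local manifest=$1 n=$2 prefix=$3
  awk -v n="$n" -v p="$prefix" '{ print > (p "." ((NR - 1) % n)) }' "$manifest"
}

# fetch_chunks PREFIX SRC DEST
# Starts one rsync per chunk file in the background, waits for all of them,
# then removes the temporary chunk files.
fetch_chunks() {
  local prefix=$1 src=$2 dest=$3 chunk
  for chunk in "$prefix".*; do
    rsync --no-motd --files-from="$chunk" "$src" "$dest" &
  done
  wait
  rm -f "$prefix".*
}

# Example call (SRC is a placeholder, not a real server path):
#   split_manifest manifest.txt "${KRAKEN2_THREAD_CT:-1}" manifest.part
#   fetch_chunks manifest.part "$SRC" .
```

Round-robin splitting keeps the chunks roughly equal in line count. It does not balance by file size, so one rsync process may still finish well after the others.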
We wrote the k2 wrapper, which is part of the Kraken 2 repo and provides parallel downloads.
@ch4rr0
k2 download-library --library archaea --threads 48 --fast-build --no-mask --db test
gives me
usage: kraken2 [-h] {add-to-library,download-library,download-taxonomy,build,classify,inspect,clean} ...
kraken2: error: unrecognized arguments: --threads 48 --fast-build
Just FYI, I installed this with conda using the following command...
mamba create --name kraken2 -c nvidia -c bioconda -c conda-forge python=3.11 cudatoolkit kraken2 parallel awscli -y
activating it...
conda activate kraken2
creating a directory...
mkdir test
adding the taxonomy...
k2 download-taxonomy --db test
and finally, trying to add a library:
k2 download-library --library archaea --threads 48 --fast-build --no-mask --db test
--fast-build does not work with download-library. To build a library you would need to issue the following steps:
k2 download-taxonomy --db foo
k2 download-library --library archaea --db foo --threads 6
k2 build --fast-build --threads 6 --db foo
See k2 --help for more info.
On Nov 21, 2024, at 12:24 PM, Jonathan Cosme wrote:
@ch4rr0
running
k2 download-library --library archaea --threads 48 --fast-build --no-mask --db test
gives me
usage: kraken2 [-h] {add-to-library,download-library,download-taxonomy,build,classify,inspect,clean} ...
kraken2: error: unrecognized arguments: --threads 48 --fast-build
Just FYI, I installed this with conda using the following command...
mamba create --name kraken2 -c nvidia -c bioconda -c conda-forge python=3.11 cudatoolkit kraken2 parallel awscli -y
activating it...
conda activate kraken2
creating a directory...
mkdir test
adding the taxonomy...
k2 download-taxonomy --db test
and finally, trying to add a library:
k2 download-library --library archaea --threads 48 --fast-build --no-mask --db test
The --fast-build argument is irrelevant. With
k2 download-library --library archaea --threads 6 --db test
I still get this error:
kraken2: error: unrecognized arguments: --threads 6
The problem is that k2 download-library doesn't accept the --threads parameter, which means it defaults to using 1 thread, i.e. downloading one file at a time.
k2 download-library --library archaea --db test
took 4 minutes to download 171 files out of 620 (then it got stuck).
kraken2-build --download-library archaea --threads 48 --db test
took 27 seconds to download all the files and process them (I stopped it when it started the masking task).
Ah, I know why. It's very likely that you're using the k2 packaged with the latest release of Kraken 2. You will need to fetch the latest changes from the Kraken 2 repository. In any event, with 12 threads it took 29 seconds to download and process the archaea library with masking disabled, and 97 seconds with masking turned on.
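One quick way to tell whether an installed k2 is new enough is to look for the flag in the subcommand's help text. The helper below is a hypothetical sketch, not part of Kraken 2. It assumes the command prints its options when given --help, which the argparse-style usage message earlier in the thread suggests but does not guarantee.

```shell
#!/usr/bin/env bash
# Sketch only: has_flag is a made-up helper name.

# has_flag FLAG CMD [ARGS...]
# Succeeds if the help text printed by "CMD ARGS... --help" mentions FLAG.
has_flag() {
  local flag=$1
  shift
  "$@" --help 2>&1 | grep -F -q -e "$flag"
}

# Example call (assumes k2 is on PATH):
#   if has_flag --threads k2 download-library; then
#     echo "this k2 accepts --threads for download-library"
#   else
#     echo "this k2 predates --threads; fetch k2 from the repository"
#   fi
```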
Here is proof:
Edit: removed incorrect log
I have not done a good job of marketing the script to the user base. It will be included in the next release of Kraken 2. I think the conda recipe also has to be updated to reference
Had the same issue with k2:
All good afterwards!