Blastn #81

Closed
wants to merge 69 commits into from
69 commits
b2e2e06
added chunk_busco module
May 23, 2023
a833404
added diamond_blastx module
alxndrdiaz Jun 2, 2023
94b44c9
added blastx_cols and blastx_outext
alxndrdiaz Jun 2, 2023
46ae681
fix DIAMOND_BLASTX path
alxndrdiaz Jun 2, 2023
b5594c4
added module BLOBTOOLKIT_UNCHUNK
alxndrdiaz Jun 3, 2023
4c3e944
diamond_blastx data for test_full
alxndrdiaz Jun 3, 2023
40c212a
removed log files
alxndrdiaz Jun 8, 2023
0ae6480
fix path to uniprot_blastx
alxndrdiaz Jun 8, 2023
a7502f0
renamed files in RUN_BLASTX subworkflow
alxndrdiaz Jun 8, 2023
8de106a
use meta and meta2
alxndrdiaz Jun 8, 2023
5466144
use names in RUN_BLASTX input channels
alxndrdiaz Jun 8, 2023
d667473
added uniprot_blastx
alxndrdiaz Jun 8, 2023
49cb8bf
merge blastx results in BlobDir
Jun 8, 2023
abc2b97
minimum Nextflow version 23.04.1
alxndrdiaz Jun 8, 2023
7bd7873
update uniprot databases
alxndrdiaz Jun 8, 2023
cec0022
updated paths to uniprot databases
alxndrdiaz Jun 9, 2023
ddb9645
Update conf/test.config
alxndrdiaz Jun 19, 2023
d814e5f
use new names for uniprot databases
alxndrdiaz Jun 19, 2023
3bc14aa
Update modules/local/blobtoolkit/chunk.nf
alxndrdiaz Jun 19, 2023
72f040c
update names for uniprot databases
alxndrdiaz Jun 19, 2023
0dbc9b6
fix description
alxndrdiaz Jun 19, 2023
c8bef6c
update uniprot database names
alxndrdiaz Jun 19, 2023
1c9f9c4
check params.blastp and params.blastx
alxndrdiaz Jun 19, 2023
8f3f719
independent channels in RUN_BLASTX
alxndrdiaz Jun 19, 2023
0c61987
added module NOHIT_LIST
alxndrdiaz Jun 22, 2023
8c3936a
added subworkflow RUN_BLASTN
alxndrdiaz Jun 22, 2023
5c6365a
NOHIT_LIST parameters
alxndrdiaz Jun 22, 2023
0a988e4
include RUN_BLASTN subworkflow
alxndrdiaz Jun 22, 2023
20a4784
use shell block
alxndrdiaz Jun 22, 2023
d9411a0
update RUN_BLASTN input
alxndrdiaz Jun 22, 2023
c61f987
use script from bin
alxndrdiaz Jun 22, 2023
3d9ba45
installed module seqtk
alxndrdiaz Jun 26, 2023
a1064bc
include SEQTK_SUBSEQ module
alxndrdiaz Jun 27, 2023
59edf87
added SEQTK_SUBSEQ version
alxndrdiaz Jun 27, 2023
ab5a036
install module BLAST_BLASTN
alxndrdiaz Jun 27, 2023
d8afe33
blastn database mMelMel3.1
Jul 11, 2023
2811438
include BLASTN module and nucleotide database
alxndrdiaz Jul 11, 2023
e5c1bbf
remove typo
alxndrdiaz Jul 11, 2023
e6e1674
use combine instead of join
alxndrdiaz Jul 12, 2023
297bfcd
BLAST_BLASTN args
alxndrdiaz Jul 12, 2023
afcd65b
single quotes in BLAST_BLASTN args
alxndrdiaz Jul 12, 2023
4c5c068
rename outpur channel
alxndrdiaz Jul 12, 2023
f8fffa3
include chunk module
alxndrdiaz Jul 12, 2023
4821996
include unchunk module
alxndrdiaz Jul 13, 2023
3f4bb20
add blastn results to BlobDir
alxndrdiaz Jul 13, 2023
0e3f6ba
RUN_BLASTX description
alxndrdiaz Jul 13, 2023
54413ec
RUN_BLASTN description
alxndrdiaz Jul 13, 2023
c4a25d4
change prefix
alxndrdiaz Jul 13, 2023
3b2743c
new output: taxon_id
alxndrdiaz Jul 18, 2023
43aaf96
local BLAST module
alxndrdiaz Jul 19, 2023
79ae90d
args for BLASTN module
alxndrdiaz Jul 19, 2023
67694df
use local BLASTN module
alxndrdiaz Jul 19, 2023
ded1145
add taxon_id input
alxndrdiaz Jul 19, 2023
7aef344
use -negative_taxids
alxndrdiaz Aug 1, 2023
65df189
conditional blastn search
alxndrdiaz Aug 1, 2023
5ab0f72
ignore template_strings check
alxndrdiaz Aug 1, 2023
00983c6
ignore merge_markers check
alxndrdiaz Aug 1, 2023
2197033
run prettier
alxndrdiaz Aug 1, 2023
1454bb3
removed schema_ignore_params
alxndrdiaz Aug 9, 2023
30860d3
add schema validation options
alxndrdiaz Aug 9, 2023
2bb54a4
overwritten with template file
alxndrdiaz Aug 9, 2023
970a5f0
ignore blastn database files
alxndrdiaz Aug 10, 2023
d77944d
update path to ignore
alxndrdiaz Aug 10, 2023
3c501ac
use paths instead of paths-ignore
alxndrdiaz Aug 10, 2023
cf71ade
update local path
alxndrdiaz Aug 10, 2023
ee02484
remove path to ignore
alxndrdiaz Aug 10, 2023
d3e1258
ignore blastn database files
alxndrdiaz Aug 10, 2023
cea0576
add blastn full test files
Sep 1, 2023
5c441db
Update .editorconfig
alxndrdiaz Sep 1, 2023
9 changes: 9 additions & 0 deletions .editorconfig
@@ -31,3 +31,12 @@ insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset
indent_size = unset

# To prevent errors for these test blastn databases
[/assets/test*/nt_*/*.{ndb,nhr,nin,nog,nos,not,nsq,ntf,nto}]
charset = unset
end_of_line = unset
insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset
indent_size = unset
1 change: 0 additions & 1 deletion .github/CONTRIBUTING.md
@@ -110,4 +110,3 @@ To get started:
Devcontainer specs:

- [DevContainer config](.devcontainer/devcontainer.json)
- [Dockerfile](.devcontainer/Dockerfile)
2 changes: 2 additions & 0 deletions .nf-core.yml
@@ -16,3 +16,5 @@ lint:
multiqc_config:
- report_comment
actions_ci: false
template_strings: false
merge_markers: false
Binary file not shown.
Binary file added assets/test/mMelMel3.1.buscoregions.dmnd
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.ndb
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.nhr
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.nin
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.nog
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.nos
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.not
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.nsq
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.ntf
Binary file not shown.
Binary file added assets/test/nt_mMelMel3.1/nt_mMelMel3.1.nto
Binary file not shown.
Binary file added assets/test_full/gfLaeSulp1.1.buscoregions.dmnd
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added assets/test_full/nt_gfLaeSulp1.1/nt_gfLaeSulp1.1.nin
Binary file not shown.
Binary file not shown.
Binary file added assets/test_full/nt_gfLaeSulp1.1/nt_gfLaeSulp1.1.nos
Binary file not shown.
Binary file added assets/test_full/nt_gfLaeSulp1.1/nt_gfLaeSulp1.1.not
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
15 changes: 15 additions & 0 deletions bin/nohitlist.sh
@@ -0,0 +1,15 @@
#!/bin/bash

# input
fasta=$1
blast=$2
prefix=$3
E=$4

# find ids of sequences with no hits in the blastx search
# (a hit is any row whose e-value, column 14, is below the threshold)
grep '>' "$fasta" | \
grep -v -w -f <(awk -v evalue="$E" '$14 < evalue {print $1}' "$blast" | sort | uniq) | \
cut -f1 | sed 's/>//' > $prefix.nohit.txt
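
The filter above can be tried on toy data. The following is a minimal sketch, assuming a tab-separated hit table with the e-value in column 14; the file names, sequence IDs and hit rows are made up for illustration, and a temporary file replaces the process substitution so the snippet also runs under plain `sh`.

```shell
#!/bin/sh
# Toy demonstration of the no-hit filter; all names and records are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Three sequences in a FASTA file.
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$workdir/genome.fa"

# Hit table laid out as: qseqid staxids bitscore + the 12 standard columns,
# which puts the e-value in column 14.
{
  printf 'seq1\t9606\t200\tseq1\tsubjA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200\n'
  printf 'seq2\t9606\t40\tseq2\tsubjB\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-05\t40\n'
} > "$workdir/hits.tsv"

evalue=1.0e-25

# IDs with at least one hit below the e-value threshold.
awk -v evalue="$evalue" '$14 < evalue { print $1 }' "$workdir/hits.tsv" \
  | sort -u > "$workdir/hit_ids.txt"

# Header lines matching none of those IDs, with the leading ">" removed.
grep '>' "$workdir/genome.fa" \
  | grep -v -w -f "$workdir/hit_ids.txt" \
  | sed 's/>//' > "$workdir/demo.nohit.txt"

cat "$workdir/demo.nohit.txt"
```

Here `seq1` has a hit below the threshold, `seq2` only has a weaker hit, and `seq3` has no hit at all, so the last two end up in the no-hit list.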



26 changes: 25 additions & 1 deletion conf/modules.config
@@ -37,12 +37,16 @@ process {
ext.args = "--evalue 1.0e-25 --max-target-seqs 10 --max-hsps 1"
}

withName: "DIAMOND_BLASTX" {
ext.args = "--evalue 1.0e-25 --max-target-seqs 10 --max-hsps 1"
}

withName: "BLOBTOOLKIT_WINDOWSTATS" {
ext.args = "--window 0.1 --window 0.01 --window 1 --window 100000 --window 1000000"
}

withName: "BLOBTOOLKIT_BLOBDIR" {
ext.args = "--evalue 1.0e-25 --hit-count 10"
ext.args = "--evalue 1.0e-25 --hit-count 10 --update-plot"
publishDir = [
path: { "${params.outdir}/" },
mode: params.publish_dir_mode,
@@ -66,6 +70,26 @@ process {
]
}

withName: "BLOBTOOLKIT_CHUNK" {
ext.args = "--chunk 100000 --overlap 0 --max-chunks 10 --min-length 1000"
}

withName: "BLOBTOOLKIT_UNCHUNK" {
ext.args = "--count 10"
}

withName: "NOHIT_LIST" {
ext.args = "1.0e-25"
}

withName: "BLAST_BLASTN" {
ext.args = "-outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1'"
}

withName: "BLASTN" {
ext.args = "-outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1'"
}

withName: "CUSTOM_DUMPSOFTWAREVERSIONS" {
publishDir = [
path: { "${params.outdir}/blobtoolkit_info" },
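
The `-outfmt '6 qseqid staxids bitscore std'` setting fixes the column order that downstream steps rely on. A small sketch, assuming `std` expands to the twelve default tabular fields listed in the BLAST+ documentation, numbers the resulting columns and shows where the e-value lands:

```shell
#!/bin/sh
# Number the columns produced by -outfmt '6 qseqid staxids bitscore std'.
set -eu

# "std" is the default 12-field tabular layout; recent BLAST+ releases name the
# first two fields qaccver/saccver (older ones used qseqid/sseqid).
std="qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore"
fields="qseqid staxids bitscore $std"

# Print "index name" pairs, one per line (word splitting of $fields is intended).
columns=$(printf '%s\n' $fields | awk '{ print NR, $1 }')
printf '%s\n' "$columns"

# Position of the e-value column.
evalue_col=$(printf '%s\n' "$columns" | awk '$2 == "evalue" { print $1 }')
echo "evalue is column $evalue_col"
```

With three custom fields in front of the standard twelve, the e-value sits in column 14, which is the column `bin/nohitlist.sh` compares against its threshold.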
8 changes: 5 additions & 3 deletions conf/test.config
@@ -30,7 +30,9 @@ params {
taxon = "Meles meles"

// Databases
taxdump = "/lustre/scratch123/tol/teams/grit/geval_pipeline/btk_databases/taxdump"
busco = "/lustre/scratch123/tol/resources/nextflow/busco_2021_06_reduced/"
uniprot = "${projectDir}/assets/test/mCerEla1.1.buscogenes.dmnd"
taxdump = "/lustre/scratch123/tol/teams/grit/geval_pipeline/btk_databases/taxdump"
busco = "/lustre/scratch123/tol/resources/nextflow/busco_2021_06_reduced/"
blastp = "${projectDir}/assets/test/mMelMel3.1.buscogenes.dmnd"
blastx = "${projectDir}/assets/test/mMelMel3.1.buscoregions.dmnd"
blastn = "${projectDir}/assets/test/nt_mMelMel3.1"
}
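
Because `blastn` points at a directory rather than a single file, it can help to confirm the directory is complete before a run. The sketch below checks for the extensions listed in the `.editorconfig` pattern of this PR; the directory and database names are placeholders, and whether every one of these files is strictly required by `blastn` is an assumption.

```shell
#!/bin/sh
# Check a BLAST nucleotide database directory for the expected index files.
# db_dir and db_name are hypothetical stand-ins for the value of params.blastn.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

db_dir="$workdir/nt_demo"
db_name="nt_demo"
mkdir -p "$db_dir"

# Create empty placeholder files for every extension except .nto,
# so the check below has something to report.
for ext in ndb nhr nin nog nos not nsq ntf; do
  : > "$db_dir/$db_name.$ext"
done

# Collect the extensions that are absent.
missing=""
for ext in ndb nhr nin nog nos not nsq ntf nto; do
  [ -e "$db_dir/$db_name.$ext" ] || missing="$missing $ext"
done

if [ -n "$missing" ]; then
  echo "missing:$missing"
else
  echo "all index files present"
fi
```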
8 changes: 5 additions & 3 deletions conf/test_full.config
@@ -27,7 +27,9 @@ params {
taxon = "Laetiporus sulphureus"

// Databases
taxdump = "/lustre/scratch123/tol/teams/grit/geval_pipeline/btk_databases/taxdump"
busco = "/lustre/scratch123/tol/resources/busco/v5/"
uniprot = "${projectDir}/assets/test_full/gfLaeSulp1.1.buscogenes.dmnd"
taxdump = "/lustre/scratch123/tol/teams/grit/geval_pipeline/btk_databases/taxdump"
busco = "/lustre/scratch123/tol/resources/busco/v5/"
blastp = "${projectDir}/assets/test_full/gfLaeSulp1.1.buscogenes.dmnd"
blastx = "${projectDir}/assets/test_full/gfLaeSulp1.1.buscoregions.dmnd"
blastn = "${projectDir}/assets/test_full/nt_gfLaeSulp1.1"
}
15 changes: 15 additions & 0 deletions modules.json
@@ -5,6 +5,11 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"blast/blastn": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"busco": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
@@ -21,6 +26,11 @@
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"diamond/blastx": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"fastawindows": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
@@ -50,6 +60,11 @@
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"seqtk/subseq": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
}
}
},
40 changes: 40 additions & 0 deletions modules/local/blastn.nf
@@ -0,0 +1,40 @@
process BLASTN {
tag "$meta.id"
label 'process_medium'

conda "bioconda::blast=2.13.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/blast:2.13.0--hf3cf87c_0' :
'biocontainers/blast:2.13.0--hf3cf87c_0' }"

input:
tuple val(meta), path(fasta)
path db
val taxid

output:
tuple val(meta), path('*.blastn.txt'), emit: txt
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def exclude_taxon = taxid ? "-negative_taxids ${taxid}" : ''
"""
DB=`find -L ./ -name "*.ndb" | sed 's/\\.ndb\$//'`
blastn \\
-num_threads $task.cpus \\
-db \$DB \\
-query $fasta \\
$exclude_taxon \\
$args \\
-out ${prefix}.blastn.txt
cat <<-END_VERSIONS > versions.yml
"${task.process}":
blast: \$(blastn -version 2>&1 | sed 's/^.*blastn: //; s/ .*\$//')
END_VERSIONS
"""
}
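
Two idioms in this module are easier to follow outside the Nextflow escaping: deriving the database prefix from the staged `.ndb` file, and adding `-negative_taxids` only when a taxon ID is supplied. The sketch below uses a placeholder directory and an example taxon ID, and only prints the command instead of running `blastn`.

```shell
#!/bin/sh
# Derive a BLAST database prefix and build an optional exclusion flag.
# Directory names, the query file and the taxon ID are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/nt_demo"
: > "$workdir/nt_demo/nt_demo.ndb"
: > "$workdir/nt_demo/nt_demo.nsq"
cd "$workdir"

# Same find/sed idiom as the module: locate the .ndb file and strip its suffix.
DB=$(find -L ./ -name "*.ndb" | sed 's/\.ndb$//')
echo "db prefix: $DB"

# Expands to nothing when taxid is empty, otherwise to the full flag.
taxid=9662
exclude_taxon=${taxid:+-negative_taxids $taxid}

echo "blastn -db $DB -query query.fa $exclude_taxon -out demo.blastn.txt"
```

Note that the `find` call assumes exactly one `.ndb` file is staged; with several databases in the work directory, `DB` would hold more than one prefix.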
14 changes: 10 additions & 4 deletions modules/local/blobtoolkit/blobdir.nf
@@ -11,7 +11,9 @@ process BLOBTOOLKIT_BLOBDIR {
tuple val(meta), path(window, stageAs: 'windowstats/*')
tuple val(meta1), path(busco)
tuple val(meta2), path(blastp)
tuple val(meta3), path(yaml)
tuple val(meta3), path(blastx)
tuple val(meta4), path(blastn)
tuple val(meta5), path(yaml)
path(taxdump)

output:
@@ -24,15 +26,19 @@ process BLOBTOOLKIT_BLOBDIR {
script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
def hits = blastp ? "--hits ${blastp}" : ""
def hits_blastp = blastp ? "--hits ${blastp}" : ""
def hits_blastx = blastx ? "--hits ${blastx}" : ""
def hits_blastn = blastn ? "--hits ${blastn}" : ""
"""
blobtools replace \\
--bedtsvdir windowstats \\
--meta ${yaml} \\
--taxdump ${taxdump} \\
--taxrule buscogenes \\
--taxrule bestdistorder=buscoregions \\
--busco ${busco} \\
${hits} \\
${hits_blastp} \\
${hits_blastx} \\
${hits_blastn} \\
--threads ${task.cpus} \\
$args \\
${prefix}
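
The three `hits_*` variables each contribute a `--hits` flag only when the corresponding table was produced. A rough shell equivalent of that pattern, with made-up file names and with file existence standing in for the Groovy truthiness check, looks like this:

```shell
#!/bin/sh
# Assemble repeated --hits flags from whichever result tables exist.
# File names and the BlobDir name are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Pretend blastp and blastn ran, and blastx was skipped.
: > demo.blastp.out
: > demo.blastn.out

hits_args=""
for table in demo.blastp.out demo.blastx.out demo.blastn.out; do
  if [ -e "$table" ]; then
    hits_args="$hits_args --hits $table"
  fi
done
hits_args=${hits_args# }   # drop the leading space

# Print the command rather than running it.
echo "blobtools replace $hits_args --threads 1 demo_blobdir"
```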
37 changes: 37 additions & 0 deletions modules/local/blobtoolkit/chunk.nf
@@ -0,0 +1,37 @@
process BLOBTOOLKIT_CHUNK {
tag "$meta.id"
label 'process_single'

if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_CHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "genomehubs/blobtoolkit:4.1.5"

input:
tuple val(meta) , path(fasta)
tuple val(meta2), path(busco_table)

output:
tuple val(meta), path("*.chunks.fasta"), emit: chunks
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def busco = busco_table ? "--busco ${busco_table}" : "--busco None"
"""
btk pipeline chunk-fasta \\
--in ${fasta} \\
${busco} \\
--out ${prefix}.chunks.fasta \\
$args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
blobtoolkit: \$(btk --version | cut -d' ' -f2 | sed 's/v//')
END_VERSIONS
"""
}
34 changes: 34 additions & 0 deletions modules/local/blobtoolkit/unchunk.nf
@@ -0,0 +1,34 @@
process BLOBTOOLKIT_UNCHUNK {
tag "$meta.id"
label 'process_single'

if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
exit 1, "BLOBTOOLKIT_UNCHUNK module does not support Conda. Please use Docker / Singularity / Podman instead."
}
container "genomehubs/blobtoolkit:4.1.5"

input:
tuple val(meta), path(blast_table)

output:
tuple val(meta), path("*.out"), emit: blast_out
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${blast_table}"
"""
btk pipeline unchunk-blast \\
--in ${blast_table} \\
--out ${prefix}.out \\
$args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
blobtoolkit: \$(btk --version | cut -d' ' -f2 | sed 's/v//')
END_VERSIONS
"""
}
32 changes: 32 additions & 0 deletions modules/local/nohit_list.nf
@@ -0,0 +1,32 @@
process NOHIT_LIST {
tag "$meta.id"
label 'process_single'

conda "conda-forge::gawk=5.1.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/gawk:5.1.0' :
'quay.io/biocontainers/gawk:5.1.0' }"

input:
tuple val(meta), path(blast)  // blast output table in tabular text format
tuple val(meta2), path(fasta) // genome fasta file

output:
tuple val(meta), path ('*.nohit.txt') , emit: nohitlist
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in sanger-tol/blobtoolkit/bin/
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
nohitlist.sh ${fasta} ${blast} ${prefix} $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
nohit_list: 1.0
END_VERSIONS
"""
}
37 changes: 37 additions & 0 deletions modules/nf-core/blast/blastn/main.nf

Some generated files are not rendered by default.