diff --git a/bin/report_modules/templates/kraken2/kraken2.html b/bin/report_modules/templates/kraken2/kraken2.html
index 522df1d8..250eebf3 100644
--- a/bin/report_modules/templates/kraken2/kraken2.html
+++ b/bin/report_modules/templates/kraken2/kraken2.html
@@ -1,8 +1,7 @@
- Kraken2 assigns taxonomic labels to sequencing reads for metagenomics projects. It can also be used to
- detect contamination in genome assemblies.
+ Kraken2 assigns taxonomic labels to sequencing reads for metagenomics projects.
Reference:
diff --git a/docs/images/kraken2.jpg b/docs/images/kraken2.jpg
new file mode 100644
index 00000000..5cba553c
Binary files /dev/null and b/docs/images/kraken2.jpg differ
diff --git a/docs/output.md b/docs/output.md
index 2779ad35..c838c3a1 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -2,7 +2,7 @@
## Introduction
-This document describes the output produced by the pipeline. Most of the plots are taken from the AssemblyQC report, which summarises results at the end of the pipeline.
+This document describes the output produced by the pipeline. Most of the plots are taken from the AssemblyQC report which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
@@ -79,8 +79,8 @@ GenomeTools `gt stat` tool calculates a basic set of statistics about features c
- `*.taxonomy.rpt`: [Taxonomy report](https://github.com/ncbi/fcs/wiki/FCS-GX-taxonomy-report#taxonomy-report-output-).
- `*.fcs_gx_report.txt`: A final report of [recommended actions](https://github.com/ncbi/fcs/wiki/FCS-GX#outputs).
- `*.inter.tax.rpt.tsv`: [Select columns](../modules/local/ncbi_fcs_gx_krona_plot.nf) from `*.taxonomy.rpt` used for generation of a Krona taxonomy plot.
- - `*.fcs.gx.krona.cut`: Krona taxonomy file [created](../modules/local/ncbi_fcs_gx_krona_plot.nf) from `*.inter.tax.rpt.tsv`.
- - `*.fcs.gx.krona.html`: Krona taxonomy plot.
+ - `*.fcs.gx.krona.cut`: Taxonomy file for Krona plot [created](../modules/local/ncbi_fcs_gx_krona_plot.nf) from `*.inter.tax.rpt.tsv`.
+ - `*.fcs.gx.krona.html`: Interactive Krona taxonomy plot.
@@ -139,6 +139,21 @@ LTR Assembly Index (LAI) is a reference-free genome metric that [evaluates assem
### Kraken2
+
+Output files
+
+- `kraken2/`
+ - `*.kraken2.report`: [Kraken2 report](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats).
+ - `*.kraken2.cut`: [Kraken2 output](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats).
+ - `*.kraken2.krona.cut`: [Select columns](../modules/local/kraken2_krona_plot.nf) from `*.kraken2.cut` used for generation of a Krona taxonomy plot.
+ - `*.kraken2.krona.html`: Interactive Krona taxonomy plot.
+
+
+
+Kraken2 [assigns taxonomic labels](https://ccb.jhu.edu/software/kraken2/) to sequencing reads for metagenomics projects.
+
+
AssemblyQC - Interactive Krona plot from Kraken2 taxonomy
+
### HiC contact map
diff --git a/docs/usage.md b/docs/usage.md
index 18a454df..1f2bd37c 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -8,7 +8,7 @@ You will need to create an assemblysheet with information about the assemblies y
- `fasta:` FASTA file
- `gff3 [Optional]:` GFF3 annotation file if available
- `monoploid_ids [Optional]:` A txt file listing the IDs used to calculate LAI in monoploid mode if necessary
-- `synteny_labels [Optional]:` A two column tsv file listing fasta sequence ids (first column) and labels for the synteny plots (second column) when performing synteny analysis
+- `synteny_labels [Optional]:` A two-column TSV file listing FASTA sequence IDs (first column) and their labels for the synteny plots (second column) when performing synteny analysis
## External databases
@@ -40,7 +40,7 @@ BUSCO lineage databases are downloaded and updated by the BUSCO tool itself. A p
### Assemblathon stats
-`assemblathon_stats_n_limit` is the number of 'N's for the unknown gap size. This number is used to split the scaffolds into contigs to compute contig-related stats. NCBI's recommendation for unknown gap size is 100 .
+`assemblathon_stats_n_limit` is the number of 'N's for the unknown gap size. This number is used to split the scaffolds into contigs to compute contig-related stats. NCBI's recommendation for unknown gap size is 100.
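
To illustrate how an N-run threshold separates scaffolds into contigs, here is a minimal Python sketch. It is not the pipeline's or the Assemblathon script's actual code. The function name `split_scaffold` is hypothetical, and the sketch assumes that any run of at least `n_limit` 'N's counts as a gap, while shorter runs stay inside the contig.

```python
import re


def split_scaffold(seq: str, n_limit: int = 100) -> list[str]:
    """Split a scaffold into contigs at every run of at least n_limit 'N's.

    Assumption (not confirmed by the pipeline docs): runs shorter than
    n_limit are treated as ambiguous bases and kept within the contig.
    """
    # Builds a pattern such as [Nn]{100,}, i.e. 100 or more consecutive N/n.
    gap = re.compile(rf"[Nn]{{{n_limit},}}")
    # Drop empty strings produced by leading or trailing gaps.
    return [contig for contig in gap.split(seq) if contig]


if __name__ == "__main__":
    scaffold = "ACGT" + "N" * 100 + "TTGA"
    print(split_scaffold(scaffold, n_limit=100))
```

With this sketch, a gap of exactly `n_limit` 'N's yields two contigs, whereas a run one base shorter leaves the scaffold as a single contig.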
### NCBI FCS adaptor
@@ -64,8 +64,8 @@ BUSCO lineage databases are downloaded and updated by the BUSCO tool itself. A p
### HiC
- `hic`: Path to reads provided as an SRA ID or as a path to paired reads with pattern '\*{1,2}.(fastq|fq).gz'
-- `hic_skip_fastp`: Skips fastp trimming
-- `hic_skip_fastqc`: Skips QC by fastqc
+- `hic_skip_fastp`: Skip fastp trimming
+- `hic_skip_fastqc`: Skip QC by fastqc
- `hic_fastp_ext_args`: Additional arguments for fastp (default: '--qualified_quality_phred 20 --length_required 50')
### Synteny analysis
@@ -79,7 +79,7 @@ BUSCO lineage databases are downloaded and updated by the BUSCO tool itself. A p
- `synteny_xref_assemblies`: Similar to `--input`, this parameter also provides a CSV sheet listing external reference assemblies which are included in the synteny analysis but are not analysed by other QC tools. See the [example xrefsheet](../assets/xrefsheet.csv) included with the pipeline. Its fields are:
- `tag:` A unique tag which represents the reference assembly in the final report
- `fasta:` FASTA file
- - `synteny_labels:` A two column tsv file listing fasta sequence ids (first column) and labels for the synteny plots (second column)
+ - `synteny_labels:` A two-column TSV file listing FASTA sequence IDs (first column) and their labels for the synteny plots (second column)
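
The two-column labels file described above can be read with a few lines of Python. This is a hedged sketch for checking such a file before a run, not code from the pipeline. The function name `read_synteny_labels` and the example IDs and labels are hypothetical, and the sketch assumes the file has no header row.

```python
import csv
import io


def read_synteny_labels(handle) -> dict[str, str]:
    """Map each sequence ID (column 1) to its plot label (column 2).

    Assumption: tab-separated, exactly two columns per row, no header.
    """
    labels: dict[str, str] = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        seq_id, label = row
        labels[seq_id] = label
    return labels


if __name__ == "__main__":
    # Hypothetical file content, used only for illustration.
    example = io.StringIO("scaffold_1\tChr1\nscaffold_2\tChr2\n")
    print(read_synteny_labels(example))
```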
## Running the pipeline
@@ -116,9 +116,8 @@ nextflow run plant-food-research-open/assemblyqc -profile docker -params-file pa
with `params.yaml` containing:
```yaml
-input: './samplesheet.csv'
-outdir: './results/'
-<...>
+input: "./assemblysheet.csv"
+outdir: "./results/"
```
You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).