Updated docs
GallVp committed Nov 4, 2024
1 parent 936bb2f commit fb206c4
Showing 10 changed files with 279 additions and 236 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v2.2.0dev - [31-Oct-2024]
## v2.2.0dev - [04-Nov-2024]

### `Added`

@@ -16,6 +16,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7. Added parameter `hic_samtools_ext_args` and set its default value to `-F 3852` [#159](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/159)
8. Added the HiC QC report to the final report so that users don't have to navigate to the results folder [#162](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/162)
9. Added the fastp log to the final report [#163](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/163)
10. Updated the tube map along with the tool list [#166](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/166)
11. Added Orthofinder [#167](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/167)
12. Changed order of tool options in the `nextflow.config` file
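
Item 7 above sets the default of `hic_samtools_ext_args` to `-F 3852`. In `samtools view`, `-F` excludes any read that has at least one of the given FLAG bits set, so it can help to see which bits 3852 covers. The sketch below decodes a SAM FLAG integer using the bit values from the SAM specification; the function name `decode_sam_flag` and the short labels are illustrative and not part of the pipeline, and how the pipeline passes this argument to samtools is an assumption here.

```bash
#!/usr/bin/env bash
# Decode a SAM FLAG integer into short labels for each bit that is set.
# Bit values follow the SAM specification; the labels are informal names.
decode_sam_flag() {
    local flag=$1
    local bits=(1 2 4 8 16 32 64 128 256 512 1024 2048)
    local labels=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
    local names=()
    local i
    for i in "${!bits[@]}"; do
        # Keep the label when the corresponding bit is set in the flag.
        if (( flag & bits[i] )); then
            names+=("${labels[i]}")
        fi
    done
    # Join the collected labels with commas.
    local IFS=,
    echo "${names[*]}"
}

decode_sam_flag 3852
# prints: UNMAP,MUNMAP,SECONDARY,QCFAIL,DUP,SUPPLEMENTARY
```

In other words, 3852 = 4 + 8 + 256 + 512 + 1024 + 2048, so the default drops reads that are unmapped, have an unmapped mate, are secondary or supplementary alignments, fail QC, or are marked as duplicates.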

### `Fixed`

46 changes: 25 additions & 21 deletions CITATIONS.md
@@ -18,15 +18,15 @@

> Gremme G, Steinbiss S, Kurtz S. 2013. "GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 645-656, May 2013, doi: <https://doi.org/10.1109/TCBB.2013.68>
- SAMTOOLS, [MIT/Expat](https://github.com/samtools/samtools/blob/develop/LICENSE)
- samtools, [MIT/Expat](https://github.com/samtools/samtools/blob/develop/LICENSE)

> Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, <https://doi.org/10.1093/gigascience/giab008>
- NCBI/FCS, [License](https://github.com/ncbi/fcs/blob/main/LICENSE.txt)
- NCBI FCS, [License](https://github.com/ncbi/fcs/blob/main/LICENSE.txt)

> Astashyn A, Tvedte ES, Sweeney D, Sapojnikov V, Bouk N, Joukov V, Mozes E, Strope PK, Sylla PM, Wagner L, Bidwell SL, Clark K, Davis EW, Smith-White B, Hlavina W, Pruitt KD, Schneider VA, Murphy TD. 2023. Rapid and sensitive detection of genome contamination at scale with FCS-GX. bioRxiv 2023.06.02.543519; doi: <https://doi.org/10.1101/2023.06.02.543519>
- KRONA, [License](https://github.com/marbl/Krona/blob/master/KronaTools/LICENSE.txt)
- Krona, [License](https://github.com/marbl/Krona/blob/master/KronaTools/LICENSE.txt)

> Ondov BD, Bergman NH, Phillippy AM. 2011. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: <https://doi.org/10.1186/1471-2105-12-385>
@@ -36,23 +36,23 @@
>
> Forked from: <https://github.com/ucdavis-bioinformatics/assemblathon2-analysis>
- GFASTATS, [MIT](https://github.com/vgl-hub/gfastats/blob/main/LICENSE)
- gfastats, [MIT](https://github.com/vgl-hub/gfastats/blob/main/LICENSE)

> Giulio Formenti, Linelle Abueg, Angelo Brajuka, Nadolina Brajuka, Cristóbal Gallardo-Alba, Alice Giani, Olivier Fedrigo, Erich D Jarvis, Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs, Bioinformatics, Volume 38, Issue 17, September 2022, Pages 4214–4216, <https://doi.org/10.1093/bioinformatics/btac460>
- BUSCO, [MIT](https://gitlab.com/ezlab/busco/-/blob/master/LICENSE)

> Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes, Molecular Biology and Evolution, Volume 38, Issue 10, October 2021, Pages 4647–4654, <https://doi.org/10.1093/molbev/msab199>
- GFFREAD, [MIT](https://github.com/gpertea/gffread/blob/master/LICENSE)
- GffRead, [MIT](https://github.com/gpertea/gffread/blob/master/LICENSE)

> Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020 Apr 28;9:ISCB Comm J-304. doi: <http://doi.org/10.12688/f1000research.23297.2>. PMID: 32489650; PMCID: PMC7222033.
- TIDK, [MIT](https://github.com/tolkit/telomeric-identifier/blob/main/LICENSE)
- tidk, [MIT](https://github.com/tolkit/telomeric-identifier/blob/main/LICENSE)

> <https://github.com/tolkit/telomeric-identifier>
- SEQKIT, [MIT](https://github.com/shenwei356/seqkit/blob/master/LICENSE)
- SeqKit, [MIT](https://github.com/shenwei356/seqkit/blob/master/LICENSE)

> Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 11(10): e0163962. <https://doi.org/10.1371/journal.pone.0163962>
@@ -72,70 +72,74 @@

> Shujun O, Ning J 2018. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons, Plant Physiology, 176, 2 (2018). <https://doi.org/10.1104/pp.17.01310>
- KRAKEN2, [MIT](https://github.com/DerrickWood/kraken2/blob/master/LICENSE)
- Kraken 2, [MIT](https://github.com/DerrickWood/kraken2/blob/master/LICENSE)

> Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). <https://doi.org/10.1186/s13059-019-1891-0>
- JUICEBOX.JS, [MIT](https://github.com/igvteam/juicebox.js/blob/master/LICENSE)
- juicebox.js, [MIT](https://github.com/igvteam/juicebox.js/blob/master/LICENSE)

> Robinson JT, Turner D, Durand NC, Thorvaldsdóttir H, Mesirov JP, Aiden EL. 2018. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 2018 Feb 28;6(2):256-258.e1. doi: <https://doi.org/10.1016/j.cels.2018.01.001>. Epub 2018 Feb 7. PMID: 29428417; PMCID: PMC6047755.
- FASTP, [MIT](https://github.com/OpenGene/fastp/blob/master/LICENSE)
- fastp, [MIT](https://github.com/OpenGene/fastp/blob/master/LICENSE)

> Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages i884–i890, <https://doi.org/10.1093/bioinformatics/bty560>
- FASTQC, [GPL v3](https://github.com/s-andrews/FastQC/blob/master/LICENSE.txt)
- FastQC, [GPL v3](https://github.com/s-andrews/FastQC/blob/master/LICENSE.txt)

> <https://github.com/s-andrews/FastQC>
- run-assembly-visualizer.sh, [MIT](https://github.com/aidenlab/3d-dna/blob/master/LICENSE)

> Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, Aiden EL. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92-95 (2017). doi: <https://doi.org/10.1126/science.aal3327>. Available at: <https://github.com/aidenlab/3d-dna/commit/63029aa3bc5ba9bbdad9dd9771ace583cc95e273>
- HIC_QC, [AGPL v3](https://github.com/phasegenomics/hic_qc/blob/master/LICENSE)
- hic_qc, [AGPL v3](https://github.com/phasegenomics/hic_qc/blob/master/LICENSE)

> <https://github.com/phasegenomics/hic_qc/commit/6881c3390fd4afb85009a52918b4d068100c58b4>
- JUICEBOX_SCRIPTS, [AGPL v3](https://github.com/phasegenomics/juicebox_scripts/blob/master/LICENSE)

> <https://github.com/phasegenomics/juicebox_scripts/commit/a7ae9915401eb677b8058b0118011ce440999bc0>
- BWA, [GPL v3](https://github.com/lh3/bwa/blob/master/COPYING)
- bwa-mem, [GPL v3](https://github.com/lh3/bwa/blob/master/COPYING)

> Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. <https://doi.org/10.48550/arXiv.1303.3997>
- MATLOCK, [AGPL v3](https://github.com/phasegenomics/matlock/blob/master/LICENSE)
- Matlock, [AGPL v3](https://github.com/phasegenomics/matlock/blob/master/LICENSE)

> <https://github.com/phasegenomics/matlock>; <https://quay.io/biocontainers/matlock:20181227--h4b03ef3_3>
- SAMBLASTER, [MIT](https://github.com/GregoryFaust/samblaster/blob/master/LICENSE.txt)
- samblaster, [MIT](https://github.com/GregoryFaust/samblaster/blob/master/LICENSE.txt)

> Faust GG, Hall IM. 2014. SAMBLASTER: fast duplicate marking and structural variant read extraction, Bioinformatics, Volume 30, Issue 17, September 2014, Pages 2503–2505, <https://doi.org/10.1093/bioinformatics/btu314>
- CIRCOS, [GPL v3](https://www.gnu.org/licenses/gpl-3.0.txt)
- Circos, [GPL v3](https://www.gnu.org/licenses/gpl-3.0.txt)

> Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, ... Marra MA. 2009. Circos: an information aesthetic for comparative genomics. Genome research, 19(9), 1639-1645. <https://doi.org/10.1101/gr.092759.109>
- MUMMER, [Artistic 2.0](https://github.com/mummer4/mummer/blob/master/LICENSE.md)
- MUMmer, [Artistic 2.0](https://github.com/mummer4/mummer/blob/master/LICENSE.md)

> Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. 2018. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018 Jan 26;14(1):e1005944. doi: <https://doi.org/10.1371/journal.pcbi.1005944>. PMID: 29373581; PMCID: PMC5802927.
- PLOTSR, [MIT](https://github.com/schneebergerlab/plotsr/blob/master/LICENSE)
- Plotsr, [MIT](https://github.com/schneebergerlab/plotsr/blob/master/LICENSE)

> Goel M, Schneeberger K. 2022. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics. 2022 May 13;38(10):2922-2926. doi: <https://doi.org/10.1093/bioinformatics/btac196>. PMID: 35561173; PMCID: PMC9113368.
- SYRI, [MIT](https://github.com/schneebergerlab/syri/blob/master/LICENSE)
- Syri, [MIT](https://github.com/schneebergerlab/syri/blob/master/LICENSE)

> Goel M, Sun H, Jiao WB, Schneeberger K. 2019. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019 Dec 16;20(1):277. doi: <https://doi.org/10.1186/s13059-019-1911-0>. PMID: 31842948; PMCID: PMC6913012.
- MINIMAP2, [MIT](https://github.com/lh3/minimap2/blob/master/LICENSE.txt)
- Minimap2, [MIT](https://github.com/lh3/minimap2/blob/master/LICENSE.txt)

> Li H. 2021. New strategies to improve minimap2 alignment accuracy, Bioinformatics, Volume 37, Issue 23, December 2021, Pages 4572–4574, doi: <https://doi.org/10.1093/bioinformatics/btab705>
- MERQURY, [United States Government Work](https://github.com/marbl/merqury?tab=License-1-ov-file#readme)
- Merqury, [United States Government Work](https://github.com/marbl/merqury?tab=License-1-ov-file#readme)

> Rhie, A., Walenz, B.P., Koren, S. et al. 2020. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245. doi: <https://doi.org/10.1186/s13059-020-02134-9>
- OrthoFinder, [GPL v3](https://github.com/davidemms/OrthoFinder/blob/master/License.md)

> Emms, D.M., Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238 (2019). doi: <https://doi.org/10.1186/s13059-019-1832-y>
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
53 changes: 35 additions & 18 deletions README.md
@@ -12,26 +12,43 @@

## Introduction

**plant-food-research-open/assemblyqc** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html) pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified html report. The tools are shown in the [Pipeline Flowchart](#pipeline-flowchart) and their references are listed in [CITATIONS.md](./CITATIONS.md).
**plant-food-research-open/assemblyqc** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html) pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified html report. The tools are shown in the [Pipeline Flowchart](#pipeline-flowchart) and their references are listed in [CITATIONS.md](./CITATIONS.md). The pipeline includes skip flags to disable execution of many of its tools.

## Pipeline Flowchart

<p align="center"><img src="docs/images/assemblyqc.png"></p>

- [FASTA VALIDATOR](https://github.com/linsalrob/fasta_validator) + [SEQKIT RMDUP](https://github.com/shenwei356/seqkit): FASTA validation
- [GENOMETOOLS GT GFF3VALIDATOR](https://genometools.org/tools/gt_gff3validator.html): GFF3 validation
- [ASSEMBLATHON STATS](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl), [GFASTATS](https://github.com/vgl-hub/gfastats): Assembly statistics
- [GENOMETOOLS GT STAT](https://genometools.org/tools/gt_stat.html): Annotation statistics
- [NCBI FCS ADAPTOR](https://github.com/ncbi/fcs): Adaptor contamination pass/fail
- [NCBI FCS GX](https://github.com/ncbi/fcs): Foreign organism contamination pass/fail
- [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation
- [TIDK](https://github.com/tolkit/telomeric-identifier): Telomere repeat identification
- [LAI](https://github.com/oushujun/LTR_retriever/blob/master/LAI): Continuity of repetitive sequences
- [KRAKEN2](https://github.com/DerrickWood/kraken2): Taxonomy classification
- [HIC CONTACT MAP](https://github.com/igvteam/juicebox.js): Alignment and visualisation of HiC data
- [MUMMER](https://github.com/mummer4/mummer) → [CIRCOS](http://circos.ca/documentation/) + [DOTPLOT](https://plotly.com) & [MINIMAP2](https://github.com/lh3/minimap2) → [PLOTSR](https://github.com/schneebergerlab/plotsr): Synteny analysis
- [MERQURY](https://github.com/marbl/merqury): K-mer completeness, consensus quality and phasing assessment
- [ORTHOFINDER](https://github.com/davidemms/OrthoFinder): Phylogenetic orthology inference for comparative genomics
- `Assembly`
  - [fasta_validator](https://github.com/linsalrob/fasta_validator) + [SeqKit rmdup](https://github.com/shenwei356/seqkit): FASTA validation
  - [assemblathon_stats](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl), [gfastats](https://github.com/vgl-hub/gfastats): Assembly statistics
  - [NCBI FCS-adaptor](https://github.com/ncbi/fcs): Adaptor contamination pass/fail
  - [NCBI FCS-GX](https://github.com/ncbi/fcs): Foreign organism contamination pass/fail
  - [tidk](https://github.com/tolkit/telomeric-identifier): Telomere repeat identification
  - [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation
  - [LAI](https://github.com/oushujun/LTR_retriever/blob/master/LAI): Continuity of repetitive sequences
  - [Kraken 2](https://github.com/DerrickWood/kraken2), [Krona](https://github.com/marbl/Krona): Taxonomy classification
- `Alignment and visualisation of HiC data`
  - [sra-tools](https://github.com/ncbi/sra-tools): HiC data download from SRA or use of local FASTQ files
  - [fastp](https://github.com/OpenGene/fastp), [FastQC](https://github.com/s-andrews/FastQC): QC and trimming
  - [SeqKit sort](https://github.com/shenwei356/seqkit): Alphanumeric sorting of FASTA by sequence ID
  - [bwa-mem](https://github.com/lh3/bwa): HiC read alignment
  - [samblaster](https://github.com/GregoryFaust/samblaster): Duplicate marking
  - [hic_qc](https://github.com/phasegenomics/hic_qc): HiC read and alignment statistics
  - [Matlock](https://github.com/phasegenomics/matlock): BAM to juicer conversion
  - [3d-dna/visualize](https://github.com/aidenlab/3d-dna/tree/master/visualize): `.hic` file creation
  - [juicebox.js](https://github.com/igvteam/juicebox.js): HiC contact map visualisation
- `K-mer completeness, consensus quality and phasing assessment`
  - [sra-tools](https://github.com/ncbi/sra-tools): Assembly, maternal and paternal data download from SRA or use of local FASTQ files
  - [Merqury hapmers](https://github.com/marbl/merqury/blob/master/trio/hapmers.sh): Hapmer generation if parental data is available
  - [Merqury](https://github.com/marbl/merqury): Completeness, consensus quality and phasing assessment
- `Synteny analysis`
  - [MUMmer](https://github.com/mummer4/mummer) → [Circos](http://circos.ca/documentation/) + [dotplot](https://plotly.com): One-to-all and all-to-all synteny analysis at the contig level
  - [Minimap2](https://github.com/lh3/minimap2) → [Syri](https://github.com/schneebergerlab/syri)/[Plotsr](https://github.com/schneebergerlab/plotsr): One-to-one synteny analysis at the chromosome level
- `Annotation`
  - [GenomeTools gt gff3validator](https://genometools.org/tools/gt_gff3validator.html) + [FASTA/GFF correspondence](subworkflows/gallvp/gff3_gt_gff3_gff3validator_stat/main.nf): GFF3 validation
  - [GenomeTools gt stat](https://genometools.org/tools/gt_stat.html): Annotation statistics
  - [GffRead](https://github.com/gpertea/gffread), [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation in annotation proteins
  - [OrthoFinder](https://github.com/davidemms/OrthoFinder): Phylogenetic orthology inference for comparative genomics

## Usage

@@ -50,9 +67,9 @@ Now, you can run the pipeline using:
```bash
nextflow run plant-food-research-open/assemblyqc \
  -revision <version> \
  -profile <docker/singularity/.../institute> \
  --input assemblysheet.csv \
  --outdir <OUTDIR>
```
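
The introduction mentions skip flags for disabling individual tools. As a rough sketch of how such flags might be appended to the command above, the helper below assembles the command and prints it for review instead of running it. The `--<tool>_skip` naming pattern, the helper name `build_assemblyqc_cmd`, and the choice of the `docker` profile are assumptions made for illustration; this commit does not list the actual parameter names or their defaults, so check the pipeline's parameter documentation before relying on them.

```bash
#!/usr/bin/env bash
# Build (but do not execute) an assemblyqc command with skip flags appended.
# ASSUMPTION: skip parameters follow a "--<tool>_skip" pattern; verify the
# real names and default values in the pipeline's parameter documentation.
build_assemblyqc_cmd() {
    local outdir=$1
    shift
    local cmd=(nextflow run plant-food-research-open/assemblyqc
               -revision "<version>"
               -profile docker
               --input assemblysheet.csv
               --outdir "$outdir")
    local tool
    for tool in "$@"; do
        cmd+=("--${tool}_skip")   # hypothetical flag name
    done
    # Print the assembled command as a single space-separated line.
    echo "${cmd[*]}"
}

build_assemblyqc_cmd results busco tidk
```

A Nextflow pipeline parameter given on the command line without a value is set to `true`, which is why the sketch passes each flag bare. Printing the command first makes it easy to inspect before replacing `<version>` and running it.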

> [!WARNING]
Binary file modified docs/images/assemblyqc.png
Binary file added docs/images/orthofinder.png