Added gfastats #145

Merged 3 commits on Oct 1, 2024
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

1. Added Gfastats [#126](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/126)

### `Fixed`

### `Dependencies`
@@ -16,8 +18,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Deprecated`

-1. Reduced the GenomeTools stats figures to 300 DPI
-2. Now `synteny_mummer_min_bundle_size` is set to `1000000` by default
+1. Reduced the GenomeTools stats figures to 300 DPI [#142](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/142)
+2. Now `synteny_mummer_min_bundle_size` is set to `1000000` by default [#142](https://github.com/Plant-Food-Research-Open/assemblyqc/issues/142)

## v2.1.1 - [20-Sep-2024]

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -36,6 +36,10 @@
>
> Forked from: <https://github.com/ucdavis-bioinformatics/assemblathon2-analysis>

- GFASTATS, [MIT](https://github.com/vgl-hub/gfastats/blob/main/LICENSE)

> Giulio Formenti, Linelle Abueg, Angelo Brajuka, Nadolina Brajuka, Cristóbal Gallardo-Alba, Alice Giani, Olivier Fedrigo, Erich D Jarvis, Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs, Bioinformatics, Volume 38, Issue 17, September 2022, Pages 4214–4216, <https://doi.org/10.1093/bioinformatics/btac460>

- BUSCO, [MIT](https://gitlab.com/ezlab/busco/-/blob/master/LICENSE)

> Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes, Molecular Biology and Evolution, Volume 38, Issue 10, October 2021, Pages 4647–4654, <https://doi.org/10.1093/molbev/msab199>
4 changes: 2 additions & 2 deletions README.md
@@ -42,7 +42,7 @@ flowchart LR

VALIDATE_FORMAT ==> GFF_STATS[<span style="white-space: nowrap;">GENOMETOOLS GT STAT</span>]

-Run ==> ASS_STATS[<span style="white-space: nowrap;">ASSEMBLATHON STATS</span>]
+Run ==> ASS_STATS[<span style="white-space: nowrap;">STATS</span>]
Run ==> BUSCO
Run ==> TIDK
Run ==> LAI
@@ -72,7 +72,7 @@ flowchart LR

- [FASTA VALIDATOR](https://github.com/linsalrob/fasta_validator) + [SEQKIT RMDUP](https://github.com/shenwei356/seqkit): FASTA validation
- [GENOMETOOLS GT GFF3VALIDATOR](https://genometools.org/tools/gt_gff3validator.html): GFF3 validation
-- [ASSEMBLATHON STATS](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl): Assembly statistics
+- [ASSEMBLATHON STATS](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl), [GFASTATS](https://github.com/vgl-hub/gfastats): Assembly statistics
- [GENOMETOOLS GT STAT](https://genometools.org/tools/gt_stat.html): Annotation statistics
- [NCBI FCS ADAPTOR](https://github.com/ncbi/fcs): Adaptor contamination pass/fail
- [NCBI FCS GX](https://github.com/ncbi/fcs): Foreign organism contamination pass/fail
4 changes: 4 additions & 0 deletions bin/assemblyqc.py
@@ -16,6 +16,9 @@
from report_modules.parsers.assemblathon_stats_parser import (
    parse_assemblathon_stats_folder,
)
from report_modules.parsers.gfastats_parser import (
    parse_gfastats_folder,
)
from report_modules.parsers.genometools_gt_stat_parser import (
    parse_genometools_gt_stat_folder,
)
@@ -41,6 +44,7 @@
data_from_tools = {**data_from_tools, **parse_ncbi_fcs_adaptor_folder()}
data_from_tools = {**data_from_tools, **parse_ncbi_fcs_gx_folder()}
data_from_tools = {**data_from_tools, **parse_assemblathon_stats_folder()}
data_from_tools = {**data_from_tools, **parse_gfastats_folder()}
data_from_tools = {**data_from_tools, **parse_genometools_gt_stat_folder()}
data_from_tools = {**data_from_tools, **parse_busco_folder()}
data_from_tools = {
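The merge pattern in this hunk relies on each parser returning an empty dict when its folder is absent, so a skipped tool simply contributes no key. A minimal sketch of that behaviour follows; `parse_tool_a` and `parse_tool_b` are hypothetical stand-ins, not functions from the repository.

```python
def parse_tool_a():
    # Hypothetical parser whose output folder exists.
    return {"TOOL_A": [{"hap": "sample_1"}]}


def parse_tool_b():
    # Hypothetical parser whose output folder is missing,
    # for example because the tool was skipped.
    return {}


data_from_tools = {}
# Same unpacking idiom as in the hunk: later keys override earlier ones,
# and an empty dict leaves the accumulated result unchanged.
data_from_tools = {**data_from_tools, **parse_tool_a()}
data_from_tools = {**data_from_tools, **parse_tool_b()}

print(sorted(data_from_tools))
```

Because the key is missing rather than empty, a template check such as `'GFASTATS' in all_stats_dicts` is enough to hide the tab for a tool that did not run.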
46 changes: 46 additions & 0 deletions bin/report_modules/parsers/gfastats_parser.py
@@ -0,0 +1,46 @@
import os
from pathlib import Path
import pandas as pd
from tabulate import tabulate
import re

from report_modules.parsers.parsing_commons import sort_list_of_results


def parse_gfastats_folder(folder_name="gfastats"):
    dir = os.getcwdb().decode()
    reports_folder_path = Path(f"{dir}/{folder_name}")

    if not os.path.exists(reports_folder_path):
        return {}

    list_of_report_files = reports_folder_path.glob("*.assembly_summary")

    data = {"GFASTATS": []}

    for report_path in list_of_report_files:
        report_table = pd.read_csv(report_path, sep="\t")
        report_table.columns = ['Stat', 'Value']

        file_tokens = re.findall(
            r"([\w]+)\.assembly_summary",
            os.path.basename(str(report_path)),
        )[0]

        data["GFASTATS"].append(
            {
                "hap": file_tokens,
                "report_table": report_table.to_dict("records"),
                "report_table_html": tabulate(
                    report_table,
                    headers=["Stat", "Value"],
                    tablefmt="html",
                    numalign="left",
                    showindex=False,
                ),
            }
        )

    return {
        "GFASTATS": sort_list_of_results(data["GFASTATS"], "hap")
    }
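For orientation, the per-file parsing step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: `parse_summary` and the sample rows are hypothetical, and the sketch assumes each summary is a two-column, tab-separated file. Note one difference: `pd.read_csv` with default arguments treats the first line as a header, whereas this sketch keeps every row.

```python
import csv
import io
import re


def parse_summary(file_name, text):
    """Turn one two-column TSV summary into a report-style dict (hypothetical helper)."""
    # The tag is the file name without the ".assembly_summary" suffix.
    match = re.fullmatch(r"(\w+)\.assembly_summary", file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")

    # Read tab-separated rows and keep only well-formed two-column lines.
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    records = [{"Stat": row[0], "Value": row[1]} for row in rows if len(row) == 2]

    return {"hap": match.group(1), "report_table": records}


# Made-up sample content, only to exercise the function.
sample = "stat_a\t10\nstat_b\t2500\n"
result = parse_summary("sample_1.assembly_summary", sample)
print(result)
```

The returned dict mirrors the `hap` and `report_table` keys that the templates read; the HTML rendering done by `tabulate` is left out to keep the sketch dependency-free.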
8 changes: 8 additions & 0 deletions bin/report_modules/templates/base.html
@@ -32,6 +32,10 @@
<button class="tablinks active" onclick="openTool(event, 'ASSEMBLATHON_STATS')">ASSEMBLATHON STATS</button>
{% endif %}

{% if 'GFASTATS' in all_stats_dicts %}
<button class="tablinks active" onclick="openTool(event, 'GFASTATS')">GFASTATS</button>
{% endif %}

{% if 'GENOMETOOLS_GT_STAT' in all_stats_dicts %}
<button class="tablinks active" onclick="openTool(event, 'GENOMETOOLS_GT_STAT')">GENOMETOOLS GT STAT</button>
{% endif %}
@@ -100,6 +104,10 @@
{% include 'assemblathon_stats/assemblathon_stats.html' %}
{% endif %}

{% if 'GFASTATS' in all_stats_dicts %}
{% include 'gfastats/gfastats.html' %}
{% endif %}

{% if 'GENOMETOOLS_GT_STAT' in all_stats_dicts %}
{% include 'genometools_gt_stat/genometools_gt_stat.html' %}
{% endif %}
10 changes: 10 additions & 0 deletions bin/report_modules/templates/gfastats/dropdown.html
@@ -0,0 +1,10 @@
<div class="dropdown">
<div class="dropdown_content">
<select id="selector_GFASTATS" onchange="showContent('GFASTATS')">
{% set str_hap = 'hap' %} {% for item in range(all_stats_dicts["GFASTATS"]|length) %}
<option value="tabcontent_GFASTATS_{{all_stats_dicts['GFASTATS'][item]['hap']}}">
{{ all_stats_dicts['GFASTATS'][item][str_hap] }}
</option> {% endfor %}
</select>
</div>
</div>
16 changes: 16 additions & 0 deletions bin/report_modules/templates/gfastats/gfastats.html
@@ -0,0 +1,16 @@
<div id="GFASTATS" class="tabcontent" style="display: none">
<div class="section-para-wrapper">
<p class="section-para">A fast and exhaustive tool for summary statistics.</p>
<p class="section-para"><b>Reference:</b></p>
<p class="section-para">
Giulio Formenti, Linelle Abueg, Angelo Brajuka, Nadolina Brajuka, Cristóbal Gallardo-Alba, Alice Giani, Olivier
Fedrigo, Erich D Jarvis, Gfastats: conversion, evaluation and manipulation of genome sequences using assembly
graphs, Bioinformatics, Volume 38, Issue 17, September 2022, Pages 4214–4216,
<a href="https://doi.org/10.1093/bioinformatics/btac460" target="_blank">10.1093/bioinformatics/btac460</a>
</p>
<p class="section-para">
<b>Version: {{ all_stats_dicts['VERSIONS']['GFASTATS']['gfastats'] }}</b>
</p>
</div>
{% include 'gfastats/dropdown.html' %} {% include 'gfastats/report_contents.html' %}
</div>
17 changes: 17 additions & 0 deletions bin/report_modules/templates/gfastats/report_contents.html
@@ -0,0 +1,17 @@
{% set vars = {'is_first': True} %} {% for item in range(all_stats_dicts["GFASTATS"]|length) %} {% set
active_text = 'display: block' if vars.is_first else 'display: none' %}
<div
id="tabcontent_GFASTATS_{{ all_stats_dicts['GFASTATS'][item]['hap'] }}"
class="tabcontent-GFASTATS"
style="{{ active_text }}"
>
<div class="results-section">
<div class="section-heading-wrapper">
<div class="section-heading">{{ all_stats_dicts['GFASTATS'][item]['hap'] }}</div>
</div>
</div>
<div class="table-outer">
<div class="table-wrapper">{{ all_stats_dicts['GFASTATS'][item]['report_table_html'] }}</div>
</div>
</div>
{% if vars.update({'is_first': False}) %} {% endif %} {% endfor %}
10 changes: 10 additions & 0 deletions conf/modules.config
@@ -50,6 +50,16 @@ process {
]
}

withName: GFASTATS {
    ext.args = '--stats -t --nstar-report'
    publishDir = [
        path: { "${params.outdir}/gfastats" },
        mode: params.publish_dir_mode,
        saveAs: { filename -> filename.equals("versions.yml") ? null : filename },
        pattern: '*.assembly_summary'
    ]
}
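The `publishDir` settings above combine a glob `pattern` with a `saveAs` closure that returns `null` for `versions.yml`. A rough Python model of that selection logic is sketched below; `published_name` is a hypothetical helper and the file names are made up, so treat it as a reading aid rather than a description of Nextflow internals.

```python
from fnmatch import fnmatchcase


def published_name(filename, pattern="*.assembly_summary"):
    """Return the name a file would be published under, or None to skip it."""
    # Files that do not match the glob pattern are not published.
    if not fnmatchcase(filename, pattern):
        return None
    # Mirrors the saveAs closure: a null result means "do not publish".
    return None if filename == "versions.yml" else filename


for name in ["sample_1.assembly_summary", "versions.yml", "sample_1.log"]:
    print(name, "->", published_name(name))
```

Under this reading, only the `*.assembly_summary` files reach `${params.outdir}/gfastats`, and the `versions.yml` check acts as a second guard on top of the pattern.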

withName: FCS_FCSADAPTOR {
ext.args = params.ncbi_fcs_adaptor_empire ? "--${params.ncbi_fcs_adaptor_empire}" : '--prok'

2 changes: 2 additions & 0 deletions conf/test_full.config
@@ -16,6 +16,8 @@ params {

input = 'https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/assemblysheetv2.csv'

gfastats_skip = false

ncbi_fcs_adaptor_skip = false
ncbi_fcs_adaptor_empire = 'euk'

17 changes: 15 additions & 2 deletions docs/output.md
@@ -14,7 +14,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [FASTA and GFF3 validation](#fasta-and-gff3-validation)
- [Assemblathon stats](#assemblathon-stats)
-- [Genometools gt stat](#genometools-gt-stat)
+- [Gfastats](#gfastats)
+- [GenomeTools gt stat](#genometools-gt-stat)
- [NCBI FCS adaptor](#ncbi-fcs-adaptor)
- [NCBI FCS GX](#ncbi-fcs-gx)
- [BUSCO](#busco)
@@ -45,7 +46,19 @@ The pipeline prints a warning in the pipeline log if FASTA or GFF3 validation fa
> [!WARNING]
> Contig-related stats are based on the assumption that `assemblathon_stats_n_limit` is specified correctly. If you are not certain of the value of `assemblathon_stats_n_limit`, please ignore the contig-related stats.

-### Genometools gt stat
+### Gfastats
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `gfastats/`
+  - `*.assembly_summary`: Assembly stats in TSV format.
+
+</details>
+
+Gfastats is a fast and exhaustive tool for summary statistics.
+
+### GenomeTools gt stat

<details markdown="1">
<summary>Output files</summary>
1 change: 1 addition & 0 deletions docs/parameters.md
@@ -21,6 +21,7 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and
| Parameter | Description | Type | Default | Required | Hidden |
| ---------------------------- | ----------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `assemblathon_stats_n_limit` | The number of 'N's for the unknown gap size. NCBI recommendation is 100 | `integer` | 100 | | |
| `gfastats_skip` | Skip Gfastats | `boolean` | True | | |

## NCBI FCS options

5 changes: 5 additions & 0 deletions modules.json
@@ -165,6 +165,11 @@
"git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
"installed_by": ["modules"]
},
"gfastats": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"gunzip": {
"branch": "master",
"git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
1 change: 1 addition & 0 deletions modules/local/createreport.nf
@@ -10,6 +10,7 @@ process CREATEREPORT {
path ncbi_fcs_adaptor_reports , stageAs: 'ncbi_fcs_adaptor_reports/*'
path fcs_gx_reports , stageAs: 'fcs_gx_reports/*'
path assemblathon_stats , stageAs: 'assemblathon_stats/*'
path gfastats , stageAs: 'gfastats/*'
path genometools_gt_stats , stageAs: 'genometools_gt_stat/*'
path busco_outputs , stageAs: 'busco_outputs/*'
path busco_gff_outputs , stageAs: 'busco_gff_outputs/*'
5 changes: 5 additions & 0 deletions modules/nf-core/gfastats/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

66 changes: 66 additions & 0 deletions modules/nf-core/gfastats/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
