Merge pull request #8 from genotoul-bioinfo/dev
Improve documentation
JeanMainguy authored Jan 29, 2024
2 parents 3d1957e + bcbdacd commit 12cdd43
Showing 6 changed files with 172 additions and 8 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -1,6 +1,11 @@
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/binette/README.html) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/downloads.svg)](https://anaconda.org/bioconda/binette) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/license.svg)](https://anaconda.org/bioconda/binette) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/version.svg)](https://anaconda.org/bioconda/binette)

[![Test Coverage](https://genotoul-bioinfo.github.io/Binette/coverage-badge.svg)](https://genotoul-bioinfo.github.io/Binette/) [![CI Status](https://github.com/genotoul-bioinfo/Binette/actions/workflows/binette_ci.yml/badge.svg)](https://github.com/genotoul-bioinfo/Binette/actions/workflows) [![Documentation Status](https://readthedocs.org/projects/binette/badge/?version=latest)](https://binette.readthedocs.io/en/latest/?badge=latest)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/binette/README.html) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/downloads.svg)](https://anaconda.org/bioconda/binette)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/license.svg)](https://anaconda.org/bioconda/binette)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/version.svg)](https://anaconda.org/bioconda/binette)
[![PyPI version](https://badge.fury.io/py/Binette.svg)](https://badge.fury.io/py/Binette)

[![Test Coverage](https://genotoul-bioinfo.github.io/Binette/coverage-badge.svg)](https://genotoul-bioinfo.github.io/Binette/)
[![CI Status](https://github.com/genotoul-bioinfo/Binette/actions/workflows/binette_ci.yml/badge.svg)](https://github.com/genotoul-bioinfo/Binette/actions/workflows)
[![Documentation Status](https://readthedocs.org/projects/binette/badge/?version=latest)](https://binette.readthedocs.io/en/latest/?badge=latest)


# Binette
10 changes: 8 additions & 2 deletions docs/index.md
@@ -4,9 +4,14 @@
% contain the root `toctree` directive.


[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/binette/README.html) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/downloads.svg)](https://anaconda.org/bioconda/binette) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/license.svg)](https://anaconda.org/bioconda/binette) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/version.svg)](https://anaconda.org/bioconda/binette)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/binette/README.html) [![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/downloads.svg)](https://anaconda.org/bioconda/binette)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/license.svg)](https://anaconda.org/bioconda/binette)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/binette/badges/version.svg)](https://anaconda.org/bioconda/binette)
[![PyPI version](https://badge.fury.io/py/Binette.svg)](https://badge.fury.io/py/Binette)

[![Test Coverage](https://genotoul-bioinfo.github.io/Binette/coverage-badge.svg)](https://genotoul-bioinfo.github.io/Binette/) [![CI Status](https://github.com/genotoul-bioinfo/Binette/actions/workflows/binette_ci.yml/badge.svg)](https://github.com/genotoul-bioinfo/Binette/actions/workflows) [![Documentation Status](https://readthedocs.org/projects/binette/badge/?version=latest)](https://binette.readthedocs.io/en/latest/?badge=latest)
[![Test Coverage](https://genotoul-bioinfo.github.io/Binette/coverage-badge.svg)](https://genotoul-bioinfo.github.io/Binette/)
[![CI Status](https://github.com/genotoul-bioinfo/Binette/actions/workflows/binette_ci.yml/badge.svg)](https://github.com/genotoul-bioinfo/Binette/actions/workflows)
[![Documentation Status](https://readthedocs.org/projects/binette/badge/?version=latest)](https://binette.readthedocs.io/en/latest/?badge=latest)


# Binette
@@ -32,6 +37,7 @@ Binette is inspired from the metaWRAP bin-refinement tool but it effectively sol
installation
usage
contributing
tests.md
api/api_ref
```

19 changes: 17 additions & 2 deletions docs/installation.md
@@ -1,7 +1,9 @@

# Installation

## With Bioconda
## Installation of Binette

### With Bioconda

Binette can be easily installed with conda

@@ -23,7 +25,7 @@ For quicker installation and potential resolution of conflicting dependencies, c
```


## Installing from Source Code within a conda environnement
### From the source code within a conda environment

A straightforward method to install Binette from the source code is by utilizing a conda environment that includes all the necessary dependencies.

@@ -58,6 +60,19 @@ binette -h
```


### With PyPI

Binette is available on [PyPI](https://pypi.org/project/Binette/) and can be installed using pip as follows:

```bash
pip install binette[main_deps]
```

Omitting the `[main_deps]` option will result in the installation of Binette without any Python dependencies.

In addition to Python dependencies, Binette requires [Diamond](https://github.com/bbuchfink/diamond) to be installed and executable.
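
Because pip does not install Diamond, it can be worth checking that the executable is reachable before running Binette. The snippet below is a minimal sketch using only the Python standard library; the helper name `find_executable` is hypothetical and not part of Binette, and the executable name `diamond` is assumed to be the one found on your `PATH`.

```python
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    # "diamond" is the assumed name of the Diamond executable.
    diamond_path = find_executable("diamond")
    if diamond_path is None:
        print("diamond was not found on PATH; install it before running Binette.")
    else:
        print(f"diamond found at {diamond_path}")
```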


## Downloading the CheckM2 database

Before using Binette, it is necessary to download the CheckM2 database:
102 changes: 102 additions & 0 deletions docs/tests.md
@@ -0,0 +1,102 @@
# Tests

Tests have been implemented to ensure the correctness of Binette.


## Unit tests

Unit tests have been implemented in the `tests` directory using pytest.


To run the test suite, you need to have installed Binette from the source code. To do so, follow the installation instructions [here](./installation.md#installing-from-source-code-within-a-conda-environnement).


To install pytest in your environment, run:

```bash
pip install .[dev]
```

Next, run the following at the root of the repository:

```bash
pytest
```

The coverage percentage of the test suite can be obtained as follows:

```bash
pytest --cov=binette
```


```{note}
Test coverage is updated by a GitHub workflow, visible in the Actions tab. The test coverage report is then deployed to GitHub Pages and is available [here](https://genotoul-bioinfo.github.io/Binette/).
```


## Functional Tests


A functional test has been implemented in the CI GitHub workflow. It runs Binette on a toy dataset consisting of 4 small genomes. The test uses a CheckM2 database that has been shrunk to the minimum so that Diamond runs faster. Finally, the results are compared with the expected results.

The test dataset is stored in this GitHub repository: [Binette TestData](https://github.com/genotoul-bioinfo/Binette_TestData).

You can replicate this test locally by following these steps:


1. **Install Binette**:

Make sure you have Binette installed on your system. You can refer to the [installation](./installation.md) instructions.


2. **Clone the test dataset repository**:

Clone the dataset repository using Git:

```bash
git clone https://github.com/genotoul-bioinfo/Binette_TestData.git
cd Binette_TestData
```

3. **Run Binette**:

Run Binette on the test data with the following command:

```bash
binette -b binning_results/* --contigs all_contigs.fna --checkm2_db checkm2_tiny_db/checkm2_tiny_db.dmnd -v -o test_results
```

This should complete in a few seconds.


4. **Compare Results**:

After running Binette, you can compare the generated `final_bins_quality_reports.tsv` with the expected results stored in the `expected_results` folder. Some variation in the completeness, contamination, and score columns is expected due to CheckM2's slight variability.

You can perform the comparison manually by using the head command:

```bash
head expected_results/final_bins_quality_reports.tsv test_results/final_bins_quality_reports.tsv
```

Alternatively, you can use the provided Python script for automated comparison: [compare_results.py](https://github.com/genotoul-bioinfo/Binette_TestData/scripts/compare_results.py) located in the scripts folder.

```bash
python scripts/compare_results.py expected_results/final_bins_quality_reports.tsv test_results/final_bins_quality_reports.tsv
```
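
To illustrate what a tolerance-based comparison of two quality reports might look like, here is a minimal sketch. It is not the `compare_results.py` script from the test data repository. The bin identifier column name (`bin_id`), the exact names of the numeric columns, and the absolute tolerance of 1.0 are all assumptions made for the example and should be adapted to the real report header.

```python
import csv
import io
import math

# Assumed column names; adjust them to the real header of the report.
ID_COLUMN = "bin_id"
NUMERIC_COLUMNS = ("completeness", "contamination", "score")


def read_report(handle):
    """Read a tab-separated report into a {bin id: row dict} mapping."""
    return {row[ID_COLUMN]: row for row in csv.DictReader(handle, delimiter="\t")}


def compare_reports(expected, observed, abs_tol=1.0):
    """Return a list of human-readable differences between two reports."""
    problems = []
    for bin_id in sorted(set(expected) | set(observed)):
        if bin_id not in observed:
            problems.append(f"{bin_id}: missing from observed report")
        elif bin_id not in expected:
            problems.append(f"{bin_id}: not present in expected report")
        else:
            for column in NUMERIC_COLUMNS:
                exp = float(expected[bin_id][column])
                obs = float(observed[bin_id][column])
                # Tolerate small numeric differences between runs.
                if not math.isclose(exp, obs, abs_tol=abs_tol):
                    problems.append(f"{bin_id}: {column} {exp} vs {obs}")
    return problems


if __name__ == "__main__":
    # Toy in-memory reports; replace with open(path, newline="") for real files.
    header = "bin_id\tcompleteness\tcontamination\tscore\n"
    expected = read_report(io.StringIO(header + "1\t98.5\t0.5\t97.5\n"))
    observed = read_report(io.StringIO(header + "1\t98.7\t0.4\t97.9\n"))
    print(compare_reports(expected, observed) or "reports match within tolerance")
```

An absolute tolerance is used here because completeness and contamination are percentages; a stricter or looser threshold may suit other datasets.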


```{warning}
The CheckM2 database used for the test dataset is very small and is only valid for the 4 genomes included in the test dataset. It should not be used elsewhere.
```
27 changes: 27 additions & 0 deletions paper/paper.bib
@@ -180,3 +180,30 @@ @article{metagWGS_inprep

}

@article{gruning2018bioconda,
title={Bioconda: sustainable and comprehensive software distribution for the life sciences},
author={Gr{\"u}ning, Bj{\"o}rn and Dale, Ryan and Sj{\"o}din, Andreas and Chapman, Brad A and Rowe, Jillian and Tomkins-Tinch, Christopher H and Valieris, Renan and K{\"o}ster, Johannes and Bioconda Team},
journal={Nature methods},
volume={15},
number={7},
pages={475--476},
year={2018},
publisher={Nature Publishing Group US New York},
doi = {10.1038/s41592-018-0046-7},
}


@article{Bioconda:2018,
title = {Bioconda: Sustainable and Comprehensive Software Distribution for the Life Sciences},
shorttitle = {Bioconda},
author = {Gr{\"u}ning, Bj{\"o}rn and Dale, Ryan and Sj{\"o}din, Andreas and Chapman, Brad A. and Rowe, Jillian and {Tomkins-Tinch}, Christopher H. and Valieris, Renan and K{\"o}ster, Johannes},
year = {2018},
month = jul,
journal = {Nature Methods},
volume = {15},
number = {7},
pages = {475--476},
publisher = {{Nature Publishing Group}},
issn = {1548-7105},
}

11 changes: 10 additions & 1 deletion paper/paper.md
@@ -41,9 +41,18 @@ Binette is a Python reimplementation and enhanced version of the bin refinement
![**Overview of Binette Steps**. **(A) Intermediate Bin Creation Example**: Bins are represented as square shapes, each containing colored lines representing the contigs they contain. Creation of intermediate bins involves the initial bins sharing at least one contig. Set operations are applied to the contigs within the bins to generate these intermediate bins. **(B) Binette Workflow Overview**: Input bins serve as the basis for generating intermediate bins. Each bin undergoes a scoring process utilizing quality metrics provided by CheckM2. Subsequently, the bins are sorted based on their scores, and a selection process is executed to retain non-redundant bins.\label{fig:overview}](./binette_overview.pdf)
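
The intermediate bin creation described in panel A can be sketched as follows. This is an illustration only, not Binette's implementation: it assumes that the set operations are intersection, difference, and union, and it only considers pairs of bins, whereas the actual tool may combine larger groups of overlapping bins. The function name `intermediate_bins` is hypothetical.

```python
from itertools import combinations


def intermediate_bins(bins):
    """Return new contig sets derived from pairs of bins sharing a contig."""
    originals = {frozenset(b) for b in bins}
    candidates = set()
    for a, b in combinations(originals, 2):
        # Only bins sharing at least one contig produce intermediate bins.
        if a & b:
            candidates.update({a & b, a - b, b - a, a | b})
    # A difference can be empty when one bin contains the other.
    candidates.discard(frozenset())
    # Keep only bins that are not already part of the input.
    return candidates - originals


if __name__ == "__main__":
    input_bins = [{"c1", "c2", "c3"}, {"c2", "c3", "c4"}, {"c5"}]
    for new_bin in sorted(intermediate_bins(input_bins), key=sorted):
        print(sorted(new_bin))
```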


Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Bins are scored using the following scoring function: $completeness - weight * contamination$, with the default weight set to 2. These scored bins are then sorted, facilitating the selection of a final new set of non-redundant bins (\autoref{fig:overview}.B). The ability to score bins is based on CheckM2 rather than CheckM1 as in the metaWRAP pipeline. CheckM2 uses a novel approach to evaluate bin quality based on machine learning techniques. This approach improves speed and also provides better results than CheckM1. Binette initiates CheckM2 processing by running its initial steps once for all contigs within the input bins. These initial steps involve gene prediction using Prodigal and alignment against the CheckM2 database using Diamond [@buchfink2015diamond]. Binette uses Pyrodigal [@larralde2022pyrodigal], a Python module that provides bindings and an interface to Prodigal [@hyatt2010prodigal]. The intermediate Checkm2 results are then used to assess the quality of individual bins, eliminating redundant calculations and speeding up the refinement process.
Bin completeness and contamination are assessed using CheckM2 [@chklovski2023checkm2]. Bins are scored using the following scoring function: $completeness - weight * contamination$, with the default weight set to 2. These scored bins are then sorted, facilitating the selection of a final new set of non-redundant bins (\autoref{fig:overview}.B). Scoring relies on CheckM2 rather than CheckM1, which is used in the metaWRAP pipeline. CheckM2 uses a novel approach to evaluate bin quality based on machine learning techniques. This approach improves speed and also provides better results than CheckM1. Binette initiates CheckM2 processing by running its initial steps once for all contigs within the input bins. These initial steps involve gene prediction using Prodigal and alignment against the CheckM2 database using Diamond [@buchfink2015diamond]. Binette uses Pyrodigal [@larralde2022pyrodigal], a Python module that uses Cython to provide bindings to Prodigal [@hyatt2010prodigal]. The intermediate CheckM2 results are then used to assess the quality of individual bins, eliminating redundant calculations and speeding up the refinement process.
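
As an illustration of the scoring and selection described above, the following sketch scores bins with $completeness - weight * contamination$, sorts them, and keeps a non-redundant set. The `Bin` class and function names are hypothetical, and the greedy rule used here (keep a bin only if it shares no contig with an already selected bin) is an assumption about what "non-redundant" means, not a description of Binette's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    name: str
    contigs: frozenset
    completeness: float  # percentage, as reported by a quality tool
    contamination: float  # percentage


def score(b, weight=2.0):
    """Score a bin as completeness - weight * contamination."""
    return b.completeness - weight * b.contamination


def select_non_redundant(bins, weight=2.0):
    """Greedily keep the best-scoring bins that share no contigs."""
    selected = []
    used_contigs = set()
    for b in sorted(bins, key=lambda x: score(x, weight), reverse=True):
        # Assumed redundancy rule: skip bins overlapping a selected bin.
        if used_contigs.isdisjoint(b.contigs):
            selected.append(b)
            used_contigs.update(b.contigs)
    return selected


if __name__ == "__main__":
    bins = [
        Bin("a", frozenset({"c1", "c2"}), 90.0, 5.0),
        Bin("b", frozenset({"c2", "c3"}), 95.0, 1.0),
        Bin("c", frozenset({"c4"}), 50.0, 0.0),
    ]
    for b in select_non_redundant(bins):
        print(b.name, score(b))
```

With the default weight of 2, one percentage point of contamination costs as much as two points of completeness, so a slightly less complete but cleaner bin can outrank a more complete, more contaminated one.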

Binette serves as the bin refinement tool within the [metagWGS](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs) metagenomic analysis pipeline [@metagWGS_inprep], providing a robust and faster alternative to the bin refinement module of the metaWRAP pipeline as well as other similar bin refinement tools.

# Availability

Binette is available on [PyPI](https://pypi.org/project/Binette/) and can be installed with standard Python package management tools. Additionally, a dedicated Conda package is available in the Bioconda channel [@gruning2018bioconda]. The source code for Binette is available on [GitHub](https://github.com/genotoul-bioinfo/binette) under the MIT license. The GitHub repository includes continuous integration tests and test coverage reporting, and employs continuous deployment through GitHub Actions to maintain a robust and reliable codebase.


# Acknowledgements

We would like to thank Matthias Zytnicki for his valuable insights and support during the development of the Binette algorithm.


# References
