Commit

Revise output and required files documentation for clarity and detail
SorenHeidelbach committed Dec 19, 2024
1 parent 7e01e6e commit 0746336
Showing 2 changed files with 66 additions and 52 deletions.
2 changes: 1 addition & 1 deletion docs/source/output.md
@@ -1,4 +1,4 @@
# Introduction
# Output description

The nanomotif workflow is designed to detect and characterize DNA methylation motifs, improve the quality of genome bins by identifying potential contamination, and link specific methylation motifs to candidate methyltransferase (MTase) genes. The outputs described below are generated by different nanomotif commands and analytical steps.

116 changes: 65 additions & 51 deletions docs/source/required_files.md
@@ -1,83 +1,97 @@
# Required File Preparation

To identify methylated motifs, the following files are required:
- Assembly
- Modkit methylation pileup
- Contig-bin relationship
Before running nanomotif for motif detection and analysis, ensure that you have prepared the necessary input files. These include a genome assembly file, a methylation pileup file, and a contig-bin relationship file.

---

## Assembly

Assembly file containing all the conting for evaluation in fasta format. The contig id is expected to be in the header and the sequence is expected to only contain IUPAC characters (upper and lower case is accepted). Nanomotif was developed using assemblies made with [metaFlye](https://github.com/fenderglass/Flye).
The assembly file should contain all contigs in FASTA format. Each header should have a unique contig identifier. The sequence should only include standard nucleotide or IUPAC characters (either upper or lower case). Nanomotif has been primarily developed and tested using assemblies generated by [Flye](https://github.com/fenderglass/Flye).

**Requirements:**
- Format: FASTA
- Contains all contigs for evaluation
- Contig ID in the FASTA header
- IUPAC-compliant characters only
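
The requirements above can be checked before running nanomotif. The following is a minimal sketch, not part of nanomotif itself: `check_assembly` is a hypothetical helper name, and it assumes the contig ID is the header text up to the first whitespace and that gap characters such as `-` are not allowed.

```shell
# Hypothetical helper (not a nanomotif command): report duplicate contig IDs
# and sequence lines that contain characters outside the IUPAC nucleotide set.
# Exits non-zero if any problem is found.
check_assembly() {
    awk '
        /^>/ {
            id = substr($1, 2)   # assumed: ID is the header up to the first whitespace
            if (seen[id]++) { print "duplicate contig ID: " id; bad = 1 }
            next
        }
        /[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]/ {
            print "non-IUPAC character on line " NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling `check_assembly path/to/assembly.fa` prints one line per problem and returns a non-zero status if anything was reported.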

---

## Methylation pileup
File containing the number of methylated reads mapped at each position on a contig.
The pileup file is generated by mapping reads with methylation calls in the header to the assembly mentioned in [Assembly](####Assembly). Then, using ONT [modkit](https://github.com/nanoporetech/modkit/blob/master/book/src/advanced_usage.md#pileup) generate the methylation pileup.
## Methylation Pileup

The methylation pileup file indicates how many mapped reads at each position show evidence of methylation. To generate this file:

Code snippet for generating pileup file:
1. Map reads (with methylation calls) to the assembly.
2. Use [modkit pileup](https://github.com/nanoporetech/modkit/blob/master/book/src/advanced_usage.md#pileup) to create the pileup.

**Example commands:**
```shell
MODCALLS="path/to/reads/with/methylation/calls.bam"
ASSEMBLY="path/to/assembly.fa"
MAPPING="path/to/generated/mapping.bam"
PILEUP="path/to/generated/pileup.bed
PILEUP="path/to/generated/pileup.bed"

samtools fastq -T MM,ML $MODCALLS | \
minimap2 -ax map-ont -y $ASSEMBLY - | \
samtools view -bS | \
samtools sort -o $MAPPING
minimap2 -ax map-ont -y $ASSEMBLY - | \
samtools view -bS | \
samtools sort -o $MAPPING

modkit pileup --only-tabs $MAPPING $PILEUP
```


Expected format: The pileup file is a tab-delimited table where each row represents a position on a contig, including information about methylation status.

Caling head on the generated pileup file should show a table similair to the one below:
| | | | | | | | | | | | | | | | | | |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
| contig_3 | 0 | 1 | m | 133 | - | 0 | 1 | 255,0,0 | 133 | 0.00 | 0 | 133 | 0 | 0 | 6 | 0 | 0 |
| contig_3 | 1 | 2 | a | 174 | + | 1 | 2 | 255,0,0 | 174 | 1.72 | 3 | 171 | 0 | 0 | 3 | 0 | 0 |
| contig_3 | 2 | 3 | a | 172 | + | 2 | 3 | 255,0,0 | 172 | 2.33 | 4 | 168 | 0 | 0 | 7 | 0 | 0 |
| contig_3 | 3 | 4 | a | 178 | + | 3 | 4 | 255,0,0 | 178 | 0.56 | 1 | 177 | 0 | 0 | 2 | 0 | 0 |
| contig_3 | 4 | 5 | a | 177 | + | 4 | 5 | 255,0,0 | 177 | 2.82 | 5 | 172 | 0 | 0 | 5 | 0 | 0 |
| contig_3 | 5 | 6 | a | 179 | + | 5 | 6 | 255,0,0 | 179 | 2.79 | 5 | 174 | 0 | 0 | 3 | 2 | 0 |
| contig_3 | 5 | 6 | m | 1 | + | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 3 | 180 | 0 |
| contig_3 | 5 | 6 | a | 1 | - | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 0 | 156 | 0 |
| contig_3 | 6 | 7 | m | 183 | + | 6 | 7 | 255,0,0 | 183 | 0.55 | 1 | 182 | 0 | 0 | 1 | 0 | 0 |
| contig_3 | 6 | 7 | a | 4 | - | 6 | 7 | 255,0,0 | 4 | 0.00 | 0 | 4 | 0 | 0 | 0 | 153 | 0 |
Running "head" on the pileup file should produce a table similar to the one below:

| contig_3 | 0 | 1 | m | 133 | - | 0 | 1 | 255,0,0 | 133 | 0.00 | 0 | 133 | 0 | 0 | 6 | 0 | 0 |
|----------|----|---|---|-----|---|---|---|---------|-----|------|---|-----|---|---|---|---|---|
| contig_3 | 1 | 2 | a | 174 | + | 1 | 2 | 255,0,0 | 174 | 1.72 | 3 | 171 | 0 | 0 | 3 | 0 | 0 |
| contig_3 | 2 | 3 | a | 172 | + | 2 | 3 | 255,0,0 | 172 | 2.33 | 4 | 168 | 0 | 0 | 7 | 0 | 0 |
| contig_3 | 3 | 4 | a | 178 | + | 3 | 4 | 255,0,0 | 178 | 0.56 | 1 | 177 | 0 | 0 | 2 | 0 | 0 |
| contig_3 | 4 | 5 | a | 177 | + | 4 | 5 | 255,0,0 | 177 | 2.82 | 5 | 172 | 0 | 0 | 5 | 0 | 0 |
| contig_3 | 5 | 6 | a | 179 | + | 5 | 6 | 255,0,0 | 179 | 2.79 | 5 | 174 | 0 | 0 | 3 | 2 | 0 |
| contig_3 | 5 | 6 | m | 1 | + | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 3 | 180 | 0 |
| contig_3 | 5 | 6 | a | 1 | - | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 0 | 156 | 0 |
| contig_3 | 6 | 7 | m | 183 | + | 6 | 7 | 255,0,0 | 183 | 0.55 | 1 | 182 | 0 | 0 | 1 | 0 | 0 |
| contig_3 | 6 | 7 | a | 4 | - | 6 | 7 | 255,0,0 | 4 | 0.00 | 0 | 4 | 0 | 0 | 0 | 153 | 0 |
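
As a quick sanity check on the generated file, the sketch below verifies that every row has the 18 tab-separated fields seen in the example table. `check_pileup` is a hypothetical helper name, and the field count is taken from the example rows above rather than from the modkit documentation, so it may need adjusting for other modkit versions or options.

```shell
# Hypothetical helper (not a nanomotif or modkit command): confirm every
# pileup row has 18 tab-separated fields, as in the example table.
# Exits non-zero if any row differs.
check_pileup() {
    awk -F'\t' '
        NF != 18 {
            print "line " NR ": expected 18 fields, found " NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

A file written without `--only-tabs` would fail this check if some of its columns are separated by spaces instead of tabs.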

**Considerations**
Considerations:
- Use untrimmed reads for mapping to avoid downstream errors.
- Running `modkit pileup` with default parameters lets modkit estimate the methylation-calling threshold, which can come out low (0.6 or lower) and introduce noise. A `--filter-threshold` of 0.7 is recommended to reduce noise and improve motif detection quality.

- When demultiplexing, trimming of reads may result in errors downstream. We therefore recommend using untrimmed reads for mapping
---

- Running `modkit pileup` with default parameters results in modkit estimating the threshold for calling a methylation. This can result in very low methylation calling score threshold such as 0.6 or even lower. This is not detrimental to Nanomotif motif identification, but may result in inclusion of a high degree of noise, and loss of some motifs. We generally recommend a `--filter-threshold` of 0.7.
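
To apply the recommended threshold, the pileup call from the example commands can be given an explicit `--filter-threshold`. This is a sketch only: `run_pileup` is a hypothetical wrapper name, and the flags are the ones mentioned in this document, so check `modkit pileup --help` for the version you have installed.

```shell
# Hypothetical wrapper: same pileup call as in the example commands, with an
# explicit filter threshold instead of modkit's estimated one.
#   $1 = sorted mapping BAM, $2 = output pileup, $3 = threshold (default 0.7)
run_pileup() {
    modkit pileup --only-tabs --filter-threshold "${3:-0.7}" "$1" "$2"
}
```

For example, `run_pileup "$MAPPING" "$PILEUP"` uses 0.7, while `run_pileup "$MAPPING" "$PILEUP" 0.8` applies a stricter cutoff.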
## Contig-Bin Relationship

## Contig-bin
File describing which contigs belongs to which bins. It should be headerless, tab-separated file with contig id in the first column and bin in the second column.
This file links each contig to its corresponding bin. It is a tab-separated file with two columns and no header:
- Column 1: Contig ID
- Column 2: Bin ID

If the bins are outputted in a folder with one fasta file pr. bin, the contig-bin file can be generated using the following snippet:
If you have a folder of bin FASTA files (one file per bin), you can generate the contig-bin file by extracting contig IDs and their associated bin filenames, then formatting this information into a two-column TSV.
```shell
BINS="/path/to/bins/fasta" # Bins directory
BIN_EXT="fa" # Bins file extension
OUT="contig_bin.tsv" # contig-bin output destination
grep ">" ${BINS}/*.${EXT} | \

grep ">" ${BINS}/*.${BIN_EXT} | \
sed "s/.*\///" | \
sed "s/.${EXT}:>/\t/" | \
sed "s/.${BIN_EXT}:>/\t/" | \
awk -F'\t' '{print $2 "\t" $1}' > $OUT
```
Caling head on the generated contig-bin file should show a table similair to the one below:
| | |
|-|-|
| contig_1 | bin1 |
| contig_2 | bin1 |
| contig_3 | bin1 |
| contig_4 | bin2 |
| contig_5 | bin2 |
| contig_6 | bin3 |
| contig_7 | bin3 |
| contig_8 | bin3 |
| contig_9 | bin3 |
| contig_10 | bin1 |


Example output:

| contig_1 | bin1 |
|-----------|------|
| contig_2 | bin1 |
| contig_3 | bin1 |
| contig_4 | bin2 |
| contig_5 | bin2 |
| contig_6 | bin3 |
| contig_7 | bin3 |
| contig_8 | bin3 |
| contig_9 | bin3 |
| contig_10 | bin1 |
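
Before using the file, it can help to confirm that it matches the format described above and agrees with the assembly. The sketch below uses a hypothetical helper, `check_contig_bin`, and assumes that each contig belongs to at most one bin and that contig IDs are the FASTA header text up to the first whitespace.

```shell
# Hypothetical helper (not a nanomotif command): validate a contig-bin TSV
# against the assembly. Exits non-zero if any problem is found.
#   $1 = contig-bin TSV, $2 = assembly FASTA
check_contig_bin() {
    awk -F'\t' '
        NR == FNR {
            # First file (assembly): collect contig IDs from header lines.
            if (/^>/) {
                split(substr($0, 2), h, /[ \t]/)
                ids[h[1]] = 1
            }
            next
        }
        NF != 2      { print "line " FNR ": expected 2 fields, found " NF; bad = 1; next }
        seen[$1]++   { print "contig listed more than once: " $1; bad = 1 }
        !($1 in ids) { print "contig not in assembly: " $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$2" "$1"
}
```

Running `check_contig_bin contig_bin.tsv path/to/assembly.fa` prints one line per problem. Contigs that are present in the assembly but absent from the contig-bin file are not reported.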

---
