-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Revise output and required files documentation for clarity and detail
- Loading branch information
1 parent
7e01e6e
commit 0746336
Showing
2 changed files
with
66 additions
and
52 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,83 +1,97 @@ | ||
# Required File Preparation | ||
|
||
To identify methylated motifs, the following files are required: | ||
- Assembly | ||
- Modkit methylation pileup | ||
- Contig-bin relationship | ||
Before running nanomotif for motif detection and analysis, ensure that you have prepared the necessary input files. These include a genome assembly file, a methylation pileup file, and a contig-bin relationship file. | ||
|
||
--- | ||
|
||
## Assembly | ||
|
||
Assembly file containing all the conting for evaluation in fasta format. The contig id is expected to be in the header and the sequence is expected to only contain IUPAC characters (upper and lower case is accepted). Nanomotif was developed using assemblies made with [metaFlye](https://github.com/fenderglass/Flye). | ||
The assembly file should contain all contigs in FASTA format. Each header should have a unique contig identifier. The sequence should only include standard nucleotide or IUPAC characters (either upper or lower case). Nanomotif has been primarily developed and tested using assemblies generated by [Flye](https://github.com/fenderglass/Flye). | ||
|
||
**Requirements:** | ||
- Format: FASTA | ||
- Contains all contigs for evaluation | ||
- Contig ID in the FASTA header | ||
- IUPAC-compliant characters only | ||
|
||
--- | ||
|
||
## Methylation pileup | ||
File containing the number of methylated reads mapped at each position on a contig. | ||
The pileup file is generated by mapping reads with methylation calls in the header to the assembly mentioned in [Assembly](####Assembly). Then, using ONT [modkit](https://github.com/nanoporetech/modkit/blob/master/book/src/advanced_usage.md#pileup) generate the methylation pileup. | ||
## Methylation Pileup | ||
|
||
The methylation pileup file indicates how many mapped reads at each position show evidence of methylation. To generate this file: | ||
|
||
Code snippet for generating pileup file: | ||
1. Map reads (with methylation calls) to the assembly. | ||
2. Use [modkit pileup](https://github.com/nanoporetech/modkit/blob/master/book/src/advanced_usage.md#pileup) to create the pileup. | ||
|
||
**Example commands:** | ||
```shell | ||
MODCALLS="path/to/reads/with/methylation/calls.bam" | ||
ASSEMBLY="path/to/assembly.fa" | ||
MAPPING="path/to/generated/mapping.bam" | ||
PILEUP="path/to/generated/pileup.bed | ||
PILEUP="path/to/generated/pileup.bed" | ||
|
||
samtools fastq -T MM,ML $MODCALLS | \ | ||
minimap2 -ax map-ont -y $ASSEMBLY - | \ | ||
samtools view -bS | \ | ||
samtools sort -o $MAPPING | ||
minimap2 -ax map-ont -y $ASSEMBLY - | \ | ||
samtools view -bS | \ | ||
samtools sort -o $MAPPING | ||
|
||
modkit pileup --only-tabs $MAPPING $PILEUP | ||
``` | ||
|
||
|
||
Expected format: The pileup file is a tab-delimited table where each row represents a position on a contig, including information about methylation status. | ||
|
||
Caling head on the generated pileup file should show a table similair to the one below: | ||
| | | | | | | | | | | | | | | | | | | | ||
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-| | ||
| contig_3 | 0 | 1 | m | 133 | - | 0 | 1 | 255,0,0 | 133 | 0.00 | 0 | 133 | 0 | 0 | 6 | 0 | 0 | | ||
| contig_3 | 1 | 2 | a | 174 | + | 1 | 2 | 255,0,0 | 174 | 1.72 | 3 | 171 | 0 | 0 | 3 | 0 | 0 | | ||
| contig_3 | 2 | 3 | a | 172 | + | 2 | 3 | 255,0,0 | 172 | 2.33 | 4 | 168 | 0 | 0 | 7 | 0 | 0 | | ||
| contig_3 | 3 | 4 | a | 178 | + | 3 | 4 | 255,0,0 | 178 | 0.56 | 1 | 177 | 0 | 0 | 2 | 0 | 0 | | ||
| contig_3 | 4 | 5 | a | 177 | + | 4 | 5 | 255,0,0 | 177 | 2.82 | 5 | 172 | 0 | 0 | 5 | 0 | 0 | | ||
| contig_3 | 5 | 6 | a | 179 | + | 5 | 6 | 255,0,0 | 179 | 2.79 | 5 | 174 | 0 | 0 | 3 | 2 | 0 | | ||
| contig_3 | 5 | 6 | m | 1 | + | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 3 | 180 | 0 | | ||
| contig_3 | 5 | 6 | a | 1 | - | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 0 | 156 | 0 | | ||
| contig_3 | 6 | 7 | m | 183 | + | 6 | 7 | 255,0,0 | 183 | 0.55 | 1 | 182 | 0 | 0 | 1 | 0 | 0 | | ||
| contig_3 | 6 | 7 | a | 4 | - | 6 | 7 | 255,0,0 | 4 | 0.00 | 0 | 4 | 0 | 0 | 0 | 153 | 0 | | ||
Running "head" on the pileup file should produce a table similar to the one below: | ||
|
||
| contig_3 | 0 | 1 | m | 133 | - | 0 | 1 | 255,0,0 | 133 | 0.00 | 0 | 133 | 0 | 0 | 6 | 0 | 0 | | ||
|----------|----|---|---|-----|---|---|---|---------|-----|------|---|-----|---|---|---|---|---| | ||
| contig_3 | 1 | 2 | a | 174 | + | 1 | 2 | 255,0,0 | 174 | 1.72 | 3 | 171 | 0 | 0 | 3 | 0 | 0 | | ||
| contig_3 | 2 | 3 | a | 172 | + | 2 | 3 | 255,0,0 | 172 | 2.33 | 4 | 168 | 0 | 0 | 7 | 0 | 0 | | ||
| contig_3 | 3 | 4 | a | 178 | + | 3 | 4 | 255,0,0 | 178 | 0.56 | 1 | 177 | 0 | 0 | 2 | 0 | 0 | | ||
| contig_3 | 4 | 5 | a | 177 | + | 4 | 5 | 255,0,0 | 177 | 2.82 | 5 | 172 | 0 | 0 | 5 | 0 | 0 | | ||
| contig_3 | 5 | 6 | a | 179 | + | 5 | 6 | 255,0,0 | 179 | 2.79 | 5 | 174 | 0 | 0 | 3 | 2 | 0 | | ||
| contig_3 | 5 | 6 | m | 1 | + | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 3 | 180 | 0 | | ||
| contig_3 | 5 | 6 | a | 1 | - | 5 | 6 | 255,0,0 | 1 | 0.00 | 0 | 1 | 0 | 0 | 0 | 156 | 0 | | ||
| contig_3 | 6 | 7 | m | 183 | + | 6 | 7 | 255,0,0 | 183 | 0.55 | 1 | 182 | 0 | 0 | 1 | 0 | 0 | | ||
| contig_3 | 6 | 7 | a | 4 | - | 6 | 7 | 255,0,0 | 4 | 0.00 | 0 | 4 | 0 | 0 | 0 | 153 | 0 | | ||
|
||
**Considerations** | ||
Considerations: | ||
- Use untrimmed reads for mapping to avoid downstream errors. | ||
- Running modkit pileup with default parameters may set a low methylation threshold and introduce noise. A filter-threshold of 0.7 is recommended to reduce noise and improve motif detection quality. | ||
|
||
- When demultiplexing, trimming of reads may result in errors downstream. We therefore recommend using untrimmed reads for mapping | ||
--- | ||
|
||
- Running `modkit pileup` with default parameters results in modkit estimating the threshold for calling a methylation. This can result in very low methylation calling score threshold such as 0.6 or even lower. This is not detrimental to Nanomotif motif identification, but may result in inclusion of a high degree of noise, and loss of some motifs. We generally recommend a `--filter-threshold` of 0.7. | ||
## Contig-Bin Relationship | ||
|
||
## Contig-bin | ||
File describing which contigs belongs to which bins. It should be headerless, tab-separated file with contig id in the first column and bin in the second column. | ||
This file links each contig to its corresponding bin. It is a tab-separated file with two columns and no header: | ||
- Column 1: Contig ID | ||
- Column 2: Bin ID | ||
|
||
If the bins are outputted in a folder with one fasta file pr. bin, the contig-bin file can be generated using the following snippet: | ||
If you have a folder of bin FASTA files (one file per bin), you can generate the contig-bin file by extracting contig IDs and their associated bin filenames, then formatting this information into a two-column TSV. | ||
```shell | ||
BINS="/path/to/bins/fasta" # Bins directory | ||
BIN_EXT="fa" # Bins file extension | ||
OUT="contig_bin.tsv" # contig-bin output destination | ||
grep ">" ${BINS}/*.${EXT} | \ | ||
|
||
grep ">" ${BINS}/*.${BIN_EXT} | \ | ||
sed "s/.*\///" | \ | ||
sed "s/.${EXT}:>/\t/" | \ | ||
sed "s/.${BIN_EXT}:>/\t/" | \ | ||
awk -F'\t' '{print $2 "\t" $1}' > $OUT | ||
``` | ||
Caling head on the generated contig-bin file should show a table similair to the one below: | ||
| | | | ||
|-|-| | ||
| contig_1 | bin1 | | ||
| contig_2 | bin1 | | ||
| contig_3 | bin1 | | ||
| contig_4 | bin2 | | ||
| contig_5 | bin2 | | ||
| contig_6 | bin3 | | ||
| contig_7 | bin3 | | ||
| contig_8 | bin3 | | ||
| contig_9 | bin3 | | ||
| contig_10 | bin1 | | ||
|
||
|
||
Example output: | ||
|
||
| contig_1 | bin1 | | ||
|-----------|------| | ||
| contig_2 | bin1 | | ||
| contig_3 | bin1 | | ||
| contig_4 | bin2 | | ||
| contig_5 | bin2 | | ||
| contig_6 | bin3 | | ||
| contig_7 | bin3 | | ||
| contig_8 | bin3 | | ||
| contig_9 | bin3 | | ||
| contig_10 | bin1 | | ||
|
||
--- |