Merge pull request #95 from pdimens/docs_1.1
Docs 1.1
pdimens authored Jul 2, 2024
2 parents 52be3e1 + a725626 commit ef52cd6
Showing 12 changed files with 181 additions and 130 deletions.
12 changes: 6 additions & 6 deletions Modules/Align/Align.md
@@ -9,11 +9,11 @@ will need to align them to a reference genome before you can call variants.
Harpy offers several aligners for this purpose:

{.compact}
| aligner | linked-read aware | speed | link |
| :--- | :---: | :---:| :---: |
| [BWA](bwa.md) | no ❌ | fast ⚡ | [repo](https://github.com/lh3/bwa), [paper](http://arxiv.org/abs/1303.3997) |
| [EMA](ema.md) | yes ✅ | slow 🐢 |[repo](https://github.com/arshajii/ema), [paper](https://www.biorxiv.org/content/early/2017/11/16/220236) |
| [Minimap2](minimap.md) | no ❌ | fast ⚡ | [repo](https://github.com/lh3/minimap2) [paper](https://doi.org/10.1093/bioinformatics/btab705) |
| aligner | linked-read aware | speed | repository | publication |
| :--- | :---: | :---:| :---: | :---:|
| [BWA](bwa.md) | no ❌ | fast ⚡ | [github](https://github.com/lh3/bwa) | [paper](http://arxiv.org/abs/1303.3997) |
| [EMA](ema.md) | yes ✅ | slow 🐢 |[github](https://github.com/arshajii/ema) | [preprint](https://www.biorxiv.org/content/early/2017/11/16/220236) |
| [strobealign](strobe.md) | no ❌ | super fast ⚡ | [github](https://github.com/ksahlin/strobealign) | [paper](https://doi.org/10.1186/s13059-022-02831-7) |

Despite the fact that EMA is the only barcode-aware aligner offered, when using BWA or Minimap2, Harpy retains the barcode information from the sequence headers and will
Despite the fact that EMA is the only barcode-aware aligner offered, when using BWA or strobealign, Harpy retains the barcode information from the sequence headers and will
assign molecule identifiers (`MI:i` SAM tags) based on these barcodes and the [molecule distance threshold](../../haplotagdata.md/#barcode-thresholds).
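The molecule-assignment step described above can be illustrated with a minimal Python sketch. It is not Harpy's implementation. It assumes that alignments sharing a BX barcode on the same contig belong to one molecule until the gap between the end of that molecule and the start of the next alignment exceeds the distance threshold, at which point a new `MI` value is started. The `Alignment` class and `assign_molecules` function are hypothetical names, and Harpy's real handling of invalid barcodes, unmapped reads, or read pairs may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Alignment:
    # Hypothetical, simplified alignment record (not a pysam/SAM object)
    contig: str
    start: int    # 0-based leftmost aligned position
    end: int      # exclusive end position
    barcode: str  # value of the BX:Z tag


def assign_molecules(
    alns: List[Alignment], threshold: int
) -> Tuple[List[Tuple[int, Alignment]], Dict[int, Tuple[str, str, int, int]]]:
    """Assign integer molecule IDs (analogous to MI:i) to alignments.

    Returns (MI, alignment) pairs and, for each MI, its
    (barcode, contig, start, end) span.
    """
    # Group alignments by barcode and contig, in positional order
    ordered = sorted(alns, key=lambda a: (a.barcode, a.contig, a.start))
    tagged: List[Tuple[int, Alignment]] = []
    spans: Dict[int, Tuple[str, str, int, int]] = {}
    mi = 0
    for aln in ordered:
        same_molecule = False
        if mi in spans:
            bc, ctg, start, end = spans[mi]
            # Same barcode, same contig, and within the distance threshold
            same_molecule = (
                aln.barcode == bc
                and aln.contig == ctg
                and aln.start - end <= threshold
            )
        if same_molecule:
            # Extend the current molecule's span
            spans[mi] = (bc, ctg, start, max(end, aln.end))
        else:
            mi += 1  # start a new molecule
            spans[mi] = (aln.barcode, aln.contig, aln.start, aln.end)
        tagged.append((mi, aln))
    return tagged, spans


if __name__ == "__main__":
    bc = "A01C02B03D04"
    alns = [
        Alignment("chr1", 100, 250, bc),
        Alignment("chr1", 5000, 5150, bc),
        Alignment("chr1", 200000, 200150, bc),  # far beyond threshold: new molecule
        Alignment("chr1", 300, 450, "A05C06B07D08"),
    ]
    tagged, spans = assign_molecules(alns, threshold=50000)
    for mi, (b, ctg, s, e) in spans.items():
        print(mi, b, ctg, s, e, "size:", e - s)
```

The span of each molecule (end minus start) is one plausible way to derive the inferred molecule sizes mentioned in the alignment reports; a smaller threshold splits the same barcode into more, shorter molecules.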
16 changes: 9 additions & 7 deletions Modules/Align/bwa.md
@@ -10,6 +10,9 @@ order: 5
- at least 4 cores/threads available
- a genome assembly in FASTA format: [!badge variant="success" text=".fasta"] [!badge variant="success" text=".fa"] [!badge variant="success" text=".fasta.gz"] [!badge variant="success" text=".fa.gz"]
- paired-end fastq sequence file with the [proper naming convention](/haplotagdata/#naming-conventions) [!badge variant="secondary" text="gzipped recommended"]
- **forward**: [!badge variant="success" text="_F"] [!badge variant="success" text=".F"] [!badge variant="success" text=".1"] [!badge variant="success" text="_1"] [!badge variant="success" text="_R1_001"] [!badge variant="success" text=".R1_001"] [!badge variant="success" text="_R1"] [!badge variant="success" text=".R1"]
- **reverse**: [!badge variant="success" text="_R"] [!badge variant="success" text=".R"] [!badge variant="success" text=".2"] [!badge variant="success" text="_2"] [!badge variant="success" text="_R2_001"] [!badge variant="success" text=".R2_001"] [!badge variant="success" text="_R2"] [!badge variant="success" text=".R2"]
- **fastq extension**: [!badge variant="success" text=".fq"] [!badge variant="success" text=".fastq"] [!badge variant="success" text=".FQ"] [!badge variant="success" text=".FASTQ"]
===
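To make the naming convention above concrete, here is a small, hypothetical Python helper (not part of Harpy) that classifies a FASTQ filename as forward or reverse using only the suffixes and extensions listed in the requirements, optionally followed by `.gz`. It tries longer suffixes such as `_R1_001` before shorter ones such as `_1`; that ordering is a choice made for this sketch, and Harpy's own matching logic may differ.

```python
from typing import Optional, Tuple

# Suffixes and extensions taken from the requirements list;
# longer suffixes come first so "_R1_001" is tried before "_1".
FORWARD = ("_R1_001", ".R1_001", "_R1", ".R1", "_F", ".F", "_1", ".1")
REVERSE = ("_R2_001", ".R2_001", "_R2", ".R2", "_R", ".R", "_2", ".2")
EXTENSIONS = (".fastq", ".fq", ".FASTQ", ".FQ")


def classify_fastq(filename: str) -> Optional[Tuple[str, str]]:
    """Return (sample_name, "forward" | "reverse"), or None if the name does not match."""
    # Allow an optional gzip extension (gzipped input is recommended)
    name = filename[:-3] if filename.endswith(".gz") else filename
    for ext in EXTENSIONS:
        if name.endswith(ext):
            stem = name[: -len(ext)]
            break
    else:
        return None  # not a recognized FASTQ extension
    for direction, suffixes in (("forward", FORWARD), ("reverse", REVERSE)):
        for suffix in suffixes:
            if stem.endswith(suffix):
                return stem[: -len(suffix)], direction
    return None  # no forward/reverse suffix found
```

With this helper, `sample1_R1_001.fastq.gz` resolves to the sample name `sample1` in the forward direction, while a name that matches none of the listed suffixes returns `None`.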

Once sequences have been trimmed and passed through other QC filters, they will need to
@@ -123,6 +126,7 @@ Align/bwa
│ ├── sample1.markdup.log
│ │── sample1.sort.log
└── reports
├── barcodes.summary.html
├── bwa.stats.html
├── Sample1.html
└── data
@@ -140,6 +144,7 @@ Align/bwa
| `logs/*markdup.log` | stats provided by `samtools markdup` |
| `logs/*sort.log` | output of `samtools sort` |
| `reports/` | various counts/statistics/reports relating to sequence alignment |
| `reports/barcodes.summary.html` | interactive html report summarizing barcode-specific metrics across all samples |
| `reports/bwa.stats.html` | report summarizing `samtools flagstat and stats` results across all samples from `multiqc` |
| `reports/Sample1.html` | interactive html report summarizing BX tag metrics and alignment coverage |
| `reports/data/coverage/*.cov.gz` | output from samtools cov, used for plots |
@@ -173,16 +178,13 @@ These are taken directly from the [BWA documentation](https://bio-bwa.sourceforg
+++ :icon-graph: reports
These are the summary reports Harpy generates for this workflow. You may right-click
the images and open them in a new tab if you wish to see the examples in better detail.
||| Depth and coverage
Reports the depth of alignments in 10kb windows.
||| Alignment BX Information
An aggregate report of barcode-specific alignment information for all samples.
![reports/coverage/*.html](/static/report_align_coverage.png)
||| BX validation
Reports the number of valid/invalid barcodes in the alignments.
![reports/reads.bxstats.html](/static/report_align_bxstats.png)
||| Molecule size
||| Molecule size and Coverage
Reports the inferred molecule sized based on barcodes in the alignments.
![reports/BXstats/*.bxstats.html](/static/report_align_bxmol.png)
||| Alignment stats
||| Samtools Alignment stats
Reports the general statistics computed by samtools `stats` and `flagstat`
![reports/samtools_*stat/*html](/static/report_align_flagstat.png)
|||
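The depth-and-coverage report summarizes alignment depth in 10 kb windows. As a rough illustration, assuming per-base depths are already available (for example, parsed from `samtools depth` output) and that a window's value is its mean per-base depth, the windowing could look like the Python sketch below. The function name and input format are hypothetical and not taken from Harpy.

```python
from typing import Dict, List, Tuple


def windowed_depth(
    depths: Dict[int, int], contig_length: int, window: int = 10_000
) -> List[Tuple[int, int, float]]:
    """Mean depth per window for one contig.

    `depths` maps 1-based positions to read depth; absent positions count as 0.
    Returns (start, end, mean_depth) using 0-based, half-open window bounds.
    """
    windows = []
    for start in range(0, contig_length, window):
        end = min(start + window, contig_length)  # last window may be shorter
        # 0-based [start, end) corresponds to 1-based positions start+1 .. end
        total = sum(depths.get(pos, 0) for pos in range(start + 1, end + 1))
        windows.append((start, end, total / (end - start)))
    return windows


if __name__ == "__main__":
    # Toy contig: depth 2 over the first 10 kb, 0 over the next 10 kb, 4 over the last 5 kb
    toy = {pos: 2 for pos in range(1, 10_001)}
    toy.update({pos: 4 for pos in range(20_001, 25_001)})
    for row in windowed_depth(toy, contig_length=25_000):
        print(row)
```

Dividing by the actual window length keeps the shorter final window comparable to the full-size ones.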
18 changes: 9 additions & 9 deletions Modules/Align/ema.md
@@ -10,6 +10,9 @@ order: 5
- at least 4 cores/threads available
- a genome assembly in FASTA format: [!badge variant="success" text=".fasta"] [!badge variant="success" text=".fa"] [!badge variant="success" text=".fasta.gz"] [!badge variant="success" text=".fa.gz"]
- paired-end fastq sequence file with the [proper naming convention](/haplotagdata/#naming-conventions) [!badge variant="secondary" text="gzipped recommended"]
- **forward**: [!badge variant="success" text="_F"] [!badge variant="success" text=".F"] [!badge variant="success" text=".1"] [!badge variant="success" text="_1"] [!badge variant="success" text="_R1_001"] [!badge variant="success" text=".R1_001"] [!badge variant="success" text="_R1"] [!badge variant="success" text=".R1"]
- **reverse**: [!badge variant="success" text="_R"] [!badge variant="success" text=".R"] [!badge variant="success" text=".2"] [!badge variant="success" text="_2"] [!badge variant="success" text="_R2_001"] [!badge variant="success" text=".R2_001"] [!badge variant="success" text="_R2"] [!badge variant="success" text=".R2"]
- **fastq extension**: [!badge variant="success" text=".fq"] [!badge variant="success" text=".fastq"] [!badge variant="success" text=".FQ"] [!badge variant="success" text=".FASTQ"]
- patience because EMA is [!badge variant="warning" text="slow"]
==- Why EMA?
The original haplotag manuscript uses BWA to map reads. The authors have since recommended
@@ -144,8 +147,8 @@ Align/ema
│ └── preproc
│    └── Sample1.preproc.log
└── reports
├── barcodes.summary.html
├── ema.stats.html
├── reads.bxcounts.html
├── Sample1.html
└── data
   ├── bxstats
@@ -162,7 +165,7 @@ Align/ema
| `logs/preproc/*.preproc.log` | everything `ema preproc` writes to `stderr` during operation |
| `reports/` | various counts/statistics/reports relating to sequence alignment |
| `reports/ema.stats.html` | report summarizing `samtools flagstat and stats` results across all samples from `multiqc` |
| `reports/reads.bxcounts.html` | interactive html report summarizing `ema count` across all samples |
| `reports/barcodes.summary.html` | interactive html report summarizing barcode-specific metrics across all samples |
| `reports/Sample1.html` | interactive html report summarizing BX tag metrics and alignment coverage |
| `reports/data/coverage/*.cov.gz` | output from samtools cov, used for plots |
| `reports/data/bxstats` | tabular data containing the information used to generate the BX stats in reports |
@@ -184,16 +187,13 @@ These are taken directly from the [EMA documentation](https://github.com/arshaji
These are the summary reports Harpy generates for this workflow. You may right-click
the images and open them in a new tab if you wish to see the examples in better detail.

||| Depth and coverage
Reports the depth of alignments in 10kb windows.
||| Alignment BX Information
An aggregate report of barcode-specific alignment information for all samples.
![reports/coverage/*.html](/static/report_align_coverage.png)
||| BX validation
Reports the number of valid/invalid barcodes in the alignments.
![reports/reads.bxstats.html](/static/report_align_bxstats.png)
||| Molecule size
||| Molecule size and Coverage
Reports the inferred molecule sized based on barcodes in the alignments.
![reports/BXstats/*.bxstats.html](/static/report_align_bxmol.png)
||| Alignment stats
||| Samtools Alignment stats
Reports the general statistics computed by samtools `stats` and `flagstat`
![reports/samtools_*stat/*html](/static/report_align_flagstat.png)
|||
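The BX-validation report counts alignments carrying valid versus invalid barcodes. A minimal Python sketch of such a tally follows. It assumes haplotag-style `BX` values of the form `A##C##B##D##` and treats a barcode as invalid when any segment is `00` (an unresolved segment) or when it does not match that pattern at all; alignments without a `BX` tag are counted separately. These rules and the function names are assumptions for illustration, not Harpy's code.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed haplotag barcode layout: segments A, C, B, D, each a two-digit number
HAPLOTAG = re.compile(r"^A(\d{2})C(\d{2})B(\d{2})D(\d{2})$")


def is_valid_bx(bx: str) -> bool:
    """Valid if it matches the assumed layout and no segment is "00"."""
    m = HAPLOTAG.match(bx)
    return m is not None and "00" not in m.groups()


def tally_bx(tags: Iterable[Optional[str]]) -> Counter:
    """Count valid, invalid, and missing BX tags (None means no tag)."""
    counts: Counter = Counter()
    for bx in tags:
        if bx is None:
            counts["missing"] += 1
        elif is_valid_bx(bx):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


if __name__ == "__main__":
    print(tally_bx(["A01C02B03D04", "A00C02B03D04", None, "A12C34B56D78"]))
```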