Merge pull request #67 from pdimens/docs_dev
make all the mermaids nicer
pdimens authored Apr 3, 2024
2 parents c5be001 + b0e060d commit b80b141
Showing 11 changed files with 96 additions and 39 deletions.
7 changes: 6 additions & 1 deletion Modules/Align/bwa.md
@@ -89,20 +89,25 @@ are not used to inform mapping. The `-m` threshold is used for alignment molecul

```mermaid
graph LR
Z([trimmed reads]) --> B
A([index genome]) --> B([align to genome])
B-->C([sort alignments])
C-->D([mark duplicates])
D-->E([assign molecules])
E-->F([alignment metrics])
D-->G([barcode stats])
G-->F
subgraph aln [Inputs]
Z[FASTQ files]---genome
end
aln-->B & A
subgraph markdp [mark duplicates via `samtools`]
direction LR
collate-->fixmate
fixmate-->sort
sort-->markdup
end
style markdp fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
style aln fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
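
As an illustration of the `mark duplicates via samtools` subgraph above, here is a minimal Python sketch that assembles the four stages (`collate`, `fixmate`, `sort`, `markdup`) as a pipe. It is not taken from this commit or from Harpy's own code: the function names, file names, and thread count are hypothetical, and the flags follow the duplicate-marking workflow described in the samtools manual, so the real invocation may differ.

```python
import shlex


def markdup_pipeline(in_bam: str, out_bam: str, threads: int = 1) -> list[list[str]]:
    """Return the samtools stages in the order drawn in the diagram.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    t = str(threads)
    return [
        # Group reads by name so that mates sit next to each other.
        ["samtools", "collate", "-@", t, "-O", "-u", in_bam],
        # Add mate-score tags (-m), which markdup relies on.
        ["samtools", "fixmate", "-@", t, "-m", "-u", "-", "-"],
        # Return to coordinate order before marking duplicates.
        ["samtools", "sort", "-@", t, "-u", "-"],
        # Flag duplicate reads and write the final BAM.
        ["samtools", "markdup", "-@", t, "-", out_bam],
    ]


def as_shell(stages: list[list[str]]) -> str:
    """Join the stages into a single shell pipeline string."""
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    # Sample1 is the same generic sample name used in the docs.
    print(as_shell(markdup_pipeline("Sample1.bam", "Sample1.markdup.bam")))
```

Building the stages as argument lists keeps the order explicit and lets the same data be passed to `subprocess.Popen` or printed for inspection.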
+++ :icon-file-directory: BWA output
The default output directory is `Align/bwa` with the folder structure below. `Sample1` is a generic sample name for demonstration purposes.
6 changes: 6 additions & 0 deletions Modules/Align/ema.md
@@ -103,6 +103,10 @@ within alignments, but the BWA alignments need duplicates marked manually using

```mermaid
graph LR
subgraph Inputs
trm[FASTQ files]---geno[genome]
end
Inputs-->A & IDX
A([EMA count]) --> B([EMA preprocess])
B-->C([EMA align barcoded])
C-->D([sort BX alignments])
@@ -119,6 +123,8 @@ graph LR
fixmate-->sort
sort-->markdup
end
style markdp fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
+++ :icon-file-directory: EMA output
7 changes: 6 additions & 1 deletion Modules/Align/minimap.md
@@ -91,20 +91,25 @@ are not used to inform mapping. The `-m` threshold is used for alignment molecul

```mermaid
graph LR
Z([trimmed reads]) --> B
A([index genome]) --> B([align to genome])
B-->C([sort alignments])
C-->D([mark duplicates])
D-->E([assign molecules])
E-->F([alignment metrics])
D-->G([barcode stats])
G-->F
subgraph aln [Inputs]
Z[FASTQ files]---genome[genome]
end
aln-->B & A
subgraph markdp [mark duplicates via `samtools`]
direction LR
collate-->fixmate
fixmate-->sort
sort-->markdup
end
style markdp fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
style aln fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
+++ :icon-file-directory: minimap2 output
The default output directory is `Align/minimap` with the folder structure below. `Sample1` is a generic sample name for demonstration purposes.
13 changes: 7 additions & 6 deletions Modules/SV/leviathan.md
@@ -97,17 +97,18 @@ in the alignments, then it calls variants using Leviathan.

```mermaid
graph LR
subgraph Population calling
popsplit([merge by population])
subgraph id1 [Population calling]
bams2[BAM alignments] --> popsplit([merge by population])
end
subgraph Individual calling
bams([individual alignments])
subgraph id2 [Individual calling]
bams[BAM alignments]
end
popsplit-->A
bams-->A
id1 & id2-->A
A([index barcodes]) --> B([leviathan])
B-->C([convert to BCF])
C-->E([generate reports])
style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
style id2 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
+++ :icon-file-directory: leviathan output
The default output directory is `SV/leviathan` with the folder structure below. `sample1` and `sample2` are generic sample names for demonstration purposes.
24 changes: 15 additions & 9 deletions Modules/SV/naibr.md
@@ -111,12 +111,15 @@ a phased VCF file than using alignments that were phased when mapped with EMA. T
circuitous (see the workflow diagram), but the results were noticeably better.

```mermaid
---
title: Calling variants with NAIBR, starting with unphased alignments
---
graph LR
aln([alignments])-->|harpy snp|snps([SNPs])
aln[alignments]-->|harpy snp|snps([SNPs])
snps-->|bcftools filter -i 'QUAL>95' ...|filt([filtered SNPs])
filt-->|harpy phase|phasesnp([phased haplotypes])
phasesnp-->|whatshap haplotag|aln
aln-->|NAIBR|results((structural variants))
phasesnp-->|whatshap haplotag|aln2
aln2([phased alignments])-->|NAIBR|results((structural variants))
```
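
To make the filtering edge in the diagram above concrete, the following Python sketch mimics what `bcftools filter -i 'QUAL>95'` keeps: header lines plus records whose QUAL is strictly greater than 95. It is an illustration only, not how Harpy or bcftools is implemented. The function name is hypothetical, and dropping records with a missing QUAL (`.`) is an assumption made here.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float = 95.0) -> Iterator[str]:
    """Yield header lines and records with QUAL strictly above min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        # Assumption: a missing QUAL (".") is treated as failing the filter.
        if qual != "." and float(qual) > min_qual:
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "contig1\t10\t.\tA\tG\t99.5\t.\t.",
        "contig1\t20\t.\tC\tT\t95\t.\t.",
    ]
    for kept in filter_by_qual(example):
        print(kept)
```

The comparison is strict, so a record with QUAL exactly 95 is excluded, matching the `>` in the expression shown on the diagram edge.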

----
@@ -131,19 +134,22 @@ This fork includes improved accuracy as well as quality-of-life updates.

```mermaid
graph LR
subgraph Phase
aln([unphased alignments])-->phased([phased alignments])
subgraph id1 ["Phase"]
aln[unphased BAM alignments]-->phased([phased alignments])
end
subgraph Population calling
phased-->popsplit([merge by population])
subgraph id2 ["Population calling"]
popsplit([merge by population])
end
id1-->id2
popsplit-->A
phased-->A
id1-->A
A([index alignments]) --> B([NAIBR])
Z([create config file]) --> B
popsplit --> Z
phased --> Z
id1 --> Z
B-->C([generate reports])
style id2 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
+++ :icon-file-directory: naibr output
The default output directory is `SV/naibr` with the folder structure below. `sample1` and `sample2` are generic sample
37 changes: 20 additions & 17 deletions Modules/Simulate/simulate-variants.md
@@ -56,6 +56,10 @@ specific variants to simulate. There are also these unifying options among the d

==- snps and indels
### snpindel
!!!warning SNPs can be slow
Given software limitations, simulating many SNPs (>10,000) will be noticeably slower than the other variant types.
!!!

A single nucleotide polymorphism ("SNP") is a genomic variant at a single base position in the DNA ([source](https://www.genome.gov/genetics-glossary/Single-Nucleotide-Polymorphisms)).
An indel is a type of mutation that involves the addition/deletion of one or more nucleotides into a segment of DNA ([insertions](https://www.genome.gov/genetics-glossary/Insertion), [deletions](https://www.genome.gov/genetics-glossary/Deletion)).
The snp and indel variants are combined in this module because `simuG` allows simulating them together. The
@@ -80,10 +84,6 @@ the value to either `9999` or `0` :
| `--indel-size-constant` | `-l` | float | 0.5 | Exponent constant for power-law-fitted indel size distribution |
| `--snp-gene-constraints` | `-y` | string | | How to constrain randomly simulated SNPs {`noncoding`,`coding`,`2d`,`4d`} when using `--genes`|

!!!warning SNPs can be slow
Given software limitations, simulating many SNPs (>10,000) will be noticeably slower than the other variant types.
!!!

==- inversions
### inversion
Inversions are when a section of a chromosome appears in the reverse orientation ([source](https://www.genome.gov/genetics-glossary/Inversion)).
@@ -224,13 +224,11 @@ into homozygotes and heterozygotes, onto the original haploid genome, creating t
genomes.
```mermaid
graph LR
hap1(inversion.hap1.vcf)-->|simulate inversion -v|geno
geno(haploid genome)-->genohap1(haplotype-1 genome)
```
```mermaid
graph LR
hap1(inversion.hap2.vcf)-->|simulate inversion -v|geno
geno(haploid genome)-->genohap1(haplotype-2 genome)
subgraph id1 ["Inputs"]
hap1(inversion.hap1.vcf)---geno(haploid genome)
end
id1-->|simulate inversion -v|hapgeno(haplotype-1 genome)
style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
#### Step 3
Use one of the new genome haplotypes for simulating other kinds of variants.
@@ -245,17 +243,22 @@ graph LR
```
#### Step 4
Use the resulting haplotype VCFs to simulate known variants onto the **haplotype genomes** from
Step 2.

[Step 2](#step-2).
```mermaid
graph LR
hap1(snpindel.hap1.vcf)-->|simulate snpindel -v|geno
geno(haplotype-1 genome)-->genohap1(haplotype-1 genome with new variants)
subgraph id1 ["Haplotype 1 inputs"]
hap1(snpindel.hap1.vcf)---geno(haplotype-1 genome)
end
id1-->|simulate inversion -v|genohap1(haplotype-1 genome with new variants)
style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
```mermaid
graph LR
hap1(snpindel.hap2.vcf)-->|simulate snpindel -v|geno
geno(haplotype-2 genome)-->genohap1(haplotype-2 genome with new variants)
subgraph id2 ["Haplotype 2 inputs"]
hap1(snpindel.hap2.vcf)---geno(haplotype-2 genome)
end
id2-->|simulate inversion -v|genohap2(haplotype-2 genome with new variants)
style id2 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```

#### Step 5
8 changes: 7 additions & 1 deletion Modules/demultiplex.md
@@ -71,10 +71,16 @@ individual samples is performed in parallel and using the beloved workhorse `gre

```mermaid
graph LR
A([multiplexed FASTQ]) --> B([barcodes to headers])
subgraph Inputs
A[multiplexed FASTQ]
BX[Barcode Files]
SCH[Sample Schema]
end
Inputs-->B([barcodes to headers])
B-->C([demultiplex samples])
C-->D([quality metrics])
D-->E([create report])
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```

+++ :icon-file-directory: demultiplexing output
7 changes: 7 additions & 0 deletions Modules/impute.md
@@ -179,11 +179,18 @@ contigs have at least 2 biallelic SNPs, then performs imputation on only those c

```mermaid
graph LR
subgraph Inputs
v[VCF file]---gen[genome]
gen---bam[BAM alignments]
end
B([split contigs])-->C([keep biallelic SNPs])
Inputs-->B & C & G
C-->D([convert to STITCH format])
D-->E([STITCH imputation])
E-->F([merge output])
G([create file list])-->E
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```
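
The contig check mentioned in the hunk context (imputing only contigs with at least 2 biallelic SNPs) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not Harpy's implementation: the function names are hypothetical, and a biallelic SNP is taken to be a record with a single-base REF and exactly one single-base ALT allele.

```python
from collections import Counter
from typing import Iterable

BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(ref: str, alt: str) -> bool:
    """True when REF and ALT are each one nucleotide (no indels, no multiple ALTs)."""
    # A multi-allelic ALT such as "G,T" is not a single base, so it is rejected.
    return ref.upper() in BASES and alt.upper() in BASES


def contigs_to_impute(vcf_lines: Iterable[str], min_snps: int = 2) -> list[str]:
    """Return contigs, in order of first appearance, with at least min_snps biallelic SNPs."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        chrom, _pos, _id, ref, alt = line.split("\t")[:5]
        if is_biallelic_snp(ref, alt):
            counts[chrom] += 1
    return [contig for contig, n in counts.items() if n >= min_snps]


if __name__ == "__main__":
    records = [
        "contig1\t5\t.\tA\tG\t50\t.\t.",
        "contig1\t9\t.\tC\tT\t50\t.\t.",
        "contig2\t3\t.\tG\tA\t50\t.\t.",
    ]
    print(contigs_to_impute(records))
```

Counting per contig in a single pass means the VCF only needs to be read once before deciding which contigs go on to imputation.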
+++ :icon-file-directory: impute output
The default output directory is `Impute` with the folder structure below. `contig1` and `contig2`
11 changes: 7 additions & 4 deletions Modules/phase.md
@@ -79,10 +79,12 @@ across all of your samples to speed things along.

```mermaid
graph LR
A([split samples]) --> B([extractHAIRS])
B-->C([LinkFragments])
Z([sample alignments]) --> B
Z-->C
subgraph Inputs
Z([sample alignments])---gen["genome (optional)"]
end
Inputs --> B([extractHAIRS])
Inputs--->A([split samples])
Inputs-->C([LinkFragments])
C-->D([phase blocks])
B-->D
A-->D
@@ -94,6 +96,7 @@ graph LR
D-->G
G-->H([index merged annotations])
H-->I([merge phased samples])
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```

+++ :icon-file-directory: phasing output
5 changes: 5 additions & 0 deletions Modules/qc.md
@@ -43,9 +43,14 @@ approach (`--cut-right`) to identify low quality bases. The workflow is quite si

```mermaid
graph LR
subgraph Inputs
F[FASTQ files]
end
Inputs-->A
A([fastp trim]) --> B([count barcodes])
A --> C([create reports])
B --> C
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```

+++ :icon-file-directory: qc output
10 changes: 10 additions & 0 deletions Modules/snp.md
@@ -89,6 +89,10 @@ are used to call variants from alignments.

```mermaid
graph LR
subgraph Inputs
aln[BAM alignments]---gen[genome]
end
Inputs --> B & A
A([split contigs]) --> B([bcftools mpileup])
B-->C([bcftools call])
C-->D([index BCFs])
@@ -97,6 +101,7 @@ graph LR
E-->G([normalize variants])
E-->F([generate reports])
G-->F
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```

### freebayes
@@ -105,12 +110,17 @@ call SNPs and small indels. Like mpileup, this method is ubiquitous in bioinform

```mermaid
graph LR
subgraph Inputs
aln[BAM alignments]---gen[genome]
end
Inputs --> B & A
A([split contigs]) --> B([freebayes])
B-->D([index BCFs])
D-->E([combine BCFs])
E-->G([normalize variants])
E-->F([generate reports])
G-->F
style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
```

+++ :icon-file-directory: snp output