Merge pull request #67 from pdimens/docs_dev

make all the mermaids nicer
pdimens · Apr 3, 2024 · b80b141
2 parents c5be001 + b0e060d

Showing 11 changed files with 96 additions and 39 deletions.
diff --git a/Modules/Align/bwa.md b/Modules/Align/bwa.md
@@ -89,20 +89,25 @@ are not used to inform mapping. The `-m` threshold is used for alignment molecul
 
 ```mermaid
 graph LR
-    Z([trimmed reads]) --> B
     A([index genome]) --> B([align to genome])
     B-->C([sort alignments])
     C-->D([mark duplicates])
     D-->E([assign molecules])
     E-->F([alignment metrics])
     D-->G([barcode stats])
     G-->F
+    subgraph aln [Inputs]
+        Z[FASTQ files]---genome
+    end
+    aln-->B & A
     subgraph markdp [mark duplicates via `samtools`]
         direction LR
         collate-->fixmate
         fixmate-->sort
         sort-->markdup
     end
+    style markdp fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
+    style aln fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
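
The `markdp` subgraph above chains four `samtools` subcommands. As a rough sketch only, the chain could be assembled as a single shell pipeline like this. The helper name `markdup_pipeline`, the file names, and the thread count are hypothetical, and the flags follow the pipeline example in the `samtools markdup` manual rather than whatever options Harpy actually passes.

```python
import shlex


def markdup_pipeline(in_bam: str, out_bam: str, threads: int = 1) -> str:
    """Build a shell pipeline string: collate -> fixmate -> sort -> markdup.

    Hypothetical helper for illustration; it only builds the command text
    and does not run samtools.
    """
    t = str(threads)
    stages = [
        # Group reads by name so mates sit next to each other
        ["samtools", "collate", "-@", t, "-O", "-u", in_bam],
        # Add the mate score tags (-m) that markdup relies on
        ["samtools", "fixmate", "-@", t, "-m", "-u", "-", "-"],
        # Sort by coordinate, which markdup expects as input
        ["samtools", "sort", "-@", t, "-u", "-"],
        # Flag duplicate reads and write the final BAM
        ["samtools", "markdup", "-@", t, "-", out_bam],
    ]
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(markdup_pipeline("Sample1.bam", "Sample1.markdup.bam", threads=4))
```

Building the command as a list per stage keeps each step of the diagram visible and leaves quoting to `shlex.join`.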
 +++ :icon-file-directory: BWA output
 The default output directory is `Align/bwa` with the folder structure below. `Sample1` is a generic sample name for demonstration purposes.

diff --git a/Modules/Align/ema.md b/Modules/Align/ema.md
@@ -103,6 +103,10 @@ within alignments, but the BWA alignments need duplicates marked manually using
 
 ```mermaid
 graph LR
+    subgraph Inputs
+        trm[FASTQ files]---geno[genome]
+    end
+    Inputs-->A & IDX
     A([EMA count]) --> B([EMA preprocess])
     B-->C([EMA align barcoded])
     C-->D([sort BX alignments])
@@ -119,6 +123,8 @@ graph LR
         fixmate-->sort
         sort-->markdup
     end
+    style markdp fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 
 ```
 +++ :icon-file-directory: EMA output

diff --git a/Modules/Align/minimap.md b/Modules/Align/minimap.md
@@ -91,20 +91,25 @@ are not used to inform mapping. The `-m` threshold is used for alignment molecul
 
 ```mermaid
 graph LR
-    Z([trimmed reads]) --> B
     A([index genome]) --> B([align to genome])
     B-->C([sort alignments])
     C-->D([mark duplicates])
     D-->E([assign molecules])
     E-->F([alignment metrics])
     D-->G([barcode stats])
     G-->F
+    subgraph aln [Inputs]
+        Z[FASTQ files]---genome[genome]
+    end
+    aln-->B & A
     subgraph markdp [mark duplicates via `samtools`]
         direction LR
         collate-->fixmate
         fixmate-->sort
         sort-->markdup
     end
+    style markdp fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
+    style aln fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 +++ :icon-file-directory: minimap2 output
 The default output directory is `Align/minimap` with the folder structure below. `Sample1` is a generic sample name for demonstration purposes.

diff --git a/Modules/SV/leviathan.md b/Modules/SV/leviathan.md
@@ -97,17 +97,18 @@ in the alignments, then it calls variants using Leviathan.
 
 ```mermaid
 graph LR
-    subgraph Population calling
-    popsplit([merge by population])
+    subgraph id1 [Population calling]
+        bams2[BAM alignments] --> popsplit([merge by population])
     end
-    subgraph Individual calling
-    bams([individual alignments])
+    subgraph id2 [Individual calling]
+        bams[BAM alignments]
     end
-    popsplit-->A
-    bams-->A
+    id1 & id2-->A
     A([index barcodes]) --> B([leviathan])
     B-->C([convert to BCF])
     C-->E([generate reports])
+    style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
+    style id2 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 +++ :icon-file-directory: leviathan output
 The default output directory is `SV/leviathan` with the folder structure below. `sample1` and `sample2` are generic sample names for demonstration purposes.

diff --git a/Modules/SV/naibr.md b/Modules/SV/naibr.md
@@ -111,12 +111,15 @@ a phased VCF file than using alignments that were phased when mapped with EMA. T
 circuitous (see the workflow diagram), but the results were noticeably better.
 
 ```mermaid
+---
+title: Calling variants with NAIBR, starting with unphased alignments
+---
 graph LR
-    aln([alignments])-->|harpy snp|snps([SNPs])
+    aln[alignments]-->|harpy snp|snps([SNPs])
     snps-->|bcftools filter -i 'QUAL>95' ...|filt([filtered SNPs])
     filt-->|harpy phase|phasesnp([phased haplotypes])
-    phasesnp-->|whatshap haplotag|aln
-    aln-->|NAIBR|results((structural variants))
+    phasesnp-->|whatshap haplotag|aln2
+    aln2([phased alignments])-->|NAIBR|results((structural variants))
 ```
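
To make the `bcftools filter -i 'QUAL>95' ...` step in the diagram concrete, here is a minimal sketch of what the `QUAL>95` part of that expression selects. It is plain Python over VCF text lines, not a replacement for `bcftools`; the function name and the example records are hypothetical, the options elided by `...` are not reproduced, and dropping records with a missing `QUAL` is an assumption of this sketch.

```python
def filter_by_qual(vcf_lines, min_qual=95.0):
    """Keep header lines and records whose QUAL is strictly above min_qual."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and column-name lines pass through untouched
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth VCF column
        # Assumption: records with missing QUAL (".") are dropped
        if qual != "." and float(qual) > min_qual:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical records for illustration only
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "contig1\t100\t.\tA\tG\t99.5\t.\t.",
        "contig1\t250\t.\tC\tT\t95\t.\t.",
        "contig2\t75\t.\tG\tA\t.\t.\t.",
    ]
    for record in filter_by_qual(example):
        print(record)
```

Note that the comparison is strict, so a record with `QUAL` of exactly 95 is filtered out.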
 
 ----
@@ -131,19 +134,22 @@ This fork includes improved accuracy as well as quality-of-life updates.
 
 ```mermaid
 graph LR
-    subgraph Phase
-    aln([unphased alignments])-->phased([phased alignments])
+    subgraph id1 ["Phase"]
+    aln[unphased BAM alignments]-->phased([phased alignments])
     end
-    subgraph Population calling
-    phased-->popsplit([merge by population])
+    subgraph id2 ["Population calling"]
+    popsplit([merge by population])
     end
+    id1-->id2
     popsplit-->A
-    phased-->A
+    id1-->A
     A([index alignments]) --> B([NAIBR])
     Z([create config file]) --> B
     popsplit --> Z
-    phased --> Z
+    id1 --> Z
     B-->C([generate reports])
+    style id2 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
+    style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 +++ :icon-file-directory: naibr output
 The default output directory is `SV/naibr` with the folder structure below. `sample1` and `sample2` are generic sample 

diff --git a/Modules/Simulate/simulate-variants.md b/Modules/Simulate/simulate-variants.md
@@ -56,6 +56,10 @@ specific variants to simulate. There are also these unifying options among the d
 
 ==- snps and indels
 ### snpindel
+!!!warning SNPs can be slow
+Given software limitations, simulating many SNPs (>10,000) will be noticeably slower than the other variant types.
+!!!
+
 A single nucleotide polymorphism ("SNP") is a genomic variant at a single base position in the DNA ([source](https://www.genome.gov/genetics-glossary/Single-Nucleotide-Polymorphisms)).
 An indel is a type of mutation that involves the addition or deletion of one or more nucleotides in a segment of DNA ([insertions](https://www.genome.gov/genetics-glossary/Insertion), [deletions](https://www.genome.gov/genetics-glossary/Deletion)).
 The snp and indel variants are combined in this module because `simuG` allows simulating them together. The
@@ -80,10 +84,6 @@ the value to either `9999` or `0` :
 | `--indel-size-constant` | `-l` | float | 0.5 | Exponent constant for power-law-fitted indel size distribution |
 | `--snp-gene-constraints` | `-y` | string | | How to constrain randomly simulated SNPs {`noncoding`,`coding`,`2d`,`4d`} when using `--genes`|
 
-!!!warning SNPs can be slow
-Given software limitations, simulating many SNPs (>10,000) will be noticeably slower than the other variant types.
-!!!
-
 ==- inversions
 ### inversion
 Inversions are when a section of a chromosome appears in the reverse orientation ([source](https://www.genome.gov/genetics-glossary/Inversion)).
@@ -224,13 +224,11 @@ into homozygotes and heterozygotes, onto the original haploid genome, creating t
 genomes. 
 ```mermaid
 graph LR
-    hap1(inversion.hap1.vcf)-->|simulate inversion -v|geno
-    geno(haploid genome)-->genohap1(haplotype-1 genome)
-```
-```mermaid
-graph LR
-    hap1(inversion.hap2.vcf)-->|simulate inversion -v|geno
-    geno(haploid genome)-->genohap1(haplotype-2 genome)
+    subgraph id1 ["Inputs"]
+    hap1(inversion.hap1.vcf)---geno(haploid genome)
+    end
+    id1-->|simulate inversion -v|hapgeno(haplotype-1 genome)
+    style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 #### Step 3
 Use one of the new genome haplotypes for simulating other kinds of variants.
@@ -245,17 +243,22 @@ graph LR
 ```
 #### Step 4
 Use the resulting haplotype VCFs to simulate known variants onto the **haplotype genomes** from
-Step 2.
-
+[Step 2](#step-2).
 ```mermaid
 graph LR
-    hap1(snpindel.hap1.vcf)-->|simulate snpindel -v|geno
-    geno(haplotype-1 genome)-->genohap1(haplotype-1 genome with new variants)
+    subgraph id1 ["Haplotype 1 inputs"]
+    hap1(snpindel.hap1.vcf)---geno(haplotype-1 genome)
+    end
+    id1-->|simulate snpindel -v|genohap1(haplotype-1 genome with new variants)
+    style id1 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 ```mermaid
 graph LR
-    hap1(snpindel.hap2.vcf)-->|simulate snpindel -v|geno
-    geno(haplotype-2 genome)-->genohap1(haplotype-2 genome with new variants)
+    subgraph id2 ["Haplotype 2 inputs"]
+    hap1(snpindel.hap2.vcf)---geno(haplotype-2 genome)
+    end
+    id2-->|simulate snpindel -v|genohap2(haplotype-2 genome with new variants)
+    style id2 fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 
 #### Step 5

diff --git a/Modules/demultiplex.md b/Modules/demultiplex.md
@@ -71,10 +71,16 @@ individual samples is performed in parallel and using the beloved workhorse `gre
 
 ```mermaid
 graph LR
-    A([multiplexed FASTQ]) --> B([barcodes to headers])
+    subgraph Inputs
+        A[multiplexed FASTQ]
+        BX[Barcode Files]
+        SCH[Sample Schema]
+    end
+    Inputs-->B([barcodes to headers])
     B-->C([demultiplex samples])
     C-->D([quality metrics])
     D-->E([create report])
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 
 +++ :icon-file-directory: demultiplexing output

diff --git a/Modules/impute.md b/Modules/impute.md
@@ -179,11 +179,18 @@ contigs have at least 2 biallelic SNPs, then performs imputation on only those c
 
 ```mermaid
 graph LR
+    subgraph Inputs
+        v[VCF file]---gen[genome]
+        gen---bam[BAM alignments]
+    end
     B([split contigs])-->C([keep biallelic SNPs])
+    Inputs-->B & C & G
     C-->D([convert to STITCH format])
     D-->E([STITCH imputation])
     E-->F([merge output])
     G([create file list])-->E
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
+
 ```
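
The surrounding text says imputation runs only on contigs with at least 2 biallelic SNPs. A minimal sketch of that screening step, assuming each variant is reduced to a `(contig, REF, ALT)` tuple, might look like the following. The function names and example records are hypothetical, and this is not how the workflow itself implements the `keep biallelic SNPs` step.

```python
from collections import Counter


def is_biallelic_snp(ref: str, alt: str) -> bool:
    """True when REF and ALT are each one nucleotide (a single ALT allele)."""
    ref, alt = ref.upper(), alt.upper()
    # A multiallelic ALT such as "A,T" or an indel has length > 1 and fails here
    return len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT"


def contigs_to_impute(records, min_snps=2):
    """Return sorted contig names with at least min_snps biallelic SNPs."""
    counts = Counter(
        contig for contig, ref, alt in records if is_biallelic_snp(ref, alt)
    )
    return sorted(contig for contig, n in counts.items() if n >= min_snps)


if __name__ == "__main__":
    # Hypothetical (contig, REF, ALT) records for illustration only
    example = [
        ("contig1", "A", "G"),
        ("contig1", "C", "T"),
        ("contig2", "G", "A"),
        ("contig2", "T", "A,C"),  # multiallelic, not counted
        ("contig3", "AT", "A"),   # deletion, not counted
    ]
    print(contigs_to_impute(example))
```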
 +++ :icon-file-directory: impute output
 The default output directory is `Impute` with the folder structure below. `contig1` and `contig2` 

diff --git a/Modules/phase.md b/Modules/phase.md
@@ -79,10 +79,12 @@ across all of your samples to speed things along.
 
 ```mermaid
 graph LR
-    A([split samples]) --> B([extractHAIRS])
-    B-->C([LinkFragments])
-    Z([sample alignments]) --> B
-    Z-->C
+    subgraph Inputs
+    Z([sample alignments])---gen["genome (optional)"]
+    end
+    Inputs --> B([extractHAIRS])
+    Inputs--->A([split samples])
+    Inputs-->C([LinkFragments])
     C-->D([phase blocks])
     B-->D
     A-->D
@@ -94,6 +96,7 @@ graph LR
     D-->G
     G-->H([index merged annotations])
     H-->I([merge phased samples])
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 
 +++ :icon-file-directory: phasing output

diff --git a/Modules/qc.md b/Modules/qc.md
@@ -43,9 +43,14 @@ approach (`--cut-right`) to identify low quality bases. The workflow is quite si
 
 ```mermaid
 graph LR
+    subgraph Inputs
+        F[FASTQ files]
+    end
+    Inputs-->A
     A([fastp trim]) --> B([count barcodes])
     A --> C([create reports])
     B --> C
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 
 +++ :icon-file-directory: qc output

diff --git a/Modules/snp.md b/Modules/snp.md
@@ -89,6 +89,10 @@ are used to call variants from alignments.
 
 ```mermaid
 graph LR
+    subgraph Inputs
+        aln[BAM alignments]---gen[genome]
+    end
+    Inputs --> B & A
     A([split contigs]) --> B([bcftools mpileup])
     B-->C([bcftools call])
     C-->D([index BCFs])
@@ -97,6 +101,7 @@ graph LR
     E-->G([normalize variants])
     E-->F([generate reports])
     G-->F
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 
 ### freebayes
@@ -105,12 +110,17 @@ call SNPs and small indels. Like mpileup, this method is ubiquitous in bioinform
 
 ```mermaid
 graph LR
+    subgraph Inputs
+        aln[BAM alignments]---gen[genome]
+    end
+    Inputs --> B & A
     A([split contigs]) --> B([freebayes])
     B-->D([index BCFs])
     D-->E([combine BCFs])
     E-->G([normalize variants])
     E-->F([generate reports])
     G-->F
+    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
 ```
 
 +++ :icon-file-directory: snp output