From f8c823067421bafd0995b4f46a0a4eeccd9c91dd Mon Sep 17 00:00:00 2001 From: pdimens Date: Mon, 19 Feb 2024 13:57:45 -0500 Subject: [PATCH 1/2] fix text --- Modules/Align/bwa.md | 9 ++++----- Modules/Align/ema.md | 3 +-- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/Modules/Align/bwa.md b/Modules/Align/bwa.md index a9d7c2fed..74e805b12 100644 --- a/Modules/Align/bwa.md +++ b/Modules/Align/bwa.md @@ -81,11 +81,10 @@ if the primary alignment was marked as a duplicate. Duplicates get marked but ** - ignores (but retains) barcode information - fast -The [BWA MEM](https://github.com/lh3/bwa) workflow is substantially simpler and faster than the EMA workflow - and maps all reads against the reference genome, no muss no fuss. Duplicates are marked using - [sambamba](https://lomereiter.github.io/sambamba/). The `BX:Z` tags in the read headers are still added - to the alignment headers, even though barcodes are not used to inform mapping. The `-m` threshold is used - for alignment molecule assignment. +The [BWA MEM](https://github.com/lh3/bwa) workflow is much simpler and faster than the EMA workflow + and maps all reads against the reference genome. Duplicates are marked using `samtools markdup`. + The `BX:Z` tags in the read headers are still added to the alignment headers, even though barcodes + are not used to inform mapping. The `-m` threshold is used for alignment molecule assignment. ```mermaid graph LR diff --git a/Modules/Align/ema.md b/Modules/Align/ema.md index 566a0a418..d85e81079 100644 --- a/Modules/Align/ema.md +++ b/Modules/Align/ema.md @@ -97,8 +97,7 @@ information, the EMA workflow is a bit more complicated under the hood. Reads wi barcodes are aligned using EMA and reads without valid barcodes are separately mapped using BWA before merging all the alignments together again. EMA will mark duplicates within alignments, but the BWA alignments need duplicates marked manually using -[sambamba](https://lomereiter.github.io/sambamba/). 
Thankfully, you shouldn't need -to worry about any of these details. +`samtools markdup`. ```mermaid graph LR From 90c4e71d1be71cbfb462ee8ebed8e5fc3a53e5f9 Mon Sep 17 00:00:00 2001 From: pdimens Date: Mon, 19 Feb 2024 14:04:14 -0500 Subject: [PATCH 2/2] add phrase about workflow dir --- Modules/Align/bwa.md | 7 +++--- Modules/Align/ema.md | 3 ++- Modules/SV/leviathan.md | 1 + Modules/SV/naibr.md | 1 + Modules/demultiplex.md | 1 + Modules/impute.md | 1 + Modules/phase.md | 1 + Modules/qc.md | 1 + Modules/snp.md | 1 + software.md | 52 ++++++++++++++++++++++------------------- 10 files changed, 41 insertions(+), 28 deletions(-) diff --git a/Modules/Align/bwa.md b/Modules/Align/bwa.md index 74e805b12..65a6e3e88 100644 --- a/Modules/Align/bwa.md +++ b/Modules/Align/bwa.md @@ -82,9 +82,9 @@ if the primary alignment was marked as a duplicate. Duplicates get marked but ** - fast The [BWA MEM](https://github.com/lh3/bwa) workflow is much simpler and faster than the EMA workflow - and maps all reads against the reference genome. Duplicates are marked using `samtools markdup`. - The `BX:Z` tags in the read headers are still added to the alignment headers, even though barcodes - are not used to inform mapping. The `-m` threshold is used for alignment molecule assignment. +and maps all reads against the reference genome. Duplicates are marked using `samtools markdup`. +The `BX:Z` tags in the read headers are still added to the alignment headers, even though barcodes +are not used to inform mapping. The `-m` threshold is used for alignment molecule assignment. ```mermaid graph LR @@ -105,6 +105,7 @@ graph LR ``` +++ :icon-file-directory: BWA output The `harpy align` module creates an `Align/bwa` directory with the folder structure below. `Sample1` is a generic sample name for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. 
``` Align/bwa ├── Sample1.bam diff --git a/Modules/Align/ema.md b/Modules/Align/ema.md index d85e81079..ad621ba9b 100644 --- a/Modules/Align/ema.md +++ b/Modules/Align/ema.md @@ -120,7 +120,8 @@ graph LR ``` +++ :icon-file-directory: EMA output -The `harpy align` module creates an `Align/ema` directory with the folder structure below. `Sample1` is a generic sample name for demonstration purposes. +The `harpy align` module creates an `Align/ema` directory with the folder structure below. `Sample1` is a generic sample name for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. ``` Align/ema ├── Sample1.bam diff --git a/Modules/SV/leviathan.md b/Modules/SV/leviathan.md index 531d2d872..c2a8e6548 100644 --- a/Modules/SV/leviathan.md +++ b/Modules/SV/leviathan.md @@ -111,6 +111,7 @@ graph LR ``` +++ :icon-file-directory: leviathan output The `harpy variants --method leviathan` module creates a `Variants/leviathan` (or `leviathan-pop`) directory with the folder structure below. `sample1` and `sample2` are generic sample names for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. ``` Variants/leviathan/ diff --git a/Modules/SV/naibr.md b/Modules/SV/naibr.md index 7269dce58..923ca85fc 100644 --- a/Modules/SV/naibr.md +++ b/Modules/SV/naibr.md @@ -149,6 +149,7 @@ graph LR The `harpy sv --method naibr` module creates a `Variants/naibr` (or `naibr-pop`) directory with the folder structure below. `sample1` and `sample2` are generic sample names for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. 
``` Variants/naibr/ diff --git a/Modules/demultiplex.md b/Modules/demultiplex.md index 769bf74c8..124de2fd6 100644 --- a/Modules/demultiplex.md +++ b/Modules/demultiplex.md @@ -80,6 +80,7 @@ graph LR +++ :icon-file-directory: demultiplexing output The `harpy demultiplex` module creates an `Demultiplex/PREFIX` directory with the folder structure below, where `PREFIX` is the prefix of your input file that Harpy infers by removing the file extension and forward/reverse distinction. `Sample1` and `Sample2` are generic sample names for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. ``` Demultiplex/PREFIX ├── Sample1.F.fq.gz diff --git a/Modules/impute.md b/Modules/impute.md index 845f4fa24..8a6c0b8fb 100644 --- a/Modules/impute.md +++ b/Modules/impute.md @@ -190,6 +190,7 @@ The `harpy impute` module creates an `Imputation` directory with the folder stru are generic contig names from an imaginary `genome.fasta` for demonstration purposes. The directory `model1/` is a generic name to reflect the corresponding parameter row in the stitch parameter file, which would have explicit names in real use (e.g. `modelpseudoHaploid_useBXTrue_k10_s1_nGen50/`). +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. ``` Impute/ diff --git a/Modules/phase.md b/Modules/phase.md index f527ea12f..02afe144b 100644 --- a/Modules/phase.md +++ b/Modules/phase.md @@ -99,6 +99,7 @@ graph LR +++ :icon-file-directory: phasing output The `harpy phase` module creates an `Phase` directory with the folder structure below. `Sample1` is a generic sample name for demonstration purposes. If using the `--ignore-bx` option, the output directory will be named `Phase.noBX` instead. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. 
``` Phase/ diff --git a/Modules/qc.md b/Modules/qc.md index 590b04006..f52f92c6f 100644 --- a/Modules/qc.md +++ b/Modules/qc.md @@ -50,6 +50,7 @@ graph LR +++ :icon-file-directory: qc output The `harpy qc` module creates a `QC` directory with the folder structure below. `Sample1` and `Sample2` are generic sample names for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. ``` QC/ ├── Sample1.R1.fq.gz diff --git a/Modules/snp.md b/Modules/snp.md index 61ef3b236..8a3c1495a 100644 --- a/Modules/snp.md +++ b/Modules/snp.md @@ -117,6 +117,7 @@ graph LR The `harpy snp` module creates a `Variants/METHOD` directory with the folder structure below where `METHOD` is what you specify as the `--method` (mpileup or freebayes). `contig1` and `contig2` are generic contig names from an imaginary `genome.fasta` for demonstration purposes. +The resulting folder also includes a `workflow` directory (not shown) with workflow-relevant runtime files and information. ``` Variants/METHOD ├── variants.normalized.bcf diff --git a/software.md b/software.md index 73c868a9b..3c27cba60 100644 --- a/software.md +++ b/software.md @@ -9,27 +9,31 @@ HARPY is the sum of its parts, and out of tremendous respect for the developers Issues with specific tools might warrant a discussion with the authors/developers on the repositories of these projects. 
-| Software | Website | Publication | -|:-----------|:-------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------| -| bash | [website](https://www.gnu.org/software/bash/) | | -| bcftools | [website](https://samtools.github.io/bcftools/bcftools.html) | | -| bgzip | [website](http://www.htslib.org/doc/bgzip.html) | | -| bwa | [website](https://github.com/lh3/bwa) | [pubication](http://arxiv.org/abs/1303.3997) | -| click | [website](https://github.com/pallets/click) | | -| conda | [website](https://github.com/conda) | | -| EMA | [website](https://github.com/arshajii/ema) | [publication](https://www.biorxiv.org/content/early/2017/11/16/220236) | -| fastp | [website](https://github.com/OpenGene/fastp) | [publication](https://doi.org/10.1093/bioinformatics/bty560) | -| HapCUT2 | [website](https://github.com/vibansal/HapCUT2) | [publication](https://doi.org/10.1101/gr.213462.116) | -| LEVIATHAN | [website](https://github.com/morispi/LEVIATHAN) | [publication](https://doi.org/10.1101/2021.03.25.437002) | -| LRez | [website](https://github.com/morispi/LRez) | [publication](https://academic.oup.com/bioinformaticsadvances/article/1/1/vbab022/6375438?login=false) | -| mamba | [website](https://github.com/mamba-org/mamba) | | -| NAIBR | [website](https://github.com/raphael-group/NAIBR) + [fork](https://github.com/pontushojer/NAIBR) | [publication](https://doi.org/10.1093/bioinformatics/btx712) | -| python | [website](https://www.python.org/) | | -| rich | [webiste](https://github.com/Textualize/rich) | | -| rich-click | [website](https://github.com/ewels/rich-click) | | -| sambamba | [website](https://github.com/biod/sambamba) | [publication](https://doi.org/10.1093/bioinformatics/btv098) | -| samtools | [website](http://www.htslib.org/) | | -| seqtk | [website](https://github.com/lh3/seqtk) | | -| Snakemake | [website](https://github.com/snakemake/snakemake) | 
[publication](https://f1000research.com/articles/10-33/v1) | -| STITCH | [website](https://github.com/rwdavies/STITCH) | [publication](https://doi.org/10.1038%2Fng.3594) | -| whatshap | [website](https://github.com/whatshap/whatshap) | [publication](https://doi.org/10.1101/085050) | \ No newline at end of file +| Software | Links | +|:------------|:--------------------------------------------------------------------------------------------------------------------| +| bash | [website](https://www.gnu.org/software/bash/) | +| bcftools | [website](https://samtools.github.io/bcftools/bcftools.html) | +| bgzip | [website](http://www.htslib.org/doc/bgzip.html) | +| bwa | [website](https://github.com/lh3/bwa), [publication](http://arxiv.org/abs/1303.3997) | +| click | [website](https://github.com/pallets/click) | +| conda | [website](https://github.com/conda) | +| EMA | [website](https://github.com/arshajii/ema), [publication](https://www.biorxiv.org/content/early/2017/11/16/220236) | +| fastp | [website](https://github.com/OpenGene/fastp), [publication](https://doi.org/10.1093/bioinformatics/bty560) | +| HapCUT2 | [website](https://github.com/vibansal/HapCUT2), [publication](https://doi.org/10.1101/gr.213462.116) | +| LEVIATHAN | [website](https://github.com/morispi/LEVIATHAN), [publication](https://doi.org/10.1101/2021.03.25.437002) | +| LRez | [website](https://github.com/morispi/LRez), [publication](https://academic.oup.com/bioinformaticsadvances/article/1/1/vbab022/6375438?login=false) | +| mamba | [website](https://github.com/mamba-org/mamba) | +| NAIBR | [website](https://github.com/raphael-group/NAIBR), [fork](https://github.com/pontushojer/NAIBR), [publication](https://doi.org/10.1093/bioinformatics/btx712) | +| plotly | [website](https://plotly.com/) | +| python | [website](https://www.python.org/) | +| R | [website](https://www.r-project.org/) | +| r-circlize | [website](https://github.com/jokergoo/circlize), 
[publication](https://doi.org/10.1093/bioinformatics/btu393) | +| r-tidyverse | [website](https://www.tidyverse.org/), [publication](https://doi.org/10.21105/joss.01686) | +| r-DT | [website](https://rstudio.github.io/DT/), [js-website](http://datatables.net) | +| rich | [website](https://github.com/Textualize/rich) | +| rich-click | [website](https://github.com/ewels/rich-click) | +| samtools | [website](http://www.htslib.org/) | +| seqtk | [website](https://github.com/lh3/seqtk) | +| Snakemake | [website](https://github.com/snakemake/snakemake), [publication](https://f1000research.com/articles/10-33/v1) | +| STITCH | [website](https://github.com/rwdavies/STITCH), [publication](https://doi.org/10.1038%2Fng.3594) | +| whatshap | [website](https://github.com/whatshap/whatshap), [publication](https://doi.org/10.1101/085050) | \ No newline at end of file