Commit

update
pdimens committed Oct 27, 2023
1 parent 08bd1e5 commit d8d6467
Showing 9 changed files with 106 additions and 103 deletions.
71 changes: 0 additions & 71 deletions Modules/extrafiles.md

This file was deleted.

4 changes: 2 additions & 2 deletions Modules/impute.md
@@ -33,7 +33,7 @@ harpy impute OPTIONS...

```bash example
# create stitch parameter file 'stitch.params'
harpy extra -s stitch.params
harpy stitchparams -o stitch.params

# run imputation
harpy impute --threads 20 --vcf Variants/mpileup/variants.raw.bcf --directory Align/ema --parameters stitch.params
```
@@ -62,7 +62,7 @@ Typically, one runs STITCH multiple times, exploring how results vary with
different model parameters (explained in next section). The solution Harpy uses for this is to have the user
provide a tab-delimited dataframe file where the columns are the 6 STITCH model
parameters and the rows are the values for those parameters. The parameter file
is required and can be created manually or with `harpy extra -s <filename>`.
is required and can be created manually or with `harpy stitchparams -o <filename>`.
If created using Harpy, the resulting file includes largely meaningless values
that you will need to adjust for your study. The parameter file must follow a particular format:
- tab or comma delimited
69 changes: 69 additions & 0 deletions Modules/othermodules.md
@@ -0,0 +1,69 @@
---
label: Other
order: 1
icon: file-diff
description: Generate extra files for analysis with Harpy
---

# :icon-file-diff: Other Harpy modules
Some parts of Harpy (variant calling, imputation) want or need extra files. You can create these files with the modules below,
which can be run in any order or combination to generate the files you need.

## :icon-terminal: Other modules
| module | description |
|:---------------|:---------------------------------------------------------------------------------|
| `popgroup` | Create generic sample-group file using existing sample file names (fq.gz or bam) |
| `stitchparams` | Create template STITCH parameter file |
| `hpc` | Create HPC scheduling profile for cluster submission |

### popgroup
#### Sample grouping file for variant calling
##### arguments
- `-o`, `--output`: name of the output file
- `-d`, `--directory`: name of the directory of input files, either fastq or bam.

This file is entirely optional and useful if you want SNP variant calling to happen on a
per-population level via `harpy snp ... -p` or on samples pooled-as-populations via `harpy sv ... -p`.
- takes the format of sample\<tab\>group
- all the samples will be assigned to group `pop1`, since file names don't always provide grouping information
- so make sure to edit the second column to reflect your data correctly
- after editing, the file might look like:
```less popgroups.txt
sample1 pop1
sample2 pop1
sample3 pop2
sample4 pop1
sample5 pop3
```
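To make the expected layout concrete, here is a minimal bash sketch that builds a comparable template by hand and then reassigns one sample's group. It is not Harpy's implementation: the function name `make_popgroup`, the recognized extensions (`.fq.gz`, `.fastq.gz`, `.bam`), and the stripping of an `_R1`/`_R2` (or `.R1`/`.R2`) read suffix are assumptions made for illustration.

```bash
# Hypothetical sketch, NOT Harpy's code: build a sample<TAB>group template
# from the file names in a directory, assigning every sample to pop1.
make_popgroup() {
    local indir="$1" outfile="$2" f name
    for f in "$indir"/*.fq.gz "$indir"/*.fastq.gz "$indir"/*.bam; do
        [ -e "$f" ] || continue                 # skip patterns that matched nothing
        name=$(basename "$f")
        name=${name%.fq.gz}; name=${name%.fastq.gz}; name=${name%.bam}
        name=${name%[._]R[12]}                  # assumed paired-read suffix: _R1/_R2/.R1/.R2
        printf '%s\n' "$name"
    done | sort -u | while read -r name; do     # paired files collapse to one sample
        printf '%s\tpop1\n' "$name"
    done > "$outfile"
}

# demo with empty placeholder files in a scratch directory
demo=$(mktemp -d)
touch "$demo"/sample1_R1.fq.gz "$demo"/sample1_R2.fq.gz "$demo"/sample2.bam
make_popgroup "$demo" "$demo/popgroups.txt"

# edit the second column: move sample2 into pop2
awk 'BEGIN { FS = OFS = "\t" } $1 == "sample2" { $2 = "pop2" } { print }' \
    "$demo/popgroups.txt" > "$demo/tmp" && mv "$demo/tmp" "$demo/popgroups.txt"
cat "$demo/popgroups.txt"
```

Any method of editing the second column (text editor, spreadsheet, `awk`) works, as long as the result keeps the sample\<tab\>group format.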

### stitchparams
#### STITCH parameter file
##### arguments
- `-o`, `--output`: name of the output file

Typically, one runs STITCH multiple times, exploring how results vary with
different model parameters. The solution Harpy uses for this is to have the user
provide a tab-delimited dataframe file where the columns are the 6 STITCH model
parameters and the rows are the values for those parameters. To make formatting
easier, a template file is generated for you; just replace the values and add/remove
rows as necessary. See the [Imputation section](/Modules/impute.md) for details on these parameters.
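Because a misaligned row is easy to introduce while editing, a quick column count before running `harpy impute` can save a failed run. The bash sketch below is a hypothetical check, not Harpy's own validation: it assumes the first row is a header, that the file is either tab- or comma-delimited (detected from the header), and that every row has exactly the 6 STITCH parameter columns. The `param1` to `param6` header names in the demo are placeholders, not the real column names.

```bash
# Hypothetical check, NOT Harpy's validator: every non-empty row of a STITCH
# parameter file should have exactly 6 fields.
check_stitch_params() {
    local file="$1" delim=','
    # assume tab-delimited if the header row contains a tab, otherwise comma
    if head -n 1 "$file" | grep -q $'\t'; then
        delim=$'\t'
    fi
    # report offending rows on stderr; exit status 1 if any were found
    awk -F "$delim" '
        NF > 0 && NF != 6 {
            printf "%s: line %d has %d columns (expected 6)\n", FILENAME, NR, NF > "/dev/stderr"
            bad = 1
        }
        END { exit bad }' "$file"
}

demo=$(mktemp -d)
printf 'param1\tparam2\tparam3\tparam4\tparam5\tparam6\n1\t2\t3\t4\t5\t6\n' > "$demo/good.params"
printf 'param1,param2,param3,param4,param5,param6\n1,2,3,4,5\n' > "$demo/bad.params"

check_stitch_params "$demo/good.params" && echo "good.params: OK"
check_stitch_params "$demo/bad.params" || echo "bad.params: fix the rows reported above"
```

Detecting the delimiter from the header keeps the check agnostic to the tab- and comma-delimited formats that the imputation docs allow.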

### hpc
#### HPC cluster profile
!!!warning
HPC support is not yet natively integrated into Harpy. Until then, you can manually
use the [Snakemake HPC infrastructure](https://snakemake.readthedocs.io/en/stable/executing/cluster.html) with the `-s` flag.
!!!
##### arguments
- `-o`, `--output`: name of the output file
- `-s`, `--system`: name of the scheduling system
    - options: `slurm` (more to come)

For Snakemake to work in harmony with an HPC scheduler, a "profile" needs to
be provided that tells Snakemake how to interact with the HPC scheduler
to submit your jobs to the cluster. Using `harpy hpc -s <hpc-type>` will create
the necessary folder and profile yaml file for you to use. To use the profile, call
the intended Harpy module with an additional `--snakemake` argument:
```bash use the slurm profile
harpy module --option1 <value1> --option2 <value2> --snakemake "--profile slurm.profile"
```
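As a point of reference, a Snakemake profile is a directory holding a `config.yaml` whose keys are Snakemake's long-form command-line options (`jobs` for `--jobs`, and so on). The bash sketch below writes a minimal SLURM profile by hand using Snakemake 7's generic `--cluster` submission. It is an illustration under those assumptions, not the profile `harpy hpc` generates, and the `sbatch` options and limits are placeholders to adapt to your cluster. Snakemake 8 and later replaced `--cluster` with executor plugins, so the `cluster` key would need to change there.

```bash
# Hypothetical minimal SLURM profile for Snakemake 7 (NOT the output of `harpy hpc`).
# Each YAML key mirrors a Snakemake command-line option, e.g. jobs -> --jobs.
profile_dir="slurm.profile"
mkdir -p "$profile_dir"
cat > "$profile_dir/config.yaml" <<'EOF'
# submit each job with sbatch; Snakemake substitutes {threads} per rule
cluster: "sbatch --cpus-per-task={threads}"
# maximum number of jobs submitted to the scheduler at once
jobs: 50
# seconds to wait for output files to appear on a shared filesystem
latency-wait: 60
# resubmit a failed job this many times before giving up
restart-times: 1
EOF
```

The directory is then passed to a Harpy module the same way as the generated one, via `--snakemake "--profile slurm.profile"`.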
2 changes: 1 addition & 1 deletion commonoptions.md
@@ -8,7 +8,7 @@ order: 4

Every Harpy module has a series of configuration parameters. These are arguments you need to input
to configure the module to run on your data, such as the directory with the reads/alignments,
the genome assembly, etc. All modules (except `extra`) also share a series of common runtime
the genome assembly, etc. All main modules (e.g. `qc`) also share a series of common runtime
parameters that don't impact the results of the module, but instead control the speed/verbosity/etc.
of calling the module. These runtime parameters are listed in the modules' help strings and can be
configured using these arguments:
10 changes: 5 additions & 5 deletions development.md
@@ -14,7 +14,8 @@ development and how to contribute to it, if you were inclined to do so.
Before we get into the technical details, you, dear reader, need to understand
why Harpy is the way it is. Harpy may be a pipeline for other software, but
there is a lot of extra stuff built in to make it user
friendly. Not just friendly, but _compassionate_. That means there is a lot
friendly. Not just friendly, but _compassionate_. The guiding ethos for Harpy is
**"We don't hate the user"**. That means there is a lot
of code that checks input files, runtime details, etc. to exit before
Snakemake takes over. This is done to minimize time wasted on minor
errors that only show their ugly heads 18 hours into a 96 hour process. With that in mind:
@@ -92,10 +93,9 @@ build script is also stored in `misc/meta.yml` and `misc/build.sh`. The yaml fil
is the metadata of the package, including software deps and their versions. The
build script is how conda will install all of Harpy's parts. In order to modify
these files for a new release, you need to fork `bioconda/bioconda-recipes`,
create a new branch, modify the Harpy `meta.yml` and `build.sh` files, then open
a pull request onto the `master` branch of `bioconda/bioconda-recipes`. There is
also an automation that submits a pull request on your behalf when you change the
version number.
create a new branch, then modify the Harpy `meta.yml` (and possibly `build.sh`) files. Bioconda
has a bot that looks for changes to the version number in the `meta.yml` file
and will automatically submit a pull request when it notices that the version has changed.

## The Harpy repository
### structure
47 changes: 25 additions & 22 deletions index.md
@@ -35,16 +35,17 @@ Great! Only want to call variants? Awesome! All modules are called by `harpy <mo

| Module | Description |
|:--------------|:----------------------------------------------|
| `extra` | Create various associated or necessary files |
| `preflight` | Run various format checks for FASTQ and BAM files |
| `demultiplex` | Demultiplex haplotagged FASTQ files |
| `qc` | Remove adapters and quality trim sequences |
| `align` | Align sample sequences to a reference genome |
| `snp` | Call SNPs and small indels |
| `sv` | Call large structural variants |
| `impute` | Impute genotypes using variants and sequences |
| `phase` | Phase SNPs into haplotypes |
| `popgroup` | Create a sample grouping file |
| `stitchparams` | Create a template STITCH parameter file |
| `hpc` | Create a config file to run Harpy on an HPC |

## Using Harpy
You can call `harpy` without any arguments (or with `--help`) to print the docstring to your terminal. You can likewise call any of the modules without arguments or with `--help` to see their usage (e.g. `harpy align --help`).
@@ -56,25 +57,27 @@ You can call `harpy` without any arguments (or with `--help`) to print the docst
```
reads, map sequences, call variants, impute genotypes, and
phase haplotypes of Haplotagging data. Batteries included.
demultiplex >> qc >> align >> snp >> impute >> phase
demultiplex >> qc >> align >> snp >> impute >> phase >> sv
Documentation: https://pdimens.github.io/harpy/
╭─ Options ───────────────────────────────────────────────────╮
│ --version Show the version and exit. │
│ --help -h Show this message and exit. │
╰─────────────────────────────────────────────────────────────╯
╭─ Modules ───────────────────────────────────────────────────╮
│ demultiplex Demultiplex haplotagged FASTQ files │
│ qc Remove adapters and quality trim sequences │
│ align Align sample sequences to a reference genome │
│ snp Call SNPs and small indels │
│ sv Call large structural variants │
│ impute Impute genotypes using variants and sequences │
│ phase Phase SNPs into haplotypes │
╰─────────────────────────────────────────────────────────────╯
╭─ Other Commands ────────────────────────────────────────────╮
│ preflight Run file format checks on haplotag data │
│ extra Create various optional/necessary input files │
╰─────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────╮
│ --version Show the version and exit. │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────╯
╭─ Modules ──────────────────────────────────────────────────╮
│ demultiplex Demultiplex haplotagged FASTQ files │
│ qc Remove adapters and quality trim sequences │
│ align Align sample sequences to a reference genome │
│ snp Call SNPs and small indels │
│ sv Call large structural variants │
│ impute Impute genotypes using variants and sequences │
│ phase Phase SNPs into haplotypes │
╰────────────────────────────────────────────────────────────╯
╭─ Other Commands ───────────────────────────────────────────╮
│ preflight Run file format checks on haplotag data │
│ popgroup Create a sample grouping file │
│ stitchparams Create a template STITCH parameter file │
│ hpc Create a config file to run Harpy on an HPC │
╰────────────────────────────────────────────────────────────╯
```
2 changes: 1 addition & 1 deletion snakemake.md
@@ -7,7 +7,7 @@ order: 2
# :icon-terminal: Adding Snakemake parameters
Harpy relies on Snakemake under the hood to handle file and job dependencies.
Most of these details have been abstracted away from the end-user, but every
module of Harpy (except `extra`) has an optional flag `-s` (`--snakemake`)
module of Harpy (except `hpc`, `popgroup`, and `stitchparams`) has an optional flag `-s` (`--snakemake`)
that you can use to augment the Snakemake workflow if necessary. Whenever you
use this flag, your argument must be enclosed in quotation marks, for example:
```bash
```
4 changes: 3 additions & 1 deletion software.md
@@ -23,11 +23,13 @@ Issues with specific tools might warrant a discussion with the authors/developer
| LEVIATHAN | [website](https://github.com/morispi/LEVIATHAN) | [publication](https://doi.org/10.1101/2021.03.25.437002) |
| LRez | [website](https://github.com/morispi/LRez) | [publication](https://academic.oup.com/bioinformaticsadvances/article/1/1/vbab022/6375438?login=false) |
| mamba | [website](https://github.com/mamba-org/mamba) | |
| NAIBR | [website](https://github.com/raphael-group/NAIBR) + [fork](https://github.com/pontushojer/NAIBR) | [publication](https://doi.org/10.1093/bioinformatics/btx712) |
| python | [website](https://www.python.org/) | |
| rich | [website](https://github.com/Textualize/rich) | |
| rich-click | [website](https://github.com/ewels/rich-click) | |
| sambamba | [website](https://github.com/biod/sambamba) | [publication](https://doi.org/10.1093/bioinformatics/btv098) |
| samtools | [website](http://www.htslib.org/) | |
| seqtk | [website](https://github.com/lh3/seqtk) | |
| Snakemake | [website](https://github.com/snakemake/snakemake) | [publication](https://f1000research.com/articles/10-33/v1) |
| STITCH | [website](https://github.com/rwdavies/STITCH) | [publication](https://doi.org/10.1038%2Fng.3594) |
| whatshap | [website](https://github.com/whatshap/whatshap) | [publication](https://doi.org/10.1101/085050) |
Binary file modified static/errormsg.png