Commit

update to 1.14.1
pdimens committed Dec 11, 2024
1 parent c001d6a commit 44247a9
Showing 1 changed file with 19 additions and 10 deletions.
29 changes: 19 additions & 10 deletions Workflows/other.md
@@ -1,15 +1,13 @@
---
label: Other
icon: file-diff
icon: ellipsis
description: Generate extra files for analysis with Harpy
order: 7
---

# :icon-file-diff: Other Harpy modules
On this page you'll find Harpy functions that aren't standalone workflows. These may create ancillary inputs, continue where you left off,
view important workflow files, etc.
# :icon-ellipsis: Other Harpy modules
On this page you'll find Harpy functions that do other, ancillary things.

## :icon-terminal: Other modules
{.compact}
| module | description |
| :------------- | :------------------------------------------------------------------------------- |
@@ -24,15 +22,26 @@ view important workflow files, etc.
### downsample
While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as `awk`, `samtools`, `seqtk`, `seqkit`, etc.,
Harpy offers the `downsample` module, which allows you to downsample a BAM file (or paired-end FASTQ) _by barcodes_. That means you can
keep all the reads associated with `d` number of barcodes.
keep all the reads associated with `d` number of barcodes. Barcodes are first extracted and subsampled, then the reads associated
with the sampled barcodes are extracted. The `--invalid` option sets the proportion of invalid barcodes included in the barcode
pool that gets subsampled: `0` excludes all invalid barcodes, `1` includes all of them, and a value in between includes that proportion (e.g. `0.5` is half).
Bear in mind that the barcode pool still gets subsampled, so `--invalid` doesn't determine how many invalid barcodes end up
sampled, only what proportion is considered for sampling.
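
The extract, subsample, extract sequence described above can be sketched in Python. This is only an illustration under stated assumptions, not Harpy's actual implementation: the function names are hypothetical, reads are simplified to `(name, barcode)` tuples, and a barcode is assumed to be invalid when any haplotagging segment is `00` (e.g. `A00C02B03D04`).

```python
import random
import re

# Assumption: a barcode is invalid if any segment (A, B, C, or D) is "00".
INVALID_RE = re.compile(r"[ABCD]00")


def build_barcode_pool(barcodes, invalid_proportion, rng):
    """Return the unique barcodes that are eligible for sampling.

    All valid barcodes enter the pool. Only `invalid_proportion` of the
    invalid barcodes (0 = none, 1 = all) are added to it.
    """
    unique = sorted(set(barcodes))
    valid = [bc for bc in unique if not INVALID_RE.search(bc)]
    invalid = [bc for bc in unique if INVALID_RE.search(bc)]
    n_invalid = round(len(invalid) * invalid_proportion)
    return valid + rng.sample(invalid, n_invalid)


def downsample_barcodes(barcodes, d, invalid_proportion=1.0, seed=None):
    """Sample up to `d` barcodes from the pool (hypothetical helper)."""
    rng = random.Random(seed)
    pool = build_barcode_pool(barcodes, invalid_proportion, rng)
    # The pool itself is subsampled, so invalid barcodes in the pool
    # are candidates only; they are not guaranteed to be chosen.
    return set(rng.sample(pool, min(d, len(pool))))


def filter_reads(reads, keep):
    """Keep every read whose barcode was sampled."""
    return [(name, bc) for name, bc in reads if bc in keep]
```

With `invalid_proportion=0`, no invalid barcode can reach the output; with any larger value, invalid barcodes compete with valid ones for the `d` slots.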

!!! Barcode tag
Barcodes must be in the `BX:Z` SAM tag for both BAM and FASTQ inputs. See [Section 1 of the SAM Spec here](https://samtools.github.io/hts-specs/SAMtags.pdf).
!!!
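
To check whether a FASTQ file meets this requirement, the header line of each record can be searched for the `BX:Z` tag. The snippet below is a minimal sketch that assumes the tag sits in the whitespace-separated comment portion of the header (e.g. `@read1 BX:Z:A01C02B03D04`); the function name is hypothetical.

```python
import re

# Match a BX:Z tag at the start of the header or after whitespace.
BX_TAG = re.compile(r"(?:^|\s)BX:Z:(\S+)")


def barcode_from_fastq_header(header):
    """Return the BX:Z barcode in a FASTQ header line, or None if absent."""
    match = BX_TAG.search(header)
    return match.group(1) if match else None
```

A return value of `None` means the read carries no `BX:Z` tag in its header.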

```bash usage
# a BAM file
harpy downsample OPTIONS... INPUT(S)...
```

```bash example
harpy downsample -d 1000 -i drop -b BC -p sample1.sub1000
# BAM file
harpy downsample -d 1000 -i 0.3 -p sample1.sub1000 sample1.bam

# FASTQ file
harpy downsample -d 1000 -i 0 -p sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
```

#### arguments
@@ -41,10 +50,10 @@ harpy downsample -d 1000 -i drop -b BC -p sample1.sub1000
| :-------------- | :--------: | :-----------: | :-------------------------------------------------------------------------------------------------------------------------------- |
| `INPUT(S)` | | | [!badge variant="info" text="required"] One BAM file or both read files from a paired-end FASTQ pair |
| `--downsample` | `-d` | | [!badge variant="info" text="required"] Number of barcodes to downsample to |
| `--invalid` | `-i` | `keep` | Strategy to handle invalid/missing barcodes [`keep`,`drop`] |
| `--bx-tag` | `-b` | `BX` | The header tag with the barcode [!badge variant="secondary" text="alphanumeric"] [!badge variant="secondary" text="2 characters"] |
| `--invalid` | `-i` | `1` | Proportion of invalid barcodes to include in the pool that gets sampled |
| `--prefix` | `-p` | `downsampled` | Prefix for output files |
| `--random-seed` | | | Random seed for sampling [!badge variant="secondary" text="optional"] |
| `--snakemake` | | | Additional Snakemake arguments, in quotes |
| `--threads` | `-t` | `4` | Number of threads to use |
| `--quiet` | | | Don't show output text while running |
