diff --git a/404.html b/404.html
index 1fa5a0df3..0f77cd4b9 100644
--- a/404.html
+++ b/404.html
@@ -4,7 +4,7 @@
-
+
@@ -29,11 +29,11 @@
-
+
-
+
-
+
Every Harpy module has a series of configuration parameters. These are arguments you need to input
to configure the module to run on your data, such as the directory with the reads/alignments,
the genome assembly, etc. All main modules (e.g. qc) also share a series of common runtime
@@ -258,6 +264,22 @@
--print-only
--skipreports
-r
--snakemake
-s
workflow
folder
+ When you run one of the main Harpy modules, the output directory will contain a workflow folder. This folder is
+both necessary for the module to run and very useful for understanding what the module did, be it for your own
+reference or when writing the Methods section of a manuscript. The presence of the folder
+and the contents therein also allows you to rerun the workflow manually. The workflow folder may contain the following:
Align/bwa
├── Sample1.bam
├── Sample1.bam.bai
-├── align
-│ ├── Sample1.bam
-│ └── Sample1.bam.bai
├── logs
-│ ├── harpy.align.log
│ └── markduplicates
│ └── Sample1.markdup.log
-└── stats
+└── reports
├── bwa.stats.html
├── BXstats
│ ├── Sample1.bxstats.html
@@ -442,47 +438,39 @@
sequence alignment indexes for each sample
align/*bam*
logs/harpy.align.log
logs/markduplicates
sambamba markdup
writes to stderr
during operation
stats/
reports/
stats/bwa.stats.html
reports/bwa.stats.html
samtools flagstat and stats
results across all samples from multiqc
stats/reads.bxstats.html
reports/reads.bxstats.html
stats/BXstats/*.bxstats.html
reports/BXstats/*.bxstats.html
stats/coverage/*.html
reports/coverage/*.html
stats/coverage/data/*.gencov.gz
reports/coverage/data/*.gencov.gz
stats/BXstats/
reports/BXstats/
stats/BXstats/data/
reports/BXstats/data/
--platform
-p
haplotag
or 10x
--whitelist
-w
--platform 10x
only)
--directory
-d
--molecule-distance
-m
--ema-bins
-e
Unlike the manual MI:i
assignment in the BWA workflow, the EMA aligner will assign
-a unique Molecular Identifier MI:i
tag to alignments using its own heuristics.
-Instead, the EMA workflow uses this value to calculate statistics for the haplotag
-barcodes identified in the alignments.
Some linked-read methods (e.g. 10x, Tellseq) require the inclusion of a barcode "whitelist." This file is a
+simple text file with one barcode per line, so that the software knows which barcodes to expect in your data.
+If you need to process 10x data, then you will need to include the whitelist file (usually provided by 10x).
+Conveniently, haplotag data doesn't require this file.
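The one-barcode-per-line layout described above is easy to sanity-check before starting an alignment run. The following is a minimal sketch, not part of Harpy: the function name `check_whitelist` is hypothetical, and skipping blank lines and flagging duplicates are assumptions about what a well-formed whitelist looks like.

```python
from collections import Counter


def check_whitelist(lines):
    """Return (barcodes, problems) for an iterable of whitelist lines.

    Assumption: a valid line holds exactly one whitespace-free barcode.
    """
    barcodes, problems = [], []
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # blank lines are tolerated (assumption)
        if len(entry.split()) != 1:
            problems.append(f"line {number}: expected one barcode, found {entry!r}")
            continue
        barcodes.append(entry)
    # Report barcodes that appear more than once
    for barcode, count in Counter(barcodes).items():
        if count > 1:
            problems.append(f"duplicate barcode: {barcode}")
    return barcodes, problems


if __name__ == "__main__":
    # Made-up barcodes for demonstration only
    demo = ["AAACCCAAGAAACACT\n", "AAACCCAAGAAACCAT\n", "AAACCC AAGAAACCCA\n"]
    found, issues = check_whitelist(demo)
    print(len(found), "barcodes;", len(issues), "problem(s)")
```

In practice you would pass an open file handle, for example `check_whitelist(open("whitelist.txt"))`, where `whitelist.txt` stands in for whatever file 10x provided.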
Align/ema
├── Sample1.bam
├── Sample1.bam.bai
-├── align
-│ ├── Sample1.bam
-│ └── Sample1.bam.bai
├── count
│ └── Sample1.ema-ncnt
├── logs
-│ ├── harpy.align.log
│ ├── markduplicates
│ │ └── Sample1.markdup.nobarcode.log
│ └── preproc
│ └── Sample1.preproc.log
-└── stats
+└── reports
├── ema.stats.html
├── reads.bxcounts.html
├── BXstats
@@ -467,18 +471,10 @@
sequence alignment indexes for each sample
align/*bam*
count/
ema count
logs/harpy.align.log
logs/markduplicates/
sambamba markdup
writes to stderr
during operation on alignments with invalid/missing barcodes
ema preproc
writes to stderr
during operation
stats/
reports/
stats/ema.stats.html
reports/ema.stats.html
samtools flagstat and stats
results across all samples from multiqc
stats/reads.bxstats.html
reports/reads.bxstats.html
ema count
across all samples
stats/coverage/*.html
reports/coverage/*.html
stats/coverage/data/*.all.gencov.gz
reports/coverage/data/*.all.gencov.gz
stats/coverage/data/*.bx.gencov.gz
reports/coverage/data/*.bx.gencov.gz
stats/BXstats/
reports/BXstats/
stats/BXstats/*.bxstats.html
reports/BXstats/*.bxstats.html
stats/BXstats/data/
reports/BXstats/data/
--vcf-samples
--directory
-d
--extra-params
-x
--vcf-samples
--parameters
-p
You may add additional parameters to STITCH by way of the
+--extra-params (or -x) option. Since STITCH is a function in the R language, the parameters you add must be in R
+syntax (e.g. regionStart=0, populations=c("GBA","CUE")). The argument should be wrapped in quotes (as in other Harpy modules).
+However, if your additional parameters themselves contain quotes (like the previous example), wrap the -x argument
+in single quotes. Otherwise, the format should take the form of "arg1=value, arg2=value2". Example:
harpy impute -v file.vcf -p stitch.params -t 15 -x 'regionStart=20, regionEnd=500'
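If you assemble the command from a script, the quoting rule above can be delegated to Python's standard `shlex` module, which wraps an argument in single quotes whenever it contains spaces or double quotes. This is a minimal sketch under that assumption; the helper names `stitch_extra_params` and `impute_command` are hypothetical and not part of Harpy, and the flags are only those shown in the example above.

```python
import shlex


def stitch_extra_params(params):
    """Join R-style name=value pairs in the "arg1=value, arg2=value2" form."""
    return ", ".join(f"{name}={value}" for name, value in params.items())


def impute_command(vcf, parameter_file, threads, params):
    """Build the harpy impute command line as a single shell-safe string."""
    args = [
        "harpy", "impute",
        "-v", vcf,
        "-p", parameter_file,
        "-t", str(threads),
        "-x", stitch_extra_params(params),
    ]
    # shlex.join (Python 3.8+) quotes each argument only when it needs quoting
    return shlex.join(args)


if __name__ == "__main__":
    print(impute_command("file.vcf", "stitch.params", 15,
                         {"regionStart": 20, "regionEnd": 500}))
    # A value that itself contains double quotes ends up single-quoted
    print(impute_command("file.vcf", "stitch.params", 15,
                         {"populations": 'c("GBA","CUE")'}))
```

Values are inserted verbatim, so anything that must be an R string or vector has to be written in R syntax by the caller, as in the `populations` example.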
+linkFragments
prints to stderr
logs/harpy.phase.log
reports/blocks.summary.gz
fastp
creates
logs/harpy.trim.log
logs/err
stderr
when running
#
for Harpy to ignore them
harpy extra -p <samplefolder>
or manually
harpy extra -p
, all the samples will be assigned to group pop1
, so make sure to edit the second column to reflect your data correctly.
harpy extra popgroup -d <samplefolder>
or manually
harpy extra popgroup
, all the samples will be assigned to group pop1
, so make sure to edit the second column to reflect your data correctly.
-The harpy variants snp module creates a Variants/METHOD directory with the folder structure below where METHOD is what
+The harpy snp module creates a Variants/METHOD directory with the folder structure below where METHOD is what
you specify as the --method (mpileup or freebayes). contig1 and contig2 are generic contig names from an imaginary
genome.fasta for demonstration purposes.
bcftools mpileup
or freebayes
writes to stderr
logs/harpy.variants.log
logs/sample.groups
--populations
with commented lines removed
stats/*.stats
reports/*.stats
bcftools stats
stats/variants.*.html
reports/variants.*.html
#
for Harpy to ignore them
harpy extra -p <samplefolder>
or manually
harpy extra -p
, all the samples will be assigned to group pop1
+harpy extra popgroup -d <samplefolder>
or manually
harpy extra popgroup
, all the samples will be assigned to group pop1
--populations
with commented lines removed
logs/*.leviathan.log
stderr
during operation
logs/*candidates
#
for Harpy to ignore them
harpy extra -p <samplefolder>
or manually
harpy extra -p
, all the samples will be assigned to group pop1
+harpy extra popgroup -d <samplefolder>
or manually
harpy extra popgroup
, all the samples will be assigned to group pop1
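Because a freshly generated grouping file puts every sample in pop1, the second column usually needs to be rewritten before use. The sketch below shows one way to do that from a sample-to-group mapping. It is not part of Harpy: the function name `assign_groups` is hypothetical, and the two-column, tab-separated layout is an assumption. Lines starting with `#` are passed through unchanged, since Harpy ignores them.

```python
def assign_groups(lines, groups, default="pop1"):
    """Rewrite the second column of a sample-grouping file.

    Assumptions: one sample per line, sample name in the first column,
    group in the second, columns separated by whitespace (written back as a tab).
    """
    rewritten = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            # Keep blank and commented lines exactly as they were
            rewritten.append(line.rstrip("\n"))
            continue
        sample = stripped.split()[0]
        rewritten.append(f"{sample}\t{groups.get(sample, default)}")
    return rewritten


if __name__ == "__main__":
    # Hypothetical sample and group names for demonstration only
    original = ["sample1\tpop1\n", "#sample2\tpop1\n", "sample3\tpop1\n"]
    for row in assign_groups(original, {"sample1": "north", "sample3": "south"}):
        print(row)
```

Samples missing from the mapping keep the `default` group, so an incomplete mapping never drops a sample from the file.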
-The harpy variants --method naibr module creates a Variants/naibr (or naibr-pop)
+The harpy sv --method naibr module creates a Variants/naibr (or naibr-pop)
directory with the folder structure below. sample1 and sample2 are generic sample
names for demonstration purposes.
| `*.bedpe` | structural variants identified by NAIBR |
| `configs/` | the configuration files harpy generated for each sample |
| `filtered/` | the variants that failed NAIBR's internal filters |
-| `IGV/` | same as the output `.bedpe` files but in IGV format |
-| `logs/harpy.variants.log` | relevant runtime parameters for the variants module |
-| `logs/sample.groups` | if provided, a copy of the file provided to `--populations` with commented lines removed |
-| `logs/*.log` | what NAIBR writes to `stderr` during operation |
-| `reports/` | summary reports with interactive plots of detected SV |
-| `vcf/` | the resulting variants, but in `.VCF` format |
+| `IGV/` | same as the output `.bedpe` files but in IGV format |
+| `logs/sample.groups` | if provided, a copy of the file provided to `--populations` with commented lines removed |
+| `logs/*.log` | what NAIBR writes to `stderr` during operation |
+| `reports/` | summary reports with interactive plots of detected SV |
+| `vcf/` | the resulting variants, but in `.VCF` format |
--directory
--cores
--snakefile
--config
--configfile
--rerun-incomplete
--nolock
--use-conda
--conda-prefix
harpy variants snp -g genome.fasta -d Align/ema -s "--dry-run"
+harpy snp mpileup -g genome.fasta -d Align/ema -s "--dry-run"
EMA count
. To do this, just list the file/files (relative
to your working directory) without any flags. Example for the beadtag report:
harpy align -g genome.fasta -d QC/ -t 4 -s "Align/ema/stats/reads.bxstats.html"
+harpy align bwa -g genome.fasta -d QC/ -t 4 -s "Align/ema/stats/reads.bxstats.html"
This of course necessitates knowing the names of the files ahead of time. See the individual modules for a breakdown of expected outputs.
--shadow-prefix <dirname>
where <dirname>
is the path to the mandatory directory you need to work out of. By
configuring this "shadow directory" setting, Snakemake will automatically move the files in/out of that directory for you:
harpy variants sv --method leviathan -g genome.fasta -d Align/bwa --threads 8 -p samples.groups -s "--shadow-prefix /SCRATCH/username/"
+harpy sv leviathan -g genome.fasta -d Align/bwa --threads 8 -p samples.groups -s "--shadow-prefix /SCRATCH/username/"