Commit 8313724: Carry over of @muffato changes to TreeVal
DLBPointon committed Sep 25, 2023 (1 parent: ed91e24)
Showing 1 changed file with 77 additions and 10 deletions: docs/usage.md
The `--pacbio` argument should point to the folder containing the `.fasta.gz` files.
If you do not have these file formats we have also included instructions on converting from common formats to our preferred format.
If there is a popular public preference for a particular format, we can modify the pipeline to utilise those formats. Just submit an issue.

### HiC Data Preparation

<details markdown="1">
<summary>Details</summary>

Illumina HiC read files should be provided in unmapped CRAM format, and each must be accompanied by an index file (`.crai`) generated by `samtools index`. If your unmapped HiC reads are in FASTQ format, first convert them to CRAM with `samtools import`. Examples are below:

#### Conversion of FASTQ to CRAM

```bash
samtools import -@8 -r ID:{prefix} -r CN:{hic-kit} -r PU:{prefix} -r SM:{sample_name} {prefix}_R1.fastq.gz {prefix}_R2.fastq.gz -o {prefix}.cram
```
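The `-r` options set SAM read-group fields: `ID` (read-group identifier), `CN` (sequencing centre, here filled with the HiC kit name), `PU` (platform unit) and `SM` (sample name). As a minimal sketch with entirely hypothetical values, this is how the placeholders expand into the final command:

```python
# Hypothetical run metadata; substitute your own values.
prefix = "sample1"       # basename shared by the paired FASTQ files
hic_kit = "arima"        # HiC kit name, used as the CN read-group tag
sample_name = "sample1"  # SM read-group tag

cmd = (
    f"samtools import -@8 "
    f"-r ID:{prefix} -r CN:{hic_kit} -r PU:{prefix} -r SM:{sample_name} "
    f"{prefix}_R1.fastq.gz {prefix}_R2.fastq.gz -o {prefix}.cram"
)
print(cmd)
```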

#### Indexing of CRAM

```bash
samtools index {prefix}.cram
```

</details>

### PacBio Data Preparation

<details markdown="1">
<summary>Details</summary>

Before running the pipeline, data must be in `fasta.gz` format. Because of the downstream software, it must also be long-read, single-stranded data. This means ONT reads can also be used (except duplex reads).

The commands below should help you convert from mapped BAM to `fasta.gz`, or from FASTQ to FASTA.

If your data isn't already in these formats, then let us know and we'll see how we can help.

#### BAM -> FASTQ

This command iterates through your BAM files and converts them to FASTQ via samtools.

```bash
cd { TO FOLDER OF BAM FILES }
mkdir fastq
for i in *.bam
do
  echo $i
  j=${i%.bam}   # strip the .bam suffix from the filename
  echo $j
  samtools bam2fq ${i} > fastq/${j}.fq
done
```
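The `${i%.bam}` expansion in the loop above is plain shell suffix stripping, with no external tools involved; a quick demonstration:

```shell
# ${var%pattern} removes the shortest matching suffix from $var.
i="mySample.bam"
j=${i%.bam}
echo $j   # prints "mySample"
```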

#### FASTQ -> FASTA

This command creates a `fasta` folder (to store the fasta files), moves into the `fastq` folder, and then converts FASTQ to FASTA using `seqtk seq`.

```bash
mkdir fasta
cd fastq
for i in *.fq; do
  echo $i
  j=${i%.fq}
  echo $j
  seqtk seq -a $i > ../fasta/${j}.fasta
done
```
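For reference, `seqtk seq -a` essentially performs the following transformation, shown here as an illustrative Python sketch (seqtk itself handles line wrapping, filtering and many edge cases):

```python
def fastq_to_fasta(fastq_lines):
    """Convert a list of FASTQ lines to FASTA lines.

    FASTQ records are four lines each: header ("@name"), sequence,
    separator ("+") and quality. FASTA keeps only the first two,
    with the "@" swapped for ">".
    """
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        fasta.append(">" + fastq_lines[i][1:])  # "@read1" -> ">read1"
        fasta.append(fastq_lines[i + 1])        # the quality line is dropped
    return fasta

print(fastq_to_fasta(["@read1", "ACGT", "+", "IIII"]))
# ['>read1', 'ACGT']
```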

#### FASTA -> FASTA.GZ

This moves into the `fasta` folder created in the previous step and gzips the fasta files.

```bash
cd ../fasta
for i in *.fasta; do
  echo $i
  gzip $i
done
```

#### Or, if you're a command-line ninja

```bash
samtools bam2fq {prefix}.bam | seqtk seq -a - | gzip - > {prefix}.fasta.gz
```

</details>

### Pretext Accessory File Ingestion

<details markdown="1">
<summary>Details</summary>

Note: this requires the `bigWigToBedGraph` tool from the UCSC utilities. Download instructions can be found at [EXAMPLE #3](https://genome.ucsc.edu/goldenPath/help/bigWig.html#:~:text=Alternatively%2C%20bigWig%20files%20can%20be,to%20the%20Genome%20Browser%20server.)

The accessory files generated by the pipeline are not automatically ingested into the Pretext file. To ingest them, use the following commands:

```bash
cd {outdir}/hic_files
bigWigToBedGraph {coverage.bigWig} /dev/stdout | PretextGraph -i { your.pretext } -n "coverage"
bigWigToBedGraph {repeat_density.bigWig} /dev/stdout | PretextGraph -i { your.pretext } -n "repeat_density"
cat {telomere.bedgraph} | awk -v OFS="\t" '{$4 = 1000; print}' | PretextGraph -i { your.pretext } -n "telomere"
cat {gap.bedgraph} | awk -v OFS="\t" '{$4 = 1000; print}' | PretextGraph -i { your.pretext } -n "gap"
```
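The `awk` step above overwrites column 4 of the bedGraph with a constant (1000), presumably so the telomere and gap tracks render at a fixed height in Pretext. A minimal Python equivalent of that per-line transformation, assuming tab-separated input:

```python
def flatten_bedgraph_value(line, value=1000):
    """Set the score column (column 4) of a tab-separated bedGraph line
    to a fixed value, mirroring: awk -v OFS="\t" '{$4 = 1000; print}'"""
    fields = line.split("\t")
    fields[3] = str(value)
    return "\t".join(fields)

print(flatten_bedgraph_value("chr1\t100\t200\t5"))
# chr1	100	200	1000
```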

</details>

## Running the pipeline

The typical command for running the pipeline is as follows:
