Commit 8313724: Carry over of @muffato changes to TreeVal
DLBPointon committed Sep 25, 2023 (1 parent: ed91e24)
Showing 1 changed file with 77 additions and 10 deletions: docs/usage.md
The `--pacbio` argument should point to the folder containing the `.fasta.gz` files.
If you do not have these file formats we have also included instructions on converting from common formats to our preferred format.
If there is a popular public preference for a particular format, we can modify the pipeline to utilise those formats. Just submit an issue.

### HiC Data Preparation

<details markdown="1">
<summary>Details</summary>

Illumina HiC read files should be provided in unmapped CRAM format, and each must be accompanied by an index file (`.crai`) generated by `samtools index`. If your unmapped HiC reads are in FASTQ format, first convert them to CRAM with `samtools import`. Examples are below:

#### Conversion of FASTQ to CRAM

```bash
samtools import -@8 -r ID:{prefix} -r CN:{hic-kit} -r PU:{prefix} -r SM:{sample_name} {prefix}_R1.fastq.gz {prefix}_R2.fastq.gz -o {prefix}.cram
```
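The `-r` options set SAM read-group fields: `ID` (read-group identifier), `CN` (sequencing centre, here filled with the HiC kit name), `PU` (platform unit) and `SM` (sample name). As a minimal sketch with entirely hypothetical values, this is how the placeholders expand into the final command:

```python
# Hypothetical run metadata; substitute your own values.
prefix = "sample1"       # basename shared by the paired FASTQ files
hic_kit = "arima"        # HiC kit name, used as the CN read-group tag
sample_name = "sample1"  # SM read-group tag

cmd = (
    f"samtools import -@8 "
    f"-r ID:{prefix} -r CN:{hic_kit} -r PU:{prefix} -r SM:{sample_name} "
    f"{prefix}_R1.fastq.gz {prefix}_R2.fastq.gz -o {prefix}.cram"
)
print(cmd)
```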

#### Indexing of CRAM

```bash
samtools index {prefix}.cram
```

</details>

### PacBio Data Preparation

<details markdown="1">
<summary>Details</summary>

Before running the pipeline, data must be in `fasta.gz` format. Because of the downstream software, it must also be long-read, single-stranded data. This means ONT reads can also be used (except duplex reads).

The commands below should help you convert from mapped BAM to `fasta.gz`, or from FASTQ to FASTA.

If your data isn't already in these formats, then let us know and we'll see how we can help.

#### BAM -> FASTQ

This command iterates through your BAM files and converts them to FASTQ via samtools.

```bash
cd { TO FOLDER OF BAM FILES }
mkdir fastq
for i in *.bam
do
  echo $i
  j=${i%.bam}   # strip the .bam suffix from the filename
  echo $j
  samtools bam2fq ${i} > fastq/${j}.fq
done
```
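The `${i%.bam}` expansion in the loop above is plain shell suffix stripping, with no external tools involved; a quick demonstration:

```shell
# ${var%pattern} removes the shortest matching suffix from $var.
i="mySample.bam"
j=${i%.bam}
echo $j   # prints "mySample"
```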

#### FASTQ -> FASTA

This command creates a `fasta` folder (to store the fasta files), moves into the `fastq` folder, and then converts FASTQ to FASTA using `seqtk seq`.

```bash
mkdir fasta
cd fastq
for i in *.fq; do
  echo $i
  j=${i%.fq}
  echo $j
  seqtk seq -a $i > ../fasta/${j}.fasta
done
```
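For reference, `seqtk seq -a` essentially performs the following transformation, shown here as an illustrative Python sketch (seqtk itself handles line wrapping, filtering and many edge cases):

```python
def fastq_to_fasta(fastq_lines):
    """Convert a list of FASTQ lines to FASTA lines.

    FASTQ records are four lines each: header ("@name"), sequence,
    separator ("+") and quality. FASTA keeps only the first two,
    with the "@" swapped for ">".
    """
    fasta = []
    for i in range(0, len(fastq_lines), 4):
        fasta.append(">" + fastq_lines[i][1:])  # "@read1" -> ">read1"
        fasta.append(fastq_lines[i + 1])        # the quality line is dropped
    return fasta

print(fastq_to_fasta(["@read1", "ACGT", "+", "IIII"]))
# ['>read1', 'ACGT']
```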

#### FASTA -> FASTA.GZ

This moves into the `fasta` folder created in the previous step and gzips the fasta files.

```bash
cd ../fasta
for i in *.fasta; do
  echo $i
  gzip $i
done
```

#### Or, if you're a command-line ninja

```bash
samtools bam2fq {prefix}.bam | seqtk seq -a - | gzip - > {prefix}.fasta.gz
```

</details>

### Pretext Accessory File Ingestion

<details markdown="1">
<summary>Details</summary>

Note: this requires the `bigWigToBedGraph` tool from the UCSC utilities. Download instructions can be found at [EXAMPLE #3](https://genome.ucsc.edu/goldenPath/help/bigWig.html#:~:text=Alternatively%2C%20bigWig%20files%20can%20be,to%20the%20Genome%20Browser%20server.)

The accessory files generated by the pipeline are not automatically ingested into the Pretext file. To ingest them, use the following commands:

```bash
cd {outdir}/hic_files
bigWigToBedGraph {coverage.bigWig} /dev/stdout | PretextGraph -i { your.pretext } -n "coverage"
bigWigToBedGraph {repeat_density.bigWig} /dev/stdout | PretextGraph -i { your.pretext } -n "repeat_density"
cat {telomere.bedgraph} | awk -v OFS="\t" '{$4 = 1000; print}' | PretextGraph -i { your.pretext } -n "telomere"
cat {gap.bedgraph} | awk -v OFS="\t" '{$4 = 1000; print}' | PretextGraph -i { your.pretext } -n "gap"
```
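The `awk` step above overwrites column 4 of the bedGraph with a constant (1000), presumably so the telomere and gap tracks render at a fixed height in Pretext. A minimal Python equivalent of that per-line transformation, assuming tab-separated input:

```python
def flatten_bedgraph_value(line, value=1000):
    """Set the score column (column 4) of a tab-separated bedGraph line
    to a fixed value, mirroring: awk -v OFS="\t" '{$4 = 1000; print}'"""
    fields = line.split("\t")
    fields[3] = str(value)
    return "\t".join(fields)

print(flatten_bedgraph_value("chr1\t100\t200\t5"))
# chr1	100	200	1000
```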

</details>

## Running the pipeline

The typical command for running the pipeline is as follows:
