This pipeline is heavily inspired by the juicer pipeline; for any information about its internal workings, please check the juicer repository. You can think of this pipeline as a wrapper around it.

Its main purpose is to run the analysis of multiple Hi-C samples and organize their results in a coherent way.
If the pipeline was already installed in your system, simply run the following:

```bash
install_hic_pipeline.sh
```

If requested, log out of the system, log back in, and run `install_hic_pipeline.sh` again to complete the setup. If not, move on to the next step.
Test that the pipeline works by running:

```bash
run_hic_pipeline.sh test_input.csv
```

and

```bash
run_merge_hic_pipeline.sh test_mega.csv
```

where `test_input.csv` and `test_mega.csv` are sample files that can be downloaded from this GitHub repository. Remember to modify the `genome_sequence` and `chromsizes` fields accordingly, so that they point to your juicer installation.
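As a minimal sketch of how you might do that (it assumes the test files ship with placeholder paths; both the placeholders and the replacement paths below are hypothetical):

```bash
# Hypothetical placeholder and replacement paths; adjust both to your setup.
sed -i 's|/path/to/genome.fa|/data/references/hg19/hg19.fa|' test_input.csv
sed -i 's|/path/to/chrom.sizes|/data/references/hg19/hg19.chrom.sizes|' test_input.csv test_mega.csv
```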
This is the main step of the pipeline.
```bash
./run_hic_pipeline.sh <input-samples.csv>
```

The `run_hic_pipeline.sh` script accepts a `.csv` file (with header) as input, with one line for each sample to be processed. The required columns are:
- `sample_path`: path to the sample results
- `raw_path`: path to the sample fastq files. Fastq files are assumed to be paired-end and located in the same folder, with Read1 and Read2 denoted by `_R1_` and `_R2_` in the file names.
- `restriction_enzyme`: which restriction enzyme to use (`MboI`, `HindIII`, `Arima`, etc.)
- `genome_assembly`: genome assembly (`hg19`, `mm10`, etc.)
- `genome_sequence` (OPTIONAL): path to the fasta file for the reference genome
- `chromsizes` (OPTIONAL): path to the chromosome sizes file for the reference genome

You can check the `test_input.csv` file for reference; a minimal example is also sketched below.
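A minimal sketch of what such a file might look like (every path and sample name below is hypothetical; adjust them to your setup):

```bash
# Hypothetical two-sample input file, then the corresponding pipeline call.
cat > my_samples.csv << 'EOF'
sample_path,raw_path,restriction_enzyme,genome_assembly,genome_sequence,chromsizes
/results/hic/sampleA,/fastq/sampleA,MboI,hg19,/refs/hg19/hg19.fa,/refs/hg19/hg19.chrom.sizes
/results/hic/sampleB,/fastq/sampleB,MboI,hg19,/refs/hg19/hg19.fa,/refs/hg19/hg19.chrom.sizes
EOF

./run_hic_pipeline.sh my_samples.csv
```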
The `genome_assembly` column of the input file should match one of the genome assemblies available on your system. Additionally, the `REFERENCES_PATH` environment variable should be defined on your system and should point to the location of all the available references. If this variable is not defined, you have to manually specify the `genome_sequence` and `chromsizes` fields in the input file.
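For example, a minimal sketch of defining the variable (the directory layout in the comment is an assumption about a typical setup, not something mandated by the pipeline):

```bash
# Hypothetical: one sub-folder per assembly under a common references root,
# e.g. /data/references/hg19/ and /data/references/mm10/.
export REFERENCES_PATH=/data/references
```

You would typically add the `export` line to your shell profile (e.g. `~/.bashrc`) so that it persists across sessions.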
The pipeline will run the following analyses:

- Fastq quality control with `fastqc`
- Read alignment and `.hic` file generation using `juicer`
- Conversion of `.hic` files to `.mcool` files for compatibility with the cooler format (see the sketch below)
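For instance, once a run has finished you could inspect the resulting multi-resolution cooler with the `cooler` command-line tool; a minimal sketch, assuming the `.mcool` file follows the standard `resolutions` layout and that a 10 kb resolution was generated:

```bash
# List the data collections (resolutions) stored in the .mcool file
cooler ls /path/to/sample.mcool

# Show metadata (bin size, contact count, ...) for one resolution
cooler info /path/to/sample.mcool::/resolutions/10000
```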
If you have multiple replicates of the same experiment, you will most likely want to merge them into a single file, to increase data depth for downstream analyses. To do that, you can run:

```bash
./run_merge_hic_pipeline.sh <mega-samples.csv>
```
The `run_merge_hic_pipeline.sh` script accepts a `.csv` file (with header) as input, with one line for each aggregated Hi-C map. The required columns are:

- `sample_path`: path to the aggregated map results
- `restriction_enzyme`: which restriction enzyme to use (`MboI`, `HindIII`, `Arima`, etc.). Notice that this implies that you cannot merge Hi-C samples generated with different restriction enzymes.
- `genome_assembly`: genome assembly (`hg19`, `mm10`, etc.). For obvious reasons, you cannot merge Hi-C samples generated with different genome assemblies.
- `replicate_paths`: paths to the replicate sample results (generated by the previous step), separated by a colon (`:`)
- `chromsizes` (OPTIONAL): path to the chromosome sizes file for the reference genome. As with `genome_assembly`, all merged samples must share the same chromosome sizes.
You can check the `test_mega.csv` file for reference; a minimal example is also sketched below. The same rules apply to the `genome_assembly` field as for the single Hi-C processing input file (see above).
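A minimal sketch of what such a file might look like (all paths are hypothetical; note the colon-separated `replicate_paths`, here reusing the two samples from the earlier example):

```bash
# Hypothetical aggregated-map file merging two replicates, then the call.
cat > my_mega.csv << 'EOF'
sample_path,restriction_enzyme,genome_assembly,replicate_paths,chromsizes
/results/hic/merged_AB,MboI,hg19,/results/hic/sampleA:/results/hic/sampleB,/refs/hg19/hg19.chrom.sizes
EOF

./run_merge_hic_pipeline.sh my_mega.csv
```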
Clone the repository and enter the folder:

```bash
git clone https://github.com/CSOgroup/hic_pipeline.git
cd hic_pipeline
```

Install the dependencies using conda/mamba, creating a new environment (`hic_pipeline`):

```bash
./install_hic_pipeline.sh
```

If requested, log out of the system, log back in, and run `install_hic_pipeline.sh` again to complete the setup. If not, move on to the next step.
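As a quick sanity check after re-login (a sketch; it assumes conda is on your `PATH` and that the installer created the `hic_pipeline` environment and put the pipeline scripts on your `PATH`, as the steps above suggest):

```bash
conda env list | grep hic_pipeline   # the environment created by the installer
which run_hic_pipeline.sh            # the entry point should now be reachable
```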