In-silico experiment pipeline

The pipeline uses two types of configuration files:

config.json - general parameters such as run name, models to use, number of replicates
tabular config - each row represents an in-silico mixture. The tabular config files used to create paper figures can be found under "resources".

In-house packages required to run this pipeline:

epiread-tools
deconvolution_models (should automatically install epiread-tools)
bimodal_detector (required for Epistate/UXM)

Deconvolution is based on two inputs: a reference atlas and a mixture sample. The reference atlas contains information on known cell types. The mixture contains unknown proportions of the reference cell types. Deconvolution models will use the atlas to determine proportions in the mixture.
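To make the atlas/mixture relationship concrete, here is a minimal sketch of proportion estimation with a plain expectation-maximization loop. It is not the implementation of any model in deconvolution_models; the function name, the atlas layout (one list of per-region methylation probabilities per cell type) and the toy numbers are all assumptions made for illustration.

```python
def estimate_proportions(atlas, meth, cov, n_iter=200):
    """Estimate cell-type proportions with a simple EM (illustrative only).

    atlas[t][r]: assumed methylation probability of cell type t in region r
    meth[r], cov[r]: methylated and total observations in the mixture at region r
    """
    n_types = len(atlas)
    alpha = [1.0 / n_types] * n_types  # start from uniform proportions
    total = sum(cov)
    for _ in range(n_iter):
        expected = [0.0] * n_types
        for r in range(len(cov)):
            # joint weight of each cell type for a methylated / unmethylated observation
            p_meth = [alpha[t] * atlas[t][r] for t in range(n_types)]
            p_unmeth = [alpha[t] * (1.0 - atlas[t][r]) for t in range(n_types)]
            z_meth, z_unmeth = sum(p_meth), sum(p_unmeth)
            for t in range(n_types):
                # E-step: expected number of observations originating from cell type t
                if z_meth > 0:
                    expected[t] += meth[r] * p_meth[t] / z_meth
                if z_unmeth > 0:
                    expected[t] += (cov[r] - meth[r]) * p_unmeth[t] / z_unmeth
        # M-step: proportions are the normalized expected counts
        alpha = [e / total for e in expected]
    return alpha


# Toy atlas with two cell types and four regions (made-up values)
atlas = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.3, 0.7]]
# Mixture counts constructed from 30% of the first type and 70% of the second
meth = [34, 66, 45, 55]
cov = [100, 100, 100, 100]
proportions = estimate_proportions(atlas, meth, cov)
print([round(p, 3) for p in proportions])
```

The estimate is only as good as the atlas: if the cell types are not differentially methylated in the chosen regions, the proportions are not identifiable.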

The in-silico experiments do not use the entire genome. Rather, they use a defined list of genomic intervals, hereafter referred to as "regions". These regions should be differentially methylated across the reference cell types to enable deconvolution. There are many approaches to selecting an appropriate set of regions. Here, if no user regions are supplied, Tissue Informative Markers (TIMs) are called.

The regions file should be BED-formatted and have no header. Regions may overlap; if they do, the overlapping section will be read twice. If this is not the desired behaviour, use bedtools merge to remove the overlap.
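For readers who want to see what merging does to a regions list, the sketch below collapses overlapping (and directly adjacent) intervals in plain Python. It only illustrates the idea and is not a replacement for bedtools merge, which expects its input to be sorted by chromosome and start; the function name and example coordinates are hypothetical.

```python
def merge_regions(regions):
    """Merge overlapping or book-ended (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # overlaps (or touches) the previous interval: extend it
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(r) for r in merged]


# Hypothetical regions; the first two overlap on chr1
regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 400, 500), ("chr2", 100, 200)]
merged = merge_regions(regions)
print(merged)
```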

For CelFiE and CelFiE-ISH, the atlas should be in the METH, COV format used by CelFiE. A small pipeline to create such an atlas from bedgraph files is provided here. The atlas should include individual CpGs (this is the input for CelFiE-ISH). For CelFiE, methylation and coverage are summed per region inside deconvolution_models.
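The per-region summing can be pictured with the short sketch below. It is not the code in deconvolution_models; the record layout (chromosome, position, methylated count, coverage), the half-open region convention and the function name are assumptions chosen for illustration.

```python
def sum_per_region(cpgs, regions):
    """Sum methylated counts and coverage of individual CpGs within each region.

    cpgs: list of (chrom, position, meth, cov) records (assumed layout)
    regions: list of (chrom, start, end), treated as half-open like BED
    """
    totals = {region: [0, 0] for region in regions}
    for region in regions:
        r_chrom, r_start, r_end = region
        for chrom, pos, meth, cov in cpgs:
            if chrom == r_chrom and r_start <= pos < r_end:
                totals[region][0] += meth
                totals[region][1] += cov
    return {region: tuple(v) for region, v in totals.items()}


# Made-up CpG records and regions
cpgs = [("chr1", 105, 3, 10), ("chr1", 150, 8, 10), ("chr1", 250, 1, 5)]
regions_to_sum = [("chr1", 100, 200), ("chr1", 200, 300)]
region_totals = sum_per_region(cpgs, regions_to_sum)
print(region_totals)
```

The nested loop is quadratic and only suitable for small examples; it is kept simple so the aggregation itself is easy to follow.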

For the Epistate model, an epistate atlas is required. It is created with bimodal_detector.

Whole-genome files are required to call TIM regions. If user-supplied regions are used, the atlas and mixture files need only cover these regions. The same is true for the UXM %U atlas.

Input	No atlas	User atlas
No regions	- call atlas from bedgraph - call TIMs	- call TIMs
User regions	- call atlas from epiread	- proceed to deconvolution
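The table above can be restated as a small decision function. This is only a restatement of the four cases, not the actual Snakemake rule logic; the function name and step strings are illustrative.

```python
def preprocessing_steps(has_regions, has_atlas):
    """Return the steps implied by the table for a given combination of inputs."""
    steps = []
    if not has_atlas:
        # without an atlas, the source of the called atlas depends on whether regions exist
        steps.append("call atlas from epiread" if has_regions else "call atlas from bedgraph")
    if not has_regions:
        steps.append("call TIMs")
    if not steps:
        steps.append("proceed to deconvolution")
    return steps


for has_regions in (False, True):
    for has_atlas in (False, True):
        print(has_regions, has_atlas, preprocessing_steps(has_regions, has_atlas))
```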

Regions and atlases used to create paper figures can be found under "resources". As mixtures are randomly generated, slight variation is to be expected.
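To illustrate why randomly generated mixtures differ slightly between runs, the sketch below draws reads from cell types according to fixed proportions. It is a toy model, not the pipeline's mixture generator; the function name, the `seed` parameter and the proportions are assumptions, and whether the pipeline exposes a seed is not stated here.

```python
import random
from collections import Counter


def sample_mixture(proportions, depth, seed=None):
    """Draw `depth` reads, each labelled with the cell type it was sampled from."""
    rng = random.Random(seed)
    cell_types = list(proportions)
    weights = [proportions[c] for c in cell_types]
    return Counter(rng.choices(cell_types, weights=weights, k=depth))


# Hypothetical two-cell-type mixture
target = {"A": 0.3, "B": 0.7}
run1 = sample_mixture(target, depth=1000, seed=1)
run2 = sample_mixture(target, depth=1000, seed=2)
rerun1 = sample_mixture(target, depth=1000, seed=1)
print(run1, run2)
```

Different seeds give read counts that scatter around the target proportions, while reusing a seed reproduces the same draw.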

Additional files:

CpG coordinates file - contains all CpG coordinates (chromosome, start, end). Must be sorted (sort -k1,1 -k2,2n). If both atlas and regions are supplied, this file need only include CpGs in and around the regions (relevant when a read only partially overlaps a region).
Include list - required only if no atlas is supplied. Used to filter out low-quality genomic regions.
Genome file - required only if no regions are supplied. Used for slopping around TIMs.
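A quick way to confirm that a CpG coordinates file satisfies the sorting requirement is sketched below. The function name is hypothetical, and the check compares chromosome names with Python string ordering, which corresponds to `sort` under the C locale; other locales may order names differently.

```python
def is_sorted_cpg_file(lines):
    """Return True if tab-separated (chrom, start, end) lines are ordered
    by chromosome name, then by numeric start."""
    previous = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom, start = line.split("\t")[:2]
        key = (chrom, int(start))
        if previous is not None and key < previous:
            return False
        previous = key
    return True


# Made-up coordinates; in practice, pass an open file handle instead of a list
sorted_lines = ["chr1\t100\t101", "chr1\t250\t251", "chr2\t50\t51"]
unsorted_lines = ["chr1\t250\t251", "chr1\t100\t101"]
print(is_sorted_cpg_file(sorted_lines), is_sorted_cpg_file(unsorted_lines))
```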

Usage

Clone the repository and adjust config.json as necessary. Then, from the project root, run:

snakemake --cores 1

Output files will be created under "results".
