Update README
hongduosun committed Mar 28, 2018
1 parent 7ec5b95 commit e2d3d8f
Showing 1 changed file with 65 additions and 56 deletions.
121 changes: 65 additions & 56 deletions README.rst
@@ -14,127 +14,136 @@ MAmotif

Introduction
------------
MAmotif is used to compare two ChIP-seq samples of the same protein from different cell types
(or conditions, e.g. wild-type vs mutant) and identify transcription factors (TFs) associated
with the cell type-biased binding of this protein as its co-factors, by using TF binding information obtained from
motif analysis (or from other ChIP-seq data). MAmotif automatically combines the MAnorm model, which performs quantitative
comparison of the input ChIP-seq samples, with the Motif-Scan toolkit, which scans ChIP-seq peaks for TF binding motifs,
and uses a systematic integrative analysis to search for TFs whose binding sites are significantly associated with
the cell type-biased peaks between two ChIP-seq samples. When applied to ChIP-seq data of histone marks of
regulatory elements (such as H3K4me3 for active promoters and H3K9/27ac for active promoters and enhancers),
or DNase/ATAC-seq data, MAmotif can be used to detect cell type-specific regulators.

**MAmotif** is used to compare two ChIP-seq samples of the same protein from different cell types or conditions
(e.g. Mutant vs Wild-type) and **identify transcription factors (TFs) associated with the cell-type biased binding**
of this protein as its **co-factors**, by using TF binding information obtained from motif analysis
(or from other ChIP-seq data).

Documentation
-------------
MAmotif automatically combines the **MAnorm** model, which performs quantitative comparison of the given ChIP-seq samples,
with the Motif-Scan toolkit, which scans ChIP-seq peaks for **TF binding motifs**, and uses a systematic integrative analysis to
search for TFs whose binding sites are significantly associated with the cell-type biased peaks between two ChIP-seq samples.

To see the full documentation of MAmotif, please refer to: http://bioinfo.sibs.ac.cn/shaolab/mamotif/index.php
When applied to ChIP-seq data of histone marks of regulatory elements (such as H3K4me3 for active promoters and
H3K9/27ac for active promoters/enhancers), or DNase/ATAC-seq data, MAmotif can be used to detect **cell-type specific regulators**.

Workflow
--------

.. image:: https://github.com/shao-lab/MAmotif/blob/master/docs/source/image/MAmotif_workflow.png

Documentation
-------------

To see the full documentation of MAmotif, please refer to: http://bioinfo.sibs.ac.cn/shaolab/mamotif/index.php

Installation
------------

The latest version release of MAmotif is available at
`PyPI <https://pypi.python.org/pypi/mamotif>`__:
The latest release of MAmotif is available at `PyPI <https://pypi.python.org/pypi/mamotif>`__:

::

$ pip install mamotif

MAmotif uses `setuptools <https://setuptools.readthedocs.io/en/latest/>`__ for installation from source code.
The source code of MAmotif is hosted on GitHub: https://github.com/shao-lab/MAmotif
Or you can install MAmotif via conda:

You can clone the repo and execute the following command under source directory:
**WIP!**

::

$ python setup.py install
$ conda install -c bioconda mamotif

Usage
-----

Build genomes
^^^^^^^^^^^^^
MAmotif uses `setuptools <https://setuptools.readthedocs.io/en/latest/>`__ for installation from source code.
The source code of MAmotif is hosted on GitHub: https://github.com/shao-lab/MAmotif

Before you use MAmotif, you need to build the prerequisites for corresponding genome assembly.
You can clone the repo and execute the following command under source directory:

::

$ genomecompile [-h] [-v] -G sequences.fa -o output_dir
$ python setup.py install

A directory containing the compiled genome sequence and related information will be generated by this command.
Galaxy Installation
-------------------

**Note:** You only need run it once for each genome.
**WIP!**

Build motif PWM (Optional)
^^^^^^^^^^^^^^^^^^^^^^^^^^

**Note:** MAmotif provides some preprocessed motif PWM files under data/motif of the MotifScan package.
Usage
-----

You need to build some prerequisites before running MAmotif:

Build genomes
^^^^^^^^^^^^^

If you have motifs that are not included in our pre-compiled motif collection, you need to compile them yourself with the following command.
Preprocess sequences and genome-wide nucleotide frequency for the corresponding genome assembly.

::

$ motifcompile -M motif_pwm_demo.txt -g hg19_for_motifscan
$ genomecompile [-h] [-v] -G hg19.fa -o hg19_genome

-M motif raw matrix file
**Note:** You only need to run this command once for each genome

-g a pre-compiled genome directory generated by genomecompile
Build motifs (Optional)
^^^^^^^^^^^^^^^^^^^^^^^

Motif raw matrix file should follow the format as below:
**Note:** MAmotif provides some preprocessed motif PWM files under **data/motif** of the MotifScan package.

The motif ID and motif name are followed by a position weight matrix, with columns separated by tabs.
Build motif PWM/motif-score cutoff for custom motifs that are not included in our pre-compiled motif collection:

::

>MA0599.1 KLF5
1429 0 0 3477 0 5051 0 0 0 3915
2023 11900 12008 9569 13611 0 13611 13611 13135 5595
7572 0 0 0 0 5182 0 0 0 0
2587 1711 1603 565 0 3378 0 0 476 4101
$ motifcompile [-h] [-v] -M motif_pwm_demo.txt -g hg19_genome -o hg19_motif
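
Before compiling custom motifs, it can help to check that a raw matrix file is well formed. The following is a minimal sketch, not part of MAmotif or MotifScan: ``parse_motif_matrices`` is a hypothetical helper, and it assumes each motif is a ``>ID NAME`` header followed by exactly four whitespace-separated rows of equal length (presumably one row per nucleotide, as in the KLF5 example).

```python
from typing import Dict, List, Tuple

Motif = Tuple[str, List[List[float]]]


def parse_motif_matrices(text: str) -> Dict[str, Motif]:
    """Parse '>ID NAME' headers and their matrix rows into a dict keyed by motif ID."""
    motifs: Dict[str, Motif] = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines between motifs
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            if not fields:
                raise ValueError("header line has no motif ID")
            current_id = fields[0]
            name = fields[1] if len(fields) > 1 else ""
            motifs[current_id] = (name, [])
        elif current_id is not None:
            # Tabs and spaces are both accepted as column separators.
            motifs[current_id][1].append([float(x) for x in line.split()])
        else:
            raise ValueError("matrix row found before any '>' header")
    # Assumption: four rows per motif, all with the same number of columns.
    for motif_id, (_, rows) in motifs.items():
        if len(rows) != 4 or len({len(row) for row in rows}) != 1:
            raise ValueError(f"motif {motif_id} is not a 4-row rectangular matrix")
    return motifs


example = """>MA0599.1 KLF5
1429\t0\t0\t3477\t0\t5051\t0\t0\t0\t3915
2023\t11900\t12008\t9569\t13611\t0\t13611\t13611\t13135\t5595
7572\t0\t0\t0\t0\t5182\t0\t0\t0\t0
2587\t1711\t1603\t565\t0\t3378\t0\t0\t476\t4101
"""

motifs = parse_motif_matrices(example)
name, rows = motifs["MA0599.1"]
print(name, len(rows), "rows x", len(rows[0]), "columns")
```

A file that passes this check is only structurally valid; whether ``motifcompile`` accepts it still depends on the tool itself.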

run MAmotif
^^^^^^^^^^^

::

$ mamotif --p1 sample1_peaks.bed --p2 sample2_peaks.bed --r1 sample1_reads.bed --r2 sample2_reads.bed -g hg19_for_motifscan -m motif_pwm_demo.txt -o sample1_vs_sample2
$ mamotif --p1 sample1_peaks.bed --p2 sample2_peaks.bed --r1 sample1_reads.bed --r2 sample2_reads.bed -g hg19_genome \
-m hg19_motif_p1e-4.txt -o sample1_vs_sample2

**Note:** Use -h/--help for details of all arguments.
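
The three steps above (build the genome, build the motifs, run MAmotif) can be scripted. The sketch below only assembles the command lines from the flags shown in this README and prints them; ``build_commands`` is a hypothetical helper, the file names are the README's placeholders, and nothing is executed.

```python
import shlex
from typing import List


def build_commands(
    genome_fasta: str,
    genome_dir: str,
    motif_matrix: str,
    motif_dir: str,
    motif_file: str,
    peaks1: str,
    peaks2: str,
    reads1: str,
    reads2: str,
    out_dir: str,
) -> List[List[str]]:
    """Return the genomecompile, motifcompile and mamotif commands as argument lists."""
    return [
        # Step 1: preprocess the genome (needed once per genome assembly).
        ["genomecompile", "-G", genome_fasta, "-o", genome_dir],
        # Step 2 (optional): compile custom motifs against that genome.
        ["motifcompile", "-M", motif_matrix, "-g", genome_dir, "-o", motif_dir],
        # Step 3: compare the two samples.
        ["mamotif", "--p1", peaks1, "--p2", peaks2, "--r1", reads1, "--r2", reads2,
         "-g", genome_dir, "-m", motif_file, "-o", out_dir],
    ]


commands = build_commands(
    genome_fasta="hg19.fa",
    genome_dir="hg19_genome",
    motif_matrix="motif_pwm_demo.txt",
    motif_dir="hg19_motif",
    motif_file="hg19_motif_p1e-4.txt",
    peaks1="sample1_peaks.bed",
    peaks2="sample2_peaks.bed",
    reads1="sample1_reads.bed",
    reads2="sample2_reads.bed",
    out_dir="sample1_vs_sample2",
)

for command in commands:
    print(shlex.join(command))
```

Keeping each command as a list of arguments avoids shell quoting problems if a path contains spaces; the lists could be passed to ``subprocess.run`` once the tools are installed.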


Output of MAmotif
-----------------

After MAmotif finishes running, all output files are written to the directory specified with the "-o" argument.

The main output file will include the following fields:
Main output
^^^^^^^^^^^

::

1.Motif Name
2.Target Number: Number of peaks with motif targets
3.Average of Target M-value
4.Deviation of Target M-value
5.Non-target Number: Number of peaks without motif targets
6.Average of Non-target M-value
7.Deviation of Non-target M-value
8.T-test Statistics: T-Statistics for M-values of (peaks with motif targets) against (peaks without motif targets)
9.T-test P-value(right-tail)
2.Target Number: Number of motif-present peaks
3.Average of Target M-value: Average M-value of motif-present peaks
4.Deviation of Target M-value: M-value Std of motif-present peaks
5.Non-target Number: Number of motif-absent peaks
6.Average of Non-target M-value: Average M-value of motif-absent peaks
7.Deviation of Non-target M-value: M-value Std of motif-absent peaks
8.T-test Statistics: T-Statistics for M-values of motif-present peaks against motif-absent peaks
9.T-test P-value: Right-tailed P-value of T-test
10.T-test P-value By Benjamin correction
11.RankSum-test Statistics
12.RankSum-test P-value(right-tail)
12.RankSum-test P-value
13.RankSum-test P-value By Benjamin correction
14.Maximal P-value: Maximal corrected P-value of T-test and RankSum test
14.Maximal P-value: Maximal corrected P-value of T-test and RankSum-test
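
To make the relationship between these fields concrete, here is a minimal sketch of how the per-group summaries and the final "Maximal P-value" could be derived. It is not MAmotif's code. It assumes that "Benjamin correction" refers to the Benjamini-Hochberg procedure and that "Std" is the sample standard deviation; the function names and all input numbers are made up for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def summarize(m_values: Sequence[float], has_motif: Sequence[bool]) -> Dict[str, float]:
    """Count, mean and sample std of M-values for motif-present and motif-absent peaks."""
    target = [m for m, hit in zip(m_values, has_motif) if hit]
    non_target = [m for m, hit in zip(m_values, has_motif) if not hit]
    return {
        "target_number": len(target),
        "target_mean": mean(target),
        "target_std": stdev(target),
        "non_target_number": len(non_target),
        "non_target_mean": mean(non_target),
        "non_target_std": stdev(non_target),
    }


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Adjust p-values with the Benjamini-Hochberg step-up procedure (assumed correction)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def maximal_p_values(t_adjusted: Sequence[float], r_adjusted: Sequence[float]) -> List[float]:
    """Per motif, keep the larger of the two corrected p-values."""
    return [max(t, r) for t, r in zip(t_adjusted, r_adjusted)]


# Illustrative inputs: one motif's peaks, then raw p-values for four motifs.
summary = summarize([1.0, 2.0, 3.0, -1.0, 0.0], [True, True, True, False, False])
t_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
r_adjusted = benjamini_hochberg([0.02, 0.03, 0.05, 0.001])
max_p = maximal_p_values(t_adjusted, r_adjusted)
print(summary)
print(max_p)
```

Taking the maximum of the two corrected p-values is conservative: a motif only ranks highly if both the T-test and the RankSum-test support it.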

MAnorm output
^^^^^^^^^^^^^

MAmotif will invoke MAnorm and output the normalization results and MA-plot for samples under comparison.


Motif output
^^^^^^^^^^^^

MAmotif will also output tables to summarize the motif targets number and motif score of each peak region.
MAmotif will also output tables to summarize the enrichment of motifs and the motif target number and motif-score
of each peak region.

If you specified "-s" with MAmotif, it will also output the genome coordinates of every motif targets.
If you specified "-s" with MAmotif, it will also output the genome coordinates of every motif target site.


License
