Update docs
hongduosun committed Mar 28, 2018
1 parent 6d147cb commit 9a50717
Showing 3 changed files with 355 additions and 4 deletions.
6 changes: 3 additions & 3 deletions README.rst
@@ -35,7 +35,7 @@ Workflow
Documentation
-------------

To see the full documentation of MAmoitf, please refer to: http://bioinfo.sibs.ac.cn/shaolab/mamotif/index.php
To see the full documentation of MAmotif, please refer to: http://mamotif.readthedocs.io/en/latest/

Installation
------------
@@ -137,8 +137,8 @@ MAnorm output
MAmotif will invoke MAnorm and output the normalization results and MA-plot for samples under comparison.


Motif output
^^^^^^^^^^^^
MotifScan output
^^^^^^^^^^^^^^^^

MAmotif will also output tables to summarize the enrichment of motifs and the motif target number and motif-score
of each peak region.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -4,7 +4,7 @@ MAmotif
.. image:: https://travis-ci.org/shao-lab/MAmotif.svg?branch=master
:alt: Travis Build
:target: https://travis-ci.org/shao-lab/MAmotif
.. image:: https://readthedocs.org/projects/mamoitf/badge/?version=latest
.. image:: https://readthedocs.org/projects/mamotif/badge/?version=latest
:alt: Documentation Status
:target: http://mamotif.readthedocs.io/en/latest/?badge=latest
.. image:: https://img.shields.io/pypi/v/mamotif.svg
351 changes: 351 additions & 0 deletions docs/source/tutorial.rst
@@ -9,3 +9,354 @@ Tutorial

Installation
============

Like many other Python packages and bioinformatics software, MAmotif can be obtained easily from PyPI_ or Bioconda_ (WIP).
The commands below show how to install the latest release of MAmotif. Alternatively, you can install it from
source code.

Prerequisites
-------------

.. tip::
    MAmotif currently runs under **Python 2.7**; support for **Python 3.X** is planned for a future update.

* **Python 2.7**
* setuptools
* numpy
* pandas
* statsmodels
* scipy
* matplotlib

Install with pip
----------------
The latest release of MAmotif is available on PyPI_ and can be installed via ``pip``::

    $ pip install mamotif

.. _PyPI: https://pypi.python.org/pypi/MAmotif

Install with conda (WIP)
------------------------

You can also install MAmotif with conda_ through the Bioconda_ channel::

    $ conda install -c bioconda mamotif

.. _conda: https://conda.io/docs/
.. _Bioconda: https://bioconda.github.io/

Install from source code
------------------------

It's highly recommended to install MAmotif with ``pip`` or ``conda``. If you prefer to install it from source code,
please read the following steps:

The source code of MAmotif is hosted on GitHub_, and setuptools_ is required for installation.

.. _setuptools: https://setuptools.readthedocs.io/en/latest/
.. _GitHub: https://github.com/shao-lab/MAmotif

First, clone the repository of MAmotif::

    $ git clone https://github.com/shao-lab/MAmotif.git

Then, install MAmotif in the source directory::

    $ cd MAmotif
    $ python setup.py install

.. note::
    * You may need to install all dependencies listed in ``requirements.txt``.
    * You may need to modify ``$PATH`` and ``$PYTHONPATH`` manually to make it work.

Galaxy Installation
-------------------

WIP

Usage of MAmotif
================

To check whether MAmotif is properly installed, print its version with the ``-v/--version`` option::

    $ mamotif -v
    $ mamotif --version

Command-Line Usage
------------------

You need to build some prerequisites before running MAmotif:

Build genomes
^^^^^^^^^^^^^

Preprocess sequences and genome-wide nucleotide frequency for the corresponding genome assembly.

::

    $ genomecompile [-h] [-v] -G hg19.fa -o hg19_genome

**Note:** You only need to run this command once for each genome assembly.

Options
"""""""

-h, --help     Show help message and exit.
-v, --version  Show version number and exit.
-G             **[Required]** Genome sequences in FASTA format.
-o             **[Required]** Path to write the output files.

Build motifs (Optional)
^^^^^^^^^^^^^^^^^^^^^^^

**Note:** MAmotif provides some preprocessed motif PWM files under **data/motif** of the MotifScan package.

Build the motif PWM and motif-score cutoff for custom motifs that are not included in our pre-compiled motif collection:

::

    $ motifcompile [-h] [-v] -M motif_pwm_demo.txt -g hg19_genome -o hg19_motif

Options
"""""""

-h, --help     Show help message and exit.
-v, --version  Show version number and exit.
-M             **[Required]** Raw motif PFM (Position Frequency Matrix) file.
-g             **[Required]** Path of pre-compiled genome directory (generated by ``genomecompile``).
-o             **[Required]** Prefix of output file.

Run MAmotif
^^^^^^^^^^^

MAmotif provides a console script, ``mamotif``, for running the program. The basic usage is as follows:

::

    $ mamotif --p1 sample1_peaks.bed --p2 sample2_peaks.bed --r1 sample1_reads.bed --r2 sample2_reads.bed -g hg19_genome \
        -m hg19_motif_p1e-4.txt -o sample1_vs_sample2

.. tip::
    Please use ``-h/--help`` for the details of all options.

Options
"""""""

-h, --help     Show help message and exit.
-v, --version  Show version number and exit.
--p1           **[Required]** Peaks file of sample1.
--p2           **[Required]** Peaks file of sample2.
--r1           **[Required]** Reads file of sample1.
--r2           **[Required]** Reads file of sample2.
--s1           Read shift size of sample1. Default: 100
--s2           Read shift size of sample2. Default: 100
-g             **[Required]** Path of pre-compiled genome directory (generated by ``genomecompile``).
-m             **[Required]** Pre-compiled motif file (generated by ``motifcompile``).
-a             Gene annotation file, used to generate random controls when performing enrichment analysis.
-w             Width of the window used to calculate read density. Default: 1000
-d             Summit-to-summit distance cutoff for common peaks. Default: ``-w``/2
-n             Number of simulations to test the enrichment of peak overlap between two samples.
--m_cutoff     *M-value* cutoff to distinguish biased (sample-specific) peaks from unbiased peaks.
-p             *P-value* cutoff to define biased peaks.
-l             Motif list file.
-r             Perform MAmotif on {all,promoter,distal} regions.
--upstream     Upstream distance to TSS to define promoter regions.
--downstream   Downstream distance to TSS to define promoter regions.
--peak_length  The length of input regions to perform motif scan around peak summit/midpoint.
--negative     Use negative test (sample2 vs sample1).
--correction   Type of multiple test correction [benjamin, bonferroni].
-s             Detailed output mode. Write the normalization results for original peaks and the genome coordinates
               of target sites for each motif.
-o             Comparison name, used as the folder name and prefix of output files.
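
As a rough illustration of how the options above fit together, the following sketch assembles a ``mamotif`` argument list in Python (Python 3 standard library only). The helper name ``build_mamotif_command`` and the integer rounding of ``-w``/2 are assumptions made for this example; only flags documented above are used, and the command is printed rather than executed.

```python
import shlex


def build_mamotif_command(p1, p2, r1, r2, genome_dir, motif_file, name,
                          window=1000, summit_dist=None):
    """Assemble a mamotif argument list from the options documented above.

    Hypothetical helper for illustration; it is not part of MAmotif.
    """
    if summit_dist is None:
        # Documented default for -d is -w/2 (integer rounding is an assumption).
        summit_dist = window // 2
    return [
        "mamotif",
        "--p1", p1, "--p2", p2,
        "--r1", r1, "--r2", r2,
        "-g", genome_dir,
        "-m", motif_file,
        "-w", str(window),
        "-d", str(summit_dist),
        "-o", name,
    ]


cmd = build_mamotif_command(
    "sample1_peaks.bed", "sample2_peaks.bed",
    "sample1_reads.bed", "sample2_reads.bed",
    "hg19_genome", "hg19_motif_p1e-4.txt", "sample1_vs_sample2",
)
# Print a shell-safe version of the command instead of running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list could be handed to ``subprocess.run`` once the genome and motif files have been built.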

Input Format
============

Format of Peaks file
--------------------

Standard **BED** format and **MACS xls** format are supported. Other supported formats are listed below::

    * 3-column tab-separated format

      # chr   start   end
      chr1    2345    4345
      chr1    3456    5456
      chr2    6543    8543

    * 4-column tab-separated format

      # chr   start   end     summit
      chr1    2345    4345    254
      chr1    3456    5456    127
      chr2    6543    8543    302

.. note::
    The fourth column, **summit**, is the position relative to **start**.
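
To make the relative-summit convention concrete, here is a minimal Python sketch of a reader for the 3- and 4-column formats above. It is not MAmotif's own parser: the function names are hypothetical, and falling back to the region midpoint when no summit column is present is an assumption for this example.

```python
def parse_peak_line(line):
    """Parse one tab-separated peak line into (chrom, start, end, summit).

    The returned summit is an absolute coordinate.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if len(fields) >= 4:
        # The 4th column is relative to start, so convert it to absolute.
        summit = start + int(fields[3])
    else:
        # Assumption: use the midpoint when no summit column is given.
        summit = (start + end) // 2
    return chrom, start, end, summit


def parse_peaks(lines):
    """Parse an iterable of lines, skipping blank and '#' comment lines."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        peaks.append(parse_peak_line(line))
    return peaks


example = [
    "# chr\tstart\tend\tsummit\n",
    "chr1\t2345\t4345\t254\n",
    "chr2\t6543\t8543\n",
]
peaks = parse_peaks(example)
print(peaks)
```

This sketch only covers the simple tab-separated layouts; standard BED and MACS xls files carry other columns and would need their own handling.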


Format of Reads file
--------------------

Only **BED** format is supported for now. More formats will be supported in future updates.

Format of Motif PWM file
------------------------

MAmotif supports JASPAR_ 2014/2016/2018 motif matrix format.

JASPAR2014::

    >MA0004.1 Arnt
    4 19 0 0 0 0
    16 0 20 0 0 0
    0 1 0 20 0 20
    0 0 0 0 20 0

JASPAR2016/2018::

    >MA0004.1 Arnt
    A [ 4 19 0 0 0 0 ]
    C [ 16 0 20 0 0 0 ]
    G [ 0 1 0 20 0 20 ]
    T [ 0 0 0 0 20 0 ]

.. _JASPAR: http://jaspar.genereg.net/
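
The two layouts carry the same information, which the following Python sketch tries to show by reading both into one structure. It is an illustration, not the parser used by ``motifcompile``: the function name ``parse_jaspar`` is hypothetical, and unlabeled rows are assumed to be in A, C, G, T order.

```python
BASES = ("A", "C", "G", "T")


def parse_jaspar(text):
    """Return {motif_id: {"name": str, "counts": {base: [values]}}}.

    Accepts both the unlabeled (2014) and the labeled, bracketed
    (2016/2018) row layouts shown above.
    """
    motifs = {}
    counts = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split(None, 1)
            counts = {}
            motifs[header[0]] = {
                "name": header[1] if len(header) > 1 else "",
                "counts": counts,
            }
            continue
        if counts is None:
            raise ValueError("matrix row found before any '>' header")
        tokens = line.replace("[", " ").replace("]", " ").split()
        if tokens[0] in BASES:
            # Labeled row: the first token names the base.
            base, values = tokens[0], tokens[1:]
        else:
            # Unlabeled row: assume rows appear in A, C, G, T order.
            base, values = BASES[len(counts)], tokens
        counts[base] = [float(v) for v in values]
    return motifs


jaspar_2014 = """>MA0004.1 Arnt
4 19 0 0 0 0
16 0 20 0 0 0
0 1 0 20 0 20
0 0 0 0 20 0
"""
jaspar_2016 = """>MA0004.1 Arnt
A [ 4 19 0 0 0 0 ]
C [ 16 0 20 0 0 0 ]
G [ 0 1 0 20 0 20 ]
T [ 0 0 0 0 20 0 ]
"""
old_style = parse_jaspar(jaspar_2014)
new_style = parse_jaspar(jaspar_2016)
print(old_style == new_style)
```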

Format of Gene annotation file
------------------------------

MAmotif supports RefSeq_ format for gene annotation.

.. _RefSeq: http://genome.ucsc.edu/cgi-bin/hgTables

MAmotif Output
==============

After MAmotif finishes running, all output files are written to the directory you specified with the ``-o`` argument.

Main output
-----------

::

    1. Motif Name
    2. Target Number: Number of motif-present peaks
    3. Average of Target M-value: Average M-value of motif-present peaks
    4. Deviation of Target M-value: M-value Std of motif-present peaks
    5. Non-target Number: Number of motif-absent peaks
    6. Average of Non-target M-value: Average M-value of motif-absent peaks
    7. Deviation of Non-target M-value: M-value Std of motif-absent peaks
    8. T-test Statistics: T-Statistics for M-values of motif-present peaks against motif-absent peaks
    9. T-test P-value: Right-tailed P-value of T-test
    10. T-test P-value By Benjamin correction
    11. RankSum-test Statistics
    12. RankSum-test P-value
    13. RankSum-test P-value By Benjamin correction
    14. Maximal P-value: Maximal corrected P-value of T-test and RankSum-test
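
The descriptive columns (2 to 7) and the final column (14) can be illustrated with a small Python sketch. This is not MAmotif's implementation: the function names and input values are made up for the example, and the use of the sample standard deviation is an assumption, since the column list above does not say which estimator is used. The statistical tests themselves are left out.

```python
from statistics import mean, stdev


def summarize_motif(m_values, has_motif):
    """Split peak M-values by motif presence and summarize both groups."""
    target = [m for m, hit in zip(m_values, has_motif) if hit]
    non_target = [m for m, hit in zip(m_values, has_motif) if not hit]
    return {
        "target_number": len(target),
        "target_mean": mean(target),
        # Assumption: sample standard deviation (n - 1 denominator).
        "target_std": stdev(target),
        "non_target_number": len(non_target),
        "non_target_mean": mean(non_target),
        "non_target_std": stdev(non_target),
    }


def maximal_p_value(t_test_p_corrected, ranksum_p_corrected):
    """Column 14: the larger of the two corrected P-values."""
    return max(t_test_p_corrected, ranksum_p_corrected)


# Illustrative input: six peaks, the first three contain the motif.
m_values = [1.0, 2.0, 3.0, -1.0, 0.0, 1.0]
has_motif = [True, True, True, False, False, False]
summary = summarize_motif(m_values, has_motif)
print(summary)
print(maximal_p_value(0.01, 0.03))
```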

MAnorm output
-------------

MAmotif will invoke MAnorm and output the normalization results and MA-plot for samples under comparison.

1. output_prefix_all_MAvalues.xls

This is the main output of MAnorm, which contains the M-A values and normalized read density of each peak.
Common peaks from the two samples are merged together::

    1. chr: chromosome name
    2. start: start position of the peak
    3. end: end position of the peak
    4. summit: summit position of the peak (relative to start)
    5. m_value: M value (log2 fold change) of normalized read densities under comparison
    6. a_value: A value (average signal strength) of normalized read densities under comparison
    7. p_value
    8. peak_group: indicates which sample the peak comes from
    9. normalized_read_density_in_sample1
    10. normalized_read_density_in_sample2



.. note::
    Coordinates in the .xls file use a **1-based** coordinate system.

2. output_filters/

* sample1_biased_peaks.bed
* sample2_biased_peaks.bed
* output_name_unbiased_peaks.bed

3. output_tracks/

* output_name_M_values.wig
* output_name_A_values.wig
* output_name_P_values.wig

4. output_figures/

* output_name_MA_plot_before_normalization.png
* output_name_MA_plot_after_normalization.png
* output_name_MA_plot_with_P-value.png
* output_name_read_density_on_common_peaks.png
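
For readers unfamiliar with M-A values, the ``m_value`` and ``a_value`` columns of the all_MAvalues table can be sketched as follows. This uses the conventional MA-plot definitions and is only an approximation of what MAnorm reports: the sample1-over-sample2 direction and the requirement of strictly positive densities are assumptions, and any pseudo-count handling in MAnorm is not reproduced.

```python
import math


def ma_values(density1, density2):
    """Return (M, A) for two positive normalized read densities.

    M is the log2 fold change (sample1 over sample2, an assumption here);
    A is the average log2 signal strength.
    """
    log_d1 = math.log2(density1)
    log_d2 = math.log2(density2)
    m_value = log_d1 - log_d2
    a_value = (log_d1 + log_d2) / 2
    return m_value, a_value


m_value, a_value = ma_values(8.0, 2.0)
print(m_value, a_value)
```

A positive M value then indicates stronger signal in sample1, and a negative M value stronger signal in sample2.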

MotifScan output
----------------

MAmotif will also output tables to summarize the enrichment of motifs and the motif target number and motif-score
of each peak region.

If you specify ``-s`` when running MAmotif, it will also output the genome coordinates of every motif target site.

1. motif_enrichment.csv

Enrichment of motifs in the given peaks compared to random regions. All analyzed motifs are listed and sorted by
enrichment p-value in ascending order.

2. peak_motif_score.csv

The table can be divided into two parts. The first 5 columns hold the region information, derived from the region file
that the user specified; the remaining columns hold the motif scores. Each motif has a score measuring the binding
affinity for each region sequence.

+------+-------+-------+--------+-------+------------+-------------+-----+
| chr  | start | end   | summit | score | IRF2.score | GATA2.score | ... |
+======+=======+=======+========+=======+============+=============+=====+
| chr1 | 10012 | 10256 | 10135  | 64.21 | 0.82       | 0.35        | ... |
+------+-------+-------+--------+-------+------------+-------------+-----+
| ...  |       |       |        |       |            |             |     |
+------+-------+-------+--------+-------+------------+-------------+-----+

3. peak_motif_tarnum.csv

This table gives the motif target number of each region for each motif. The file structure is similar to that of
peak_motif_score.csv, except that the motif columns contain the motif target number instead of the motif
score.

+------+-------+-------+--------+-------+-------------+--------------+-----+
| chr  | start | end   | summit | score | IRF2.number | GATA2.number | ... |
+======+=======+=======+========+=======+=============+==============+=====+
| chr1 | 10012 | 10256 | 10135  | 64.21 | 0.82        | 0.35         | ... |
+------+-------+-------+--------+-------+-------------+--------------+-----+
| ...  |       |       |        |       |             |              |     |
+------+-------+-------+--------+-------+-------------+--------------+-----+

4. motif_target_sites/*

This directory only appears when option ``-s`` is on. It contains the motif target site information of all candidate
motifs. Each motif has its own file, named [motif_name]_target_site.txt. The first 3 columns are the genome
coordinates of the motif target site. The 4th column is the corresponding target sequence, and the last column is the
motif score of this motif occurrence.

+------+-------+-------+----------+-------------+
| chr  | start | end   | sequence | motif score |
+======+=======+=======+==========+=============+
| chr1 | 10012 | 10256 | AATCGAT  | 0.57        |
+------+-------+-------+----------+-------------+
| ...  |       |       |          |             |
+------+-------+-------+----------+-------------+

5. plot/

Under this directory, a motif enrichment plot and a plot of the motif distribution relative to the peak summit/center
are generated for each motif.
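
As an illustration of the two-part layout of peak_motif_score.csv described above, the following Python sketch separates the region columns from the per-motif score columns. It assumes a comma-delimited file whose header row matches the column names in the table (``chr``, ``start``, ``end``, and ``<motif>.score``); the function name ``motif_scores`` is hypothetical, and the demo uses an in-memory string rather than a real output file.

```python
import csv
import io


def motif_scores(handle, suffix=".score"):
    """Return a list of ((chr, start, end), {motif: score}) tuples."""
    reader = csv.DictReader(handle)
    # Motif columns are recognized by their suffix, e.g. "IRF2.score".
    motif_columns = [c for c in reader.fieldnames if c.endswith(suffix)]
    rows = []
    for row in reader:
        region = (row["chr"], int(row["start"]), int(row["end"]))
        scores = {c[:-len(suffix)]: float(row[c]) for c in motif_columns}
        rows.append((region, scores))
    return rows


demo = io.StringIO(
    "chr,start,end,summit,score,IRF2.score,GATA2.score\n"
    "chr1,10012,10256,10135,64.21,0.82,0.35\n"
)
table = motif_scores(demo)
print(table)
```

Passing ``suffix=".number"`` would read peak_motif_tarnum.csv in the same way, under the same assumptions about its header.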
