Update docs

shao-lab · Mar 28, 2018 · 9a50717
1 parent 6d147cb
commit 9a50717
Showing 3 changed files with 355 additions and 4 deletions.
diff --git a/README.rst b/README.rst
@@ -35,7 +35,7 @@ Workflow
 Documentation
 -------------
 
-To see the full documentation of MAmoitf, please refer to: http://bioinfo.sibs.ac.cn/shaolab/mamotif/index.php
+To see the full documentation of MAmotif, please refer to: http://mamotif.readthedocs.io/en/latest/
 
 Installation
 ------------
@@ -137,8 +137,8 @@ MAnorm output
 MAmotif will invoke MAnorm and output the normalization results and MA-plot for samples under comparison.
 
 
-Motif output
-^^^^^^^^^^^^
+MotifScan output
+^^^^^^^^^^^^^^^^
 
 MAmotif will also output tables to summarize the enrichment of motifs and the motif target number and motif-score
 of each peak region.

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -4,7 +4,7 @@ MAmotif
 .. image:: https://travis-ci.org/shao-lab/MAmotif.svg?branch=master
    :alt: Travis Build
    :target: https://travis-ci.org/shao-lab/MAmotif
-.. image:: https://readthedocs.org/projects/mamoitf/badge/?version=latest
+.. image:: https://readthedocs.org/projects/mamotif/badge/?version=latest
    :alt: Documentation Status
    :target: http://mamotif.readthedocs.io/en/latest/?badge=latest
 .. image:: https://img.shields.io/pypi/v/mamotif.svg

diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
@@ -9,3 +9,354 @@ Tutorial
 
 Installation
 ============
+
+Like many other Python packages and bioinformatics software, MAmotif can be obtained easily from PyPI_ or Bioconda_ (WIP).
+The commands below show how to install the latest release of MAmotif in a convenient way. Alternatively, you can
+install it from source code.
+
+Prerequisites
+-------------
+
+.. tip::
+   MAmotif is implemented under **Python 2.7**; support for **Python 3.X** is planned for future updates.
+
+* **Python 2.7**
+* setuptools
+* numpy
+* pandas
+* statsmodels
+* scipy
+* matplotlib
+
+Install with pip
+----------------
+The latest release of MAmotif is available on PyPI_ and can be installed via ``pip``::
+
+    $ pip install mamotif
+
+.. _PyPI: https://pypi.python.org/pypi/MAmotif
+
+Install with conda (WIP)
+------------------------
+
+You can also install MAmotif with conda_ through the Bioconda_ channel::
+
+   $ conda install -c bioconda mamotif
+
+.. _conda: https://conda.io/docs/
+.. _Bioconda: https://bioconda.github.io/
+
+Install from source code
+------------------------
+
+Installing MAmotif with ``pip`` or ``conda`` is highly recommended. If you prefer to install it from source code,
+follow the steps below:
+
+The source code of MAmotif is hosted on GitHub_, and setuptools_ is required for installation.
+
+.. _setuptools: https://setuptools.readthedocs.io/en/latest/
+.. _GitHub: https://github.com/shao-lab/MAmotif
+
+First, clone the repository of MAmotif::
+
+   $ git clone https://github.com/shao-lab/MAmotif.git
+
+Then, install MAmotif in the source directory::
+
+   $ cd MAmotif
+   $ python setup.py install
+
+.. note::
+   * You may need to install all dependencies listed in ``requirements.txt``.
+   * You may need to modify ``$PATH`` and ``$PYTHONPATH`` manually to make it work.
+
+Galaxy Installation
+-------------------
+
+WIP
+
+Usage of MAmotif
+================
+
+To check whether MAmotif is properly installed, you can inspect the version of MAmotif with the ``-v/--version`` option::
+
+  $ mamotif -v
+  $ mamotif --version
+
+Command-Line Usage
+------------------
+
+You need to build some prerequisites before running MAmotif:
+
+Build genomes
+^^^^^^^^^^^^^
+
+Preprocess sequences and genome-wide nucleotide frequency for the corresponding genome assembly.
+
+::
+
+    $ genomecompile [-h] [-v] -G hg19.fa -o hg19_genome
+
+**Note:** You only need to run this command once for each genome.
+
+Options
+"""""""
+
+-h, --help     Show help message and exit.
+-v, --version  Show version number and exit.
+-G             **[Required]** Genome sequences in fasta format.
+-o             **[Required]** Path to write the output files.
+
+Build motifs (Optional)
+^^^^^^^^^^^^^^^^^^^^^^^
+
+**Note:** MAmotif provides some preprocessed motif PWM files under **data/motif** of the MotifScan package.
+
+Build motif PWM/motif-score cutoff for custom motifs that are not included in our pre-compiled motif collection:
+
+::
+
+    $ motifcompile [-h] [-v] -M motif_pwm_demo.txt -g hg19_genome -o hg19_motif
+
+Options
+"""""""
+
+-h, --help     Show help message and exit.
+-v, --version  Show version number and exit.
+-M             **[Required]** Raw motif PFM (Position Frequency Matrix) file.
+-g             **[Required]** Path of pre-compiled genome directory (generated by ``genomecompile``).
+-o             **[Required]** Prefix of output file.
+
+Run MAmotif
+^^^^^^^^^^^
+
+MAmotif provides a console script, ``mamotif``, for running the program. The basic usage is as follows:
+
+::
+
+    $ mamotif --p1 sample1_peaks.bed --p2 sample2_peaks.bed --r1 sample1_reads.bed --r2 sample2_reads.bed -g hg19_genome \
+        -m hg19_motif_p1e-4.txt -o sample1_vs_sample2
+
+.. tip::
+    Please use ``-h/--help`` for the details of all options.
+
+Options
+"""""""
+
+-h, --help     Show help message and exit.
+-v, --version  Show version number and exit.
+--p1           **[Required]** Peaks file of sample1.
+--p2           **[Required]** Peaks file of sample2.
+--r1           **[Required]** Reads file of sample1.
+--r2           **[Required]** Reads file of sample2.
+--s1           Reads shiftsize of sample1. Default: 100
+--s2           Reads shiftsize of sample2. Default: 100
+-g             **[Required]** Path of pre-compiled genome directory (generated by ``genomecompile``).
+-m             **[Required]** Pre-compiled motif file (generated by ``motifcompile``).
+-a             Gene annotation file, which is used to generate random controls when performing enrichment analysis.
+-w             Width of window to calculate read density. Default: 1000
+-d             Summit-to-summit distance cutoff for common peaks. Default: ``-w``/2
+-n             Number of simulations to test the enrichment of peaks overlap between two samples.
+--m_cutoff     *M-value* cutoff to distinguish biased (sample-specific) peaks from unbiased peaks.
+-p             *P-value* cutoff to define biased peaks.
+-l             Motif list file.
+-r             Perform MAmotif on {all,promoter,distal} regions.
+--upstream     Upstream distance to TSS to define promoter regions.
+--downstream   Downstream distance to TSS to define promoter regions.
+--peak_length  The length of input regions to perform motif scan around peak summit/midpoint.
+--negative     Using negative test (sample2 vs sample1).
+--correction   Type of multiple test correction [benjamin, bonferroni].
+-s             Detailed output mode. Write the normalization results for original peaks and the genome coordinates
+               of target sites for each motif.
+-o             Comparison name, this is used as the folder name and prefix of output files.
+
+Input Format
+============
+
+Format of Peaks file
+--------------------
+
+Standard **BED** and **MACS xls** formats are supported. Other supported formats are listed below::
+
+  * 3-column tab-separated format
+
+    # chr   start end
+      chr1  2345  4345
+      chr1  3456  5456
+      chr2  6543  8543
+
+  * 4-column tab-separated format
+
+    # chr   start end   summit
+      chr1  2345  4345  254
+      chr1  3456  5456  127
+      chr2  6543  8543  302
+
+.. note::
+   The fourth column, **summit**, is the position relative to **start**.
+
+
+Format of Reads file
+--------------------
+
+Only **BED** format is supported for now. More formats will be supported in future updates.
+
+Format of Motif PWM file
+------------------------
+
+MAmotif supports JASPAR_ 2014/2016/2018 motif matrix format.
+
+JASPAR2014::
+
+   >MA0004.1 Arnt
+   4       19      0       0       0       0
+   16      0       20      0       0       0
+   0       1       0       20      0       20
+   0       0       0       0       20      0
+
+JASPAR2016/2018::
+
+   >MA0004.1	Arnt
+   A  [     4     19      0      0      0      0 ]
+   C  [    16      0     20      0      0      0 ]
+   G  [     0      1      0     20      0     20 ]
+   T  [     0      0      0      0     20      0 ]
+
+.. _JASPAR: http://jaspar.genereg.net/
+
+Format of Gene annotation file
+------------------------------
+
+MAmotif supports RefSeq_ format for gene annotation.
+
+.. _RefSeq: http://genome.ucsc.edu/cgi-bin/hgTables
+
+MAmotif Output
+==============
+
+After MAmotif finishes running, all output files are written to the directory you specified with the ``-o`` argument.
+
+Main output
+-----------
+
+::
+
+    1.Motif Name
+    2.Target Number: Number of motif-present peaks
+    3.Average of Target M-value: Average M-value of motif-present peaks
+    4.Deviation of Target M-value: M-value Std of motif-present peaks
+    5.Non-target Number: Number of motif-absent peaks
+    6.Average of Non-target M-value: Average M-value of motif-absent peaks
+    7.Deviation of Non-target M-value: M-value Std of motif-absent peaks
+    8.T-test Statistics: T-Statistics for M-values of motif-present peaks against motif-absent peaks
+    9.T-test P-value: Right-tailed P-value of T-test
+    10.T-test P-value By Benjamin correction
+    11.RankSum-test Statistics
+    12.RankSum-test P-value
+    13.RankSum-test P-value By Benjamin correction
+    14.Maximal P-value: Maximal corrected P-value of T-test and RankSum-test
+
+MAnorm output
+-------------
+
+MAmotif will invoke MAnorm and output the normalization results and MA-plot for samples under comparison.
+
+1. output_prefix_all_MAvalues.xls
+
+This is the main output of MAnorm, which contains the M-A values and normalized read density of each peak.
+Common peaks from the two samples are merged together::
+
+    1.chr: chromosome name
+    2.start: start position of the peak
+    3.end: end position of the peak
+    4.summit: summit position of the peak (relative to start)
+    5.m_value: M value (log2 Fold change) of normalized read densities under comparison
+    6.a_value: A value (average signal strength) of normalized read densities under comparison
+    7.p_value
+    8.peak_group: indicates where the peak comes from
+    9.normalized_read_density_in_sample1
+    10.normalized_read_density_in_sample2
+
+
+.. note::
+   Coordinates in the .xls file use a **1-based** coordinate system.
+
+2. output_filters/
+
+  * sample1_biased_peaks.bed
+  * sample2_biased_peaks.bed
+  * output_name_unbiased_peaks.bed
+
+3. output_tracks/
+
+  * output_name_M_values.wig
+  * output_name_A_values.wig
+  * output_name_P_values.wig
+
+4. output_figures/
+
+  * output_name_MA_plot_before_normalization.png
+  * output_name_MA_plot_after_normalization.png
+  * output_name_MA_plot_with_P-value.png
+  * output_name_read_density_on_common_peaks.png
+
+MotifScan output
+----------------
+
+MAmotif will also output tables to summarize the enrichment of motifs and the motif target number and motif-score
+of each peak region.
+
+If you specify ``-s`` when running MAmotif, it will also output the genome coordinates of every motif target site.
+
+1. motif_enrichment.csv
+
+Enrichment of motifs in the given peaks compared to random regions. All analyzed motifs are listed and sorted by
+enrichment p-value in ascending order.
+
+2. peak_motif_score.csv
+
+The table has two parts. The first 5 columns hold the region information, taken from the region file that the user
+specified. The remaining columns hold the motif scores: each motif has a score measuring the binding affinity for each
+region sequence.
+
++------+-------+-------+--------+-------+------------+-------------+-----+
+| chr  | start | end   | summit | score | IRF2.score | GATA2.score | ... |
++======+=======+=======+========+=======+============+=============+=====+
+| chr1 | 10012 | 10256 |  10135 | 64.21 |    0.82    |    0.35     | ... |
++------+-------+-------+--------+-------+------------+-------------+-----+
+| ...  |       |       |        |       |            |             |     |
++------+-------+-------+--------+-------+------------+-------------+-----+
+
+3. peak_motif_tarnum.csv
+
+This table gives the number of motif targets in each region for each motif. The file structure is similar to that of
+peak_motif_score.csv, except that the motif columns contain the motif target number instead of the motif
+score.
+
++------+-------+-------+--------+-------+-------------+--------------+-----+
+| chr  | start | end   | summit | score | IRF2.number | GATA2.number | ... |
++======+=======+=======+========+=======+=============+==============+=====+
+| chr1 | 10012 | 10256 |  10135 | 64.21 |    0.82     |    0.35      | ... |
++------+-------+-------+--------+-------+-------------+--------------+-----+
+| ...  |       |       |        |       |             |              |     |
++------+-------+-------+--------+-------+-------------+--------------+-----+
+
+4. motif_target_sites/*
+
+This directory only appears when the ``-s`` option is on. It contains the target site information of all candidate motifs.
+Each motif has its own file named [motif_name]_target_site.txt. The first 3 columns are the genome coordinates of the
+motif target site. The 4th column is the corresponding target sequence, and the last column is the motif score of
+this motif occurrence.
+
++------+-------+-------+----------+-------------+
+| chr  | start | end   | sequence | motif score |
++======+=======+=======+==========+=============+
+| chr1 | 10012 | 10256 |  AATCGAT |     0.57    |
++------+-------+-------+----------+-------------+
+| ...  |       |       |          |             |
++------+-------+-------+----------+-------------+
+
+5. plot/
+
+Under this directory, a motif enrichment plot and a plot of the motif distribution relative to the peak summit/center
+are generated for each motif.
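
The tutorial's peaks-file section notes that the fourth column, **summit**, is given relative to **start**. The sketch below shows one way a reader might turn rows in the 3- or 4-column format into absolute summit coordinates. It is not MAmotif's own parser: the names ``parse_peak_line`` and ``read_peaks`` are hypothetical, and falling back to the region midpoint for 3-column rows is an assumption suggested only by the tutorial's mention of "peak summit/midpoint".

```python
def parse_peak_line(line):
    """Parse one row of the 3- or 4-column peak format (hypothetical helper)."""
    # Split on any whitespace so that tab-separated files and the
    # space-aligned examples in the tutorial are both accepted.
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least 3 columns: chr, start, end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if len(fields) >= 4:
        # Per the tutorial, the summit column is relative to start.
        summit = start + int(fields[3])
    else:
        # Assumption: use the region midpoint when no summit is given.
        summit = (start + end) // 2
    return chrom, start, end, summit


def read_peaks(lines):
    """Collect parsed peaks, skipping blank lines and '#' header lines."""
    peaks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        peaks.append(parse_peak_line(line))
    return peaks


example = [
    "# chr\tstart\tend\tsummit",
    "chr1\t2345\t4345\t254",
    "chr2\t6543\t8543",
]
peaks = read_peaks(example)
for peak in peaks:
    print(peak)
```

Standard BED files often carry a name in the fourth column, so this sketch only applies to the custom 4-column layout described in the tutorial.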