Commit

Merge pull request #50 from fidelram/samtools_check
Samtools check
fidelram committed Feb 5, 2014
2 parents abf83ee + c6a4a33 commit 73cc9b0
Showing 8 changed files with 303 additions and 232 deletions.
80 changes: 14 additions & 66 deletions README.md
Original file line number Diff line number Diff line change
@@ -7,9 +7,7 @@ deepTools
deepTools addresses the challenge of handling the large amounts of data
that are now routinely generated from DNA sequencing centers. To do so, deepTools contains useful modules to process the mapped reads data to create coverage files in standard bedGraph and bigWig file formats. By doing so, deepTools allows the creation of **normalized coverage files** or the comparison between two files (for example, treatment and control). Finally, using such normalized and standardized files, multiple
**visualizations** can be created to identify enrichments with
functional annotations of the genome. For a gallery of images that
can be produced, see
http://f1000.com/posters/browse/summary/1094053
functional annotations of the genome.

For support, questions, or feature requests contact: [email protected]

@@ -31,53 +29,33 @@ deepTools are available for:

Details on the installation routines can be found here.

[Installation from source](#linux)
[General Installation](#general)

[Installation on a Mac](#mac)

[Troubleshooting](#trouble)

[Galaxy installation](#galaxy)


<a name="linux"/></a>
### Installation from source (Linux, command line)
<a name="general"/></a>
### General Installation

The easiest way to install deepTools is by __downloading the source file and using python pip__ or easy_install tools:
The easiest way to install deepTools is by using python `pip` or `easy_install tools`:

Requirements: Python 2.7, numpy, scipy installed
Requirements: Python 2.7, numpy, scipy (http://www.scipy.org/install.html) installed

Commands:

$ cd ~
$ export PYTHONPATH=$PYTHONPATH:~/lib/python2.7/site-packages
$ export PATH=$PATH:~/bin:~/.local/bin

If pip is not already available, install with:

$ easy_install --prefix=~ pip

Install deepTools and dependencies with pip:

$ pip install --user deeptools
$ pip install deeptools
Done.




__Another option is to clone the repository:__
__A second option is to clone the repository:__

$ git clone https://github.com/fidelram/deepTools

Then go to the deepTools directory, edit the `deepTools.cfg`
file and then run the install script a:

$ cd deepTools
$ vim deeptools/config/deepTools.cfg
$ python setup.py install


By default, the script will install python library and executable
By default, the script will install the python library and executable
codes globally, which means you need to be root or administrator of
the machine to complete the installation. If you need to
provide a nonstandard install prefix, or any other nonstandard
@@ -86,22 +64,16 @@ script.

$ python setup.py --help

To install under a specific location use:
For example, to install under a specific location use:

$ python setup.py install --prefix <target directory>

<a name="mac"></a>
### Installation on a MAC

Although the installation of deepTools itself is quite simple,
the installation of the required modules SciPy and NumPy demand
a bit of extra work.

The easiest way to install them ois together with the
The easiest way to get numpy and scipy dependencies is to install the
[Anaconda Scientific Python Distribution][]. After installation, open
a terminal ("Applications" --> "Terminal"): and type:

$ pip install deeptools
a terminal ("Applications" → "Terminal") and follow the [General Installation](#general)

If individual installation of the dependencies is preferred, follow
those steps:
@@ -112,35 +84,11 @@ Download the packages and install them using dmg images:
- http://sourceforge.net/projects/numpy/files/NumPy/
- http://sourceforge.net/projects/scipy/files/scipy/

Then install deepTools via the terminal ("Applications" --> "Terminal"):

$ cd ~
$ export PYTHONPATH=$PYTHONPATH:~/lib/python2.7/site-packages
$ export PATH=$PATH:~/bin:~/.local/bin:~/Library/Python/2.7/bin

If pip is not already available, install with:

$ easy_install --prefix=~ pip

Install deepTools and dependencies with pip:

$ pip install --user deeptools
Then open terminal ("Applications" → "Terminal")
and follow the [General Installation](#general)


<a name="trouble"/></a>
##### Troubleshooting
The easy_install command is provided by the python package setuptools.
You can download the package from https://pypi.python.org/pypi/setuptools

$ wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python
or the user-specific way:

$ wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
$ python ez_setup.py --user

Numpy/Scipy Installation:
http://www.scipy.org/install.html

<a name="galaxy"/></a>
#### Galaxy Installation
171 changes: 99 additions & 72 deletions bin/correctGCBias
@@ -18,79 +18,108 @@ from deeptools.utilities import getGC_content, tbitToBamChrName
from deeptools.countReadsPerBin import getFragmentFromRead
from deeptools import config as cfg
from deeptools import writeBedGraph, parserCommon, mapReduce
from deeptools import utilities

debug = 0

samtools = cfg.config.get('external_tools', 'samtools')
global_vars = dict()


def parseArguments(args=None):
parentParser = parserCommon.getParentArgParse()
requiredArgs = getRequiredArgs()
parser = argparse.ArgumentParser(
parents=[parentParser],
parents=[requiredArgs, parentParser],
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
description='Corrects the GC bias using Benjamini\'s method '
'[Benjamini & Speed (2012). Nucleic acids research, 40(10)]. '
'The tool computeGC bias needs to be run first.')
'The tool computeGC bias needs to be run first.',
usage='An example usage is:\n %(prog)s '
'-b file.bam --effectiveGenomeSize 2150570000 -g mm9.2bit '
'-l 200 --GCbiasFrequenciesFile freq.txt -o gc_corrected.bam '
'[options]',
conflict_handler='resolve',
add_help=False)

# define the arguments
parser.add_argument('--bamfile', '-b',
metavar='bam file',
help='Sorted Bam file to correct.',
required=True)

parser.add_argument('--effectiveGenomeSize',
help='The effective genome size is the portion '
'of the genome that is mappable. Large fractions of '
'the genome are stretches of NNNN that should be '
'discarded. Also, if repetitive regions were not '
'included in the mapping of reads, the effective '
'genome size needs to be adjusted accordingly. '
'Common values are: mm9: 2150570000, '
'hg19:2451960000, dm3:121400000 and ce10:93260000. '
'See Table 2 of http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0030377 '
'or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html '
'for several effective genome sizes.',
default=None,
type=int,
required=False)

parser.add_argument('--genome', '-g',
help='Genome in two bit format. Most genomes can be '
'found here: http://hgdownload.cse.ucsc.edu/gbdb/ '
'Search for the .2bit ending. Otherwise, fasta '
'files can be converted to 2bit using the UCSC '
'programm called faToTwoBit available for different '
'plattforms at '
'http://hgdownload.cse.ucsc.edu/admin/exe/',
metavar='two bit file',
required=True)
args = parser.parse_args(args)
if args.correctedFile.name.endswith('bam'):
if not cfg.checkProgram(samtools, 'view',
'http://samtools.sourceforge.net/'):
exit(1)
if args.correctedFile.name.endswith('bw'):
if not cfg.checkProgram('bedGraphToBigWig', '-h',
'http://hgdownload.cse.ucsc.edu/admin/exe/'):
exit(1)

return(args)

group = parser.add_argument_group('Output options')

group.add_argument('--GCbiasFrequenciesFile', '-freq',
help='Indicate the output file from '
'computeGCBias containing, '
'the observed and expected read frequencies per GC '
'content.',
type=argparse.FileType('r'),
metavar='FILE',
required=True)
def getRequiredArgs():
parser = argparse.ArgumentParser(add_help=False)

group.add_argument('--correctedFile', '-o',
help='Name of the corrected file. The ending will '
'be used to decide the output file format. The options '
'are ".bam", ".bw" for a bigWig file, ".bg" for a '
'bedgraph file.',
metavar='FILE',
type=argparse.FileType('w'),
required=True)
required = parser.add_argument_group('Required arguments')

# define the arguments
required.add_argument('--bamfile', '-b',
metavar='bam file',
help='Sorted Bam file to correct.',
required=True)

required.add_argument('--effectiveGenomeSize',
help='The effective genome size is the portion '
'of the genome that is mappable. Large fractions of '
'the genome are stretches of NNNN that should be '
'discarded. Also, if repetitive regions were not '
'included in the mapping of reads, the effective '
'genome size needs to be adjusted accordingly. '
'Common values are: mm9: 2150570000, '
'hg19:2451960000, dm3:121400000 and ce10:93260000. '
'See Table 2 of '
'http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030377 '
'or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html '
'for several effective genome sizes. This value is '
'needed to detect enriched regions that, if not '
'discarded can bias the results.',
default=None,
type=int,
required=True)

required.add_argument('--genome', '-g',
help='Genome in two bit format. Most genomes can be '
'found here: http://hgdownload.cse.ucsc.edu/gbdb/ '
'Search for the .2bit ending. Otherwise, fasta '
'files can be converted to 2bit using the UCSC '
'programm called faToTwoBit available for different '
'plattforms at '
'http://hgdownload.cse.ucsc.edu/admin/exe/',
metavar='two bit file',
required=True)

required.add_argument('--GCbiasFrequenciesFile', '-freq',
help='Indicate the output file from '
'computeGCBias containing, '
'the observed and expected read frequencies per GC '
'content.',
type=argparse.FileType('r'),
metavar='FILE',
required=True)

output = parser.add_argument_group('Output options')
output.add_argument('--correctedFile', '-o',
help='Name of the corrected file. The ending will '
'be used to decide the output file format. The options '
'are ".bam", ".bw" for a bigWig file, ".bg" for a '
'bedgraph file.',
metavar='FILE',
type=argparse.FileType('w'),
required=True)

args = parser.parse_args(args)
# define the optional arguments
optional = parser.add_argument_group('Optional arguments')
optional.add_argument("--help", "-h", action="help",
help="show this help message and exit")

return(args)
return parser


def getReadGCcontent(tbit, read, fragmentLength, chrNameBit):
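The hunk above moves the argument definitions into a parent parser that is built with `add_help=False` and then passed to the main parser through `parents=[...]`, with `--help` re-declared by hand so that it is listed under its own group. A minimal sketch of that pattern, assuming only the standard `argparse` module: the function names `get_required_args` and `build_parser` are hypothetical, the argument list is trimmed to two options, the `FileType` handling of the original is left out, and the sketch is written to run on current Python rather than the Python 2.7 the project targets.

```python
import argparse


def get_required_args():
    # Parent parser: add_help=False keeps it from registering -h/--help
    # automatically, so the option can be declared in a custom group below.
    parser = argparse.ArgumentParser(add_help=False)

    required = parser.add_argument_group('Required arguments')
    required.add_argument('--bamfile', '-b', metavar='BAM',
                          help='Sorted BAM file to correct.',
                          required=True)

    output = parser.add_argument_group('Output options')
    output.add_argument('--correctedFile', '-o', metavar='FILE',
                        help='Name of the corrected file.',
                        required=True)

    optional = parser.add_argument_group('Optional arguments')
    optional.add_argument('--help', '-h', action='help',
                          help='show this help message and exit')
    return parser


def build_parser():
    # The child parser inherits the parent's groups and options.
    # conflict_handler='resolve' lets a later definition of an option
    # replace an earlier one instead of raising an error.
    return argparse.ArgumentParser(
        parents=[get_required_args()],
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        conflict_handler='resolve',
        add_help=False)
```

Calling `build_parser().parse_args(['-b', 'reads.bam', '-o', 'out.bg'])` returns a namespace with `bamfile` and `correctedFile` set, and the help output groups the options under the three custom headings.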
@@ -149,6 +178,7 @@ def writeCorrected_wrapper(args):
def writeCorrected_worker(chrNameBam, chrNameBit, start, end, step):
r"""writes a bedgraph file containing the GC correction of
a region from the genome
>>> test = Tester()
>>> tempFile = writeCorrected_worker(*test.testWriteCorrectedChunk())
>>> open(tempFile, 'r').readlines()
@@ -221,7 +251,7 @@ def writeCorrected_worker(chrNameBam, chrNameBit, start, end, step):
if i == 0:
return None

_file = tempfile.NamedTemporaryFile(delete=False)
_file = open(utilities.getTempFileName(suffix='.bg'), 'w')
# save in bedgraph format
for bin in xrange(0, len(cvg_corr), step):
value = np.mean(cvg_corr[bin:min(bin + step, end)])
@@ -267,6 +297,9 @@ def writeCorrectedSam_worker(chrNameBam, chrNameBit, start, end,
r"""
Writes a SAM file, deleting and adding some reads in order to compensate
for the GC bias. **This is a stochastic method.**
First, check if samtools can be executed, otherwise the test will fail
>>> resp = cfg.checkProgram(samtools, 'view', '')
>>> np.random.seed(1)
>>> test = Tester()
>>> args = test.testWriteCorrectedSam()
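The doctest above calls `cfg.checkProgram(samtools, 'view', '')` before exercising the SAM writer, and `parseArguments` uses the same call to exit early when samtools or bedGraphToBigWig is missing. The body of `checkProgram` is not part of the files shown here, so the following is only a guess at the general idea: try to launch the program and report where to get it if that fails. The name `check_program` and its messages are hypothetical.

```python
import subprocess
import sys


def check_program(program, args, where_to_download):
    """Return True if `program` can be launched, otherwise print a hint
    to stderr and return False. Hypothetical stand-in, not deepTools code."""
    try:
        # Launch the program once and discard its output; the exit status
        # is ignored because only "can it be executed" matters here.
        subprocess.Popen([program, args],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE).communicate()
    except OSError:
        sys.stderr.write(
            "The program '{0}' could not be executed. Check that it is "
            "installed and on the PATH. It can be obtained from {1}\n"
            .format(program, where_to_download))
        return False
    return True
```

A caller can then write `if not check_program('samtools', 'view', 'http://samtools.sourceforge.net/'): exit(1)`, which mirrors how the diff uses the real function.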
@@ -297,16 +330,7 @@ def writeCorrectedSam_worker(chrNameBam, chrNameBit, start, end,
tbit = twobit.TwoBitFile(open(global_vars['2bit']))

bam = pysam.Samfile(global_vars['bam'])
# is /dev/shm available?
# working in this directory speeds the process
try:
_file = tempfile.NamedTemporaryFile(suffix=".sam",
dir='/dev/shm', delete=False)
except OSError:
_file = tempfile.NamedTemporaryFile(suffix=".sam", delete=False)

tempFileName = _file.name
_file.close()
tempFileName = utilities.getTempFileName(suffix='.sam')

outfile = pysam.Samfile(tempFileName, 'wh', template=bam)
startTime = time.time()
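This hunk replaces the inline "try `/dev/shm`, otherwise fall back to the default temporary directory" logic with a single call to `utilities.getTempFileName(suffix='.sam')`. The helper itself is defined in a file that is not shown here, so the sketch below only assumes it keeps the behaviour of the removed lines. The name `get_temp_file_name` and the `preferred_dir` parameter are hypothetical.

```python
import os
import tempfile


def get_temp_file_name(suffix='', preferred_dir='/dev/shm'):
    """Create an empty temporary file and return its name.

    Hypothetical stand-in: prefers `preferred_dir` (memory-backed on many
    Linux systems) and falls back to the default temporary directory.
    """
    try:
        handle = tempfile.NamedTemporaryFile(suffix=suffix,
                                             dir=preferred_dir,
                                             delete=False)
    except OSError:
        # preferred_dir is missing or not writable
        handle = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    name = handle.name
    handle.close()
    # delete=False leaves the file on disk; the caller must remove it.
    return name
```

Returning a name rather than an open handle suits callers such as `pysam.Samfile(tempFileName, 'wh', ...)` that open the path themselves.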
@@ -519,29 +543,32 @@ def main(args):
if args.correctedFile.name.endswith('bg') or \
args.correctedFile.name.endswith('bw'):

_temp_bg_file = tempfile.NamedTemporaryFile()
_temp_bg_file_name = utilities.getTempFileName(suffix='_all.bg')
if len(mp_args) > 1 and args.numberOfProcessors > 1:

res = pool.map_async(writeCorrected_wrapper, mp_args).get(9999999)
else:
res = map(writeCorrected_wrapper, mp_args)

# concatenate intermediary bedgraph files
_temp_bg_file = open(_temp_bg_file_name, 'w')
for tempFileName in res:
if tempFileName:
# concatenate all intermediate tempfiles into one
# bedgraph file
shutil.copyfileobj(open(tempFileName, 'rb'), _temp_bg_file)
os.remove(tempFileName)

_temp_bg_file.close()
args.correctedFile.close()
chromSizes = [(x, bit[x].size) for x in bit.keys()]

if args.correctedFile.name.endswith('bg'):
os.system("mv {} {}".format(_temp_bg_file,
args.correctedFile.name))
shutil.move(_temp_bg_file_name, args.correctedFile.name)

else:
writeBedGraph.bedGraphToBigWig(chromSizes, _temp_bg_file.name,
chromSizes = [(x, bit[x].size) for x in bit.keys()]
writeBedGraph.bedGraphToBigWig(chromSizes, _temp_bg_file_name,
args.correctedFile.name)
_temp_bg_file.close()
os.remove(_temp_bg_file)


class Tester():
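The `main` hunk above collects the per-region bedGraph files returned by the workers, appends them to one temporary file, and deletes each piece as it goes. A self-contained sketch of that concatenation step, using only the standard library: `concatenate_chunks` is a hypothetical name, and `None` entries are skipped because the worker returns `None` for regions without data. Note that `os.remove` takes a path, so the sketch removes files by name rather than by file object.

```python
import os
import shutil


def concatenate_chunks(chunk_names, out_name):
    """Append every existing chunk file to `out_name`, in order,
    and delete each chunk after it has been copied."""
    with open(out_name, 'wb') as out:
        for chunk_name in chunk_names:
            if not chunk_name:
                # workers return None for empty regions
                continue
            with open(chunk_name, 'rb') as chunk:
                shutil.copyfileobj(chunk, out)
            os.remove(chunk_name)
    return out_name
```

Once the combined file is closed it can be moved to its final name with `shutil.move`, as the new code does for `.bg` output, or handed to a converter and then removed.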
2 changes: 1 addition & 1 deletion deeptools/_version.py
@@ -2,4 +2,4 @@
# This file is originally generated from Git information by running 'setup.py
# version'. Distribution tarballs contain a pre-generated copy of this file.

__version__ = '1.5.6-45-g7190dd0'
__version__ = '1.5.7'