INNUca v4.2.1 - SPAdes v3.14.0
* Change SPAdes version
  * Add SPAdes v3.14.0 and remove v3.10.1
  * Incorporate SPAdes --isolate option for estimated coverage >= 100x
* Change MLST QA/QC
  * Samples whose species has a known MLST scheme, but for which no scheme could be found, now raise a warning instead of failing
* Add more statistics
  * Save total number of reads and bp sequenced
* Change Docker image
  * Change base image to perl:5.30-slim-stretch. This allows using the most recent Perl version while keeping an older Linux distribution for compatibility with old kernels.
  * Install procps, which provides the free command used to check memory usage
  * Do a JDK headless installation
  * Add any2fasta (_mlst_ dependency)
* Minor changes
  * Add Docker image statistics to README
* Minor fixes
  * Correct _mlst_ installation
  * Check if the _mlst_ novel alleles file exists before cleaning it
  * Catch subprocess error when program to run is not installed
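
The last fix listed above (catching the subprocess error when the program to run is not installed) can be illustrated with a minimal sketch. This is not INNUca's actual code: `run_command` is a hypothetical helper, and the sketch assumes Python 3.5+ so that `subprocess.run` is available.

```python
import subprocess


def run_command(command):
    """Run a command given as a list of strings (hypothetical helper).

    Returns (run_successfully, stdout, stderr). A missing executable is
    reported as a failed run instead of raising an exception.
    """
    try:
        completed = subprocess.run(command,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE,
                                   universal_newlines=True)
    except OSError as error:
        # FileNotFoundError (program not installed) is a subclass of OSError
        return False, '', 'Could not run {}: {}'.format(command[0], error)
    return completed.returncode == 0, completed.stdout, completed.stderr
```

Returning a failure tuple rather than letting the exception propagate lets the caller record the problem in a report and continue with the next step or sample.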
miguelpmachado authored Feb 18, 2020
1 parent cc2b6f5 commit ee1f1dc
Showing 313 changed files with 7,873 additions and 5,817 deletions.
6 changes: 4 additions & 2 deletions Docker/Dockerfile
@@ -1,6 +1,6 @@
FROM perl:5.30-slim-stretch
MAINTAINER Miguel Machado <[email protected]>
LABEL version="4.2.1-01"
LABEL version="4.2.2-01"

WORKDIR /NGStools/

@@ -46,6 +46,8 @@ ENV PATH="/NGStools/ncbi-blast-2.9.0+/bin:/NGStools/any2fasta:${PATH}"
# --- mlst ----
RUN git clone https://github.com/tseemann/mlst.git
ENV PATH="/NGStools/mlst/bin:${PATH}"
# Update Clostridium to Clostridioides
RUN echo -e 'cdifficile\tClostridioides\tdifficile' >> /NGStools/mlst/db/scheme_species_map.tab

# --- ReMatCh ---
# TODO: to be used after converting INNUca to Python v3
@@ -60,7 +62,7 @@ ENV PATH="/NGStools/ReMatCh/ReMatCh/src/samtools-1.3.1/bin:/NGStools/ReMatCh/ReM
# --- INNUca ---
RUN git clone https://github.com/B-UMMI/INNUca.git && \
pip install setuptools
ENV PATH="/NGStools/INNUca/src/fastqc_v0.11.5:/NGStools/INNUca/src/pilon_v1.23:/NGStools/INNUca/src/SPAdes-3.13.0-Linux/bin:/NGStools/INNUca/src/Trimmomatic-0.38:/NGStools/INNUca:${PATH}"
ENV PATH="/NGStools/INNUca/src/fastqc_v0.11.5:/NGStools/INNUca/src/pilon_v1.23:/NGStools/INNUca/src/SPAdes-3.14.0-Linux/bin:/NGStools/INNUca/src/Trimmomatic-0.38:/NGStools/INNUca:${PATH}"

# fixing permissions for MLST update
RUN chmod +x /NGStools/INNUca/Docker/update_mlst_db.sh && chmod o+wr /NGStools/mlst/scripts/ && chmod -R o+wr /NGStools/mlst/db/ && sed -i "s#OUTDIR=pubmlst#OUTDIR=/NGStools/mlst/scripts/pubmlst#1" /NGStools/mlst/scripts/mlst-download_pub_mlst
28 changes: 16 additions & 12 deletions Docker/README.md
@@ -1,25 +1,29 @@
INNUca.py - Docker
===============
[![dockeri.co](https://dockeri.co/image/ummidock/innuca)](https://hub.docker.com/r/ummidock/innuca)

# INNUca.py - Docker

INNUca - Reads Control and Assembly

*INNUENDO quality control of reads, de novo assembly and contigs quality assessment, and possible contamination search*

<https://github.com/B-UMMI/INNUca>

<https://hub.docker.com/r/ummidock/innuca>


This is a dockerfile for using INNUca, with all dependencies already installed.

Within this container you can find:
- Debian Stretch (9)
- Perl v5.30
- Perl v5.30.1
- git v2.11.0
- Python v2.7
- Java-JDK v1.8.0_40 headless
- [Blast+](https://blast.ncbi.nlm.nih.gov/Blast.cgi) v2.9.0
- [mlst](https://github.com/tseemann/mlst) v2.18.0
- [mlst](https://github.com/tseemann/mlst) v2.18.1
- [ReMatCh](https://github.com/B-UMMI/ReMatCh) v4.1.0
- [Kraken](https://ccb.jhu.edu/software/kraken/) v2.0.7
- [INNUca](https://github.com/B-UMMI/INNUca) v4.2.1
- [INNUca](https://github.com/B-UMMI/INNUca) v4.2.2



@@ -30,37 +34,37 @@ Within [play-with-docker](http://labs.play-with-docker.com/) webpage click on **
will open with a big counter on the upper left corner. Click on **+ add new instance** and a terminal like instance should be generated on the right. On
this terminal you can load this docker image as follows:

`docker pull ummidock/innuca:4.2.1-01`
`docker pull ummidock/innuca:4.2.2-01`

#### Build this docker on your local machine

For this, docker needs to be installed on your machine. Instructions for this can be found [here](https://docs.docker.com/engine/installation/).

##### Using DockerHub (automated build image)

`docker pull ummidock/innuca:4.2.1-01`
`docker pull ummidock/innuca:4.2.2-01`

##### Using GitHub (build docker image)

1) `git clone https://github.com/B-UMMI/INNUca.git`
2) `docker build -t ummidock/innuca:4.2.1-01 ./INNUca/Docker/`
2) `docker build -t ummidock/innuca:4.2.2-01 ./INNUca/Docker/`

### Run (using automated build image)
docker run --rm -u $(id -u):$(id -g) -it -v /local/folder/fastq_data:/data/ ummidock/innuca:4.2.1-01 INNUca.py --speciesExpected "Streptococcus agalactiae" --genomeSizeExpectedMb 2.1 --inputDirectory /data/ --outdir /data/innuca_output/ --threads 8 --maxNumberContigs 100
docker run --rm -u $(id -u):$(id -g) -it -v /local/folder/fastq_data:/data/ ummidock/innuca:4.2.2-01 INNUca.py --speciesExpected "Streptococcus agalactiae" --genomeSizeExpectedMb 2.1 --inputDirectory /data/ --outdir /data/innuca_output/ --threads 8 --maxNumberContigs 100

### udocker

> "A basic user tool to execute simple docker containers in user space without requiring root privileges.". From [here](https://github.com/indigo-dc/udocker).
```bash
# Get Docker image
udocker pull ummidock/innuca:4.2.1-01
udocker pull ummidock/innuca:4.2.2-01

# Create container (only needed to be done once)
udocker create --name=innuca_4-2-1_01 ummidock/innuca:4.2.1-01
udocker create --name=innuca_4-2-2_01 ummidock/innuca:4.2.2-01

# Run INNUca
udocker run --user $(id -u):$(id -g) -v /local/folder/fastq_data:/data/ innuca_4-2-1_01 INNUca.py --speciesExpected "Streptococcus agalactiae" --genomeSizeExpectedMb 2.1 --inputDirectory /data/ --outdir /data/innuca_output/ --threads 8 --maxNumberContigs 100
udocker run --user $(id -u):$(id -g) -v /local/folder/fastq_data:/data/ innuca_4-2-2_01 INNUca.py --speciesExpected "Streptococcus agalactiae" --genomeSizeExpectedMb 2.1 --inputDirectory /data/ --outdir /data/innuca_output/ --threads 8 --maxNumberContigs 100
```
More examples on how to use **udocker** can be found in **udocker** [GitHub page](https://github.com/indigo-dc/udocker)

81 changes: 45 additions & 36 deletions INNUca.py
@@ -11,7 +11,7 @@
Copyright (C) 2018 Miguel Machado <[email protected]>
Last modified: November 25, 2019
Last modified: February 05, 2020
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -91,7 +91,7 @@ def include_rematch_dependencies_path(do_not_use_provided_software):


def main():
version = '4.2.1'
version = '4.2.2'
args = utils.parseArguments(version)

general_start_time = time.time()
@@ -138,12 +138,12 @@ def main():

# Check programs
programs_version_dictionary = {}
programs_version_dictionary['gunzip'] = ['--version', '>=', '1.6']
programs_version_dictionary['gunzip'] = {'required': ['--version', '>=', '1.6']}

# Java check first for java dependents check next
if not (args.skipFastQC and args.skipTrimmomatic and (args.skipPilon or args.skipSPAdes)):
# programs_version_dictionary['java'] = ['-version', '>=', '1.8']
programs_version_dictionary['java'] = [None, '>=', '1.8'] # For OpenJDK compatibility
programs_version_dictionary['java'] = {'required': [None, '>=', '1.8']} # For OpenJDK compatibility
missingPrograms, programs_version_dictionary = utils.checkPrograms(programs_version_dictionary)
if len(missingPrograms) > 0:
sys.exit('\n' + 'Errors:' + '\n' + '\n'.join(missingPrograms))
@@ -154,35 +154,35 @@ def main():
global version_kraken_global
version_kraken_global = kraken_version()
if version_kraken_global == 2:
programs_version_dictionary['kraken2'] = ['--version', '>=', '2.0.6']
programs_version_dictionary['kraken2'] = {'required': ['--version', '>=', '2.0.6']}
else:
programs_version_dictionary['kraken'] = ['--version', '>=', '0.10.6']
programs_version_dictionary['kraken-repor'] = ['--version', '>=', '0.10.6']
programs_version_dictionary['kraken'] = {'required': ['--version', '>=', '0.10.6']}
programs_version_dictionary['kraken-repor'] = {'required': ['--version', '>=', '0.10.6']}
if not args.skipTrueCoverage and trueCoverage_config is not None:
rematch_script = include_rematch_dependencies_path(args.doNotUseProvidedSoftware)
programs_version_dictionary['rematch.py'] = ['--version', '>=', '4.0.1']
programs_version_dictionary['bcftools'] = ['--version', '==', '1.3.1']
programs_version_dictionary['rematch.py'] = {'required': ['--version', '>=', '4.0.1']}
programs_version_dictionary['bcftools'] = {'required': ['--version', '==', '1.3.1']}
if not (args.skipTrueCoverage and ((args.skipAssemblyMapping and args.skipPilon) or args.skipSPAdes)):
programs_version_dictionary['bowtie2'] = ['--version', '>=', '2.2.9']
programs_version_dictionary['samtools'] = ['--version', '==', '1.3.1']
programs_version_dictionary['bowtie2'] = {'required': ['--version', '>=', '2.2.9']}
programs_version_dictionary['samtools'] = {'required': ['--version', '==', '1.3.1']}
if not args.skipFastQC:
programs_version_dictionary['fastqc'] = ['--version', '==', '0.11.5']
programs_version_dictionary['fastqc'] = {'required': ['--version', '==', '0.11.5']}
if not args.skipTrimmomatic:
programs_version_dictionary['trimmomatic-{version}.jar'.format(version=args.trimVersion)] = ['-version', '==',
args.trimVersion]
programs_version_dictionary['trimmomatic-{version}.jar'.format(version=args.trimVersion)] = \
{'required': ['-version', '==', args.trimVersion]}
if args.runPear:
programs_version_dictionary['pear'] = ['--version', '>=', '0.9.10']
programs_version_dictionary['pear'] = {'required': ['--version', '>=', '0.9.10']}
if not args.skipSPAdes:
programs_version_dictionary['spades.py'] = ['--version', '>=', '3.9.0']
programs_version_dictionary['spades.py'] = {'required': ['--version', '>=', '3.9.0']}
if not (args.skipPilon or args.skipSPAdes):
programs_version_dictionary['pilon-{version}.jar'.format(version=args.pilonVersion)] = ['--version', '==',
args.pilonVersion]
programs_version_dictionary['pilon-{version}.jar'.format(version=args.pilonVersion)] = \
{'required': ['--version', '==', args.pilonVersion]}
if not (args.skipMLST or args.skipSPAdes):
programs_version_dictionary['mlst'] = ['--version', '>=', '2.4']
programs_version_dictionary['mlst'] = {'required': ['--version', '>=', '2.4']}
if args.runInsertSize and not args.skipSPAdes:
if args.skipAssemblyMapping and args.skipPilon:
programs_version_dictionary['bowtie2'] = ['--version', '>=', '2.2.9']
programs_version_dictionary['samtools'] = ['--version', '==', '1.3.1']
programs_version_dictionary['bowtie2'] = {'required': ['--version', '>=', '2.2.9']}
programs_version_dictionary['samtools'] = {'required': ['--version', '==', '1.3.1']}

# Set and print PATH variable
utils.setPATHvariable(args, script_path)
@@ -195,19 +195,25 @@ def main():
jar_path_trimmomatic = None
if not args.skipTrimmomatic:
jar_path_trimmomatic = \
programs_version_dictionary['trimmomatic-{version}.jar'.format(version=args.trimVersion)][3]
programs_version_dictionary['trimmomatic-{version}.jar'.format(version=args.trimVersion)]['found']['path']

jar_path_pilon = None
if not args.skipPilon and not args.skipSPAdes:
jar_path_pilon = programs_version_dictionary['pilon-{version}.jar'.format(version=args.pilonVersion)][3]
jar_path_pilon = \
programs_version_dictionary['pilon-{version}.jar'.format(version=args.pilonVersion)]['found']['path']

# Get SPAdes version
spades_version = None
if not args.skipSPAdes:
spades_version = programs_version_dictionary['spades.py']['found']['version']

# pairEnd_filesSeparation_list = args.pairEnd_filesSeparation
pairEnd_filesSeparation_list = None
samples, inputDirectory, removeCreatedSamplesDirectories, indir_same_outdir = \
get_samples(args.inputDirectory, args.fastq, outdir, pairEnd_filesSeparation_list)

# Start running the analysis
print '\n' + 'RUNNING INNUca.py'
print('\n' + 'RUNNING INNUca.py')

# Prepare run report file
samples_report_path = os.path.join(outdir, 'samples_report.' + time_str + '.tab')
@@ -230,20 +236,20 @@ def main():
# Determine SPAdes maximum memory
spadesMaxMemory = None
if not args.skipSPAdes:
print ''
print('')
spadesMaxMemory = spades.define_memory(args.spadesMaxMemory, args.threads, available_memory_GB)
# Determine .jar maximum memory
jarMaxMemory = 'off'
if not (args.skipTrimmomatic and (args.skipSPAdes or args.skipPilon)):
print ''
print('')
jarMaxMemory = utils.define_jar_max_memory(args.jarMaxMemory, args.threads, available_memory_GB)

# Run INNUca for each sample
sample_report_json = {}
for sample in samples:
sample_start_time = time.time()

print '\n' + 'Sample: ' + sample + '\n'
print('\n' + 'Sample: ' + sample + '\n')

# Create sample outdir
sample_outdir = os.path.abspath(os.path.join(outdir, sample, ''))
@@ -253,21 +259,21 @@ def main():
# Get fastq files
fastq_files = utils.searchFastqFiles(os.path.join(inputDirectory, sample, ''), pairEnd_filesSeparation_list, False)
if len(fastq_files) == 1:
print 'Only one fastq file was found: ' + str(fastq_files)
print 'Pair-End sequencing is required. Moving to the next sample'
print('Only one fastq file was found: ' + str(fastq_files))
print('Pair-End sequencing is required. Moving to the next sample')
continue
elif len(fastq_files) == 0:
print 'No compressed fastq files were found. Continue to the next sample'
print('No compressed fastq files were found. Continue to the next sample')
continue

print 'The following files will be used:'
print str(fastq_files) + '\n'
print('The following files will be used:')
print(str(fastq_files) + '\n')

# Run INNUca.py analysis
run_successfully, pass_qc, run_report = \
run_innuca(sample, sample_outdir, fastq_files, args, script_path, scheme, spadesMaxMemory,
jar_path_trimmomatic, jar_path_pilon, jarMaxMemory, trueCoverage_config, rematch_script,
species_genus, mlst_scheme_genus)
species_genus, mlst_scheme_genus, spades_version=spades_version)

# Save sample fail report
utils.write_fail_report(os.path.join(sample_outdir, 'fail_report.txt'), run_report)
@@ -286,7 +292,7 @@ def main():
if args.fastq is not None:
utils.removeDirectory(os.path.join(outdir, 'reads', ''))

print 'END ' + sample + ' analysis'
print('END ' + sample + ' analysis')
time_taken = utils.runTime(sample_start_time)

# Save run report
@@ -389,7 +395,8 @@ def get_samples(args_input_directory, args_fastq, outdir, pair_end_files_separat


def run_innuca(sample_name, outdir, fastq_files, args, script_path, scheme, spades_max_memory, jar_path_trimmomatic,
jar_path_pilon, jar_max_memory, true_coverage_config, rematch_script, species_genus, mlst_scheme_genus):
jar_path_pilon, jar_max_memory, true_coverage_config, rematch_script, species_genus, mlst_scheme_genus,
spades_version=None):
threads = args.threads
adapters_fasta = args.adapters
if adapters_fasta is not None:
@@ -624,7 +631,9 @@ def run_innuca(sample_name, outdir, fastq_files, args, script_path, scheme, spad
args.spadesMinCoverageAssembly, args.spadesMinContigsLength, genome_size,
args.spadesKmers, max_reads_length, args.spadesDefaultKmers,
args.spadesMinKmerCovContigs, assembled_se_reads, args.saveExcludedContigs,
args.maxNumberContigs, args.keepSPAdesScaffolds)
args.maxNumberContigs, args.keepSPAdesScaffolds, spades_version=spades_version,
estimated_coverage=estimated_coverage,
spades_not_use_isolate=args.spadesNotUseIsolate)
runs['SPAdes'] = [run_successfully, pass_qc, time_taken, failing, warning, 'NA']

if run_successfully:
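
The INNUca.py hunks above change each `programs_version_dictionary` entry from a flat list to a dictionary with a `'required'` list (version flag, comparison operator, minimum version) and, after the check, a `'found'` dictionary holding `'path'` and `'version'`. A minimal sketch of that layout follows. The helper names `version_tuple`, `version_satisfies` and `record_found` are hypothetical; how `utils.checkPrograms` actually fills and compares these entries is not shown in this diff.

```python
def version_tuple(version):
    """Turn '3.14.0' into (3, 14, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def version_satisfies(found, operator, required):
    """Compare two dotted versions with '>=' or '==' (hypothetical helper)."""
    found_t, required_t = version_tuple(found), version_tuple(required)
    # Pad the shorter tuple with zeros so that '1.8' equals '1.8.0'
    width = max(len(found_t), len(required_t))
    found_t += (0,) * (width - len(found_t))
    required_t += (0,) * (width - len(required_t))
    if operator == '>=':
        return found_t >= required_t
    if operator == '==':
        return found_t == required_t
    raise ValueError('Unsupported operator: {}'.format(operator))


def record_found(programs, name, path, version):
    """Store what was found for a program and report if it is acceptable."""
    entry = programs[name]
    entry['found'] = {'path': path, 'version': version}
    _flag, operator, required = entry['required']
    return version_satisfies(version, operator, required)


# Same shape as the entries built in main(); the path is a made-up example
programs = {'spades.py': {'required': ['--version', '>=', '3.9.0']}}
acceptable = record_found(programs, 'spades.py', '/opt/SPAdes/bin/spades.py', '3.14.0')
spades_version = programs['spades.py']['found']['version']
```

Keeping the requirement and the result under separate keys replaces positional lookups such as `[3]` with named ones, which is what lets `main()` read `['found']['path']` for the .jar files and `['found']['version']` for SPAdes.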
22 changes: 17 additions & 5 deletions README.md
@@ -79,7 +79,7 @@ usage: INNUca.py [-h] [--version] -s "Streptococcus agalactiae" -g 2.1
[--trimKeepFiles] [--doNotTrimCrops] [--trimCrop N]
[--trimHeadCrop N] [--trimSlidingWindow window:meanQuality]
[--trimLeading N] [--trimTrailing N] [--trimMinLength N]
[--spadesVersion 3.13.0] [--spadesNotUseCareful]
[--spadesVersion 3.13.0] [--spadesNotUseCareful] [--spadesNotUseIsolate]
[--spadesMinContigsLength N] [--spadesMaxMemory N]
[--spadesMinCoverageAssembly N] [--spadesMinKmerCovContigs N]
[--spadesKmers 55 77 [55 77 ...] | --spadesDefaultKmers]
@@ -295,12 +295,24 @@ Trimmomatic options:
length (default: 55) (default: 55)
SPAdes options:
--spadesVersion 3.13.0
--spadesVersion 3.14.0
Tells INNUca.py which SPAdes version to use (available
options: 3.10.1, 3.11.1, 3.13.0) (default: 3.13.0)
options: 3.11.1, 3.13.0, 3.14.0) (default: 3.14.0)
--spadesNotUseCareful
Tells SPAdes to only perform the assembly without the
--careful option (default: False)
Tells SPAdes to perform the assembly without the --careful option.
When the SPAdes --isolate option is allowed to be used (for SPAdes >= v3.14.0
and in the cases that INNUca --spadesNotUseIsolate option is not used) and the
estimated depth of coverage is >= 100x, the SPAdes --careful option is not used
anyway. (default: False)
--spadesNotUseIsolate
Tells SPAdes to not use --isolate option (only possible for SPAdes >= v3.14.0).
The SPAdes --isolate option is used when the estimated depth of coverage
is >= 100x (unless the INNUca --spadesNotUseIsolate is used) and automatically
turns on the INNUca --spadesNotUseCareful option and consequently does not use
the SPAdes --careful option.
According to SPAdes, the --isolate option is highly recommended for
high-coverage isolate and multi-cell data (improves the assembly quality and
running time). (default: False)
--spadesMinContigsLength N
Filter SPAdes contigs for length greater or equal than
this value (default: maximum reads size or 200 bp)
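
The interaction between `--spadesNotUseCareful`, `--spadesNotUseIsolate`, the SPAdes version and the estimated depth of coverage described in the help text above can be sketched as a small decision function. This is an illustration of the documented rule only, not the code in the SPAdes module: `choose_spades_mode_flags` and its parameter names are assumptions made for this example.

```python
def parse_version(version):
    """Turn '3.14.0' into (3, 14, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split('.'))


def choose_spades_mode_flags(spades_version, estimated_coverage,
                             not_use_careful=False, not_use_isolate=False):
    """Return the SPAdes mode flags to use (hypothetical helper).

    --isolate is chosen when SPAdes is >= 3.14.0, the user did not disable
    it, and the estimated depth of coverage is >= 100x. In that case
    --careful is never added. Otherwise --careful is used unless disabled.
    """
    isolate_supported = parse_version(spades_version) >= (3, 14, 0)
    high_coverage = estimated_coverage is not None and estimated_coverage >= 100
    if isolate_supported and not not_use_isolate and high_coverage:
        return ['--isolate']
    if not not_use_careful:
        return ['--careful']
    return []
```

For example, SPAdes 3.14.0 with an estimated coverage of 150x yields `['--isolate']`, while the same coverage with SPAdes 3.13.0 falls back to `['--careful']`.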