forked from bxlab/metaWRAP
-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG
executable file
·359 lines (335 loc) · 17.8 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
metaWRAP v=1.4 (in development)
Kraken
added new module "kraken2" for running kraken2 instead of kraken1 for read or assembly classification
metaWRAP v=1.3 (August 2020)
Assembly
fixed typo where megahit was not using the specified tmp directory
Annotate_bins
removed manual perl library handling feature
Quant_bins
bin coverage is now estimated with weighted median coverage (reduces contribution of outliers)
fixed warning message to pring to stderr
allow for periods in sample names
allow for all 0 columns or rows
General
enforce MKL build of blas software to prevent CONCOCT infinite warning messages
return correct error code instead of always returning 0
Binning
added SAM file cleanup step
fixed library dependancy causing endless OPEN blas warning when running CONCOCT
fixed assembly truncation bug after interrupted binning run
fixed bug that left trailing characters in contigs of concoct bins, leading to refinement problems later
Bin_refinement
added additional checkpoints to prevent re-running checkm if program terminated before completing
algorithm now ignores bins smaller than 50Kbp or larger than 20Mbp to save time
Blobology
fixed perl path to #!/usr/bin/env perl in scripts
Kraken
fixed syntax bug where some systems would misinterpret comments at the end of lines
metaWRAP v=1.2 (April 2019)
General
config-metawrap sets database locations to $HOME by defualt during installation
numpy, openblas, and scipy versions updated for concoct 1.0
enforce CheckM v1.0.12 (newer versions are reported to fail)
fixed depricated matplotlib functions in plotting scripts
updated to Samtools v1.9 to avoid libcrypto.so.1.0.0 issues
simplified metaWRAP preprequisite programs to base minimum number (but with strict version control)
Assembly
added "--restart-from last" feature for metaspades to allow continuation of previously terminated runs
Read_qc
added support for non-human host contamination removal with bmtagger
Bin_refinement
set_axis_bgcolor was replaced by set_facecolor in plotting function
the module is able to continue previous progress is the .stats files are present
matplotlib depricated option fixes
Reassemble_bins
set_axis_bgcolor was replaced by set_facecolor in plotting function
added check limit on maximum number of open python file handles
Binning
fixed syntax bug coming from comment lines following "next line" character
split the alignment and alignment sorting stages to help resolve stalled jobs
matplotlib depricated option fixes
added support for CONCOCT v1.0, improving binning performance
added recommended contig splitting step for CONCOCT, improving its performance on larger contigs
CONCOCT now supports multi-threading greatly speeding up binning on larger data
fixed outdated path handling in new CONCOCT implementation
updated to maxbin2 v2.2.6
Blobology
removed shuffle reads features
use paired reads for bowtie alignment
split alignment and BAM conversions steps to clarity and debugging
Classify_bins
added exception to prune hits script to account for newer version of NCBI database
Kraken
kraken now pre-loads the database before running by default, improving speed on systems with slow IO
added --no-preload option for running form hard disk for lower memory requirements
fixed exception that crashed the module if kraken-translate could not find a taxonomy for a classification
Quant_bins
utilize the --meta option for salmon
utilize the -p option to enable multi-threading
metaWRAP v=1.1 (January 2019)
General
modified conda requirements for software version (multiple)
added perl-lwp-simple as a dependancy for MaxBin2
clarifying notes in many help messages
changed python script shebangs to force use python2.7
deleted depricated modules "all" and "phylosift", to avoid confusion
main help message now displays metawrap version
Binning
removed insert size calculations to avoid stalled bwa run bug
added proper scikit dependancy to avoid GMM errors when running CONCOCT
fixed Maxbin distribution to use conda's Perl instead of system Perl (in maxbin v=2.2.5)
fixed checkm folder error when resuming a previously attempted run
changed default MaxBin2 marker set to the Bacterial marker set
added --universal option to set MaxBin2 to use the universal marker genes for improved Archaea binning
added -l option that sets the minimum contig length to bin for all four binners
removed cleanup feature so re-running the module with a new binner is possible
Bin_refinement
major speed improvement with new --quick option, especially on low-memory systems
major big fix that caused the module to sometimes needlessly throw away good bins, especially at lower completion thresholds (-c)
advanced cleanup if using an existing output directory
improved error and status messages
fixed bug that created dandom tmp files in output directory when using two bin sets
output metawrap folder is now in lower case
Blobology
added failsafe if assembly file does not exist
Kraken
added help message details
added progress messages during kraken translation process
Quant_bins
added exception that skips making the heatmap when the sample count is only 1
fixed error causing the heatmap funciton to fail due to bool error
added failsafes to make sure that the bin and total assembly contig naming is the same
metaWRAP v=1.0 (September 2018)
General
removed gcc dependancy to prevent library errors with concoct
cleaned up some help messages
added --show-config option to metawrap to display program configuration details
improved installation documentation
Assembly
changed option name --use-metaspades to --metaspades
changed option name --use-megahit to --megahit
default memory allowance is now 24GB
Quant_bins
the -a option is now optional, but strongly recommended
the bin abundance heatmap is now standardized to the total abundance of the bins, not the reads
Reassemble_bins
added -l option to specify minimum contig length
default memory allowance is now 40GB
Classify_bins
fixed major bug that resulted inacurate classification due to mis-calculating contig lengths
Fixed major bug that caused the module to ignore the last contig in each bin
fixed bug that caused crashes when a contig was classified as "subfamily"
Bin_refinement
default memory allowance is now 40GB
metaWRAP v=0.9 (June 2018)
General
fixed major bug with running metawrap in a custom conda environments
fixed bug caused by multiple metawrap installations
bin/metaWRAP excecutable is not a symlink pointing to bin/metawrap
Bin_refinement
fixed bug with plotting refinement iterations
delete bins that have a size 0 after dereplication
fixed shebang in summarize_checkm.py
allowed pplacer to use multiple threads, drastically speeding up the module
module no longer clears output DIR before starting. safer this way
Binning
nos uspports single-end read inputs
added metabat1 binning option
fixed checkm tmp directory handling when using the --run-checkm option
fixed perl5 library handling when running maxbin2
added diagnostic info for perl libraries
added use of MaxBin v2.2.5 - this fixes some issues with newer gcc
checkm now uses pplacer multi-threading
Blobology
added figure output folders for easier viewing
Kraken
removed -a option from ktImportText
added checkpoint to make sure that fasta or fastq inputs are given
Annotate_bins
fixed perl5 library handling when running prokka
added minced dependancy
added diagnostic info for perl libraries
Classify_bins
removed percent confidance values for each classification
Assembly
use metaSPAdes v3.12.0
Reassemble_bins
read alignemnt and splitting is now done simultaneously, and no longer requires a lot of RAM
signifficantly sped up the read-splitting function and prevented freezes
added and fixed --parallel option bug
use SPAdes v3.12.0
checkm now uses multi-threading for pplacer
the user can now change the strict/permissive read mapping SNP cut off parameters
metaWRAP v=0.8 (March 2018)
General
every module now returns the total run time at the end
every module now gives the full run command back
added trim-galore dependancy to conda installation
Conda installation:
added R dependancy
fixed channel ordering
metaWRAP now immediately exits if it cannot find the config file
Read_qc:
removed unnecessary large intermediate files
added clean-up feature
--skip-bmtagger flag now properly works
Binning:
fixed bug that caused maxbin to fail with absolute path
removed unnecessary large intermediate files
added pre- and post-run clean-up feature
removed CPU bottleneck from the read alignment stage
Bin_refinement
added output dir cleanup at start of module if the directory already exists
added extra debugging info during run
.stats files now contain the proper binner source in the last column
fixed bug caused by "=" characters in contig naming
Reassemble_bins:
changed plotting color scheme
removed last column (useless NA values)
Blobology
added clean-up feature
plotting bugfixes
added more colors for bin labeling
fixed axis labels
fixed bug with error handling when bin and assembly contig sdont match
Quant_bins
fixed bug that threw an error when contig names had spaces
fixed bug that would caused the last contig in the assembly file to not be read
Annotate_bins
fixed improper perl5 library handling on some systems
removed parallelization of prokka (did not work on all systems)
improved error handling
Kraken
added -a option to ktImportText - now kronas do not require internet to view
changed krona tools version requirement to 2.5 to allow for -a option
metaWRAP v=0.7 (January 2017)
Conda installation:
removed gcc as a run dependancy to help resolve gcc library issues on some servers
changed taxator-tk version from 1.3.3 to 1.3.3e, which sources from a more updated binary source
Quant_bins module:
fixed bug where quantitation would fail if the contig names had a different naming conventaion from metaSPAdes
improved run time by not making unnecesary coppies of assembly
Classify_bins module:
fixed bug in processing columns of the blast output
Bin_refinement:
improved error handling
fixed bugs related to plot making
New module: Annotate_bins:
quickly functionally annotate a set of bins with PROKKA
metaWRAP v=0.6 (Late December 2017)
General:
completely moved to installation of metaWRAP and ALL dependancies with conda
added detailed database download and configuration instructions
reduced benign mkdir errors throughout
Assembly module:
fixed tmp directory handeling bug
Kraken module:
fixed major bug preventing subsetting reads if there were spaces in the read identifiers
added jellyfish v=1.1.11 to conda env file for kraken-build
Reassembly module:
major bug fixes with tmp directory handling
restored parallelization of SPAdes - all bins assemble simmultaneously with 1 thread now
metaWRAP v=0.5 (Mid December 2017)
General:
Added overview flowcharts for most modules, and updated older flowcharts
metaWRAP is now much more binning oriented (see overview flowchart)
Assembly module:
added minimum contig length option (defore the default was 1000bp)
sped up megahit assembly re-formatting by removing redundant sorting operations
Bin_refinement module:
parallelized Binning_refiner such that all four hydridizing operations sun simultaneously
Reassemble_bins module:
added plotting functiong to summarize the N50, completion, and contamination rankings of the before and after bins
New module: Classify bins:
Uses MEGABLAST to align the contigs in bins to RefSeq
Taxator-tx asigns taxonomy to each contig
Use weighted tree algorythm to find consensus taxonomy of each bin
metaWRAP v=0.4 (Early December 2017)
General:
Most dependancies can now be installed with a pre-configured conda environment
Binning module:
Added option to immediately run CheckM on resulting bins
ReadQC module:
Major bug fix that prevented bmtagger from filtering out human reads when the read names in the fastq had spaces
Module now exits if the provided fastq files do not exist
Bin_refinement: this module consolidats all possible binning version of the bins from the Binning module
added contig dereplication to avoid situations where a contig ends up in more than one bin
removed depricated requirement for prviding reads as input
major bug fit to prevent HMMER to run out of space in /tmp/ by providing a new tmp folder
bug fix: bin consolidation script now supports contigs in any naming format
added plotting functiong to summarize the completion and contamination rankings of the inputs, intermediate interations, and final output
Bin_reassembly: this modules takes a set of metagenomic reads, alignes them back to the bins, and uses those reads to reassemble them
removed spades parallelization due to strange signmentation fault errors...
majob bug fix of bug preventing recruiting of reads for reassembly if the read names had spaces in them
major bug fit to prevent HMMER to run out of space in /tmp/ by providing a new tmp folder
metaWRAP v=0.3 (November 2017)
General:
Error handling is greatly improved throughout the pipelines by incorporating the $? error code instead of relying on output files
Disabled PHYLOSIFT module (this software is too old)
Temporarily disabled ALL module due to increased complexity of the metaWRAP pipeline
ReadQC module:
Fixed issue with program thinking READQC did not finish correctly
Assembly module:
Removed joint assembly feature. Now you can only assemble with metaSPAdes OR Megahit.
(megahit is defualt due to its speed)
Changed minimum contig length from 500bp to 1000bp
Binning module:
This module is now split into four parts for modularity:
Binning module: this module is a basic wrapper around existing binning software (but conviniently handles intermediate files)
You can now bin with metaBAT2 and/or MaxBin2 and/or CONCOCT
Pipeline now calculates library insert sizes on the fly (and correctly)
Samtools tmp files bug fix
Samtools sort -m memory option fixes
Bin_refinement: this module consolidats all possible binning version of the bins from the Binning module
Runs CheckM on binning iterations
Produces the best possible bins
Allows user to input the minimum desired completion and maximum contamination
No longer quits if there are too many bins to plot
Bin_reassembly: this modules takes a set of metagenomic reads, alignes them back to the bins, and uses those reads to reassemble them
The reassembly function as a whole is now 100+ times faster than before due to paralelization
The read filtering for each bin is now parallelized into the same BWA alignment operation
The reassembly itself is not parallelized into 1 thread operations for each bin, which ends up being much faster
Quant_bins:
This module takes in (non-reassembled) bins and reads from many samples, and estimates the abundance of each bin in each sample with salmon
The process is paralelized and the alignment time does not scale with number of bins
Blobology module:
Allows for paralel processing of multiple read sets (but one assembly). Produces one figure per sample, with colors being consistant.
Changed annotations to blobplot file when the --bins option is selected - added additional columns
Now makes multiple bin annotation plots, annotating bins and their taxonomy
Blastn does not re-run if there is an output file with the same name already in the output folder
metaWRAP v=0.2 (August 2017)
General:
Added compatibility with long input parameters (eg. --options asdf) with getopt instead of getopts
Chenged instalation style of metaWRAP so that the contents of bin needs to be added to PATH to run. The user still needs to edit the config file first
Changed name of contig.sh to config-metawrap so be more distinguishable in PATH
All modules are now called through the master "metaWRAP" script (including the ALL module)
Changed heirarchy of pipeline scripts. Now they are hidden away in a "pipelines" folder, and only metaWRAP is visible
Changed the commenting system to be done through a python script that writes out pretty comments
New module: Phylosift. Currently all it does is randomly subsamples reads and runs Phylosift on them
ReadQC module:
Added --skip-bmtagger option (reduces runtime signifficantly)
Added --skip-trimming option
Added --skip-pre-qc-report option
Added --skip-post-qc-report option
Assembly module:
Added --only-metaspades option (for those who dont like the idea of double assembly)
Added --only-megahit option (signifficantly reduces resourse requirements and time)
Binning module:
Added --skip-checkm option (reduces runtime)
New feature where bins that are >20% completion and <10% contamination are placed into a new folder - "good bins"
Added --checkm-good-bins option, to allow making a checkm figure for only good bins (makes a prettier figure)
Added --skip-reassembly option (reduces runtime signifficantly)
New feature where after reassembly the best variant of each bin (original, strict, or permissive reassemblies) is placed into a new folder - "best bins"
Added --checkm-best-bins option, to allow re-making the checkm figure on only the best versions of each bin
Removed KRAKEN from the binning module. KRAKEN now has to be run seperately on the output bins
Not also returns the average and stdev of insert lengths for each sample - useful for library assessment
Blobology module:
Added --bins option, which allows to annotate the blobplot with bins from a binning folder
ALL module:
Major bug fixes and improvements
Added a Phylosift and Kraken on bins steps to pipeline
All output figures and reports are now saved to "metaWRAP_figures" folder at the end of the pipeline
Added --fast option, which skips some non-essential and computationally expensive parts of the pipeline
metaWRAP v=0.1 (July 2017)
First user-friendly version of metaWRAP!