Adjust bin/curate-tree-ete3.py
so it ensures the new root node is not formatted.
Made bin/curate-tree-ete3.py
able to collapse clades due to low support or
alrt values, to remove support and/or alrt values, and to scale the length of
edges root leaving a new root.
Set the length on the new root node (when using --detach
in calling
bin/curate-tree-ete3.py
) to zero, seeing as that length is related to the
distance to the node that has been deleted (after rooting).
Added --detach
option to bin/curate-tree-ete3.py
to remove the outgroup
made by rooting on a node.
Added bin/curate-tree-ete3.py
.
Small improvements to bin/tree-info.py
.
Added simple bin/tree-info.py
script to print information about tip names
and internal node labels and edge lengths in a phylogenetic tree.
Add explicit output format to samtools sort command in dark/bowtie2.py
.
More messing with indexing BAM files.
Check if the SAM file in dark/bowtie2.py
is empty by reading it and looking
for a non-header line. Undo the change of 5.0.15
. Added typing hints to
dark/bowtie2.py
.
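
The emptiness check described above (read the SAM file and look for a non-header line) can be sketched roughly as follows. This is an illustration under assumptions, not the code in dark/bowtie2.py: the function name `samHasAlignments` is hypothetical, and the input is assumed to be plain-text SAM lines in which header lines start with `@`.

```python
def samHasAlignments(lines):
    """
    Return True if any line is an alignment record (i.e., not a header).

    Hypothetical helper: `lines` is any iterable of SAM text lines.
    """
    for line in lines:
        # Skip blank lines and '@'-prefixed header lines.
        if line.strip() and not line.startswith("@"):
            return True
    return False


# Example use on a file (the filename is illustrative only):
#
#     with open("out.sam") as fp:
#         empty = not samHasAlignments(fp)
```

Because the loop returns at the first alignment line, a non-empty file is detected without reading it to the end.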
Always make BAM in run-bowtie2.py
. Sigh.
Check whether a BAM/SAM file is empty in run-bowtie2.py
in order to avoid
calling gatk
to mark duplicates on a file with no mapped or unmapped reads,
since instead of just exiting gracefully, gatk
crashes with the typical
Java runtime stack.
Added bin/add-support-to-iqtree2-issue-343.py
script for adding support
labels to nodes in trees produced by iqtree2
when (if not run with
-keep-ident
) it adds nodes and tips for identical sequences. iqtree2
currently does not put a support label on (the edge leading to) the tip in
the original processing or onto the nodes introduced by adding tips for the
identical sequences. This is described in the iqtree2
GitHub issue
343.
This was merged late and became version 5.0.19
.
Added --rotate
and --maxWindows
options to window-split-alignment.py
.
Added intsToRanges
and intsToStringRanges
to dark/utils.py
.
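
The behaviour of intsToRanges and intsToStringRanges is not spelled out in this entry. As an assumption, the sketch below collapses a collection of integers into inclusive (start, end) runs and then formats those runs as a compact string. The names `collapseToRanges` and `rangesToString` are hypothetical and may not match the signatures in dark/utils.py.

```python
def collapseToRanges(ints):
    """
    Collapse integers into a sorted list of inclusive (start, end) tuples.

    Hypothetical illustration; duplicates are ignored.
    """
    ranges = []
    for n in sorted(set(ints)):
        if ranges and n == ranges[-1][1] + 1:
            # Extend the current run by one.
            ranges[-1] = (ranges[-1][0], n)
        else:
            # Start a new run.
            ranges.append((n, n))
    return ranges


def rangesToString(ranges, sep=","):
    """Format (start, end) tuples as e.g. '1-3,7,9-10'."""
    return sep.join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in ranges
    )
```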
Added bin/curate-trees.py
script to collapse low-support branches in
phylogenetic trees (to make polytomies) and also to re-root and ladderize
them.
Added typing hints to dark/process.py
.
Made filter-fasta.py
print an error message when the --checkResultCount
check fails even if --quiet
was used.
Add rotate
method to Read
class, plus tests.
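
As a rough idea of what rotating a read might involve, the sketch below rotates a sequence string by a number of positions. It is not the Read.rotate implementation: the function name `rotateSequence` is hypothetical, and the direction of rotation (positive values move the start of the sequence to the end) is an assumption.

```python
def rotateSequence(sequence, n):
    """
    Rotate a sequence string by n positions.

    Hypothetical helper. Positive n moves the first n characters to the
    end; negative n moves characters from the end to the start.
    """
    if not sequence:
        return sequence
    # Reduce n so that rotations larger than the sequence wrap around.
    n %= len(sequence)
    return sequence[n:] + sequence[:n]
```

For a FASTQ read, the quality string would presumably need to be rotated by the same amount so that each base keeps its quality score.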
Added --rotate
option to filter-fasta.py
.
Added --upper
, --lower
, --upperId
, and --lowerId
options to filter-fasta.py
.
Removed required
flag from --out
option of bin/plot-windowed-identity.py
.
Added bin/plot-windowed-identity.py
along with WindowedIdentity
class and tests.
Added lots of type hints. Completely removed six.
Removed gb2seq
aligner options from bin/sam-coverage-depth.py
. Bumped
major version number seeing as this will break things. I highly doubt anyone
is using the removed option. I (Terry) put it in for Christian Gabriel and
Annika Beyer during SARS-CoV-2 times.
Allow for minWindow
to be None
in window-split-alignment.py
.
Added bin/window-split-alignment.py
to split a FASTA alignment into windowed
sequences.
Added stdout
and stderr
options to dark.process.Executor
to allow
for simple printing of executed commands and their outputs.
Added bin/fasta-split-by-length.py
.
Added tiny bin/reverse-complement.py
helper script.
Use a helper function (that puts the filename into failure exceptions) to connect to sqlite3 databases.
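
A minimal sketch of such a helper is shown below. The name `sqliteConnect` and the wording of the error message are hypothetical; only `sqlite3.connect` and `sqlite3.OperationalError` come from the standard library.

```python
import sqlite3


def sqliteConnect(filename, *args, **kwargs):
    """
    Connect to an sqlite3 database, naming the file if connecting fails.

    Hypothetical helper: extra arguments are passed to sqlite3.connect.
    """
    try:
        return sqlite3.connect(filename, *args, **kwargs)
    except sqlite3.OperationalError as e:
        # Re-raise with the filename included, since the default message
        # ("unable to open database file") does not say which file.
        raise sqlite3.OperationalError(
            f"Could not connect to sqlite3 database file {filename!r}: {e}"
        ) from e
```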
Added bin/ids.py
script to generate incrementing lists of ids.
Added start
option to dimensionalIterator
.
Added asList
option to parseRangeString
function in utils.py
.
Added --sequenceWhitelist
, --sequenceBlacklist
,
--sequenceWhitelistFile
, --sequenceBlacklistFile
, --sequenceRegex
,
and --sequenceNegativeRegex
arguments to filter-fasta.py
to match the
corresponding options for sequence titles (ids).
Added --sortBy coverage
option to bin/sam-reference-read-counts.py
and
moved much of its code into dark/sam.py
.
Fix tiny bug in bowtie2.py
so that BAM is not accidentally created when
removing duplicates.
Prevent an IndexError
in bin/sam-reference-read-counts.py
when there
are no matching reads.
Fixed (hopefully!) a bug in bin/fasta-identity-table.py
that could cause
a KeyError
when producing a non-square table,
as described here.
Moved getNoCoverageCounts
from bin/fasta-identity-table.py
into
dark/reads.py
. Made it handle an empty set of no-coverage chars. Added
tests.
Added allowedTaxonomicRanks
argument to the SqliteIndexWriter
class for
building a protein database, and corresponding command line argument to
bin/make-protein-database.py
.
Wrap the savefig
call to make the pathogens panel (in
dark/civ/proteins.py
) in a try/except
to catch the ValueError
that
results if the image has a dimension exceeding 2^16.
Made aa-info.py
slightly more useful by identifying stop codons. Added
--details
option to request printing of amino acid property numeric
details.
Corrected aaVars
import in a few bin scripts.
Change to more recent mysql-connector-python
in setup.py
.
Added bin/fastq-set-quality.py
script.
Fixed circular import.
Removed bin/print-blast-xml.py
.
Removed bin/print-blast-xml-for-derek.py
.
Nothing.
Removed find-hits.py
from setup.py
Added type hints to score.py and some to reads.py.
Removed bin/find-hits.py
. Fixed matplotlib deprecation warnings. Ran black.
Backed out 4.0.59 and 4.0.60 changes.
More index checking in bin/sam-coverage-depth.py
.
Subtract one from stop
arg to pysam.pileup because, despite being
0-based, pysam apparently includes the final index.
Make bin/download-refseq-viral-gbff.sh
exit non-zero if no viral genome
files are downloaded.
Use [0-9] instead of \d in bin/download-refseq-viral-gbff.sh so that
egrep works with the brew-installed GNU egrep on OS X.
Make utils.py
asHandle
work when passed a pathlib.Path
argument.
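
The idea of accepting either a path or an already-open handle can be sketched as below. This is not the asHandle code in utils.py (which may also handle other cases, such as compressed files); the name `asHandleSketch` is hypothetical.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def asHandleSketch(fileOrPath, mode="r"):
    """
    Yield an open file handle for a str path, a pathlib.Path, or a handle.

    Hypothetical helper. Files opened here are closed on exit; handles
    passed in by the caller are left open.
    """
    if isinstance(fileOrPath, (str, Path)):
        with open(fileOrPath, mode) as fp:
            yield fp
    else:
        # Assume the caller gave us something that is already file-like.
        yield fileOrPath
```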
Added --includeAmbiguousMatches
and --includeNonGapMismatches
options to
bin/compare-sequences.py
.
Added includeAmbiguousMatches
and includeNonGapMismatches
options to
matchToString
in dark/dna.py
.
Replace use of features
in sam-coverage-depth.py
with the SAM file
reference length, to avoid trying to use an undefined features
variable
when no reference is given.
Added --force
option to bcftools index
command in dark/bowtie2.py
.
Added --reverse
and --complement
options to bin/fasta-find.py
to tell
it to also look for simple inversions and complement sequences.
Added bin/fasta-find.py
which, like bin/fasta-match-offsets.py
, also
reports matching sequence offsets in FASTA files but can match numeric
regions and also look for reverse complemented matches.
Added bin/fasta-match-offsets.py
to print offsets of sequence regular
expression matches.
More careful calling of pysam.pileup
in sam-coverage-depth.py
to avoid
an index error if a read mapping extends beyond the end of the reference
genome. pysam
was returning an invalid column.reference_pos
in that
case (invalid because the value is beyond the end of the reference, so it
can't be used as a reference offset).
Added printing of transversion and transition counts to
bin/codon-distance.py
. Output is sorted first by distance, then by number
of transitions (highest to lowest) then by codon DNA. The idea being to
present the possible codon changes to get from one amino acid to another in
the order that requires the least change to the most.
The lists in the values in the dict returned by codonInformation
now
contain 2-tuples instead of lists of length 2. If you pass
countTransitions=True
to that function it will also put the count of the
number of transitions (as opposed to transversions) into the tuple. See the
tests in test/test_codonDistance.py
.
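
The ordering described above (distance first, then number of transitions from highest to lowest, then codon DNA) can be illustrated with the following sketch. It is not the codonInformation implementation; the names `codonChange` and `sortCodonChanges` are hypothetical. A transition is taken to be a purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) change.

```python
# The two unordered base pairs that count as transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def codonChange(codon1, codon2):
    """
    Return a (distance, transitions) 2-tuple for two codons.

    Hypothetical helper: distance is the number of differing positions
    and transitions is how many of those differences are transitions.
    """
    distance = transitions = 0
    for a, b in zip(codon1.upper(), codon2.upper()):
        if a != b:
            distance += 1
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
    return distance, transitions


def sortCodonChanges(source, targets):
    """Sort target codons from least to most change relative to source."""

    def key(target):
        distance, transitions = codonChange(source, target)
        # Negate transitions so that more transitions sort first.
        return distance, -transitions, target

    return sorted(targets, key=key)
```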
Added --reference
option to sam-coverage-depth.py
.
Return an empty list of reads from mafft and needle when dryRun is True.
Added optional executor
and dryRun
arguments to MAFFT and needle
aligners to allow the caller to pass in a pre-existing process
executor. Added --format
option to newick-to-ascii.py
tree printing to
allow loading a wider range of Newick files.
Improve output of compare-aa-sequences.py
to show the percentage of
matches in regions that do not involve a gap in either sequence.
Bump mysql connector version due to security issue with 8.0.13
Added --sort
option to bin/fasta-identity-table.py
.
Fixed small bug in Reads.temporalBaseCounts
. Improved identity
calculation in fasta-identity-table.py
.
Pass the showNoCoverage
option value to the making of the HTML table in
bin/fasta-identity-table.py
.
Added --noNoCoverageLocations
, --noCoverageChars
, and --gapChars
options to bin/compare-sequences.py
. Fixed identity calculation bug in
bin/fasta-identity-table.py
due to not including gaps resulting from the
pairwise alignment into the calculation.
Added noCoverageChars
option to compareDNAReads
and
includeNoCoverageLocations
option to matchToString
.
Make compareDNAReads
more forgiving of unexpected characters in a DNA
sequence (specifically to deal with '?' that is used by Geneious to
indicate lack of coverage).
Added --digits
, --reverse
, --sortBy
, --header
and --sortChars
options to bin/fasta-coverage.py
. Added --regex
and reverse
options
to bin/fasta-sort.py
. Added allowGaps
and untranslatable
options to
findORF
in dark/reads.py
Added simple fasta-translate.py
script.
Small fix to text table output in bin/fasta-identity-table.py
.
Added --addZeroes
and --highlightBest
options to
bin/fasta-identity-table.py
.
Added bin/fasta-coverage.py
command. Added --upperOnly
option to
bin/fasta-identity-table.py
(and fixed bug).
Added temporalBaseCounts
method to dark.Reads
.
Added --align
, --aligner
, and --numberedColumns
options to
bin/fasta-identity-table.py
. Added checking of pre-existing gap symbols
to the edlib aligner.
Added -i
and -v
options to bin/genbank-grep.py
.
Renamed bin/genbank-to-fasta.py
to bin/genbank-grep.py
and made it able
to print in GenBank format as well as FASTA.
Added bin/genbank-to-fasta.py
to extract FASTA from GenBank flat files.
Added findORF
method to DNARead
and edlib
as an aligner option to
compare-sequences.py
.
Added --format
option to ncbi-fetch-id.py
to allow fetching of GenBank
format flat files (use --format gb
).
Removed a second unneeded viral check from dark/taxonomy.py
.
Removed unneeded viral check from dark/taxonomy.py
. Added 'generic' as a
pathogen type for protein reporting (in dark/civ/proteins.py
) and that
can be given to bin/proteins-to-pathogens-civ.py
.
Added matchAmbiguous
option to edlibAlign
.
Fixed simple-consensus.py
argument error that shouldn't have been committed.
Added edlib
alignment method.
Changed simple-consensus.py to use 'N' for no coverage and 'n' for low coverage. Fixed issue with hard-clipped bases in CIGAR strings.
Made it so the genome/protein database builder can read compressed GenBank or JSON files. Added simple scripts to download NCBI refseq viral FASTA or GenBank flat file data.
Removed assertion of incorrect assumption that a read CIGAR string cannot start with an insertion, based on a SAM file mapped using Geneious.
Added a simple consensus caller. Still needs to be stress tested in the wild, and will need more tests and code refactoring to make it more broadly useful.
Added internal optimization to make the SAM filtering fast when no filtering is needed (this is the merge of a pull request from July 2021).
Added bin/fasta-count-chars.py
script. Fixed argument bug in
run-bowtie2.py
print statement.
Fixed another tiny bug when printing dry run output in
bin/make-consensus.py
.
Fixed tiny bug when printing dry run output in bin/make-consensus.py
.
Updated TravisCI config and README build status badge URL.
Moved some code out of dark/civ/proteins.py into dark/genbank.py to make it more useful to others.
Added DistanceMatrix
class to dark/sam.py
for computing distances
between references according to which reads match both (and how well, if an
alignment score tag is given).
Color overall number of reads per pathogen in HTML output.
Fixed subtle bug introduced into bin/filter-fasta.py
due to a code
reordering.
Added printing of reference lengths to bin/sam-reference-read-counts.py
.
Changed the --sort
option of that script to --sortBy
. Added options to
stop printing references once they either have no reads mapped to them or
the number of new reads mapped to them ("new" as in not already mapping to
an earlier reference) falls to zero.
Added fasta-split.py
command to split a FASTA/Q file into multiple files,
each containing a given number of sequences. Added dark/utils.py
function
take
to repeatedly yield lists of a given number of things from an
iterable.
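
A function of this kind can be written with itertools.islice, as in the sketch below. The name `takeSketch` and the argument order are assumptions and may differ from the function in dark/utils.py.

```python
from itertools import islice


def takeSketch(iterable, n):
    """
    Repeatedly yield lists of up to n items from an iterable.

    Hypothetical illustration. The final list may be shorter than n.
    """
    it = iter(iterable)
    while True:
        items = list(islice(it, n))
        if not items:
            # The iterable is exhausted.
            return
        yield items
```

Splitting a FASTA file into files of a fixed number of sequences then amounts to writing each yielded list to its own output file.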
Added --topReferenceIdsFile
option to sam-reference-read-counts.py
to
allow the ids of the best-matching reference to be saved. Probably this
should save the FASTQ.
Improved sam-reference-read-counts.py
output to not double-count reads
that fall into multiple categories and also to report how many reads match
references that don't match any earlier reported reference. This gives some
idea of how many reads uniquely match references, where 'unique' means
didn't match any other reported reference (with more reads, when --sort
is
also used).
Added --sort
option to sam-reference-read-counts.py
to sort output
(i.e., matched references) by highest number of mapped reads.
btop2cigar
now returns a generator that gives the individual parts of a
CIGAR string. Use ''.join(btop2cigar(...))
to get the full CIGAR string.
This is not backwards compatible, hence the major version number change.
Pass the tmpChmod
argument to a call to Bowtie2
for which it was
missing.
Added --maxMatchingReads
to title filtering.
Added BAM file sorting to make-consensus.py
so ivar primer trimming actually
works.
Added ivar primer trimming (via the --ivarBedFile
argument) to
make-consensus.py
.
Changed how ivarFrequencyThreshold
is set and checked in
make-consensus.py
.
Added option to use ivar
for consensus calling in make-consensus.py
.
Added warning when --removeDuplicatesUseMD5
is used without one of
--removeDuplicates
or --removeDuplicatesById
. Added --tmpChmod
option
to run-bowtie2.py
.
Dropped the FastaFaiIndex
class. Use SqliteIndex
instead. Fixed two
skipped Python 3 tests (for gzipped and bz2'd file reading).
Another attempt to fix the bug in bowtie2 that was introduced by adding the
--removePrimersFromBedFile
option to run-bowtie2.py
.
Fixed bug in bowtie2 that was introduced by adding the
--removePrimersFromBedFile
option to run-bowtie2.py
.
Fixed two failing tests due to changed DIAMOND (version 2.0.6) bitscore calculation. If you have an older DIAMOND version you may need to update it, if you plan to run the tests.
Added --removePrimersFromBedFile
option to run-bowtie2.py
. Fixed small
bug in codon-distance.py
.
Added an option to trim primers using ivar trim to run-bowtie2.py
.
sam-coverage.py
now prints the min and max coverage per alignment too.
Added indexing to callHaplotypesBcftools
in bowtie2.py
.
Fixed errors from new version of flake8.
Added --callHaplotypesBcftools
to bowtie2.py
.
Added --noFilter
option to sam-coverage.py
and sam-coverage-depth.py
to allow them to run faster when no special filtering is needed.
Added --callHaplotypesGATK
to bin/make-consensus.py
.
Added code to combine multiple sequences (see bin/combine-sequences.py
).
Changed --maskNoCoverage
to --maskLowCoverage
in make-consensus.py
and
have it take an argument of the minimum coverage at which to call the
consensus.
Added --sample-ploidy 1
to haplotype calling in bowtie2.py
.
Improved aa-info.py
to also match on partial full names. Improved
taxonomic detection of plant-only viruses.
Added count
variable to format-fasta.py
and a --start
option to set
its starting value.
Added format-fasta.py
script. Reverted unnecessary hacks to
fasta-sequences.py
to print MD5 sums, etc.
Added --md5OneLine
argument to fasta-sequences.py
. Prints TAB-separated
MD5 (sequence) sum, then the read id, then the sequence (and quality, if
any).
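
The one-line output format described above can be sketched with hashlib as follows. The function name `md5OneLine` is hypothetical, and the UTF-8 encoding of the sequence before hashing is an assumption.

```python
import hashlib


def md5OneLine(readId, sequence, quality=None):
    """
    Return TAB-separated MD5 sum, read id, sequence, and optional quality.

    Hypothetical helper: the MD5 sum is computed from the sequence only.
    """
    digest = hashlib.md5(sequence.encode("utf-8")).hexdigest()
    fields = [digest, readId, sequence]
    if quality is not None:
        fields.append(quality)
    return "\t".join(fields)
```

Sorting such lines on the first field groups reads with identical sequences together, which makes duplicates easy to spot.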
Added --maxNFraction
argument to filter-fasta.py
.
Added --md5
arg to fasta-sequences.py
.
Added expand all and collapse all buttons to HTML output generated by
proteins-to-pathogens-civ.py
.
Tiny change to improve the HTML output generated by
proteins-to-pathogens-civ.py
.
Many small changes to improve the HTML output generated by
proteins-to-pathogens-civ.py
.
Fix idiotic logic error. Working too fast on complicated HTML-producing code with no tests. But still idiotic.
Sort sample names in HTML output of civ/proteins.py
. Added several names
to PLANT_ONLY_VIRUS_REGEX
in taxonomy.py
.
Added long comment and ValueError
explanation regarding protein look-up
error likely due to a protein database that's out-of-date with respect to
earlier result files.
Fix dumb error in accessing proteinAccession
method.
Improve error messages to help debugging etc., when proteins cannot be looked up and when exclusive host viruses are excluded in making a protein/genome database.
Added parseRangeExpression
to utils.py
.
Fixed stupid argument mixup in calling bcftools consensus
in
make-consensus.py
.
Doubled some undoubled percentages in format string in make-consensus.py
.
Added --maskNoCoverage
option to make-consensus.py
.
Added sequenceToRegex
function to dark/dna.py
.
Added fasta-variable-sites.py
script.
Improve the printing of ambiguous sites to not show gaps in
compare-sequences.py
.
Added --showAmbiguous
to compare-sequences.py
. Pass --iupac-codes
to
bcftools consensus
in make-consensus.py
. Add --max-depth 5000
to
bcftools mpileup
call, also in make-consensus.py
.
Make sam-coverage-depth.py
not throw an error if there is no coverage at
all.
Pass the reference id to the idLambda
function in make-consensus.py
.
Moved tempdir
assignment in make-consensus.py
out one level so that it
also happens when we are given a VCF file.
Wrapped reads saving in a Reads()
instance in compare-sequences.py
.
Unreleased.
Added --sampleName
and --readGroup
options to Bowtie2.align
. Added
--id
and --idLambda
options to make-consensus.py
to make it possible
to set (or adjust) the name of the generated consensus.
Trivial change to setting of tempdir
in run-bowtie2.py
.
Change call to samtools.
Added --tempdir
argument to run-bowtie2.py
.
Added make-consensus.py
script and tests (for a new dark/bowtie2.py
file). Upgraded run-bowtie2.py
.
Added --picard
to run-bowtie2.py
command to allow removal of duplicates
found by Picard.
Added run-bowtie2.py
command.
Make compare-sequences.py
fall back to use stretcher
if the call to
needle
fails because the sequences are too long.
Fixed incorrect calculation of covered offset and total bases counts when
excluding reads based on minimum number of overlapping offsets in
dark/genomes.py
.
Fix problem with bash set -u checking in download-genbank.sh
.
Add final genome-protein-summary.py
script.
Drop Python 3.8 from Travis and tox checking.
Drop Python2 from Travis and tox checking. Ugh.
Made sequence translation code work under Python 2 (again, even more hopefully than the last time).
Made sequence translation code work under Python 2 (again, even more hopefully).
Made sequence translation code work under Python 2 (hopefully).
Added sam-coverage-depth.py
.
Added --minGenomeLength
to make-protein-database.py
.
Removed the unused taxonomy
(VARCHAR
) column from the genomes table of
the protein database.
Fixed silly bug in alignment filtering code that was somehow not tested.
Added --percentagePositiveCutoff
argument to alignment-panel-civ.py
and noninteractive-alignment-panel.py
and all that implies, down to
reading DIAMOND output with the ppos
value, storing it, restoring it,
and filtering on it in dark.alignments.ReadsAlignmentsFilter
, plus tests.
Added --minProteinCount
argument to proteins-to-pathogens-civ.py
and
proteins-to-pathogens.py
Fixed a tiny Python <= 3.6 test output difference.
Fixed trivial Python2 incompatibility.
Added --percentageIdenticalCutoff
argument to alignment-panel-civ.py
and noninteractive-alignment-panel.py
and all that implies, down to
reading DIAMOND output with the pident
value, storing it, restoring it,
and filtering on it in dark.alignments.ReadsAlignmentsFilter
, plus tests.
Minor changes to HTML output.
Allow multiple --preamble
args to proteins-to-pathogens-civ.py
. Tiny
cosmetic adjustments to output HTML.
Added read count color levels indicator in HTML. Added
--bootstrapTreeviewDir
to proteins-to-pathogens-civ.py
and the toHTML
method of dark.civ.proteins.ProteinGrouper
.
Fix (again) newline in HTML summary output.
Fix newline in HTML summary output.
Fix tiny bug in print arguments in dark/civ/proteins.py
.
Added --readCountColor
and --defaultReadCountColor
to
proteins-to-pathogens-civ.py
for differential coloring of read
counts. Added citrus yellow
to plant-only virus name regex in
dark/taxonomy.py
.
Added a const value for --pathogenPanelFilename
in proteins-to-pathogens-civ.py
.
Added --dnaOnly
and --maxGenomeLength
args to make-protein-database.py
.
Improved isRNAVirus
function so it returns True
on retroviridae.
Improved the log output of the same script. Added a test (for HIV as an RNA
virus) and some small clean-ups.
Fixed incorrect with
statement in taxonomy.py
. Improved description of
fields in civ proteins HTML.
Standardized scripts that need a taxonomy database to use
--taxonomyDatabase
on the command line and two utility functions in
dark/taxonomy.py
to read them and also look in the
DARK_MATTER_TAXONOMY_DATABASE
environment variable.
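
The look-up order can be sketched as below. Only the DARK_MATTER_TAXONOMY_DATABASE variable name comes from the entry above; the function name `taxonomyDatabaseFilename` is hypothetical, and giving the command-line value precedence over the environment is an assumption.

```python
import os

ENV_VAR = "DARK_MATTER_TAXONOMY_DATABASE"


def taxonomyDatabaseFilename(commandLineValue=None, environ=None):
    """
    Return the taxonomy database filename to use.

    Hypothetical helper: prefer the command-line value (assumed
    precedence) and fall back to the environment variable.
    """
    environ = os.environ if environ is None else environ
    filename = commandLineValue or environ.get(ENV_VAR)
    if not filename:
        raise ValueError(
            f"No taxonomy database given. Use --taxonomyDatabase or set "
            f"the {ENV_VAR} environment variable."
        )
    return filename
```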
Added dryRun
, useStderr
, and handling of keyword arguments to the
Executor.execute
method (in dark/process.py
).
Add LineageElement
to taxonomy.py
. Get rid of _preprocessLineage
function and instead just have the Taxonomy.lineageFromTaxid
method
adjust the 'no rank' ranks to be -
. Added skipFunc
and stopFunc
to
lineage processing.
Make it so get-taxonomy.py
and get-hosts.py
can accept a name (e.g.,
Hymenoptera) as well as a taxonomy id or accession number.
Fixed setup.py
error.
Added get-hosts.py
and get-taxonomy.py
scripts.
Added extremely basic bin/describe-protein-database.py
command. To be
added to.
Added taxonomy info to HTML output.
Removed unguarded call to self.pathogenPanel
that should have been
deleted on the last commit in dark.civ.proteins.py
toHTML
method.
Don't try to make a pathogen panel (in the dark.civ.proteins.py
toHTML
method) if no pathogens were matched.
Fixed old code in toStr
.
Fixed args in call to toStr
in proteins-to-pathogens-civ.py
.
Added dark.civ
to packages in setup.py
.
Added make-protein-database.py
, download-genbank.sh
, and
parse-genbank-flat-file.py
scripts, as well as doc/protein-database.md
with some instructions on how to make a protein database. Added CIV
(Charite Institute of Virology) scripts proteins-to-pathogens-civ.py
and
alignment-panel-civ.py
.
Added bin/create-newick-relabeling-output.py
and bin/relabel-newick-tree.py
.
Add --omitVirusLinks
and --omitSampleProteinCount
options to
proteins-to-pathogens.py
to make HTML output less confusing when running
on RVDB or OKIAV databases. Removed highlighting of pathogens with high
protein fraction since that was done in a non-useful way. Removed index
field from HTML output and removed HSP count unless it differs from the
read count.
Fixed silly import error.
Added link to per-pathogen reads in protein-to-pathogens HTML output for Julia.
Slightly adjust appearance of HTML links for pathogens.
Added search link for ICTV for viruses in HTML output.
Added --whitelistFile
and --blacklistFile
options to
noninteractive-alignment-panel.py
.
Adjust how protein and genome accession numbers are looked for in
ProteinGrouper
depending on whether we guess they are NCBI or RVDB
sequence ids.
Make NCBISequenceLinkURL
raise a more informative IndexError
when it
cannot extract the wanted field.
Fixed stupid typo in proteins-to-pathogens.py
.
Added titleRegex
and negativeTitleRegex
to ProteinGrouper
class and
--titleRegex
and --negativeTitleRegex
arguments to
proteins-to-pathogens.py
.
Added --title
and --preamble
args to output from
proteins-to-pathogens.py
. Fixed ProteinGrouper
HTML NCBI protein link
and added genome link. Added positive and negative filtering by regex to
TitlesAlignments
and tests. Improved NCBI link generation and tests.
master
Refactored SAMFilter
to allow filtering alignments in pileups. Added
bin/sam-coverage.py
.
Use dark.utils.StringIO
everywhere as it can be used as a context manager
in Python 2.
Added pysam
to setup.py
install_requires
list. Removed cffi
. Fixed
tests that were failing under Linux (apart from pyfaidx tests which are now
skipped on Linux). Removed mocking File
class and replaced it with
StringIO
.
master
Fixed AAread.ORFs
function in the AARead
class and moved the
--allowOpenORFs
(True/False) check to within the function. Added a
DNAKozakRead
class. Changed extract-ORFs.py
so that information
about Kozak consensus sequences can be returned.
Removed bone-headed use of full path to fasta-join.sh
from
bin/fasta-diff.sh
.
Added compareAaReads
and matchToString
to aa.py
. Wrote tests in
test_aa.py
for both. Moved countPrint
to utils, used by matchToString
in dna.py
and aa.py
. Added compare-aa-sequences.py
to the bin.
Added matchToString
to dna.py
to allow printing of a DNA match.
Added --reverse
and --reverseComplement
options to filter-fasta.py
and the underlying ReadFilter
class.
In reads.py
, changed the _makeComplementTable
function so that
uppercase and lowercase bases are correctly reverse complemented into their
respective uppercase and lowercase complementary letters. Added a test to
test/reads.py
to confirm that reverseComplement
does this.
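
Case-preserving reverse complementing can be sketched with a translation table, as below. This is not the code in reads.py: the names `makeComplementTableSketch` and `reverseComplementSketch` are hypothetical, and only the four unambiguous bases plus N are covered.

```python
BASE_COMPLEMENTS = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def makeComplementTableSketch(complements):
    """
    Build a str.translate table mapping each base to its complement.

    Hypothetical helper: uppercase maps to uppercase and lowercase to
    lowercase, so the case of each position is preserved.
    """
    table = {}
    for base, complement in complements.items():
        table[ord(base.upper())] = complement.upper()
        table[ord(base.lower())] = complement.lower()
    return table


COMPLEMENT_TABLE = makeComplementTableSketch(BASE_COMPLEMENTS)


def reverseComplementSketch(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(COMPLEMENT_TABLE)[::-1]
```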
Added --sampleName
option to proteins-to-pathogens
.
Added --maxORFLength
option to extract-ORFs.py
. Fixed logic in
retrospect.
Added --removeIds
option to fasta-diff.sh
(and its helper script
fasta-join.py
).
Make convert-diamond-to-sam.py
print the correct (nucleotide) offset of
the start of the match, as though its subject sequence had been
nucleotides.
Make convert-diamond-to-sam.py
print the list of required fields in case
an input line cannot be split into the expected number of fields.
convert-diamond-to-sam.py
was putting the incorrect (AA, not nucleotide)
reference length into the SAM output. Introduced some spaces for easier
table layout into the HTML generated by fasta-identity-table.py
when
called by compare-consensuses.py
.
Fixed #650,
exception in SAMFilter
when quality is *
in a SAM file.
Added hard-clipping to CIGAR in SAM created by convert-diamond-to-sam.py
.
Use from six import StringIO
to avoid a PY2/3 incompatibility.
Added bin/convert-diamond-to-sam.py
script to convert DIAMOND output
format 6 to SAM.
Added btop2cigar
to dark.btop
to convert BTOP strings to CIGAR strings.
Fixed flake8
warnings about single backslashes in strings.
Make fasta-diff.sh
use GNU parallel, if installed.
Make fasta-diff.sh
handle compressed files, exit with diff's status,
and make it possible to pass command line options through to diff.
Added bin/fasta-diff.sh
as a quick diff function that knows about
FASTA/FASTQ files. Added bin/fasta-join.py
as a helper function for
bin/fasta-diff.sh
.
Fix 636 in which SAM file
parsing threw an exception when an unmapped sequence with no CIGAR string
occurred in a SAM file (this can happen when running bowtie2 --all
).
Fixed thinko in 3.0.42.
Pysam issue 716
wasn't solved the way we hoped it would be, so now the filter-sam.py
command always passes the template of the original SAM file to the
constructor for the new file. As a result, the new file will have @SQ
entries for all the original sequences, even when --referenceId
is passed
to filter-sam.py
to restrict output to a specific set of reference ids.
Added --showDiffs
option to bin/compare-sequences.py
.
Force use of mysql-connector-python
version 8.0.11
in
requirements.txt
due to segmentation fault running tests using TravisCI.
Set version 0.5.0
of pyfaidx
to avoid error in pyfaidx/__init__.py
line 711 (AttributeError: 'str' object has no attribute 'decode') in later
pyfaidx
version. Removed some deprecation warnings when running tests.
The fix to solve #630 was insufficient. That's fixed in this release, hopefully!
Fixed #630 to deal with non-hard-clipped queries that have a CIGAR string that indicates they have been clipped.
Add a --titlesJSONFile
option to noninteractive-alignment-panel.py
.
Added whitelistFile
and blacklistFile
to ReadsAlignmentsFilter
class
in dark/alignments.py
.
Fixed small bug in filter-hits-to-fasta.py
.
Added flushing of intermediate output in noninteractive-alignment-panel.py
.
Factored common SAM filtering code out into dark.sam.SAMFilter
. Split
common FASTA command-line options into those for filtering (this is just
inclusion/exclusion) and those for editing (which change the FASTA records
in some way).
Added compare-consensuses.py
script.
Added bin/newick-to-ascii.py
script.
Added storeQueryIds
option to PaddedSAM.queries
method.
Added alignmentCount
attribute to PaddedSAM
class.
Renamed alsoYieldAlignments
option of PaddedSAM.queries
to addAlignment
and made it add the alignment to the Read
instance instead of returning a tuple.
Added alsoYieldAlignments
option to PaddedSAM.queries
method to have
the returned generator also yield the pysam.AlignedSegment
instance with
the gap-padded query sequence. This makes it possible to retrieve padded
queries from SAM/BAM and generate SAM/BAM (or FASTQ) of some subset of the
queries.
Added bin/filter-sam.py
script.
Made a change in dark/dna.py
, to make identicalMatchCount
only count non-ambiguous
matches. Added test function testMatchwithIdenticalAmbuguity
.
Added TravisCI Slack notifications.
Made noninteractive-alignment-panel.py
option --outputDir
to be required.
Added error message for this in graphics.py
.
Changed the way reference sequence insertions are stored in a
dark.sam.PaddedSAM
instance to make it possible to tell which query
sequences caused reference insertions.
Made dark/sam.py
properly deal with secondary alignments that are missing
a SEQ.
Added sam-reference-read-counts.py
script.
Updated ViralZone search URL in dark/proteins.py
.
Added --sites
argument to compare-dna-sequences.py
and corresponding
offsets
argument to the underlying function.
Fixed bug that got introduced when doing 3.0.17 June 14, 2018
.
Fixed bug that got introduced when doing 3.0.16 June 14, 2018
.
Made a change in dark/proteins.py
, to make the minProteinFraction
work
on a per sample basis, not per pathogen.
Fixed another bug (unreferenced variable) in graphics.py
that crept in in
version 3.0.10 June 11, 2018
.
Fixed a bug in diamond/alignments.py
that crept in in version
3.0.10 June 11, 2018
.
Fixed a bug in noninteractive-alignment-panel.py
that crept in after
version 3.0.10 June 11, 2018
.
pip install mysql-connector-python now works, so added mysql-connector-python>=8.0.11 to requirements.txt, removed install-dependencies.sh, and updated the install line in .travis.yml.
- Added
bin/sam-to-fasta-alignment.py
script.
Dropped requirement that noninteractive-alignment-panel.py
must be passed
information about the subject database. This is now only needed if
--showOrfs
is given. The issue is that making the subject database can
take a long time and display of the subject ORFs is usually not needed.
Internal only.
- Added --color option to fasta-identity-table.py.
- Changed
Makefile
upload
target rule.
- Moved all GOR4 amino acid structure prediction code into its own repo, at https://github.com/acorg/gor4.
- As a result, the gor4 method on the dark.reads.AAread class has been removed. This could be re-added by including gor4 as a requirement but for now if you want that functionality you'll need to install gor4 yourself and write a trivial function to call the gor4 method on the read (or make a subclass of AARead that adds that method). I've done it this way because we have someone using the code who does not have a working C compiler and this was causing a problem building dark matter. Not a good reason, I know, but the GOR4 code makes for a good standalone code base in any case.
- Added --sampleIndexFilename and --pathogenIndexFilename to proteins-to-pathogens.py. These cause the writing of files containing lines with an integer index, a space, then a sample or pathogen name. These can be later used to identify the de-duplicated reads files for a given sample or pathogen name.
- Added number of identical and positive amino acid matches to BLAST and DIAMOND hsps.
- The protein grouper now de-duplicates read by id, not sequence.
- Fixed tiny HTML formatting error in the toHTML method of ProteinGrouper in dark/proteins.py.
- The --indices option to filter-fasta.py was changed to accept a string range (like 10-20,25-30,51,60) instead of a list of single integers. It is renamed to --keepSequences and is also now 1-based not 0-based, like its friends --keepSites. --removeSequences was added as an option to filter-fasta.py.
- The options --keepIndices, --keepIndicesFile, --removeIndices, and removeIndicesFile to filter-fasta.py are now named --keepSites, --keepSitesFile, --removeSites, and removeSitesFile though the old names are still supported for now.
- The indicesMatching method of Reads is renamed to sitesMatching.
- removeSequences was added to read filtering in ReadFilter and as a --removeSequences option to filter-fasta.py.
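
A string range such as 10-20,25-30,51,60 can be expanded into individual numbers as in the sketch below. It is not the parsing code used by filter-fasta.py; the function name `expandRangeString` and the `zeroBased` flag are hypothetical. The flag illustrates converting the 1-based values given by a user into 0-based offsets.

```python
def expandRangeString(s, zeroBased=False):
    """
    Expand a range string like '10-20,25-30,51,60' into a set of ints.

    Hypothetical helper. Ranges are inclusive. If zeroBased is True,
    1-based input values are converted to 0-based offsets.
    """
    result = set()
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
        else:
            start = end = int(part)
        if start > end:
            # Tolerate reversed ranges such as '20-10'.
            start, end = end, start
        result.update(range(start, end + 1))
    if zeroBased:
        result = {n - 1 for n in result}
    return result
```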