Skip to content

Commit

Permalink
Merge pull request #44 from aaranyue/extract-ref-flanks
Browse files Browse the repository at this point in the history
bump version to V1.2.3
  • Loading branch information
Echoring authored Dec 16, 2024
2 parents f7a0256 + 0ab0a01 commit 382fc47
Show file tree
Hide file tree
Showing 4 changed files with 65 additions and 28 deletions.
20 changes: 16 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,12 @@ Task include:
- [CentroMiner](#CentroMiner): centromere candidate prediction

## Version Change log
1.2.3
- Add new '--extract-ref-flanks' option for AssemblyMapper, which allow chimeric contig output for gap filling. (see [issue #42](https://github.com/aaranyue/quarTeT/issues/42) for detail)
- Support Unimap as aligner. As a optimized version of minimap2, it still use the '--minimapoption'.
- Fix a bug that monopolizer contig length standard is too high.
- Deprecate the web server for maintenance issue (sorry!).

1.2.2
- Add new '--keep' option for AssemblyMapper, which add all unplaced contigs in the draft genome.
- Add a new output AGP file for GapFiller to describe the modified chromosome structure.
Expand All @@ -29,7 +35,7 @@ Task include:
- Support RepeatMasker's TE annotation format for CentroMiner module.

1.1.6
- add new option 'maximum TR length' (-r) for CentroMiner to avoid trf stuck. (Thanks to atotickov, PR#22)
- add new option 'maximum TR length' (-r) for CentroMiner to avoid trf stuck. (Thanks to atotickov, [PR #22](https://github.com/aaranyue/quarTeT/pull/22))
- disable unstable 'join' mode for GapFiller by default. Use '--enablejoin' option to enable this mode. '--fillonly' option is removed.

1.1.5
Expand Down Expand Up @@ -78,12 +84,13 @@ Task include:

## Getting Started
### Use quarTeT on Web
quarTeT can be easily accessed on [our web server](http://www.atcgn.com:8080/quarTeT/home.html). (Currently its version is lagged.)
~~quarTeT can be easily accessed on [our web server](http://www.atcgn.com:8080/quarTeT/home.html).~~ (Currently deprecated.)
### Use quarTeT on local
quarTeT command-line program is availble for Linux.
#### Dependencies
- [Python3](https://www.python.org/) (>3.6, tested on 3.7.4 and 3.9.12)
- [Minimap2](https://github.com/lh3/minimap2) (tested on 2.24-r1122 and 2.24-r1155-dirty)
- (Optional)[Unimap](https://github.com/lh3/unimap) (tested on 0.1-r41)
- [MUMmer4](https://github.com/mummer4/mummer) (tested on 4.0.0rc1)
- [trf](https://github.com/Benson-Genomics-Lab/TRF) (tested on 4.09)
- [CD-hit](https://github.com/weizhongli/cdhit) (tested on 4.6 and 4.8.1)
Expand Down Expand Up @@ -130,6 +137,7 @@ Note that contigs should be phased.
It's recommended to obtain such an assembly using [hifiasm](https://github.com/chhylp123/hifiasm).

you can convert `{prefix}.bp.hap1.p_ctg.gfa` and `{prefix}.bp.hap2.p_ctg.gfa` generated by hifiasm to FASTA format as input, respectively.

```
Usage: python3 quartet.py AssemblyMapper <parameters>
-h, --help show this help message and exit
Expand All @@ -142,14 +150,17 @@ Usage: python3 quartet.py AssemblyMapper <parameters>
The min alignment identity to be select (%), default: 90
-p PREFIX The prefix used on generated files, default: quarTeT
-t THREADS Use number of threads, default: 1
-a {minimap2,mummer} Specify alignment program (support minimap2 and mummer), default: minimap2
-a {minimap2,unimap,mummer}
Specify alignment program (support minimap2, unimap and mummer), default: minimap2
--nofilter Use original sequence input, no filtering.
--keep Keep the unplaced contigs in draft genome
--extract-ref-flanks CHIMERA
Add an output of chimera contig containing reference flanks of x bp (check issue#42 for detail), default: 0 (off)
--plot Plot a colinearity graph for draft genome to reference alignments. (will cost more time)
--noplot Skip all ploting.
--overwrite Overwrite existing alignment file instead of reuse.
--minimapoption MINIMAPOPTION
Pass additional parameters to minimap2 program, default: -x asm5
Pass additional parameters to minimap2/unimap program, default: -x asm5
--nucmeroption NUCMEROPTION
Pass additional parameters to nucmer program.
--deltafilteroption DELTAFILTEROPTION
Expand All @@ -174,6 +185,7 @@ A gap-tied genome and corresponding long-reads are required as input, both in fa
If possible, using long-reads assembled and polished contigs instead of reads may improve the quality.

GapFiller support two mode: `fill` and `join`. `fill` means find a segment that can be placed in the gap and link them together. `join` means find an evidence that the gap edge is overlaped and can be directly merged into one. For now, `join` mode is unstable and disabled by default. If you want to enable it, be careful and check the result manually.

```
Usage: python3 quartet.py GapFiller <parameters>
-h, --help show this help message and exit
Expand Down
2 changes: 1 addition & 1 deletion quartet.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
import sys

usage = '''quarTeT: Telomere-to-telomere Toolkit
version 1.2.2
version 1.2.3
Usage: python3 quartet.py <module> <parameters>
Expand Down
61 changes: 43 additions & 18 deletions quartet_assemblymapper.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/usr/bin/env python3
# Last modified: V1.2.2
# Last modified: V1.2.3

import argparse
import subprocess
Expand All @@ -10,7 +10,7 @@

### MAIN PROGRAM ###
def AssemblyMapper(args):
refgenomefile, qryfile, mincontiglength, minalignmentlength, minalignmentidentity, prefix, threads, aligner, nofilter, keep, plot, noplot, overwrite, nucmeroption, deltafilteroption, minimapoption = args
refgenomefile, qryfile, mincontiglength, minalignmentlength, minalignmentidentity, prefix, threads, aligner, nofilter, keep, chimera, plot, noplot, overwrite, nucmeroption, deltafilteroption, minimapoption = args

# split scaffolds to contigs and remove short contigs
print('[Info] Filtering contigs input...')
Expand Down Expand Up @@ -54,7 +54,7 @@ def AssemblyMapper(args):
if line.startswith('#'):
continue
tigid, tiglen, status = line.split()[0:3]
if status == 'both' and int(tiglen) >= minchrlen:
if status == 'both' and int(tiglen) >= minchrlen / 2:
monopolize.append(tigid)
elif status == 'left':
forceleft.append(tigid)
Expand Down Expand Up @@ -100,8 +100,8 @@ def AssemblyMapper(args):
alignment['sumnegative'] += int(qrylen)
allAlignment[f'{refid}#{qryid}'] = alignment

elif aligner == 'minimap2':
paffile = quartet_util.minimap(refgenomefile, contigfile, prefix, 'contig_map_ref', minimapoption, doplot, overwrite)
elif aligner == 'minimap2' or aligner == 'unimap':
paffile = quartet_util.minimap(refgenomefile, contigfile, prefix, 'contig_map_ref', minimapoption, doplot, overwrite, aligner)

with open(paffile, 'r') as f:
for line in f:
Expand All @@ -111,7 +111,7 @@ def AssemblyMapper(args):
if float(match) / float(alignlen) < minalignmentidentity:
continue
if f'{refid}#{qryid}' not in allAlignment:
alignment = {'refid': refid, 'qryid': qryid, 'weight': 0, 'sumposition': 0, 'sumpositive': 0, 'sumnegative': 0, 'score': 0}
alignment = {'refid': refid, 'qryid': qryid, 'weight': 0, 'sumposition': 0, 'sumpositive': 0, 'sumnegative': 0, 'score': 0, 'refstart': 0, 'refend': 0}
else:
alignment = allAlignment[f'{refid}#{qryid}']
alignment['weight'] += int(alignlen)
Expand All @@ -121,6 +121,10 @@ def AssemblyMapper(args):
alignment['sumpositive'] += int(alignlen)
else:
alignment['sumnegative'] += int(alignlen)
if alignment['refstart'] < int(refstart):
alignment['refstart'] = int(refstart)
if alignment['refend'] < int(refend):
alignment['refend'] = int(refend)
allAlignment[f'{refid}#{qryid}'] = alignment
if allAlignment == {}:
print(f'[Error] All alignments are filtered. Recommended to adjust filter arguments.')
Expand All @@ -134,11 +138,11 @@ def AssemblyMapper(args):
if alignment['qryid'] in monopolize:
strand = '+' if alignment['sumpositive'] >= alignment['sumnegative'] else '-'
if alignment['qryid'] not in map:
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': 0, 'strand': strand}
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': 0, 'strand': strand, 'refstart': alignment['refstart'], 'refend': alignment['refend']}
else:
# solve double mapping by score
if alignment['score'] > map[alignment['qryid']]['score']:
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': 0, 'strand': strand}
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': 0, 'strand': strand, 'refstart': alignment['refstart'], 'refend': alignment['refend']}
monopolizedchr = [y['refid'] for x, y in map.items()]

for key, alignment in allAlignment.items():
Expand All @@ -151,11 +155,11 @@ def AssemblyMapper(args):
else:
position = alignment['sumposition'] / alignment['weight']
if alignment['qryid'] not in map:
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': position, 'strand': strand}
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': position, 'strand': strand, 'refstart': alignment['refstart'], 'refend': alignment['refend']}
else:
# solve double mapping by score
if alignment['score'] > map[alignment['qryid']]['score']:
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': position, 'strand': strand}
map[alignment['qryid']] = {'refid': alignment['refid'], 'score': alignment['score'], 'position': position, 'strand': strand, 'refstart': alignment['refstart'], 'refend': alignment['refend']}


# write all contigs' destination
Expand Down Expand Up @@ -188,6 +192,25 @@ def AssemblyMapper(args):
w.write(f'{tigid}\t{tiglen}\t{target}\n')
print(f'[Output] Mapping result for each contigs write to: {contigmapinfofile}')

# chimera option
if chimera != 0:
refdict = quartet_util.readFastaAsDict(refgenomefile)
with open(f'{prefix}.chimera_contig.fasta', 'w') as nw:
for tigid in map:
refid = map[tigid]['refid']
refstart = map[tigid]['refstart']
refend = map[tigid]['refend']
strand = map[tigid]['strand']
tigseq = totaldict[tigid]
left = max(refstart-chimera, 1)
right = min(refend+chimera, len(refdict[refid]))
if strand == '+':
seqout = refdict[refid][left-1:refstart] + tigseq + refdict[refid][refend:right+1]
else:
seqout = quartet_util.reversedseq(refdict[refid][refend:right+1]) + tigseq + quartet_util.reversedseq(refdict[refid][left-1:refstart])
nw.write(f'>{tigid}_chimera_{refid}_{left}~{refstart}+this+{refend}~{right}\n{seqout}\n')
print(f'[Output] Chimeric contigs write to: {prefix}.chimera_contig.fasta')

# check if all Chr have contigs mapped.
mappedchrlist = []
for qryid, mapinfo in map.items():
Expand Down Expand Up @@ -273,8 +296,8 @@ def AssemblyMapper(args):
print('[Info] Aligning draft and ref to plot...')
if aligner == 'mummer':
quartet_util.mummer(refgenomefile, draftgenomefastafile, prefix, 'draftgenome_mummer_ref', nucmeroption, deltafilteroption, plot, overwrite)
if aligner == 'minimap2':
quartet_util.minimap(refgenomefile, draftgenomefastafile, prefix, 'draftgenome_map_ref', minimapoption, plot, overwrite)
if aligner == 'minimap2' or aligner == 'unimap':
quartet_util.minimap(refgenomefile, draftgenomefastafile, prefix, 'draftgenome_map_ref', minimapoption, plot, overwrite, aligner)

### RUN ###
if __name__ == '__main__':
Expand All @@ -287,13 +310,14 @@ def AssemblyMapper(args):
parser.add_argument('-i', dest='min_alignment_identity', type=float, default=90, help='The min alignment identity to be select (%%), default: 90')
parser.add_argument('-p', dest='prefix', default='quarTeT', help='The prefix used on generated files, default: quarTeT')
parser.add_argument('-t', dest='threads', default='1', help='Use number of threads, default: 1')
parser.add_argument('-a', dest='aligner', choices=['minimap2', 'mummer'], default='minimap2', help='Specify alignment program (support minimap2 and mummer), default: minimap2')
parser.add_argument('-a', dest='aligner', choices=['minimap2', 'unimap', 'mummer'], default='minimap2', help='Specify alignment program (support minimap2, unimap and mummer), default: minimap2')
parser.add_argument('--nofilter', dest='nofilter', action='store_true', default=False, help='Use original sequence input, no filtering.')
parser.add_argument('--keep', dest='keep', action='store_true', default=False, help='Keep the unplaced contigs in draft genome')
parser.add_argument('--extract-ref-flanks', dest='chimera', type=int, default=0, help='Add an output of chimera contig containing reference flanks of x bp (check issue#42 for detail), default: 0 (off)')
parser.add_argument('--plot', dest='plot', action='store_true', default=False, help='Plot a colinearity graph for draft genome to reference alignments. (will cost more time)')
parser.add_argument('--noplot', dest='noplot', action='store_true', default=False, help='Skip all ploting.')
parser.add_argument('--overwrite', dest='overwrite', action='store_true', default=False, help='Overwrite existing alignment file instead of reuse.')
parser.add_argument('--minimapoption', dest='minimapoption', default='-x asm5', help='Pass additional parameters to minimap2 program, default: -x asm5')
parser.add_argument('--minimapoption', dest='minimapoption', default='-x asm5', help='Pass additional parameters to minimap2/unimap program, default: -x asm5')
parser.add_argument('--nucmeroption', dest='nucmeroption', default='', help='Pass additional parameters to nucmer program.')
parser.add_argument('--deltafilteroption', dest='deltafilteroption', default='', help='Pass additional parameters to delta-filter program.')

Expand All @@ -311,6 +335,7 @@ def AssemblyMapper(args):
aligner = parser.parse_args().aligner
nofilter = parser.parse_args().nofilter
keep = parser.parse_args().keep
chimera = parser.parse_args().chimera
plot = parser.parse_args().plot
noplot = parser.parse_args().noplot
overwrite = parser.parse_args().overwrite
Expand All @@ -320,14 +345,14 @@ def AssemblyMapper(args):
nucmeroption = parser.parse_args().nucmeroption + f' -t {threads}'
deltafilteroption = parser.parse_args().deltafilteroption + f' -l {minalignmentlength} -i {minalignmentidentity}'
minimapoption = ''
if aligner == 'minimap2':
quartet_util.check_prerequisite(['minimap2'])
if aligner == 'minimap2' or aligner == 'unimap':
quartet_util.check_prerequisite([aligner])
nucmeroption = ''
deltafilteroption = ''
minimapoption = parser.parse_args().minimapoption + f' -t {threads}'

# run
args = [refgenomefile, qryfile, mincontiglength, minalignmentlength, minalignmentidentity,
prefix, threads, aligner, nofilter, keep, plot, noplot, overwrite, nucmeroption, deltafilteroption, minimapoption]
print(f'[Info] Paramater: refgenomefile={refgenomefile}, qryfile={qryfile}, mincontiglength={mincontiglength}, minalignmentlength={minalignmentlength}, minalignmentidentity={minalignmentidentity}, prefix={prefix}, threads={threads}, aligner={aligner}, nofilter={nofilter}, keep={keep}, plot={plot}, noplot={noplot}, overwrite={overwrite}, nucmeroption={nucmeroption}, deltafilteroption={deltafilteroption}, minimapoption={minimapoption}')
prefix, threads, aligner, nofilter, keep, chimera, plot, noplot, overwrite, nucmeroption, deltafilteroption, minimapoption]
print(f'[Info] Paramater: refgenomefile={refgenomefile}, qryfile={qryfile}, mincontiglength={mincontiglength}, minalignmentlength={minalignmentlength}, minalignmentidentity={minalignmentidentity}, prefix={prefix}, threads={threads}, aligner={aligner}, nofilter={nofilter}, keep={keep}, chimera={chimera}, plot={plot}, noplot={noplot}, overwrite={overwrite}, nucmeroption={nucmeroption}, deltafilteroption={deltafilteroption}, minimapoption={minimapoption}')
quartet_util.run(AssemblyMapper, args)
10 changes: 5 additions & 5 deletions quartet_util.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Last modified: V1.2.0
# Last modified: V1.2.3
import time
import sys
import math
Expand Down Expand Up @@ -113,16 +113,16 @@ def mummer(reffasta, qryfasta, prefix, suffix, nucmeroption, deltafilteroption,
subprocess.run(f'mv -t tmp -f {prefix}.{suffix}.gp {prefix}.{suffix}.fplot {prefix}.{suffix}.rplot {prefix}.{suffix}.delta {prefix}.{suffix}.filter.delta {prefix}.{suffix}.coords', stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
return f'tmp/{prefix}.{suffix}.coords'

def minimap(reffasta, qryfasta, prefix, suffix, minimapoption, plot, overwrite):
print('[Info] Running minimap2...')
def minimap(reffasta, qryfasta, prefix, suffix, minimapoption, plot, overwrite, aligner):
print(f'[Info] Running {aligner}...')
subprocess.run(f'mv -t ./ -f tmp/{prefix}.{suffix}.paf', stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
if not os.path.exists(f'{prefix}.{suffix}.paf') or overwrite == True:
cmdr = subprocess.run(f'minimap2 {minimapoption} -c -o {prefix}.{suffix}.paf {reffasta} {qryfasta}', stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
cmdr = subprocess.run(f'{aligner} {minimapoption} -c -o {prefix}.{suffix}.paf {reffasta} {qryfasta}', stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
if '[morecore]' in cmdr.stderr.decode("utf-8") or cmdr.returncode < 0:
print(f'[Error] Memory insufficient.')
sys.exit(0)
elif cmdr.returncode != 0:
print(f'[Error] Unexcepted error occur in minimap2 as follow:')
print(f'[Error] Unexcepted error occur in {aligner} as follow:')
print(f'cmd: {cmdr.args}')
print(f'returncode: {cmdr.returncode}')
print('stdout:')
Expand Down

0 comments on commit 382fc47

Please sign in to comment.