Merge branch 'master' into master
aineniamh authored Apr 12, 2024
2 parents a8231b6 + 55aa31b commit 258303a
Showing 3 changed files with 157 additions and 91 deletions.
180 changes: 141 additions & 39 deletions README.md
@@ -1,75 +1,177 @@
# snipit
Summarise snps relative to a reference sequence


<img src="./docs/genome_graph.png" width="700">

### Usage
### Install

```
pip install snipit
```
usage: snipit <alignment> [options]

snipit
### Example Usage

positional arguments:
alignment Input alignment fasta file
- Basic usage for nucleotide alignments:
```
snipit test.fasta \
--output-file test
```
The default output format is `png`. Specify only the output path or name stem, without a file extension.

- To change the output format, use `--format`:
```
snipit test.fasta \
--output-file test \
--format pdf
```
Options: `png`, `jpg`, `pdf`, `svg`, `tiff`.

- To change the colour scheme, use `--colour-palette`:
```
snipit test.fasta \
--output-file test \
--colour-palette classic_extended
```

Available colour schemes:
```
classic, classic_extended, primary, purine-pyrimidine, greyscale, wes, verity, ugene
```
Use `ugene` for protein (aa) alignments.
Use `classic_extended` for colouring ambiguous bases.

- There are multiple options to control which SNPs or indels are included/excluded:
```
snipit test.fasta \
--show-indels \
--include-positions '100-150' \
--exclude-positions '223 224 225'
```
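
The position options accept closed, inclusive, one-indexed ranges or single positions, and `--include-positions` is considered before `--exclude-positions`. A minimal Python sketch of that filtering logic follows. The names `parse_positions` and `select_positions` are hypothetical and written for illustration only; this is not snipit's own `bp_range` implementation.

```python
def parse_positions(specs):
    """Turn specs such as ['100-150', '223 224 225'] into inclusive (start, end) tuples.

    Hypothetical helper: positions are one-indexed and ranges are closed.
    """
    ranges = []
    for spec in specs:
        # A quoted argument may hold several whitespace-separated tokens.
        for token in spec.split():
            if "-" in token:
                start, end = (int(part) for part in token.split("-", 1))
            else:
                start = end = int(token)
            if start < 1 or end < start:
                raise ValueError(f"invalid position spec: {token!r}")
            ranges.append((start, end))
    return ranges


def select_positions(positions, include=None, exclude=None):
    """Apply the include filter first, then the exclude filter."""
    def covered(pos, ranges):
        return any(start <= pos <= end for start, end in ranges)

    kept = list(positions)
    if include:
        included = parse_positions(include)
        kept = [pos for pos in kept if covered(pos, included)]
    if exclude:
        excluded = parse_positions(exclude)
        kept = [pos for pos in kept if not covered(pos, excluded)]
    return kept
```

For example, `select_positions(snp_positions, include=['100-150'], exclude=['223 224 225'])` would mirror the command above under these assumptions.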

- To control how ambiguous bases are handled, use `--ambig-mode`:
```
[all] include all ambig such as N,Y,B in all positions
[snps] only include ambig if a snp is present at the same position - Default
[exclude] remove all ambig, same as deprecated --exclude-ambig-pos
```
Use the colour palette `classic_extended` when plotting with `all` or `snps`.
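
One way to read the three modes is sketched below in Python. The function `positions_to_plot` is hypothetical, and the interpretation is an assumption: `all` keeps every SNP or ambiguous site, `snps` keeps SNP sites only (ambiguous bases are shown where they coincide with one), and `exclude` drops any site that carries an ambiguous base in any sequence. snipit's actual behaviour may differ in detail.

```python
AMBIGUOUS = set("RYSWKMBDHVN")  # IUPAC nucleotide ambiguity codes
BASES = set("ACGT")


def positions_to_plot(reference, queries, mode="snps"):
    """Return sorted one-indexed alignment positions kept under each mode (illustrative)."""
    if mode not in {"all", "snps", "exclude"}:
        raise ValueError(f"unknown ambig mode: {mode!r}")

    snp_sites, ambig_sites = set(), set()
    for query in queries:
        pairs = zip(reference.upper(), query.upper())
        for pos, (ref_base, base) in enumerate(pairs, start=1):
            if base in AMBIGUOUS:
                ambig_sites.add(pos)
            elif base in BASES and base != ref_base:
                snp_sites.add(pos)

    if mode == "all":
        return sorted(snp_sites | ambig_sites)
    if mode == "exclude":
        return sorted(snp_sites - ambig_sites)
    return sorted(snp_sites)  # mode == "snps"
```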

- Recombination mode is designed to assist with recombination analysis for SARS-CoV-2 (SC2). This mode colours mutations that are present in two references. Three flags are required: `--reference`, `--recombi-mode` and `--recombi-references`.

The specified `--reference` must be different from the `--recombi-references`.
```
snipit test.fasta \
--reference USA_3 \
--recombi-mode \
--recombi-references "USA_1,USA_2"
```
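
The idea behind recombination mode can be illustrated with a short Python sketch that labels each query mutation by which of the two recombi-references shares it. The function `classify_mutations` and its label names are hypothetical, and the sketch assumes all sequences are aligned to the same length; it is not snipit's internal code.

```python
def classify_mutations(reference, parent_a, parent_b, query):
    """Label each query mutation (relative to the reference) by its likely source.

    Returns a dict of one-indexed position -> 'A', 'B', 'both' or 'private'.
    """
    labels = {}
    columns = zip(reference, parent_a, parent_b, query)
    for pos, (ref_base, a_base, b_base, q_base) in enumerate(columns, start=1):
        if q_base == ref_base:
            continue  # not a mutation relative to the reference
        in_a = q_base == a_base
        in_b = q_base == b_base
        if in_a and in_b:
            labels[pos] = "both"
        elif in_a:
            labels[pos] = "A"
        elif in_b:
            labels[pos] = "B"
        else:
            labels[pos] = "private"
    return labels
```

A run of `A` labels followed by a run of `B` labels along the genome would be the kind of pattern a recombination analysis looks for.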

For amino acid alignments, specify the sequence type as `aa` and use the colour palette `ugene`:
```
snipit test.prot.fasta \
--sequence-type aa \
--colour-palette ugene \
--output-file test.prot
```

There are several more options; see below for full usage.

### Full Usage
```
snipit
optional arguments:
-h, --help show this help message and exit
Input options:
alignment Input alignment fasta file
-t {nt,aa}, --sequence-type {nt,aa}
Input sequence type: aa or nt
-r REFERENCE, --reference REFERENCE
Indicates which sequence in the alignment is the reference (by sequence ID). Default: first sequence in
Indicates which sequence in the alignment is the
reference (by sequence ID). Default: first sequence in
alignment
-l LABELS, --labels LABELS
Optional csv file of labels to show in output snipit plot. Default: sequence names
Optional csv file of labels to show in output snipit
plot. Default: sequence names
--l-header LABEL_HEADERS
Comma separated string of column headers in label csv. First field indicates sequence name column, second
the label column. Default: 'name,label'
Comma separated string of column headers in label csv.
First field indicates sequence name column, second the
label column. Default: 'name,label'
Mode options:
--recombi-mode Allow colouring of query sequences by mutations
present in two 'recombi-references' from the input
alignment fasta file
--recombi-references RECOMBI_REFERENCES
Specify two comma separated sequence IDs in the input
alignment to use as 'recombi-references'. Ex.
Sequence_ID_A,Sequence_ID_B
--cds-mode Assumes sequence supplied is a coding sequence
Output options:
-d OUTPUT_DIR, --output-dir OUTPUT_DIR
Output directory. Default: current working directory
-o OUTFILE, --output-file OUTFILE
Output file name stem. Default: snp_plot
-s, --write-snps Write out the SNPs in a csv file.
-f FORMAT, --format FORMAT
Format options (png, jpg, pdf, svg, tiff) Default: png
Figure options:
--height HEIGHT Overwrite the default figure height
--width WIDTH Overwrite the default figure width
--size-option SIZE_OPTION
Specify options for sizing. Options: expand, scale
--solid-background Force the plot to have a solid background, rather than a
transparent one.
--flip-vertical Flip the orientation of the plot so sequences are below the
reference rather than above it.
--snps-only Ignore insertion and deletion mutations and only plot SNPs
(legacy behaviour).
--include-positions INCLUDED_POSITIONS [INCLUDED_POSITIONS ...]
One or more range (closed, inclusive; one-indexed) or specific position only included in the output. Ex.
'100-150' or Ex. '100 101' Considered before '--exclude-positions'.
--exclude-positions EXCLUDED_POSITIONS [EXCLUDED_POSITIONS ...]
One or more range (closed, inclusive; one-indexed) or specific position to exclude in the output. Ex.
'100-150' or Ex. '100 101' Considered after '--include-positions'.
--exclude-ambig-pos Exclude positions with ambig base in any sequences. Considered
after '--include-positions'
--solid-background Force the plot to have a solid background, rather than
a transparent one.
-c , --colour-palette
Specify colour palette. Options: [classic,
classic_extended, primary, purine-pyrimidine,
greyscale, wes, verity, ugene]. Use ugene for protein
alignments.
--flip-vertical Flip the orientation of the plot so sequences are
below the reference rather than above it.
--sort-by-mutation-number
Render the graph with sequences sorted by the number of SNPs relative to the reference (fewest to most).
Render the graph with sequences sorted by the number
of SNPs relative to the reference (fewest to most).
Default: False
--sort-by-id Sort sequences alphabetically by sequence id. Default: False
--sort-by-id Sort sequences alphabetically by sequence id. Default:
False
--sort-by-mutations SORT_BY_MUTATIONS
Sort sequences by bases at specified positions. Positions are comma separated integers. Ex. '1,2,3'
--high-to-low If sorted by mutation number is selected, show the sequences
with the fewest SNPs closest to the
Sort sequences by bases at specified positions.
Positions are comma separated integers. Ex. '1,2,3'
--high-to-low If sorted by mutation number is selected, show the
sequences with the fewest SNPs closest to the
reference. Default: False
SNP options:
--show-indels Include insertion and deletion mutations in snipit
plot.
--include-positions INCLUDED_POSITIONS [INCLUDED_POSITIONS ...]
One or more range (closed, inclusive; one-indexed) or
specific position only included in the output. Ex.
'100-150' or Ex. '100 101' Considered before '--
exclude-positions'.
--exclude-positions EXCLUDED_POSITIONS [EXCLUDED_POSITIONS ...]
One or more range (closed, inclusive; one-indexed) or
specific position to exclude in the output. Ex.
'100-150' or Ex. '100 101' Considered after '--
include-positions'.
--ambig-mode {all,snps,exclude}
Controls how ambiguous bases are handled - [all]
include all ambig such as N,Y,B in all positions;
[snps] only include ambig if a snp is present at the
same position; [exclude] remove all ambig, same as
deprecated --exclude-ambig-pos
Misc options:
-v, --version show program's version number and exit
-c COLOUR_PALETTE, --colour-palette COLOUR_PALETTE
Specify colour palette. Options: primary, classic, purine-pyrimidine, greyscale, wes, verity
--recombi-mode Allow colouring of query sequences by mutations present in two
'recombi-references' from the input
alignment fasta file
--recombi-references RECOMBI_REFERENCES
Specify two comma separated sequence IDs in the input alignment to use as 'recombi-references'. Ex.
Sequence_ID_A,Sequence_ID_B
```
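
The grouped help output above is the kind of layout `argparse` produces when options are registered on argument groups. The following reduced Python sketch covers only a handful of the options listed; `build_parser` is a hypothetical name, and the `snps` default for `--ambig-mode` follows the README text rather than any confirmed source value.

```python
import argparse


def build_parser():
    """Build a cut-down parser with grouped options (illustrative, not the full CLI)."""
    parser = argparse.ArgumentParser(prog="snipit")

    i_group = parser.add_argument_group("Input options")
    i_group.add_argument("alignment", help="Input alignment fasta file")
    i_group.add_argument("-t", "--sequence-type", choices=["nt", "aa"],
                         default="nt", dest="sequence_type",
                         help="Input sequence type: aa or nt")

    f_group = parser.add_argument_group("Figure options")
    f_group.add_argument("-c", "--colour-palette", dest="colour_palette",
                         default="classic", metavar="",
                         choices=["classic", "classic_extended", "primary",
                                  "purine-pyrimidine", "greyscale", "wes",
                                  "verity", "ugene"],
                         help="Specify colour palette.")

    s_group = parser.add_argument_group("SNP options")
    s_group.add_argument("--show-indels", action="store_true", dest="show_indels",
                         help="Include insertion and deletion mutations.")
    # Assumption: 'snps' is the default, as stated in the README.
    s_group.add_argument("--ambig-mode", dest="ambig_mode",
                         choices=["all", "snps", "exclude"], default="snps",
                         help="Controls how ambiguous bases are handled.")
    return parser
```

Calling `build_parser().parse_args(["test.prot.fasta", "-t", "aa", "-c", "ugene"])` yields a namespace with `sequence_type == "aa"` and `colour_palette == "ugene"`; passing a palette outside `choices` makes `argparse` exit with a usage error.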

### Install
### Cite

Please cite this tool as follows:
```
pip install snipit
Aine O'Toole, snipit (2024) GitHub repository, https://github.com/aineniamh/snipit
```
22 changes: 7 additions & 15 deletions snipit/command.py
@@ -4,7 +4,6 @@
import sys
import os
import argparse
import textwrap
import pkg_resources
import collections

@@ -28,7 +27,6 @@ def main(sysargs = sys.argv[1:]):

i_group = parser.add_argument_group('Input options')
i_group.add_argument('alignment',help="Input alignment fasta file")
i_group.add_argument("-t","--sequence-type", choices=['nt','aa'], action="store",help="Input sequence type: aa or nt", default="nt", dest="sequence_type")
i_group.add_argument("-r","--reference", action="store",help="Indicates which sequence in the alignment is\nthe reference (by sequence ID).\nDefault: first sequence in alignment", dest="reference")
i_group.add_argument("-l","--labels", action="store",help="Optional csv file of labels to show in output snipit plot. Default: sequence names", dest="labels")
i_group.add_argument("--l-header", action="store",help="Comma separated string of column headers in label csv. First field indicates sequence name column, second the label column. Default: 'name,label'", dest="label_headers",default="name,label")
@@ -49,10 +47,7 @@ def main(sysargs = sys.argv[1:]):
f_group.add_argument("--width",action="store",type=float,help="Overwrite the default figure width",default=0)
f_group.add_argument("--size-option",action="store",help="Specify options for sizing. Options: expand, scale",dest="size_option",default="scale")
f_group.add_argument("--solid-background",action="store_true",help="Force the plot to have a solid background, rather than a transparent one.",dest="solid_background")
f_group.add_argument("-c","--colour-palette",dest="colour_palette",action="store",
help="Specify colour palette. Options: [classic, classic_extended, primary, purine-pyrimidine, greyscale, wes, verity, ugene]. Use ugene for protein alignments.",default="classic",
choices=["classic","classic_extended","primary","purine-pyrimidine","greyscale","wes","verity","ugene"],
metavar='')
f_group.add_argument("-c","--colour-palette",dest="colour_palette",action="store",help="Specify colour palette. Options: primary, classic, purine-pyrimidine, greyscale, wes, verity",default="classic")
f_group.add_argument("--flip-vertical",action='store_true',help="Flip the orientation of the plot so sequences are below the reference rather than above it.",dest="flip_vertical")
f_group.add_argument("--sort-by-mutation-number", action='store_true',
help="Render the graph with sequences sorted by the number of SNPs relative to the reference (fewest to most). Default: False", dest="sort_by_mutation_number")
@@ -67,11 +62,8 @@ def main(sysargs = sys.argv[1:]):
s_group.add_argument("--show-indels",action='store_true',help="Include insertion and deletion mutations in snipit plot.",dest="show_indels")
s_group.add_argument('--include-positions', dest='included_positions', type=sfunks.bp_range, nargs='+', default=None, help="One or more range (closed, inclusive; one-indexed) or specific position only included in the output. Ex. '100-150' or Ex. '100 101' Considered before '--exclude-positions'.")
s_group.add_argument('--exclude-positions', dest='excluded_positions', type=sfunks.bp_range, nargs='+', default=None, help="One or more range (closed, inclusive; one-indexed) or specific position to exclude in the output. Ex. '100-150' or Ex. '100 101' Considered after '--include-positions'.")
s_group.add_argument("--ambig-mode", dest="ambig_mode",choices=['all', 'snps', 'exclude'], default='snpsambi',
help=textwrap.dedent('''Controls how ambiguous bases are handled -
[all] include all ambig such as N,Y,B in all positions;
[snps] only include ambig if a snp is present at the same position;
[exclude] remove all ambig, same as deprecated --exclude-ambig-pos'''))
s_group.add_argument("--exclude-ambig-pos",dest="exclude_ambig_pos",action='store_true',help="Exclude positions with ambig base in any sequences. Considered after '--include-positions'")

misc_group = parser.add_argument_group('Misc options')
misc_group.add_argument("-v","--version", action='version', version=f"snipit {__version__}")

@@ -105,9 +97,9 @@ def main(sysargs = sys.argv[1:]):

reference,alignment = sfunks.get_ref_and_alignment(args.alignment,ref_input,label_map)

snp_dict,record_snps,num_snps = sfunks.find_snps(reference,alignment,args.show_indels,args.sequence_type,args.ambig_mode)
snp_dict,record_snps,num_snps = sfunks.find_snps(reference,alignment,args.show_indels)

record_ambs = sfunks.find_ambiguities(alignment, snp_dict, args.sequence_type)
record_ambs = sfunks.find_ambiguities(alignment, snp_dict)

colours = sfunks.get_colours(args.colour_palette)

@@ -131,14 +123,14 @@ def main(sysargs = sys.argv[1:]):
args.flip_vertical,
args.included_positions,
args.excluded_positions,
args.ambig_mode,
args.exclude_ambig_pos,
args.sort_by_mutation_number,
args.high_to_low,
args.sort_by_id,
args.sort_by_mutations,
args.recombi_mode,
args.recombi_references)
print(sfunks.green(f"Snipping Complete: {output}"))


if __name__ == '__main__':
main()