From 819aebdf12a6f8348740aab051acee5dc80b0158 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Fri, 9 Aug 2024 02:46:19 +0000 Subject: [PATCH] Deployed 51a1a4a to v0.3.1 with MkDocs 1.6.0 and mike 2.1.2 --- v0.3.1/api/index.html | 92 ++++++++++++++++++++++++++++++++- v0.3.1/search/search_index.json | 2 +- 2 files changed, 92 insertions(+), 2 deletions(-) diff --git a/v0.3.1/api/index.html b/v0.3.1/api/index.html index 7b9ca6c..dcf57b1 100644 --- a/v0.3.1/api/index.html +++ b/v0.3.1/api/index.html @@ -1373,6 +1373,36 @@

Overview

Alias for run_blast with tblastn specified +Framework +camlhmp.framework +read_framework +Read the framework YAML file + + +Framework +camlhmp.framework +print_version +Print the version of the framework + + +Framework +camlhmp.framework +get_types +Get the types from the framework + + +Framework +camlhmp.framework +check_types +Check the types against the results + + +Framework +camlhmp.framework +check_regions +Check the region types against the results + + Parser camlhmp.parsers.blast get_blast_allele_hits @@ -1385,11 +1415,71 @@

Overview

Parse BLAST output for region hits -parser +Parser camlhmp.parsers.blast get_blast_target_hits Parse BLAST output for target hits + +Utils +camlhmp.utils +execute +Execute a command + + +Utils +camlhmp.utils +check_dependencies +Check if all dependencies are installed + + +Utils +camlhmp.utils +get_platform +Get the platform of the executing machine + + +Utils +camlhmp.utils +validate_file +Validate a file exists and not empty + + +Utils +camlhmp.utils +file_exists_error +Determine if a file exists and raise an error + + +Utils +camlhmp.utils +parse_seq +Parse a sequence file containing a single record + + +Utils +camlhmp.utils +parse_seqs +Parse a sequence file containing a multiple records + + +Utils +camlhmp.utils +parse_table +Parse a delimited file + + +Utils +camlhmp.utils +parse_yaml +Parse a YAML file + + +Utils +camlhmp.utils +write_tsv +Write the dictionary to a TSV file + diff --git a/v0.3.1/search/search_index.json b/v0.3.1/search/search_index.json index a774a3c..95d914c 100644 --- a/v0.3.1/search/search_index.json +++ b/v0.3.1/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"],"fields":{"title":{"boost":1000.0},"text":{"boost":1.0},"tags":{"boost":1000000.0}}},"docs":[{"location":"","title":"camlhmp","text":"

\ud83d\udc2a camlhmp \ud83d\udc2a - Classification through yAML Heuristic Mapping Protocol

camlhmp is a tool for generating organism typing tools from YAML schemas. Through discussions with Tim Read, we identified a need for a straightforward method to define and manage typing schemas for organisms of interest. YAML was chosen for its simplicity and readability.

"},{"location":"#purpose","title":"Purpose","text":"

The primary purpose of camlhmp is to provide a framework that enables researchers to independently define typing schemas for their organisms of interest using YAML. This approach facilitates the management and analysis of biological data for researchers at any level of experience.

camlhmp does not supply pre-defined typing schemas. Instead, it equips researchers with the necessary tools to create and maintain their own schemas, ensuring these schemas can easily remain up to date with the latest scientific developments.

Finally, the development of camlhmp was driven by a practical need to streamline maintenance of multiple organism typing tools. Managing these tools separately is time-consuming and challenging. camlhmp simplifies this by providing a single framework that all of these tools can share.

"},{"location":"#documentation-overview","title":"Documentation Overview","text":"

Installation: Information for installing camlhmp on your system

Available Tools: A list of available typing tools utilizing camlhmp

Schema Definition: Details about defining schemas for use with camlhmp

CLI Reference: Details about available CLI commands from camlhmp

API Reference: Details about using the camlhmp package in your own code

About: Information about the development and funding of camlhmp

"},{"location":"#funding","title":"Funding","text":"

Support for this project came (in part) from the Wyoming Public Health Division, and the Center for Applied Pathogen Epidemiology and Outbreak Control (CAPE).

"},{"location":"#citing-camlhmp","title":"Citing camlhmp","text":"

If you make use of camlhmp in your analysis, please cite the following:

"},{"location":"CHANGELOG/","title":"Changelog","text":""},{"location":"CHANGELOG/#v100-rpetit3camlhmp-dromedary-202408","title":"v1.0.0 rpetit3/camlhmp \"Dromedary\" 2024/08/??","text":""},{"location":"CHANGELOG/#added","title":"Added","text":""},{"location":"CHANGELOG/#fixed","title":"Fixed","text":""},{"location":"CHANGELOG/#v031-rpetit3camlhmp-maybe-a-cat-20240805","title":"v0.3.1 rpetit3/camlhmp \"Maybe a cat?\" 2024/08/05","text":""},{"location":"CHANGELOG/#fixed_1","title":"Fixed","text":""},{"location":"CHANGELOG/#v030-rpetit3camlhmp-more-bunnies-and-fewer-baby-birds-20240805","title":"v0.3.0 rpetit3/camlhmp \"More bunnies and fewer baby birds\" 2024/08/05","text":""},{"location":"CHANGELOG/#added_1","title":"Added","text":""},{"location":"CHANGELOG/#v022-rpetit3camlhmp-even-a-few-baby-birds-20240722","title":"v0.2.2 rpetit3/camlhmp \"Even a few baby birds\" 2024/07/22","text":""},{"location":"CHANGELOG/#fixed_2","title":"Fixed","text":""},{"location":"CHANGELOG/#v021-rpetit3camlhmp-and-a-bunch-of-birds-20240722","title":"v0.2.1 rpetit3/camlhmp \"And a bunch of birds\" 2024/07/22","text":""},{"location":"CHANGELOG/#added_2","title":"Added","text":""},{"location":"CHANGELOG/#fixed_3","title":"Fixed","text":""},{"location":"CHANGELOG/#v020-rpetit3camlhmp-four-little-bunnies-20240722","title":"v0.2.0 rpetit3/camlhmp \"Four little bunnies\" 2024/07/22","text":""},{"location":"CHANGELOG/#added_3","title":"Added","text":""},{"location":"CHANGELOG/#v010-rpetit3camlhmp-little-baby-legs-20240430","title":"v0.1.0 rpetit3/camlhmp \"Little baby legs\" 2024/04/30","text":""},{"location":"CHANGELOG/#added_4","title":"Added","text":""},{"location":"CHANGELOG/#v001-rpetit3camlhmp-not-even-walking-yet-20240424","title":"v0.0.1 rpetit3/camlhmp \"Not even walking yet\" 2024/04/24","text":"

This is a development release for getting things on PyPI and Bioconda. It is not expected to be stable.

"},{"location":"CHANGELOG/#added_5","title":"Added","text":""},{"location":"about/","title":"About","text":""},{"location":"about/#naming","title":"Naming","text":"

I really wanted to name a tool with \"camel\" in it because they are my wife's favorite animal\ud83d\udc2a and camels also remind me of my friends in Oman!

Once it was decided YAML would be the format for defining schemas, I was immediately drawn to \"Classification through YAML\", or \"CAML\", but quickly found out many others had also thought of this (for other use cases). We went through a few other iterations of CAML without any success. Fortunately, Tim Read came through with a clutch save, suggesting \"Heuristic Mapping Protocol\". So, here we are: camlhmp!

"},{"location":"about/#funding","title":"Funding","text":"

Support for this project came (in part) from the Wyoming Public Health Division, and the Center for Applied Pathogen Epidemiology and Outbreak Control (CAPE).

"},{"location":"about/#citing-camlhmp","title":"Citing camlhmp","text":"

If you make use of camlhmp in your analysis, please cite the following:

"},{"location":"available-tools/","title":"Available Tools","text":"

Below is a list of available typing tools utilizing camlhmp. Each tool is designed to analyze specific sequence data and generate a typing profile based on the schema provided.

Tip

If you've developed a typing tool that utilizes camlhmp, or know of one, we'd love to add it to this list. To do so, open an issue on the camlhmp GitHub repository.

| Tool | Organism | Description |
| --- | --- | --- |
| pasty | Pseudomonas aeruginosa | In silico serogrouping of Pseudomonas aeruginosa isolates |
| pbptyper | Streptococcus pneumoniae | In silico Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies |
| sccmec | Staphylococcus aureus | A tool for typing SCCmec cassettes in assemblies |"},{"location":"installation/","title":"Installation","text":"

camlhmp is available through PyPI and Bioconda. While you can install it through PyPI, installing it through Bioconda is recommended because that also installs the non-Python dependencies (such as BLAST).

conda create -n camlhmp -c conda-forge -c bioconda camlhmp\nconda activate camlhmp\ncamlhmp\n
"},{"location":"schema/","title":"Schema Reference","text":"

The schema structure is designed to be simple and intuitive. Here is a basic skeleton of the expected schema structure:

%YAML 1.2\n---\n# metadata: general information about the schema\nmetadata:\n  id: \"\"          # unique identifier for the schema\n  name: \"\"        # name of the schema\n  description: \"\" # description of the schema\n  version: \"\"     # version of the schema\n  curators: []    # A list of curators of the schema\n\n# engine: specifies the computational tools and additional parameters used for sequence\n#         analysis.\nengine:\n  type: \"\"        # The type of tool used to generate the data\n  tool: \"\"        # The tool used to generate the data\n\n# targets: Lists the specific sequence targets such as genes, proteins, or markers that the\n#          schema will analyze. These should be included in the associated sequence query data\ntargets: []\n\n# aliases: groups multiple targets under a common name for easier reference\naliases:\n  - name: \"\"     # name of the alias\n    targets: []  # list of targets that are part of the alias\n\n# types: define specific combinations of targets and aliases to form distinct types\ntypes:\n  - name: \"\"     # name of the profile\n    targets: []  # list of targets (can use aliases) that are part of the profile\n    excludes: [] # list of targets (or aliases) that will automatically fail the type\n

This schema has five sections: metadata, engine, targets, aliases, and types.

Each section contains additional fields, which are described in the sections below.
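As a rough illustration of the skeleton above, the following sketch checks that a schema, already parsed into a Python dict, contains the expected top-level sections and metadata fields. `missing_sections` is a hypothetical helper, not part of camlhmp, and treating `aliases` as optional is an assumption based on `get_types` only reading it when present.

```python
# Hypothetical helper, not part of the camlhmp API.
# Assumption: metadata, engine, targets, and types are required, while aliases is
# optional (get_types only reads aliases when the key is present).
REQUIRED_SECTIONS = ('metadata', 'engine', 'targets', 'types')
METADATA_FIELDS = ('id', 'name', 'description', 'version', 'curators')


def missing_sections(schema: dict) -> list:
    # Required top-level sections that are absent from the parsed schema.
    missing = [section for section in REQUIRED_SECTIONS if section not in schema]
    # Metadata fields that are absent, reported with dotted names.
    metadata = schema.get('metadata', {})
    missing += [f'metadata.{field}' for field in METADATA_FIELDS if field not in metadata]
    return missing


schema = {
    'metadata': {'id': 'demo', 'name': 'Demo', 'description': '', 'version': '0.0.1', 'curators': []},
    'engine': {'type': 'blast', 'tool': 'blastn'},
    'targets': ['geneA'],
}
print(missing_sections(schema))  # ['types']
```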

"},{"location":"schema/#metadata","title":"metadata","text":"

The metadata section provides general information about the schema. This includes:

| Field | Type | Description |
| --- | --- | --- |
| id | string | A unique identifier for the schema |
| name | string | The name of the schema |
| description | string | A brief description of the schema |
| version | string | The version of the schema |
| curators | list | A list of curators of the schema |"},{"location":"schema/#engine","title":"engine","text":"

The engine section specifies the computational tools used for sequence analysis.

| Field | Type | Description |
| --- | --- | --- |
| type | string | The type of engine used for analysis |
| tool | string | The specific tool to be used for the engine |"},{"location":"schema/#targets","title":"targets","text":"

The targets section lists the specific sequence targets such as genes, proteins, or markers that the schema will analyze. These should be included in the associated sequence query data.

| Field | Type | Description |
| --- | --- | --- |
| targets | list | A list of targets to be analyzed |"},{"location":"schema/#aliases","title":"aliases","text":"

aliases are a convenient way to group multiple targets under a common name for easier reference.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the alias |
| targets | list | A list of targets that are part of the alias |"},{"location":"schema/#types","title":"types","text":"

The types section defines specific combinations of targets and aliases to form distinct types.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the profile |
| targets | list | A list of targets (or aliases) that are part of the type |
| excludes | list | A list of targets (or aliases) that will automatically fail the type |"},{"location":"schema/#example-schema-partial-sccmec-typing","title":"Example Schema: Partial SCCmec Typing","text":"

Here is an example of a partial schema for SCCmec typing:

%YAML 1.2\n---\n# metadata: general information about the schema\nmetadata:\n  id: \"sccmec_partial\"                                # unique identifier for the schema\n  name: \"SCCmec Typing\"                              # name of the schema\n  description: \"A partial schema for SCCmec typing\"  # description of the schema\n  version: \"0.0.1\"                                     # version of the schema\n  curators:                                          # A list of curators of the schema\n    - \"Robert Petit\"\n\n# engine: specifies the computational tools and additional parameters used for sequence\n#         analysis.\nengine:\n  type: blast   # The type of tool used to generate the data\n  tool: blastn  # The tool used to generate the data\n\n# targets: Lists the specific sequence targets such as genes, proteins, or markers that the\n#          schema will analyze. These should be included in the associated sequence query data\ntargets:\n  - \"ccrA1\"\n  - \"ccrA2\"\n  - \"ccrA3\"\n  - \"ccrB1\"\n  - \"ccrB2\"\n  - \"ccrB3\"\n  - \"IS431\"\n  - \"IS1272\"\n  - \"mecA\"\n  - \"mecI\"\n  - \"mecR1\"\n\n# aliases: groups multiple targets under a common name for easier reference\naliases:\n  - name: \"ccr Type 1\"           # name of the alias\n    targets: [\"ccrA1\", \"ccrB1\"]  # list of targets that are part of the alias\n  - name: \"ccr Type 2\"\n    targets: [\"ccrA2\", \"ccrB2\"]\n  - name: \"ccr Type 3\"\n    targets: [\"ccrA3\", \"ccrB3\"]\n  - name: \"mec Class A\"\n    targets: [\"IS431\", \"mecA\", \"mecR1\", \"mecI\"]\n  - name: \"mec Class B\"\n    targets: [\"IS431\", \"mecA\", \"mecR1\", \"IS1272\"]\n\n# types: define specific combinations of targets and aliases to form distinct types\ntypes:\n  - name: \"I\"          # name of the profile\n    targets:           # list of targets (can use aliases) that are part of the profile\n      - \"ccr Type 1\"\n      - \"mec Class B\"\n  - name: \"II\"\n    targets:\n      - \"ccr Type 2\"\n      - 
\"mec Class A\"\n  - name: \"III\"\n    targets:\n      - \"ccr Type 3\"\n      - \"mec Class A\"\n  - name: \"IV\"\n    targets:\n      - \"ccr Type 2\"\n      - \"mec Class B\"\n

From this schema, camlhmp can generate a typing tool that can be used to analyze input assemblies. This is only a partial schema, as there are many more SCCmec types and subtypes, but it should be straightforward to extend it with additional targets, aliases, and types.
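When extending a schema like this one, it can help to confirm that every alias and type only references declared targets or aliases. The sketch below is a hypothetical check (`undefined_references` is not a camlhmp function) run on a trimmed dict version of the SCCmec example. The type check mirrors the rule `get_types` enforces; checking alias members is an extra step that `get_types` does not perform. The `new` type is a made-up entry that references an alias not yet defined.

```python
# Hypothetical consistency check, not part of camlhmp.
def undefined_references(schema: dict) -> list:
    targets = set(schema['targets'])
    aliases = {alias['name']: alias['targets'] for alias in schema.get('aliases', [])}
    problems = []
    # Extra check (not done by get_types): alias members must be declared targets.
    for name, members in aliases.items():
        for member in members:
            if member not in targets:
                problems.append(f'alias {name}: {member}')
    # Same rule get_types enforces: each type entry is an alias name or a declared target.
    for profile in schema['types']:
        type_name = profile['name']
        for entry in profile.get('targets', []) + profile.get('excludes', []):
            if entry not in aliases and entry not in targets:
                problems.append(f'type {type_name}: {entry}')
    return problems


sccmec = {
    'targets': ['ccrA1', 'ccrB1', 'ccrA2', 'ccrB2', 'IS431', 'IS1272', 'mecA', 'mecR1'],
    'aliases': [
        {'name': 'ccr Type 1', 'targets': ['ccrA1', 'ccrB1']},
        {'name': 'ccr Type 2', 'targets': ['ccrA2', 'ccrB2']},
        {'name': 'mec Class B', 'targets': ['IS431', 'mecA', 'mecR1', 'IS1272']},
    ],
    'types': [
        {'name': 'I', 'targets': ['ccr Type 1', 'mec Class B']},
        {'name': 'IV', 'targets': ['ccr Type 2', 'mec Class B']},
        # Made-up type referencing an alias that has not been added yet.
        {'name': 'new', 'targets': ['ccr Type 5', 'mecA']},
    ],
}
print(undefined_references(sccmec))  # ['type new: ccr Type 5']
```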

"},{"location":"api/","title":"camlhmp API Reference","text":"

At its core, camlhmp is a library that provides a set of functions for typing organisms. It includes functions for running programs and parsing their outputs. In situations where the available CLI commands do not meet your needs, you can use the API functions to build your own custom workflows.

Currently, the following functions are available in the camlhmp API:

| Type | Module | Function | Description |
| --- | --- | --- | --- |
| Engine | camlhmp.engines.blast | run_blast | Run BLAST program |
| Engine | camlhmp.engines.blast | run_blast | Alias for run_blast with blastn specified |
| Engine | camlhmp.engines.blast | run_blast | Alias for run_blast with tblastn specified |
| Parser | camlhmp.parsers.blast | get_blast_allele_hits | Parse BLAST output for allele hits |
| Parser | camlhmp.parsers.blast | get_blast_region_hits | Parse BLAST output for region hits |
| Parser | camlhmp.parsers.blast | get_blast_target_hits | Parse BLAST output for target hits |"},{"location":"api/framework/","title":"camlhmp.framework","text":"

Below are the functions available in the camlhmp.framework module.

"},{"location":"api/framework/#camlhmp.framework.read_framework","title":"camlhmp.framework.read_framework(yamlfile)","text":"

Read the framework YAML file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| yamlfile | str | input YAML file to be read | required |

Returns:

| Type | Description |
| --- | --- |
| dict | the parsed YAML file |

Examples:

>>> from camlhmp.framework import read_framework\n>>> framework = read_framework(yaml_path)\n
Source code in camlhmp/framework.py
def read_framework(yamlfile: str) -> dict:\n    \"\"\"\n    Read the framework YAML file.\n\n    Args:\n        yamlfile (str): input YAML file to be read\n\n    Returns:\n        dict: the parsed YAML file\n\n    Examples:\n        >>> from camlhmp.framework import read_framework\n        >>> framework = read_framework(yaml_path)\n    \"\"\"\n    return parse_yaml(yamlfile)\n
"},{"location":"api/framework/#camlhmp.framework.print_version","title":"camlhmp.framework.print_version(framework)","text":"

Print the camlhmp and schema versions to stderr, then exit.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| framework | dict | the parsed YAML framework | required |

Examples:

>>> from camlhmp.framework import print_version\n>>> print_version(framework)\n
Source code in camlhmp/framework.py
def print_version(framework: dict) -> None:\n    \"\"\"\n    Print the version of the framework, then exit\n\n    Args:\n        framework (dict): the parsed YAML framework\n\n    Examples:\n        >>> from camlhmp.framework import print_version\n        >>> print_version(framework)\n    \"\"\"\n    print(f\"camlhmp, version {camlhmp.__version__}\", file=sys.stderr)\n    print(f\"schema {framework['metadata']['id']}, version {framework['metadata']['version']}\", file=sys.stderr)\n    sys.exit(0)\n
"},{"location":"api/framework/#camlhmp.framework.get_types","title":"camlhmp.framework.get_types(framework)","text":"

Get the types from the framework.

Example framework:

aliases:
- name: \"ccr Type 1\"
  targets: [\"ccrA1\", \"ccrB1\"]
types:
- name: \"I\"
  targets:
    - \"ccr Type 1\"
    - \"mec Class B\"

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| framework | dict | the parsed YAML framework | required |

Returns:

| Type | Description |
| --- | --- |
| dict | the types with associated targets |

Examples:

>>> from camlhmp.framework import get_types\n>>> types = get_types(framework)\n
Source code in camlhmp/framework.py
def get_types(framework: dict) -> dict:\n    \"\"\"\n    Get the types from the framework.\n\n    Example framework:\n    aliases:\n    - name: \"ccr Type 2\"\n      targets: [\"ccrA1\", \"ccrB1\"]\n    types:\n    - name: \"I\"\n      targets:\n        - \"ccr Type 1\"\n        - \"mec Class B\"\n\n    Args:\n        framework (dict): the parsed YAML framework\n\n    Returns:\n        dict: the types with associated targets\n\n    Examples:\n        >>> from camlhmp.framework import get_types\n        >>> types = get_types(framework)\n    \"\"\"\n    types = {}\n    aliases = {}\n\n    # If aliases are present, save their targets\n    if \"aliases\" in framework:\n        for alias in framework[\"aliases\"]:\n            aliases[alias[\"name\"]] = alias[\"targets\"]\n\n    # Save the types and their targets\n    for profile in framework[\"types\"]:\n        types[profile[\"name\"]] = {\n            \"targets\": [],\n            \"excludes\": [],\n        }\n        for target in profile[\"targets\"]:\n            if target in aliases:\n                types[profile[\"name\"]][\"targets\"] = [\n                    *types[profile[\"name\"]][\"targets\"],\n                    *aliases[target],\n                ]\n            elif target in framework[\"targets\"]:\n                types[profile[\"name\"]][\"targets\"].append(target)\n            else:\n                raise ValueError(f\"Target {target} not found in framework\")\n\n        # Capture any targets that should cause a profile to fail\n        if \"excludes\" in profile:\n            for exclude in profile[\"excludes\"]:\n                if exclude in aliases:\n                    types[profile[\"name\"]][\"excludes\"] = [\n                        *types[profile[\"name\"]][\"excludes\"],\n                        *aliases[exclude],\n                    ]\n                elif exclude in framework[\"targets\"]:\n                    types[profile[\"name\"]][\"excludes\"].append(exclude)\n                
else:\n                    raise ValueError(f\"Target {exclude} not found in framework\")\n\n    # Debugging information\n    logging.debug(\"camlhmp.framework.get_types\")\n    if \"aliases\" in framework:\n        logging.debug(f\"Aliases: {framework['aliases']}\")\n    logging.debug(f\"Targets: {framework['targets']}\")\n    logging.debug(f\"Types: {types}\")\n\n    return types\n
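The alias expansion in `get_types` can be hard to picture from the source alone. Below is a simplified, standalone sketch of the same rule; `expand_types` is a hypothetical name, and this is not the camlhmp function itself. It resolves type I from the partial SCCmec example into a flat target list.

```python
# Simplified sketch of the alias expansion performed by get_types; not the camlhmp code.
def expand_types(framework: dict) -> dict:
    aliases = {alias['name']: alias['targets'] for alias in framework.get('aliases', [])}

    def resolve(entries):
        resolved = []
        for entry in entries:
            if entry in aliases:
                # An alias contributes all of its member targets, in order.
                resolved.extend(aliases[entry])
            elif entry in framework['targets']:
                resolved.append(entry)
            else:
                raise ValueError(f'Target {entry} not found in framework')
        return resolved

    return {
        profile['name']: {
            'targets': resolve(profile['targets']),
            'excludes': resolve(profile.get('excludes', [])),
        }
        for profile in framework['types']
    }


framework = {
    'targets': ['ccrA1', 'ccrB1', 'IS431', 'IS1272', 'mecA', 'mecR1'],
    'aliases': [
        {'name': 'ccr Type 1', 'targets': ['ccrA1', 'ccrB1']},
        {'name': 'mec Class B', 'targets': ['IS431', 'mecA', 'mecR1', 'IS1272']},
    ],
    'types': [{'name': 'I', 'targets': ['ccr Type 1', 'mec Class B']}],
}
print(expand_types(framework))
# {'I': {'targets': ['ccrA1', 'ccrB1', 'IS431', 'mecA', 'mecR1', 'IS1272'], 'excludes': []}}
```

As in the source, an unknown entry raises ValueError instead of being skipped, so a typo in a schema fails early.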
"},{"location":"api/framework/#camlhmp.framework.check_types","title":"camlhmp.framework.check_types(types, results)","text":"

Check the types against the results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| types | dict | the types with associated targets | required |
| results | dict | the BLAST results | required |

Returns:

| Type | Description |
| --- | --- |
| dict | the types and their outcome |

Examples:

>>> from camlhmp.framework import check_types\n>>> type_hits = check_types(types, target_results)\n
Source code in camlhmp/framework.py
def check_types(types: dict, results: dict) -> dict:\n    \"\"\"\n    Check the types against the results.\n\n    Args:\n        types (dict): the types with associated targets\n        results (dict): the BLAST results\n\n    Returns:\n        dict: the types and their outcome\n\n    Examples:\n        >>> from camlhmp.framework import check_types\n        >>> type_hits = check_types(types, target_results)\n    \"\"\"\n    type_hits = {}\n    for type, vals in types.items():\n        targets = vals[\"targets\"]\n        excludes = vals[\"excludes\"]\n        type_hits[type] = {\n            \"status\": False,\n            \"targets\": [],\n            \"missing\": [],\n            \"comment\": \"\",\n        }\n        matched_all_targets = True\n        for target in targets:\n            if results[target]:\n                type_hits[type][\"targets\"].append(target)\n            else:\n                type_hits[type][\"missing\"].append(target)\n                matched_all_targets = False\n\n        # Check if any of the excludes are present\n        for exclude in excludes:\n            if results[exclude]:\n                type_hits[type][\n                    \"comment\"\n                ] = f\"Excluded target {exclude} found, failing type {type}\"\n                logging.debug(f\"Excluded target {exclude} found, failing type {type}\")\n                matched_all_targets = False\n        type_hits[type][\"status\"] = matched_all_targets\n\n    # Debugging information\n    logging.debug(\"camlhmp.framework.check_types\")\n    logging.debug(f\"Type Hits: {type_hits}\")\n\n    return type_hits\n
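The pass/fail rule in `check_types` can be seen with a small, made-up example. The sketch assumes `results` maps each target name to a truthy or falsy value, which is consistent with the `if results[target]` tests in the source; `type_status`, `geneA`, and `geneB` are hypothetical. A type passes only when all of its targets are present and none of its excludes are.

```python
# Simplified sketch of the check_types rule; results maps target -> bool (assumed shape).
def type_status(types: dict, results: dict) -> dict:
    status = {}
    for name, vals in types.items():
        missing = [t for t in vals['targets'] if not results[t]]
        excluded = [e for e in vals['excludes'] if results[e]]
        # Pass only when nothing is missing and no excluded target was found.
        status[name] = not missing and not excluded
    return status


types = {
    'A': {'targets': ['geneA', 'geneB'], 'excludes': []},
    'B': {'targets': ['geneA'], 'excludes': ['geneB']},
}
results = {'geneA': True, 'geneB': True}
print(type_status(types, results))  # {'A': True, 'B': False}
```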
"},{"location":"api/framework/#camlhmp.framework.check_regions","title":"camlhmp.framework.check_regions(types, results, min_coverage)","text":"

Check the region types against the results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| types | dict | the types with associated targets | required |
| results | dict | the BLAST results | required |
| min_coverage | int | the minimum coverage required for a region | required |

Returns:

| Type | Description |
| --- | --- |
| dict | the types and their outcome |

Examples:

>>> from camlhmp.framework import check_regions\n>>> type_hits = check_regions(types, target_results, min_coverage)\n
Source code in camlhmp/framework.py
def check_regions(types: dict, results: dict, min_coverage: int) -> dict:\n    \"\"\"\n    Check the region types against the results.\n\n    Args:\n        types (dict): the types with associated targets\n        results (dict): the BLAST results\n        min_coverage (int): the minimum coverage required for a region\n\n    Returns:\n        dict: the types and their outcome\n\n    Examples:\n        >>> from camlhmp.framework import check_regions\n        >>> type_hits = check_regions(types, target_results, min_coverage)\n    \"\"\"\n    type_hits = {}\n    for type, vals in types.items():\n        targets = vals[\"targets\"]\n        excludes = vals[\"excludes\"]\n        type_hits[type] = {\n            \"status\": False,\n            \"targets\": [],\n            \"missing\": [],\n            \"coverage\": [],\n            \"hits\": [],\n            \"comment\": [],\n        }\n        matched_all_targets = True\n        for target in targets:\n            if target in results:\n                if results[target][\"coverage\"] >= min_coverage:\n                    type_hits[type][\"targets\"].append(target)\n                else:\n                    type_hits[type][\"missing\"].append(target)\n                    matched_all_targets = False\n\n                type_hits[type][\"coverage\"].append(f\"{results[target]['coverage']:.2f}\")\n                type_hits[type][\"hits\"].append(str(len(results[target][\"hits\"])))\n                if len(targets) > 1:\n                    if results[target][\"comment\"]:\n                        formatted_comments = []\n                        for comment in results[target][\"comment\"]:\n                            formatted_comments.append(f\"{target}:{comment}\")\n                        if formatted_comments:\n                            type_hits[type][\"comment\"].append(\n                                \";\".join(formatted_comments)\n                            )\n                else:\n                    if 
results[target][\"comment\"]:\n                        type_hits[type][\"comment\"].append(\n                            \";\".join(results[target][\"comment\"])\n                        )\n            else:\n                matched_all_targets = False\n\n        # Check if any of the excludes are present\n        for exclude in excludes:\n            if results[exclude]:\n                if results[exclude][\"coverage\"] >= min_coverage:\n                    type_hits[type][\"comment\"].append(\n                        f\"Excluded target {exclude} found, failing type {type}\"\n                    )\n                    logging.debug(\n                        f\"Excluded target {exclude} found, failing type {type}\"\n                    )\n                    matched_all_targets = False\n\n        type_hits[type][\"status\"] = matched_all_targets\n\n    # Debugging information\n    logging.debug(\"camlhmp.framework.check_regions\")\n    logging.debug(f\"Type Hits: {type_hits}\")\n\n    return type_hits\n
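For region-based types, each target carries a coverage value rather than a simple presence flag. A minimal sketch of the threshold rule, assuming `results` maps each target to a dict with a `coverage` percentage (the key the source reads); `region_status` and the region names and numbers are made up. Unlike the source, which fails the type without listing absent targets under `missing`, this sketch lists them there.

```python
# Sketch of the coverage threshold used by check_regions; not the library code.
def region_status(targets: list, results: dict, min_coverage: int) -> tuple:
    passed, missing = [], []
    for target in targets:
        hit = results.get(target)
        # Absent targets and targets below the threshold both count as missing here.
        if hit is not None and hit['coverage'] >= min_coverage:
            passed.append(target)
        else:
            missing.append(target)
    return passed, missing


results = {'region1': {'coverage': 98.5}, 'region2': {'coverage': 42.0}}
print(region_status(['region1', 'region2', 'region3'], results, 90))
# (['region1'], ['region2', 'region3'])
```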
"},{"location":"api/utils/","title":"camlhmp.utils","text":"

Below are the functions available in the camlhmp.utils module.

"},{"location":"api/utils/#camlhmp.utils.execute","title":"camlhmp.utils.execute(cmd, directory=Path.cwd(), capture=False, stdout_file=None, stderr_file=None, allow_fail=False)","text":"

A simple wrapper around ExternalCommand from the executor package.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cmd | str | The command to be executed | required |
| directory | Path | The directory to execute the command in. Defaults to Path.cwd(). | Path.cwd() |
| capture | bool | Capture the output of the command. Defaults to False. | False |
| stdout_file | Path | The file to write stdout to. Defaults to None. | None |
| stderr_file | Path | The file to write stderr to. Defaults to None. | None |
| allow_fail | bool | Allow the command to fail. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| Union[bool, list, None] | True on success, or a [stdout, stderr] list when capture is True; None if the command fails and allow_fail is False |

Raises:

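| Type | Description |
| --- | --- |
| SystemExit | If the command fails and allow_fail is True: the ExternalCommandFailed error is caught and logged, then the process exits with the command's return code |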

Examples:

>>> from camlhmp.utils import execute\n>>> stdout, stderr = execute(\n...     f\"{cat_type} {subject} | {engine} -query {query} -subject - -outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}\",\n...     capture=True,\n... )\n
Source code in camlhmp/utils.py
def execute(\n    cmd,\n    directory=Path.cwd(),\n    capture=False,\n    stdout_file=None,\n    stderr_file=None,\n    allow_fail=False,\n):\n    \"\"\"\n    A simple wrapper around executor.\n\n    Args:\n        cmd (str): The command to be executed\n        directory (Path, optional): The directory to execute the command in. Defaults to Path.cwd().\n        capture (bool, optional): Capture the output of the command. Defaults to False.\n        stdout_file (Path, optional): The file to write stdout to. Defaults to None.\n        stderr_file (Path, optional): The file to write stderr to. Defaults to None.\n        allow_fail (bool, optional): Allow the command to fail. Defaults to False.\n\n    Returns:\n        Union[bool, list]: True if successful, otherwise a list of stdout and stderr\n\n    Raises:\n        ExternalCommandFailed: If the command fails and allow_fail is True\n\n    Examples:\n        >>> from camlhmp.utils import execute\n        >>> stdout, stderr = execute(\n                f\"{cat_type} {subject} | {engine} -query {query} -subject - -outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}\",\n                capture=True,\n            )\n    \"\"\"\n    try:\n        command = ExternalCommand(\n            cmd,\n            directory=directory,\n            capture=True,\n            capture_stderr=True,\n            stdout_file=stdout_file,\n            stderr_file=stderr_file,\n        )\n\n        command.start()\n        logging.debug(command.decoded_stdout)\n        logging.debug(command.decoded_stderr)\n\n        if capture:\n            return [command.decoded_stdout, command.decoded_stderr]\n        return True\n    except ExternalCommandFailed as e:\n        if allow_fail:\n            logging.error(e)\n            sys.exit(e.returncode)\n        else:\n            return None\n
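`execute` builds on the third-party executor package. If you only need its capture behaviour in your own code, a rough standard-library equivalent can be written with `subprocess.run`. This is a swapped-in technique, not how camlhmp implements it, and `run_command` is a hypothetical name. It takes an argument list rather than a shell string, so pipelines like the BLAST example above are not supported as written. Like `execute` with allow_fail=False, it returns None when the command fails.

```python
import subprocess
import sys


def run_command(cmd: list, directory=None, capture: bool = False):
    # Hypothetical stand-in for camlhmp.utils.execute, built on subprocess instead of executor.
    completed = subprocess.run(cmd, cwd=directory, capture_output=True, text=True)
    if completed.returncode != 0:
        # Mirror execute() with allow_fail=False: signal failure by returning None.
        return None
    if capture:
        return [completed.stdout, completed.stderr]
    return True


stdout, stderr = run_command([sys.executable, '-c', 'print(42)'], capture=True)
print(stdout.strip())  # 42
```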
"},{"location":"api/utils/#camlhmp.utils.check_dependencies","title":"camlhmp.utils.check_dependencies()","text":"

Check if all dependencies are installed.

Examples:

>>> from camlhmp.utils import check_dependencies\n>>> check_dependencies()\n
Source code in camlhmp/utils.py
def check_dependencies():\n    \"\"\"\n    Check if all dependencies are installed.\n\n    Examples:\n        >>> from camlhmp.utils import check_dependencies\n        >>> check_dependencies()\n    \"\"\"\n    exit_code = 0\n    print(\"Checking dependencies...\", file=sys.stderr)\n    for program in [\"blastn\"]:\n        which_path = which(program)\n        if which_path:\n            print(f\"Found {program} at {which_path}\", file=sys.stderr)\n        else:\n            print(f\"{program} not found\", file=sys.stderr)\n            exit_code = 1\n\n    if exit_code == 1:\n        print(\"Missing dependencies, please check.\", file=sys.stderr)\n    else:\n        print(\"You are all set!\", file=sys.stderr)\n    sys.exit(exit_code)\n
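`check_dependencies` prints its findings and always calls `sys.exit`, which is awkward inside a larger program. A non-exiting variant could return the lookups instead. The sketch below is hypothetical (`find_programs` is not in camlhmp) and uses `shutil.which`, presumably the same `which` the source imports. Checking `tblastn` is only an illustration; the source checks `blastn` alone.

```python
from shutil import which


def find_programs(programs=('blastn',)) -> dict:
    # Map each program name to its resolved path, or None when it is not on PATH.
    return {program: which(program) for program in programs}


found = find_programs(('blastn', 'tblastn'))
missing = [name for name, path in found.items() if path is None]
if missing:
    print('Missing dependencies:', ', '.join(missing))
```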
"},{"location":"api/utils/#camlhmp.utils.get_platform","title":"camlhmp.utils.get_platform()","text":"

Get the platform of the executing machine

Returns:

| Type | Description |
| --- | --- |
| str | The platform of the executing machine: 'mac' or 'linux' (Windows is rejected) |

Examples:

>>> from camlhmp.utils import get_platform\n>>> platform = get_platform()\n
Source code in camlhmp/utils.py
def get_platform() -> str:\n    \"\"\"\n    Get the platform of the executing machine\n\n    Returns:\n        str: The platform of the executing machine\n\n    Examples:\n        >>> from camlhmp.utils import get_platform\n        >>> platform = get_platform()\n    \"\"\"\n    if platform == \"darwin\":\n        return \"mac\"\n    elif platform == \"win32\":\n        # Windows is not supported\n        logging.error(\"Windows is not supported.\")\n        sys.exit(1)\n    return \"linux\"\n
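Because `get_platform` calls `sys.exit` on Windows, code that wants to handle that case itself might prefer an exception. A minimal sketch with the same mapping, where `platform_name` is a hypothetical helper and raising OSError is an assumption of this sketch, not camlhmp behaviour.

```python
import sys


def platform_name(platform: str = sys.platform) -> str:
    # Same mapping as get_platform, but raises instead of calling sys.exit on Windows.
    if platform == 'darwin':
        return 'mac'
    if platform == 'win32':
        raise OSError('Windows is not supported.')
    # Any other value falls through to linux, as in the source.
    return 'linux'
```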
"},{"location":"api/utils/#camlhmp.utils.validate_file","title":"camlhmp.utils.validate_file(filename)","text":"

Validate that a file exists and is not empty; if it passes, return its absolute path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filename | str | a file to validate exists | required |

Returns:

| Type | Description |
| --- | --- |
| str | absolute path to file |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | if the file does not exist |
| ValueError | if the file is empty |

Examples:

>>> from camlhmp.utils import validate_file\n>>> file = validate_file(\"data.fasta\")\n
Source code in camlhmp/utils.py
def validate_file(filename: str) -> str:\n    \"\"\"\n    Validate a file exists and not empty, if passing return the absolute path\n\n    Args:\n        filename (str): a file to validate exists\n\n    Returns:\n        str: absolute path to file\n\n    Raises:\n        FileNotFoundError: if the file does not exist\n        ValueError: if the file is empty\n\n    Examples:\n        >>> from camlhmp.utils import validate_file\n        >>> file = validate_file(\"data.fasta\")\n    \"\"\"\n    f = Path(filename)\n    if not f.exists():\n        raise FileNotFoundError(f\"File ('{filename}') not found, cannot continue\")\n    elif f.stat().st_size == 0:\n        raise ValueError(f\"File ('{filename}') is empty, cannot continue\")\n    return f.absolute()\n
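Note that `validate_file` returns `f.absolute()`, which is a `pathlib.Path` even though the annotation says str. The sketch below reproduces the same two checks in a temporary directory so each outcome is visible. `check_file` is a hypothetical local copy, not an import from camlhmp.

```python
import tempfile
from pathlib import Path


def check_file(filename: str) -> Path:
    # Same checks as validate_file: the file must exist and be non-empty.
    f = Path(filename)
    if not f.exists():
        raise FileNotFoundError(f'File ({filename}) not found, cannot continue')
    if f.stat().st_size == 0:
        raise ValueError(f'File ({filename}) is empty, cannot continue')
    return f.absolute()


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp, 'empty.fasta')
    empty.touch()
    good = Path(tmp, 'data.fasta')
    good.write_text('>seq1')
    print(check_file(str(good)).is_absolute())  # True
    for path in (empty, Path(tmp, 'absent.fasta')):
        try:
            check_file(str(path))
        except (FileNotFoundError, ValueError) as err:
            print(type(err).__name__)  # ValueError, then FileNotFoundError
```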
"},{"location":"api/utils/#camlhmp.utils.file_exists_error","title":"camlhmp.utils.file_exists_error(filename, force=False)","text":"

Determine if a file exists and raise an error if it does.

Parameters:

Name Type Description Default filename str

the file to check

required force bool

force overwrite. Defaults to False.

False

Raises:

Type Description FileExistsError

if the file exists and force is False

Source code in camlhmp/utils.py
def file_exists_error(filename: str, force: bool = False):\n    \"\"\"\n    Determine if a file exists and raise an error if it does.\n\n    Args:\n        filename (str): the file to check\n        force (bool, optional): force overwrite. Defaults to False.\n\n    Raises:\n        FileExistsError: if the file exists and force is False\n    \"\"\"\n    if Path(filename).exists() and not force:\n        raise FileExistsError(\n            f\"Results already exists! Use --force to overwrite: {filename}\"\n        )\n
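file_exists_error is a guard to call before writing a report; its error message points users to the CLI's --force option. A self-contained sketch of the three cases, using a copy of the function from the source above and a temporary directory (the report name is made up):

```python
import tempfile
from pathlib import Path


def file_exists_error(filename: str, force: bool = False):
    # Copy of the source above: refuse to overwrite unless force is True
    if Path(filename).exists() and not force:
        raise FileExistsError(f'Results already exists! Use --force to overwrite: {filename}')


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / 'camlhmp.tsv'
    file_exists_error(report)              # no file yet: returns None
    report.write_text('sample\n')
    file_exists_error(report, force=True)  # file exists but force=True: returns None
    try:
        file_exists_error(report)          # file exists and force=False: raises
    except FileExistsError as err:
        print(err)
```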
"},{"location":"api/utils/#camlhmp.utils.parse_seq","title":"camlhmp.utils.parse_seq(seqfile, format)","text":"

Parse a sequence file containing a single record.

Parameters:

Name Type Description Default seqfile str

input file to be read

required format str

format of the input file

required

Returns:

Name Type Description SeqIO SeqIO

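the parsed record, as returned by Bio.SeqIO.read (a single SeqRecord); SeqIO.read raises a ValueError if the file contains no records or more than one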

Examples:

>>> from camlhmp.utils import parse_seq\n>>> seq = parse_seq(\"data.fasta\", \"fasta\")\n
Source code in camlhmp/utils.py
def parse_seq(seqfile: str, format: str) -> SeqIO:\n    \"\"\"\n    Parse a sequence file containing a single record.\n\n    Args:\n        seqfile (str): input file to be read\n        format (str): format of the input file\n\n    Returns:\n        SeqIO: the parsed file as a SeqIO object\n\n    Examples:\n        >>> from camlhmp.utils import parse_seq\n        >>> seq = parse_seq(\"data.fasta\", \"fasta\")\n    \"\"\"\n    with open(seqfile, \"rt\") as fh:\n        return SeqIO.read(fh, format)\n
"},{"location":"api/utils/#camlhmp.utils.parse_seqs","title":"camlhmp.utils.parse_seqs(seqfile, format)","text":"

Parse a sequence file containing multiple records.

Parameters:

Name Type Description Default seqfile str

input file to be read

required format str

format of the input file

required

Returns:

Name Type Description SeqIO SeqIO

the parsed records, as a list of SeqRecord objects

Examples:

>>> from camlhmp.utils import parse_seqs\n>>> seqs = parse_seqs(\"data.fasta\", \"fasta\")\n
Source code in camlhmp/utils.py
def parse_seqs(seqfile: str, format: str) -> SeqIO:\n    \"\"\"\n    Parse a sequence file containing a multiple records.\n\n    Args:\n        seqfile (str): input file to be read\n        format (str): format of the input file\n\n    Returns:\n        SeqIO: the parsed file as a SeqIO object\n\n    Examples:\n        >>> from camlhmp.utils import parse_seqs\n        >>> seqs = parse_seqs(\"data.fasta\", \"fasta\")\n    \"\"\"\n    with open(seqfile, \"rt\") as fh:\n        return list(SeqIO.parse(fh, format))\n
"},{"location":"api/utils/#camlhmp.utils.parse_table","title":"camlhmp.utils.parse_table(csvfile, delimiter='\\t', has_header=True)","text":"

Parse a delimited file.

Parameters:

Name Type Description Default csvfile str

input delimited file to be parsed

required delimiter str

delimiter used to separate column values. Defaults to '\\t' (a tab).

'\\t' has_header bool

whether the first line should be treated as a header. Defaults to True.

True

Returns:

Type Description Union[list, dict]

list: the parsed rows. Each row is a dict keyed by column name if has_header is True, otherwise a list of column values.

Examples:

>>> from camlhmp.utils import parse_table\n>>> data = parse_table(\"data.tsv\")\n
Source code in camlhmp/utils.py
def parse_table(\n    csvfile: str, delimiter: str = \"\\t\", has_header: bool = True\n) -> Union[list, dict]:\n    \"\"\"\n    Parse a delimited file.\n\n    Args:\n        csvfile (str): input delimited file to be parsed\n        delimiter (str, optional): delimter used to separate column values. Defaults to '\\t'.\n        has_header (bool, optional): the first line should be treated as a header. Defaults to True.\n\n    Returns:\n        Union[list, dict]: A dict is returned if a header is present, otherwise a list is returned\n\n    Examples:\n        >>> from camlhmp.utils import parse_table\n        >>> data = parse_table(\"data.tsv\")\n    \"\"\"\n    data = []\n    with open(csvfile, \"rt\") as fh:\n        for row in (\n            csv.DictReader(fh, delimiter=delimiter)\n            if has_header\n            else csv.reader(fh, delimiter=delimiter)\n        ):\n            data.append(row)\n    return data\n
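Despite the Union[list, dict] annotation, parse_table always returns a list: each row is a dict when has_header is True and a list of strings otherwise, and values are never converted from str. The sketch below uses a hypothetical handle-based variant, parse_table_from_handle, and made-up column names so it can run on an in-memory string.

```python
import csv
import io


def parse_table_from_handle(fh, delimiter: str = '\t', has_header: bool = True) -> list:
    # Hypothetical variant of parse_table that reads an open handle instead of a path
    reader = (
        csv.DictReader(fh, delimiter=delimiter)
        if has_header
        else csv.reader(fh, delimiter=delimiter)
    )
    return list(reader)


text = 'target\tstart\tend\n1A\t1\t276\n2B\t1\t277\n'
with_header = parse_table_from_handle(io.StringIO(text))
without_header = parse_table_from_handle(io.StringIO(text), has_header=False)
print(with_header[0])     # {'target': '1A', 'start': '1', 'end': '276'}
print(without_header[0])  # ['target', 'start', 'end']
```

Note that without a header, the header line itself comes back as the first row.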
"},{"location":"api/utils/#camlhmp.utils.parse_yaml","title":"camlhmp.utils.parse_yaml(yamlfile)","text":"

Parse a YAML file.

Parameters:

Name Type Description Default yamlfile str

input YAML file to be read

required

Returns:

Type Description Union[list, dict]

Union[list, dict]: the values parsed from the YAML file

Examples:

>>> from camlhmp.utils import parse_yaml\n>>> data = parse_yaml(\"data.yaml\")\n
Source code in camlhmp/utils.py
def parse_yaml(yamlfile: str) -> Union[list, dict]:\n    \"\"\"\n    Parse a YAML file.\n\n    Args:\n        yamlfile (str): input YAML file to be read\n\n    Returns:\n        Union[list, dict]: the values parsed from the YAML file\n\n    Examples:\n        >>> from camlhmp.utils import parse_yaml\n        >>> data = parse_yaml(\"data.yaml\")\n    \"\"\"\n    with open(yamlfile, \"rt\") as fh:\n        return yaml.safe_load(fh)\n
"},{"location":"api/utils/#camlhmp.utils.write_tsv","title":"camlhmp.utils.write_tsv(data, output)","text":"

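Write a list of dictionaries to a TSV file, one row per dictionary. If the first value of the first row is NO_HITS, only the column headers are written.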

Parameters:

Name Type Description Default data list

a list of dicts to be written

required output str

The output file

required

Examples:

>>> from camlhmp.utils import write_tsv\n>>> write_tsv(data, \"results.tsv\")\n
Source code in camlhmp/utils.py
def write_tsv(data: list, output: str):\n    \"\"\"\n    Write the dictionary to a TSV file.\n\n    Args:\n        data (list): a list of dicts to be written\n        output (str): The output file\n\n    Examples:\n        >>> from camlhmp.utils import write_tsv\n        >>> write_tsv(data, \"results.tsv\")\n    \"\"\"\n    logging.debug(f\"Writing TSV results to {output}\")\n    with open(output, \"w\") as csvfile:\n        writer = csv.DictWriter(csvfile, delimiter=\"\\t\", fieldnames=data[0].keys())\n        writer.writeheader()\n        if next(iter(data[0].values())) != \"NO_HITS\":\n            # Data is not empty\n            writer.writerows(data)\n        else:\n            # Data is empty\n            logging.debug(\"NO_HITS found, only writing the column headers\")\n
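write_tsv always writes a header taken from the keys of the first row, and skips the rows when that row is the NO_HITS placeholder that run_blast produces when BLAST finds nothing. The sketch below reproduces that logic in a hypothetical write_tsv_to that writes to any file-like object, so both cases can be checked in memory. Column names and values are illustrative apart from the NO_HITS marker.

```python
import csv
import io


def write_tsv_to(handle, data: list) -> None:
    # Hypothetical handle-based variant of write_tsv
    writer = csv.DictWriter(handle, delimiter='\t', fieldnames=data[0].keys())
    writer.writeheader()
    if next(iter(data[0].values())) != 'NO_HITS':
        writer.writerows(data)  # real hits: write every row
    # otherwise: placeholder row only, so the output is just the header


hits = [{'qseqid': '1A_23', 'pident': '100.000', 'bitscore': '556'}]
no_hits = [dict.fromkeys(['qseqid', 'pident', 'bitscore'], 'NO_HITS')]

out = io.StringIO()
write_tsv_to(out, hits)
print(out.getvalue())

empty = io.StringIO()
write_tsv_to(empty, no_hits)
print(empty.getvalue())  # header line only
```

Python's csv writers end rows with \r\n by default, and the csv module documentation recommends opening output files with newline=''.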
"},{"location":"api/engines/blast/","title":"camlhmp.engines.blast","text":"

Below are the functions available in the camlhmp.engines.blast module.

"},{"location":"api/engines/blast/#camlhmp.engines.blast.run_blast","title":"camlhmp.engines.blast.run_blast(engine, subject, query, min_pident, min_coverage)","text":"

Query sequences against an input subject using the specified BLAST+ program.

Parameters:

Name Type Description Default engine str

The BLAST engine to use

required subject str

The subject database (input)

required query str

The query file (targets)

required min_pident float

The minimum percent identity to count a hit

required min_coverage int

The minimum percent coverage to count a hit

required

Returns:

Name Type Description list list

A list of three items: the query IDs that had hits, the parsed hits (one dict per BLAST row), and BLAST's stderr

Examples:

>>> from camlhmp.engines.blast import run_blast\n>>> hits, blast_stdout, blast_stderr = run_blast(\n        framework[\"engine\"][\"tool\"], input_path, targets_path, min_pident, min_coverage\n    )\n
Source code in camlhmp/engines/blast.py
def run_blast(engine: str, subject: str, query: str, min_pident: float, min_coverage: int) -> list:\n    \"\"\"\n    Query sequences against a input subject using a specified BLAST+ algorithm.\n\n    Args:\n        engine (str): The BLAST engine to use\n        subject (str): The subject database (input)\n        query (str): The query file (targets)\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        list: The parsed BLAST results, raw blast results, and stderr\n\n    Examples:\n        >>> from camlhmp.engines.blast import run_blast\n        >>> hits, blast_stdout, blast_stderr = run_blast(\n                framework[\"engine\"][\"tool\"], input_path, targets_path, min_pident, min_coverage\n            )\n    \"\"\"\n    outfmt = \" \".join(BLASTN_COLS)\n    cat_type = \"zcat\" if str(subject).endswith(\".gz\") else \"cat\"\n    qcov_hsp_perc = f\"-qcov_hsp_perc {min_coverage}\" if min_coverage else \"\"\n    perc_identity = f\"-perc_identity {min_pident}\" if min_pident and engine != \"tblastn\" else \"\"\n    stdout, stderr = execute(\n        f\"{cat_type} {subject} | {engine} -query {query} -subject - -outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}\",\n        capture=True,\n    )\n\n    # Convert BLAST results to a list of dicts\n    results = []\n    target_hits = []\n    for line in stdout.split(\"\\n\"):\n        if line == \"\":\n            continue\n        cols = line.split(\"\\t\")\n        results.append(dict(zip(BLASTN_COLS, cols)))\n        target_hits.append(cols[0])\n\n    if not results:\n        # Create an empty dict if no results are found\n        results.append(dict(zip(BLASTN_COLS, [\"NO_HITS\"] * len(BLASTN_COLS))))\n\n    return [target_hits, results, stderr]\n
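run_blast builds one shell pipeline (cat or zcat piped into the BLAST+ program) and then parses the tabular output. The sketch below separates those two steps so they can be inspected without BLAST+ installed. It assumes BLASTN_COLS matches the column header of the {PREFIX}.blast.tsv example in the CLI docs, since the real constant is not shown on this page; the helper names build_blast_command and parse_blast_output are hypothetical.

```python
# Assumed column list, copied from the {PREFIX}.blast.tsv header in these docs
BLASTN_COLS = [
    'qseqid', 'sseqid', 'pident', 'qcovs', 'qlen', 'slen', 'length', 'nident',
    'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore',
]


def build_blast_command(engine: str, subject: str, query: str, min_pident: float, min_coverage: int) -> str:
    # Same string assembly as run_blast, without executing anything
    outfmt = ' '.join(BLASTN_COLS)
    cat_type = 'zcat' if str(subject).endswith('.gz') else 'cat'  # stream gzipped input
    qcov_hsp_perc = f'-qcov_hsp_perc {min_coverage}' if min_coverage else ''
    # run_blast leaves out -perc_identity when the engine is tblastn
    perc_identity = f'-perc_identity {min_pident}' if min_pident and engine != 'tblastn' else ''
    return (
        f"{cat_type} {subject} | {engine} -query {query} -subject - "
        f"-outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}"
    )


def parse_blast_output(stdout: str) -> list:
    # Same parsing loop as run_blast: returns [query IDs with hits, one dict per hit]
    results, target_hits = [], []
    for line in stdout.split('\n'):
        if line == '':
            continue
        cols = line.split('\t')
        results.append(dict(zip(BLASTN_COLS, cols)))
        target_hits.append(cols[0])
    if not results:
        # Placeholder row so downstream code (e.g. write_tsv) can detect "no hits"
        results.append(dict(zip(BLASTN_COLS, ['NO_HITS'] * len(BLASTN_COLS))))
    return [target_hits, results]


print(build_blast_command(
    'tblastn',
    'tests/data/blast/alleles/SRR2912551.fna.gz',
    'tests/data/blast/alleles/spn-pbptype.fasta',
    95,
    95,
))
```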
"},{"location":"api/engines/blast/#camlhmp.engines.blast.run_blastn","title":"camlhmp.engines.blast.run_blastn(subject, query, min_pident, min_coverage)","text":"

An alias for run_blast which uses blastn

Parameters:

Name Type Description Default subject str

The subject database (input)

required query str

The query file (targets)

required min_pident float

The minimum percent identity to count a hit

required min_coverage int

The minimum percent coverage to count a hit

required

Returns:

Name Type Description list list

A list of three items: the query IDs that had hits, the parsed hits (one dict per BLAST row), and BLAST's stderr

Examples:

>>> from camlhmp.engines.blast import run_blastn\n>>> hits, blast_stdout, blast_stderr = run_blastn(\n        input_path, targets_path, min_pident, min_coverage\n    )\n
Source code in camlhmp/engines/blast.py
def run_blastn(subject: str, query: str, min_pident: float, min_coverage: int) -> list:\n    \"\"\"\n    An alias for `run_blast` which uses `blastn`\n\n    Args:\n        subject (str): The subject database (input)\n        query (str): The query file (targets)\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        list: The parsed BLAST results, raw blast results, and stderr\n\n    Examples:\n        >>> from camlhmp.engines.blast import run_blastn\n        >>> hits, blast_stdout, blast_stderr = run_blastn(\n                input_path, targets_path, min_pident, min_coverage\n            )\n    \"\"\"\n    return run_blast(\"blastn\", subject, query, min_pident, min_coverage)\n
"},{"location":"api/engines/blast/#camlhmp.engines.blast.run_tblastn","title":"camlhmp.engines.blast.run_tblastn(subject, query, min_pident, min_coverage)","text":"

An alias for run_blast which uses tblastn.

Parameters:

Name Type Description Default subject str

The subject database (input)

required query str

The query file (targets)

required min_pident float

The minimum percent identity to count a hit

required min_coverage int

The minimum percent coverage to count a hit

required

Returns:

Name Type Description list list

A list of three items: the query IDs that had hits, the parsed hits (one dict per BLAST row), and BLAST's stderr

Examples:

>>> from camlhmp.engines.blast import run_tblastn\n>>> hits, blast_stdout, blast_stderr = run_tblastn(\n        input_path, targets_path, min_pident, min_coverage\n    )\n
Source code in camlhmp/engines/blast.py
def run_tblastn(subject: str, query: str, min_pident: float, min_coverage: int) -> list:\n    \"\"\"\n    An alias for `run_blast` which uses `tblastn`.\n\n    Args:\n        subject (str): The subject database (input)\n        query (str): The query file (targets)\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        list: The parsed BLAST results, raw blast results, and stderr\n\n    Examples:\n        >>> from camlhmp.engines.blast import run_tblastn\n        >>> hits, blast_stdout, blast_stderr = run_tblastn(\n                input_path, targets_path, min_pident, min_coverage\n            )\n    \"\"\"\n    return run_blast(\"tblastn\", subject, query, min_pident, min_coverage)\n
"},{"location":"api/parsers/blast/","title":"camlhmp.parsers.blast","text":"

Below are the functions available in the camlhmp.parsers.blast module.

"},{"location":"api/parsers/blast/#camlhmp.parsers.blast.get_blast_allele_hits","title":"camlhmp.parsers.blast.get_blast_allele_hits(targets, results, min_pident, min_coverage)","text":"

Find the allele hits in the BLAST results.

Parameters:

Name Type Description Default targets dict

The list of target sequences {id: len(seq)}

required results list of dict

The BLAST results

required min_pident float

The minimum percent identity to count a hit

required min_coverage int

The minimum percent coverage to count a hit

required

Returns:

Name Type Description dict dict

The allele hits

Examples:

>>> from camlhmp.parsers.blast import get_blast_allele_hits\n>>> target_results = get_blast_allele_hits(framework[\"targets\"], blast_stdout, min_pident, min_coverage)\n
Source code in camlhmp/parsers/blast.py
def get_blast_allele_hits(\n    targets: dict, results: dict, min_pident: float, min_coverage: int\n) -> dict:\n    \"\"\"\n    Find the allele hits in the BLAST results.\n\n    Args:\n        targets (dict): The list of target sequences {id: len(seq)}\n        results (list of dict): The BLAST results\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        dict: The allele hits\n\n    Examples:\n        >>> from camlhmp.parsers.blast import get_blast_allele_hits\n        >>> target_results = get_blast_allele_hits(framework[\"targets\"], blast_stdout, min_pident, min_coverage)\n    \"\"\"\n    # Aggregate the hits for each target\n    target_results = {}\n\n    for result in results:\n        # Only process real hits\n        if result[\"qseqid\"] != \"NO_HITS\":\n            target, allele = result[\"qseqid\"].rsplit(\"_\", 1)\n            if target not in target_results:\n                target_results[target] = {\n                    \"known\": [],\n                    \"novel\": [],\n                }\n\n            # only process hits that meet minimum criteria\n            if float(result[\"pident\"]) >= min_pident and int(result[\"qcovs\"]) >= min_coverage:\n                # hits that meet requirements\n\n                # Default to \"NEW\" allele, if perfect match use the allele ID\n                final_allele = \"NEW\"\n                final_type = \"novel\"\n                if float(result[\"pident\"]) == 100 and int(result[\"qcovs\"]) == 100:\n                    final_allele = allele\n                    final_type = \"known\"\n\n                target_results[target][final_type].append({\n                        \"id\": final_allele,\n                        \"qcovs\": result[\"qcovs\"],\n                        \"pident\": float(result[\"pident\"]),\n                        \"bitscore\": result[\"bitscore\"],\n                })\n\n    
final_allele_hits = {}\n    for target in targets:\n        final_allele_hits[target] = {\n            \"id\": \"-\",\n            \"qcovs\": 0,\n            \"pident\": 0,\n            \"bitscore\": 0,\n            \"comment\": \"No hits met thresholds\",\n        }\n\n    for target in target_results:\n        if len(target_results[target][\"known\"]):\n            # exact matches to known alleles were found\n            if len(target_results[target][\"known\"]) == 1:\n                final_allele_hits[target] = target_results[target][\"known\"][0]\n                final_allele_hits[target][\"comment\"] = \"\"\n            else:\n                # multiple hits\n                final_alleles = []\n                for hit in target_results[target][\"known\"]:\n                    final_alleles.append(hit[\"id\"])\n\n                final_allele_hits[target] = target_results[target][\"known\"][0]\n                final_allele_hits[target][\"id\"] = \",\".join(final_alleles)\n                final_allele_hits[target][\"comment\"] = \"Exact matches to multiple alleles\"\n        elif len(target_results[target][\"novel\"]):\n            # no exact matches to known alleles were found, but thresholds were met\n\n            # report the top scores\n            if len(target_results[target][\"novel\"]) == 1:\n                final_allele_hits[target] = target_results[target][\"novel\"][0]\n                final_allele_hits[target][\"comment\"] = \"\"\n            else:\n                # multiple hits, only report highest score\n                final_allele_hits[target] = sorted(target_results[target][\"novel\"], key=lambda x: x[\"bitscore\"], reverse=True)[0]\n                final_allele_hits[target][\"comment\"] = \"No exact matches to known alleles\"\n\n    # Debugging information\n    logging.debug(\"camlhmp.engines.blast.get_blast_allele_hits\")\n    logging.debug(f\"Allele Hits: {final_allele_hits}\")\n\n    return final_allele_hits\n
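Each hit in get_blast_allele_hits is split into a target and an allele ID on the last underscore, then sorted into "known" (100% identity and 100% coverage) or "novel" (thresholds met, reported as NEW). The hypothetical classify_hit below restates that rule on three rows from the {PREFIX}.blast.tsv example in the CLI docs. It also shows a pitfall worth knowing: the source ranks multiple novel hits with key=lambda x: x["bitscore"], and bitscores are kept as the strings read from BLAST's output, so that comparison is lexicographic unless they are converted with float().

```python
def classify_hit(result: dict, min_pident: float, min_coverage: int):
    # Hypothetical helper restating the per-hit rule in get_blast_allele_hits.
    # Returns (target, 'known' | 'novel', allele ID), or None if a threshold is missed.
    target, allele = result['qseqid'].rsplit('_', 1)  # '1A_23' -> ('1A', '23')
    pident, qcovs = float(result['pident']), int(result['qcovs'])
    if pident < min_pident or qcovs < min_coverage:
        return None
    if pident == 100 and qcovs == 100:
        return (target, 'known', allele)  # exact match to a known allele
    return (target, 'novel', 'NEW')  # passes thresholds but is not an exact match


# Three rows from the {PREFIX}.blast.tsv example (columns trimmed)
rows = [
    {'qseqid': '1A_23', 'pident': '100.000', 'qcovs': '100', 'bitscore': '556'},
    {'qseqid': '1A_0', 'pident': '99.638', 'qcovs': '100', 'bitscore': '555'},
    {'qseqid': '1A_4', 'pident': '84.420', 'qcovs': '100', 'bitscore': '474'},
]
for row in rows:
    print(row['qseqid'], classify_hit(row, 95, 95))
# 1A_23 ('1A', 'known', '23')
# 1A_0 ('1A', 'novel', 'NEW')
# 1A_4 None

# Bitscores parsed from BLAST output are strings; compare them as numbers
novel = [{'id': 'NEW', 'bitscore': '556'}, {'id': 'NEW', 'bitscore': '99'}]
print(max(novel, key=lambda x: x['bitscore'])['bitscore'])         # 99 (string order)
print(max(novel, key=lambda x: float(x['bitscore']))['bitscore'])  # 556
```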
"},{"location":"api/parsers/blast/#camlhmp.parsers.blast.get_blast_region_hits","title":"camlhmp.parsers.blast.get_blast_region_hits(targets, results, min_pident, min_coverage)","text":"

Aggregate multiple target hits for a region from the BLAST results.

Parameters:

Name Type Description Default targets dict

A dict mapping each target ID to its sequence length, {id: len(seq)}

required results list of dict

The BLAST results

required min_pident float

The minimum percent identity to count a hit

required min_coverage int

The minimum percent coverage to count a hit

required

Returns:

Name Type Description dict dict

The target hits

Examples:

>>> from camlhmp.parsers.blast import get_blast_region_hits\n>>> target_results = get_blast_region_hits(target_lengths, blast_stdout, min_pident, min_coverage)\n
Source code in camlhmp/parsers/blast.py
def get_blast_region_hits(\n    targets: dict, results: dict, min_pident: float, min_coverage: int\n) -> dict:\n    \"\"\"\n    Aggregate multiple target hits for a region from the BLAST results.\n\n    Args:\n        targets (dict): The list of target sequences {id: len(seq)}\n        results (list of dict): The BLAST results\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        dict: The target hits\n\n    Examples:\n        >>> from camlhmp.parsers.blast import get_blast_region_hits\n        >>> target_results = get_blast_region_hits(target_lengths, blast_stdout, min_pident, min_coverage)\n    \"\"\"\n    # Aggregate the hits for each target\n    target_results = {}\n    for target, length in targets.items():\n        target_results[target] = {\n            \"hits\": [],\n            \"coverage\": [0] * length,  # Used to calculate coverage across multiple hits\n            \"comment\": [],\n        }\n\n    # Process each blast hit\n    for result in results:\n        # Only process real hits\n        if result[\"qseqid\"] != \"NO_HITS\":\n            # Only keep hits that pass the minimum percent identity\n            if float(result[\"pident\"]) >= min_pident:\n                # Add hit to list of hits\n                target_results[result[\"qseqid\"]][\"hits\"].append(result)\n\n                # Set the coverage to 1 for each base in the hit\n                for i in range(int(result[\"qstart\"]) - 1, int(result[\"qend\"])):\n                    target_results[result[\"qseqid\"]][\"coverage\"][i] += 1\n\n    # Determine coverage for each target\n    final_results = {}\n    for target, vals in target_results.items():\n        final_results[target] = {\n            \"hits\": vals[\"hits\"],\n            \"coverage\": 100\n            * (\n                sum([1 for i in vals[\"coverage\"] if i > 0])\n                / 
float(len(vals[\"coverage\"]))\n            ),\n            \"comment\": [],\n        }\n        if len(vals[\"hits\"]) > 1:\n            final_results[target][\"comment\"].append(\n                f\"Coverage based on {len(vals['hits'])} hits\"\n            )\n\n        if sum([1 for i in vals[\"coverage\"] if i > 1]):\n            final_results[target][\"comment\"].append(\n                \"There were one or more overlapping hits\"\n            )\n\n    # Debugging information\n    logging.debug(\"camlhmp.engines.blast_region.get_blast_region_hits\")\n    logging.debug(f\"Profile Hits: {final_results}\")\n\n    return final_results\n
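get_blast_region_hits measures coverage per base rather than per hit, so several partial hits can add up to full coverage of a region. Each base's counter is incremented (not set to 1, despite the source comment), which is what lets overlapping hits be detected. The hypothetical region_coverage helper below isolates that arithmetic for (qstart, qend) pairs, which are 1-based and inclusive as in BLAST tabular output. It leaves out the min_pident filter; min_coverage is not referenced in the function body shown above either.

```python
def region_coverage(length: int, hits: list) -> tuple:
    # Sketch of the per-base coverage logic in get_blast_region_hits.
    # hits: (qstart, qend) pairs, 1-based and inclusive.
    depth = [0] * length
    for qstart, qend in hits:
        for i in range(qstart - 1, qend):
            depth[i] += 1  # count how many hits cover this base
    covered = sum(1 for d in depth if d > 0)
    overlapping = any(d > 1 for d in depth)  # some base covered by 2+ hits
    return 100 * covered / float(length), overlapping


print(region_coverage(10, [(1, 5), (4, 8)]))   # (80.0, True)
print(region_coverage(10, [(1, 3), (6, 10)]))  # (80.0, False)
```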
"},{"location":"api/parsers/blast/#camlhmp.parsers.blast.get_blast_target_hits","title":"camlhmp.parsers.blast.get_blast_target_hits(targets, results)","text":"

Find the target hits in the BLAST results.

Parameters:

Name Type Description Default targets list

The list of target sequences

required results dict

The query IDs with BLAST hits (the first item returned by run_blast)

required

Returns:

Name Type Description dict dict

The target hits

Examples:

>>> from camlhmp.parsers.blast import get_blast_target_hits\n>>> target_results = get_blast_target_hits(framework[\"targets\"], hits)\n
Source code in camlhmp/parsers/blast.py
def get_blast_target_hits(targets: list, results: dict) -> dict:\n    \"\"\"\n    Find the target hits in the BLAST results.\n\n    Args:\n        targets (list): The list of target sequences\n        results (dict): The BLAST results\n\n    Returns:\n        dict: The target hits\n\n    Examples:\n        >>> from camlhmp.parsers.blast import get_blast_target_hits\n        >>> target_results = get_blast_target_hits(framework[\"targets\"], hits)\n    \"\"\"\n    target_hits = {}\n    for target in targets:\n        target_hits[target] = False\n        if target in results:\n            target_hits[target] = True\n\n    # Debugging information\n    logging.debug(\"camlhmp.engines.blast.get_blast_target_hits\")\n    logging.debug(f\"Profile Hits: {target_hits}\")\n\n    return target_hits\n
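get_blast_target_hits reduces typing to presence or absence. In the example above, results is the hits list returned by run_blast, so a membership test per target is all that is needed. An equivalent sketch, with a hypothetical name and made-up target IDs, that builds a set once instead of scanning the list for each target:

```python
def get_target_presence(targets: list, hit_ids: list) -> dict:
    # Equivalent to get_blast_target_hits: True when a target ID appears among the hit IDs
    found = set(hit_ids)  # O(1) lookups; duplicate hits collapse
    return {target: target in found for target in targets}


print(get_target_presence(['1A', '2B', '2X'], ['1A', '1A', '2X']))
# {'1A': True, '2B': False, '2X': True}
```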
"},{"location":"cli/","title":"camlhmp CLI Reference","text":"

camlhmp provides a set of command line interface (CLI) commands for typing organisms with the available engines and schemas.

Currently the following commands are available in the camlhmp CLI:

Command Description camlhmp-blast-alleles Classify assemblies using BLAST against alleles of a set of genes camlhmp-blast-regions Classify assemblies using BLAST against larger genomic regions camlhmp-blast-targets Classify assemblies using BLAST against individual genes or proteins camlhmp-extract Extract typing targets from a set of reference sequences"},{"location":"cli/camlhmp-extract/","title":"camlhmp-extract","text":""},{"location":"cli/camlhmp-extract/#camlhmp-extract","title":"camlhmp-extract","text":"

camlhmp-extract is a command that allows users to extract targets from a set of references. You should think of this script as a \"helper\" script for curators. It allows you to maintain a TSV file with the targets and their positions in the reference sequences. camlhmp-extract will then extract the targets from the reference sequences and write them to a FASTA file.

"},{"location":"cli/camlhmp-extract/#usage","title":"Usage","text":"
 \ud83d\udc2a camlhmp-extract \ud83d\udc2a - Extract typing targets from a set of reference sequences\n\n\u256d\u2500 Required Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --path     -i  TEXT  The path where input files are located [required]                   \u2502\n\u2502 *  --targets  -t  TEXT  A TSV of targets to extract in FASTA format [required]              \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Additional Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --outdir   -o  TEXT  The path to save the extracted targets                                 \u2502\n\u2502 --verbose            Increase the 
verbosity of output                                       \u2502\n\u2502 --silent             Only critical errors will be printed                                   \u2502\n\u2502 --version  -V        Show the version and exit.                                             \u2502\n\u2502 --help               Show this message and exit.                                            \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-alleles/","title":"camlhmp-blast-alleles","text":"

camlhmp-blast-alleles is a command that allows users to type their samples using a provided schema with BLAST algorithms. This command is useful when the schema is typing specific alleles of a gene or set of genes (e.g. MLST).

 Usage: camlhmp-blast-alleles [OPTIONS]\n\n \ud83d\udc2a camlhmp-blast-alleles \ud83d\udc2a - Classify assemblies using BLAST against alleles of\n a set of genes\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --input         -i  TEXT     Input file in FASTA format to classify         \u2502\n\u2502                                 [required]                                     \u2502\n\u2502 *  --yaml          -y  TEXT     YAML file documenting the targets and types    \u2502\n\u2502                                 [required]                                     \u2502\n\u2502 *  --targets       -t  TEXT     Query targets in FASTA format [required]       \u2502\n\u2502    --outdir        -o  PATH     Directory to write output [default: ./]        \u2502\n\u2502    --prefix        -p  TEXT     Prefix to use for output files                 \u2502\n\u2502                                 [default: camlhmp]                             \u2502\n\u2502    --min-pident        INTEGER  Minimum percent identity to count a hit        \u2502\n\u2502                                 [default: 95]                                  \u2502\n\u2502    --min-coverage      INTEGER  Minimum percent coverage to count a hit        \u2502\n\u2502                                 [default: 95]                                  \u2502\n\u2502    --force                      Overwrite existing reports                     \u2502\n\u2502    --verbose                    Increase the verbosity of output               \u2502\n\u2502    --silent                     Only critical errors will be 
printed           \u2502\n\u2502    --version                    Print schema and camlhmp version               \u2502\n\u2502    --help                       Show this message and exit.                    \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-alleles/#example-usage","title":"Example Usage","text":"

To run camlhmp-blast-alleles, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-alleles using available test data.

camlhmp-blast-alleles \\\n    --yaml tests/data/blast/alleles/spn-pbptype.yaml \\\n    --targets tests/data/blast/alleles/spn-pbptype.fasta \\\n    --input tests/data/blast/alleles/SRR2912551.fna.gz\n\nRunning camlhmp with following parameters:\n    --input tests/data/blast/alleles/SRR2912551.fna.gz\n    --yaml tests/data/blast/alleles/spn-pbptype.yaml\n    --targets tests/data/blast/alleles/spn-pbptype.fasta\n    --outdir ./\n    --prefix camlhmp\n    --min-pident 95\n    --min-coverage 95\n\nStarting camlhmp for S. pneumoniae PBP typing...\nRunning tblastn...\nProcessing hits...\nFinal Results...\n                               S. pneumoniae PBP typing\n\u250f\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2513\n\u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 1\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 
\u2503\n\u2521\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2529\n\u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502    \u2502 0 \u2502 1\u2026 \u2502 \u2026 \u2502 5\u2026 \u2502   \u2502 2  \u2502 \u2026 \u2502 1\u2026 \u2502 \u2026 \u2502    \u2502\n\u2514\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2518\nWriting outputs...\nFinal predicted type written to ./camlhmp.tsv\ntblastn results written to ./camlhmp.tblastn.tsv\n

Note

The table printed to STDOUT by camlhmp-blast-alleles has been purposely truncated for display in these docs. It contains the same information as {PREFIX}.tsv.

"},{"location":"cli/blast/camlhmp-blast-alleles/#output-files","title":"Output Files","text":"

camlhmp-blast-alleles will generate the following output files:

File Name | Description
{PREFIX}.tsv | A tab-delimited file with the predicted type
{PREFIX}.blast.tsv | A tab-delimited file of all BLAST hits"},{"location":"cli/blast/camlhmp-blast-alleles/#prefixtsv","title":"{PREFIX}.tsv","text":"

The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:

Column | Description
sample | The sample name as determined by --prefix
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
{TARGET}_id | The allele ID for a target hit
{TARGET}_pident | The percent identity of the hit
{TARGET}_qcovs | The percent coverage of the hit
{TARGET}_bitscore | The bitscore of the hit
{TARGET}_comment | A small comment about the hit

Below is an example of the {PREFIX}.tsv file:

sample  schema  schema_version  camlhmp_version params  1A_id   1A_pident   1A_qcovs    1A_bitscore 1A_comment  2B_id   2B_pident   2B_qcovs    2B_bitscore 2B_comment  2X_id   2X_pident   2X_qcovs    2X_bitscore 2X_comment\ncamlhmp pbptype_partial 0.0.1   0.3.1   min-coverage=95;min-pident=95   23  100.0   100 556     0   100.0   100 567     2   100.0   100 741 \n
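Because each target contributes its own group of {TARGET}_* columns, it can be convenient to regroup a row by target when post-processing results. The Python sketch below relies only on the column naming shown above; the helper name alleles_by_target is hypothetical and not part of camlhmp.

```python
import csv
import io

# Columns that describe the run rather than an individual target
META_COLUMNS = {'sample', 'schema', 'schema_version', 'camlhmp_version', 'params'}


def alleles_by_target(tsv_text):
    # Hypothetical helper: regroup the {TARGET}_* columns of a
    # camlhmp-blast-alleles {PREFIX}.tsv into one dict per target.
    reader = csv.DictReader(io.StringIO(tsv_text), dialect='excel-tab')
    results = []
    for row in reader:
        targets = {}
        for column, value in row.items():
            if column in META_COLUMNS:
                continue
            # Split '1A_pident' into target '1A' and field 'pident'
            target, field = column.rsplit('_', 1)
            targets.setdefault(target, {})[field] = value
        results.append({'sample': row['sample'], 'targets': targets})
    return results
```

For example, reading camlhmp.tsv and passing its contents to alleles_by_target would give access to the 1A allele ID as result[0]['targets']['1A']['id'].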
"},{"location":"cli/blast/camlhmp-blast-alleles/#prefixblasttsv","title":"{PREFIX}.blast.tsv","text":"

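The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all BLAST hits. The columns are BLAST tabular output (-outfmt 6) with the fields listed in the header row below, which include qcovs, qlen, slen, and nident in addition to the default -outfmt 6 columns.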

Here is an example of the {PREFIX}.blast.tsv file:

qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore\n1A_0    NODE_223_length_8196_cov_21.291849  99.638  100 276 8324    276 275 1   0   1   276 1807    2634    0.0 555\n1A_1    NODE_223_length_8196_cov_21.291849  99.638  100 276 8324    276 275 1   0   1   276 1807    2634    0.0 555\n1A_2    NODE_223_length_8196_cov_21.291849  99.275  100 276 8324    276 274 2   0   1   276 1807    2634    0.0 554\n1A_3    NODE_223_length_8196_cov_21.291849  99.275  100 276 8324    276 274 2   0   1   276 1807    2634    0.0 553\n1A_4    NODE_223_length_8196_cov_21.291849  84.420  100 276 8324    276 233 43  0   1   276 1807    2634    3.91e-155   474\n1A_23   NODE_223_length_8196_cov_21.291849  100.000 100 276 8324    276 276 0   0   1   276 1807    2634    0.0 556\n2B_0    NODE_878_length_2854_cov_17.976875  100.000 100 277 2982    277 277 0   0   1   277 1218    2048    0.0 567\n2B_1    NODE_878_length_2854_cov_17.976875  87.365  100 277 2982    277 242 35  0   1   277 1218    2048    3.24e-173   501\n2B_2    NODE_878_length_2854_cov_17.976875  99.278  100 277 2982    277 275 2   0   1   277 1218    2048    0.0 563\n2B_3    NODE_878_length_2854_cov_17.976875  99.639  100 277 2982    277 276 1   0   1   277 1218    2048    0.0 565\n2B_4    NODE_878_length_2854_cov_17.976875  99.639  100 277 2982    277 276 1   0   1   277 1218    2048    0.0 565\n2X_0    NODE_210_length_5085_cov_16.539627  99.721  100 358 5213    358 357 1   0   1   358 3172    2099    0.0 740\n2X_1    NODE_210_length_5085_cov_16.539627  92.179  100 358 5213    358 330 28  0   1   358 3172    2099    0.0 688\n2X_1    NODE_878_length_2854_cov_17.976875  23.797  99  358 2982    395 94  230 17  1   353 915 2012    1.95e-06    45.8\n2X_2    NODE_210_length_5085_cov_16.539627  100.000 100 358 5213    358 358 0   0   1   358 3172    2099    0.0 741\n2X_3    NODE_210_length_5085_cov_16.539627  99.721  100 358 5213    358 357 1   0   1   
358 3172    2099    0.0 739\n2X_4    NODE_210_length_5085_cov_16.539627  99.441  100 358 5213    358 356 2   0   1   358 3172    2099    0.0 738\n
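In the example above, the allele reported for each target in {PREFIX}.tsv (1A_23, 2B_0, and 2X_2) is the hit with the highest bitscore among those passing the 95% identity and coverage thresholds. The Python sketch below reproduces that selection from the BLAST table. It is an illustration consistent with this example, not camlhmp's actual implementation, and it assumes the header row shown above is present and that allele IDs follow the {TARGET}_{ALLELE} naming seen here. The function name best_alleles is hypothetical.

```python
import csv
import io


def best_alleles(blast_tsv, min_pident=95.0, min_coverage=95.0):
    # Keep the highest-bitscore hit per target among hits that pass
    # the percent identity and query coverage thresholds.
    best = {}
    reader = csv.DictReader(io.StringIO(blast_tsv), dialect='excel-tab')
    for hit in reader:
        if float(hit['pident']) < min_pident or float(hit['qcovs']) < min_coverage:
            continue
        # Assumes allele IDs look like '1A_23' -> target '1A', allele '23'
        target, allele = hit['qseqid'].rsplit('_', 1)
        bitscore = float(hit['bitscore'])
        if target not in best or bitscore > best[target][1]:
            best[target] = (allele, bitscore)
    return {target: allele for target, (allele, _) in best.items()}
```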
"},{"location":"cli/blast/camlhmp-blast-alleles/#prefixdetailstsv","title":"{PREFIX}.details.tsv","text":"

The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. This file is useful for seeing how a sample performed against every type in a schema.

The columns in this file are:

Column | Description
sample | The sample name as determined by --prefix
type | The type being evaluated (one row per type in the schema)
status | The status of the type (True if the sample matched the type)
targets | The targets for the given type that had a match
missing | The targets for the given type that were not found
coverage | The coverage of each target region
hits | The number of hits used to calculate the coverage of each target region
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
comment | A small comment about the result

Below is an example of the {PREFIX}.details.tsv file:

sample  type    status  targets missing coverage    hits    schema  schema_version  camlhmp_version params  comment\ncamlhmp O1  False       O1  12.49   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O2  False   O2  wzyB    100.00,0.00 1,0 pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O3  False       O3  1.43    1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O4  False       O4  13.86   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O5  True    O2      100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
"},{"location":"cli/blast/camlhmp-blast-regions/","title":"camlhmp-blast-regions","text":"

camlhmp-blast-regions is a command that allows users to search for full regions of interest. It is nearly identical to camlhmp-blast-targets, but instead of many smaller targets, it is designed to look at full regions such as O-antigens or similar features.

"},{"location":"cli/blast/camlhmp-blast-regions/#usage","title":"Usage","text":"
 Usage: camlhmp-blast-regions [OPTIONS]\n\n \ud83d\udc2a camlhmp-blast-regions \ud83d\udc2a - Classify assemblies using BLAST against larger genomic\n regions\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --input         -i  TEXT     Input file in FASTA format to classify [required]   \u2502\n\u2502 *  --yaml          -y  TEXT     YAML file documenting the targets and types         \u2502\n\u2502                                 [required]                                          \u2502\n\u2502 *  --targets       -t  TEXT     Query targets in FASTA format [required]            \u2502\n\u2502    --outdir        -o  PATH     Directory to write output [default: ./]             \u2502\n\u2502    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]   \u2502\n\u2502    --min-pident        INTEGER  Minimum percent identity to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --min-coverage      INTEGER  Minimum percent coverage to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --force                      Overwrite existing reports                          \u2502\n\u2502    --verbose                    Increase the verbosity of output                    \u2502\n\u2502    --silent                     Only critical errors will be printed                \u2502\n\u2502    --version                    Print schema and camlhmp version 
                   \u2502\n\u2502    --help                       Show this message and exit.                         \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-regions/#example-usage","title":"Example Usage","text":"

To run camlhmp-blast-regions, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-regions using available test data.

camlhmp-blast-regions \\\n    --yaml tests/data/blast/regions/pseudomonas-serogroup.yaml \\\n    --targets tests/data/blast/regions/pseudomonas-serogroup.fasta \\\n    --input tests/data/blast/regions/O1-GCF_000504045.fna.gz\n\nRunning camlhmp with following parameters:\n    --input tests/data/blast/regions/O1-GCF_000504045.fna.gz\n    --yaml tests/data/blast/regions/pseudomonas-serogroup.yaml\n    --targets tests/data/blast/regions/pseudomonas-serogroup.fasta\n    --outdir ./\n    --prefix camlhmp\n    --min-pident 95\n    --min-coverage 95\n\nStarting camlhmp for Pseudomonas Serogrouping...\nRunning blastn...\nProcessing hits...\nFinal Results...\n                               Pseudomonas Serogrouping\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 sample \u2503 type \u2503 targe\u2026 \u2503 cover\u2026 \u2503 hits \u2503 schema \u2503 schem\u2026 \u2503 camlh\u2026 \u2503 params \u2503 comme\u2026 \u2503\n\u2521\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n\u2502 camlh\u2026 \u2502 O1   \u2502 O1     \u2502 100.00 \u2502 1 
   \u2502 pseud\u2026 \u2502 0.0.1  \u2502 0.3.1  \u2502 min-c\u2026 \u2502        \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\nWriting outputs...\nFinal predicted type written to ./camlhmp.tsv\nResults against each type written to ./camlhmp.details.tsv\nblastn results written to ./camlhmp.blastn.tsv\n

Note

The table printed to STDOUT by camlhmp-blast-regions has been purposely truncated for display in these docs. It contains the same information as {PREFIX}.tsv.

"},{"location":"cli/blast/camlhmp-blast-regions/#output-files","title":"Output Files","text":"

camlhmp-blast-regions will generate three output files:

File Name | Description
{PREFIX}.tsv | A tab-delimited file with the predicted type
{PREFIX}.blast.tsv | A tab-delimited file of all BLAST hits
{PREFIX}.details.tsv | A tab-delimited file with details for each type"},{"location":"cli/blast/camlhmp-blast-regions/#prefixtsv","title":"{PREFIX}.tsv","text":"

The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:

Column | Description
sample | The sample name as determined by --prefix
type | The predicted type
targets | The targets for the given type that had a hit
coverage | The coverage of the target region
hits | The number of hits used to calculate coverage of the target region
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
comment | A small comment about the result

Below is an example of the {PREFIX}.tsv file:

sample  type    targets coverage    hits    schema  schema_version  camlhmp_version params  comment\ncamlhmp O5  O2  100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
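The params column packs the run settings into a single field as semicolon-separated key=value pairs (for example, min-coverage=95;min-pident=95). The Python sketch below turns it back into a dictionary, assuming that format holds for every row; parse_params is a hypothetical name, not a camlhmp function.

```python
def parse_params(params):
    # Split 'min-coverage=95;min-pident=95' into key/value pairs
    settings = {}
    for pair in params.split(';'):
        if not pair:
            continue
        key, _, value = pair.partition('=')
        settings[key] = value
    return settings


print(parse_params('min-coverage=95;min-pident=95'))
# {'min-coverage': '95', 'min-pident': '95'}
```

Values are left as strings; convert them (for example with int()) if you need to compare against your own thresholds.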
"},{"location":"cli/blast/camlhmp-blast-regions/#prefixblasttsv","title":"{PREFIX}.blast.tsv","text":"

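The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all BLAST hits. The columns are BLAST tabular output (-outfmt 6) with the fields listed in the header row below, which include qcovs, qlen, slen, and nident in addition to the default -outfmt 6 columns.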

Here is an example of the {PREFIX}.blast.tsv file:

qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore\nwzyB    NZ_PSQS01000003.1   88.403  99  1140    6935329 595 526 69  0   545 1139    6874509 6875103 0.0 717\nwzyB    NZ_PSQS01000003.1   88.403  99  1140    6935329 595 526 69  0   545 1139    6920911 6921505 0.0 717\nwzyB    NZ_PSQS01000003.1   89.444  99  1140    6935329 540 483 56  1   1   539 6872864 6873403 0.0 680\nwzyB    NZ_PSQS01000003.1   89.444  99  1140    6935329 540 483 56  1   1   539 6919266 6919805 0.0 680\nO1  NZ_PSQS01000003.1   97.972  12  18368   6935329 1972    1932    38  2   16398   18368   6620589 6618619 0.0 3419\nO1  NZ_PSQS01000003.1   96.296  12  18368   6935329 324 312 11  1   1   323 6641914 6641591 1.68e-149   531\nO2  NZ_PSQS01000003.1   99.841  100 23303   6935329 23303   23266   30  1   1   23303   6618619 6641914 0.0 42821\nO2  NZ_PSQS01000003.1   86.935  100 23303   6935329 1240    1078    130 12  2542    3749    3864567 3863328 0.0 1363\nO3  NZ_PSQS01000003.1   94.442  13  20210   6935329 2393    2260    114 15  1   2386    6618619 6620999 0.0 3664\nO3  NZ_PSQS01000003.1   99.308  13  20210   6935329 289 287 2   0   19922   20210   6641626 6641914 3.09e-147   523\nO4  NZ_PSQS01000003.1   97.448  14  15279   6935329 1842    1795    47  0   1   1842    6618619 6620460 0.0 3142\nO4  NZ_PSQS01000003.1   99.638  14  15279   6935329 276 275 1   0   15004   15279   6641639 6641914 8.46e-142   505\n
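The coverage and hits values reported in {PREFIX}.details.tsv can be reproduced from this table. For example, O1 has two hits above the 95% identity cutoff covering query positions 1-323 and 16398-18368, or 2,294 of 18,368 bp (12.49%), while for O3 only the 99.308% hit passes, giving 289 of 20,210 bp (1.43%). The Python sketch below performs that calculation: drop hits below the identity cutoff, merge the query intervals of the remaining hits, and divide by qlen. It is a reconstruction that matches these example values, not camlhmp's own code, and region_coverage is a hypothetical name.

```python
def region_coverage(hits, qlen, min_pident=95.0):
    # hits: (pident, qstart, qend) tuples for one target region.
    # Returns (percent of the query covered, number of hits used).
    passing = sorted(
        (min(qs, qe), max(qs, qe))
        for pident, qs, qe in hits
        if pident >= min_pident
    )
    # Merge overlapping or adjacent query intervals so no base is counted twice
    merged = []
    for start, end in passing:
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return round(100 * covered / qlen, 2), len(passing)


# O1 hits from the example table above
print(region_coverage([(97.972, 16398, 18368), (96.296, 1, 323)], 18368))
# (12.49, 2)
```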
"},{"location":"cli/blast/camlhmp-blast-regions/#prefixdetailstsv","title":"{PREFIX}.details.tsv","text":"

The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. This file is useful for seeing how a sample performed against every type in a schema.

The columns in this file are:

Column | Description
sample | The sample name as determined by --prefix
type | The type being evaluated (one row per type in the schema)
status | The status of the type (True if the sample matched the type)
targets | The targets for the given type that had a match
missing | The targets for the given type that were not found
coverage | The coverage of each target region
hits | The number of hits used to calculate the coverage of each target region
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
comment | A small comment about the result

Below is an example of the {PREFIX}.details.tsv file:

sample  type    status  targets missing coverage    hits    schema  schema_version  camlhmp_version params  comment\ncamlhmp O1  False       O1  12.49   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O2  False   O2  wzyB    100.00,0.00 1,0 pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O3  False       O3  1.43    1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O4  False       O4  13.86   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O5  True    O2      100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
"},{"location":"cli/blast/camlhmp-blast-regions/#example-implementation","title":"Example Implementation","text":"

If you would like to see how camlhmp-blast-regions can be used, please see pasty. In pasty the schema is set up to directly use camlhmp-blast-regions to classify samples without any extra logic.

This allows for a simple wrapper like the following:

#!/usr/bin/env bash\npasty_dir=$(dirname $0)\n\nCAML_YAML=\"${pasty_dir}/../data/pa-osa.yaml\" \\\nCAML_TARGETS=\"${pasty_dir}/../data/pa-osa.fasta\" \\\n    camlhmp-blast-regions \\\n    \"${@:1}\"\n

This script will run camlhmp-blast-regions with the pasty schema and targets.

"},{"location":"cli/blast/camlhmp-blast-targets/","title":"camlhmp-blast-targets","text":"

camlhmp-blast-targets is a command that allows users to type their samples using a provided schema with BLAST algorithms. This command is useful when a schema targets full-length genes or proteins.

"},{"location":"cli/blast/camlhmp-blast-targets/#usage","title":"Usage","text":"
 Usage: camlhmp-blast-targets [OPTIONS]\n\n \ud83d\udc2a camlhmp-blast-targets \ud83d\udc2a - Classify assemblies using BLAST against individual\n genes or proteins\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --input         -i  TEXT     Input file in FASTA format to classify [required]   \u2502\n\u2502 *  --yaml          -y  TEXT     YAML file documenting the targets and types         \u2502\n\u2502                                 [required]                                          \u2502\n\u2502 *  --targets       -t  TEXT     Query targets in FASTA format [required]            \u2502\n\u2502    --outdir        -o  PATH     Directory to write output [default: ./]             \u2502\n\u2502    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]   \u2502\n\u2502    --min-pident        INTEGER  Minimum percent identity to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --min-coverage      INTEGER  Minimum percent coverage to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --force                      Overwrite existing reports                          \u2502\n\u2502    --verbose                    Increase the verbosity of output                    \u2502\n\u2502    --silent                     Only critical errors will be printed                \u2502\n\u2502    --version                    Print schema and camlhmp 
version                    \u2502\n\u2502    --help                       Show this message and exit.                         \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-targets/#example-usage","title":"Example Usage","text":"

To run camlhmp-blast-targets, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-targets using available test data.

camlhmp-blast-targets \\\n    --yaml tests/data/blast/targets/sccmec-partial.yaml \\\n    --targets tests/data/blast/targets/sccmec-partial.fasta \\\n    --input tests/data/blast/targets/sccmec-i.fasta\n\nRunning camlhmp with following parameters:\n    --input tests/data/blast/targets/sccmec-i.fasta\n    --yaml tests/data/blast/targets/sccmec-partial.yaml\n    --targets tests/data/blast/targets/sccmec-partial.fasta\n    --outdir ./\n    --prefix camlhmp\n    --min-pident 95\n    --min-coverage 95\n\nStarting camlhmp for SCCmec Typing...\nRunning blastn...\nProcessing hits...\nFinal Results...\n                                     SCCmec Typing\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 sample  \u2503 type \u2503 targets   \u2503 schema    \u2503 schema_v\u2026 \u2503 camlhmp\u2026 \u2503 params    \u2503 comment \u2503\n\u2521\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n\u2502 camlhmp \u2502 I    \u2502 ccrA1,cc\u2026 \u2502 sccmec_p\u2026 \u2502 0.0.1     \u2502 0.3.1    \u2502 min-cove\u2026 \u2502         
\u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\nWriting outputs...\nFinal predicted type written to ./camlhmp.tsv\nResults against each type written to ./camlhmp.details.tsv\nblastn results written to ./camlhmp.blastn.tsv\n

Note

The table printed to STDOUT by camlhmp-blast-targets has been purposely truncated for display in these docs. It contains the same information as {PREFIX}.tsv.

"},{"location":"cli/blast/camlhmp-blast-targets/#output-files","title":"Output Files","text":"

camlhmp-blast-targets will generate three output files:

File Name | Description
{PREFIX}.tsv | A tab-delimited file with the predicted type
{PREFIX}.blast.tsv | A tab-delimited file of all BLAST hits
{PREFIX}.details.tsv | A tab-delimited file with details for each type"},{"location":"cli/blast/camlhmp-blast-targets/#prefixtsv","title":"{PREFIX}.tsv","text":"

The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:

Column | Description
sample | The sample name as determined by --prefix
type | The predicted type
targets | The targets for the given type that had a hit
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
comment | A small comment about the result

Below is an example of the {PREFIX}.tsv file:

sample  type    targets schema  schema_version  camlhmp_version params  comment\ncamlhmp I   ccrA1,ccrB1,IS431,IS1272,mecA,mecR1 sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
"},{"location":"cli/blast/camlhmp-blast-targets/#prefixblasttsv","title":"{PREFIX}.blast.tsv","text":"

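The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all BLAST hits. The columns are BLAST tabular output (-outfmt 6) with the fields listed in the header row below, which include qcovs, qlen, slen, and nident in addition to the default -outfmt 6 columns.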

Here is an example of the {PREFIX}.blast.tsv file:

qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore\nccrA1   AB033763.2  100.000 100 1350    39332   1350    1350    0   0   1   1350    23692   25041   0.0 2494\nccrB1   AB033763.2  100.000 100 1152    39332   1152    1152    0   0   1   1152    25063   26214   0.0 2128\nIS1272  AB033763.2  100.000 100 1659    39332   1659    1659    0   0   1   1659    28423   30081   0.0 3064\nmecR1   AB033763.2  100.000 100 987 39332   987 987 0   0   1   987 30304   31290   0.0 1823\nmecA    AB033763.2  99.950  100 2007    39332   2007    2006    1   0   1   2007    31390   33396   0.0 3701\nmecA    AB033763.2  99.950  100 2007    39332   2007    2006    1   0   1   2007    31390   33396   0.0 3701\nIS431   AB033763.2  99.873  100 790 39332   790 789 1   0   1   790 35958   36747   0.0 1454\nIS431   AB033763.2  100.000 100 792 39332   792 792 0   0   1   792 35957   36748   0.0 1463\n
"},{"location":"cli/blast/camlhmp-blast-targets/#prefixdetailstsv","title":"{PREFIX}.details.tsv","text":"

The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. This file is useful for seeing how a sample performed against every type in a schema.

The columns in this file are:

Column | Description
sample | The sample name as determined by --prefix
type | The type being evaluated (one row per type in the schema)
status | The status of the type (True if the sample matched the type)
targets | The targets for the given type that had a match
missing | The targets for the given type that were not found
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
comment | A small comment about the result

Below is an example of the {PREFIX}.details.tsv file:

sample  type    status  targets missing schema  schema_version  camlhmp_version params  comment\ncamlhmp I   True    ccrA1,ccrB1,IS431,mecA,mecR1,IS1272     sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp II  False   IS431,mecA,mecR1    ccrA2,ccrB2,mecI    sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp III False   IS431,mecA,mecR1    ccrA3,ccrB3,mecI    sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp IV  False   IS431,mecA,mecR1,IS1272 ccrA2,ccrB2 sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
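Each row of {PREFIX}.details.tsv splits a type's targets into those that were found and those that are missing; in this example only type I has no missing targets, and it is the type reported in {PREFIX}.tsv. The Python sketch below shows that bookkeeping. The type definitions are a hypothetical excerpt inferred from the targets and missing columns above (a real schema is defined in the YAML file and may contain additional rules), and the function name evaluate_types is illustrative only.

```python
# Hypothetical excerpt of type definitions, inferred from the example above
TYPES = {
    'I': ['ccrA1', 'ccrB1', 'IS431', 'mecA', 'mecR1', 'IS1272'],
    'II': ['IS431', 'mecA', 'mecR1', 'ccrA2', 'ccrB2', 'mecI'],
    'III': ['IS431', 'mecA', 'mecR1', 'ccrA3', 'ccrB3', 'mecI'],
    'IV': ['IS431', 'mecA', 'mecR1', 'IS1272', 'ccrA2', 'ccrB2'],
}


def evaluate_types(found_targets, types=TYPES):
    # For each type, split its targets into found and missing;
    # in this sketch a type matches only when nothing is missing.
    results = {}
    for name, targets in types.items():
        found = [t for t in targets if t in found_targets]
        missing = [t for t in targets if t not in found_targets]
        results[name] = {'match': not missing, 'targets': found, 'missing': missing}
    return results


# Targets with hits passing the 95% thresholds in the example BLAST table
hits = {'ccrA1', 'ccrB1', 'IS1272', 'mecR1', 'mecA', 'IS431'}
for name, result in evaluate_types(hits).items():
    print(name, result['match'], ','.join(result['missing']))
```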
"},{"location":"cli/blast/camlhmp-blast-targets/#example-implementation","title":"Example Implementation","text":"

If you would like to see how camlhmp-blast-targets can be used, please see sccmec. In sccmec the schema is set up to directly use camlhmp-blast-targets to classify samples without any extra logic.

This allows for a simple wrapper like the following:

#!/usr/bin/env bash\nsccmec_dir=$(dirname $0)\n\nCAML_YAML=\"${sccmec_dir}/../data/sccmec.yaml\" \\\nCAML_TARGETS=\"${sccmec_dir}/../data/sccmec.fasta\" \\\n    camlhmp-blast-targets \\\n    \"${@:1}\"\n

This script will run camlhmp-blast-targets with the sccmec schema and targets.

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"],"fields":{"title":{"boost":1000.0},"text":{"boost":1.0},"tags":{"boost":1000000.0}}},"docs":[{"location":"","title":"camlhmp","text":"

\ud83d\udc2a camlhmp \ud83d\udc2a - Classification through yAML Heuristic Mapping Protocol

camlhmp is a tool for generating organism typing tools from YAML schemas. Through discussions with Tim Read, we identified a need for a straightforward method to define and manage typing schemas for organisms of interest. YAML was chosen for its simplicity and readability.

"},{"location":"#purpose","title":"Purpose","text":"

The primary purpose of camlhmp is to provide a framework that enables researchers to independently define typing schemas for their organisms of interest using YAML. This approach facilitates the management and analysis of biological data for researchers at any level of experience.

camlhmp does not supply pre-defined typing schemas. Instead, it equips researchers with the necessary tools to create and maintain their own schemas, ensuring these schemas can easily remain up to date with the latest scientific developments.

Finally, the development of camlhmp was driven by a practical need to streamline maintenance of multiple organism typing tools. Managing these tools separately is time-consuming and challenging. camlhmp simplifies this by providing a single framework shared by all of these tools.

"},{"location":"#documentation-overview","title":"Documentation Overview","text":"

Installation Information for installing camlhmp on your system

Available Tools A list of available typing tools utilizing camlhmp

Schema Definition Details about defining schemas for use with camlhmp

CLI Reference Details about available CLI commands from camlhmp

API Reference Details about using the camlhmp package in your own code

About Information about the development and funding of camlhmp

"},{"location":"#funding","title":"Funding","text":"

Support for this project came (in part) from the Wyoming Public Health Division, and the Center for Applied Pathogen Epidemiology and Outbreak Control (CAPE).

"},{"location":"#citing-camlhmp","title":"Citing camlhmp","text":"

If you make use of camlhmp in your analysis, please cite the following:

"},{"location":"CHANGELOG/","title":"Changelog","text":""},{"location":"CHANGELOG/#v100-rpetit3camlhmp-dromedary-202408","title":"v1.0.0 rpetit3/camlhmp \"Dromedary\" 2024/08/??","text":""},{"location":"CHANGELOG/#added","title":"Added","text":""},{"location":"CHANGELOG/#fixed","title":"Fixed","text":""},{"location":"CHANGELOG/#v031-rpetit3camlhmp-maybe-a-cat-20240805","title":"v0.3.1 rpetit3/camlhmp \"Maybe a cat?\" 2024/08/05","text":""},{"location":"CHANGELOG/#fixed_1","title":"Fixed","text":""},{"location":"CHANGELOG/#v030-rpetit3camlhmp-more-bunnies-and-fewer-baby-birds-20240805","title":"v0.3.0 rpetit3/camlhmp \"More bunnies and fewer baby birds\" 2024/08/05","text":""},{"location":"CHANGELOG/#added_1","title":"Added","text":""},{"location":"CHANGELOG/#v022-rpetit3camlhmp-even-a-few-baby-birds-20240722","title":"v0.2.2 rpetit3/camlhmp \"Even a few baby birds\" 2024/07/22","text":""},{"location":"CHANGELOG/#fixed_2","title":"Fixed","text":""},{"location":"CHANGELOG/#v021-rpetit3camlhmp-and-a-bunch-of-birds-20240722","title":"v0.2.1 rpetit3/camlhmp \"And a bunch of birds\" 2024/07/22","text":""},{"location":"CHANGELOG/#added_2","title":"Added","text":""},{"location":"CHANGELOG/#fixed_3","title":"Fixed","text":""},{"location":"CHANGELOG/#v020-rpetit3camlhmp-four-little-bunnies-20240722","title":"v0.2.0 rpetit3/camlhmp \"Four little bunnies\" 2024/07/22","text":""},{"location":"CHANGELOG/#added_3","title":"Added","text":""},{"location":"CHANGELOG/#v010-rpetit3camlhmp-little-baby-legs-20240430","title":"v0.1.0 rpetit3/camlhmp \"Little baby legs\" 2024/04/30","text":""},{"location":"CHANGELOG/#added_4","title":"Added","text":""},{"location":"CHANGELOG/#v001-rpetit3camlhmp-not-even-walking-yet-20240424","title":"v0.0.1 rpetit3/camlhmp \"Not even walking yet\" 2024/04/24","text":"

This is a development release for getting camlhmp onto PyPI and Bioconda. It is not expected to be stable.

"},{"location":"CHANGELOG/#added_5","title":"Added","text":""},{"location":"about/","title":"About","text":""},{"location":"about/#naming","title":"Naming","text":"

I really wanted to name a tool with \"camel\" in it because camels are my wife's favorite animal \ud83d\udc2a, and they also remind me of my friends in Oman!

Once it was decided that YAML would be the format for defining schemas, I was immediately drawn to \"Classification through YAML\", or \"CAML\", but quickly found out many others had also thought of this (for other use cases). We went through a few other iterations of CAML without any success. Fortunately, Tim Read came through with a clutch save, suggesting \"Heuristic Mapping Protocol\". So, here we are - camlhmp!

"},{"location":"about/#funding","title":"Funding","text":"

Support for this project came (in part) from the Wyoming Public Health Division, and the Center for Applied Pathogen Epidemiology and Outbreak Control (CAPE).

"},{"location":"about/#citing-camlhmp","title":"Citing camlhmp","text":"

If you make use of camlhmp in your analysis, please cite the following:

"},{"location":"available-tools/","title":"Available Tools","text":"

Below is a list of available typing tools utilizing camlhmp. Each tool is designed to analyze specific sequence data and generate a typing profile based on the schema provided.

Tip

If you've developed a typing tool that utilizes camlhmp, or know of one, we'd love to add it to this list. To do so, open an issue on the camlhmp GitHub repository.

Tool | Organism | Description
pasty | Pseudomonas aeruginosa | In silico serogrouping of Pseudomonas aeruginosa isolates
pbptyper | Streptococcus pneumoniae | In silico Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies
sccmec | Staphylococcus aureus | A tool for typing SCCmec cassettes in assemblies"},{"location":"installation/","title":"Installation","text":"

camlhmp is available through PyPI and Bioconda. While you can install it through PyPI, installing it through Bioconda is recommended so that non-Python dependencies (such as BLAST+) are also installed.

conda create -n camlhmp -c conda-forge -c bioconda camlhmp\nconda activate camlhmp\ncamlhmp\n
"},{"location":"schema/","title":"Schema Reference","text":"

The schema structure is designed to be simple and intuitive. Here is a basic skeleton of the expected schema structure:

%YAML 1.2\n---\n# metadata: general information about the schema\nmetadata:\n  id: \"\"          # unique identifier for the schema\n  name: \"\"        # name of the schema\n  description: \"\" # description of the schema\n  version: \"\"     # version of the schema\n  curators: []    # A list of curators of the schema\n\n# engine: specifies the computational tools and additional parameters used for sequence\n#         analysis.\nengine:\n  type: \"\"        # The type of tool used to generate the data\n  tool: \"\"        # The tool used to generate the data\n\n# targets: Lists the specific sequence targets such as genes, proteins, or markers that the\n#          schema will analyze. These should be included in the associated sequence query data\ntargets: []\n\n# aliases: groups multiple targets under a common name for easier reference\naliases:\n  - name: \"\"     # name of the alias\n    targets: []  # list of targets that are part of the alias\n\n# types: define specific combinations of targets and aliases to form distinct types\ntypes:\n  - name: \"\"     # name of the profile\n    targets: []  # list of targets (can use aliases) that are part of the profile\n    excludes: [] # list of targets (or aliases) that will automatically fail the type\n

This schema has five sections: metadata, engine, targets, aliases, and types.

Each section contains additional fields, which are described in the following sections.

"},{"location":"schema/#metadata","title":"metadata","text":"

The metadata section provides general information about the schema. This includes:

Field | Type | Description
id | string | A unique identifier for the schema
name | string | The name of the schema
description | string | A brief description of the schema
version | string | The version of the schema
curators | list | A list of curators of the schema"},{"location":"schema/#engine","title":"engine","text":"

The engine section specifies the computational tools used for sequence analysis.

Field | Type | Description
type | string | The type of engine used for analysis
tool | string | The specific tool to be used for the engine"},{"location":"schema/#targets","title":"targets","text":"

The targets section lists the specific sequence targets such as genes, proteins, or markers that the schema will analyze. These should be included in the associated sequence query data.

Field | Type | Description
targets | list | A list of targets to be analyzed"},{"location":"schema/#aliases","title":"aliases","text":"

The aliases section provides a convenient way to group multiple targets under a common name for easier reference.

Field | Type | Description
name | string | The name of the alias
targets | list | A list of targets that are part of the alias"},{"location":"schema/#types","title":"types","text":"

The types section defines specific combinations of targets and aliases to form distinct types.

Field | Type | Description
name | string | The name of the profile
targets | list | A list of targets (or aliases) that are part of the type
excludes | list | A list of targets (or aliases) that will automatically fail the type"},{"location":"schema/#example-schema-partial-sccmec-typing","title":"Example Schema: Partial SCCmec Typing","text":"

Here is an example of a partial schema for SCCmec typing:

%YAML 1.2\n---\n# metadata: general information about the schema\nmetadata:\n  id: \"sccmec_partial\"                                # unique identifier for the schema\n  name: \"SCCmec Typing\"                              # name of the schema\n  description: \"A partial schema for SCCmec typing\"  # description of the schema\n  version: \"0.0.1\"                                     # version of the schema\n  curators:                                          # A list of curators of the schema\n    - \"Robert Petit\"\n\n# engine: specifies the computational tools and additional parameters used for sequence\n#         analysis.\nengine:\n  type: blast   # The type of tool used to generate the data\n  tool: blastn  # The tool used to generate the data\n\n# targets: Lists the specific sequence targets such as genes, proteins, or markers that the\n#          schema will analyze. These should be included in the associated sequence query data\ntargets:\n  - \"ccrA1\"\n  - \"ccrA2\"\n  - \"ccrA3\"\n  - \"ccrB1\"\n  - \"ccrB2\"\n  - \"ccrB3\"\n  - \"IS431\"\n  - \"IS1272\"\n  - \"mecA\"\n  - \"mecI\"\n  - \"mecR1\"\n\n# aliases: groups multiple targets under a common name for easier reference\naliases:\n  - name: \"ccr Type 1\"           # name of the alias\n    targets: [\"ccrA1\", \"ccrB1\"]  # list of targets that are part of the alias\n  - name: \"ccr Type 2\"\n    targets: [\"ccrA2\", \"ccrB2\"]\n  - name: \"ccr Type 3\"\n    targets: [\"ccrA3\", \"ccrB3\"]\n  - name: \"mec Class A\"\n    targets: [\"IS431\", \"mecA\", \"mecR1\", \"mecI\"]\n  - name: \"mec Class B\"\n    targets: [\"IS431\", \"mecA\", \"mecR1\", \"IS1272\"]\n\n# types: define specific combinations of targets and aliases to form distinct types\ntypes:\n  - name: \"I\"          # name of the profile\n    targets:           # list of targets (can use aliases) that are part of the profile\n      - \"ccr Type 1\"\n      - \"mec Class B\"\n  - name: \"II\"\n    targets:\n      - \"ccr Type 2\"\n      - 
\"mec Class A\"\n  - name: \"III\"\n    targets:\n      - \"ccr Type 3\"\n      - \"mec Class A\"\n  - name: \"IV\"\n    targets:\n      - \"ccr Type 2\"\n      - \"mec Class B\"\n

From this schema, camlhmp can generate a typing tool for analyzing input assemblies. This is only a partial schema, as there are many more SCCmec types and subtypes, but it should be straightforward to extend it with additional targets and types.
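To make the relationship between targets, aliases, and types concrete, here is a minimal sketch that assumes the schema has already been loaded into a plain Python dict (only part of the example above is reproduced). It expands each alias into its member targets and lists the types fully satisfied by a hypothetical set of detected targets, treating excludes as automatic failures. The helper names resolve_types and matching_types are illustrative only and are not part of the camlhmp API.

```python
# Hypothetical helpers illustrating how targets, aliases, and types relate.
SCHEMA = {
    'targets': ['ccrA1', 'ccrA2', 'ccrB1', 'ccrB2', 'IS431', 'IS1272', 'mecA', 'mecI', 'mecR1'],
    'aliases': [
        {'name': 'ccr Type 1', 'targets': ['ccrA1', 'ccrB1']},
        {'name': 'ccr Type 2', 'targets': ['ccrA2', 'ccrB2']},
        {'name': 'mec Class A', 'targets': ['IS431', 'mecA', 'mecR1', 'mecI']},
        {'name': 'mec Class B', 'targets': ['IS431', 'mecA', 'mecR1', 'IS1272']},
    ],
    'types': [
        {'name': 'I', 'targets': ['ccr Type 1', 'mec Class B']},
        {'name': 'II', 'targets': ['ccr Type 2', 'mec Class A']},
        {'name': 'IV', 'targets': ['ccr Type 2', 'mec Class B']},
    ],
}


def resolve_types(schema):
    # Map each alias name to its member targets.
    aliases = {a['name']: a['targets'] for a in schema.get('aliases', [])}
    resolved = {}
    for profile in schema['types']:
        expanded = {'targets': [], 'excludes': []}
        for key in ('targets', 'excludes'):
            for name in profile.get(key, []):
                if name in aliases:
                    expanded[key].extend(aliases[name])
                elif name in schema['targets']:
                    expanded[key].append(name)
                else:
                    raise ValueError(f'Target {name} not found in schema')
        resolved[profile['name']] = expanded
    return resolved


def matching_types(resolved, detected):
    # A type matches when every resolved target was detected and no exclude was.
    return [
        name
        for name, spec in resolved.items()
        if all(t in detected for t in spec['targets'])
        and not any(e in detected for e in spec['excludes'])
    ]


detected = {'ccrA2', 'ccrB2', 'IS431', 'mecA', 'mecR1', 'IS1272'}
print(matching_types(resolve_types(SCHEMA), detected))  # ['IV']
```

Only type IV matches here: ccrA2 and ccrB2 satisfy ccr Type 2, and IS431, mecA, mecR1, and IS1272 satisfy mec Class B.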

"},{"location":"api/","title":"camlhmp API Reference","text":"

At its core, camlhmp is a library that provides a set of functions for typing organisms. It includes functions for running programs and parsing their outputs. When the available CLI commands do not meet your needs, you can use the API functions to build your own custom workflows.

Currently, the following functions are available in the camlhmp API:

Type | Module | Function | Description
Engine | camlhmp.engines.blast | run_blast | Run BLAST program
Engine | camlhmp.engines.blast | run_blast | Alias for run_blast with blastn specified
Engine | camlhmp.engines.blast | run_blast | Alias for run_blast with tblastn specified
Framework | camlhmp.framework | read_framework | Read the framework YAML file
Framework | camlhmp.framework | print_version | Print the version of the framework
Framework | camlhmp.framework | get_types | Get the types from the framework
Framework | camlhmp.framework | check_types | Check the types against the results
Framework | camlhmp.framework | check_regions | Check the region types against the results
Parser | camlhmp.parsers.blast | get_blast_allele_hits | Parse BLAST output for allele hits
Parser | camlhmp.parsers.blast | get_blast_region_hits | Parse BLAST output for region hits
Parser | camlhmp.parsers.blast | get_blast_target_hits | Parse BLAST output for target hits
Utils | camlhmp.utils | execute | Execute a command
Utils | camlhmp.utils | check_dependencies | Check if all dependencies are installed
Utils | camlhmp.utils | get_platform | Get the platform of the executing machine
Utils | camlhmp.utils | validate_file | Validate that a file exists and is not empty
Utils | camlhmp.utils | file_exists_error | Determine if a file exists and raise an error
Utils | camlhmp.utils | parse_seq | Parse a sequence file containing a single record
Utils | camlhmp.utils | parse_seqs | Parse a sequence file containing multiple records
Utils | camlhmp.utils | parse_table | Parse a delimited file
Utils | camlhmp.utils | parse_yaml | Parse a YAML file
Utils | camlhmp.utils | write_tsv | Write a list of dictionaries to a TSV file"},{"location":"api/framework/","title":"camlhmp.framework","text":"

Below are the functions available in the camlhmp.framework module.

"},{"location":"api/framework/#camlhmp.framework.read_framework","title":"camlhmp.framework.read_framework(yamlfile)","text":"

Read the framework YAML file.

Parameters:

Name | Type | Description | Default
yamlfile | str | input YAML file to be read | required

Returns:

Type | Description
dict | the parsed YAML file

Examples:

>>> from camlhmp.framework import read_framework\n>>> framework = read_framework(yaml_path)\n
Source code in camlhmp/framework.py
def read_framework(yamlfile: str) -> dict:\n    \"\"\"\n    Read the framework YAML file.\n\n    Args:\n        yamlfile (str): input YAML file to be read\n\n    Returns:\n        dict: the parsed YAML file\n\n    Examples:\n        >>> from camlhmp.framework import read_framework\n        >>> framework = read_framework(yaml_path)\n    \"\"\"\n    return parse_yaml(yamlfile)\n
"},{"location":"api/framework/#camlhmp.framework.print_version","title":"camlhmp.framework.print_version(framework)","text":"

Print the camlhmp version and the schema ID and version to STDERR, then exit.

Parameters:

Name | Type | Description | Default
framework | dict | the parsed YAML framework | required

Examples:

>>> from camlhmp.framework import print_version\n>>> print_version(framework)\n
Source code in camlhmp/framework.py
def print_version(framework: dict) -> None:\n    \"\"\"\n    Print the version of the framework, then exit\n\n    Args:\n        framework (dict): the parsed YAML framework\n\n    Examples:\n        >>> from camlhmp.framework import print_version\n        >>> print_version(framework)\n    \"\"\"\n    print(f\"camlhmp, version {camlhmp.__version__}\", file=sys.stderr)\n    print(f\"schema {framework['metadata']['id']}, version {framework['metadata']['version']}\", file=sys.stderr)\n    sys.exit(0)\n
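Because print_version ends with sys.exit(0), calling it from your own code stops the program. If you only want the version text, one option is to redirect STDERR and catch the SystemExit, sketched below. To keep the example runnable without camlhmp installed, print_version_standin is a hypothetical stand-in that writes the same two lines and takes the camlhmp version as an argument; with camlhmp installed you would call print_version(framework) in its place.

```python
import contextlib
import io
import sys


def print_version_standin(framework, tool_version):
    # Hypothetical stand-in mirroring the output of camlhmp.framework.print_version.
    schema_id = framework['metadata']['id']
    schema_version = framework['metadata']['version']
    print(f'camlhmp, version {tool_version}', file=sys.stderr)
    print(f'schema {schema_id}, version {schema_version}', file=sys.stderr)
    sys.exit(0)


def capture_version_text(framework, tool_version):
    # Redirect STDERR into a buffer and swallow the expected SystemExit(0).
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        try:
            print_version_standin(framework, tool_version)
        except SystemExit as exc:
            if exc.code != 0:
                raise
    return buffer.getvalue()


framework = {'metadata': {'id': 'sccmec_partial', 'version': '0.0.1'}}
print(capture_version_text(framework, '0.3.1'), end='')
```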
"},{"location":"api/framework/#camlhmp.framework.get_types","title":"camlhmp.framework.get_types(framework)","text":"

Get the types from the framework.

Example framework:

aliases:\n  - name: \"ccr Type 1\"\n    targets: [\"ccrA1\", \"ccrB1\"]\ntypes:\n  - name: \"I\"\n    targets:\n      - \"ccr Type 1\"\n      - \"mec Class B\"\n

Parameters:

Name | Type | Description | Default
framework | dict | the parsed YAML framework | required

Returns:

Type | Description
dict | the types, each mapped to a dict holding its resolved targets and excludes lists (aliases expanded)

Examples:

>>> from camlhmp.framework import get_types\n>>> types = get_types(framework)\n
Source code in camlhmp/framework.py
def get_types(framework: dict) -> dict:\n    \"\"\"\n    Get the types from the framework.\n\n    Example framework:\n    aliases:\n    - name: \"ccr Type 2\"\n      targets: [\"ccrA1\", \"ccrB1\"]\n    types:\n    - name: \"I\"\n      targets:\n        - \"ccr Type 1\"\n        - \"mec Class B\"\n\n    Args:\n        framework (dict): the parsed YAML framework\n\n    Returns:\n        dict: the types with associated targets\n\n    Examples:\n        >>> from camlhmp.framework import get_types\n        >>> types = get_types(framework)\n    \"\"\"\n    types = {}\n    aliases = {}\n\n    # If aliases are present, save their targets\n    if \"aliases\" in framework:\n        for alias in framework[\"aliases\"]:\n            aliases[alias[\"name\"]] = alias[\"targets\"]\n\n    # Save the types and their targets\n    for profile in framework[\"types\"]:\n        types[profile[\"name\"]] = {\n            \"targets\": [],\n            \"excludes\": [],\n        }\n        for target in profile[\"targets\"]:\n            if target in aliases:\n                types[profile[\"name\"]][\"targets\"] = [\n                    *types[profile[\"name\"]][\"targets\"],\n                    *aliases[target],\n                ]\n            elif target in framework[\"targets\"]:\n                types[profile[\"name\"]][\"targets\"].append(target)\n            else:\n                raise ValueError(f\"Target {target} not found in framework\")\n\n        # Capture any targets that should cause a profile to fail\n        if \"excludes\" in profile:\n            for exclude in profile[\"excludes\"]:\n                if exclude in aliases:\n                    types[profile[\"name\"]][\"excludes\"] = [\n                        *types[profile[\"name\"]][\"excludes\"],\n                        *aliases[exclude],\n                    ]\n                elif exclude in framework[\"targets\"]:\n                    types[profile[\"name\"]][\"excludes\"].append(exclude)\n                
else:\n                    raise ValueError(f\"Target {exclude} not found in framework\")\n\n    # Debugging information\n    logging.debug(\"camlhmp.framework.get_types\")\n    if \"aliases\" in framework:\n        logging.debug(f\"Aliases: {framework['aliases']}\")\n    logging.debug(f\"Targets: {framework['targets']}\")\n    logging.debug(f\"Types: {types}\")\n\n    return types\n
"},{"location":"api/framework/#camlhmp.framework.check_types","title":"camlhmp.framework.check_types(types, results)","text":"

Check the types against the results.

Parameters:

Name | Type | Description | Default
types | dict | the types with associated targets | required
results | dict | the BLAST results | required

Returns:

Type | Description
dict | the types and their outcome

Examples:

>>> from camlhmp.framework import check_types\n>>> type_hits = check_types(types, target_results)\n
Source code in camlhmp/framework.py
def check_types(types: dict, results: dict) -> dict:\n    \"\"\"\n    Check the types against the results.\n\n    Args:\n        types (dict): the types with associated targets\n        results (dict): the BLAST results\n\n    Returns:\n        dict: the types and their outcome\n\n    Examples:\n        >>> from camlhmp.framework import check_types\n        >>> type_hits = check_types(types, target_results)\n    \"\"\"\n    type_hits = {}\n    for type, vals in types.items():\n        targets = vals[\"targets\"]\n        excludes = vals[\"excludes\"]\n        type_hits[type] = {\n            \"status\": False,\n            \"targets\": [],\n            \"missing\": [],\n            \"comment\": \"\",\n        }\n        matched_all_targets = True\n        for target in targets:\n            if results[target]:\n                type_hits[type][\"targets\"].append(target)\n            else:\n                type_hits[type][\"missing\"].append(target)\n                matched_all_targets = False\n\n        # Check if any of the excludes are present\n        for exclude in excludes:\n            if results[exclude]:\n                type_hits[type][\n                    \"comment\"\n                ] = f\"Excluded target {exclude} found, failing type {type}\"\n                logging.debug(f\"Excluded target {exclude} found, failing type {type}\")\n                matched_all_targets = False\n        type_hits[type][\"status\"] = matched_all_targets\n\n    # Debugging information\n    logging.debug(\"camlhmp.framework.check_types\")\n    logging.debug(f\"Type Hits: {type_hits}\")\n\n    return type_hits\n
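The docstring does not spell out the shape of results. Judging from the code, results maps every target name to a truthy value when it was found (for example, a non-empty list of hits) and a falsy value otherwise, and it must contain an entry for every target and exclude referenced by types. The condensed sketch below follows the same decision rule on hypothetical gene names to show the input and output shapes; check_types_sketch is illustrative, not the camlhmp implementation.

```python
def check_types_sketch(types, results):
    # Same decision rule as camlhmp.framework.check_types, condensed.
    type_hits = {}
    for name, spec in types.items():
        found = [t for t in spec['targets'] if results[t]]
        missing = [t for t in spec['targets'] if not results[t]]
        excluded = [e for e in spec['excludes'] if results[e]]
        type_hits[name] = {
            'status': not missing and not excluded,
            'targets': found,
            'missing': missing,
            # check_types overwrites the comment per exclude, so the last one wins.
            'comment': (
                f'Excluded target {excluded[-1]} found, failing type {name}'
                if excluded
                else ''
            ),
        }
    return type_hits


types = {
    'A': {'targets': ['geneX', 'geneY'], 'excludes': []},
    'B': {'targets': ['geneX'], 'excludes': ['geneY']},
}
results = {'geneX': ['hit1'], 'geneY': ['hit2']}  # hypothetical target -> hits
hits = check_types_sketch(types, results)
print(hits['A']['status'], hits['B']['status'])  # True False
print(hits['B']['comment'])  # Excluded target geneY found, failing type B
```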
"},{"location":"api/framework/#camlhmp.framework.check_regions","title":"camlhmp.framework.check_regions(types, results, min_coverage)","text":"

Check the region types against the results.

Parameters:

Name | Type | Description | Default
types | dict | the types with associated targets | required
results | dict | the BLAST results | required
min_coverage | int | the minimum coverage required for a region | required

Returns:

Type | Description
dict | the types and their outcome

Examples:

>>> from camlhmp.framework import check_regions\n>>> type_hits = check_regions(types, target_results, min_coverage)\n
Source code in camlhmp/framework.py
def check_regions(types: dict, results: dict, min_coverage: int) -> dict:\n    \"\"\"\n    Check the region types against the results.\n\n    Args:\n        types (dict): the types with associated targets\n        results (dict): the BLAST results\n        min_coverage (int): the minimum coverage required for a region\n\n    Returns:\n        dict: the types and their outcome\n\n    Examples:\n        >>> from camlhmp.framework import check_regions\n        >>> type_hits = check_regions(types, target_results, min_coverage)\n    \"\"\"\n    type_hits = {}\n    for type, vals in types.items():\n        targets = vals[\"targets\"]\n        excludes = vals[\"excludes\"]\n        type_hits[type] = {\n            \"status\": False,\n            \"targets\": [],\n            \"missing\": [],\n            \"coverage\": [],\n            \"hits\": [],\n            \"comment\": [],\n        }\n        matched_all_targets = True\n        for target in targets:\n            if target in results:\n                if results[target][\"coverage\"] >= min_coverage:\n                    type_hits[type][\"targets\"].append(target)\n                else:\n                    type_hits[type][\"missing\"].append(target)\n                    matched_all_targets = False\n\n                type_hits[type][\"coverage\"].append(f\"{results[target]['coverage']:.2f}\")\n                type_hits[type][\"hits\"].append(str(len(results[target][\"hits\"])))\n                if len(targets) > 1:\n                    if results[target][\"comment\"]:\n                        formatted_comments = []\n                        for comment in results[target][\"comment\"]:\n                            formatted_comments.append(f\"{target}:{comment}\")\n                        if formatted_comments:\n                            type_hits[type][\"comment\"].append(\n                                \";\".join(formatted_comments)\n                            )\n                else:\n                    if 
results[target][\"comment\"]:\n                        type_hits[type][\"comment\"].append(\n                            \";\".join(results[target][\"comment\"])\n                        )\n            else:\n                matched_all_targets = False\n\n        # Check if any of the excludes are present\n        for exclude in excludes:\n            if results[exclude]:\n                if results[exclude][\"coverage\"] >= min_coverage:\n                    type_hits[type][\"comment\"].append(\n                        f\"Excluded target {exclude} found, failing type {type}\"\n                    )\n                    logging.debug(\n                        f\"Excluded target {exclude} found, failing type {type}\"\n                    )\n                    matched_all_targets = False\n\n        type_hits[type][\"status\"] = matched_all_targets\n\n    # Debugging information\n    logging.debug(\"camlhmp.framework.check_regions\")\n    logging.debug(f\"Type Hits: {type_hits}\")\n\n    return type_hits\n
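check_regions needs richer per-target results than check_types. From the code above, each entry in results must provide coverage (a number compared against min_coverage), hits (a sequence; only its length is reported), and comment (a list of strings). The minimal sketch below isolates the coverage rule on hypothetical region names and values; comment formatting and excludes are left out, and region_status is an illustrative name. It preserves one detail of the original: a target with no entry in results fails the type but is not added to the missing list.

```python
def region_status(targets, results, min_coverage):
    # Mirrors the coverage rule in camlhmp.framework.check_regions.
    present, missing, status = [], [], True
    for target in targets:
        if target not in results:
            status = False  # absent targets fail the type but are not listed as missing
            continue
        if results[target]['coverage'] >= min_coverage:
            present.append(target)
        else:
            missing.append(target)
            status = False
    return status, present, missing


results = {
    'regionA': {'coverage': 98.5, 'hits': ['h1'], 'comment': []},
    'regionB': {'coverage': 72.0, 'hits': ['h1', 'h2'], 'comment': []},
}
print(region_status(['regionA', 'regionB'], results, 90))  # (False, ['regionA'], ['regionB'])
print(region_status(['regionA'], results, 90))  # (True, ['regionA'], [])
```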
"},{"location":"api/utils/","title":"camlhmp.utils","text":"

Below are the functions available in the camlhmp.utils module.

"},{"location":"api/utils/#camlhmp.utils.execute","title":"camlhmp.utils.execute(cmd, directory=Path.cwd(), capture=False, stdout_file=None, stderr_file=None, allow_fail=False)","text":"

A simple wrapper around the executor package's ExternalCommand.

Parameters:

Name | Type | Description | Default
cmd | str | The command to be executed | required
directory | Path | The directory to execute the command in | Path.cwd()
capture | bool | Return the command's stdout and stderr instead of True | False
stdout_file | Path | The file to write stdout to | None
stderr_file | Path | The file to write stderr to | None
allow_fail | bool | As implemented, True makes a failed command log the error and exit with its return code, while False makes execute return None | False

Returns:

Type | Description
Union[bool, list] | True on success, or [stdout, stderr] when capture is True; None if the command fails and allow_fail is False

Raises:

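Type | Description
ExternalCommandFailed | Raised internally when the command fails; it is caught, and if allow_fail is True the error is logged and the process exits with the command's return code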

Examples:

>>> from camlhmp.utils import execute\n>>> stdout, stderr = execute(\n...     f\"{cat_type} {subject} | {engine} -query {query} -subject - -outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}\",\n...     capture=True,\n... )\n
Source code in camlhmp/utils.py
def execute(\n    cmd,\n    directory=Path.cwd(),\n    capture=False,\n    stdout_file=None,\n    stderr_file=None,\n    allow_fail=False,\n):\n    \"\"\"\n    A simple wrapper around executor.\n\n    Args:\n        cmd (str): The command to be executed\n        directory (Path, optional): The directory to execute the command in. Defaults to Path.cwd().\n        capture (bool, optional): Capture the output of the command. Defaults to False.\n        stdout_file (Path, optional): The file to write stdout to. Defaults to None.\n        stderr_file (Path, optional): The file to write stderr to. Defaults to None.\n        allow_fail (bool, optional): Allow the command to fail. Defaults to False.\n\n    Returns:\n        Union[bool, list]: True if successful, otherwise a list of stdout and stderr\n\n    Raises:\n        ExternalCommandFailed: If the command fails and allow_fail is True\n\n    Examples:\n        >>> from camlhmp.utils import execute\n        >>> stdout, stderr = execute(\n                f\"{cat_type} {subject} | {engine} -query {query} -subject - -outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}\",\n                capture=True,\n            )\n    \"\"\"\n    try:\n        command = ExternalCommand(\n            cmd,\n            directory=directory,\n            capture=True,\n            capture_stderr=True,\n            stdout_file=stdout_file,\n            stderr_file=stderr_file,\n        )\n\n        command.start()\n        logging.debug(command.decoded_stdout)\n        logging.debug(command.decoded_stderr)\n\n        if capture:\n            return [command.decoded_stdout, command.decoded_stderr]\n        return True\n    except ExternalCommandFailed as e:\n        if allow_fail:\n            logging.error(e)\n            sys.exit(e.returncode)\n        else:\n            return None\n
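execute builds on the third-party executor package and takes a shell command string. If you want the same return convention in your own workflow without that dependency, here is a minimal sketch using the standard library's subprocess.run instead (a swapped-in technique, not what camlhmp does). It returns [stdout, stderr] when capture is true, True on success otherwise, and None when the command exits non-zero; the allow_fail exit behavior and the stdout/stderr file options are left out. The function name run_command is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_command(args, directory=None, capture=False):
    # Run args (a list, no shell) in directory, mimicking execute's return values.
    result = subprocess.run(
        args,
        cwd=directory or Path.cwd(),  # resolved at call time, not at import time
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    if capture:
        return [result.stdout, result.stderr]
    return True


stdout, stderr = run_command([sys.executable, '-c', 'print(42)'], capture=True)
print(stdout.strip())  # 42
```

Passing an argument list rather than a shell string avoids quoting problems, but pipelines such as the cat-into-BLAST example above would then need to be run as separate steps. Note also that execute's default of Path.cwd() is evaluated once, when the module is imported, whereas this sketch resolves the working directory on each call.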
"},{"location":"api/utils/#camlhmp.utils.check_dependencies","title":"camlhmp.utils.check_dependencies()","text":"

Check if all dependencies are installed.

Examples:

>>> from camlhmp.utils import check_dependencies\n>>> check_dependencies()\n
Source code in camlhmp/utils.py
def check_dependencies():\n    \"\"\"\n    Check if all dependencies are installed.\n\n    Examples:\n        >>> from camlhmp.utils import check_dependencies\n        >>> check_dependencies()\n    \"\"\"\n    exit_code = 0\n    print(\"Checking dependencies...\", file=sys.stderr)\n    for program in [\"blastn\"]:\n        which_path = which(program)\n        if which_path:\n            print(f\"Found {program} at {which_path}\", file=sys.stderr)\n        else:\n            print(f\"{program} not found\", file=sys.stderr)\n            exit_code = 1\n\n    if exit_code == 1:\n        print(\"Missing dependencies, please check.\", file=sys.stderr)\n    else:\n        print(\"You are all set!\", file=sys.stderr)\n    sys.exit(exit_code)\n
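check_dependencies reports its findings on STDERR and always ends with sys.exit (status 0 when everything is found, 1 otherwise), and it currently looks only for blastn. To run the same check inside a larger program without exiting, a minimal sketch with the standard library's shutil.which is shown below; missing_dependencies is a hypothetical name, and checking tblastn as well is only an illustration.

```python
import shutil


def missing_dependencies(programs=('blastn',)):
    # Return the programs that cannot be found on PATH (empty list means all found).
    return [program for program in programs if shutil.which(program) is None]


missing = missing_dependencies(('blastn', 'tblastn'))
if missing:
    print('Missing dependencies: ' + ', '.join(missing))
else:
    print('You are all set!')
```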
"},{"location":"api/utils/#camlhmp.utils.get_platform","title":"camlhmp.utils.get_platform()","text":"

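Get the platform of the executing machine: \"mac\" on macOS and \"linux\" otherwise. Windows is not supported; on Windows an error is logged and the process exits.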

Returns:

Type | Description
str | The platform of the executing machine

Examples:

>>> from camlhmp.utils import get_platform\n>>> platform = get_platform()\n
Source code in camlhmp/utils.py
def get_platform() -> str:\n    \"\"\"\n    Get the platform of the executing machine\n\n    Returns:\n        str: The platform of the executing machine\n\n    Examples:\n        >>> from camlhmp.utils import get_platform\n        >>> platform = get_platform()\n    \"\"\"\n    if platform == \"darwin\":\n        return \"mac\"\n    elif platform == \"win32\":\n        # Windows is not supported\n        logging.error(\"Windows is not supported.\")\n        sys.exit(1)\n    return \"linux\"\n
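get_platform compares a platform string (presumably sys.platform; the import is not shown above, so this is an assumption) against darwin and win32, exits on Windows, and reports every other platform as linux. A non-exiting sketch that raises instead, so a caller can decide what to do on unsupported systems; platform_label is a hypothetical name, and the platform string is a parameter so the mapping is easy to test.

```python
import sys


def platform_label(platform=sys.platform):
    # Same mapping as camlhmp.utils.get_platform, but raising instead of exiting.
    if platform == 'darwin':
        return 'mac'
    if platform == 'win32':
        raise RuntimeError('Windows is not supported.')
    return 'linux'


try:
    print(platform_label())
except RuntimeError as err:
    print(err)
```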
"},{"location":"api/utils/#camlhmp.utils.validate_file","title":"camlhmp.utils.validate_file(filename)","text":"

Validate that a file exists and is not empty; if it passes, return its absolute path.

Parameters:

Name | Type | Description | Default
filename | str | a file to validate exists | required

Returns:

Type | Description
str | absolute path to the file (the code shown returns a pathlib.Path object)

Raises:

Type | Description
FileNotFoundError | if the file does not exist
ValueError | if the file is empty

Examples:

>>> from camlhmp.utils import validate_file\n>>> file = validate_file(\"data.fasta\")\n
Source code in camlhmp/utils.py
def validate_file(filename: str) -> str:\n    \"\"\"\n    Validate a file exists and not empty, if passing return the absolute path\n\n    Args:\n        filename (str): a file to validate exists\n\n    Returns:\n        str: absolute path to file\n\n    Raises:\n        FileNotFoundError: if the file does not exist\n        ValueError: if the file is empty\n\n    Examples:\n        >>> from camlhmp.utils import validate_file\n        >>> file = validate_file(\"data.fasta\")\n    \"\"\"\n    f = Path(filename)\n    if not f.exists():\n        raise FileNotFoundError(f\"File ('{filename}') not found, cannot continue\")\n    elif f.stat().st_size == 0:\n        raise ValueError(f\"File ('{filename}') is empty, cannot continue\")\n    return f.absolute()\n
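In a workflow you typically call validate_file on every input and turn its two exceptions into a user-facing error. The sketch below exercises both failure cases with temporary files; validate_file here is a standalone copy of the logic shown above (error messages simplified) so the example runs without camlhmp installed, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def validate_file(filename):
    # Standalone copy of the camlhmp.utils.validate_file logic, for illustration only.
    f = Path(filename)
    if not f.exists():
        raise FileNotFoundError(f'File ({filename}) not found, cannot continue')
    elif f.stat().st_size == 0:
        raise ValueError(f'File ({filename}) is empty, cannot continue')
    return f.absolute()


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / 'empty.fasta'
    empty.touch()
    for candidate in (Path(tmp) / 'missing.fasta', empty):
        try:
            validate_file(candidate)
        except (FileNotFoundError, ValueError) as err:
            # FileNotFoundError for the missing file, ValueError for the empty one.
            print(type(err).__name__)
```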
"},{"location":"api/utils/#camlhmp.utils.file_exists_error","title":"camlhmp.utils.file_exists_error(filename, force=False)","text":"

Determine if a file exists and raise an error if it does.

Parameters:

Name | Type | Description | Default
filename | str | the file to check | required
force | bool | force overwrite | False

Raises:

Type | Description
FileExistsError | if the file exists and force is False

Source code in camlhmp/utils.py
def file_exists_error(filename: str, force: bool = False):\n    \"\"\"\n    Determine if a file exists and raise an error if it does.\n\n    Args:\n        filename (str): the file to check\n        force (bool, optional): force overwrite. Defaults to False.\n\n    Raises:\n        FileExistsError: if the file exists and force is False\n    \"\"\"\n    if Path(filename).exists() and not force:\n        raise FileExistsError(\n            f\"Results already exists! Use --force to overwrite: {filename}\"\n        )\n
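A typical pattern is to call file_exists_error before writing any output, so an existing result is never silently overwritten unless force is set (the --force flag in the error message). The sketch below uses a standalone copy of the function with the same logic as the source above and a temporary output file; the output name is hypothetical.

```python
import tempfile
from pathlib import Path


def file_exists_error(filename, force=False):
    # Standalone copy of camlhmp.utils.file_exists_error, for illustration only.
    if Path(filename).exists() and not force:
        raise FileExistsError(f'Results already exists! Use --force to overwrite: {filename}')


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / 'results.tsv'
    file_exists_error(output)  # no error: nothing has been written yet
    output.write_text('previous run')
    try:
        file_exists_error(output)
    except FileExistsError:
        print('refusing to overwrite')
    file_exists_error(output, force=True)  # no error: overwriting was requested
```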
"},{"location":"api/utils/#camlhmp.utils.parse_seq","title":"camlhmp.utils.parse_seq(seqfile, format)","text":"

Parse a sequence file containing a single record.

Parameters:

Name | Type | Description | Default
seqfile | str | input file to be read | required
format | str | format of the input file | required

Returns:

Type | Description
SeqIO | the parsed record (SeqIO.read returns a single Bio.SeqRecord.SeqRecord)

Examples:

>>> from camlhmp.utils import parse_seq\n>>> seq = parse_seq(\"data.fasta\", \"fasta\")\n
Source code in camlhmp/utils.py
def parse_seq(seqfile: str, format: str) -> SeqIO:\n    \"\"\"\n    Parse a sequence file containing a single record.\n\n    Args:\n        seqfile (str): input file to be read\n        format (str): format of the input file\n\n    Returns:\n        SeqIO: the parsed file as a SeqIO object\n\n    Examples:\n        >>> from camlhmp.utils import parse_seq\n        >>> seq = parse_seq(\"data.fasta\", \"fasta\")\n    \"\"\"\n    with open(seqfile, \"rt\") as fh:\n        return SeqIO.read(fh, format)\n
"},{"location":"api/utils/#camlhmp.utils.parse_seqs","title":"camlhmp.utils.parse_seqs(seqfile, format)","text":"

Parse a sequence file containing multiple records.

Parameters:

Name | Type | Description | Default
seqfile | str | input file to be read | required
format | str | format of the input file | required

Returns:

Type | Description
SeqIO | the parsed records, returned as a list of Bio.SeqRecord.SeqRecord objects

Examples:

>>> from camlhmp.utils import parse_seqs\n>>> seqs = parse_seqs(\"data.fasta\", \"fasta\")\n
Source code in camlhmp/utils.py
def parse_seqs(seqfile: str, format: str) -> SeqIO:\n    \"\"\"\n    Parse a sequence file containing a multiple records.\n\n    Args:\n        seqfile (str): input file to be read\n        format (str): format of the input file\n\n    Returns:\n        SeqIO: the parsed file as a SeqIO object\n\n    Examples:\n        >>> from camlhmp.utils import parse_seqs\n        >>> seqs = parse_seqs(\"data.fasta\", \"fasta\")\n    \"\"\"\n    with open(seqfile, \"rt\") as fh:\n        return list(SeqIO.parse(fh, format))\n
"},{"location":"api/utils/#camlhmp.utils.parse_table","title":"camlhmp.utils.parse_table(csvfile, delimiter='\\t', has_header=True)","text":"

Parse a delimited file.

Parameters:

Name | Type | Description | Default
csvfile | str | input delimited file to be parsed | required
delimiter | str | delimiter used to separate column values | '\\t'
has_header | bool | treat the first line as a header | True

Returns:

Type | Description
Union[list, dict] | a list of rows: each row is a dict keyed by the header when has_header is True, otherwise a list of column values

Examples:

>>> from camlhmp.utils import parse_table\n>>> data = parse_table(\"data.tsv\")\n
Source code in camlhmp/utils.py
def parse_table(\n    csvfile: str, delimiter: str = \"\\t\", has_header: bool = True\n) -> Union[list, dict]:\n    \"\"\"\n    Parse a delimited file.\n\n    Args:\n        csvfile (str): input delimited file to be parsed\n        delimiter (str, optional): delimter used to separate column values. Defaults to '\\t'.\n        has_header (bool, optional): the first line should be treated as a header. Defaults to True.\n\n    Returns:\n        Union[list, dict]: A dict is returned if a header is present, otherwise a list is returned\n\n    Examples:\n        >>> from camlhmp.utils import parse_table\n        >>> data = parse_table(\"data.tsv\")\n    \"\"\"\n    data = []\n    with open(csvfile, \"rt\") as fh:\n        for row in (\n            csv.DictReader(fh, delimiter=delimiter)\n            if has_header\n            else csv.reader(fh, delimiter=delimiter)\n        ):\n            data.append(row)\n    return data\n
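parse_table always returns a list; has_header only changes what each row looks like. The sketch below reproduces that branch on a small in-memory table: csv readers accept any iterable of lines, so a list of strings stands in for the opened file, and a comma delimiter (rather than the default tab) is used only to keep the example readable. parse_rows and the sample data are hypothetical.

```python
import csv

# A hypothetical comma-delimited table; any iterable of lines works with csv readers.
LINES = ['sample,type', 'S1,IV', 'S2,II']


def parse_rows(lines, delimiter=',', has_header=True):
    # Same branch as parse_table: DictReader with a header, plain reader without.
    reader = (
        csv.DictReader(lines, delimiter=delimiter)
        if has_header
        else csv.reader(lines, delimiter=delimiter)
    )
    return [row for row in reader]


print(parse_rows(LINES))  # [{'sample': 'S1', 'type': 'IV'}, {'sample': 'S2', 'type': 'II'}]
print(parse_rows(LINES, has_header=False))  # [['sample', 'type'], ['S1', 'IV'], ['S2', 'II']]
```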
"},{"location":"api/utils/#camlhmp.utils.parse_yaml","title":"camlhmp.utils.parse_yaml(yamlfile)","text":"

Parse a YAML file.

Parameters:

Name | Type | Description | Default
yamlfile | str | input YAML file to be read | required

Returns:

Type | Description
Union[list, dict] | the values parsed from the YAML file

Examples:

>>> from camlhmp.utils import parse_yaml\n>>> data = parse_yaml(\"data.yaml\")\n
Source code in camlhmp/utils.py
def parse_yaml(yamlfile: str) -> Union[list, dict]:\n    \"\"\"\n    Parse a YAML file.\n\n    Args:\n        yamlfile (str): input YAML file to be read\n\n    Returns:\n        Union[list, dict]: the values parsed from the YAML file\n\n    Examples:\n        >>> from camlhmp.utils import parse_yaml\n        >>> data = parse_yaml(\"data.yaml\")\n    \"\"\"\n    with open(yamlfile, \"rt\") as fh:\n        return yaml.safe_load(fh)\n
"},{"location":"api/utils/#camlhmp.utils.write_tsv","title":"camlhmp.utils.write_tsv(data, output)","text":"

Write a list of dicts to a TSV file. If the first value of the first dict is NO_HITS, only the column headers are written.

Parameters:

Name | Type | Description | Default
data | list | a list of dicts to be written | required
output | str | The output file | required

Examples:

>>> from camlhmp.utils import write_tsv\n>>> write_tsv(data, \"results.tsv\")\n
Source code in camlhmp/utils.py
def write_tsv(data: list, output: str):\n    \"\"\"\n    Write the dictionary to a TSV file.\n\n    Args:\n        data (list): a list of dicts to be written\n        output (str): The output file\n\n    Examples:\n        >>> from camlhmp.utils import write_tsv\n        >>> write_tsv(data, \"results.tsv\")\n    \"\"\"\n    logging.debug(f\"Writing TSV results to {output}\")\n    with open(output, \"w\") as csvfile:\n        writer = csv.DictWriter(csvfile, delimiter=\"\\t\", fieldnames=data[0].keys())\n        writer.writeheader()\n        if next(iter(data[0].values())) != \"NO_HITS\":\n            # Data is not empty\n            writer.writerows(data)\n        else:\n            # Data is empty\n            logging.debug(\"NO_HITS found, only writing the column headers\")\n
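A minimal sketch of the same behaviour, written against an open file-like object instead of a path so it can be tried with io.StringIO; write_tsv_sketch is a hypothetical name. The first dict supplies the column order, and a placeholder row whose first value is NO_HITS (as produced by run_blast when nothing matches) results in a header-only file.

```python
import csv
import io
from typing import IO


def write_tsv_sketch(data: list, fh: IO[str]) -> None:
    # Column order comes from the keys of the first dict
    writer = csv.DictWriter(fh, fieldnames=list(data[0].keys()), dialect='excel-tab')
    writer.writeheader()
    # A placeholder row whose first value is NO_HITS means there were no results
    if next(iter(data[0].values())) != 'NO_HITS':
        writer.writerows(data)


buf = io.StringIO()
write_tsv_sketch([{'qseqid': 'NO_HITS', 'pident': 'NO_HITS'}], buf)
print(buf.getvalue().splitlines())  # only the header line is written
```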
"},{"location":"api/engines/blast/","title":"camlhmp.engines.blast","text":"

Below are the functions available in the camlhmp.engines.blast module.

"},{"location":"api/engines/blast/#camlhmp.engines.blast.run_blast","title":"camlhmp.engines.blast.run_blast(engine, subject, query, min_pident, min_coverage)","text":"

Query sequences against an input subject using the specified BLAST+ algorithm.

Parameters:

Name | Type | Description | Default
engine | str | The BLAST engine to use | required
subject | str | The subject database (input) | required
query | str | The query file (targets) | required
min_pident | float | The minimum percent identity to count a hit | required
min_coverage | int | The minimum percent coverage to count a hit | required

Returns:

Name | Type | Description
list | list | A three-item list: the query IDs with hits, the BLAST results parsed into a list of dicts, and BLAST's stderr

Examples:

>>> from camlhmp.engines.blast import run_blast\n>>> hits, blast_stdout, blast_stderr = run_blast(\n        framework[\"engine\"][\"tool\"], input_path, targets_path, min_pident, min_coverage\n    )\n
Source code in camlhmp/engines/blast.py
def run_blast(engine: str, subject: str, query: str, min_pident: float, min_coverage: int) -> list:\n    \"\"\"\n    Query sequences against a input subject using a specified BLAST+ algorithm.\n\n    Args:\n        engine (str): The BLAST engine to use\n        subject (str): The subject database (input)\n        query (str): The query file (targets)\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        list: The parsed BLAST results, raw blast results, and stderr\n\n    Examples:\n        >>> from camlhmp.engines.blast import run_blast\n        >>> hits, blast_stdout, blast_stderr = run_blast(\n                framework[\"engine\"][\"tool\"], input_path, targets_path, min_pident, min_coverage\n            )\n    \"\"\"\n    outfmt = \" \".join(BLASTN_COLS)\n    cat_type = \"zcat\" if str(subject).endswith(\".gz\") else \"cat\"\n    qcov_hsp_perc = f\"-qcov_hsp_perc {min_coverage}\" if min_coverage else \"\"\n    perc_identity = f\"-perc_identity {min_pident}\" if min_pident and engine != \"tblastn\" else \"\"\n    stdout, stderr = execute(\n        f\"{cat_type} {subject} | {engine} -query {query} -subject - -outfmt '6 {outfmt}' {qcov_hsp_perc} {perc_identity}\",\n        capture=True,\n    )\n\n    # Convert BLAST results to a list of dicts\n    results = []\n    target_hits = []\n    for line in stdout.split(\"\\n\"):\n        if line == \"\":\n            continue\n        cols = line.split(\"\\t\")\n        results.append(dict(zip(BLASTN_COLS, cols)))\n        target_hits.append(cols[0])\n\n    if not results:\n        # Create an empty dict if no results are found\n        results.append(dict(zip(BLASTN_COLS, [\"NO_HITS\"] * len(BLASTN_COLS))))\n\n    return [target_hits, results, stderr]\n
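The sketch below separates the two halves of run_blast so each can be checked without BLAST+ installed: building the shell pipeline and parsing the tabular output. It assumes BLASTN_COLS holds the 16 columns shown in the {PREFIX}.blast.tsv example in the camlhmp-blast-alleles documentation; the real constant is not shown on this page. The helper names are hypothetical, and unlike the source, file paths are passed through shlex.quote. In camlhmp itself the command string is run with camlhmp.utils.execute(..., capture=True).

```python
import csv
import shlex

# Assumption: matches the header of the {PREFIX}.blast.tsv example,
# not necessarily the real camlhmp BLASTN_COLS constant.
BLASTN_COLS = [
    'qseqid', 'sseqid', 'pident', 'qcovs', 'qlen', 'slen', 'length', 'nident',
    'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore',
]


def build_blast_command(engine: str, subject: str, query: str,
                        min_pident: float, min_coverage: int) -> str:
    # Gzipped subjects are streamed with zcat, plain files with cat
    cat_type = 'zcat' if str(subject).endswith('.gz') else 'cat'
    parts = [cat_type, shlex.quote(str(subject)), '|', engine,
             '-query', shlex.quote(str(query)), '-subject', '-',
             '-outfmt', shlex.quote('6 ' + ' '.join(BLASTN_COLS))]
    if min_coverage:
        parts += ['-qcov_hsp_perc', str(min_coverage)]
    # Mirrors the source: -perc_identity is not passed for tblastn
    if min_pident and engine != 'tblastn':
        parts += ['-perc_identity', str(min_pident)]
    return ' '.join(parts)


def parse_outfmt6(stdout: str) -> tuple:
    # Returns (query IDs with hits, one dict per hit), like run_blast
    reader = csv.reader(stdout.splitlines(), dialect='excel-tab',
                        quoting=csv.QUOTE_NONE)
    results = [dict(zip(BLASTN_COLS, row)) for row in reader if row]
    target_hits = [r['qseqid'] for r in results]
    if not results:
        # Placeholder row used downstream to signal that nothing matched
        results = [dict.fromkeys(BLASTN_COLS, 'NO_HITS')]
    return target_hits, results
```

Keeping command construction separate from execution makes it easy to log or unit-test the exact BLAST+ invocation.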
"},{"location":"api/engines/blast/#camlhmp.engines.blast.run_blastn","title":"camlhmp.engines.blast.run_blastn(subject, query, min_pident, min_coverage)","text":"

An alias for run_blast which uses blastn.

Parameters:

Name | Type | Description | Default
subject | str | The subject database (input) | required
query | str | The query file (targets) | required
min_pident | float | The minimum percent identity to count a hit | required
min_coverage | int | The minimum percent coverage to count a hit | required

Returns:

Name | Type | Description
list | list | A three-item list: the query IDs with hits, the BLAST results parsed into a list of dicts, and BLAST's stderr

Examples:

>>> from camlhmp.engines.blast import run_blastn\n>>> hits, blast_stdout, blast_stderr = run_blastn(\n        input_path, targets_path, min_pident, min_coverage\n    )\n
Source code in camlhmp/engines/blast.py
def run_blastn(subject: str, query: str, min_pident: float, min_coverage: int) -> list:\n    \"\"\"\n    An alias for `run_blast` which uses `blastn`\n\n    Args:\n        subject (str): The subject database (input)\n        query (str): The query file (targets)\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        list: The parsed BLAST results, raw blast results, and stderr\n\n    Examples:\n        >>> from camlhmp.engines.blast import run_blastn\n        >>> hits, blast_stdout, blast_stderr = run_blastn(\n                input_path, targets_path, min_pident, min_coverage\n            )\n    \"\"\"\n    return run_blast(\"blastn\", subject, query, min_pident, min_coverage)\n
"},{"location":"api/engines/blast/#camlhmp.engines.blast.run_tblastn","title":"camlhmp.engines.blast.run_tblastn(subject, query, min_pident, min_coverage)","text":"

An alias for run_blast which uses tblastn.

Parameters:

Name | Type | Description | Default
subject | str | The subject database (input) | required
query | str | The query file (targets) | required
min_pident | float | The minimum percent identity to count a hit | required
min_coverage | int | The minimum percent coverage to count a hit | required

Returns:

Name | Type | Description
list | list | A three-item list: the query IDs with hits, the BLAST results parsed into a list of dicts, and BLAST's stderr

Examples:

>>> from camlhmp.engines.blast import run_tblastn\n>>> hits, blast_stdout, blast_stderr = run_tblastn(\n        input_path, targets_path, min_pident, min_coverage\n    )\n
Source code in camlhmp/engines/blast.py
def run_tblastn(subject: str, query: str, min_pident: float, min_coverage: int) -> list:\n    \"\"\"\n    An alias for `run_blast` which uses `tblastn`.\n\n    Args:\n        subject (str): The subject database (input)\n        query (str): The query file (targets)\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        list: The parsed BLAST results, raw blast results, and stderr\n\n    Examples:\n        >>> from camlhmp.engines.blast import run_tblastn\n        >>> hits, blast_stdout, blast_stderr = run_tblastn(\n                input_path, targets_path, min_pident, min_coverage\n            )\n    \"\"\"\n    return run_blast(\"tblastn\", subject, query, min_pident, min_coverage)\n
"},{"location":"api/parsers/blast/","title":"camlhmp.parsers.blast","text":"

Below are the functions available in the camlhmp.parsers.blast module.

"},{"location":"api/parsers/blast/#camlhmp.parsers.blast.get_blast_allele_hits","title":"camlhmp.parsers.blast.get_blast_allele_hits(targets, results, min_pident, min_coverage)","text":"

Find the allele hits in the BLAST results.

Parameters:

Name | Type | Description | Default
targets | dict | The list of target sequences {id: len(seq)} | required
results | list of dict | The BLAST results | required
min_pident | float | The minimum percent identity to count a hit | required
min_coverage | int | The minimum percent coverage to count a hit | required

Returns:

Name | Type | Description
dict | dict | The best allele hit for each target, with its id, qcovs, pident, bitscore, and comment

Examples:

>>> from camlhmp.parsers.blast import get_blast_allele_hits\n>>> target_results = get_blast_allele_hits(framework[\"targets\"], blast_stdout, min_pident, min_coverage)\n
Source code in camlhmp/parsers/blast.py
def get_blast_allele_hits(\n    targets: dict, results: dict, min_pident: float, min_coverage: int\n) -> dict:\n    \"\"\"\n    Find the allele hits in the BLAST results.\n\n    Args:\n        targets (dict): The list of target sequences {id: len(seq)}\n        results (list of dict): The BLAST results\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        dict: The allele hits\n\n    Examples:\n        >>> from camlhmp.parsers.blast import get_blast_allele_hits\n        >>> target_results = get_blast_allele_hits(framework[\"targets\"], blast_stdout, min_pident, min_coverage)\n    \"\"\"\n    # Aggregate the hits for each target\n    target_results = {}\n\n    for result in results:\n        # Only process real hits\n        if result[\"qseqid\"] != \"NO_HITS\":\n            target, allele = result[\"qseqid\"].rsplit(\"_\", 1)\n            if target not in target_results:\n                target_results[target] = {\n                    \"known\": [],\n                    \"novel\": [],\n                }\n\n            # only process hits that meet minimum criteria\n            if float(result[\"pident\"]) >= min_pident and int(result[\"qcovs\"]) >= min_coverage:\n                # hits that meet requirements\n\n                # Default to \"NEW\" allele, if perfect match use the allele ID\n                final_allele = \"NEW\"\n                final_type = \"novel\"\n                if float(result[\"pident\"]) == 100 and int(result[\"qcovs\"]) == 100:\n                    final_allele = allele\n                    final_type = \"known\"\n\n                target_results[target][final_type].append({\n                        \"id\": final_allele,\n                        \"qcovs\": result[\"qcovs\"],\n                        \"pident\": float(result[\"pident\"]),\n                        \"bitscore\": result[\"bitscore\"],\n                })\n\n    
final_allele_hits = {}\n    for target in targets:\n        final_allele_hits[target] = {\n            \"id\": \"-\",\n            \"qcovs\": 0,\n            \"pident\": 0,\n            \"bitscore\": 0,\n            \"comment\": \"No hits met thresholds\",\n        }\n\n    for target in target_results:\n        if len(target_results[target][\"known\"]):\n            # exact matches to known alleles were found\n            if len(target_results[target][\"known\"]) == 1:\n                final_allele_hits[target] = target_results[target][\"known\"][0]\n                final_allele_hits[target][\"comment\"] = \"\"\n            else:\n                # multiple hits\n                final_alleles = []\n                for hit in target_results[target][\"known\"]:\n                    final_alleles.append(hit[\"id\"])\n\n                final_allele_hits[target] = target_results[target][\"known\"][0]\n                final_allele_hits[target][\"id\"] = \",\".join(final_alleles)\n                final_allele_hits[target][\"comment\"] = \"Exact matches to multiple alleles\"\n        elif len(target_results[target][\"novel\"]):\n            # no exact matches to known alleles were found, but thresholds were met\n\n            # report the top scores\n            if len(target_results[target][\"novel\"]) == 1:\n                final_allele_hits[target] = target_results[target][\"novel\"][0]\n                final_allele_hits[target][\"comment\"] = \"\"\n            else:\n                # multiple hits, only report highest score\n                final_allele_hits[target] = sorted(target_results[target][\"novel\"], key=lambda x: x[\"bitscore\"], reverse=True)[0]\n                final_allele_hits[target][\"comment\"] = \"No exact matches to known alleles\"\n\n    # Debugging information\n    logging.debug(\"camlhmp.engines.blast.get_blast_allele_hits\")\n    logging.debug(f\"Allele Hits: {final_allele_hits}\")\n\n    return final_allele_hits\n
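A condensed sketch of the decision rules in get_blast_allele_hits, using hypothetical helper names: a hit must meet both thresholds, a hit with 100% identity and 100% coverage keeps its allele ID (known), and any other passing hit is reported as NEW (novel). Known hits take precedence, and among several novel hits the one with the highest bitscore is kept. One deliberate difference: the source appears to sort on the bitscore string as read from the BLAST output, which compares lexicographically (so '99' would rank above '501'), whereas this sketch converts bitscores to float first.

```python
from typing import Optional


def classify_hit(pident: float, qcovs: int, min_pident: float, min_coverage: int) -> Optional[str]:
    # Returns 'known', 'novel', or None, following get_blast_allele_hits
    if pident < min_pident or qcovs < min_coverage:
        return None  # below thresholds: the hit is ignored
    if pident == 100 and qcovs == 100:
        return 'known'  # perfect match keeps the allele ID
    return 'novel'  # passing but imperfect hits are reported as NEW


def pick_allele(known: list, novel: list) -> dict:
    # known/novel hold hit dicts with id, pident, qcovs, and bitscore keys
    if known:
        best = dict(known[0])
        # Several exact matches are reported together as a comma-separated ID
        best['id'] = ','.join(h['id'] for h in known)
        best['comment'] = '' if len(known) == 1 else 'Exact matches to multiple alleles'
        return best
    if novel:
        # Numeric comparison of bitscores (the source compares strings)
        best = dict(max(novel, key=lambda h: float(h['bitscore'])))
        best['comment'] = '' if len(novel) == 1 else 'No exact matches to known alleles'
        return best
    return {'id': '-', 'qcovs': 0, 'pident': 0, 'bitscore': 0,
            'comment': 'No hits met thresholds'}
```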
"},{"location":"api/parsers/blast/#camlhmp.parsers.blast.get_blast_region_hits","title":"camlhmp.parsers.blast.get_blast_region_hits(targets, results, min_pident, min_coverage)","text":"

Aggregate multiple target hits for a region from the BLAST results.

Parameters:

Name | Type | Description | Default
targets | dict | A dict mapping each target ID to its sequence length {id: len(seq)} | required
results | list of dict | The BLAST results | required
min_pident | float | The minimum percent identity to count a hit | required
min_coverage | int | The minimum percent coverage to count a hit | required

Returns:

Name | Type | Description
dict | dict | The hits, percent coverage, and comments for each target

Examples:

>>> from camlhmp.parsers.blast import get_blast_region_hits\n>>> target_results = get_blast_region_hits(target_lengths, blast_stdout, min_pident, min_coverage)\n
Source code in camlhmp/parsers/blast.py
def get_blast_region_hits(\n    targets: dict, results: dict, min_pident: float, min_coverage: int\n) -> dict:\n    \"\"\"\n    Aggregate multiple target hits for a region from the BLAST results.\n\n    Args:\n        targets (dict): The list of target sequences {id: len(seq)}\n        results (list of dict): The BLAST results\n        min_pident (float): The minimum percent identity to count a hit\n        min_coverage (int): The minimum percent coverage to count a hit\n\n    Returns:\n        dict: The target hits\n\n    Examples:\n        >>> from camlhmp.parsers.blast import get_blast_region_hits\n        >>> target_results = get_blast_region_hits(target_lengths, blast_stdout, min_pident, min_coverage)\n    \"\"\"\n    # Aggregate the hits for each target\n    target_results = {}\n    for target, length in targets.items():\n        target_results[target] = {\n            \"hits\": [],\n            \"coverage\": [0] * length,  # Used to calculate coverage across multiple hits\n            \"comment\": [],\n        }\n\n    # Process each blast hit\n    for result in results:\n        # Only process real hits\n        if result[\"qseqid\"] != \"NO_HITS\":\n            # Only keep hits that pass the minimum percent identity\n            if float(result[\"pident\"]) >= min_pident:\n                # Add hit to list of hits\n                target_results[result[\"qseqid\"]][\"hits\"].append(result)\n\n                # Set the coverage to 1 for each base in the hit\n                for i in range(int(result[\"qstart\"]) - 1, int(result[\"qend\"])):\n                    target_results[result[\"qseqid\"]][\"coverage\"][i] += 1\n\n    # Determine coverage for each target\n    final_results = {}\n    for target, vals in target_results.items():\n        final_results[target] = {\n            \"hits\": vals[\"hits\"],\n            \"coverage\": 100\n            * (\n                sum([1 for i in vals[\"coverage\"] if i > 0])\n                / 
float(len(vals[\"coverage\"]))\n            ),\n            \"comment\": [],\n        }\n        if len(vals[\"hits\"]) > 1:\n            final_results[target][\"comment\"].append(\n                f\"Coverage based on {len(vals['hits'])} hits\"\n            )\n\n        if sum([1 for i in vals[\"coverage\"] if i > 1]):\n            final_results[target][\"comment\"].append(\n                \"There were one or more overlapping hits\"\n            )\n\n    # Debugging information\n    logging.debug(\"camlhmp.engines.blast_region.get_blast_region_hits\")\n    logging.debug(f\"Profile Hits: {final_results}\")\n\n    return final_results\n
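Coverage for a region is computed from a per-base depth array, so overlapping hits are not double counted. The sketch below reproduces that idea with a hypothetical region_coverage helper that takes (qstart, qend) pairs. Note that in the source shown, hits are filtered only by min_pident; min_coverage is not used inside this function.

```python
def region_coverage(length: int, hits: list) -> tuple:
    # hits are (qstart, qend) pairs, 1-based and inclusive as in BLAST output
    depth = [0] * length
    for qstart, qend in hits:
        for i in range(qstart - 1, qend):
            depth[i] += 1
    covered = sum(1 for d in depth if d > 0)
    # Any base hit more than once means at least two hits overlap
    overlapping = any(d > 1 for d in depth)
    # Percent of the region covered by at least one hit
    return 100 * covered / length, overlapping


pct, overlap = region_coverage(100, [(1, 40), (31, 60)])
print(pct, overlap)  # 60.0 True
```

Here the two hits overlap on bases 31 to 40, yet the region is reported as 60% covered rather than 70%.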
"},{"location":"api/parsers/blast/#camlhmp.parsers.blast.get_blast_target_hits","title":"camlhmp.parsers.blast.get_blast_target_hits(targets, results)","text":"

Find the target hits in the BLAST results.

Parameters:

Name | Type | Description | Default
targets | list | The list of target sequences | required
results | dict | The BLAST results | required

Returns:

Name | Type | Description
dict | dict | Each target mapped to True if it had a BLAST hit, otherwise False

Examples:

>>> from camlhmp.parsers.blast import get_blast_target_hits\n>>> target_results = get_blast_target_hits(framework[\"targets\"], hits)\n
Source code in camlhmp/parsers/blast.py
def get_blast_target_hits(targets: list, results: dict) -> dict:\n    \"\"\"\n    Find the target hits in the BLAST results.\n\n    Args:\n        targets (list): The list of target sequences\n        results (dict): The BLAST results\n\n    Returns:\n        dict: The target hits\n\n    Examples:\n        >>> from camlhmp.parsers.blast import get_blast_target_hits\n        >>> target_results = get_blast_target_hits(framework[\"targets\"], hits)\n    \"\"\"\n    target_hits = {}\n    for target in targets:\n        target_hits[target] = False\n        if target in results:\n            target_hits[target] = True\n\n    # Debugging information\n    logging.debug(\"camlhmp.engines.blast.get_blast_target_hits\")\n    logging.debug(f\"Profile Hits: {target_hits}\")\n\n    return target_hits\n
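Because the check is a membership test per target, the same result can be sketched as a dict comprehension. Converting the hit IDs to a set first (a small change from the source) keeps each lookup constant-time; the function name is hypothetical.

```python
def get_target_hits_sketch(targets: list, hit_ids: list) -> dict:
    # A set makes each membership test O(1) instead of scanning the list
    seen = set(hit_ids)
    return {target: target in seen for target in targets}
```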
"},{"location":"cli/","title":"camlhmp CLI Reference","text":"

camlhmp provides a set of command line interface (CLI) commands for typing organisms. These commands are designed to be easy to use and provide a simple way to type organisms using the available engines and schemas.

Currently the following commands are available in the camlhmp CLI:

Command | Description
camlhmp-blast-alleles | Classify assemblies using BLAST against alleles of a set of genes
camlhmp-blast-regions | Classify assemblies using BLAST against larger genomic regions
camlhmp-blast-targets | Classify assemblies using BLAST against individual genes or proteins
camlhmp-extract | Extract typing targets from a set of reference sequences"},{"location":"cli/camlhmp-extract/","title":"camlhmp-extract","text":""},{"location":"cli/camlhmp-extract/#camlhmp-extract","title":"camlhmp-extract","text":"

camlhmp-extract is a command that allows users to extract targets from a set of references. You should think of this script as a \"helper\" script for curators. It allows you to maintain a TSV file with the targets and their positions in the reference sequences. camlhmp-extract will then extract the targets from the reference sequences and write them to a FASTA file.

"},{"location":"cli/camlhmp-extract/#usage","title":"Usage","text":"
 \ud83d\udc2a camlhmp-extract \ud83d\udc2a - Extract typing targets from a set of reference sequences\n\n\u256d\u2500 Required Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --path     -i  TEXT  The path where input files are located [required]                   \u2502\n\u2502 *  --targets  -t  TEXT  A TSV of targets to extract in FASTA format [required]              \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Additional Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --outdir   -o  TEXT  The path to save the extracted targets                                 \u2502\n\u2502 --verbose            Increase the 
verbosity of output                                       \u2502\n\u2502 --silent             Only critical errors will be printed                                   \u2502\n\u2502 --version  -V        Show the version and exit.                                             \u2502\n\u2502 --help               Show this message and exit.                                            \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-alleles/","title":"camlhmp-blast-alleles","text":"

camlhmp-blast-alleles is a command that allows users to type their samples using a provided schema with BLAST algorithms. This command is useful when the schema is typing specific alleles of a gene or set of genes (e.g. MLST).

 Usage: camlhmp-blast-alleles [OPTIONS]\n\n \ud83d\udc2a camlhmp-blast-alleles \ud83d\udc2a - Classify assemblies using BLAST against alleles of\n a set of genes\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --input         -i  TEXT     Input file in FASTA format to classify         \u2502\n\u2502                                 [required]                                     \u2502\n\u2502 *  --yaml          -y  TEXT     YAML file documenting the targets and types    \u2502\n\u2502                                 [required]                                     \u2502\n\u2502 *  --targets       -t  TEXT     Query targets in FASTA format [required]       \u2502\n\u2502    --outdir        -o  PATH     Directory to write output [default: ./]        \u2502\n\u2502    --prefix        -p  TEXT     Prefix to use for output files                 \u2502\n\u2502                                 [default: camlhmp]                             \u2502\n\u2502    --min-pident        INTEGER  Minimum percent identity to count a hit        \u2502\n\u2502                                 [default: 95]                                  \u2502\n\u2502    --min-coverage      INTEGER  Minimum percent coverage to count a hit        \u2502\n\u2502                                 [default: 95]                                  \u2502\n\u2502    --force                      Overwrite existing reports                     \u2502\n\u2502    --verbose                    Increase the verbosity of output               \u2502\n\u2502    --silent                     Only critical errors will be 
printed           \u2502\n\u2502    --version                    Print schema and camlhmp version               \u2502\n\u2502    --help                       Show this message and exit.                    \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-alleles/#example-usage","title":"Example Usage","text":"

To run camlhmp-blast-alleles, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-alleles using available test data.

camlhmp-blast-alleles \\\n    --yaml tests/data/blast/alleles/spn-pbptype.yaml \\\n    --targets tests/data/blast/alleles/spn-pbptype.fasta \\\n    --input tests/data/blast/alleles/SRR2912551.fna.gz\n\nRunning camlhmp with following parameters:\n    --input tests/data/blast/alleles/SRR2912551.fna.gz\n    --yaml tests/data/blast/alleles/spn-pbptype.yaml\n    --targets tests/data/blast/alleles/spn-pbptype.fasta\n    --outdir ./\n    --prefix camlhmp\n    --min-pident 95\n    --min-coverage 95\n\nStarting camlhmp for S. pneumoniae PBP typing...\nRunning tblastn...\nProcessing hits...\nFinal Results...\n                               S. pneumoniae PBP typing\n\u250f\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2513\n\u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 \u2026 \u2503 1\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 \u2503 \u2026 \u2503 2\u2026 
\u2503\n\u2521\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2529\n\u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502 \u2026 \u2502    \u2502 0 \u2502 1\u2026 \u2502 \u2026 \u2502 5\u2026 \u2502   \u2502 2  \u2502 \u2026 \u2502 1\u2026 \u2502 \u2026 \u2502    \u2502\n\u2514\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2518\nWriting outputs...\nFinal predicted type written to ./camlhmp.tsv\ntblastn results written to ./camlhmp.tblastn.tsv\n

Note

The table printed to STDOUT by camlhmp-blast-alleles has been purposefully truncated for viewing in the docs. It contains the same information as {PREFIX}.tsv.

"},{"location":"cli/blast/camlhmp-blast-alleles/#output-files","title":"Output Files","text":"

camlhmp-blast-alleles will generate the following output files:

File Name | Description
{PREFIX}.tsv | A tab-delimited file with the predicted type
{PREFIX}.blast.tsv | A tab-delimited file of all BLAST hits"},{"location":"cli/blast/camlhmp-blast-alleles/#prefixtsv","title":"{PREFIX}.tsv","text":"

The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:

Column | Description
sample | The sample name as determined by --prefix
schema | The schema used to determine the type
schema_version | The version of the schema used
camlhmp_version | The version of camlhmp used
params | The parameters used for the analysis
{TARGET}_id | The allele ID for a target hit
{TARGET}_pident | The percent identity of the hit
{TARGET}_qcovs | The percent coverage of the hit
{TARGET}_bitscore | The bitscore of the hit
{TARGET}_comment | A small comment about the hit

Below is an example of the {PREFIX}.tsv file:

sample  schema  schema_version  camlhmp_version params  1A_id   1A_pident   1A_qcovs    1A_bitscore 1A_comment  2B_id   2B_pident   2B_qcovs    2B_bitscore 2B_comment  2X_id   2X_pident   2X_qcovs    2X_bitscore 2X_comment\ncamlhmp pbptype_partial 0.0.1   0.3.1   min-coverage=95;min-pident=95   23  100.0   100 556     0   100.0   100 567     2   100.0   100 741 \n
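To use the predictions programmatically, {PREFIX}.tsv can be read with Python's csv module. A minimal sketch, assuming a single sample row; allele_calls is a hypothetical helper that collects every {TARGET}_id column and strips the _id suffix.

```python
import csv
from typing import Iterable


def allele_calls(lines: Iterable[str]) -> dict:
    # Read the first (only) data row, keyed by the header line
    row = next(csv.DictReader(lines, dialect='excel-tab'))
    # Map each {TARGET} to the value of its {TARGET}_id column
    return {col[:-3]: val for col, val in row.items() if col.endswith('_id')}
```

Passing it an open handle to camlhmp.tsv from the example above would be expected to map 1A, 2B, and 2X to 23, 0, and 2.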
"},{"location":"cli/blast/camlhmp-blast-alleles/#prefixblasttsv","title":"{PREFIX}.blast.tsv","text":"

The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all BLAST hits. It is BLAST tabular output (-outfmt 6) with the custom set of columns shown in the header below.

Here is an example of the {PREFIX}.blast.tsv file:

qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore\n1A_0    NODE_223_length_8196_cov_21.291849  99.638  100 276 8324    276 275 1   0   1   276 1807    2634    0.0 555\n1A_1    NODE_223_length_8196_cov_21.291849  99.638  100 276 8324    276 275 1   0   1   276 1807    2634    0.0 555\n1A_2    NODE_223_length_8196_cov_21.291849  99.275  100 276 8324    276 274 2   0   1   276 1807    2634    0.0 554\n1A_3    NODE_223_length_8196_cov_21.291849  99.275  100 276 8324    276 274 2   0   1   276 1807    2634    0.0 553\n1A_4    NODE_223_length_8196_cov_21.291849  84.420  100 276 8324    276 233 43  0   1   276 1807    2634    3.91e-155   474\n1A_23   NODE_223_length_8196_cov_21.291849  100.000 100 276 8324    276 276 0   0   1   276 1807    2634    0.0 556\n2B_0    NODE_878_length_2854_cov_17.976875  100.000 100 277 2982    277 277 0   0   1   277 1218    2048    0.0 567\n2B_1    NODE_878_length_2854_cov_17.976875  87.365  100 277 2982    277 242 35  0   1   277 1218    2048    3.24e-173   501\n2B_2    NODE_878_length_2854_cov_17.976875  99.278  100 277 2982    277 275 2   0   1   277 1218    2048    0.0 563\n2B_3    NODE_878_length_2854_cov_17.976875  99.639  100 277 2982    277 276 1   0   1   277 1218    2048    0.0 565\n2B_4    NODE_878_length_2854_cov_17.976875  99.639  100 277 2982    277 276 1   0   1   277 1218    2048    0.0 565\n2X_0    NODE_210_length_5085_cov_16.539627  99.721  100 358 5213    358 357 1   0   1   358 3172    2099    0.0 740\n2X_1    NODE_210_length_5085_cov_16.539627  92.179  100 358 5213    358 330 28  0   1   358 3172    2099    0.0 688\n2X_1    NODE_878_length_2854_cov_17.976875  23.797  99  358 2982    395 94  230 17  1   353 915 2012    1.95e-06    45.8\n2X_2    NODE_210_length_5085_cov_16.539627  100.000 100 358 5213    358 358 0   0   1   358 3172    2099    0.0 741\n2X_3    NODE_210_length_5085_cov_16.539627  99.721  100 358 5213    358 357 1   0   1   
358 3172    2099    0.0 739\n2X_4    NODE_210_length_5085_cov_16.539627  99.441  100 358 5213    358 356 2   0   1   358 3172    2099    0.0 738\n
"},{"location":"cli/blast/camlhmp-blast-alleles/#prefixdetailstsv","title":"{PREFIX}.details.tsv","text":"

The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. It is useful for seeing how a sample performed against every type in the schema, not just the predicted one.

The columns in this file are:

| Column | Description |
| --- | --- |
| sample | The sample name as determined by --prefix |
| type | The type the sample was evaluated against |
| status | The status of the type (True if the sample met the type's requirements, otherwise False) |
| targets | The targets for the given type that had a match |
| missing | The targets for the given type that were not found |
| coverage | The percent coverage of the target region(s) |
| hits | The number of hits used to calculate coverage of the target region(s) |
| schema | The schema used to determine the type |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

Below is an example of the {PREFIX}.details.tsv file:

sample  type    status  targets missing coverage    hits    schema  schema_version  camlhmp_version params  comment\ncamlhmp O1  False       O1  12.49   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O2  False   O2  wzyB    100.00,0.00 1,0 pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O3  False       O3  1.43    1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O4  False       O4  13.86   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O5  True    O2      100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
"},{"location":"cli/blast/camlhmp-blast-regions/","title":"camlhmp-blast-regions","text":"

camlhmp-blast-regions is a command that allows users to search for full regions of interest. It is nearly identical to camlhmp-blast-targets, but instead of many smaller targets, it looks at full regions such as O-antigens or similar features.

"},{"location":"cli/blast/camlhmp-blast-regions/#usage","title":"Usage","text":"
 Usage: camlhmp-blast-regions [OPTIONS]\n\n \ud83d\udc2a camlhmp-blast-regions \ud83d\udc2a - Classify assemblies using BLAST against larger genomic\n regions\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --input         -i  TEXT     Input file in FASTA format to classify [required]   \u2502\n\u2502 *  --yaml          -y  TEXT     YAML file documenting the targets and types         \u2502\n\u2502                                 [required]                                          \u2502\n\u2502 *  --targets       -t  TEXT     Query targets in FASTA format [required]            \u2502\n\u2502    --outdir        -o  PATH     Directory to write output [default: ./]             \u2502\n\u2502    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]   \u2502\n\u2502    --min-pident        INTEGER  Minimum percent identity to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --min-coverage      INTEGER  Minimum percent coverage to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --force                      Overwrite existing reports                          \u2502\n\u2502    --verbose                    Increase the verbosity of output                    \u2502\n\u2502    --silent                     Only critical errors will be printed                \u2502\n\u2502    --version                    Print schema and camlhmp version 
                   \u2502\n\u2502    --help                       Show this message and exit.                         \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-regions/#example-usage","title":"Example Usage","text":"

To run camlhmp-blast-regions, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-regions using available test data.

camlhmp-blast-regions \\\n    --yaml tests/data/blast/regions/pseudomonas-serogroup.yaml \\\n    --targets tests/data/blast/regions/pseudomonas-serogroup.fasta \\\n    --input tests/data/blast/regions/O1-GCF_000504045.fna.gz\n\nRunning camlhmp with following parameters:\n    --input tests/data/blast/regions/O1-GCF_000504045.fna.gz\n    --yaml tests/data/blast/regions/pseudomonas-serogroup.yaml\n    --targets tests/data/blast/regions/pseudomonas-serogroup.fasta\n    --outdir ./\n    --prefix camlhmp\n    --min-pident 95\n    --min-coverage 95\n\nStarting camlhmp for Pseudomonas Serogrouping...\nRunning blastn...\nProcessing hits...\nFinal Results...\n                               Pseudomonas Serogrouping\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 sample \u2503 type \u2503 targe\u2026 \u2503 cover\u2026 \u2503 hits \u2503 schema \u2503 schem\u2026 \u2503 camlh\u2026 \u2503 params \u2503 comme\u2026 \u2503\n\u2521\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n\u2502 camlh\u2026 \u2502 O1   \u2502 O1     \u2502 100.00 \u2502 1 
   \u2502 pseud\u2026 \u2502 0.0.1  \u2502 0.3.1  \u2502 min-c\u2026 \u2502        \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\nWriting outputs...\nFinal predicted type written to ./camlhmp.tsv\nResults against each type written to ./camlhmp.details.tsv\nblastn results written to ./camlhmp.blastn.tsv\n

Note

The table printed to STDOUT by camlhmp-blast-regions has been purposefully truncated for display in the docs. It contains the same information as {PREFIX}.tsv.
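To drive camlhmp-blast-regions from a Python pipeline, a minimal sketch is to build the argument list from the options in the usage text above and pass it to subprocess.run. The helper names build_regions_command, run_regions and expected_outputs are hypothetical. The output file names simply mirror the log lines of the example run; the blastn part of the BLAST file name is taken from that log, not from camlhmp's source.

```python
import subprocess
from pathlib import Path


def build_regions_command(input_fasta, yaml_file, targets_fasta, outdir='./',
                          prefix='camlhmp', min_pident=95, min_coverage=95,
                          force=False):
    # Only options listed in the usage text are used; defaults match it
    cmd = [
        'camlhmp-blast-regions',
        '--input', str(input_fasta),
        '--yaml', str(yaml_file),
        '--targets', str(targets_fasta),
        '--outdir', str(outdir),
        '--prefix', prefix,
        '--min-pident', str(min_pident),
        '--min-coverage', str(min_coverage),
    ]
    if force:
        cmd.append('--force')  # overwrite existing reports
    return cmd


def run_regions(**kwargs):
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(build_regions_command(**kwargs), check=True)


def expected_outputs(outdir='./', prefix='camlhmp'):
    # Names mirror the log lines of the example run above (hypothetical helper)
    out = Path(outdir)
    return {
        'type': out / f'{prefix}.tsv',
        'details': out / f'{prefix}.details.tsv',
        'blast': out / f'{prefix}.blastn.tsv',
    }
```

Keeping command construction separate from execution lets the argument list be checked or logged before anything runs. For example, run_regions(input_fasta='sample.fna', yaml_file='schema.yaml', targets_fasta='targets.fasta') runs the tool with the documented defaults.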

"},{"location":"cli/blast/camlhmp-blast-regions/#output-files","title":"Output Files","text":"

camlhmp-blast-regions will generate three output files:

| File Name | Description |
| --- | --- |
| {PREFIX}.tsv | A tab-delimited file with the predicted type |
| {PREFIX}.blast.tsv | A tab-delimited file of all BLAST hits |
| {PREFIX}.details.tsv | A tab-delimited file with details for each type |"},{"location":"cli/blast/camlhmp-blast-regions/#prefixtsv","title":"{PREFIX}.tsv","text":"

The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:

| Column | Description |
| --- | --- |
| sample | The sample name as determined by --prefix |
| type | The predicted type |
| targets | The targets for the given type that had a hit |
| coverage | The percent coverage of the target region |
| hits | The number of hits used to calculate coverage of the target region |
| schema | The schema used to determine the type |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

Below is an example of the {PREFIX}.tsv file:

sample  type    targets coverage    hits    schema  schema_version  camlhmp_version params  comment\ncamlhmp O5  O2  100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
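Because {PREFIX}.tsv has a header row and tab-separated values, it can be loaded with Python's csv module. The sketch below (read_prediction and parse_params are hypothetical helper names) also splits the comma-separated targets column and the semicolon-separated params column, whose key=value layout is taken from the example above.

```python
import csv
from pathlib import Path


def parse_params(params):
    # 'min-coverage=95;min-pident=95' -> {'min-coverage': '95', 'min-pident': '95'}
    return dict(item.split('=', 1) for item in params.split(';') if '=' in item)


def read_prediction(path):
    # excel-tab is the csv module's built-in tab-delimited dialect
    with Path(path).open(newline='') as handle:
        rows = list(csv.DictReader(handle, dialect='excel-tab'))
    for row in rows:
        # Empty cells come back as '' (or None for short rows), so guard both
        row['targets'] = [t for t in (row['targets'] or '').split(',') if t]
        row['params'] = parse_params(row['params'] or '')
    return rows
```

For the example file above, the first row's type would be O5 and its params would map min-pident to the string 95. Values are left as strings, so any numeric comparisons are up to the caller.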
"},{"location":"cli/blast/camlhmp-blast-regions/#prefixblasttsv","title":"{PREFIX}.blast.tsv","text":"

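The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all BLAST hits (in the example run above it was written as camlhmp.blastn.tsv). The columns are BLAST tabular output (-outfmt 6) with a custom field list and a header row: the 12 default -outfmt 6 fields plus qcovs, qlen, slen and nident.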

Here is an example of the {PREFIX}.blast.tsv file:

qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore\nwzyB    NZ_PSQS01000003.1   88.403  99  1140    6935329 595 526 69  0   545 1139    6874509 6875103 0.0 717\nwzyB    NZ_PSQS01000003.1   88.403  99  1140    6935329 595 526 69  0   545 1139    6920911 6921505 0.0 717\nwzyB    NZ_PSQS01000003.1   89.444  99  1140    6935329 540 483 56  1   1   539 6872864 6873403 0.0 680\nwzyB    NZ_PSQS01000003.1   89.444  99  1140    6935329 540 483 56  1   1   539 6919266 6919805 0.0 680\nO1  NZ_PSQS01000003.1   97.972  12  18368   6935329 1972    1932    38  2   16398   18368   6620589 6618619 0.0 3419\nO1  NZ_PSQS01000003.1   96.296  12  18368   6935329 324 312 11  1   1   323 6641914 6641591 1.68e-149   531\nO2  NZ_PSQS01000003.1   99.841  100 23303   6935329 23303   23266   30  1   1   23303   6618619 6641914 0.0 42821\nO2  NZ_PSQS01000003.1   86.935  100 23303   6935329 1240    1078    130 12  2542    3749    3864567 3863328 0.0 1363\nO3  NZ_PSQS01000003.1   94.442  13  20210   6935329 2393    2260    114 15  1   2386    6618619 6620999 0.0 3664\nO3  NZ_PSQS01000003.1   99.308  13  20210   6935329 289 287 2   0   19922   20210   6641626 6641914 3.09e-147   523\nO4  NZ_PSQS01000003.1   97.448  14  15279   6935329 1842    1795    47  0   1   1842    6618619 6620460 0.0 3142\nO4  NZ_PSQS01000003.1   99.638  14  15279   6935329 276 275 1   0   15004   15279   6641639 6641914 8.46e-142   505\n
"},{"location":"cli/blast/camlhmp-blast-regions/#prefixdetailstsv","title":"{PREFIX}.details.tsv","text":"

The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. It is useful for seeing how a sample performed against every type in the schema, not just the predicted one.

The columns in this file are:

| Column | Description |
| --- | --- |
| sample | The sample name as determined by --prefix |
| type | The type the sample was evaluated against |
| status | The status of the type (True if the sample met the type's requirements, otherwise False) |
| targets | The targets for the given type that had a match |
| missing | The targets for the given type that were not found |
| coverage | The percent coverage of the target region(s) |
| hits | The number of hits used to calculate coverage of the target region(s) |
| schema | The schema used to determine the type |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

Below is an example of the {PREFIX}.details.tsv file:

sample  type    status  targets missing coverage    hits    schema  schema_version  camlhmp_version params  comment\ncamlhmp O1  False       O1  12.49   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O2  False   O2  wzyB    100.00,0.00 1,0 pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O3  False       O3  1.43    1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp O4  False       O4  13.86   2   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   Coverage based on 2 hits\ncamlhmp O5  True    O2      100.00  1   pseudomonas_serogroup_partial   0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
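The coverage and hits values in this example can be reproduced from the hits in the {PREFIX}.blast.tsv example under two assumptions. First, hits below --min-pident are discarded. Second, coverage is the percentage of the query length (qlen) spanned by the remaining hits (qstart to qend). This rule is inferred from the example numbers, not taken from camlhmp's source. region_coverage is a hypothetical helper, and merging overlapping intervals so that no base is counted twice is our own choice.

```python
def region_coverage(hits, qlen, min_pident=95.0):
    # Assumed filter: drop hits below the identity cutoff
    kept = [h for h in hits if h['pident'] >= min_pident]
    # Query intervals are 1-based and inclusive; normalize and sort them
    intervals = sorted(
        (min(h['qstart'], h['qend']), max(h['qstart'], h['qend'])) for h in kept
    )
    # Merge overlapping or touching intervals so no base is counted twice
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return round(100 * covered / qlen, 2), len(kept)


# O1 and O3 hits copied from the {PREFIX}.blast.tsv example
O1_HITS = [
    {'pident': 97.972, 'qstart': 16398, 'qend': 18368},
    {'pident': 96.296, 'qstart': 1, 'qend': 323},
]
O3_HITS = [
    {'pident': 94.442, 'qstart': 1, 'qend': 2386},
    {'pident': 99.308, 'qstart': 19922, 'qend': 20210},
]

print(region_coverage(O1_HITS, qlen=18368))  # (12.49, 2)
print(region_coverage(O3_HITS, qlen=20210))  # (1.43, 1)
```

The O3 case shows why that row reports a single hit: its first hit (94.442% identity) falls below the default cutoff of 95, so only the 289 bp hit counts toward coverage.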
"},{"location":"cli/blast/camlhmp-blast-regions/#example-implementation","title":"Example Implementation","text":"

If you would like to see how camlhmp-blast-regions can be used, please see pasty. In pasty the schema is set up to directly use camlhmp-blast-regions to classify samples without any extra logic.

This allows for a simple wrapper like the following:

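#!/usr/bin/env bash\n# Locate the data directory relative to this script (quoted to survive spaces in paths)\npasty_dir=\"$(dirname \"$0\")\"\n\n# Pass the schema and targets through environment variables and forward all other arguments\nCAML_YAML=\"${pasty_dir}/../data/pa-osa.yaml\" \\\nCAML_TARGETS=\"${pasty_dir}/../data/pa-osa.fasta\" \\\n    camlhmp-blast-regions \\\n    \"$@\"\n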

This script runs camlhmp-blast-regions with the pasty schema and targets, passing any other arguments (such as --input) through unchanged.

"},{"location":"cli/blast/camlhmp-blast-targets/","title":"camlhmp-blast-targets","text":"

camlhmp-blast-targets is a command that allows users to type their samples with BLAST using a provided schema. It is useful when a schema targets full-length genes or proteins.

"},{"location":"cli/blast/camlhmp-blast-targets/#usage","title":"Usage","text":"
 Usage: camlhmp-blast-targets [OPTIONS]\n\n \ud83d\udc2a camlhmp-blast-targets \ud83d\udc2a - Classify assemblies using BLAST against individual\n genes or proteins\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *  --input         -i  TEXT     Input file in FASTA format to classify [required]   \u2502\n\u2502 *  --yaml          -y  TEXT     YAML file documenting the targets and types         \u2502\n\u2502                                 [required]                                          \u2502\n\u2502 *  --targets       -t  TEXT     Query targets in FASTA format [required]            \u2502\n\u2502    --outdir        -o  PATH     Directory to write output [default: ./]             \u2502\n\u2502    --prefix        -p  TEXT     Prefix to use for output files [default: camlhmp]   \u2502\n\u2502    --min-pident        INTEGER  Minimum percent identity to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --min-coverage      INTEGER  Minimum percent coverage to count a hit             \u2502\n\u2502                                 [default: 95]                                       \u2502\n\u2502    --force                      Overwrite existing reports                          \u2502\n\u2502    --verbose                    Increase the verbosity of output                    \u2502\n\u2502    --silent                     Only critical errors will be printed                \u2502\n\u2502    --version                    Print schema and camlhmp 
version                    \u2502\n\u2502    --help                       Show this message and exit.                         \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n
"},{"location":"cli/blast/camlhmp-blast-targets/#example-usage","title":"Example Usage","text":"

To run camlhmp-blast-targets, you will need a FASTA file of your input sequences, a YAML file with the schema, and a FASTA file with the targets. Below is an example of how to run camlhmp-blast-targets using available test data.

camlhmp-blast-targets \\\n    --yaml tests/data/blast/targets/sccmec-partial.yaml \\\n    --targets tests/data/blast/targets/sccmec-partial.fasta \\\n    --input tests/data/blast/targets/sccmec-i.fasta\n\nRunning camlhmp with following parameters:\n    --input tests/data/blast/targets/sccmec-i.fasta\n    --yaml tests/data/blast/targets/sccmec-partial.yaml\n    --targets tests/data/blast/targets/sccmec-partial.fasta\n    --outdir ./\n    --prefix camlhmp\n    --min-pident 95\n    --min-coverage 95\n\nStarting camlhmp for SCCmec Typing...\nRunning blastn...\nProcessing hits...\nFinal Results...\n                                     SCCmec Typing\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 sample  \u2503 type \u2503 targets   \u2503 schema    \u2503 schema_v\u2026 \u2503 camlhmp\u2026 \u2503 params    \u2503 comment \u2503\n\u2521\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n\u2502 camlhmp \u2502 I    \u2502 ccrA1,cc\u2026 \u2502 sccmec_p\u2026 \u2502 0.0.1     \u2502 0.3.1    \u2502 min-cove\u2026 \u2502         
\u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\nWriting outputs...\nFinal predicted type written to ./camlhmp.tsv\nResults against each type written to ./camlhmp.details.tsv\nblastn results written to ./camlhmp.blastn.tsv\n

Note

The table printed to STDOUT by camlhmp-blast-targets has been purposefully truncated for display in the docs. It contains the same information as {PREFIX}.tsv.

"},{"location":"cli/blast/camlhmp-blast-targets/#output-files","title":"Output Files","text":"

camlhmp-blast-targets will generate three output files:

| File Name | Description |
| --- | --- |
| {PREFIX}.tsv | A tab-delimited file with the predicted type |
| {PREFIX}.blast.tsv | A tab-delimited file of all BLAST hits |
| {PREFIX}.details.tsv | A tab-delimited file with details for each type |"},{"location":"cli/blast/camlhmp-blast-targets/#prefixtsv","title":"{PREFIX}.tsv","text":"

The {PREFIX}.tsv file is a tab-delimited file with the predicted type. The columns are:

| Column | Description |
| --- | --- |
| sample | The sample name as determined by --prefix |
| type | The predicted type |
| targets | The targets for the given type that had a hit |
| schema | The schema used to determine the type |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

Below is an example of the {PREFIX}.tsv file:

sample  type    targets schema  schema_version  camlhmp_version params  comment\ncamlhmp I   ccrA1,ccrB1,IS431,IS1272,mecA,mecR1 sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
"},{"location":"cli/blast/camlhmp-blast-targets/#prefixblasttsv","title":"{PREFIX}.blast.tsv","text":"

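The {PREFIX}.blast.tsv file is a tab-delimited file of the raw output for all BLAST hits (in the example run above it was written as camlhmp.blastn.tsv). The columns are BLAST tabular output (-outfmt 6) with a custom field list and a header row: the 12 default -outfmt 6 fields plus qcovs, qlen, slen and nident.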

Here is an example of the {PREFIX}.blast.tsv file:

qseqid  sseqid  pident  qcovs   qlen    slen    length  nident  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore\nccrA1   AB033763.2  100.000 100 1350    39332   1350    1350    0   0   1   1350    23692   25041   0.0 2494\nccrB1   AB033763.2  100.000 100 1152    39332   1152    1152    0   0   1   1152    25063   26214   0.0 2128\nIS1272  AB033763.2  100.000 100 1659    39332   1659    1659    0   0   1   1659    28423   30081   0.0 3064\nmecR1   AB033763.2  100.000 100 987 39332   987 987 0   0   1   987 30304   31290   0.0 1823\nmecA    AB033763.2  99.950  100 2007    39332   2007    2006    1   0   1   2007    31390   33396   0.0 3701\nmecA    AB033763.2  99.950  100 2007    39332   2007    2006    1   0   1   2007    31390   33396   0.0 3701\nIS431   AB033763.2  99.873  100 790 39332   790 789 1   0   1   790 35958   36747   0.0 1454\nIS431   AB033763.2  100.000 100 792 39332   792 792 0   0   1   792 35957   36748   0.0 1463\n
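The sketch below turns these hits into a list of detected targets. A target is kept if any single hit meets both --min-pident and --min-coverage, and repeated hits (mecA and IS431 above) are counted once. Comparing --min-coverage against the qcovs column is an assumption about how camlhmp applies that option, and detected_targets is a hypothetical helper.

```python
def detected_targets(hits, min_pident=95.0, min_coverage=95.0):
    found = []
    for hit in hits:
        # Assumed rule: both cutoffs must be met by a single hit
        if hit['pident'] >= min_pident and hit['qcovs'] >= min_coverage:
            # Repeated hits for the same target are reported once, in first-seen order
            if hit['qseqid'] not in found:
                found.append(hit['qseqid'])
    return found


# qseqid, pident and qcovs from the {PREFIX}.blast.tsv example above
EXAMPLE_HITS = [
    {'qseqid': 'ccrA1', 'pident': 100.000, 'qcovs': 100},
    {'qseqid': 'ccrB1', 'pident': 100.000, 'qcovs': 100},
    {'qseqid': 'IS1272', 'pident': 100.000, 'qcovs': 100},
    {'qseqid': 'mecR1', 'pident': 100.000, 'qcovs': 100},
    {'qseqid': 'mecA', 'pident': 99.950, 'qcovs': 100},
    {'qseqid': 'mecA', 'pident': 99.950, 'qcovs': 100},
    {'qseqid': 'IS431', 'pident': 99.873, 'qcovs': 100},
    {'qseqid': 'IS431', 'pident': 100.000, 'qcovs': 100},
]

print(detected_targets(EXAMPLE_HITS))
# ['ccrA1', 'ccrB1', 'IS1272', 'mecR1', 'mecA', 'IS431']
```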
"},{"location":"cli/blast/camlhmp-blast-targets/#prefixdetailstsv","title":"{PREFIX}.details.tsv","text":"

The {PREFIX}.details.tsv file is a tab-delimited file with details for each type. It is useful for seeing how a sample performed against every type in the schema, not just the predicted one.

The columns in this file are:

| Column | Description |
| --- | --- |
| sample | The sample name as determined by --prefix |
| type | The type the sample was evaluated against |
| status | The status of the type (True if the sample met the type's requirements, otherwise False) |
| targets | The targets for the given type that had a match |
| missing | The targets for the given type that were not found |
| schema | The schema used to determine the type |
| schema_version | The version of the schema used |
| camlhmp_version | The version of camlhmp used |
| params | The parameters used for the analysis |
| comment | A small comment about the result |

Below is an example of the {PREFIX}.details.tsv file:

sample  type    status  targets missing schema  schema_version  camlhmp_version params  comment\ncamlhmp I   True    ccrA1,ccrB1,IS431,mecA,mecR1,IS1272     sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp II  False   IS431,mecA,mecR1    ccrA2,ccrB2,mecI    sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp III False   IS431,mecA,mecR1    ccrA3,ccrB3,mecI    sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \ncamlhmp IV  False   IS431,mecA,mecR1,IS1272 ccrA2,ccrB2 sccmec_partial  0.0.1   0.2.1   min-coverage=95;min-pident=95   \n
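The status, targets and missing columns in this example are consistent with a simple rule: a type passes when every target it requires was detected. The sketch below applies that rule to a hypothetical schema dictionary. Its target lists are rebuilt from the rows above; the real schema lives in the YAML file, whose layout is not shown here, and camlhmp itself may apply additional logic.

```python
def type_status(schema, detected):
    found = set(detected)
    results = {}
    for name, required in schema.items():
        # Keep schema order so the lists read like the details file
        results[name] = {
            'status': all(t in found for t in required),
            'targets': [t for t in required if t in found],
            'missing': [t for t in required if t not in found],
        }
    return results


# Hypothetical schema rebuilt from the targets and missing columns above
SCHEMA = {
    'I': ['ccrA1', 'ccrB1', 'IS431', 'mecA', 'mecR1', 'IS1272'],
    'II': ['ccrA2', 'ccrB2', 'IS431', 'mecA', 'mecR1', 'mecI'],
    'III': ['ccrA3', 'ccrB3', 'IS431', 'mecA', 'mecR1', 'mecI'],
    'IV': ['ccrA2', 'ccrB2', 'IS431', 'mecA', 'mecR1', 'IS1272'],
}

# Targets with a passing hit in the {PREFIX}.blast.tsv example
DETECTED = ['ccrA1', 'ccrB1', 'IS1272', 'mecR1', 'mecA', 'IS431']

for name, result in type_status(SCHEMA, DETECTED).items():
    print(name, result['status'], ','.join(result['targets']), ','.join(result['missing']))
```

Running it prints one line per type with the same status, targets and missing values as the rows above. Type I is the only type with nothing missing.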
"},{"location":"cli/blast/camlhmp-blast-targets/#example-implementation","title":"Example Implementation","text":"

If you would like to see how camlhmp-blast-targets can be used, please see sccmec. In sccmec the schema is set up to directly use camlhmp-blast-targets to classify samples without any extra logic.

This allows for a simple wrapper like the following:

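#!/usr/bin/env bash\n# Locate the data directory relative to this script (quoted to survive spaces in paths)\nsccmec_dir=\"$(dirname \"$0\")\"\n\n# Pass the schema and targets through environment variables and forward all other arguments\nCAML_YAML=\"${sccmec_dir}/../data/sccmec.yaml\" \\\nCAML_TARGETS=\"${sccmec_dir}/../data/sccmec.fasta\" \\\n    camlhmp-blast-targets \\\n    \"$@\"\n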

This script runs camlhmp-blast-targets with the sccmec schema and targets, passing any other arguments (such as --input) through unchanged.

"}]} \ No newline at end of file