Pynoma is an API developed to facilitate the access to human variant data from gnomAD database, working both with gnomAD version 2 and 3. The package retrieves both regular as well as clinical data, and offers support to four kinds of different searches, as well as the possibility to search in batches. For plotting the data, please take a look at the BIOVARS package. If you have scientific interests or want to use our package in formal reports, we kindly ask you to cite us in your publication: Carneiro, P., Colombelli, F., Recamonde-Mendoza, M., and Matte, U. (2022). Pynoma, PyABraOM and BIOVARS: Towards genetic variant data acquisition and integration. bioRxiv.
Currently there is not a PyPI version for the Pynoma API, so the installation needs that you clone this repository and install it as local package.
$ git clone https://github.com/bioinfo-hcpa/pynoma.git
$ pip install -e pynoma
There are 4 kinds of different searches supported by pynoma. Gene, transcript and region searches return the same kind of output, while variant searches build a different dataframe format.
Since gnomAD have two versions, the user must specify which one is to be used by choosing an integer value of either 2 or 3. Additionally, when searching for genes, transcripts or a region in chromosomes X or Y, a new column "Number of Hemizygotes" will be added to the outputted dataframe, so the user should have caution when performing pandas concatenation operations or batch searchings that could potentially mix both kinds of dataframe, resulting in table cells with NaN values.
GeneSearch(gnomad_version: int, gene: str)
.get_data(standard=True, additional_population_info=False)
from pynoma import GeneSearch
gs = GeneSearch(3, "IDUA")
df, clinical_df = gs.get_data()
TranscriptSearch(gnomad_version: int, transcript_id: str)
.get_data(standard=True, additional_population_info=False)
from pynoma import TranscriptSearch
ts = TranscriptSearch(3, "ENST00000247933")
df, clinical_df = ts.get_data()
RegionSearch(gnomad_version: int, chromosome, region_start: int, region_end: int)
.get_data(standard=True, additional_population_info=False)
from pynoma import RegionSearch
rs = RegionSearch(3, 4, 1002741, 1002771)
df, clinical_df = rs.get_data()
VariantSearch(gnomad_version: int, variant_id: str)
.get_data(raw=False)
from pynoma import VariantSearch
vs = VariantSearch(3, '4-1002747-G-A')
df, meta = vs.get_data()
If the user wants to configure multiple searches, including different ones (gene, transcript, region) with the exception of variant searches (that have different dataframe formats), they can use the batch search function. For example, to search for variants in a list of 5 genes, let's say ACE2, BRCA, ID4, MTOR and EMP1, the batch search can be used as follows:
from pynoma import helper, GeneSearch
genes = [GeneSearch(3, "ACE2"), GeneSearch(3, "BRCA"), GeneSearch(3, "ID4"), GeneSearch(3, "MTOR"), GeneSearch(3, "EMP1")]
df = helper.batch_search(genes, standard=True, additional_population_info=False, verbose=True)
Besides the list of Search objects, the other parameters (standard, additional_population_info and verbose) follow the same logic of the individual searches.
@article {Carneiro2022.06.07.495190,
author = {Carneiro, Paola and Colombelli, Felipe and Recamonde-Mendoza, Mariana and Matte, Ursula},
title = {Pynoma, PyABraOM and BIOVARS: Towards genetic variant data acquisition and integration},
elocation-id = {2022.06.07.495190},
year = {2022},
doi = {10.1101/2022.06.07.495190},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/06/09/2022.06.07.495190},
eprint = {https://www.biorxiv.org/content/early/2022/06/09/2022.06.07.495190.full.pdf},
journal = {bioRxiv}
}
This research was supported by the National Council for Scientific and Technological Development (CNPq) and the Research Incentive Fund (FIPE) from Hospital de Clínicas de Porto Alegre.