This package aims to provide a set of tools to help facilitate the handling of laboratory data. At the moment, there is only one function to help read and align compound lists of different samples. It is still under active development. In the future, more functions will be incorporated based on the need of our lab.

labtools

The goal of labtools is to provide a set of tools for handling data commonly encountered in analytical chemistry laboratories. Examples include extracting chemical metadata from PubChem, building and viewing structure databases, exporting databases for MS-FINDER and MS-DIAL, and compiling two-dimensional GC-QTOF/MS results exported from Canvas. The package is still under active development, and more functions will be added as our lab needs them.

Release notes

  1. 2023.08.28 Version 0.0.3.0000: Move the functions for extracting chemical metadata from PubChem, previously part of the fcmsafety package on GitHub, into this package and update them accordingly. Further updates will be made in this package, while the versions in the fcmsafety package will remain unchanged.
  2. 2023.08.29 Version 0.0.4.0000: Add export4msdial function.
  3. 2023.08.30 Version 0.0.5.0000: Add navigate_chem function.
  4. 2023.09.21 Version 0.0.5.0002: Fix extract_classifier bug.
  5. 2023.11.24 Version 0.0.5.0003: Improve navigate_chem function.
  6. 2023.11.28 Version 0.0.6.0000: Add function to process GC-MS data from MS-DIAL.
  7. 2024.03.12 Version 0.0.7.0000: Add function to assign semi-quantification standards for target compounds.
  8. 2024.04.21 Version 0.1.01: Add function to filter msp file.
  9. 2024.07.06 Version 0.2.00: Add functions for semi-quantification analysis.
  10. 2024.09.07 Version 0.2.01: Fix a bug in extract_cid and allow a timeout to be set for the PubChem API.

Installation

You can install the development version of labtools from GitHub with:

# install.packages("devtools")
devtools::install_github("QizhiSu/labtools")

Extract chemical information from PubChem

Read in your data

If you do not have the “rio” and “dplyr” packages installed, please install them first. If you already have them, skip this step.

install.packages("rio")
install.packages("dplyr")

Load required packages

library(labtools)
library(dplyr)

Please give your file an English name rather than a Chinese one, because Chinese characters are not well supported by some functions. Your data must contain at least one column with English chemical names, CAS numbers, or InChIKeys. The program will use any or all of these columns to retrieve metadata from PubChem. An accurate, unambiguous chemical name improves the chance of a correct match.

# Please enter the path of your data, e.g., "D:/my data/mydata.xlsx".
data <- rio::import("D:/my data/mydata.xlsx")
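Before querying PubChem, it can help to confirm that the imported table really contains one of the identifier columns described above. The following is a minimal sketch in base R. The helper `find_identifier_columns()` and the column names `Name`, `CAS`, and `InChIKey` are hypothetical and not part of labtools; adapt them to your own file.

```r
# Hypothetical helper (not part of labtools): report which identifier
# columns are present in a table, ignoring case.
find_identifier_columns <- function(df,
                                    candidates = c("name", "cas", "inchikey")) {
  # Compare lower-cased column names so "CAS", "Cas" and "cas" all match
  names(df)[tolower(names(df)) %in% candidates]
}

# Toy table standing in for the imported file
example <- data.frame(
  Name = c("Caffeine", "Limonene"),
  CAS  = c("58-08-2", "138-86-3"),
  Area = c(1200, 860),
  stringsAsFactors = FALSE
)

id_cols <- find_identifier_columns(example)
if (length(id_cols) == 0) {
  stop("No name, CAS, or InChIKey column found.")
}
print(id_cols)
```

The positions of the columns found this way can then be passed to the “name_col”, “cas_col”, or “inchikey_col” arguments.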

Extract CID and metadata

Specify the column containing CAS numbers with the “cas_col” argument, InChIKeys with “inchikey_col”, and chemical names with “name_col”. You can also specify all of these arguments; in that case, InChIKey is used first, then CAS, then name. To get Flavornet information, cas = TRUE is required. Depending on the size of your data, this step may take a long time.

# No CAS
data <- data %>% extract_cid(name_col = 1) %>% extract_meta()
# With CAS
data <- data %>% extract_cid(name_col = 1) %>% extract_meta(cas = TRUE)
# With flavornet
data <- data %>% extract_cid(name_col = 1) %>% extract_meta(cas = TRUE, flavonet = TRUE)
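To make the lookup priority described above concrete (InChIKey first, then CAS, then name), here is a small base R sketch. It only illustrates the ordering on a toy table and is not the labtools implementation: `choose_identifier()` is a hypothetical helper, and it simply picks the first identifier that is present for each row rather than retrying after a failed query.

```r
# Hypothetical illustration of the lookup priority (InChIKey > CAS > name).
# This is NOT how labtools is implemented internally.
choose_identifier <- function(inchikey, cas, name) {
  # Treat NA and empty strings as "no identifier available"
  is_blank <- function(x) is.na(x) | trimws(x) == ""
  ifelse(!is_blank(inchikey), "inchikey",
         ifelse(!is_blank(cas), "cas",
                ifelse(!is_blank(name), "name", NA_character_)))
}

compounds <- data.frame(
  Name     = c("Caffeine", "Limonene", "Unknown oligomer"),
  CAS      = c("58-08-2", "138-86-3", NA),
  InChIKey = c("RYYVLZVUVIJVGH-UHFFFAOYSA-N", NA, NA),
  stringsAsFactors = FALSE
)

# Record which identifier each row would be queried by
compounds$query_by <- choose_identifier(compounds$InChIKey,
                                        compounds$CAS,
                                        compounds$Name)
print(compounds[, c("Name", "query_by")])
```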

Assign metadata for chemicals outside PubChem

Some compounds are not present in PubChem, for example many oligomers found in food contact materials, so extract_meta will not retrieve SMILES for them. In this case, you can use the assign_meta function, which requires a *.txt file containing the Name and SMILES of these compounds. There are two ways to prepare this text file. The first is to prepare it manually; the column names must be Name and SMILES (case-insensitive). The second is to prepare *.MOL files (case-insensitive) of these molecules and extract SMILES using the combine_mol2sdf() and extract_structure() functions from the mspcompiler package. Note that the names in your *.txt file or *.MOL files have to be consistent with the names in your data, because Name is used for matching. Assuming you have all your *.MOL files in the “D:/my data” folder, you can follow these steps:

# If you have not installed the mspcompiler package, please install it following
# the instructions on its GitHub homepage https://github.com/QizhiSu/mspcompiler.
# Once you have it installed
library(mspcompiler)

# This function combines all *.MOL files in the given folder into a single
# *.sdf file, which will then be used to extract SMILES.
combine_mol2sdf(input = "D:/my data",
                output = "D:/my data/mydata.sdf",
                use_filename = TRUE)
# The input here is the output from the last command and it will generate a *.txt
# file containing Name and SMILES.
extract_structure(input = "D:/my data/mydata.sdf",
                  output = "D:/my data/mydata.txt")
data <- data %>% assign_meta(meta_file = "D:/my data/mydata.txt")
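For the manual option described above, the text file can also be written from R. This is a minimal sketch under stated assumptions: it assumes a tab-delimited file is acceptable to assign_meta (the README does not state the delimiter), and the compound names and SMILES are placeholders chosen only for illustration.

```r
# Placeholder compounds; in practice the names must match those in your data
# exactly, because Name is used for matching.
extra_meta <- data.frame(
  Name   = c("Oligomer A", "Oligomer B"),
  SMILES = c("O=C1CCCCCO1", "O=C1CCCCCOC(=O)CCCCCO1"),
  stringsAsFactors = FALSE
)

# Assumption: a tab-delimited text file with a header row.
# A temporary folder is used here; replace it with your own path.
meta_file <- file.path(tempdir(), "extra_meta.txt")
write.table(extra_meta, meta_file,
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the header and contents survived the round trip
check <- read.delim(meta_file, stringsAsFactors = FALSE)
stopifnot(identical(names(check), c("Name", "SMILES")))
print(check)
```

The resulting path would then be passed as the meta_file argument of assign_meta.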

Extract ClassyFire information

Once metadata has been extracted from PubChem with extract_meta(), the data will have a column named InChIKey. With this column in place, the chemical structure classification can be retrieved from ClassyFire.

data <- data %>% extract_classyfire()

Read and combine Canvas data

To combine Canvas data, first analyze the GC x GC data manually in Canvas, mark the peaks of interest, and export the marked data in .txt format. Put all .txt files into one folder, because the function reads every .txt file in that folder and combines them into a single table by matching the chemical names. It also compares the retention index (RI) and second-dimension retention time (2D RT) of each compound across the samples. If the differences exceed the defined tolerances, it reports which samples have a significantly different RI or 2D RT, so that you can check where the inconsistencies come from.

library(labtools)

data_path <- 'C:/data'
data <- read_canvas(data_path,
                    ri_align_tolerance = 5,
                    rt_2d_tolerance = 0.05,
                    keep = 'area')

# to understand each argument, you can use the following code
?read_canvas
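The tolerance check described above can be pictured with a small base R sketch. It is not the read_canvas implementation: the table layout and column names are hypothetical, and it assumes the "difference" for a compound is the spread (maximum minus minimum) across samples.

```r
# Toy long-format table; column names are hypothetical, not Canvas export names
peaks <- data.frame(
  sample = c("S1", "S2", "S3", "S1", "S2", "S3"),
  name   = c("Limonene", "Limonene", "Limonene",
             "Nonanal", "Nonanal", "Nonanal"),
  ri     = c(1030, 1031, 1032, 1104, 1105, 1115),
  rt_2d  = c(1.20, 1.22, 1.21, 1.50, 1.51, 1.52),
  stringsAsFactors = FALSE
)

ri_align_tolerance <- 5
rt_2d_tolerance    <- 0.05

# Assumption: the difference is measured as max - min within each compound
spread <- function(x) max(x) - min(x)
ri_spread <- tapply(peaks$ri, peaks$name, spread)
rt_spread <- tapply(peaks$rt_2d, peaks$name, spread)

# Flag compounds whose RI or 2D RT spread exceeds its tolerance
exceeds <- ri_spread > ri_align_tolerance |
  rt_spread[names(ri_spread)] > rt_2d_tolerance
flagged <- names(ri_spread)[exceeds]
print(flagged)
```

In this toy data only the second compound is flagged, because its RI values differ by more than 5 units between samples.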

Convert database for MS-FINDER

To export database for MS-FINDER, we first need to extract metadata using extract_cid and extract_meta and then write it out into a .txt file.

library(labtools)

export4msfinder(data, "c:/data/structure_database_for_msfinder.txt")

# to understand each argument, you can use the following code
?export4msfinder

Convert database for MS-DIAL

To export a database for MS-DIAL, first extract metadata using extract_cid and extract_meta, then write it out to a .txt file. The function adds the most commonly used adducts for the given polarity and calculates the exact mass of each adduct. The resulting file can be used for post-identification in MS-DIAL.

library(labtools)

export4msdial(data, polarity = "pos", "c:/data/structure_database_for_msdial.txt")

# to understand each argument, you can use the following code
?export4msdial
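To illustrate what calculating the exact mass of each adduct involves, here is a minimal base R sketch. The adduct list is an assumption chosen for illustration and may differ from the adducts export4msdial actually adds; `adduct_mz()` is a hypothetical helper, not a labtools function. The mass shifts are standard monoisotopic values for singly charged adducts.

```r
# Monoisotopic mass shifts (Da) for a few common singly charged adducts.
# The electron mass is already included in these values.
# Assumption: this list is illustrative, not the one used by export4msdial.
adduct_shift <- list(
  pos = c("[M+H]+" = 1.007276, "[M+NH4]+" = 18.033823, "[M+Na]+" = 22.989218),
  neg = c("[M-H]-" = -1.007276, "[M+Cl]-" = 34.969402)
)

# Hypothetical helper: m/z of each adduct for a neutral monoisotopic mass
adduct_mz <- function(exact_mass, polarity = c("pos", "neg")) {
  polarity <- match.arg(polarity)
  exact_mass + adduct_shift[[polarity]]
}

caffeine_mass <- 194.080376  # monoisotopic mass of C8H10N4O2
mz <- adduct_mz(caffeine_mass, "pos")
print(round(mz, 4))
```

Because every adduct here carries a single charge, m/z is simply the neutral mass plus the shift; multiply charged adducts would also require dividing by the charge.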

Navigate through a chemical table and view chemical structures in a Shiny app

library(labtools)
# Your data must contain at least the SMILES column
navigate_chem(data)

Assign semi-quantification standards for target compounds

library(labtools)

std_md <- read.csv("standards_data.csv")
data_md <- read.csv("substances_data.csv")
result <- select_std(std_md, 1, 2, data_md, 2, 10)

Filter msp file

# To begin with, please install the mspcompiler package
remotes::install_github("QizhiSu/mspcompiler")

library(labtools)
library(dplyr) # provides the %>% pipe used below

# The input file should be in msp format.
# Given a list of compounds to keep, first retrieve their metadata
# so that they can be matched against the msp file.
my_data <- rio::import("my_data.xlsx")
my_data <- my_data %>% extract_cid(name_col = 1, cas_col = 2) %>% extract_meta(cas = TRUE)

# and you have the NIST library
nist <- "C:/data/NIST.msp"

# then you can use the following code to filter the msp file
filter_msp(msp = nist, keep_napd8 = TRUE, cmp_list = my_data, output = "filtered_nist.msp")
