The goal of labtools is to provide a set of tools for handling data commonly used in analytical chemistry laboratories, for example extracting chemical metadata from PubChem and then building and viewing structure databases, exporting databases for MS-FINDER and MS-DIAL, and compiling two-dimensional GC-QTOF/MS results exported from Canvas. It is still under active development. More functions will be added in the future based on the needs of our lab.
- 2023.08.28 Version 0.0.3.0000: Deposit the functions for extracting chemical metadata from PubChem, previously incorporated in the fcmsafety package, and update them accordingly. Further updates will be made in this package, while those in the fcmsafety package will remain unchanged.
- 2023.08.29 Version 0.0.4.0000: Add export4msdial function.
- 2023.08.30 Version 0.0.5.0000: Add navigate_chem function.
- 2023.09.21 Version 0.0.5.0002: Fix extract_classifier bug.
- 2023.11.24 Version 0.0.5.0003: Improve navigate_chem function.
- 2023.11.28 Version 0.0.6.0000: Add function to process gcms data from MS-DIAL.
- 2024.03.12 Version 0.0.7.0000: Add function to assign semi-quantification standards for target compounds.
- 2024.04.21 Version 0.1.01: Add function to filter msp file.
- 2024.07.06 Version 0.2.00: Add functions for semiquantification analysis.
- 2024.09.07 Version 0.2.01: Fix a bug in extract_cid and allow defining a timeout for the PubChem API.
You can install the development version of labtools from GitHub with:
# install.packages("devtools")
devtools::install_github("QizhiSu/labtools")
If you do not have the “rio” and “dplyr” packages installed, please install them first. Otherwise, skip this step.
install.packages("rio")
install.packages("dplyr")
library(labtools)
library(dplyr)
Please name your file in English rather than Chinese (Chinese characters are not well supported by some functions). Your data should have at least one column containing chemical names in English, CAS numbers, or InChIKeys. The program will use any or all of these columns to retrieve metadata from PubChem. A good chemical name always helps.
# Please enter the path of your data, e.g., "D:/my data/mydata.xlsx".
data <- rio::import("D:/my data/mydata.xlsx")
Specify which column contains the CAS number with the “cas_col” argument, the InChIKey with “inchikey_col”, and the chemical name with “name_col”. You can also specify all of these arguments, in which case InChIKey is used first, followed by CAS and then name. To get Flavornet information, cas = TRUE is required. Depending on the size of your data, this might take a long time.
# No CAS
data <- data %>% extract_cid(name_col = 1) %>% extract_meta()
# With CAS
data <- data %>% extract_cid(name_col = 1) %>% extract_meta(cas = TRUE)
# With flavornet
data <- data %>% extract_cid(name_col = 1) %>% extract_meta(cas = TRUE, flavonet = TRUE)
If some of your compounds are not present in PubChem, for example many oligomers found in food contact materials, extract_meta will not retrieve SMILES for them. In this case, you can use the assign_meta function, which requires a *.txt file containing the Name and SMILES of these compounds. There are two ways to prepare this text file. One is to prepare it manually; the column names must be Name and SMILES, respectively (case-insensitive). The other is to prepare *.MOL files (case-insensitive) of these molecules and extract SMILES using the combine_mol2sdf() and extract_structure() functions from the mspcompiler package. Note that the names in your *.txt file or *.MOL files have to be consistent with those in your data, as Name is used for matching. Assuming you have all your *.MOL files in the “D:/my data” folder, you can follow these steps:
# If you have not installed the mspcompiler package, please install it following
# the instructions on its GitHub homepage https://github.com/QizhiSu/mspcompiler.
# Once you have it installed
library(mspcompiler)
# This function combines all *.MOL files in the provided folder into a single
# *.sdf file, which will be used to extract SMILES.
combine_mol2sdf(input = "D:/my data",
output = "D:/my data/mydata.sdf",
use_filename = TRUE)
# The input here is the output from the last command and it will generate a *.txt
# file containing Name and SMILES.
extract_structure(input = "D:/my data/mydata.sdf",
output = "D:/my data/mydata.txt")
data <- data %>% assign_meta(meta_file = "D:/my data/mydata.txt")
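For the manual option, a minimal sketch of preparing the text file in base R is shown here. It assumes a tab-delimited layout with a header row, which the text above does not specify, so check ?assign_meta for the expected format. The compound names, SMILES strings, and file location are hypothetical placeholders.

```r
# Hypothetical entries; the SMILES strings are placeholders, not real
# oligomer structures. The names must match the Name column in your data
# exactly, since Name is used for matching.
meta <- data.frame(
  Name = c("Oligomer A", "Oligomer B"),
  SMILES = c("O=C1CCCCCN1", "C1COCCO1"),
  stringsAsFactors = FALSE
)

# Assumption: a tab-delimited file with a header row.
meta_file <- file.path(tempdir(), "mydata.txt")
write.table(meta, meta_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the column names and the number of rows.
check <- read.delim(meta_file, stringsAsFactors = FALSE)
print(check)
```

The resulting file path could then be passed to assign_meta through the meta_file argument, as in the call above.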
After extracting metadata from PubChem with extract_meta(), the data will have a column named InChIkey, which can then be used to retrieve the chemical structure classification from ClassyFire.
data <- data %>% extract_classyfire()
To combine Canvas data, first manually analyze the GC x GC data in Canvas, mark the peaks of interest, and export the marked data in .txt format. Put all .txt files into one folder, as the function reads every .txt file in that folder and combines them into a single table by matching the chemical name. It evaluates the retention index (RI) and second-dimension retention time (2D RT) of the same compound across the samples. If the differences exceed the defined tolerance, it tells you which samples have significantly different RI or 2D RT, so that you can check carefully where the inconsistencies lie.
library(labtools)
data_path <- 'C:/data'
data <- read_canvas(data_path,
ri_align_tolerance = 5,
rt_2d_tolerance = 0.05,
keep = 'area')
# to understand each argument, you can use the following code
?read_canvas
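The consistency check described above can be pictured with a small base R sketch. It illustrates the idea only and is not the internals of read_canvas: the data frame, its column names, and the values are hypothetical, and the rule used here (flag a compound when its range across samples exceeds the tolerance) is an assumption.

```r
# Hypothetical peak table: one row per compound per sample.
peaks <- data.frame(
  name   = c("Limonene", "Limonene", "Limonene", "Decanal", "Decanal", "Decanal"),
  sample = c("S1", "S2", "S3", "S1", "S2", "S3"),
  ri     = c(1030, 1031, 1029, 1206, 1214, 1205),
  rt_2d  = c(1.52, 1.53, 1.51, 2.10, 2.11, 2.12),
  stringsAsFactors = FALSE
)

ri_align_tolerance <- 5
rt_2d_tolerance <- 0.05

# Spread (max minus min) of a value across samples, computed per compound.
spread <- function(x) diff(range(x))
ri_spread <- tapply(peaks$ri, peaks$name, spread)
rt_spread <- tapply(peaks$rt_2d, peaks$name, spread)

# Compounds whose spread exceeds the tolerance need a manual check.
flagged_ri <- names(ri_spread)[ri_spread > ri_align_tolerance]
flagged_rt <- names(rt_spread)[rt_spread > rt_2d_tolerance]

print(flagged_ri)
print(flagged_rt)
```

In this made-up table only Decanal exceeds the RI tolerance, and no compound exceeds the 2D RT tolerance.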
To export a database for MS-FINDER, we first need to extract metadata using extract_cid and extract_meta and then write it out to a .txt file.
library(labtools)
export4msfinder(data, "c:/data/structure_database_for_msfinder.txt")
# to understand each argument, you can use the following code
?export4msfinder
To export a database for MS-DIAL, we first need to extract metadata using extract_cid and extract_meta and then write it out to a .txt file. The function adds the most commonly used adducts based on the polarity provided and calculates the exact mass of each adduct. This file can be used for post-identification in MS-DIAL.
library(labtools)
export4msdial(data, polarity = "pos", "c:/data/structure_database_for_msdial.txt")
# to understand each argument, you can use the following code
?export4msdial
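To illustrate what the exact mass of an adduct means, a minimal base R sketch follows. The adduct list and the example compound are assumptions chosen for illustration; the adducts that export4msdial actually adds are described in ?export4msdial.

```r
# Mass shifts (Da) of some common singly charged positive-mode adducts,
# already corrected for the mass of the lost electron.
adduct_shift <- c(
  "[M+H]+"   = 1.007276,
  "[M+NH4]+" = 18.033823,
  "[M+Na]+"  = 22.989218
)

# Monoisotopic mass of caffeine (C8H10N4O2), used as the example compound.
exact_mass <- 194.080376

# m/z of each adduct = neutral monoisotopic mass + adduct mass shift.
adduct_mz <- exact_mass + adduct_shift
print(round(adduct_mz, 4))
```

The names of the result vector carry the adduct labels, so each m/z value stays paired with its adduct.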
library(labtools)
# Your data must contain at least the SMILES column
navigate_chem(data)
library(labtools)
std_md <- read.csv("standards_data.csv")
data_md <- read.csv("substances_data.csv")
result <- select_std(std_md, 1, 2, data_md, 2, 10)
# To begin with, please install the mspcompiler package first
remotes::install_github("QizhiSu/mspcompiler")
library(labtools)
# The input file should be in msp format.
# Given a list of compounds to keep, you can first retrieve their metadata
# and then use it to filter the msp file.
my_data <- rio::import("my_data.xlsx")
my_data <- my_data %>% extract_cid(name_col = 1, cas_col = 2) %>% extract_meta(cas = TRUE)
# and you have the NIST library
nist <- "C:/data/NIST.msp"
# then you can use the following code to filter the msp file
filter_msp(msp = nist, keep_napd8 = TRUE, cmp_list = my_data, output = "filtered_nist.msp")