This repo contains default code for assay data processing and QC reports. It is not intended as a starting point for new packages, since it does not contain all of the files and directories a package needs; use DataPackageR::datapackage_skeleton() for that. It does, however, contain standard data-processing functions, standard QC report formatting functions, and standard processing code for BAMA, ELISA, NAb, and ICS.
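If you are starting a new package, a minimal skeleton call might look like the sketch below; the package name, path, and file names are placeholders, and the arguments shown are just the ones you are most likely to need.

    # Sketch only -- not part of this repo's workflow.
    library(DataPackageR)
    datapackage_skeleton(
      name           = "VISCexample",          # placeholder package name
      path           = "~/packages",           # wherever you keep packages
      code_files     = "preprocess_bama.Rmd",  # processing script(s) to register
      r_object_names = "bama_data"             # object(s) those scripts create
    )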
PK code can be found in the Nussenzweig PK packages. The data and data processing have not been standardized enough yet to publish here.
Note that this code is still configured for datasets.R-based package building. It will need to be updated to handle the YAML-based package building.
- datasets.R is the old driver program.
- functions.R contains standard data-processing functions. This is sourced at the top of each preprocess_*.Rmd script.
- qc_functions.R contains standard formatting functions for QC reports. This is sourced at the top of each preprocess_*.Rmd script.
- protocol_specific.R contains protocol-specific objects, definitions, functions, etc. This is sourced at the top of each preprocess_*.Rmd script.
- reproducibility.R contains code for reproducibility tables. This is sourced at the end of each preprocess_*.Rmd script.
- preprocess_*.Rmd contains the code for processing each respective assay.
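To make the sourcing pattern concrete, here is a rough sketch of how a preprocess_*.Rmd is laid out (the relative paths are an assumption; adjust them to wherever the scripts live in your package):

    # First chunk of a preprocess_*.Rmd (sketch)
    source("functions.R")          # standard data-processing helpers
    source("qc_functions.R")       # QC report formatting helpers
    source("protocol_specific.R")  # protocol-specific objects (rx, demo, ...)

    # ... assay-specific processing chunks ...

    # Final chunk
    source("reproducibility.R")    # reproducibility tables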
Note that metadata does not arrive as systematically as assay data. The VISC project managers are trying to streamline this process, but be aware that when you start processing assay data, you might have to ping the project managers to find treatment assignment information, demographics, etc. The best way I've found to handle metadata is to put it in inst/extdata/ and read it in protocol_specific.R. If you source protocol_specific.R in each assay script, you'll have rx or demo or whatever defined as an object that you can refer to. If you save these datasets using SaveObj(), you can alternatively load() the .Rda files from the /data/ directory of the package.
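For example, protocol_specific.R might read the metadata along these lines (the file names and the relative path are made up for illustration; the pattern is the point):

    # Sketch of metadata loading in protocol_specific.R.
    # "rx.csv" and "demo.csv" are hypothetical -- use whatever the PMs send.
    rx   <- read.csv("inst/extdata/rx.csv",   stringsAsFactors = FALSE)  # treatment assignments
    demo <- read.csv("inst/extdata/demo.csv", stringsAsFactors = FALSE)  # demographics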
This is a short description of the ICS processing flow. The processing code (data-raw/preprocess_flow.Rmd) produces HTML output in inst/doc/. To run the ICS preprocessing code, you'll need to do some things first:
- Install Bioconductor:

      source("https://bioconductor.org/biocLite.R")
      biocLite()

  You'll have to install several packages from Bioconductor by including them in a subsequent biocLite() call, just like you'd do with install.packages(). The packages are flowWorkspace, openCyto, ggcyto, and flowIncubator:

      biocLite(c("flowWorkspace", "openCyto", "ggcyto", "flowIncubator"))
- Update the PATH variable at the top of the flow code to point to T://cavd/obj1/cvd465/.
- Change parseXML (in the 1st chunk) to TRUE for your first run--this is referenced in the eval= argument of several chunks and controls whether the raw file parsing is done or not.
- Define SCRATCH (in the 1st chunk) to point to a location on your C drive that can hold a lot of data (about 20 GB). I kept local files for all protocols on my machine until the final report was sent out, so that I didn't have to parse again. Parsing can take 3-5 hours or more.
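Concretely, the first chunk ends up holding settings along these lines (the SCRATCH value is just an example; point it wherever you have room):

    # First-chunk settings for preprocess_flow.Rmd (sketch)
    PATH     <- "T://cavd/obj1/cvd465/"  # network location of the raw flow files
    parseXML <- TRUE                     # TRUE on the first run; drives eval= on the parsing chunks
    SCRATCH  <- "C:/flow_scratch/"       # example local folder with ~20 GB free for parsed objects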
Once you parse the XML files, you'll store objects locally so you don't have to go through the extremely time-consuming parse process again. This is done with save_gs()--you'll see several save_gs() calls throughout the process. If you're ever running through the code and worry that you've made a mistake, you can always reload your last-saved local object with load_gs(). Once you've successfully gotten through the last chunk that has the argument eval = parseXML, you will have all relevant parsed data stored locally, and you can change parseXML to FALSE and operate just from the locally saved files.
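A minimal sketch of that save/reload cycle with flowWorkspace (the object name and directory are placeholders):

    library(flowWorkspace)

    # Checkpoint the GatingSet locally after an expensive step
    save_gs(gs, path = file.path(SCRATCH, "gs_postgating"))

    # ...and restore the last checkpoint later instead of re-parsing
    gs <- load_gs(file.path(SCRATCH, "gs_postgating"))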
FYI, protocol_specific.R contains some object definitions that are relevant to ICS.
In general, the ICS flow goes like this:
- Basic definitions, source files, load packages, set options.
- Parse the XML files; save the first local object.
- Standardize the parsed files--this will vary greatly by protocol and will often need Greg's help. Save a local object.
- Print the list of nodes and the gating hierarchy.
- Add Boolean gates--this adds the Booleans to the data. Save the second local object.
- Extract the ICS data out of the GatingSet created in Step 7 into a data.table. Compute viability, marginals and joint marginals, background values, and parent values.
- Fisher p-values (the basic idea is sketched after this list).
- MIMOSA
- COMPASS
- Gating plots for review
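For the Fisher step, the basic idea is a one-sided test of whether the cytokine-positive proportion in the stimulated well exceeds that of the negative-control well. The real code does this per antigen/cell-subset/cytokine combination; the toy example below, with invented counts, just shows the core computation in base R.

    # Toy Fisher's exact test for ICS positivity (counts are invented).
    # Columns fill first: column 1 = stimulated well, column 2 = control well.
    # Row 1 = cytokine-positive cells, row 2 = cytokine-negative cells.
    counts <- matrix(c(25, 49975,   # stimulated: 25 positive out of 50000
                       5,  49995),  # control:     5 positive out of 50000
                     nrow = 2)
    fisher.test(counts, alternative = "greater")$p.value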
In the inst/extdata folder of this package, you'll find a file called arp.csv that contains the Antigen Reagent Project antigen panels and clades.
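If you need those panels in a processing script, you can read the file straight from the installed package (the package name below is a placeholder):

    # Read the Antigen Reagent Project panel/clade lookup shipped with the package.
    # "thisPackage" stands in for the actual package name.
    arp <- read.csv(system.file("extdata", "arp.csv", package = "thisPackage"),
                    stringsAsFactors = FALSE)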
For QCing BAMA data, we have worked out the following process with the Tomaras lab:
- Build the package and internally QC and sanity-check the data.
- Make sure you run writePerm() in your BAMA code with qc=TRUE. This outputs a CSV QC file for the lab's programmatic QC (see the sketch after this list).
- Write and internally review the BAMA QC report.
- Upload the QC csv file and QC report to COWS: https://atlas.scharp.org/cpas/project/HVTN/Tools%20and%20Reports/Collaborative%20Workspace/Data%20Sets-Reports/begin.view?
- Navigate to the Duke [Tomaras] tab
- Navigate to the LUM folder
- Navigate to the relevant cvdNNN numbered folder, or create one if it doesn't exist
- Use the COWS GUI tools to upload your files. CSV files should have a date in the filename. The BAMA code does this by default.
- Inform the lab by email that the QC report and csv file have been uploaded.
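Exactly how writePerm() is called depends on its definition in functions.R; the sketch below is only a guess at the shape of the call, with the data argument name invented for illustration.

    # Hypothetical sketch: writePerm() comes from functions.R, and everything
    # here except qc = TRUE is an assumed argument name. qc = TRUE produces the
    # dated CSV QC file that gets uploaded to COWS for the lab's programmatic QC.
    writePerm(bama, qc = TRUE)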