Author: Lev Litichevskiy
Last updated: December 18, 2024
This repository contains code and data that can be used to reproduce figures from the Dietary Restriction in Diversity Outbred mice (DRiDO) microbiome manuscript. The starting point for this repository is summarized tables of taxonomic and functional classification results (not fastq files).
L Litichevskiy, M Considine, J Gill, V Shandar, … A Di Francesco, GA Churchill, M Li, CA Thaiss. Interactions between the gut microbiome, dietary restriction, and aging in genetically diverse mice. https://www.biorxiv.org/content/10.1101/2023.11.28.568137
- Taxonomy: data/kraken_matrix_agg_by_stool_ID_n1303x2997.txt with data/kraken_taxonomy_n1303.txt
  - Absolute counts
- Pathways: data/pathabundance_tpm_agg_by_stool_ID_n422x2997.txt
  - TPM (transcripts-per-million) abundances
  - That is, normalized for gene length and sequencing depth
- fastq files available on SRA: PRJNA1054518
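For readers unfamiliar with TPM, the normalization described for the pathway table can be illustrated with a small sketch. It is written in Python for brevity (the repository itself uses R). The feature names, counts, and lengths are made up, and the function shows only the generic TPM formula, not necessarily the exact procedure used to produce the pathway file.

```python
def tpm(counts, lengths_bp):
    """Generic TPM: length-normalize, then scale so values sum to one million."""
    # Reads per kilobase of feature length (corrects for gene length)
    rpk = {f: counts[f] / (lengths_bp[f] / 1000) for f in counts}
    # Per-sample total used as the scaling factor (corrects for sequencing depth)
    total = sum(rpk.values())
    return {f: 1e6 * v / total for f, v in rpk.items()}


# Hypothetical counts and lengths for three features in one sample
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 1000}

result = tpm(counts, lengths_bp)
for feature, value in result.items():
    print(feature, round(value, 1))
```

In this toy sample, geneB has twice the reads of geneA but is also twice as long, so the two end up with the same TPM value.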
This tutorial demonstrates how to import taxonomic data and perform several basic analyses.
- analysis contains .Rmd notebooks used for generating figures
- plots contains figures, i.e. the output of analysis
- scripts contains a mix of .R and .Rmd files used for data processing and running linear models
- results contains the output of scripts
- data contains the inputs to scripts and analysis, including metadata
See here for more details about the overall workflow.
See here for which script was used to produce every figure panel in the manuscript.
Note that there are multiple layers of metadata: sequencing metadata, library metadata, and stool metadata are stored separately. Multiple sequencing IDs (seq.ID) can correspond to the same library ID (lib.ID), and multiple library IDs can correspond to the same stool sample (stool.ID). Mice contributed one or more stool samples.
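The nesting of seq.ID within lib.ID within stool.ID can be sketched as follows. This is a Python illustration only (the repository itself uses R). All IDs and counts are hypothetical, and summing counts across sequencing runs is an assumption about how per-stool tables might be built, not a description of the repository's scripts.

```python
from collections import defaultdict

# Hypothetical ID mappings; the real IDs live in the repository's metadata tables
seq_to_lib = {"seq1": "libA", "seq2": "libA", "seq3": "libB", "seq4": "libC"}
lib_to_stool = {"libA": "stool1", "libB": "stool1", "libC": "stool2"}

# Hypothetical per-sequencing-run counts for two taxa
seq_counts = {
    "seq1": {"taxon1": 10, "taxon2": 0},
    "seq2": {"taxon1": 5, "taxon2": 2},
    "seq3": {"taxon1": 1, "taxon2": 3},
    "seq4": {"taxon1": 7, "taxon2": 7},
}


def aggregate_by_stool(seq_counts, seq_to_lib, lib_to_stool):
    """Sum counts over all sequencing IDs that trace back to the same stool ID."""
    stool_counts = defaultdict(lambda: defaultdict(int))
    for seq_id, taxa in seq_counts.items():
        # Walk up the hierarchy: seq.ID -> lib.ID -> stool.ID
        stool_id = lib_to_stool[seq_to_lib[seq_id]]
        for taxon, n in taxa.items():
            stool_counts[stool_id][taxon] += n
    return {stool: dict(taxa) for stool, taxa in stool_counts.items()}


stool_table = aggregate_by_stool(seq_counts, seq_to_lib, lib_to_stool)
print(stool_table)
```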
Note also that library and stool metadata is embedded in the SRA metadata, but sequencing metadata is not. This is because individual SRA runs correspond to (unique) library IDs. In the unlikely event that a user will need to know which sequencing IDs correspond to each SRA run (i.e. SRR accession), we have made available a mapping between SRR accessions and sequencing IDs. Only a small number of libraries / SRA runs (n=259) correspond to multiple sequencing IDs, i.e. these libraries were sequenced multiple times.
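As a rough illustration of how such a mapping could be used, the Python sketch below groups sequencing IDs by SRR accession and reports the runs that correspond to more than one sequencing ID. The accessions and IDs are placeholders, and the layout of the actual mapping file is not assumed here.

```python
from collections import defaultdict

# Hypothetical (SRR accession, sequencing ID) pairs; these accessions are placeholders
pairs = [
    ("SRR0000001", "seq1"),
    ("SRR0000001", "seq2"),
    ("SRR0000002", "seq3"),
]


def runs_with_multiple_seq_ids(pairs):
    """Return SRA runs whose library was sequenced more than once."""
    seq_ids_by_run = defaultdict(set)
    for srr, seq_id in pairs:
        seq_ids_by_run[srr].add(seq_id)
    # Keep only runs that map to two or more distinct sequencing IDs
    return {srr: sorted(ids) for srr, ids in seq_ids_by_run.items() if len(ids) > 1}


multi = runs_with_multiple_seq_ids(pairs)
print(multi)
```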
This code was run on macOS Big Sur using R v4.2.2. All R packages are available from CRAN or Bioconductor — except for ASReml, which requires a license. ASReml was used for estimating heritability and running linear mixed models. Identical results can be produced using the lme4qtl package (see run_lme4qtl.R for an example).
All analyses except for mediation and QTL mapping were run on a laptop. Mediation analysis was performed on a cluster using Snakemake (Snakefile_mediation, run_mediation_one_diet_one_pheno.R), and QTL mapping was performed on a cluster using R/qtl2 (run_genetic_mapping_rqtl2.R).
Karl Broman’s R/qtl2 was written specifically to handle multi-parent QTL mapping crosses such as DO mice.
QTL mapping was performed as described in Zhang et al., Genetics, 2022 (“Genetic linkage analysis”).
Input files not included in this repo: