Request to understand the LMM models in alpha, beta diversity and Differential abundance. #67

Open
OrsonMM opened this issue Oct 28, 2024 · 5 comments
Labels
question Further information is requested

Comments

@OrsonMM

OrsonMM commented Oct 28, 2024

Dear MicrobiomeStat team,

I very much appreciate your software contribution. I am new to linear mixed models.
Could you please tell me whether I am setting up my data correctly?

My experiment has these variables:

asv variable: taxonomic abundances from the DADA2 output
treat variable: 4 different treatments (A, B, C and D)
time variable: 3 different time points (1, 2, 3)
sample_treatment_time variable: 5 independent samples for each treatment and their respective replicates over time (60 samples in total).

My question is which ASV communities are affected by Treat, by Time, or by the Treat:Time interaction.

I entered my variables for the model in MicrobiomeStat as follows:

group.var = Treat
subject.var = sample_treatment_time
time.var = Time

Could you please explain the form of the model equation?

From the manual, I am not sure whether the same model is used for alpha diversity, beta diversity, and differential abundance of ASVs.

My understanding is that the model is: y ~ time.var + group.var + time.var:group.var + (1 | subject.var)
Is that correct?

Regards

@cafferychen777
Owner

Dear Orson,

Thank you for your interest in MicrobiomeStat and for reaching out with your question about Linear Mixed Models (LMM). We appreciate your detailed description of your experimental design.

From your description, I can see you have:

  • 4 treatments (A, B, C, D)
  • 3 time points
  • 5 independent samples per treatment with replicates over time
  • A total of 60 samples

While the model formula you suggested (y ~ time.var + group.var + time.var:group.var + (1|subject.var)) is generally appropriate for longitudinal microbiome data analysis, to better assist you, could you please specify which MicrobiomeStat function(s) you are using?

Each function might have slightly different implementations to accommodate the specific needs of alpha diversity, beta diversity, and differential abundance analyses.

Once you clarify which function(s) you're working with, I can provide more specific guidance about the model implementation.

Best regards

@OrsonMM
Author

OrsonMM commented Oct 28, 2024

Hi Caffery Yang,

Thanks for the rapid response.

From your response, I understand that each function builds a different model equation.
I have the most doubts about these functions:

  1. alpha diversity
alpha_time_diversity <- generate_alpha_trend_test_long(
  data.obj = rarefy_data_genus,
  alpha.name = c("shannon", "simpson", "observed_species", "chao1", "ace","pielou"),
  depth = NULL,
  time.var = "Time",
  subject.var = "sample_treatment_time",
  group.var = "Treat",
  adj.vars = NULL
  )
  2. Beta diversity
beta_diversity <- generate_beta_trend_test_long(
  data.obj = rarefy_data_genus,
  dist.obj = NULL,
  subject.var = "sample_treatment_time",   # random effect; I do not understand whether it is a random slope or a random intercept
  time.var = "Time", # Fixed effect 
  group.var = "Treat",
  adj.vars = NULL,
  dist.name = c("Jaccard")
)
beta_diversity_volatility <- generate_beta_volatility_test_long(
  data.obj = rarefy_data_genus,
  dist.obj = NULL,
  subject.var = "sample_treatment_time",
  time.var = "Time",
  group.var = "Treat",
  adj.vars = NULL,
  dist.name = c("BC","Jaccard","UniFrac","JS")
)

  3. DA

Here I preferred to use linda because I can specify the model formula myself
(but I am not sure whether it is correct).

model_1 <- linda(
  feature.dat = genus_normalizated_data$feature.tab,
  meta.dat = genus_data$meta.dat,
  formula = '~ Time + Treat + Treat:Time + (1 | sample_treatment_time)', 
  feature.dat.type = c('proportion'),
  prev.filter = 0.1,
  mean.abund.filter = 0,
  max.abund.filter = 0,
  is.winsor = TRUE,
  outlier.pct = 0.03,
  adaptive = TRUE,
  zero.handling = c('imputation'),
  pseudo.cnt = 0.5,
  corr.cut = 0.1,
  p.adj.method = "fdr",
  alpha = 0.05,
  n.cores = 20,
  verbose = TRUE
)

@OrsonMM OrsonMM changed the title Request to understand the MLM models in alpha, beta diversity and Differential abundance. Request to understand the LMM models in alpha, beta diversity and Differential abundance. Oct 28, 2024
@cafferychen777
Owner

Hi Orson,

Thank you for your detailed follow-up questions about the model equations in MicrobiomeStat. I'll explain how each function implements its statistical models:

  1. Alpha Diversity Analysis
    For your alpha_time_diversity call, the function implements a linear mixed effects model of the form:
alpha_diversity ~ Treat * Time + (1 + Time | sample_treatment_time)

This model includes:

  • Fixed effects: Treatment, Time, and their interaction (Treat * Time)
  • Random effects: a random intercept and a random slope for Time for each subject (the subject.var identifier)
  • This allows each subject to have its own baseline and its own trajectory over time
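
Written out as an equation, the model above might look like the following. This is only a sketch: it assumes Time enters as a numeric covariate and that treatment A is the reference level, neither of which is confirmed in this thread.

```latex
y_{ij} = \beta_0 + \beta_{g(i)} + \left(\beta_T + \gamma_{g(i)}\right) t_{ij}
         + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i},\, b_{1i})^\top \sim \mathcal{N}(0, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_{ij} is the alpha diversity of subject i at its j-th time point t_{ij}, and g(i) is the treatment of subject i (with \beta_A = \gamma_A = 0 for the reference level). \beta_g are the Treat main effects, \beta_T is the Time slope, \gamma_g are the Treat:Time interaction terms, and b_{0i}, b_{1i} are the subject-specific random intercept and random slope.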
  2. Beta Diversity Analysis
    For your beta_diversity call, the function attempts two model structures in order of complexity:

First tries:

Jaccard_distance ~ Treat * Time + (1 + Time | sample_treatment_time)

If that fails to converge, automatically simplifies to:

Jaccard_distance ~ Treat * Time + (1 | sample_treatment_time)
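
The try-then-simplify idea can be sketched with lme4 as below. This is not MicrobiomeStat's internal code; the data are simulated, the column names (`subject`, `distance`, `rep_id`) and the helper `fit_with_fallback` are hypothetical, and treating any error or warning as "failed to converge" is a simplifying assumption.

```r
library(lme4)

set.seed(42)
# Simulated stand-in for a long-format distance table (hypothetical columns).
d <- expand.grid(Time = 1:3, rep_id = 1:5, Treat = c("A", "B", "C", "D"))
d$subject  <- paste(d$Treat, d$rep_id, sep = "_")  # 20 subjects, 3 rows each
d$distance <- 0.3 + 0.05 * d$Time + rnorm(nrow(d), sd = 0.05)

fit_with_fallback <- function(data) {
  # Simpler model: random intercept only.
  simple <- function(cond) {
    lmer(distance ~ Treat * Time + (1 | subject), data = data)
  }
  # Try random intercept + random slope first; fall back on any
  # error or warning (singular-fit messages are not caught here).
  tryCatch(
    lmer(distance ~ Treat * Time + (1 + Time | subject), data = data),
    error = simple,
    warning = simple
  )
}

fit <- fit_with_fallback(d)
formula(fit)  # shows which random-effects structure was kept
```

Inspecting `formula(fit)` afterwards tells you whether the random slope survived, which is worth reporting alongside the results.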

For your beta_diversity_volatility call, this is actually a different type of analysis. It:

  1. First calculates volatility (rate of change between consecutive timepoints) for each subject

  2. Then fits a simple linear model: volatility ~ Treat
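
The two steps above can be sketched in base R. The sketch assumes volatility is the average distance between consecutive time points divided by the elapsed time; the exact definition inside generate_beta_volatility_test_long may differ, and the data frame `pairs` with its columns is hypothetical, simulated data.

```r
set.seed(1)
# Hypothetical table: one row per subject and pair of consecutive time points.
subjects <- sprintf("S%02d", 1:20)
pairs <- data.frame(
  subject  = rep(subjects, each = 2),
  Treat    = rep(rep(c("A", "B", "C", "D"), each = 5), each = 2),
  t_from   = rep(c(1, 2), times = 20),
  t_to     = rep(c(2, 3), times = 20),
  distance = runif(40, 0.2, 0.8)  # simulated distances, not real results
)

# Step 1: rate of change per consecutive pair, averaged within each subject.
pairs$rate <- pairs$distance / (pairs$t_to - pairs$t_from)
volatility <- aggregate(rate ~ subject + Treat, data = pairs, FUN = mean)
names(volatility)[names(volatility) == "rate"] <- "volatility"

# Step 2: ordinary linear model with one volatility value per subject.
fit <- lm(volatility ~ Treat, data = volatility)
summary(fit)$coefficients
```

Because each subject contributes a single volatility value, no random effect is needed at this stage.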

  3. Differential Abundance Analysis (linda)
    Your formula is well-structured:

abundance ~ Time + Treat + Treat:Time + (1 | sample_treatment_time)

This model:

  • Tests main effects of Time and Treatment
  • Tests their interaction
  • Includes random intercepts for each Sample
  • The function also applies a CLR transformation to the abundances and handles zeros and outliers according to your zero.handling and is.winsor settings (imputation and winsorization in your call)
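
For intuition, a centered log-ratio (CLR) transform of one sample can be sketched in base R as below. This only illustrates the transform itself; linda does more internally, and adding a pseudo-count before taking logs is an assumption made here for simplicity (your call uses imputation instead). The helper `clr` and the example counts are hypothetical.

```r
# CLR: log abundances centered by their mean within the sample.
clr <- function(x, pseudo = 0.5) {
  lx <- log(x + pseudo)  # pseudo-count avoids log(0); an assumption here
  lx - mean(lx)
}

counts <- c(120, 30, 0, 850)  # made-up counts for four taxa in one sample
clr_values <- clr(counts)
clr_values
```

The CLR values of a sample sum to zero, so each value is read relative to the sample's geometric mean rather than as an absolute abundance.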

Some suggestions for your analysis:

  1. For the alpha and beta trend analyses, the default inclusion of random slopes is appropriate for longitudinal data but may not converge with only 3 timepoints. Don't worry if this happens - the functions will automatically simplify to random intercepts.

  2. Make sure your "sample_treatment_time" variable uniquely identifies the subjects that are measured repeatedly. Each independent sample should have the same identifier at all of its timepoints.

  3. For linda, you could consider matching the alpha/beta diversity models by using:

~ Time + Treat + Treat:Time + (1 + Time | sample_treatment_time)

Though your current random intercept model is also perfectly valid.
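
The identifier check in suggestion 2 can be sketched in base R. The metadata below is simulated to match the design described in this thread (4 treatments, 3 timepoints, 5 replicates); the columns `subject_ok` and `subject_bad` and the helper `check_subject_ids` are hypothetical names used only for illustration.

```r
# Simulated metadata: 4 treatments x 5 replicates x 3 timepoints = 60 rows.
meta <- expand.grid(rep_id = 1:5, Treat = c("A", "B", "C", "D"), Time = 1:3)

# Good: the same ID at every timepoint. Bad: a different ID for every sample.
meta$subject_ok  <- paste(meta$Treat, meta$rep_id, sep = "_")
meta$subject_bad <- paste(meta$Treat, meta$rep_id, meta$Time, sep = "_")

check_subject_ids <- function(meta, subject.var, time.var) {
  counts <- table(meta[[subject.var]], meta[[time.var]])
  list(
    n_subjects   = nrow(counts),
    repeated     = all(rowSums(counts > 0) > 1),  # every subject seen at >1 timepoint
    one_per_time = all(counts <= 1)               # no duplicate subject-time rows
  )
}

ok  <- check_subject_ids(meta, "subject_ok", "Time")
bad <- check_subject_ids(meta, "subject_bad", "Time")
str(ok)
str(bad)
```

If the check on your own metadata looks like `bad` (as many subjects as samples, none repeated), the random effect cannot separate subject-level variation from residual noise, and the identifier should be rebuilt so that it stays constant across timepoints.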

Overall, your implementation looks appropriate for your experimental design (4 treatments, 3 timepoints, 5 replicates per treatment-timepoint combination). Let me know if you need any clarification about specific aspects of these models.

Best regards,
Chen

@cafferychen777
Owner

PS: I'd like to encourage you to explore MicrobiomeStat's rich visualization capabilities to complement your statistical analyses.

@cafferychen777 cafferychen777 added the question Further information is requested label Oct 28, 2024
@OrsonMM
Author

OrsonMM commented Oct 28, 2024

I really appreciate your help, @cafferychen777.
