Weights for Samples of LinDA Modelling #59
Comments
Dear Dario Strbenac, Thank you for your feedback regarding the LinDA analysis in MicrobiomeStat. I appreciate you bringing this to our attention. To ensure we address your concern effectively, could you please provide some additional clarification?
Your insights will greatly help us improve MicrobiomeStat and ensure it meets the needs of our users. Thank you for your time and contribution to the project. Best regards,
Oops, inverse weights would not make sense. Standard weights would. We noticed the issue when we identified the same species of bacteria as another team, who also did qPCR validation, but with the opposite fold change to the one they published. The comparison was healthy volunteers versus cancer-adjacent normal tissue. Note that two Healthy group samples have high proportions.
> healthy[1, ] # Just the ten healthy volunteer samples, a subset of the whole matrix.
                    Oral_1-N Oral_2-N Oral_3-N Oral_4-N Oral_5-N Oral_6-N Oral_7-N Oral_8-N Oral_9-N Oral_10-N
Cutibacterium acnes        0        0    0.113      0.1     0.69    0.025     0.71    0.093        0     0.000
The high proportion is an artefact of only three species being detected in samples Oral_5-N and Oral_7-N.
> colSums(healthy > 0)
Oral_1-N Oral_2-N Oral_3-N Oral_4-N Oral_5-N Oral_6-N Oral_7-N Oral_8-N Oral_9-N Oral_10-N
       2        0       14       14        3        5        3        7        6        15
The code to reproduce the contradictory Cutibacterium acnes result is:
library(MicrobiomeStat)
load("testLindaRobustness.RData")
normalsCancerFit <- linda(bacteriaMatrixNormals, clinicalTableNormals, "~ Age + Smoking + Gender + `Tissue Type`",
"proportion")
results <- normalsCancerFit[["output"]][["`Tissue Type`Normal"]]
results["Cutibacterium acnes", c("log2FoldChange", "lfcSE", "stat", "pvalue", "padj")]
                    log2FoldChange    lfcSE      stat       pvalue         padj
Cutibacterium acnes      -8.123449 1.859128 -4.369493 5.445916e-05 0.0009802648
If I plot the proportions, I see that there are two outliers in the Healthy group and one outlier in the Normal group.
library(ggplot2)
theme_set(theme_bw())
plotData <- data.frame(sampleID = colnames(bacteriaMatrixNormals),
proportion = bacteriaMatrixNormals["Cutibacterium acnes", ],
group = clinicalTableNormals[, "Condition"])
ggplot(plotData, aes(x = sampleID, y = proportion)) + geom_bar(stat = "identity") +
facet_grid(cols = vars(group), scales = "free_x", space = "free_x") +
geom_hline(yintercept = 1.00, linetype = "dashed", colour = "red") +
theme(axis.text.x = element_text(angle = 90)) + ggtitle("Cutibacterium acnes Proportion")
In statistics, it is popular to down-weight samples instead of discarding them. So, ideally, I would like to be able to do something like:
myWeights <- colSums(proportionsMatrix > 0)
sampleInfo$myWeights <- myWeights
linda(proportionsMatrix, sampleInfo, "~ status", "proportion", weights = "myWeights")
Winsorisation makes almost no difference. I tried
I notice that the top-ranked bacteria in a `linda` analysis are sometimes caused by outliers. This happens when some samples have five species and other samples have fifty species detected, for instance. I would like `linda` to have an option for weighting of samples inversely to the number of species detected for each sample. `lm` has a `weights` parameter. `linda` should also support weights.
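To make the request concrete, here is a minimal sketch of what sample weights do in base R's `lm`. It does not use `linda` and says nothing about how `linda` is implemented. The data frame `toy`, its column names, and all of its values are invented for illustration. Following the later correction in this thread, each sample is weighted by its number of detected species, not by the inverse.

```r
# Toy data, invented for illustration only (not taken from the issue's data set).
# The third Healthy sample has few detected species and an inflated abundance.
toy <- data.frame(
  logAbundance    = c(-3.1, -3.3, -0.4, -3.0, -5.2, -5.0, -5.4, -4.9),
  status          = factor(rep(c("Healthy", "Normal"), each = 4)),
  speciesDetected = c(14, 15, 3, 14, 12, 16, 13, 15)
)

# Ordinary least squares: every sample counts equally.
unweightedFit <- lm(logAbundance ~ status, data = toy)

# Weighted least squares: samples with few detected species count less.
weightedFit <- lm(logAbundance ~ status, data = toy, weights = speciesDetected)

# Estimated Normal-versus-Healthy difference under each fit.
unweightedEffect <- coef(unweightedFit)[["statusNormal"]]
weightedEffect   <- coef(weightedFit)[["statusNormal"]]
print(c(unweighted = unweightedEffect, weighted = weightedEffect))
```

In this toy case the low-richness sample pulls the unweighted Healthy mean upwards, so the weighted estimate of the group difference is smaller in magnitude. A `weights` option in `linda` would presumably pass such a vector through to its underlying model fits, but that is an assumption about a feature that does not exist yet.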