clustering plot (Figure 1e) #206

daccachejoe · 2023-09-27T18:17:14Z

Hi -

I've gotten the tool up and running, and it seems to be working well. However, I am trying to understand how well my genotypes are defined/separated. How can I do so? I tried to recreate the clustering plot in Figure 1 of the paper, but I am not sure what was used to create those visualizations. Is there a meaningful way to compare the inferred genotypes?

Ultimately, this is a QC measure for my data. I want to ensure that my demultiplexing is robust, and having a visualization for that would be helpful.

Thanks!
Joe

wheaton5 · 2023-09-27T20:27:32Z

I've been meaning to make a script for this for a long time. I take the clusters_tmp.tsv file, take the log-likelihood columns, and normalize them row-wise by dividing by either the mean or the max (I can't remember which). Then I run a PCA on that matrix. Mathematically it doesn't make much sense, but it does provide a nice visualization.
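A minimal base-R sketch of the procedure described above, since the original script was lost. It assumes the per-cluster log-likelihood columns are named `cluster0`, `cluster1`, and so on, as in `clusters.tsv`; whether `clusters_tmp.tsv` uses the same names is an assumption. The helper name `cluster_pca` and the toy values are hypothetical, and this is not the code behind Figure 1e.

```r
# Hypothetical helper (not part of souporcell): row-normalize per-cluster
# log-likelihoods and project cells onto the first two principal components.
cluster_pca <- function(df, method = c("max", "mean")) {
  method <- match.arg(method)
  # Assumed naming: one log-likelihood column per cluster (cluster0, cluster1, ...)
  ll_cols <- grep("^cluster[0-9]+$", colnames(df), value = TRUE)
  ll <- as.matrix(df[, ll_cols, drop = FALSE])
  # Row-wise scale factor: each cell's max (least negative) or mean log-likelihood
  denom <- if (method == "max") apply(ll, 1, max) else rowMeans(ll)
  # A vector of length nrow(ll) recycles down the columns, so row i is divided by denom[i]
  ll_norm <- ll / denom
  # Centered but unscaled PCA on the normalized matrix
  pca <- prcomp(ll_norm, center = TRUE, scale. = FALSE)
  data.frame(barcode = df$barcode, PC1 = pca$x[, 1], PC2 = pca$x[, 2])
}

# Toy input with made-up log-likelihoods for six cells and three clusters
toy <- data.frame(
  barcode  = paste0("cell", 1:6),
  cluster0 = c(-100, -105, -300, -310, -290, -295),
  cluster1 = c(-300, -310, -100, -102, -280, -285),
  cluster2 = c(-290, -295, -280, -285, -101, -99)
)
res <- cluster_pca(toy, method = "max")
print(res)
```

With real data, `df` would be the table read from the souporcell output (for example with `read.delim`), and the PC1/PC2 coordinates can then be colored by the `assignment` column. Because the log-likelihoods are negative, dividing by the row max gives 1 for the best-scoring cluster and values above 1 for the others.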

daccachejoe · 2023-10-04T18:00:01Z

Would you mind sharing the code you used to generate that plot? I can try it on my data. It could be the package you used to run the PCA, but when I did as you described with min-max normalization, I got a very odd-looking PCA plot. Knowing me, I probably went astray somewhere along the way, so looking at the original code would be very helpful. Thanks!

zheng-sc · 2023-11-13T14:44:32Z

May I have the code for the cluster visualization? Thanks a lot!!!

wheaton5 · 2023-11-15T22:25:18Z

Sorry, grant stuff came up and I got busy. I will try to get to this this weekend.

wheaton5 · 2023-11-15T22:29:10Z

I looked, and I don't have the code anymore, so I need to recreate it.

daccachejoe · 2023-11-15T22:37:42Z

Let me see if my code can help at all. At my lab meeting, the plot above was flagged as likely incorrect, but maybe you can share your thoughts on it.

```r
library(dplyr)
library(ggplot2)
library(readr)
library(FactoMineR)

# souporcell.dir: path to the souporcell output directory (defined elsewhere)
pca.df <- read_tsv(file.path(souporcell.dir, "clusters.tsv"))

# Label unassigned cells and doublets (assignments like "0/2") as "NA", then keep
# only the per-cluster log-likelihood columns (cluster0, cluster1, ...) and the assignment
pca.df <- pca.df %>%
  mutate(assignment = ifelse(status == "unassigned" | grepl("/", assignment),
                             "NA", assignment)) %>%
  select(starts_with("cluster"), assignment) %>%
  mutate(assignment = factor(assignment)) %>%
  as.data.frame()

cluster.cols <- grep("^cluster", colnames(pca.df))

# Max-normalize each cell's log-likelihoods; it's crude and potentially why my plot is weird
pca.df[, cluster.cols] <- t(apply(pca.df[, cluster.cols], 1, function(x) x / max(x)))

# PCA on the normalized columns, with the assignment as a supplementary qualitative variable
res.pca <- PCA(pca.df, scale.unit = FALSE, ncp = 20, graph = TRUE,
               quali.sup = which(colnames(pca.df) == "assignment"))
plot.PCA(res.pca, axes = c(1, 2), choix = "ind")

# Re-plot the first two dimensions with ggplot2, colored by assignment
res.pca.ggplot.df <- as.data.frame(res.pca$ind$coord)
res.pca.ggplot.df$assignment <- pca.df$assignment
ggplot(res.pca.ggplot.df, aes(x = Dim.1, y = Dim.2, color = assignment)) +
  geom_point(size = 0.5) +
  # five colors: assumes four clusters plus "NA"
  scale_color_manual(values = c("red", "green", "blue", "yellow", "gray")) +
  theme_classic()
```
