clustering plot (Figure 1e) #206

Open
daccachejoe opened this issue Sep 27, 2023 · 6 comments

@daccachejoe

Hi -

I've gotten the tool up and running and it seems to be working well. However, I am trying to understand how well my genotypes are defined/separated. How can I do so? I tried to recreate the clustering plot in Figure 1 of the paper, but I am not sure what is used to create those visualizations. Is there a meaningful way to compare the inferred genotypes?

Ultimately this is a QC measure for my data. I want to ensure that my demultiplexing is robust, and having a visualization for that would be helpful.

Thanks!
Joe

@wheaton5
Owner

I've been meaning to make a script for this for a long time. I take the clusters_tmp.tsv file and pull out the log-likelihood columns, then normalize them row-wise by dividing by either the mean or the max (I can't remember which). Then I run a PCA on that matrix. Mathematically it doesn't make much sense, but it does provide a nice visualization.
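
The steps described above (row-wise normalization of the log-likelihood matrix, then PCA) might look roughly like the following in base R. This is only a sketch under stated assumptions, not the original script: the data are synthetic, the `cluster0`, `cluster1`, ... column names and all variable names are hypothetical, and dividing by the row mean is just one of the two normalizations mentioned.

```r
# Sketch only: synthetic data stands in for the per-cluster log-likelihood
# columns of clusters_tmp.tsv (the real file layout is not shown in this thread).
set.seed(1)
n_cells <- 300
k <- 3
truth <- sample(seq_len(k), n_cells, replace = TRUE)

# Fake log likelihoods: all negative, least negative in each cell's own cluster.
loglik <- matrix(-runif(n_cells * k, min = 500, max = 1000), nrow = n_cells)
loglik[cbind(seq_len(n_cells), truth)] <- -runif(n_cells, min = 50, max = 200)
colnames(loglik) <- paste0("cluster", seq_len(k) - 1)

# Row-wise normalization: divide each row by its mean.
# To try the other option mentioned, replace rowMeans(loglik)
# with apply(loglik, 1, max).
normalized <- sweep(loglik, 1, rowMeans(loglik), "/")

# PCA on the normalized matrix, centered but not rescaled.
pca <- prcomp(normalized, center = TRUE, scale. = FALSE)
scores <- as.data.frame(pca$x[, 1:2])
scores$assignment <- factor(truth - 1)

# Draw the first two components, colored by cluster, in interactive sessions.
if (interactive()) {
  plot(scores$PC1, scores$PC2, col = scores$assignment, pch = 16, cex = 0.5,
       xlab = "PC1", ylab = "PC2")
}
```

With real data, the synthetic `loglik` matrix would be replaced by the log-likelihood columns read from the souporcell output, and `truth` by the assigned cluster for each barcode.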

@daccachejoe
Author

daccachejoe commented Oct 4, 2023

Would you mind sharing the code you used to generate that plot? I can try it on my data. It could be down to the package you used to run PCA, but when I did as you described with min-max normalization, I got this very odd-looking PCA plot. Knowing me, I probably went astray somewhere along the way, so looking at the original code would be very helpful. Thanks!
[image: PCA plot]

@zheng-sc

May I have the code for the cluster visualization? Thanks a lot!

@wheaton5
Owner

Sorry, grant stuff came up and I got busy. I will try to get to this over the weekend.

@wheaton5
Owner

I looked and I don't have the code anymore, so I need to recreate it.

@daccachejoe
Author

Let me see if my code can help at all. At my lab meeting, the plot above was judged to be likely incorrect, but maybe you can share your thoughts on it.

library(dplyr)
library(ggplot2)
library(readr)
library(FactoMineR)

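# Read the barcode assignments.
# souporcell.dir must be defined beforehand and end with a path separator.
pca.df <- read_delim(paste0(souporcell.dir, "clusters.tsv"), delim = "\t")

# Keep the per-cluster score columns (names starting with "c") plus the assignment.
# Unassigned cells and cells whose assignment contains "/" are labeled "NA".
pca.df <- pca.df %>%
  mutate(assignment = ifelse(status == "unassigned" | grepl("/", assignment),
                             "NA",
                             assignment)) %>%
  select(matches("^c"), assignment)

# Max-normalize each row (columns 1:4 assume four clusters).
# It's crude and potentially the source of why my plot is weird.
# If the scores are all negative (as log likelihoods usually are), max(x) is the
# value closest to zero, so every ratio is >= 1.
pca.df[, 1:4] <- t(apply(pca.df[, 1:4], 1, function(x) x / max(x)))

# PCA on the four score columns; column 5 (assignment) is a supplementary
# qualitative variable and does not influence the components.
res.pca <- PCA(pca.df, scale.unit = FALSE, ncp = 20, graph = TRUE, quali.sup = 5)
plot.PCA(res.pca, axes = c(1, 2), choix = "ind")

# Plot the first two components, colored by assignment (four clusters plus "NA").
res.pca.ggplot.df <- as.data.frame(res.pca[["ind"]][["coord"]])
res.pca.ggplot.df$assignment <- pca.df$assignment
res.pca.ggplot.df %>%
  ggplot(aes(x = Dim.1, y = Dim.2, color = assignment)) +
  geom_point(size = 0.5) +
  scale_color_manual(values = c("red", "green", "blue", "yellow", "gray")) +
  theme_classic()
dev.off()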
