Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible batch effect in ATAC clusters in scMultiome data #1875

Closed
kryptonitecircuit2002 opened this issue Dec 27, 2024 · 0 comments
Closed

Comments

@kryptonitecircuit2002
Copy link

I am analysing an scMultiome data using the following code:

#Read files
input.10X.Control <- Read10X_h5("D:/Transit/Single_Cell+ATAC/Combined_Analysis/Control_sample/filtered_feature_bc_matrix_control.h5")
fragpath.Control <- "D:/Transit/Single_Cell+ATAC/Combined_Analysis/Control_sample/atac_fragments_control.tsv.gz"
metadata.Control <- read.csv(
  file = "D:/Transit/Single_Cell+ATAC/Combined_Analysis/Control_sample/per_barcode_metrics_control.csv",
  header = TRUE,
  row.names = 1
)
## Create Seurat object
Control <- CreateSeuratObject(counts = input.10X.Control$`Gene Expression`, 
                           assay = 'RNA',
                           project = sampleID,
                           meta.data = metadata.Control)

Control[["percent.mt"]] <- PercentageFeatureSet(Control, pattern = "^MT-")

#Get annotations 
#annotaion from GTF
gref.path = "D:/Transit/Single_Cell+ATAC/Control/ATAC/Data+Analysis/Analysis.Control/Danio_rerio.GRCz11.105.filtered.gtf"
gene.coords.zf <- rtracklayer::import(gref.path)
genome(gene.coords.zf) <- 'GRCz11'
gene.coords.zf <- gene.coords.zf[! is.na(gene.coords.zf$gene_name),]

##Add ATAC data
atac_counts <- input.10X.Control$Peaks
grange.counts <- StringToGRanges(rownames(atac_counts), sep = c(":", "-"))
grange.use <- seqnames(grange.counts) %in% standardChromosomes(grange.counts)
atac_counts <- atac_counts[as.vector(grange.use), ]
Control[['ATAC']] <- CreateChromatinAssay(
  counts = atac_counts,
  sep = c(":", "-"),
  genome = 'GRCz11',
  fragments = fragpath.Control,
  min.cells = 10,
  annotation = gene.coords.zf
)
#QC
DefaultAssay(Control) <- "ATAC"
Control <- NucleosomeSignal(Control)
Control <- TSSEnrichment(Control, fast = FALSE)
Control$pct_reads_in_peaks <- Control$atac_peak_region_fragments / Control$atac_fragments * 100

#Peak calling
system('bash -c "/home/ayush/.local/bin/macs2 callpeak -t /mnt/d/Transit/Single_Cell+ATAC/Combined_Analysis/Control_sample/atac_fragments_control.tsv.gz -g 1.4e+09 -f BED --nomodel --extsize 200 --shift -100 -n Control_cerbx_script --outdir /mnt/d/Transit/Single_Cell+ATAC/Control/ATAC/Data+Analysis/Analysis.Control/Cerebrocortex" ', 
                wait = TRUE,  ignore.stderr = FALSE,  ignore.stdout = FALSE)
peak_path <- "D:/Transit/Single_Cell+ATAC/Control/ATAC/Data+Analysis/Analysis.Control/Cerebrocortex/Control_cerbx_script_peaks.narrowPeak"
peaks <- rtracklayer::import(peak_path, format = "narrowPeak")
peaks <- keepStandardChromosomes(peaks, pruning.mode = 'coarse')

# quantify counts in each peak
macs_count <- FeatureMatrix(fragments = Fragments(Control),
                            features = peaks,
                            cells = colnames(Control))

Control[['ATAC']] <- CreateChromatinAssay(
  counts = macs_count,
  sep = c(":", "-"),
  genome = 'GRCz11',
  fragments = fragpath.Control,
  min.cells = 10,
  annotation = gene.coords.zf
)

Control <- subset(
  x = Control,
  subset = nCount_ATAC < 50000 &
    nCount_RNA < 10000 &
    nucleosome_signal < 1.5 &
    TSS.enrichment > 1 &
    TSS.enrichment < 10)

#preprocess <- function(Control, n.pc = 30, n.lsi = 10){
  
 #Filter peaks/genes 
  
 tmp <- Matrix::rowSums(Control[["RNA"]]@layers$counts > 0)
 Control[["RNA"]] <- subset(Control[["RNA"]], features = names(which(tmp >= 10)))
 tmp <- Matrix::rowSums(Control[["ATAC"]]@counts > 0)
 Control[["ATAC"]] <- subset(Control[["ATAC"]], features = names(which(tmp >= 10)))
  
# ATAC-seq
  DefaultAssay(Control) <- "ATAC"
  Control <- RunTFIDF(Control, method = 3)
  Control <- FindTopFeatures(Control, min.cutoff = 'q75')
  Control <- RunSVD(Control)
  Control <- RunUMAP(Control, reduction = 'lsi', dims = 2:10, assay = 'ATAC',
                    reduction.name = "umap.atac", reduction.key = "atacUMAP_")
  Control <- FindNeighbors(Control, reduction = 'lsi', dims = 2:10, assay = 'ATAC')
  Control <- FindClusters(Control, graph.name = 'ATAC_snn', algorithm = 3, resolution = 0.3)

However, when I am running the code:
DimPlot(Control, reduction = 'umap.atac', label = T)

I am getting the following ATAC clusters:
ATAC

Is this arising because of possible batch effects? How do I solve this?

@stuart-lab stuart-lab locked and limited conversation to collaborators Jan 4, 2025
@timoast timoast converted this issue into discussion #1879 Jan 4, 2025

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant