Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with LinkPeaks: Genes and Peaks Located on Different Chromosomes #1858

Open
nh-codem opened this issue Nov 28, 2024 · 5 comments
Open
Labels
bug Something isn't working

Comments

@nh-codem
Copy link

Hi, I encountered an unusual result while using the LinkPeaks function: some gene-peak linkages (e.g., TMEM92) are not within the specified distance threshold defined by the distance = 500000 parameter and, surprisingly, are located on entirely different chromosomes.

Could you please provide insights into why this might happen? Are there any specific parameters or preprocessing steps I should double-check to address this issue?

Thank you for your help!

Here's the code:

objTemp_CRE <- LinkPeaks(objTemp_CRE, peak.assay = "peaks", distance = 500000, expression.assay = "RNA")
Links(objTemp_CRE)

image

I reviewed other similar issues, and apart from the seqlevels of my objTemp_CRE object returning NULL, I didn't notice any specific anomalies in my seqlevels settings.

> objTemp_CRE
An object of class Seurat 
30050 features across 1863 samples within 2 assays 
Active assay: peaks (30000 features, 28515 variable features)
 2 layers present: counts, data
 1 other assay present: RNA
 3 dimensional reductions calculated: lsi, harmony, umap

> objTemp_CRE[["peaks"]]
ChromatinAssay data with 30000 features for 1863 cells
Variable features: 28515 
Genome: 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 48 

> seqlevels(granges(objTemp_CRE))
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20"

> seqlevels(Annotation(objTemp_CRE))
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chrX"  "chrMT"

> seqlevels(objTemp_CRE)
NULL

image

@nh-codem nh-codem added the bug Something isn't working label Nov 28, 2024
@timoast
Copy link
Collaborator

timoast commented Dec 8, 2024

Can you show the full code you're using?

@nh-codem
Copy link
Author

nh-codem commented Dec 8, 2024

Can you show the full code you're using?

oh, yep, here's the code I used: @timoast

pacman::p_load(Seurat,Signac,dplyr,ggplot2,future)
future::plan(future::multicore(workers = future::availableCores() - 10 )); options(future.globals.maxSize = 1e+12, future.seed = TRUE,future.rng.onMisuse = 'ignore', bitmapType='cairo')
load('macFas5_genomeInfo_signac.RData')

obj.cre <- readRDS("signac_final.Rds")
obj.cre[["RNA"]] <- readRDS("CCA_imputeGS.Rds") %>% GetAssay(., assay = "RNA") ### CCA Imputed GeneExpr Data
obj.cre[["GeneActivity"]] <- NULL ### Drop Raw GeneActivity

# *objTemp_CRE is a Test Data
DefaultAssay(obj.cre) <- "peaks"
objTemp_CRE <- subset(obj.cre, cytpe == "XXX", features = sample(rownames(obj.cre), 1e+04))
DefaultAssay(obj.cre) <- "eRNA"
objTemp_CRE[["RNA"]] <- subset(obj.cre, cytpe == "XXX",features = sample(rownames(obj.cre), 100)) %>% GetAssay(., assay = "RNA")

# *Perform linking of peaks to genes
DefaultAssay(objTemp_CRE) <- "peaks"
objTemp_CRE <- objTemp_CRE  %>% RegionStats(genome = BSgenome.Mfascicularis.NCBI.5.0)
objTemp_CRE <- LinkPeaks(objTemp_CRE, peak.assay = "peaks", distance = 500000, expression.assay = "RNA")

@nh-codem
Copy link
Author

nh-codem commented Dec 9, 2024

However, while processing genome annotations, I used the following code to modify chromosome names:

suppressPackageStartupMessages(library(BSgenome.Mfascicularis.NCBI.5.0))
seqnames(BSgenome.Mfascicularis.NCBI.5.0) = gsub("MFA", "chr", seqnames(BSgenome.Mfascicularis.NCBI.5.0))

I am unsure whether these main steps might cause any anomalies.
@timoast

@timoast
Copy link
Collaborator

timoast commented Dec 13, 2024

Can you show the code used to create signac_final.Rds?

@nh-codem
Copy link
Author

@timoast

# 1.0 Prepare custom genome annotation
pacman::p_load(rtracklayer,plyranges,dplyr)
{
    # Prepare Genome ----------------------------------------------------------1
    suppressPackageStartupMessages(library(BSgenome.Mfascicularis.NCBI.5.0))
    seqnames(BSgenome.Mfascicularis.NCBI.5.0)=gsub("MFA","chr",seqnames(BSgenome.Mfascicularis.NCBI.5.0))
    seqinfo_macFas5 <- seqinfo(BSgenome.Mfascicularis.NCBI.5.0) %>% as.data.frame %>% 
        tibble::rownames_to_column("seqnames") %>% 
        filter(grepl('^chr', seqnames)) %>% tibble %>% 
        {Seqinfo(seqnames = .$seqnames, seqlengths = .$seqlengths, isCircular = .$isCircular, genome = 'macFas5')}
    
    genome.blacklist <- readRDS('macFas5.huada.backlist.Rds') 
    genome.annotation <- rtracklayer::import('Macaca_fascicularis_5.0.91.2.2_withchrname.gtf') %>% 
        plyranges::mutate(tx_id = transcript_id, tx_name = transcript_name) %>% 
        plyranges::mutate(gene_name = if_else(is.na(gene_name),gene_id,gene_name)) %>% 
        plyranges::filter(! seqnames %in% c("M","chrM","MT","X","Y","chrX","chrY"))
} %>% invisible %>% suppressMessages %>% suppressWarnings
save.image('macFas5_genomeInfo_signac.RData')


# 1.1 Create a peak set for multiple libraries and generate a separate Signac object for each Animal
pacman::p_load(Seurat,Signac,rtracklayer,plyranges,dplyr, future)
plan(multicore(workers = 10)); options(future.globals.maxSize = 1e+12, bitmapType='cairo')
load('macFas5_genomeInfo_signac.RData')

{
    fragpath.lst <- list.files('sn_cre',"*.fragments.tsv.bgzip.gz$", full.names = T)
    peaks_all <- CallPeaks(fragpath.lst, effective.genome.size = 2.9e+09) %>% keepStandardChromosomes(., pruning.mode = "coarse")
    peaks_flt <- peaks_all %>% keepStandardChromosomes(., pruning.mode = "coarse") %>% 
        plyranges::filter(!seqnames %in% c("M","chrM","MT","X","Y","chrX","chrY")) 
    
    
    catalog_libs <- readxl::read_excel("Sample_libinfo.xlsx",sheet = "cre.matrix") %>% select(1:10) %>% dplyr::rename("orig.ident" = "LibName")
    lapply(setNames(unique(catalog_libs$Animal),unique(catalog_libs$Animal)), function(animal){
        fragpath.lst <- catalog_libs %>% dplyr::filter(Animal == !!animal) %>% pull(ServerPath) %>% unique()
        frags.obj <- lapply(fragpath.lst, function(frags){
            CB <- data.table::fread(frags, select = 4, sep = "\t", header = FALSE) %>% pull(1) %>% unique()
            return({CreateFragmentObject(path = frags,cells = CB, validate.fragments = F, verbose = F)})
        }) %>% invisible %>% suppressMessages
        
        counts_combine  <- frags.obj %>% FeatureMatrix(.,features = peaks_flt, process_n = 1e+04)
        chrom_assay <- CreateChromatinAssay(counts_combine, fragments = frags.obj, annotation = genome.annotation, verbose = F)
        combine_obj <- CreateSeuratObject(counts = chrom_assay, assay = "peaks",verbose = F)
        seqinfo(combine_obj) <- seqinfo_macFas5
        
        combine_obj <- combine_obj %>% NucleosomeSignal(object = ., verbose = F) %>% TSSEnrichment(object = ., fast = F,verbose = F) 
        [email protected] <- [email protected] %>% 
            mutate(
                nFrags = doublet_info[row.names(.), 'nFrags'],
                PRiPs = round(nCount_peaks / nFrags * 100, 2),
            )
        combine_obj$blk_frac  <- FractionCountsInRegion(object = combine_obj, regions = genome.blacklist, assay = 'peaks')
        [email protected] <- [email protected] %>% tibble::rownames_to_column("cellN") %>% 
            mutate(orig.ident = paste0("ATAC-",orig.ident)) %>% 
            plyr::join(catalog_libs,by = "orig.ident") %>% 
            tibble::column_to_rownames("cellN")
        
        return(combine_obj)
    }) -> keep3Anima.Objs
    keep3Anima.Objs %>% qs::qsave(file = "signac_obj_raw.qs") 
}



# 1.2 Quality Control
signac.list <- qs::qread("signac_obj_raw.qs")
lapply(signac.list, function(obj){
    ########## filter ########## 
    return({
        subset(x = obj, subset = 
                   nFrags > 1000 &
                   nCount_peaks > 1000 & 
                   PRiPs > 15 &
                   blk_frac < 0.05 &
                   nucleosome_signal < 4 &
                   TSS.enrichment > 1.7
        )
    })
}) %>% (\(.) merge(.[[1]],.[-1]))() -> combine_obj

combine_obj <- combine_obj %>% RunTFIDF(verbose = F) %>% 
    FindTopFeatures(verbose = F) %>% RunSVD(n = 30,verbose = F) %>% 
    harmony::RunHarmony(group.by.vars = 'Animal',reduction.use = 'lsi',project.dim = F) %>% 
    RunUMAP(reduction = 'harmony', dims = 2:30,verbose = F) %>% 
    FindNeighbors(reduction = 'harmony', dims = 2:30,verbose = F) %>% 
    FindClusters(resolution = 0.5, algorithm = 3,verbose = F)

combine_obj %>% saveRDS("signac_final.Rds")


    

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants