Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

q-values after consensus peak normalization are all 1 #160

Open
kelly-sovacool opened this issue Dec 5, 2023 · 2 comments
Open

q-values after consensus peak normalization are all 1 #160

kelly-sovacool opened this issue Dec 5, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@kelly-sovacool
Copy link
Member

Regardless of dataset used.

This causes an error when running chipseeker_peakplot:

Command error:
  Warning message:
  package 'dplyr' was built under R version 4.3.2
  GRanges object with 58429 ranges and 2 metadata columns:
            seqnames            ranges strand |      pvalue    qvalue
               <Rle>         <IRanges>  <Rle> |   <numeric> <numeric>
        [1]     chr1   3671441-3671716      * | 4.16506e-12         1
        [2]     chr1   4142568-4142910      * | 2.12097e-11         1
        [3]     chr1   4228249-4228601      * | 7.22163e-12         1
        [4]     chr1   4297125-4297414      * | 3.61542e-12         1
        [5]     chr1   4332522-4332831      * | 1.30793e-11         1
        ...      ...               ...    ... .         ...       ...
    [58425]     chrY 90804913-90805435      * | 3.87469e-11         1
    [58426]     chrY 90807619-90808047      * | 4.66823e-11         1
    [58427]     chrY 90808620-90809110      * | 4.36221e-11         1
    [58428]     chrY 90810807-90811291      * | 3.35543e-11         1
    [58429]     chrY 90828800-90829091      * | 4.38834e-12         1
    -------
    seqinfo: 21 sequences from an unspecified genome; no seqlengths
  chr1 dosen't contain signal higher than 1
  chr10 dosen't contain signal higher than 1
  chr11 dosen't contain signal higher than 1
  chr12 dosen't contain signal higher than 1
  chr13 dosen't contain signal higher than 1
  chr14 dosen't contain signal higher than 1
  chr15 dosen't contain signal higher than 1
  chr16 dosen't contain signal higher than 1
  chr17 dosen't contain signal higher than 1
  chr18 dosen't contain signal higher than 1
  chr19 dosen't contain signal higher than 1
  chr2 dosen't contain signal higher than 1
  chr3 dosen't contain signal higher than 1
  chr4 dosen't contain signal higher than 1
  chr5 dosen't contain signal higher than 1
  chr6 dosen't contain signal higher than 1
  chr7 dosen't contain signal higher than 1
  chr8 dosen't contain signal higher than 1
  chr9 dosen't contain signal higher than 1
  chrX dosen't contain signal higher than 1
  chrY dosen't contain signal higher than 1
  Error in UseMethod("group_by") :
    no applicable method for 'group_by' applied to an object of class "list"
  Calls: covplot -> getChrCov -> %>% -> summarise -> group_by
  Execution halted
@kelly-sovacool kelly-sovacool added the bug Something isn't working label Dec 5, 2023
@kelly-sovacool kelly-sovacool changed the title q-values after consensus normalization are all 1 q-values after consensus peak normalization are all 1 Dec 5, 2023
@kelly-sovacool
Copy link
Member Author

Could this be a bug in how normalized q-values are calculated, or is it a feature of this normalization method?

https://github.com/CCBR/nf-modules/blob/60d50f4c45a50378cad70b49013f51750617caaa/modules/CCBR/custom/normalizepeaks/templates/normalize_peaks.R#L45-L61

@kelly-sovacool
Copy link
Member Author

kelly-sovacool commented Jan 3, 2024

We are building a chipseq analysis pipeline and want to be able to create a consensus peak set across samples. We found a normalization method to account for variation in read depth and quality across samples that takes the -log10(p-value) and divides that by the sum of all -log10(p-values) divided by 1 million to create a "score per million". However, we'd also like to have normalized q-values (a.k.a FDR / Benjamini-Hochberg). Our first attempt to calculate normalized q-values resulted in all of them being nearly 1. We're not sure if there's a problem in how we are calculating it or if it's a feature of this normalization method.

Description of the normalization method from the supplementary methods from Corces et al. 2018: image

Our implementation in R:

#' Normalization method from Corces et al.
#' https://doi.org/10.1126/science.aav1898
corces <- function(x) {
  return(x / sum(x) / 10^6)
}

#' Normalize peaks
normalize <- function(peak_dat, norm_method = corces) {
  return(
    peak_dat %>%
      mutate(
        pvalue_norm = norm_method(pvalue_neglog10),
        qvalue_norm = 10^(-pvalue_norm) %>% p.adjust(method = "BH") %>% -log10(.)
      ) %>%
      select(-c(pvalue, qvalue)) %>%
      rename(pvalue = pvalue_norm, qvalue = qvalue_norm)
  )
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant