Consider consensus peak overlap procedure #159

Open
kelly-sovacool opened this issue Nov 30, 2023 · 2 comments

kelly-sovacool commented Nov 30, 2023

During consensus peak calling, consider using the method from Corces et al. (doi:10.1126/science.aav1898) to handle overlapping peaks, rather than bedtools merge. From pp. 6-7 of the supplement:

Peak calling for 796 ATAC-seq profiles and 23 cancer types was performed to ensure high quality fixed-width peaks. We chose to use fixed-width peaks because (i) it makes count based and motif focused analyses less biased to large peaks and (ii) with large datasets merging peak sets to obtain a union peak set can lead to many peaks being merged into one very large peak, limiting our ability to resolve independent peaks. Because each cancer type is not represented by an equal number of samples, we first determined a peak set for each cancer type individually. Initially, performing peak calling with MACS2, we found that peak calls were affected by changes in data quality (TSS enrichment scores ranged from 3.94 to 19 in our dataset) and read depth (range 26 million to 258 million per replicate). To overcome this issue, we designed a peak calling procedure that would produce a set of high confidence peaks. For each sample, peak calling was performed on the Tn5-corrected single-base insertions using the MACS2 callpeak command with parameters “--shift -75 --extsize 150 --nomodel --call-summits --nolambda --keep-dup all -p 0.01”. The peak summits were then extended by 250 bp on either side to a final width of 501 bp, filtered by the ENCODE hg38 blacklist (https://www.encodeproject.org/annotations/ENCSR636HFF/), and filtered to remove peaks that extend beyond the ends of chromosomes.
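
As a rough illustration of the summit extension and filtering steps quoted above, here is a minimal Python sketch. It is not the authors' code. The names `Peak`, `extend_summit`, `within_chromosome`, `overlaps_any`, and `filter_peaks` are hypothetical, coordinates are assumed to follow the BED convention (0-based start, exclusive end), and the blacklist is assumed to be a plain list of `(chrom, start, end)` tuples already loaded into memory.

```python
from typing import Dict, List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive (BED convention, an assumption)
    end: int      # exclusive
    score: float  # significance score; higher means more significant


def extend_summit(chrom: str, summit: int, score: float, flank: int = 250) -> Peak:
    """Extend a single-base summit by `flank` bp on each side (2 * flank + 1 bp total)."""
    return Peak(chrom, summit - flank, summit + flank + 1, score)


def within_chromosome(peak: Peak, chrom_sizes: Dict[str, int]) -> bool:
    """True if the peak does not extend past either end of its chromosome."""
    return peak.start >= 0 and peak.end <= chrom_sizes[peak.chrom]


def overlaps_any(peak: Peak, regions: List[Tuple[str, int, int]]) -> bool:
    """True if the peak shares at least one base with any region in the list."""
    return any(
        chrom == peak.chrom and peak.start < end and start < peak.end
        for chrom, start, end in regions
    )


def filter_peaks(
    peaks: List[Peak],
    chrom_sizes: Dict[str, int],
    blacklist: List[Tuple[str, int, int]],
) -> List[Peak]:
    """Drop peaks that hit the blacklist or run off a chromosome end."""
    return [
        p for p in peaks
        if within_chromosome(p, chrom_sizes) and not overlaps_any(p, blacklist)
    ]
```

The linear scan over the blacklist keeps the sketch short; a real pipeline would more likely use an interval tree or an existing tool for the overlap test.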

Overlapping peaks called within a single sample were handled using an iterative removal procedure. First, the most significant peak is kept and any peak that directly overlaps with that significant peak is removed. Then, this process iterates to the next most significant peak and so on until all peaks have either been kept or removed due to direct overlap with a more significant peak. This prevents the removal of peaks due to “daisy chaining” or indirect overlap and simultaneously maintains a compendium of fixed-width peaks. This resulted in a set of fixed-width peaks for each sample which we refer to here as a “sample peak set”.
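
The iterative removal procedure described above can be sketched in a few lines of Python. This is a minimal sketch based only on the quoted description, not the authors' implementation. Peaks are assumed to be `(chrom, start, end, score)` tuples with BED-style coordinates, the score is assumed to be something like a MACS2 significance value where higher means more significant, and the function names are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end, score); start is 0-based inclusive, end is exclusive.
PeakTuple = Tuple[str, int, int, float]


def directly_overlaps(a: PeakTuple, b: PeakTuple) -> bool:
    """True if the two peaks are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def iterative_overlap_removal(peaks: List[PeakTuple]) -> List[PeakTuple]:
    """Keep the most significant peak, drop peaks that directly overlap it, repeat.

    Visiting peaks in descending score order and keeping a peak only if it
    does not overlap an already-kept peak is equivalent to the iterative
    description: a peak is removed only because of a more significant peak
    that was itself kept, never through a chain of removed peaks.
    """
    kept: List[PeakTuple] = []
    # sorted() is stable, so tied scores keep their input order.
    for peak in sorted(peaks, key=lambda p: p[3], reverse=True):
        if not any(directly_overlaps(peak, k) for k in kept):
            kept.append(peak)
    return kept


# Three 501 bp peaks forming a chain: a overlaps b, b overlaps c, a and c do not overlap.
a = ("chr1", 100, 601, 10.0)
b = ("chr1", 400, 901, 8.0)
c = ("chr1", 700, 1201, 5.0)
result = iterative_overlap_removal([c, b, a])
```

In this example `b` is removed because it directly overlaps the more significant `a`, while `c` is kept because its only overlap is with the removed `b`. A `bedtools merge` of the same three intervals would instead produce one merged region spanning 100 to 1201. The pairwise check against `kept` is quadratic in the worst case, so a large peak set would call for a sorted or indexed structure.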

kopardev commented Jan 2, 2024

Check this

@kopardev

@kelly-sovacool we can call these "corces.consensus" peaks and output them when we have replicates. What is your opinion?
