We used Kundaje pipeline for aligning pair ended reads to hg19 and removing duplicates as scABC.
bds_scr [SCR_NAME] atac.bds -align -species hg19 -species_file [SPECIES_FILE_PATH] -nth [NUM_THREADS] -fastq1_1 [READ_PAIR1] -fastq1_2 [READ_PAIR2]
The resulting bam files were merged using samtools.
samtools merge [AGGREGATE_BAM] *.trim.PE2SE.nodup.bam
The merged bam file was then used as input into MACS2 to call merged peaks for later analysis.
bds_scr [SCR_NAME] atac.bds -species hg19 -species_file [SPECIES_FILE_PATH] -nth [NUM_THREADS] -se -filt_bam [AGGREGATE_BAM]
scATAC-seq data is required in peak count matrix as inputs. We can import scATAC-seq preprocessing function in scale module to preprocessing as scABC
from scale.utils import sample_filter, peak_filter, cell_filter
We filtered peaks that presented in >= 10 cells with >= 2 reads by peak_filter function
data = peak_filter(data)
We filtered cells that >= number of peaks / 50 by cell_filter function
data = cell_filter(data)
Or combine peak_filter and cell_filter
data = sample_filter(data)