update readme
wbaopaul committed Feb 28, 2022
1 parent c65009b commit 2545e8e
Showing 1 changed file with 27 additions and 27 deletions.
54 changes: 27 additions & 27 deletions README.md
@@ -36,7 +36,7 @@ scATAC-pro consists of two units, the data processing unit and the downstream an
Installation
------------

- Note: It is not necessary to install scATAC-pro from scratch. You can use the docker or singularity version if you prefer (see [Run scATAC-pro through docker or singularity](#run-scATAC-pro-through-docker-or-singularity) )
- Note: It is not necessary to install scATAC-pro from scratch. You can use the docker or singularity version if your system supports it (see [Run scATAC-pro through docker or singularity](#run-scATAC-pro-through-docker-or-singularity) )
- Run the following command in your terminal; scATAC-pro will be installed in YOUR\_INSTALL\_PATH/scATAC-pro\_1.4.3

<!-- -->
@@ -51,7 +51,7 @@ Updates
- Now provide [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html) for accessing QC metrics and performing downstream analysis
- Current version: 1.4.3
- Highlighted updates
* New module *reprocess_cellranger_output* added, to reprocess 10x scATAC-seq data (including atac in 10x multiome assay) originally processed by cellranger, taking cellranger processed .bam and .fragments.tsv.gz files as input (v1.4.3)
* **New module *reprocess_cellranger_output* added, to reprocess 10x scATAC-seq data (including atac in 10x multiome assay) originally processed by cellranger, taking cellranger processed .bam and .fragments.tsv.gz files as input (v1.4.3)**
* More friendly to single-end sequencing data (v1.4.2)
* New module *labelTransfer* added, to do label transfer (for cell annotation) from cell annotation of scRNA-seq data. First construct a gene by cell activity matrix, then use the *FindTransferAnchors* and *TransferData* functions from the Seurat R package to predict cell type annotation from the cell annotation in scRNA-seq data (v1.4.0)
* New module *rmDoublets* added, to remove potential doublets using the [DoubletFinder](https://github.com/chris-mcginnis-ucsf/DoubletFinder) algorithm (v1.3.1)
@@ -98,21 +98,24 @@ Dependencies
One command for many
-----------

- **IMPORTANT**: The parameters and options should be specified in a configuration file in plain text format. Copy and edit the configure\_user.txt file in this repository and then in your terminal run the following commands:
- **Input**:
- fastq files for paired-end read 1 (pe1_fastq.gz), paired-end read 2 (pe2_fastq.gz) and cell barcodes (index_fastq.gz)

- **NOTE**: some large mapping index and genome annotation files can be downloaded [here](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t)
- **for data generated by 10x, you can just specify the path to the FASTQ folder for each sample**

- To access QC metrics and perform downstream analysis in R, see [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html)
- **IMPORTANT**: The parameters and options should be specified in a configuration file in plain text format. Copy and edit the *configure\_user.txt* file in this repository and then in your terminal run the following commands:

```
$ scATAC-pro -s process
-i pe1_fastq,pe2_fastq,index_fastq
-i pe1.fastq.gz,pe2.fastq.gz,index.fastq.gz(,other_index_fastq.gz)
-c configure_user.txt
$ scATAC-pro -s downstream
-i output/filtered_matrix/PEAK_CALLER/CELL_CALLER/matrix.mtx (or matrix.rds)
-c configure_user.txt
## PEAK_CALLER and CELL_CALLER are specified in your configure_user.txt file
```
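
The `-i` option takes all input files as one comma-separated string. The following is a minimal sketch of how that string could be assembled in a shell script; `join_inputs` is a hypothetical helper (not part of scATAC-pro), and the command is only printed, not run:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scATAC-pro): join file paths with commas
# so they can be passed as a single -i argument.
join_inputs() {
  local IFS=,
  echo "$*"
}

# Example file names taken from the usage above; replace with your own.
inputs=$(join_inputs pe1.fastq.gz pe2.fastq.gz index.fastq.gz)

# Dry run: print the command on one line instead of executing it.
# Remove the leading `echo` to actually run scATAC-pro.
echo scATAC-pro -s process -i "$inputs" -c configure_user.txt
```

Writing the command on a single line (or ending each line with a backslash) avoids the shell treating `-i` and `-c` as separate commands.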

- If fastq files are generated using 10x genomics platform, you can just specify the path to fastq folder for a sample:
@@ -125,34 +128,34 @@ One command for many

- For data processing, if fastq files have already been demultiplexed into the required format, with the barcode recorded in the name of each read as @barcode:ORIGIN\_READ\_NAME, you can skip the demultiplexing step by running the following command:

```
$ scATAC-pro -s process_no_dex
-i pe1_fastq,pe2_fastq
-c configure_user.txt
```
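
Before skipping demultiplexing, it may help to confirm that read names really follow the `@barcode:ORIGIN_READ_NAME` pattern. The sketch below uses a hypothetical helper, `check_demplx_names` (not part of scATAC-pro), and assumes barcodes consist only of the letters A, C, G, T or N:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scATAC-pro): read FASTQ text on stdin and
# succeed only if every header line (lines 1, 5, 9, ...) looks like
# @BARCODE:ORIGINAL_READ_NAME.
# Assumption: barcodes contain only the letters A, C, G, T or N.
check_demplx_names() {
  awk 'NR % 4 == 1 && $0 !~ /^@[ACGTN]+:.+/ { bad = 1; exit }
       END { exit bad + 0 }'
}

# Usage (the file name is a placeholder):
#   gzip -dc pe1_fastq.gz | head -n 4000 | check_demplx_names \
#     && echo "read names look demultiplexed"
```

Checking only the first few thousand lines keeps the test quick; it is a sanity check rather than a full validation of the file.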

- To reprocess data originally processed by cellranger:

```
$ scATAC-pro -s reprocess_cellranger_output
-i path.to.cellranger.generated.bam,path.to.cellranger.generated.fragments.tsv.gz
-i cellranger_generated.bam_file,cellranger_generated_fragments.tsv.gz_file
-c configure_user.txt
```

- The **output** will be saved under ./output by default
- --verbose (or -b) will print the running message on screen, otherwise the message will only be saved under output/logs/MODULE.txt
- **NOTE**:
- Some large mapping index and genome annotation files can be downloaded [here](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t)
- The **output** will be saved under ./output by default
- --verbose (or -b) will print the running message on screen, otherwise the message will only be saved under output/logs/MODULE.txt
- To access QC metrics and perform downstream analysis in R, see [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html)


Step by step guide to running scATAC-pro
---------------------------

- **IMPORTANT**: you can run scATAC-pro sequentially. The input of a later analysis module is the output of the previous analysis modules. The following tutorial uses fastq files downloaded from [PBMC10k 10X Genomics](https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_pbmc_10k?)

- *Input*:
- fastq files for paired-end read 1 (pe1_fastq.gz), paired-end read 2 (pe2_fastq.gz) and cell barcodes (index_fastq.gz)

- **for data generated by 10x, you can just specify the path to the FASTQ folder for each sample**

- *Run scATAC-pro sequentially* (specify PEAK_CALLER = MACS2 and CELL_CALLER = FILTER or other values in the configure_user.txt file)
- <u>Run scATAC-pro sequentially (specify PEAK_CALLER = MACS2 and CELL_CALLER = FILTER or other values in the configure_user.txt file) </u>

```
$ scATAC-pro -s demplx_fastq
@@ -168,7 +171,6 @@ Step by step guide to running scATAC-pro
output/demplxed_fastq/pbmc10k.demplxed.PE2.fastq.gz
-c configure_user.txt
$ scATAC-pro -s mapping
-i output/trimmed_fastq/pbmc10k.trimmed.demplxed.PE1.fastq.gz,
output/trimmed_fastq/pbmc10k.trimmed.demplxed.PE2.fastq.gz,
@@ -183,11 +185,11 @@ Step by step guide to running scATAC-pro
-c configure_user.txt
$ scATAC-pro -s get_mtx
-i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/MACS2/pbmc10k_features_BlacklistRemoved.bed
-i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/PEAK_CALLER/pbmc10k_features_BlacklistRemoved.bed
-c configure_user.txt
$ scATAC-pro -s qc_per_barcode
-i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/MACS2/pbmc10k_features_BlacklistRemoved.bed
-i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/PEAK_CALLER/pbmc10k_features_BlacklistRemoved.bed
-c configure_user.txt
$ scATAC-pro -s call_cell
@@ -473,10 +475,10 @@ $ singularity exec --bind YOUR_BIND_PATH -H YOUR_WORK_PATH --cleanenv scatac-pro
```

2. More commonly, use it on an HPC cluster:
2. More commonly, use it on an HPC cluster. Here is an example script for running the mapping step in my case (please change the file paths to yours):
- write a script mapping.sh with something essentially like this:

```
# write a script mapping.sh with something essentially like this:
#!/bin/bash
module load singularity ## load singularity in your system
@@ -487,21 +489,19 @@ singularity pull -F docker://wbaopaul/scatac-pro:latest ## just need run this l
singularity exec --bind /mnt/isilon/ --cleanenv -H /mnt/isilon/tan_lab/yuw1/run_scATAC-pro/PBMC10k scatac-pro_latest.sif \
scATAC-pro -s mapping -i fastq_PE1_file,fastq_PE2_file -c configure_user.txt
## and then submit your job on HPC (e.g. qsub or sbatch mapping.sh)
```
- then submit your job on your HPC (e.g. qsub or sbatch mapping.sh)

- **NOTE**: YOUR_WORK_PATH is your working directory, where the outputs will be saved

- **NOTE**: All inputs including data paths specified in configure_user.txx should be accessible under YOUR_BIND_PATH
- **NOTE**:
- YOUR_WORK_PATH is your working directory, where the outputs will be saved

- **NOTE**: if running the *footprint* module, remember to download the reference data [rgtdata](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t) folder into YOUR_WORK_PATH
- All inputs including data paths specified in configure_user.txt should be accessible under YOUR_BIND_PATH

- NOTE: if running the *footprint* module, remember to download the reference data [rgtdata](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t) folder into YOUR_WORK_PATH

[Access QC in R](https://scatacpro-in-r.netlify.app/qc_in_r)
---------------------------------------


[Downstream Analysis in R](https://scatacpro-in-r.netlify.app/downstream_in_r)
--------------------------------------
