update readme

wbaopaul · Feb 28, 2022 · 2545e8e
1 parent c65009b
commit 2545e8e
Showing 1 changed file with 27 additions and 27 deletions.
diff --git a/README.md b/README.md
@@ -36,7 +36,7 @@ scATAC-pro consists of two units, the data processing unit and the downstream an
 Installation
 ------------
 
--   Note: It is not necessary to install scATAC-pro from scratch. You can use the docker or singularity version if you prefer (see [Run scATAC-pro through docker or singularity](#run-scATAC-pro-through-docker-or-singularity) )
+-   Note: It is not necessary to install scATAC-pro from scratch. You can use the docker or singularity version if your system supports it (see [Run scATAC-pro through docker or singularity](#run-scATAC-pro-through-docker-or-singularity))
 -   Run the following command in your terminal; scATAC-pro will be installed in YOUR\_INSTALL\_PATH/scATAC-pro\_1.4.3
 
 <!-- -->
@@ -51,7 +51,7 @@ Updates
 - We now provide a [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html) for accessing QC metrics and performing downstream analysis
 - Current version: 1.4.3
 - Highlighted updates
-    * New module *reprocess_cellranger_output* added, to reprocess 10x scATAC-seq data (including atac in 10x multiome assay) originally processed by cellranger, taking cellranger processed .bam and .fragments.tsv.gz files as input (v1.4.3)
+    * **New module *reprocess_cellranger_output* added, to reprocess 10x scATAC-seq data (including atac in 10x multiome assay) originally processed by cellranger, taking cellranger processed .bam and .fragments.tsv.gz files as input (v1.4.3)**
     * More friendly to single-end sequencing data (v1.4.2)
     * New module *labelTransfer* added, to do label transfer (for cell annotation) from the cell annotation of scRNA-seq data. First construct a gene by cell activity matrix, then use the *FindTransferAnchors* and *TransferData* functions from the Seurat R package to predict cell type annotation from the cell annotation in scRNA-seq data (v1.4.0)
     * New module *rmDoublets* added, to remove potential doublets using the [DoubletFinder](https://github.com/chris-mcginnis-ucsf/DoubletFinder) algorithm (v1.3.1)
@@ -98,21 +98,24 @@ Dependencies
 One command for many
 -----------
 
--   **IMPORTANT**: The parameters and options should be specified in a configurartion file in plain text format. Copy and edit the configure\_user.txt file in this repository and then in your terminal run the following commands:
+-   **Input**: 
+    -   fastq files for paired-end 1 reads (pe1_fastq.gz), paired-end 2 reads (pe2_fastq.gz) and cell barcodes (index_fastq.gz)
 
-- **NOTE**: some large mapping index and genome annotation files can be downloaded [here](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t)
+    -   **for data generated by 10x, you can just specify the path to the FASTQ folder of each sample**
 
-- To access QC metrics and perform downstream analysis in R, see [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html) 
+-   **IMPORTANT**: The parameters and options should be specified in a configuration file in plain text format. Copy and edit the *configure\_user.txt* file in this repository and then run the following commands in your terminal:
 
 ```
     $ scATAC-pro -s process 
-                 -i pe1_fastq,pe2_fastq,index_fastq 
+                 -i pe1.fastq.gz,pe2.fastq.gz,index.fastq.gz(,other_index_fastq.gz) 
                  -c configure_user.txt 
 
     $ scATAC-pro -s downstream 
                  -i output/filtered_matrix/PEAK_CALLER/CELL_CALLER/matrix.mtx (or matrix.rds) 
                  -c configure_user.txt
+
     ## PEAK_CALLER and CELL_CALLER are specified in your configure_user.txt file
+
 ```
 
 -   If fastq files are generated using 10x genomics platform, you can just specify the path to fastq folder for a sample:
@@ -125,34 +128,34 @@ One command for many
 
 -   For data processing, if fastq files have already been demultiplexed into the required format, with the barcode recorded in the name of each read as @barcode:ORIGIN\_READ\_NAME, you can skip the demultiplexing step by running the following command:
 
+```
     $ scATAC-pro -s process_no_dex 
                  -i pe1_fastq,pe2_fastq
                  -c configure_user.txt 
+```
 
 -   To reprocess data originally processed by cellranger:
 
 ```
     $ scATAC-pro -s reprocess_cellranger_output
-                 -i path.to.cellranger.generated.bam,path.to.cellranger.generated.fragments.tsv.gz
+                 -i cellranger_generated.bam_file,cellranger_generated_fragments.tsv.gz_file
                  -c configure_user.txt
 
 ```
 
--   The **output** will be saved under ./output as default
--   --verbose (or -b) will print the running message on screen, otherwise the message will only be saved under output/logs/MODULE.txt
+- **NOTE**: 
+  - Some large mapping index and genome annotation files can be downloaded [here](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t)
+  - The **output** will be saved under ./output by default
+  - --verbose (or -b) prints the run messages on screen; otherwise the messages are only saved under output/logs/MODULE.txt
+  - To access QC metrics and perform downstream analysis in R, see [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html) 
 
 
 Step by step guide to running scATAC-pro
 ---------------------------
 
 -   **IMPORTANT**: you can run scATAC-pro sequentially. The input of a later analysis module is the output of the previous analysis modules. The following tutorial uses fastq files downloaded from [PBMC10k 10X Genomics](https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_pbmc_10k?) 
 
--   *Input*: 
-    -   fastq files for pair-end1 reads(pe1_fastq.gz), pair-end2 reads(pe2_fastq_gz) and cell barcords (index_fastq.gz) 
-
-    -   **for data generated by 10x, you can just speficy the path to each FASTQ files folder per sample**
-
--   *Run scATAC-pro sequentially* (specify PEAK_CALLER = MACS2 and CELL_CALLER = FILTER or other values in the configure_user.txt file)
+-   <u>Run scATAC-pro sequentially (specify PEAK_CALLER = MACS2 and CELL_CALLER = FILTER or other values in the configure_user.txt file) </u>
 
 ```
     $ scATAC-pro -s demplx_fastq 
@@ -168,7 +171,6 @@ Step by step guide to running scATAC-pro
                     output/demplxed_fastq/pbmc10k.demplxed.PE2.fastq.gz
                  -c configure_user.txt 
 
-
     $ scATAC-pro -s mapping 
                   -i output/trimmed_fastq/pbmc10k.trimmed.demplxed.PE1.fastq.gz,
                      output/trimmed_fastq/pbmc10k.trimmed.demplxed.PE2.fastq.gz,
@@ -183,11 +185,11 @@ Step by step guide to running scATAC-pro
                  -c configure_user.txt 
                  
     $ scATAC-pro -s get_mtx 
-                 -i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/MACS2/pbmc10k_features_BlacklistRemoved.bed 
+                 -i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/PEAK_CALLER/pbmc10k_features_BlacklistRemoved.bed 
                  -c configure_user.txt 
 
     $ scATAC-pro -s qc_per_barcode 
-                 -i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/MACS2/pbmc10k_features_BlacklistRemoved.bed 
+                 -i output/summary/pbmc10k.fragments.tsv.gz,output/peaks/PEAK_CALLER/pbmc10k_features_BlacklistRemoved.bed 
                  -c configure_user.txt
 
     $ scATAC-pro -s call_cell
@@ -473,10 +475,10 @@ $ singularity exec --bind YOUR_BIND_PATH -H YOUR_WORK_PATH --cleanenv scatac-pro
 
 ```
 
-2. More commonly, use it on a HPC cluster:
+2. More commonly, use it on an HPC cluster. Here is an example script for running the mapping step in my case (please change the file paths to yours): 
+  - write a script mapping.sh with something essentially like this:
 
 ```
-# write a script mapping.sh with something essially like this:
 #!/bin/bash
 module load singularity ## load singularity in your system
 
@@ -487,21 +489,19 @@ singularity pull -F docker://wbaopaul/scatac-pro:latest  ## just need run this l
 singularity exec --bind /mnt/isilon/ --cleanenv -H /mnt/isilon/tan_lab/yuw1/run_scATAC-pro/PBMC10k scatac-pro_latest.sif \
 scATAC-pro -s mapping -i fastq_PE1_file,fastq_PE2_file -c configure_user.txt
 
-## and then sumbit your job on HPC (e.g. qsub or sbatch mapping.sh)
-
 ```
+  - then submit your job on your HPC (e.g. qsub or sbatch mapping.sh)
 
-- **NOTE**: YOUR_WORK_PATH is your working directory, where the outputs will be saved 
-
-- **NOTE**: All inputs including data paths specified in configure_user.txx should be accessible under YOUR_BIND_PATH
+  - **NOTE**: 
+    - YOUR_WORK_PATH is your working directory, where the outputs will be saved 
 
-- **NOTE**: if running the *footprint* module, remember to download the reference data [rgtdata](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t) folder into YOUR_WROK_PATH
+    - All inputs including data paths specified in configure_user.txt should be accessible under YOUR_BIND_PATH
 
+    - If running the *footprint* module, remember to download the reference data [rgtdata](https://chopri.box.com/s/dlqybg6agug46obiu3mhevofnq4vit4t) folder into YOUR_WORK_PATH
 
 [Access QC in R](https://scatacpro-in-r.netlify.app/qc_in_r)
 ---------------------------------------
 
-
 [Downstream Analysis in R](https://scatacpro-in-r.netlify.app/downstream_in_r)
 --------------------------------------