Target

  • Build a large single-cell gastric atlas
    • to capture the variability present in the population, such as tumor location and tissue type, using bulk, single-cell and omics data.
    • to find undiscovered cell types.
    • to investigate heterogeneity and define new modules.
    • to provide a large dataset for algorithm development.
  • Add more information in the future
    • spatial data
    • multiomics in bulk and single cell

If you are interested in mining this large dataset and sharing your code to accelerate and facilitate reproducible work, please feel free to contact me: [email protected]

Ref

Several aspects can be achieved.

The basic pipeline of this article is adapted from:

The atlas level article:

Cell interaction:

Sub-celltype identification and proposal of a novel prognostic signature:

Microenvironment difference in different clinical group:

Benchmark test:

Database website for multiomics with bulk and single cell:

Learn single cell:

Because Python is more efficient for processing large-scale single-cell data, I recommend using Scanpy rather than Seurat.

A little advice for novices

Use the data from GEO Accession viewer (nih.gov), published with "Parallel single-cell and bulk transcriptome analyses reveal key features of the gastric tumor microenvironment" - PubMed (nih.gov), which contains 111,140 annotated cells.

output

Then try to reproduce the analysis in "Integrated Analysis Highlights the Immunosuppressive Role of TREM2+ Macrophages in Hepatocellular Carcinoma" - PubMed (nih.gov) as a warm-up.

Data Info

The details of datasets and sample info can be accessed in Article/Dataset and Article/Sample files.

10 datasets with raw fq (FASTQ) files:

  • Cells:
    • unfiltered: 1,092,591
    • filtered: 834,360
  • Samples:
    • 125 (primary cancers, adjacent lesions)
    • 17 PBMC
    • 9 precancer
    • 4 healthy donors
    • 2 tumor-infiltrating lymphocytes

output

anno_heatmap1

anno_heatmap2

116 samples with processed mtx, including:

  • 116 (primary cancers + adjacent lesions)
  • 10 metastases
  • 4 PBMC
  • 1 precancer

Single cell obj

I reran 10x Cell Ranger 7.1 with Ensembl 84 to reproduce the raw mtx for the samples with raw fq data (11 datasets in total). In addition, six projects provide processed mtx or obj files. The details are described in the Article/Dataset and Article/Sample files.

I applied basic filtering and integration to obtain the basic annotation, leaving 834,360 cells in total after filtering.

image-20240229134749354

  • 02/raw: merged, not filtered
  • 02/filtered: filtered, not integrated
  • 02/integrated: integrated and annotated

You can contact me to get the result.

Code Info

Environment Install

mamba create -n sc_py3 python=3.8 python-igraph leidenalg ipykernel -y
pip install 'scanpy>1.8.1'
mamba install -y r-base r-seurat

File Structure

  • 01-Ref_Preprocess.ipynb

Three datasets with processed data and annotations were collected.

The effect of scaling (centering each gene to zero mean and unit standard deviation) versus not scaling on bio-conservation and batch correction is also examined.
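To make "scale" concrete, here is a minimal pure-Python sketch of per-gene z-scoring. The helper name `scale_gene` is hypothetical and the use of the sample standard deviation is an assumption; in Scanpy the analogous step is `sc.pp.scale`, and the notebook itself may do this differently.

```python
from statistics import mean, stdev


def scale_gene(values):
    """Z-score one gene's expression values across cells.

    Assumption: sample standard deviation (n - 1 denominator).
    A gene with zero variance is returned as all zeros.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Toy expression values for one gene in three cells.
scaled = scale_gene([1.0, 2.0, 3.0])
```

The "unscaled" variant simply skips this step and feeds the log-normalized values to PCA directly.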

  • 02-Preprocessed.ipynb

Filter cells through:

Cell quality control:
  1) gene counts < 6500, mRNA counts > 200, mitochondrial mRNA count percentage < 15%;
  2) gene counts and mRNA counts both within their 5 MAD (median absolute deviation) ranges;
  3) doublets removed by Scrublet with default parameters;
  4) samples with fewer than 500 cells remaining were filtered out. These samples are excluded from the data integration and marked as low quality.
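The thresholds in steps 1, 2 and 4 can be sketched in plain Python as follows. This is an illustration under assumptions, not the notebook's code: the helper names (`mad_bounds`, `qc_pass`, `low_quality_samples`) and the dictionary keys are hypothetical, the MAD range is taken as median ± 5 × MAD on raw values computed over all cells passed in, and doublet removal with Scrublet (step 3) is not covered.

```python
from collections import Counter
from statistics import median


def mad_bounds(values, n_mads=5.0):
    """Return (low, high) = median -/+ n_mads * median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med - n_mads * mad, med + n_mads * mad


def qc_pass(cells, max_genes=6500, min_counts=200, max_pct_mt=15.0, n_mads=5.0):
    """Return one boolean per cell: True if it passes the fixed and MAD filters."""
    g_lo, g_hi = mad_bounds([c["n_genes"] for c in cells], n_mads)
    c_lo, c_hi = mad_bounds([c["n_counts"] for c in cells], n_mads)
    return [
        c["n_genes"] < max_genes
        and c["n_counts"] > min_counts
        and c["pct_mt"] < max_pct_mt
        and g_lo <= c["n_genes"] <= g_hi
        and c_lo <= c["n_counts"] <= c_hi
        for c in cells
    ]


def low_quality_samples(sample_ids, keep, min_cells=500):
    """Return the samples with fewer than min_cells cells left after filtering."""
    remaining = Counter(s for s, k in zip(sample_ids, keep) if k)
    return {s for s in set(sample_ids) if remaining[s] < min_cells}


# Toy data: three good cells, one high-mito cell, one outlier, one low-count cell.
cells = [
    {"n_genes": 1000, "n_counts": 3000, "pct_mt": 5.0},
    {"n_genes": 1100, "n_counts": 3200, "pct_mt": 4.0},
    {"n_genes": 900, "n_counts": 2800, "pct_mt": 6.0},
    {"n_genes": 1050, "n_counts": 3100, "pct_mt": 20.0},
    {"n_genes": 7000, "n_counts": 30000, "pct_mt": 3.0},
    {"n_genes": 1000, "n_counts": 150, "pct_mt": 2.0},
]
keep = qc_pass(cells)
flagged = low_quality_samples(["A", "A", "A", "B", "B", "B"], keep, min_cells=2)
```

With real data the same masks would typically be built from the per-cell QC columns of an AnnData object rather than from a list of dictionaries.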

Using the first 50 principal components, a neighborhood graph was calculated with the number of neighbors set to k = 30. Batch effects were removed with Harmony. Data were subsequently clustered with Louvain clustering at a resolution of r = 0.25.
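A minimal Scanpy sketch of these steps is shown below. Several things are assumptions rather than facts from the notebook: the function name `integrate_and_cluster`, the batch column name `batch`, the order of operations (Harmony applied to the PCA embedding before building the graph), and that `adata` is already normalized and log-transformed. It needs `scanpy`, `harmonypy` and `louvain` installed, which is why the imports are done lazily inside the function.

```python
def integrate_and_cluster(adata, batch_key="batch"):
    """Sketch: PCA (50 PCs) -> Harmony -> kNN graph (k = 30) -> Louvain (r = 0.25).

    `adata` is assumed to be a normalized, log-transformed AnnData object
    whose `.obs[batch_key]` column identifies the batch of each cell.
    """
    # Lazy imports so that defining this function does not require scanpy.
    import scanpy as sc
    import scanpy.external as sce

    # First 50 principal components, stored in adata.obsm["X_pca"].
    sc.pp.pca(adata, n_comps=50)

    # Harmony writes the corrected embedding to adata.obsm["X_pca_harmony"].
    sce.pp.harmony_integrate(adata, key=batch_key)

    # Neighborhood graph with k = 30 on the batch-corrected embedding.
    sc.pp.neighbors(adata, n_neighbors=30, use_rep="X_pca_harmony")

    # Louvain clustering at resolution 0.25; labels go to adata.obs["louvain_r0.25"].
    sc.tl.louvain(adata, resolution=0.25, key_added="louvain_r0.25")
    return adata
```

Calling `integrate_and_cluster(adata)` modifies the object in place and also returns it, so the cluster labels can be read from `adata.obs["louvain_r0.25"]` afterwards.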

Major cell types are annotated by marker-based, reference-based and manual annotation.

External package details

  • sceasy

cellgeni/sceasy: A package to help convert different single-cell data formats to each other (github.com)

JiekaiLab/scDIOR: scDIOR: Single cell data IO softwaRe (github.com)

Bug

  • scib_metrics
pip install scib_metrics
from scib_metrics.benchmark import Benchmarker

AttributeError: module 'jax.random' has no attribute 'KeyArray'

Ref: theislab/scib: Benchmarking analysis of data integration tools (github.com)

pip install --upgrade jax jaxlib chex
# useless
pip install ml_dtypes==0.2.0 jax==0.4.13 jaxlib==0.4.11


  • no graph in jupyter notebook output
%matplotlib inline
  • unrar while keeping the directory structure (use `x`, which extracts with full paths)
nohup unrar x Raw.rar &

command line - unrar nested folder in ubuntu strange behaviour - Ask Ubuntu

  • Re-importing a module does not update existing objects. For instance, if you have an object created from a class in a module and then you change the class definition in the module's source file, re-importing the module won't change the behavior or structure of the existing object. The object will continue to behave according to the class definition that was in place at the time it was created.

Renaming the Python file forces a fresh import. Alternatively, restart the kernel, or call importlib.reload on the module and recreate the object.
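The behavior can be reproduced with a small self-contained script. The module name `demo_mod` and the class `Greeter` are hypothetical and exist only for this demonstration; the script writes a throwaway module to a temporary directory, edits it, and reloads it with `importlib.reload`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module into a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "demo_mod.py"
module_path.write_text(
    "class Greeter:\n    def greet(self):\n        return 'old'\n"
)

sys.dont_write_bytecode = True  # avoid a stale .pyc masking the quick edit below
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
demo_mod = importlib.import_module("demo_mod")

# Object created from the original class definition.
old_obj = demo_mod.Greeter()

# Change the class definition on disk, then reload the module in place.
module_path.write_text(
    "class Greeter:\n    def greet(self):\n        return 'new version'\n"
)
importlib.reload(demo_mod)

# The existing object still uses the old class; a new object uses the new one.
new_obj = demo_mod.Greeter()
old_result = old_obj.greet()
new_result = new_obj.greet()
```

Here `old_result` keeps the original behavior while `new_result` reflects the edited file, so after a reload any objects that should pick up the change have to be created again.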

  • TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

TypeError: metaclass conflict: matplotlib v3.7.0 is incompatible with scanpy · Issue #2411 · scverse/scanpy (github.com)

pip install 'matplotlib<3.7'
pip install 'scanpy>1.8.1'
  • scanpy rank_genes_groups AttributeError: module 'numba' has no attribute 'core'

Uninstall scanpy, then install it through mamba:

mamba install conda-forge::scanpy -y