Load Data

Single-cell RNA-seq data is sparse: most gene/cell combinations have a transcript count of zero. The code in this section loads the data into a "tidy data" table schema with, at a minimum, columns for cell identifier, gene name, and transcript count.

The general steps in this process are:

  1. Convert source data to long, sparse format.
  • For example, given a CSV file in dense matrix format such as:
,cell1,cell2,cell3
gene1,0.0,0.0,3.0
gene2,0.0,0.0,0.0
gene3,1.0,0.0,2.0
  • reshape it as a tidy data CSV that keeps only the nonzero counts:
cell,gene,trans_cnt
cell1,gene3,1.0
cell3,gene1,3.0
cell3,gene3,2.0
  2. Load the reshaped data to BigQuery.
bq --project_id PROJECT-ID load --autodetect DATASET-NAME.TABLE-NAME \
  gs://BUCKET-NAME/PATH/TO/LONG/SPARSE/FILE.csv
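
The reshaping in step 1 can be sketched in Python using only the standard library. This is a minimal illustration, not the code used for any of the datasets below. It assumes that genes are in rows and cells are in columns (as in the example above), that the whole file fits in memory, and that zero counts should be dropped. The names `dense_to_long` and `DENSE_EXAMPLE` are hypothetical.

```python
import csv
import io

# The dense example matrix from above: genes in rows, cells in columns.
DENSE_EXAMPLE = (
    ",cell1,cell2,cell3\n"
    "gene1,0.0,0.0,3.0\n"
    "gene2,0.0,0.0,0.0\n"
    "gene3,1.0,0.0,2.0\n"
)


def dense_to_long(dense_csv: str) -> str:
    """Reshape dense gene-by-cell CSV text into long, sparse CSV text.

    Hypothetical helper: emits one `cell,gene,trans_cnt` row per nonzero count.
    """
    reader = csv.reader(io.StringIO(dense_csv))
    header = next(reader)
    cells = header[1:]  # the first header field is the empty corner cell

    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        gene, counts = row[0], row[1:]
        for cell, count in zip(cells, counts):
            if float(count) != 0.0:  # keep only nonzero counts
                records.append((cell, gene, count))

    # Sort by cell, then gene, to match the ordering of the example above.
    records.sort(key=lambda record: (record[0], record[1]))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cell", "gene", "trans_cnt"])
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(dense_to_long(DENSE_EXAMPLE), end="")
```

For matrices too large to hold in memory, the same idea applies row by row: write each nonzero record as it is read instead of collecting and sorting them first. Row order does not matter once the data is loaded into a table.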

Here are instructions to load specific datasets, each demonstrating a different technique: