During intensive use of these scripts over the past year, several bugs related to automatic parameter setting in STARsolo were discovered. Release 3.1 aims to fix all of them:
- problems that arose when read 1 is sequenced to a length different from the one expected for a given barcode (BC) + UMI layout. Processing samples with an incorrect UMI length can cause serious batch effects. The new logic avoids this by resetting the UMI length parameter whenever necessary.
- problems in detecting read strand-specificity (e.g. 3' vs. 5' experiments) in `starsolo_10x_auto.sh`. Datasets with low mapping rates caused many issues here, so a more conservative (and, hopefully, more robust) approach was chosen.
- a change in how the 200k test reads are selected from the fastq files. The current release takes the first 1M reads from each fastq file and then subsamples 200k of those using `seqtk`. This should be faster than previous approaches, and also more robust to some corner cases (e.g. subsampling from a particularly bad fastq file).
- numerous minor fixes and updates.
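The UMI-length fix in the first item comes down to comparing the length read 1 was actually sequenced to with the expected barcode + UMI layout. Below is a minimal bash sketch of that idea, assuming a fixed-length cell barcode followed directly by the UMI (e.g. 16 bp barcode + 12 bp UMI for 10x 3' v3). The function `infer_umi_len` is hypothetical and is not the logic used in `starsolo_10x_auto.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of starsolo_10x_auto.sh): choose the UMI length
# to pass to STARsolo, given the barcode length, the UMI length the chemistry
# expects, and the length read 1 was actually sequenced to.
infer_umi_len () {
  local cb_len=$1 expected_umi=$2 r1_len=$3
  local available=$(( r1_len - cb_len ))   # bases left after the cell barcode
  if (( available <= 0 )); then
    echo "ERROR: read 1 (${r1_len} bp) is too short for a ${cb_len} bp barcode" >&2
    return 1
  fi
  if (( available < expected_umi )); then
    # read 1 was sequenced short: shrink the UMI so it never runs past the read end
    echo "$available"
  else
    # read 1 is long enough; any extra bases (e.g. poly-T) are not part of the UMI
    echo "$expected_umi"
  fi
}
```

For example, `infer_umi_len 16 12 28` keeps the 12 bp UMI, while `infer_umi_len 16 12 26` resets it to 10 bp instead of silently reading past the end of read 1.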
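For the strand-detection item, one conservative approach is to map the test reads twice, once with `--soloStrand Forward` and once with `--soloStrand Reverse`, and call a strand only when one orientation clearly dominates, failing otherwise. The decision step is sketched below. The function name `call_strand`, the 10-fold ratio, and the 10% minimum are assumptions for illustration, not the thresholds used by `starsolo_10x_auto.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical decision rule (thresholds are illustrative assumptions):
#   $1 = % of test reads assigned to genes with --soloStrand Forward
#   $2 = % of test reads assigned to genes with --soloStrand Reverse
# Prints Forward or Reverse, or returns non-zero if the evidence is ambiguous.
call_strand () {
  local fwd=$1 rev=$2
  local min_pct=10   # assumed: the winning run must assign at least this many reads
  local ratio=10     # assumed: the winner must beat the other orientation ~10-fold
  if (( fwd >= min_pct && fwd >= ratio * rev )); then
    echo "Forward"
  elif (( rev >= min_pct && rev >= ratio * fwd )); then
    echo "Reverse"
  else
    # low mapping rate or no clear winner: refuse to guess
    echo "ERROR: cannot determine strand (Forward ${fwd}%, Reverse ${rev}%)" >&2
    return 1
  fi
}
```

Failing on ambiguous input, rather than defaulting to one orientation, is what makes the rule conservative: a dataset with a low mapping rate (say 5% vs. 0%) stops with an error instead of being processed with a possibly wrong strand. Bash arithmetic is integer-only, so fractional rates need rounding to whole percentages before calling the function.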
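The new test-read selection can be approximated with standard tools: keep the first 1M reads (4M lines) of each fastq with `head`, then draw 200k of them with `seqtk sample`. Using the same seed (`-s`) for R1 and R2 keeps mates paired. The function name `subsample_pair`, the output naming, and the seed value are placeholders; this is a sketch, not the code in the release.

```shell
#!/usr/bin/env bash
# Sketch: pick 200k test reads from the first 1M reads of a gzipped, paired fastq.
# Names and seed are placeholders; requires gzip and seqtk on PATH.
subsample_pair () {
  local r1=$1 r2=$2 out=$3
  local n_top=1000000 n_test=200000 seed=100
  local tmp
  tmp=$(mktemp -d)
  # first 1M reads = first 4M lines (4 lines per fastq record);
  # head closing the pipe early is expected, so pipefail is not used here
  gzip -cd "$r1" | head -n $(( 4 * n_top )) > "$tmp/top_R1.fastq"
  gzip -cd "$r2" | head -n $(( 4 * n_top )) > "$tmp/top_R2.fastq"
  # same seed for both files, so the same read positions are kept and mates stay paired
  seqtk sample -s "$seed" "$tmp/top_R1.fastq" "$n_test" > "${out}_R1.fastq"
  seqtk sample -s "$seed" "$tmp/top_R2.fastq" "$n_test" > "${out}_R2.fastq"
  rm -r "$tmp"
}
```

For example, `subsample_pair sample_R1.fastq.gz sample_R2.fastq.gz test` writes `test_R1.fastq` and `test_R2.fastq`. Capping the input at 1M reads bounds how much of each file has to be decompressed; the trade-off is that the test reads come only from the start of each file.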