Allelic fraction of input variant #55

pratikchandrani · 2019-02-07T12:21:06Z

Is there a way to specify the allelic fraction of the simulated variant? Suppose I want to create one fastq with an allelic fraction of 0.1 and another with 0.9 to test my algorithm; how can this be achieved using neat?

Thanks,
Pratik

zstephens · 2019-02-07T16:21:00Z

Greetings, there are a couple ways you could do this:

1. Specify a ploidy of 10; random mutations would then mostly be in the VAF=0.1 neighborhood.
2. Create two datasets with different coverages and combine them. See this file for an example of how that could be accomplished: https://github.com/zstephens/neat-genreads/blob/master/models/genReadsTumorTutorial.zip
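
The arithmetic behind the second option can be sketched as follows. This is only an illustration, not part of neat-genreads: the function names are hypothetical, and it assumes the variant is absent from the first (normal) run and present on a fixed fraction of the haplotypes in the second (tumor) run, e.g. 0.5 for a heterozygous variant at the default ploidy of 2.

```python
def expected_vaf(normal_cov, tumor_cov, tumor_alt_fraction=0.5):
    """Expected VAF of a tumor-only variant after pooling reads from both runs.

    Assumption: the normal run carries no copy of the variant, and
    tumor_alt_fraction of the tumor-run haplotypes carry it.
    """
    total = normal_cov + tumor_cov
    if total <= 0:
        raise ValueError("total coverage must be positive")
    return tumor_cov * tumor_alt_fraction / total


def tumor_coverage_for(target_vaf, total_cov, tumor_alt_fraction=0.5):
    """Tumor-run coverage needed to reach target_vaf at a fixed total coverage."""
    if not 0 < target_vaf <= tumor_alt_fraction:
        raise ValueError("target_vaf must be in (0, tumor_alt_fraction]")
    return target_vaf * total_cov / tumor_alt_fraction


if __name__ == "__main__":
    # 80x normal + 20x tumor with a heterozygous tumor variant
    print(expected_vaf(80, 20))
    # Tumor coverage needed for a 0.1 VAF at 100x total
    print(tumor_coverage_for(0.1, 100))
```

Under the same assumptions, a variant that is homozygous in the second run (`tumor_alt_fraction=1.0`) with 90x coverage there and 10x in the first run would land near VAF 0.9.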

MikeWLloyd · 2019-03-11T14:43:42Z

This is not clear to me: if -v *.VCF input has an AF specified, does neat-genreads respect/simulate that frequency?

If not, what is the simulated VAF when -v is given?

zstephens · 2019-03-11T22:08:31Z

The simulator samples reads from a fixed number of sequences. The default is diploid, so the simulator creates 2 copies of the reference genome; all inserted variation is therefore inherently phased and can be compared against in benchmarking studies. As such, there is no notion of inserting variation with an arbitrary allele frequency: the frequency has to be a fraction corresponding to the proportion of alleles the variant is inserted into.

AF fields of the input VCF file are not used. Instead you can use the GT field to explicitly specify which phase you would like to insert the variant into. E.g. if you wanted a SNP inserted into 10% of the reads, you could run the simulator with ploidy 10, and have a GT field of 1/0/0/0/0/0/0/0/0/0. Alternatively you could use the genreads-specific WP field: #36 (comment)

Admittedly kind of a clumsy workaround, which is why mixing multiple datasets might be easier.
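
The ploidy/GT workaround described above can be sketched in a few lines. The helper names are hypothetical and this is not code from neat-genreads; it only builds the GT string and lists which allele fractions a given ploidy can represent, assuming the variant goes into the first `alt_copies` haplotypes.

```python
def gt_for_alt_copies(ploidy, alt_copies):
    """Build a GT string with alt_copies alternate alleles out of ploidy.

    Example shape for ploidy 10 and one alternate copy: 1/0/0/0/0/0/0/0/0/0
    """
    if not 0 <= alt_copies <= ploidy:
        raise ValueError("alt_copies must be between 0 and ploidy")
    alleles = ["1"] * alt_copies + ["0"] * (ploidy - alt_copies)
    return "/".join(alleles)


def achievable_fractions(ploidy):
    """All allele fractions representable at this ploidy: k / ploidy."""
    return [k / ploidy for k in range(ploidy + 1)]


def closest_alt_copies(target_vaf, ploidy):
    """Number of alternate copies whose fraction is nearest to target_vaf."""
    return min(range(ploidy + 1), key=lambda k: abs(k / ploidy - target_vaf))


if __name__ == "__main__":
    copies = closest_alt_copies(0.1, 10)
    print(copies, gt_for_alt_copies(10, copies))
```

The finer the target fraction, the higher the ploidy needed, which is one reason mixing datasets may be the more practical route for low fractions.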

tkoganti · 2022-03-09T21:23:28Z

> Greetings, there are a couple ways you could do this:
>
> Specify a ploidy of 10, random mutations would then mostly be in the VAF=0.1 neighborhood.
>
> Create two datasets with different coverages and combine them. See this file for an example of how that could be accomplished: https://github.com/zstephens/neat-genreads/blob/master/models/genReadsTumorTutorial.zip

Hello there! We are trying to simulate somatic variants at low frequencies (as low as 3-5%). I looked through the shell script where you had commands to generate variants at 10% (using 80x normal and 20x tumor). I am wondering why you are using the genMutModel.py script at all. Could you just generate a normal sample without any tumor variants at 80x, use mutations_tumor.vcf to generate a tumor sample at 20x, and then combine both?

python neat-genreads/genReads.py -r chr1_subset.fa -R 101 --pe 300 30 -c 80 -M 0.002 -o output_normal --vcf

python neat-genreads/genReads.py -r chr1_subset.fa -R 101 --pe 300 30 -c 20 -o output_tumor -v mutations_tumor.vcf --vcf

Then concatenate output_normal and output_tumor fastq files?
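
A minimal sketch of that concatenation step, for illustration only. The `_read1.fq` / `_read2.fq` suffixes are an assumption about how the paired-end outputs are named and should be adjusted to the actual file names; `concatenate` and `merge_pairs` are hypothetical helpers, not part of neat-genreads.

```python
import shutil


def concatenate(inputs, output):
    """Write each input file to output, in order, streaming raw bytes."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_pairs(normal_prefix, tumor_prefix, out_prefix,
                suffixes=("_read1.fq", "_read2.fq")):
    """Merge the normal and tumor runs, one output file per mate.

    Both mates are merged in the same order (normal first, then tumor)
    so that read pairs stay aligned across the two merged files.
    The suffixes are assumed file-name endings, not confirmed ones.
    """
    for suffix in suffixes:
        concatenate(
            [normal_prefix + suffix, tumor_prefix + suffix],
            out_prefix + suffix,
        )


# Example call (assumed prefixes, matching the -o values above):
# merge_pairs("output_normal", "output_tumor", "output_mixed")
```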

Thanks so much for your help! This is a great tool and will be very helpful in some of our analyses!

zstephens · 2022-03-09T22:45:32Z

Hey Teja,

In that tutorial I was demonstrating a use case where the types of variants specific to the tumor are different from those in the simulated germline (e.g. maybe the tumor variants have different trinucleotide biases, or differ in the frequency of indels vs. SNVs, etc.). Indeed, you could also use the commands you provided and achieve the desired result with respect to variant allele fraction, but the statistics underlying the types of variants generated for the tumor and germline fractions would be the same (which I imagine is probably fine for most use cases).

Also, the ownership of this project has changed, and it is now being kept up to date by a team at NCSA: https://github.com/ncsa/NEAT

tkoganti · 2022-03-10T14:17:45Z

Awesome! Thanks Zach!!
I will post any new questions on the NCSA site.
