Allelic fraction of input variant #55
Greetings, there are a couple ways you could do this:
This is not clear to me: if If not, what is the simulated VAF when
The simulator samples reads from a fixed number of sequences. The default is diploid, so the simulator creates 2 copies of the reference genome; all inserted variation is therefore inherently phased and can be compared against in benchmarking studies. As a result, there is no notion of inserting variation at an arbitrary allele frequency: the frequency has to be a fraction corresponding to the proportion of alleles the variant is inserted into. AF fields of the input VCF file are not used. Instead, you can use the GT field to specify explicitly which phase the variant is inserted into. E.g. if you wanted a SNP inserted into 10% of the reads, you could run the simulator with ploidy 10 and a GT field of 1/0/0/0/0/0/0/0/0/0. Alternatively, you could use the genreads-specific WP field: #36 (comment). Admittedly this is kind of a clumsy workaround, which is why mixing multiple datasets might be easier.
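As a rough illustration of the ploidy/GT approach described above, here is a small Python sketch that picks a ploidy and builds the matching GT string for a target allele fraction. The helper name `genotype_for_fraction` and the `max_ploidy` cap are made up for this example and are not part of the simulator; how the ploidy is actually passed to the simulator is not shown here.

```python
from fractions import Fraction


def genotype_for_fraction(target_fraction, max_ploidy=100):
    """Return (ploidy, GT string) approximating target_fraction.

    Hypothetical helper, not part of the simulator. The fraction is
    reduced to alt_copies / ploidy, with ploidy capped at max_ploidy
    (an arbitrary limit chosen for this sketch).
    """
    # Go through str() so that 0.1 becomes exactly 1/10, not a float artifact.
    frac = Fraction(str(target_fraction)).limit_denominator(max_ploidy)
    if not 0 < frac <= 1:
        raise ValueError("fraction must be in (0, 1] and reachable at max_ploidy")
    ploidy = frac.denominator
    alt_copies = frac.numerator
    # Alt alleles first, then reference alleles, joined as in the example above.
    gt = "/".join(["1"] * alt_copies + ["0"] * (ploidy - alt_copies))
    return ploidy, gt


if __name__ == "__main__":
    for vaf in (0.1, 0.9):
        print(vaf, genotype_for_fraction(vaf))
```

For 0.1 this gives ploidy 10 with a single alt allele, which matches the GT field in the comment above. Very low fractions need a correspondingly high ploidy, which is part of why mixing datasets may be the easier route.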
Hello there! We are trying to simulate somatic variants at low frequencies (as low as 3-5%). I looked through the shell script where you had commands to generate variants at 10% (using 80x normal and 20x tumor). I am wondering why you are using
Then concatenate output_normal and output_tumor fastq files? Thanks so much for your help! This is a great tool and will be very helpful in some of our analyses!
Hey Teja, in that tutorial I was demonstrating a use case where the types of variants specific to the tumor are different from those in the simulated germline (e.g. maybe the tumor variants have different trinucleotide biases, or differ in the frequency of indels vs. SNVs, etc.). Indeed, you could also use the commands you provided and achieve the desired result with respect to variant allele fraction, but the statistics underlying the types of variants generated for the tumor/germline fractions would be the same (which I imagine is fine for most use cases). Also, the ownership of this project has changed, and it is being kept up to date by a team at NCSA: https://github.com/ncsa/NEAT
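For the dataset-mixing route, the coverage arithmetic can be sketched as below. This assumes the tumor-specific variants are heterozygous in a diploid tumor simulation (so they appear in half of the tumor reads) and are absent from the normal reads; the function name `coverage_split` and its parameters are illustrative only, not part of NEAT.

```python
def coverage_split(total_coverage, target_vaf, tumor_alt_fraction=0.5):
    """Return (normal_coverage, tumor_coverage) for a target VAF.

    Illustrative helper, not part of NEAT. tumor_alt_fraction is the
    share of tumor reads carrying the variant: 0.5 assumes a
    heterozygous variant in a diploid tumor simulation.
    """
    tumor_share = target_vaf / tumor_alt_fraction
    if not 0 < tumor_share <= 1:
        raise ValueError("target VAF is not reachable with this alt fraction")
    tumor_coverage = total_coverage * tumor_share
    return total_coverage - tumor_coverage, tumor_coverage


if __name__ == "__main__":
    # 10% VAF at 100x total corresponds to 80x normal plus 20x tumor.
    print(coverage_split(100, 0.10))
    # The 3-5% range asked about in this thread, again at 100x total.
    print(coverage_split(100, 0.03))
    print(coverage_split(100, 0.05))
```

Under these assumptions, a 10% VAF at 100x total works out to the 80x normal / 20x tumor split mentioned earlier in the thread.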
Awesome! Thanks Zach!!
Is there a way to specify the allelic fraction of the simulated variant? Suppose I want to create one fastq with an allelic fraction of 0.1 and another with 0.9 to test my algorithm. How can this be achieved using NEAT?
Thanks,
Pratik