Allelic fraction of input variant #55

Open
pratikchandrani opened this issue Feb 7, 2019 · 6 comments

Comments

@pratikchandrani

Is there a way to specify the allelic fraction of the simulated variant? Suppose I want to create one fastq with an allelic fraction of 0.1 and another with 0.9 to test my algorithm. How can this be achieved using NEAT?

Thanks,
Pratik

@zstephens
Owner

Greetings, there are a couple ways you could do this:

  1. Specify a ploidy of 10; random mutations would then mostly fall in the VAF=0.1 neighborhood.

  2. Create two datasets with different coverages and combine them. See this file for an example of how that could be accomplished: https://github.com/zstephens/neat-genreads/blob/master/models/genReadsTumorTutorial.zip

@MikeWLloyd

This is not clear to me: if -v *.VCF input has an AF specified, does neat-genreads respect/simulate that frequency?

If not, what is the simulated VAF when -v is given?

@zstephens
Owner

The simulator samples reads from a fixed number of sequences. The default is diploid, so the simulator creates 2 copies of the reference genome. All inserted variation is therefore inherently phased and can be compared against in benchmarking studies. As such, there is no notion of inserting variation at an arbitrary allele frequency; it has to be a fraction corresponding to the proportion of alleles the variant is inserted into.

AF fields of the input VCF file are not used. Instead you can use the GT field to explicitly specify which phase you would like to insert the variant into. E.g. if you wanted a SNP inserted into 10% of the reads, you could run the simulator with ploidy 10, and have a GT field of 1/0/0/0/0/0/0/0/0/0. Alternatively you could use the genreads-specific WP field: #36 (comment)

Admittedly kind of a clumsy workaround, which is why mixing multiple datasets might be easier.
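
As a rough illustration of the ploidy/GT workaround described above, the following sketch lists the allele fractions that a given ploidy can represent and builds the matching GT string. The helper names `achievable_vafs` and `gt_for_vaf` are hypothetical and not part of neat-genreads. Placing the ALT allele on the first copies is an assumption made for illustration, as is the handling of GT fields with more than one ALT copy, which should be checked against the simulator.

```python
from fractions import Fraction


def achievable_vafs(ploidy):
    """Nonzero allele fractions representable with `ploidy` genome copies."""
    return [Fraction(k, ploidy) for k in range(1, ploidy + 1)]


def gt_for_vaf(target_vaf, ploidy):
    """Build a GT string with the ALT allele on the nearest whole number of copies.

    Hypothetical helper: the ALT allele is placed on the first copies,
    which is an arbitrary choice made for this sketch.
    """
    n_alt = round(target_vaf * ploidy)
    if not 1 <= n_alt <= ploidy:
        raise ValueError(
            f"VAF {target_vaf} cannot be represented with ploidy {ploidy}"
        )
    return "/".join(["1"] * n_alt + ["0"] * (ploidy - n_alt))


if __name__ == "__main__":
    print(gt_for_vaf(0.1, 10))  # the GT string quoted in the comment above
    print([str(f) for f in achievable_vafs(4)])
```

The point of the sketch is that only multiples of 1/ploidy are reachable, so a 3% variant would need a much larger ploidy than a 10% one.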

@tkoganti

tkoganti commented Mar 9, 2022

Greetings, there are a couple ways you could do this:

  1. Specify a ploidy of 10; random mutations would then mostly fall in the VAF=0.1 neighborhood.
  2. Create two datasets with different coverages and combine them. See this file for an example of how that could be accomplished: https://github.com/zstephens/neat-genreads/blob/master/models/genReadsTumorTutorial.zip

Hello there! We are trying to simulate somatic variants at low frequencies (as low as 3-5%). I looked through the shell script where you had commands to generate variants at 10% (using 80x normal and 20x tumor). I am wondering why you use the genMutModel.py script at all. Could you instead generate a normal sample without any variants at 80x, use mutations_tumor.vcf to generate a tumor sample at 20x, and then combine both?

python neat-genreads/genReads.py -r chr1_subset.fa -R 101 --pe 300 30 -c 80 -M 0.002 -o output_normal --vcf

python neat-genreads/genReads.py -r chr1_subset.fa -R 101 --pe 300 30 -c 20 -o output_tumor -v mutations_tumor.vcf --vcf

Then concatenate output_normal and output_tumor fastq files?

Thanks so much for your help! This is a great tool and will be very helpful in some of our analyses!
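
The concatenation step asked about above could be sketched as follows. This is a minimal example, not part of neat-genreads. The FASTQ file names in the usage section are assumptions about how the output prefixes expand and should be replaced with the names the simulator actually writes. For paired-end data, the read1 files and the read2 files have to be concatenated in the same order so that mates stay aligned.

```python
import shutil


def concat_files(inputs, output):
    """Append each input file to `output` byte for byte, in the given order."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Hypothetical file names; check the simulator's actual output names.
    for mate in ("read1", "read2"):
        concat_files(
            [f"output_normal_{mate}.fq", f"output_tumor_{mate}.fq"],
            f"output_mixed_{mate}.fq",
        )
```

It is also worth confirming that read names from the two runs do not collide, since some downstream tools expect unique read identifiers.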

@zstephens
Owner

Hey Teja,

In that tutorial I was demonstrating a use case where the types of variants specific to the tumor are different from those in the simulated germline (e.g. maybe the tumor variants have different trinucleotide biases, or differ in the frequency of indels vs. SNVs, etc.). Indeed, you could also use the commands you provided and achieve the desired result with respect to variant allele fraction, but the statistics underlying the types of variants generated for the tumor and germline fractions would be the same (which I imagine is fine for most use cases).

Also, ownership of this project has changed; it is now being kept up to date by a team at NCSA: https://github.com/ncsa/NEAT
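
To make the coverage-mixing arithmetic from this exchange concrete, here is a small sketch. It assumes the tumor-specific variants are heterozygous in a diploid tumor sample (so they sit on half of the tumor reads), that the normal sample does not carry them, and that reads are sampled uniformly. The function names are hypothetical. Under these assumptions, mixing 80x normal with 20x tumor gives an expected VAF of 0.10, consistent with the 10% figure mentioned earlier in the thread.

```python
def expected_vaf(tumor_cov, normal_cov, tumor_allele_fraction=0.5):
    """Expected VAF of a tumor-only variant after mixing the two read sets.

    tumor_allele_fraction is the fraction of tumor reads carrying the
    variant (0.5 for a heterozygous variant in a diploid sample; assumed).
    """
    return tumor_allele_fraction * tumor_cov / (tumor_cov + normal_cov)


def tumor_coverage_for_vaf(target_vaf, total_cov, tumor_allele_fraction=0.5):
    """Tumor coverage needed to reach `target_vaf` at a fixed total coverage."""
    tumor_cov = target_vaf * total_cov / tumor_allele_fraction
    if tumor_cov > total_cov:
        raise ValueError("target VAF is not reachable by mixing alone")
    return tumor_cov


if __name__ == "__main__":
    print(expected_vaf(tumor_cov=20, normal_cov=80))
    for vaf in (0.03, 0.05):
        t = tumor_coverage_for_vaf(vaf, total_cov=100)
        print(f"VAF {vaf}: about {t:.1f}x tumor, {100 - t:.1f}x normal")
```

These are expected values only; the observed VAF at any single site will vary with sampling noise, especially at low coverage.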

@tkoganti

Awesome! Thanks Zach!!
I will post any new questions on the NCSA site.
