Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRCH38 problems #86

Open
popicka opened this issue Nov 23, 2020 · 3 comments
Open

GRCH38 problems #86

popicka opened this issue Nov 23, 2020 · 3 comments
Assignees

Comments

@popicka
Copy link

popicka commented Nov 23, 2020

Hi,

We have been using Neat to successfully generate reads with hg19 reference.
However, when using GRCh38 with alternative sequences, plus decoys and HLA, we stumble upon different problems.

  1. In order to properly use computeGC.py we have removed the strings after the first space from the original fasta as suggested here. ComputeGC was able to generate the model after this.
  2. We used neat-genreads
    This is the command line that was used:
    python genReads.py --gc-model gc_model.p -t target.bed --pe-model fraglen.p -r grch38.fasta -o wes_normal_150x -v variants.vcf -R 101 -c 150 -M 0.1 --rng 10 --vcf --bam --job 8 8

and got this error
/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)

Do you maybe know what could have caused this error? When running neat-genreads should we use the original fasta or the fasta file used for computeGC? Are there any known bugs related to GRCH38 reference?

Thank you,
Ana

@joshfactorial joshfactorial self-assigned this Nov 23, 2020
@zstephens
Copy link
Owner

Greetings, was there more to that error? I don't see a line number to where in genReads.py it was going wrong.

I wonder if you would be able to share your target bed file? I think the code might expect it to very organized (sorted, no overlapping regions) and things can go wrong if it's not.

@popicka
Copy link
Author

popicka commented Jan 10, 2021

Hi, sorry for the delayed response. Here are all of the BED files that we used. With all of these BED files we get the same error.
S07604514_Regions_v6_hg38_Targets.txt
S31285117_Regions_v7_hg38_Targets.txt
S07604514_Regions_v6_hg38_Targets_removed_additional_info.txt
S04380110_Regions_v5_hg38_Targets.txt
S04380110_Regions_v5_hg38_Targets_removed_additional_info.txt
S04380110_Regions_v5_hg38_Targets_removed_additional_info.sorted.txt

Interestingly, we also have a problem with GRCH38 WGS simulations, where NEAT breaks without reporting an error. Thank you!

@zstephens
Copy link
Owner

I was able to successfully simulate small test datasets using those bed files (minus the header lines) and a subset of chr1. So something must be going wrong later down the line. Do you have a log file of where the simulation was failing? E.g. which chromosome and position did it make it to before dying?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants