R0, R1, and King values out of expected range #1

Open
nomascus opened this issue Nov 15, 2018 · 1 comment

Comments

@nomascus

Hi rwaples,

I've been trying to generate pairwise relatedness plots using ANGSD as described in the bioRxiv paper. My values show the same general distribution as in Figure S4, but their magnitudes deviate substantially. Roughly, R0 = 0 to 3; R1 = 0.1 to 0.4; King = -0.6 to 0.125.

[Attached image: screen shot 2018-11-15 at 12 20 20 pm]

This is a non-model organism, and we built a de novo reference genome. The SNP set was called from high-depth tissue samples only. For this run I added some low-depth individuals but still restricted the analysis to that preexisting SNP set. In general I am trying to be pretty strict with the filtering here, and I've tried playing around with different settings to no avail.

Do you have any advice? Here is my code below.

  1. Generate Genotype likelihoods for each scaffold for 13 individuals

angsd -bam BamList.txt -P 4 -ref ReferenceGenome.fna -out GLF_scaffold1 -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -baq 1 -minQ 30 -minMapQ 30 -maxDepth 100 -GL 2 -doCounts 1 -sites SNPsVCF.txt -minind 13 -r Scaffold1 -doGlf 1 -doMajorMinor 1 -doMaf 1 -rmTriallelic 0.000002

  2. Concatenate GLFs for each scaffold
    cat *.glf.gz > complete.glf.gz

  3. Generate ibspair file
    /apps/ANGSD/0.916/misc/ibs -glf complete.glf.gz -nInd 13 -allpairs 1 -o complete

  4. Process with read_IBS.R
    Complete = 'complete.ibspair'
    # keep the table returned by do_derived_stats() so it can be plotted
    stats = do_derived_stats(read_ibspair_model0(Complete))
    ggplot(stats, aes(x=R1, y=R0)) + geom_point()
    ggplot(stats, aes(x=R1, y=Kin)) + geom_point()

@rwaples
Owner

rwaples commented Nov 19, 2018

Hello nomascus,

Glad you are using the program. Thanks for including a detailed account of the commands you ran.

My first guess (not knowing anything about the species or sampling scheme) would be that you have samples from different populations. Such pairs would appear less related (higher R0, lower kinship, lower R1) than a pair of samples from the same population. Does the pattern here make sense in that light? Differential admixture into the two samples can also affect these statistics, often shifting them in the same direction.
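To make the direction of that shift concrete, here is a minimal Python sketch. It assumes the definitions of the three statistics as I understand them from the paper, with A to I denoting the nine joint genotype categories of a pair (rows: genotype of sample 1, columns: genotype of sample 2): R0 = (C+G)/E, R1 = E/(B+D+F+H+C+G), and KING-robust kinship = (E - 2(C+G))/(B+D+F+H+2E). The function name and both count tables are invented for illustration and are not output from `ibs` or read_IBS.R.

```python
# Sketch only: formulas are assumed from the paper's definitions, and the
# counts below are made up to illustrate the direction of the effect.
# Joint genotype counts are laid out as a 3x3 table (0, 1 or 2 copies of an
# allele in sample 1 by row, in sample 2 by column):
#   A B C
#   D E F
#   G H I

def relatedness_stats(counts):
    """Return (R0, R1, KING) for one pair from its 3x3 joint genotype counts."""
    (A, B, C), (D, E, F), (G, H, I) = counts
    opposite_hom = C + G        # sites where the two samples share no allele
    one_het = B + D + F + H     # sites where exactly one sample is heterozygous
    r0 = opposite_hom / E
    r1 = E / (one_het + opposite_hom)
    king = (E - 2 * opposite_hom) / (one_het + 2 * E)
    return r0, r1, king

# Hypothetical unrelated pair from a single population.
same_pop = [[500, 100, 30], [100, 120, 100], [30, 100, 500]]
# Hypothetical pair from two diverged populations: more opposite homozygotes,
# fewer shared heterozygous sites.
diff_pop = [[500, 100, 60], [100, 80, 100], [60, 100, 500]]

for label, table in (("same population", same_pop), ("different populations", diff_pop)):
    r0, r1, king = relatedness_stats(table)
    print(f"{label}: R0={r0:.3f} R1={r1:.3f} KING={king:.3f}")
```

With these invented counts the cross-population pair has the higher R0 and the lower R1 and kinship, which is the same direction as the pattern described above.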

If that doesn't seem to make sense, a few ideas:

  1. You can try removing the -C flag. From my understanding, it is helpful when aligning to a distantly related genome but can be problematic when aligning to a closely related one, as you are here.
  2. You can try the SFS-based method, using the alleles in the reference. This should work as long as you expect the reference allele to be segregating at each site.
  3. In the data I tested on, an upper depth filter didn't seem to matter, but it might here. Maybe try -setMaxDepth.

Do you see a bias / difference based on depth of the sample?
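One rough way to check this, sketched in Python under stated assumptions: split the pairs by whether either sample falls below a depth cutoff and compare the mean kinship of the two groups. The function name, sample names, depths, kinship values, and the cutoff of 10 are all hypothetical placeholders, not values from this dataset.

```python
from statistics import mean

def mean_king_by_depth(pairs, depth, threshold):
    """Mean KING for pairs containing a low-depth sample vs. all other pairs.

    pairs: iterable of (sample1, sample2, king)
    depth: dict mapping sample name to mean sequencing depth
    threshold: depth below which a sample counts as low depth (assumed cutoff)
    """
    low, high = [], []
    for s1, s2, king in pairs:
        # A pair is "low depth" if its shallower sample is under the cutoff.
        bucket = low if min(depth[s1], depth[s2]) < threshold else high
        bucket.append(king)
    return (mean(low) if low else None, mean(high) if high else None)

# Made-up values purely to exercise the function.
depth = {"s1": 30.0, "s2": 28.0, "s3": 2.0, "s4": 3.0}
pairs = [
    ("s1", "s2", 0.05),
    ("s1", "s3", -0.30),
    ("s2", "s4", -0.20),
    ("s3", "s4", -0.40),
]

low_mean, high_mean = mean_king_by_depth(pairs, depth, threshold=10.0)
print("pairs with a low-depth sample:", low_mean)
print("pairs with two high-depth samples:", high_mean)
```

A large gap between the two means would suggest the statistics track depth rather than relatedness; a small gap would point back toward population structure.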
