ClumpakRerun Number of runs is not consistent between K's #22
Did you get any warnings printed to the terminal about -NaN values being found for log likelihood values? If so, that is probably the cause, since a -NaN results in nothing being written to the ll_all.txt file for those replicates. I don't know why Admixture sometimes produces a -NaN for the log likelihood because the code is closed-source. Unfortunately, my best recommendation would be to randomly delete 1-2 values (as appropriate) from the other K values if you want to use the bestK method from clumpak.
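The workaround suggested above (randomly deleting values from the other K values until every K has the same number of runs) can be sketched as follows. This is a minimal illustration, not part of AdmixPipe: the function name equalize_runs and the dict-of-lists input are hypothetical stand-ins for the contents of ll_all.txt.

```python
import random


def equalize_runs(ll_by_k, seed=None):
    """Randomly drop runs so that every K keeps the same number of log-likelihoods.

    ll_by_k is a hypothetical in-memory form of ll_all.txt: a dict mapping
    each K to the list of per-run log-likelihood values recorded for it.
    Every list is reduced to the length of the shortest list; the surviving
    values keep their original order.
    """
    rng = random.Random(seed)  # seedable so the deletion can be reproduced
    target = min(len(values) for values in ll_by_k.values())
    equalized = {}
    for k, values in ll_by_k.items():
        # Pick `target` distinct positions at random, then restore file order.
        keep = sorted(rng.sample(range(len(values)), target))
        equalized[k] = [values[i] for i in keep]
    return equalized


# Illustrative values only: K=9 lost one run, so one run is dropped at random
# from K=8 and from K=10.
runs = {8: [-10.1, -10.4, -10.2], 9: [-9.8, -9.9], 10: [-9.5, -9.7, -9.6]}
print(equalize_runs(runs, seed=42))
```

Passing a fixed seed makes the random deletion repeatable, which helps if the bestK analysis has to be rerun later.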
Many thanks for the quick reply and advice. Therefore, I checked my
However, there are corresponding converted Q files in the My conclusion is that the three replicates get lost somewhere in the Do you have any suggestions?
The whole
Q files are indeed present in the results.zip:
I re-ran submitClumpak.py, redirecting the output to a log file (see clumpak_output.txt), and did not see anything out of the ordinary, except
Note that in the MCL clustering output for K=9, cluster 3 consists of a singleton (number 3, i.e., replicate 9_4) that is not clustered with any other replicate. In the output for K=11, clusters 4 and 5 also consist of singletons (numbers 8 and 10, i.e., replicates 11_9 and 11_11, respectively). It is exactly these replicates in singleton clusters that are missing from the clumpak output. So I think Clumpak does not output singletons, which causes the inconsistent number of runs. Is there a reason why AdmixPipe does not allow such inconsistent numbers? Thanks again!
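The singleton check described above can be sketched in a few lines. This assumes, purely for illustration, that the MCL clustering output for one K has been read into a dict mapping each cluster number to the list of replicate IDs it contains; singleton_replicates is a hypothetical helper, not an AdmixPipe or CLUMPAK function.

```python
def singleton_replicates(clusters):
    """Return the replicates that sit alone in their own cluster.

    clusters is a hypothetical stand-in for the MCL clustering output of one K:
    a dict mapping a cluster number to the list of replicate IDs it contains.
    """
    return {members[0] for members in clusters.values() if len(members) == 1}


# Loosely mirrors the K=11 case above; the membership of cluster 1 is made up.
k11 = {1: ["11_1", "11_2", "11_3"], 4: ["11_9"], 5: ["11_11"]}
print(sorted(singleton_replicates(k11)))
```

Comparing this set with the replicates missing from ll_all.txt is a quick way to confirm that the lost runs are exactly the singletons.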
To be honest, this has never come up before, but that is probably because I rarely run the bestK pipeline from clumpak; I don't find it especially informative. Thanks for bringing this to my attention. In my next revision, probably sometime this winter, I will revise the code so that it checks for replicates that are not present in the clumpak output.
Sorry for the delayed response. I've been busy finishing up some projects, and it has taken a while to circle back to these issues. I think I figured out a solution for this by digging through the clumpak code. There is a setting for the Briefly, the default value is set to 0.1 (in the clumpak code itself), and this seems to be the minimum threshold for including a cluster in the output. E.g., if the number of replicates included in a cluster does not represent at least So if you set this number to a low value (e.g., I will leave this issue open until I have a chance to implement these changes.
Edit: the fix is implemented in the GitHub repository version. I still need to update the documentation in README.md to reflect the changes. The fix will be pushed to the Docker container in the next Docker update.
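The arithmetic behind the 0.1 cutoff can be illustrated as follows. With 20 runs per K, a singleton cluster holds 1/20 = 0.05 of the runs, which is below 0.1, so it would be left out of the output. This is a sketch under stated assumptions: kept_clusters is a hypothetical helper, the cluster sizes other than the singleton are made up, and treating the cutoff as "share >= threshold" is a guess rather than CLUMPAK's confirmed comparison.

```python
def kept_clusters(cluster_sizes, threshold=0.1):
    """Return the clusters whose share of all runs reaches `threshold`.

    cluster_sizes maps a cluster label to the number of runs in it. Treating the
    cutoff as "share >= threshold" is an assumption for illustration; CLUMPAK's
    exact comparison is not stated in this thread.
    """
    total = sum(cluster_sizes.values())
    return [label for label, size in cluster_sizes.items() if size / total >= threshold]


# Hypothetical K=9 clustering of 20 runs with one singleton (1/20 = 0.05).
sizes = {1: 12, 2: 7, 3: 1}
print(kept_clusters(sizes))                  # cluster 3 is dropped
print(kept_clusters(sizes, threshold=0.01))  # a lower cutoff keeps it
```

The 0.01 value is chosen only to show the effect of lowering the cutoff; it is not taken from the fix described above.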
I have run admixturePipeline.py, submitClumpak.py, and distructRerun.py in a dedicated directory. All seem to have worked OK. Now I want to run submitClumpak.py -b, but I am getting the following error:
error occurred - Number of runs is not consistent between K's
I checked, and I have 20 replicates for each k (from k=2 to k=13). However, when I open ll_all.txt, there seem to be only 19 entries for k=9 and 18 for k=11 (see attached file). Do you know what may be going wrong?
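A quick way to find which K values are short of runs is to count the entries per K. This sketch assumes that each non-empty line of ll_all.txt holds whitespace-separated fields with K first; that layout, and the helper names runs_per_k and short_ks, are assumptions for illustration rather than AdmixPipe's documented format or code.

```python
from collections import Counter


def runs_per_k(lines):
    """Count how many log-likelihood entries each K has.

    Assumes each non-empty line holds whitespace-separated fields with K first,
    e.g. "9<TAB>-1234.5". This layout for ll_all.txt is an assumption, not
    taken from AdmixPipe's documentation.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[int(fields[0])] += 1
    return counts


def short_ks(counts, expected):
    """Return (K, count) pairs for every K with fewer runs than expected."""
    return sorted((k, n) for k, n in counts.items() if n < expected)


# Illustrative lines, not real data: K=9 has one run fewer than K=8.
sample = ["8\t-100.1", "8\t-100.3", "9\t-98.7"]
print(short_ks(runs_per_k(sample), expected=2))  # [(9, 1)]
```

To check the real file, pass an open file handle instead of the sample list, with expected set to the number of replicates per K (20 in this case).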
Thanks, Robin
ll_all.txt