[REVIEW]: quickBayes: An analytical approach to Bayesian loglikelihoods #6230
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks. For a list of things I can do to help you, just type:
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
Review checklist for @mattpitkin
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @JonasMoss
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Hi @AnthonyLim23, I've just got a few minor things to suggest initially:
@software{quasielasticbayes,
author = {{Sivia}, D. and {Howells}, S.},
title = {quasielasticbayes version 0.2.0}
}
@article{bayesPaper,
title = {Bayesian analysis of quasielastic neutron scattering data},
journal = {Physica B: Condensed Matter},
volume = {182},
number = {4},
pages = {341-348},
year = {1992},
note = {Quasielastic Neutron Scattering},
issn = {0921-4526},
doi = {10.1016/0921-4526(92)90036-R},
url = {https://www.sciencedirect.com/science/article/pii/092145269290036R},
author = {{Sivia}, D.~S. and {Carlile}, C.~J. and {Howells}, W.~S. and {König}, S.},
abstract = {We consider the analysis of quasielastic neutron scattering data from a Bayesian point-of-view. This enables us to use probability theory to assess how many quasielastic components there is most evidence for in the data, as well as providing an optimal estimate of their parameters. We review the theory briefly, describe an efficient algorithm for its implementation and illustrate its use with both simulated and real data.}
}
@article{bayesReview,
title = {Bayesian inference in physics},
author = {{von Toussaint}, Udo},
journal = {Rev. Mod. Phys.},
volume = {83},
issue = {3},
pages = {943--999},
year = {2011},
month = {Sep},
publisher = {American Physical Society},
doi = {10.1103/RevModPhys.83.943},
url = {https://link.aps.org/doi/10.1103/RevModPhys.83.943}
}
@book{bayesBook,
author = {{Sivia}, D. and {Skilling}, J.},
edition = {Second},
publisher = {Oxford University Press},
title = {Data Analysis: A Bayesian Tutorial},
year = {2006},
isbn = {9780198568322}
}
In the developer section of the docs, could you please add a sentence or two stating how users/developers can:
i.e., just say that users can submit issues in the GitHub repo.
I have updated the PR for the paper with your bib file. I then edited the authors list for the software so the shared names are the same as the paper. The other comments have been addressed in ISISNeutronMuon/quickBayes#88
@editorialbot generate pdf
@AnthonyLim23 It looks like I made a typo in my version of the bib file. While I'm at it - I spotted a minor typo in the acknowledgements.
For the Quasielasticbayes 0.2.0 I don't have a date or url. It was a program written in the late '80s/early '90s for internal use by scientists, so it was never properly documented. I can put the GitHub repository that it has since been moved to (and updated to work with Python); what do you think?
@AnthonyLim23, I think you can put the GitHub link that it has been moved to.
That should be everything updated. |
I have now read much of the documentation and can only recommend a full rewrite of it. Since this package is about Bayesian statistics, I need to know:
To fix these problems it may help to take a look at, e.g., the docs for linear regression in pymc. Here it's clearly stated:
More precise terminology would also help. I would also recommend fewer attempts at intuitive explanations of the statistics. Here are a couple of more detailed responses regarding the loglikelihood page.
Thanks a lot for your thorough review of the documentation, @JonasMoss!
I agree with @JonasMoss's comments - the documentation needs very considerable work. The introduction should be a clear, succinct summary of what the package is for - the example given is not very clear and, if included (I would actually recommend removing it), needs to be explained in more detail. As stated in the previous comments, this should include: the model used, the model parameters that are being fit, how they are being fit, and an explanation of the input data, e.g., something about choosing two different noise levels to show a particular effect. (As an aside, in the introduction example with the Gaussian profiles, the docs/code state that these have an amplitude of 103, but the plots show a value of about 50; how is the amplitude being defined?)

The API documentation also needs major work. Currently, the API docs exist, but following down the chain of links shows most functions/classes are very minimally documented, without any clear explanation of what they do or what the input arguments and outputs are. Having full API documentation is not a requirement for a JOSS paper, so long as there are several usage cases shown in examples that are well described and cover a decent range of the most used package functionality. I would recommend focusing on a few good-quality examples and removing most of the API from the docs unless it points to well documented material.
In order to reduce the amount of work we all need to do, I want to make sure that we are agreed on what changes need to be made. I hope that is ok. I agree that the documentation needs more work, but I want it to be appropriate for what the package does. This package is a bit strange as it uses some Bayesian statistics, but not directly, so I want to check how much detail on Bayesian statistics is appropriate.

My understanding is that normally Bayesian inference uses multiple "walkers"/"samples" that are initially distributed across the parameter space according to a prior distribution (usually flat). However, in this package a prior is not defined by the user. Instead, they define some initial start parameters, which are then used for a fit (the user can choose the fit engine/method). In that sense there is only a single point for the prior (unless a multistart method is used). There is currently no posterior distribution produced by the code (I would like to add it someday, but would want to verify the results against DREAM from bumps). At present the code just produces the peak value of the distribution. This result is then used to determine which model is more likely.

The models are defined as fitting functions, which when called return an array of the corresponding y values. These are used in the optimisation to calculate the local minima. I have included some "special" functions that allow the construction of more complicated models with minimal work (e.g. composite for addition and convolution). The models are then tried within a workflow to calculate the most likely models (e.g. is the data linear or quadratic). Input data is generally just x, y, e, and the user can also include additional x, y, e data that correspond to a resolution for Quasi-Elastic Neutron Scattering instruments (e.g. Osiris at ISIS). Most of the approximations are hidden in the derivation of the peak posterior probability.
The big assumption is that it is being evaluated at the local minima, which is why we can use fitting to calculate the peak posterior probability.

@JonasMoss for a real-world example, is it sufficient to just have images and show a line of code reading some data (from the tests) for the input? I have included the log rules because my colleagues are unlikely to know them and I wanted them to have the information they needed to get from the equation in the book to the one implemented in the code.

@mattpitkin for the API examples, would you like to see examples for the individual pieces (e.g. fitting functions, fitting engines) or just a few example workflows/model selection? Most of the API docs are autogenerated. I did try to write the docs in the style of a tutorial, but I assume that those examples are insufficient. What changes would you recommend that I make to those examples to improve the API documentation?
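(To make the workflow described above concrete, here is a hedged standalone sketch. This is NOT the quickBayes API - all names are illustrative - and it uses a BIC-style penalised chi-squared as a stand-in for the package's analytical posterior-probability calculation, purely to show the "fit each candidate model, then compare" idea.)

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative sketch only -- not quickBayes code. It mimics the workflow
# described above: fit each candidate model to (x, y, e) data, then pick the
# most plausible one. A BIC-style penalised chi-squared stands in for the
# package's analytical posterior-probability result.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
e = 2.0  # assumed constant error bar on each point
y = 0.5 * x**2 - x + 3.0 + rng.normal(0.0, e, x.size)

def linear(x, a, b):
    return a * x + b

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

scores = {}
for name, model in [("linear", linear), ("quadratic", quadratic)]:
    popt, _ = curve_fit(model, x, y, sigma=np.full_like(x, e))
    chi2 = np.sum(((y - model(x, *popt)) / e) ** 2)
    # BIC = chi^2 + k * ln(n): extra parameters are penalised, so a more
    # complex model must fit substantially better to be preferred.
    scores[name] = chi2 + len(popt) * np.log(x.size)

best = min(scores, key=scores.get)
print(best, scores)
```

Here the quadratic should win despite its complexity penalty, because the data were generated with genuine curvature; for genuinely linear data the penalty would tip the comparison the other way.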
Hi @AnthonyLim23, sorry for the slow response.
I think it would be good to see, say, two fully worked examples demonstrating a realistic pipeline of usage of the code, including either real or simulated datasets. One could be, say, determining the number of Gaussian-like "lines" in some data, and one could also incorporate some modelling of a background (or whatever you think appropriate). If there's a real analysis that it has been used on where the data is open and you are free to post it in the docs, that would be great, e.g., the QENS example that you have in the paper. You could do these examples in Jupyter notebooks and either just link to them in the docs or get Sphinx to render them, so that you can easily include figures (although this is not a requirement).

Some other bits of the API do need work though. The fit functions all at least need a short docstring saying what they are and what input parameters they require, otherwise they are not really useful.

In terms of other general comments (and again in relation to @JonasMoss's comments), I think the first paragraph of the paper nicely described the use case of the code, so that should be echoed in the docs. Essentially this code is for model comparison, and for each model you are (I think/assume) approximating the marginalised likelihood for a given model (also known as the evidence), where you are marginalising over all the unknown model parameters (can you hold any of the parameters at fixed values, or do you have to marginalise over them all?) as opposed to an analytical maximum likelihood approach. Maybe this is done using Laplace's method? You can state this in the docs and paper, but you should also briefly mention other approximation methods such as the AIC, BIC and nested sampling (which is the Monte Carlo method described at the end of the Sivia book). So, when you talk about loglikelihood in the docs I think you really mean marginal likelihood/evidence - please check if this is what you mean and make changes.
You should also be explicit that priors on all model parameters are assumed to be flat and infinite in bounds (unless they are not?). This affects how you interpret the model comparison, so it does need to be stated. Is there any way to specifically set the prior ranges, or use different priors (say Gaussian priors, which can also be analytically marginalised over in certain cases)? In the docs, it currently states that:
but I don't really think this is quite what they are. The fit functions are a (potentially approximate) mathematical model for a process/phenomenon that you have collected data to measure. The data will be a combination of that modelled process and some noise (where the noise could just be measurement noise, or a combination of other unmodelled processes). So, you could say something like: "The fit functions are approximate mathematical models of the process that we intend to measure."
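(As a rough illustration of the Laplace's-method suggestion above, here is a hedged numerical sketch - again NOT quickBayes code. It assumes Gaussian noise, flat priors of an assumed width `prior_width` on every parameter, and a linear-in-parameters model so the Hessian is analytic; all names are hypothetical.)

```python
import numpy as np
from scipy.optimize import minimize

# Generic sketch of Laplace's method for the log evidence -- not quickBayes
# code. Assumes Gaussian noise with known sigma and flat priors of (assumed)
# width `prior_width` on each parameter.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
sigma = 0.1
y = 2.0 * x + 0.5 + rng.normal(0.0, sigma, x.size)

def neg_log_like(theta, design):
    # Negative log-likelihood up to a constant that is common to all models
    # fit to the same data, so it cancels in any model comparison.
    resid = y - design @ theta
    return 0.5 * np.sum((resid / sigma) ** 2)

def log_evidence_laplace(design, prior_width=10.0):
    k = design.shape[1]
    res = minimize(neg_log_like, np.zeros(k), args=(design,))
    # For a linear-in-parameters model the Hessian of the negative
    # log-likelihood is analytic: H = X^T X / sigma^2.
    hess = design.T @ design / sigma**2
    _, logdet = np.linalg.slogdet(hess)
    # log Z ~ log L_max + (k/2) ln(2*pi) - (1/2) ln|H| - k ln(prior width):
    # the last two terms form the Occam factor penalising extra parameters.
    return (-res.fun + 0.5 * k * np.log(2.0 * np.pi)
            - 0.5 * logdet - k * np.log(prior_width))

linear_design = np.column_stack([x, np.ones_like(x)])
quad_design = np.column_stack([x**2, x, np.ones_like(x)])
print(log_evidence_laplace(linear_design), log_evidence_laplace(quad_design))
```

Under these assumptions the two log-evidence values can be compared directly; their difference plays the role of a log Bayes factor between the linear and quadratic models, with the flat-prior width feeding directly into the Occam penalty.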
@mattpitkin I have data in the tests for QENS and simulated muon data (number of exponential decays). I can do two examples using them. I will look into the rest of the API docs and make them clearer. I believe that it is the marginalised likelihood, but I will double check. At present all of the parameters have to be free; I do want to make it easier to add fixes (the current method is https://github.com/ISISNeutronMuon/quickBayes/blob/main/src/quickBayes/fit_functions/stretch_exp_fixed.py, which is a bit clunky). In terms of priors, from a user perspective it is purely the parameter range and start values for fit parameters. I believe the flat priors are baked into the assumptions used for deriving the analytical equation. I can touch on it, but the user will never be able to do anything about it.
Review checklist for @prashjet
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Hi there. Thanks for your patience waiting for my review; I've just got round to it after a busy period. After reading the documentation and paper, I am still quite confused about which calculations this software is doing, and what information those calculations are telling me. I think the comment from @JonasMoss provides an excellent checklist of what should be included in the documentation so that someone can clearly understand what your software does and when to use it. I won't repeat the entire checklist here, but I'll highlight some key points in a response to your last message. You say...
In principle, I think that black box software for Bayesian statistics may sometimes be useful (i.e. software that provides a solution, without necessarily going into details) but only if you can clearly explain what situations the black-box is suitable for. In the case of
That last point is a little opaque, so I'll try to clarify it with an example. I think unravelling Equation 4.20 of Sivia & Skilling is likely to be a difficult task; however, given that this calculation is currently at the heart of your software package, I think it is unavoidable in order for anyone to know whether the method is suitable for their problem.

P.S. I also had a problem with installation.
Thanks to @JonasMoss, @prashjet and @mattpitkin for your thorough reviews! @AnthonyLim23, I understand that a large number of issues need to be addressed here. It is ok if it takes time, but please keep us updated on any progress in this thread. |
@osorensen thank you. It will take me a while as I have another project deadline that I need to focus on first. |
I have outlined my plan for the new documentation in ISISNeutronMuon/quickBayes#92; I think this should cover all of the comments. I also have an issue to do the API changes, ISISNeutronMuon/quickBayes#93, and I have a potential fix for ISISNeutronMuon/quickBayes#89
👋 @AnthonyLim23, how is it going with the work towards the issues mentioned above? |
@osorensen I have written an introduction to Bayes and done a derivation of the main equation, using simplified notation (I have also extended it). I still need to clean it up a bit, and I also need to write the comparison to other methods and the examples. Unfortunately, I have to do this around other things. I am sorry it is taking so long. I hope to get some more done next week (I am happy to stick it in a PR so you can have a look). I think I have fixed the issue with not being able to install it, ISISNeutronMuon/quickBayes#89, and the fix is in ISISNeutronMuon/quickBayes#90
The docs are a work in progress, but what I have so far is here: https://quickbayes--94.org.readthedocs.build/en/94/ I still need to do:
However, it should give a good indication of the new documentation and if it is suitable. |
Thanks for the update @AnthonyLim23 |
Could you please give us a progress update, @AnthonyLim23? |
@osorensen I'm sorry this is taking so long; I am having to do this around other projects. I believe the only things left to do for the docs are:
I am hoping to get some time this week to do some work on it. |
@AnthonyLim23, any updates on your progress with the docs? |
Unfortunately, I have not had any opportunity to work on this. However, I am happy to receive feedback on what I have so far.
@AnthonyLim23, I suggest you report here when you have addressed all the issues, so the reviewers can consider it all at once. |
Submitting author: @AnthonyLim23 (Anthony Lim)
Repository: https://github.com/ISISNeutronMuon/quickBayes
Branch with paper.md (empty if default branch): 82_paper
Version: 1.0.0b18
Editor: @osorensen
Reviewers: @JonasMoss, @mattpitkin, @prashjet
Archive: Pending
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@JonasMoss & @mattpitkin & @prashjet, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @osorensen know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @JonasMoss
📝 Checklist for @mattpitkin
📝 Checklist for @prashjet