visualization of uncertainty #24
Hey, I would also reference this in the metrics issue #4. If I understand correctly, you are concerned about ordering items. Here's one fun way to think about this down the line: Then you could have a d-dimensional plot with a heatmap representing LLK (log-likelihood) etc., which would be a very nice way to represent chemical space. Potentially 2D might even work and is very easy to visualize, but even higher-d could be further reduced.
If all we need is a fixed-dimensional representation of graphs, then we might as well just apply some dimensionality-reduction tricks to the eigenspace representation?
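A minimal sketch of the dimension-reduction idea above, assuming graphs arrive as symmetric adjacency matrices (`numpy` arrays). The fixed-length feature used here, the `k` smallest eigenvalues of the normalized graph Laplacian zero-padded for small graphs, is a hypothetical choice for illustration, and the padding is exactly where the size-mismatch concern raised in the next reply applies. `laplacian_spectrum_features` and `pca_2d` are illustrative names, not part of any existing codebase.

```python
import numpy as np


def laplacian_spectrum_features(adjacency, k=8):
    """Map a graph of any size to a length-k vector (hypothetical featurization).

    Uses the k smallest eigenvalues of the normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2}, zero-padded when the graph has fewer than k nodes.
    """
    a = np.asarray(adjacency, dtype=float)
    degree = a.sum(axis=1)
    nonzero = degree > 0
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    # Isolated nodes get a zero diagonal entry instead of 1.
    lap = np.diag(nonzero.astype(float)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(lap)  # ascending order for symmetric input
    feats = np.zeros(k)
    n = min(k, eigvals.size)
    feats[:n] = eigvals[:n]
    return feats


def pca_2d(features):
    """Project each row of `features` onto its top two principal components."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T


if __name__ == "__main__":
    graphs = [
        np.array([[0, 1], [1, 0]]),                   # 2-node graph
        np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]),  # 3-node path
        np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]),  # triangle
    ]
    coords = pca_2d([laplacian_spectrum_features(g) for g in graphs])
    print(coords.shape)  # (3, 2)
```

The resulting 2D coordinates could then be colored by any per-molecule metric; whether spectra are a meaningful notion of similarity between molecules is the open question.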
I'm not really sure there are good eigenspace representations for graphs of different sizes, etc. If I were you, I would not focus on that now; I would focus on trying out some of these unsupervised and semi-supervised models for graphs, to get to the thing we actually need: a way to incorporate knowledge from graphs we have no measurements for.
Here's a recent paper doing chem-stuff: Here's a very rough overview of the core idea: Lots of papers of varying complexity exist, and lots of codebases. In an ideal universe, we would open an issue to survey the landscape of graph models that have code available and start a script to test their usefulness systematically, just like the current experiment with graph nets for regression. How does that sound? And here's a classic just for starters: But there are many more recent papers that extend those ideas significantly.
I propose we continue the graph-modeling discussion in a new issue I created for that.
I think there are two distinct aspects of this question: how to visually "index into" the domain of possible input graphs, and how to visualize a model's predictive uncertainty for a single input graph.

For predictions of a scalar property of a single molecule (e.g. its "affinity" or "overall goodness score"), a complete representation would be the whole predictive pdf (i.e. a histogram of samples from the posterior predictive distribution). That can be summarized by an interval or a scalar measuring its "spread" (stddev, quantiles, ...). Depending on the shapes of the predictive distributions, these summaries may be more or less lossy; for example, a single stddev hides whether the distribution is skewed or multimodal.

When predicting several properties of a given molecule simultaneously ("solubility", "on-target affinity", "off-target affinity", "toxicity", ...), the predictions of the various quantities will probably be correlated, which makes summarization even harder. A complete representation would be the joint distribution over all these predictions, which can be lossily visualized in the usual ways (reduce jointly to 2D, show all bivariate marginals, ...).

Although it will be too big to look at for more than a handful of molecules, I imagine it will be informative to look at the joint predictive distribution for all properties of a molecule or two (maybe one molecule that looks very similar to something in the training dataset, and one that looks very different), using the same model but different approximate inference algorithms.

I don't have any special insight into how to visually "index into" the domain of input graphs to get a more global picture of what the model is doing. Associating each graph with a point in 2D using the approaches @karalets describes here and in the new issue sounds good to me.
Each 2D point would further be associated with a scalar (@karalets suggests LLK, and other scalar summaries of the posterior predictive distribution may also be appropriate), forming a colored scatterplot or heatmap that might hint at "where in chemical space the model is confident or not."
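A rough sketch of the pipeline described in the comment above, assuming each molecule comes with draws from its posterior predictive distribution and already has a 2D coordinate from whichever embedding is chosen. It reduces each predictive distribution to lossy summaries (mean, stddev, a central quantile interval), computes correlations between jointly predicted properties, and averages one per-molecule scalar over a 2D grid to get the heatmap. All function names are hypothetical, and the draws in the `__main__` block are synthetic random stand-ins, not real predictions.

```python
import numpy as np


def summarize_predictive(samples, interval=0.9):
    """Lossy summaries of predictive samples for one scalar property."""
    s = np.asarray(samples, dtype=float)
    lo, hi = np.quantile(s, [(1 - interval) / 2, (1 + interval) / 2])
    return {"mean": s.mean(), "std": s.std(ddof=1), "interval": (lo, hi)}


def predictive_correlation(joint_samples):
    """Correlation matrix between properties; input shape (n_samples, n_properties)."""
    return np.corrcoef(np.asarray(joint_samples, dtype=float), rowvar=False)


def binned_mean(xy, values, bins=10):
    """Average a per-molecule scalar over a 2D grid of embedding coordinates.

    Returns (grid, x_edges, y_edges); grid[i, j] covers x bin i and y bin j,
    and cells that contain no molecules are NaN.
    """
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    counts, x_edges, y_edges = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
    sums, _, _ = np.histogram2d(
        xy[:, 0], xy[:, 1], bins=[x_edges, y_edges], weights=values
    )
    with np.errstate(invalid="ignore"):
        grid = sums / counts  # 0 / 0 -> NaN for empty cells
    return grid, x_edges, y_edges


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_molecules, n_samples = 200, 500
    coords = rng.normal(size=(n_molecules, 2))  # stand-in 2D embedding
    # Synthetic predictive draws whose spread grows with distance from the origin.
    spread = 0.5 + np.linalg.norm(coords, axis=1)
    draws = rng.normal(size=(n_molecules, n_samples)) * spread[:, None]
    stds = np.array([summarize_predictive(d)["std"] for d in draws])
    grid, x_edges, y_edges = binned_mean(coords, stds, bins=8)
    print(grid.shape)  # (8, 8)
```

To draw the heatmap, matplotlib's `pcolormesh(x_edges, y_edges, grid.T)` is one option; the transpose is needed because `pcolormesh` indexes its color array as (y, x). For the multi-property case, calling `predictive_correlation` on one molecule's `(n_samples, n_properties)` draws gives the pairwise correlations that a bivariate-marginals plot would show.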
I think @yuanqing-wang is asking about how to build an axis over graphs here, not over the output variable. That discussion should probably happen in the metrics issue #4.
To give some more color: first, the metrics we care about predicting should be whatever we decide on in #4; I mention LLK here only as one example. I am sad nobody is engaging with #4, as it is a very important issue that @yuanqing-wang raised as a blocker for reproduction when he first looked at the Cambridge paper, and we have not yet made overview plots like the ones in that paper. The 2D (or whatever-D) plot would display whichever of those metrics one chooses.
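For concreteness, one common form of the LLK metric mentioned above is the average log-likelihood of held-out measurements under each molecule's predictive distribution. A minimal sketch, assuming (the thread does not specify this) that each predictive distribution is summarized as a Gaussian with mean μ and stddev σ, so that per molecule log p(y | μ, σ) = -½ log(2πσ²) - (y - μ)² / (2σ²). `gaussian_llk` is a hypothetical name, and the numbers in the `__main__` block are made up for illustration.

```python
import numpy as np


def gaussian_llk(y, mean, std):
    """Per-molecule log-likelihood of observed y under N(mean, std**2).

    Assumes a Gaussian predictive distribution; a sample-based estimate
    would be needed for skewed or multimodal predictives.
    """
    y, mean, std = (np.asarray(v, dtype=float) for v in (y, mean, std))
    return -0.5 * np.log(2 * np.pi * std**2) - (y - mean) ** 2 / (2 * std**2)


if __name__ == "__main__":
    y_obs = np.array([1.0, 2.0, 3.0])   # illustrative measurements
    mu = np.array([1.1, 1.5, 3.0])      # illustrative predictive means
    sigma = np.array([0.2, 1.0, 0.5])   # illustrative predictive stddevs
    per_molecule = gaussian_llk(y_obs, mu, sigma)
    print(per_molecule.mean())  # dataset-level LLK: higher is better
```

The per-molecule values are also a natural candidate for the scalar used to color points on a 2D plot of chemical space.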
Since the inputs are graphs and can't be squeezed onto one axis, how should we visualize the uncertainty of predictions in regression tasks?
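One simple way to get the single axis this question asks for, related to the "ordering items" idea mentioned in the thread, is to sort molecules by predicted mean and use the rank as the x-axis, drawing each molecule's predictive interval (and its measured value, if known) at that rank. A minimal sketch, assuming per-molecule means and interval bounds have already been computed; `rank_order` is a hypothetical helper, and it only orders items rather than embedding chemical space.

```python
import numpy as np


def rank_order(mean, lower, upper, observed=None):
    """Sort molecules by predicted mean so rank can serve as a 1D x-axis."""
    order = np.argsort(mean, kind="stable")
    ranked = {
        "rank": np.arange(order.size),
        "mean": np.asarray(mean, dtype=float)[order],
        "lower": np.asarray(lower, dtype=float)[order],
        "upper": np.asarray(upper, dtype=float)[order],
    }
    if observed is not None:
        ranked["observed"] = np.asarray(observed, dtype=float)[order]
    return ranked


if __name__ == "__main__":
    r = rank_order(
        mean=[3.0, 1.0, 2.0],
        lower=[2.5, 0.2, 1.9],
        upper=[3.6, 1.9, 2.1],
        observed=[2.8, 1.4, 2.4],
    )
    print(r["mean"])  # [1. 2. 3.]
```

With matplotlib, `fill_between(r["rank"], r["lower"], r["upper"])` plus a scatter of `r["observed"]` gives the usual plot, where measurements falling outside the band are easy to spot.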