visualization of uncertainty #24

yuanqing-wang · 2020-05-13T16:08:37Z

Since the inputs are graphs and can't be laid out along a single axis, how should we visualize predictive uncertainty in regression tasks?

karalets · 2020-05-13T16:18:27Z

Hey,

I would also reference this in the metrics issue, #4.

If I understand correctly, you are concerned about how to order items.
Initially, I believe one could represent uncertainty per batch or dataset rather than per item on a plot, since there is no natural ordering over graphs.

Here's one fun way to think about this down the line:
if the semi-supervised and unsupervised learning effort makes some progress, we may find ourselves with embeddings of graphs in a low-dimensional space.

Then you could have a d-dimensional plot with a heatmap representing LLK (log-likelihood) or another metric, which would be a very nice way to represent chemical space.

Potentially 2D might even work and is very easy to visualize, and a higher-dimensional embedding could be reduced further.
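
To make the LLK idea concrete, here is a minimal sketch of a per-molecule log-likelihood that could later color such a plot. It assumes a Gaussian predictive distribution per molecule, which this thread does not fix; the function name and the numbers are hypothetical.

```python
import math

def gaussian_llk(y, mean, std):
    """Log-density of measurement y under an assumed Gaussian predictive N(mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - 0.5 * ((y - mean) / std) ** 2

# Hypothetical measurements and model predictions for three molecules.
measured = [1.0, 2.0, 3.0]
pred_mean = [1.1, 2.0, 4.0]
pred_std = [0.2, 0.2, 0.2]

# One LLK per molecule (usable as a color value) and one per-dataset average.
per_item_llk = [gaussian_llk(y, m, s) for y, m, s in zip(measured, pred_mean, pred_std)]
dataset_llk = sum(per_item_llk) / len(per_item_llk)
```

The third molecule is a confident miss (five standard deviations off), so its LLK is far lower than the others and it drags down the dataset average.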

yuanqing-wang · 2020-05-13T17:45:37Z

If all we need is a fixed-dimensional representation of graphs, couldn't we just apply some dimensionality reduction to the eigenspace representation?

karalets · 2020-05-13T17:53:02Z

I'm not really sure there are good eigenspace representations for graphs of different sizes, etc. If I were you, I would not focus on that now. I would focus on trying out some of these unsupervised and semi-supervised models for graphs, to get to the thing we actually need in order to incorporate knowledge from graphs we have no measurements for.

karalets · 2020-05-13T17:59:03Z

Here's a recent paper doing chem-stuff:
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0396-x

Here's a very rough overview of the core idea:
https://towardsdatascience.com/tutorial-on-variational-graph-auto-encoders-da9333281129

Lots of papers of varying complexity exist, along with lots of code bases. In an ideal universe, we would open an issue to survey the landscape of graph models that come with code, and start a script to test their usefulness systematically, just like the current experiment on graph nets for regression.

How does that sound?

And here's a classic just for starters:
https://github.com/tkipf/gae

But there are many more recent papers extending those ideas significantly.

karalets · 2020-05-13T18:04:33Z

I propose we continue the graph modeling discussion in the new issue I created for that.

maxentile · 2020-05-13T18:52:43Z

I think there are two distinct aspects of this question: how to visually "index into" the domain of possible input graphs, and how to visualize a model's predictive uncertainty for a single input graph.

For predictions of a scalar property of a single molecule (e.g. its "affinity" or "overall goodness score"), a complete representation would be the whole predictive pdf (aka a histogram of samples from the posterior predictive distribution). That can be summarized by an interval or scalar measuring its "spread" (stddev, quantiles, ...). Depending on the shapes of the predictive distributions, these summaries may be more or less lossy.
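
As a rough illustration of how lossy these summaries can be, here is a sketch that reduces posterior predictive samples for one molecule to a mean, a standard deviation, and a central interval. The function name and the bimodal samples are hypothetical and made up for the example.

```python
import random
import statistics

def summarize_predictive(samples, lower=0.025, upper=0.975):
    """Summarize posterior predictive samples for a single molecule."""
    ordered = sorted(samples)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between neighboring order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return {
        "mean": statistics.fmean(ordered),
        "stddev": statistics.stdev(ordered),
        "interval": (quantile(lower), quantile(upper)),
    }

# Hypothetical bimodal predictive distribution: two modes near -2 and +2.
rng = random.Random(0)
samples = [rng.gauss(-2.0, 0.3) for _ in range(500)]
samples += [rng.gauss(2.0, 0.3) for _ in range(500)]
summary = summarize_predictive(samples)
```

Here the mean lands near 0, where the distribution has almost no mass, so "mean plus or minus stddev" would be a misleading picture. The histogram of `samples` shows the two modes that the scalar summaries hide.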

For predictions of more than one property simultaneously of a given molecule ("solubility", "on-target affinity", "off-target affinity", "toxicity", ...), the predictions of the various quantities will probably be correlated, which will make summarization even harder. A complete representation would be the joint distribution for all these predictions, which can be lossily visualized in the usual ways (reduce jointly to 2D, show all bivariate marginals, ...). Although it will be too big to look at for more than a handful of molecules, I imagine it will be informative to take a look at the joint predictive distribution for all properties of a molecule or two (maybe for one molecule that looks very similar to something in the training dataset, and one that looks very different), using the same model but different approximate inference algorithms.
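
One cheap first look at the joint distribution, before plotting bivariate marginals, is the correlation matrix of the joint predictive samples. The sketch below assumes the samples come as rows of (solubility, affinity, toxicity) draws for one molecule; the data and the shared latent factor are invented for illustration.

```python
import math
import random

def correlation_matrix(draws):
    """Pearson correlations between the columns of joint predictive samples.

    draws: list of rows, one row per posterior draw, one column per property.
    """
    n = len(draws)
    d = len(draws[0])
    means = [sum(row[j] for row in draws) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in draws]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    std = [math.sqrt(cov[j][j]) for j in range(d)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(d)] for i in range(d)]

# Hypothetical joint draws: two properties share a latent factor, one is independent.
rng = random.Random(0)
draws = []
for _ in range(2000):
    shared = rng.gauss(0.0, 1.0)
    solubility = shared + rng.gauss(0.0, 0.5)
    affinity = -shared + rng.gauss(0.0, 0.5)
    toxicity = rng.gauss(0.0, 1.0)
    draws.append([solubility, affinity, toxicity])

corr = correlation_matrix(draws)
```

In this toy case solubility and affinity come out strongly anti-correlated, so summarizing each property with its own interval would miss that the model rarely predicts both to be high at once.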

I don't have any special insight into how to visually "index into" the domain of input graphs, to get a more global picture of what the model's doing. Associating each graph with a point in 2D using the approaches @karalets describes here and on the new issue sounds good to me. Each 2D point would further be associated with a scalar (@karalets suggests LLK, and perhaps other scalar summaries of the posterior predictive distribution would be appropriate), to form a colored scatterplot or heatmap maybe hinting at "where in chemical space the model is confident or not."
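
A sketch of the heatmap half of that idea, assuming 2D embeddings already exist for each graph (here they are random stand-ins) and using predictive spread as the scalar. The function name, the grid size, and the "spread grows away from the origin" rule are all hypothetical.

```python
import math
import random

def grid_mean(points, values, bins=4, lo=-1.0, hi=1.0):
    """Average a per-molecule scalar over the cells of a 2D embedding grid."""
    sums = [[0.0] * bins for _ in range(bins)]
    counts = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for (x, y), v in zip(points, values):
        # Clamp indices so points on or beyond the edges fall into border cells.
        i = min(max(int((x - lo) / width), 0), bins - 1)
        j = min(max(int((y - lo) / width), 0), bins - 1)
        sums[j][i] += v
        counts[j][i] += 1
    # Empty cells are reported as None rather than 0 to avoid fake confidence.
    return [
        [sums[j][i] / counts[j][i] if counts[j][i] else None for i in range(bins)]
        for j in range(bins)
    ]

# Stand-in embeddings, with a made-up spread that grows away from the origin
# (as if the training data sat near the center of the embedding).
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
spread = [0.1 + math.hypot(x, y) for x, y in points]
heat = grid_mean(points, spread)
```

The resulting `heat` grid can be handed to any image-style plotting routine. In this toy setup the central cells have lower average spread than the corner cells, which is the kind of "confident here, not there" pattern the plot is meant to surface.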

karalets · 2020-05-13T19:01:28Z

> I think there are two distinct aspects of this question: how to visually "index into" the domain of possible input graphs, and how to visualize a model's predictive uncertainty for a single input graph.
>
> For predictions of a scalar property of a single molecule (e.g. its "affinity" or "overall goodness score"), a complete representation would be the whole predictive pdf (aka a histogram of samples from the posterior predictive distribution). That can be summarized by an interval or scalar measuring its "spread" (stddev, quantiles, ...). Depending on the shapes of the predictive distributions, these summaries may be more or less lossy.
>
> For predictions of more than one property simultaneously of a given molecule ("solubility", "on-target affinity", "off-target affinity", "toxicity", ...), the predictions of the various quantities will probably be correlated, which will make summarization even harder. A complete representation would be the joint distribution for all these predictions, which can be lossily visualized in the usual ways (reduce jointly to 2D, show all bivariate marginals, ...). Although it will be too big to look at for more than a handful of molecules, I imagine it will be informative to take a look at the joint predictive distribution for all properties of a molecule or two (maybe for one molecule that looks very similar to something in the training dataset, and one that looks very different), using the same model but different approximate inference algorithms.

I think @yuanqing-wang is asking here about how to build an axis over graphs, not about the output variable. The discussion of how to visualize the output should probably be in the metrics issue, #4.
I agree with you that the title might also point to discussions about how to visualize output; however, the description of the issue makes me think otherwise.

> I don't have any special insight into how to visually "index into" the domain of input graphs, to get a more global picture of what the model's doing. Associating each graph with a point in 2D using the approaches @karalets describes here and on the new issue sounds good to me. Each 2D point would further be associated with a scalar (@karalets suggests LLK, and perhaps other scalar summaries of the posterior predictive distribution would be appropriate), to form a colored scatterplot or heatmap maybe hinting at "where in chemical space the model is confident or not."

To give some more color:

First, the metrics we care about predicting should be whatever we decide on in #4; I mention LLK here as one example. I am sad nobody is interacting with #4, as this is a very important issue: @yuanqing-wang brought it up as a blocker for reproduction when he initially looked at the Cambridge paper, and we have not yet made overview plots like the ones they have there.

The 2d or whatever-d plot would point to such metrics, whichever one chooses.

yuanqing-wang added the discussion label May 13, 2020

karalets mentioned this issue May 13, 2020 in Graph generative models #26 (open)
