
Graph generative models #26

Open · karalets opened this issue May 13, 2020 · 15 comments
Labels: discussion, enhancement (New feature or request)

@karalets
Collaborator

karalets commented May 13, 2020

Graph generative models are important for the tasks we have been describing.

The core idea is to posit a model that defines a distribution over graphs, P(G), for instance via a low-dimensional latent model P(Z).

Example: P(G, Z) = P(G|Z) P(Z), as in the class of models commonly described as graph VAEs.

This is the probabilistic-modeling-of-graphs space; there is also the self-supervised world, which seems to implicitly represent P(G) without as clear a generative story.
There are also pure deep learning approaches, like transformer-type models and autoregressive graph models.
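
To make the first, generative flavor concrete, here is a minimal sketch of that factorization as a graph VAE in PyTorch. The layer sizes, names, and the inner-product decoder are illustrative choices in the spirit of Kipf & Welling's VGAE, not taken from any specific codebase:

```python
# Minimal sketch of P(G, Z) = P(G | Z) P(Z) as a variational graph
# autoencoder. Illustrative only; sizes and names are assumptions.
import torch
import torch.nn as nn

class GraphVAE(nn.Module):
    def __init__(self, n_features, n_hidden, n_latent):
        super().__init__()
        self.w0 = nn.Linear(n_features, n_hidden)    # first GCN layer
        self.w_mu = nn.Linear(n_hidden, n_latent)    # parameterizes q(Z | G)
        self.w_logvar = nn.Linear(n_hidden, n_latent)

    def encode(self, x, a_norm):
        # Two propagation steps over the normalized adjacency, then read
        # off mean and log-variance of the approximate posterior q(Z | G).
        h = torch.relu(a_norm @ self.w0(x))
        h = a_norm @ h
        return self.w_mu(h), self.w_logvar(h)

    def decode(self, z):
        # P(G | Z): inner-product decoder, edge prob = sigmoid(z_i . z_j).
        return torch.sigmoid(z @ z.t())

    def forward(self, x, a_norm):
        mu, logvar = self.encode(x, a_norm)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decode(z), mu, logvar

def neg_elbo(a_target, a_pred, mu, logvar):
    # Reconstruction term for P(G | Z) plus KL(q(Z | G) || P(Z)), P(Z) = N(0, I).
    recon = nn.functional.binary_cross_entropy(a_pred, a_target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Training would minimize `neg_elbo` on observed adjacencies; the self-supervised and autoregressive flavors swap this explicit likelihood out for other objectives.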

I am unsure how those would compare empirically in terms of performance.

In this issue, I suggest we survey the landscape of these models and their empirical comparisons and sketch out a strategy to compare them.

I referenced some work in a previous issue (#24); I pitch that we move this discussion here.

@karalets
Collaborator Author

I am quoting what I wrote last in #24:

> Here's a recent paper doing chem-stuff:
> https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0396-x
>
> Here's a very rough overview of the core idea:
> https://towardsdatascience.com/tutorial-on-variational-graph-auto-encoders-da9333281129
>
> Lots of papers of varying degrees of complexity exist, and lots of code bases. In an ideal universe we would open an issue to survey the landscape of the different graph models out there with code, and start a script to test their usefulness systematically, just like the current experiment on graph nets for regression.
>
> How does that sound?

And here's a classic just for starters:
https://github.com/tkipf/gae

But there are many more recent papers extending those ideas significantly.

@karalets
Collaborator Author

Is this something that @dnguyen1196 might like to take point on, to really dig down on this specific aspect?

@yuanqing-wang
Member

yuanqing-wang commented May 13, 2020

> Graph generative models are important for the tasks we have been describing.

I totally agree that this is a very interesting area. But I'm not sure I understand how this fits into our current project at this stage.

(Sure, once we move on to the RL regime we will need some machinery to navigate the discrete chemical space.)

@yuanqing-wang
Member

Also, structuring P(G|Z) in real life is by no means easy. There are a lot of aspects to balance---validity of exploitation results, efficiency of exploration results,...---and ideally we'd want the model to respect the invariances.
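
To make the invariance point concrete, here is a tiny check (illustrative, not from any codebase discussed here): an inner-product decoder is permutation-equivariant, so relabeling the nodes permutes the predicted adjacency consistently, and isomorphic inputs end up with the same likelihood.

```python
# Sketch: permuting the latent node codes permutes the decoded adjacency
# the same way, so no node ordering is baked into P(G | Z).
import torch

z = torch.randn(5, 8)       # latent codes for 5 hypothetical nodes
perm = torch.randperm(5)    # a random node relabeling

a_hat = torch.sigmoid(z @ z.t())
a_hat_perm = torch.sigmoid(z[perm] @ z[perm].t())

assert torch.allclose(a_hat[perm][:, perm], a_hat_perm)
```

Models that emit a graph as a sequence of actions typically do not get this for free, which is part of what makes structuring P(G|Z) hard.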

@karalets
Collaborator Author

karalets commented May 13, 2020

> > Graph generative models are important for the tasks we have been describing.
>
> I totally agree that this is a very interesting area. But I'm not sure I understand how this fits into our current project at this stage.
>
> (Sure, once we move on to the RL regime we will need some machinery to navigate the discrete chemical space.)

Graph generative models are the part that does the unsupervised or self-supervised training for graph structures, as I mentioned above.

You already tried playing with an instance of such models, but have not gotten it to work well.

I am pitching that we focus a specific project, with someone dedicated to this specific topic, here. This is more important than, and complementary to, being Bayesian about parts of the model: it is not an inference issue but a modeling issue, one that unlocks the ability to use more data than we have measurements for in order to improve a representation.

Remember, the Cambridge group got a lot of mileage out of doing this, and it is already at the heart of our story, as mentioned in issue #3.

This is not just necessary for RL; it is necessary for all things we may want to do if we want to train graph representations on more data than we have measurements for (but for which we have graphs).

@karalets
Collaborator Author

> Also, structuring P(G|Z) in real life is by no means easy. There are a lot of aspects to balance---validity of exploitation results, efficiency of exploration results,...---and ideally we'd want the model to respect the invariances.

I am arguing that we focus these discussions about graph space here, in a dedicated issue, instead of throwing random self-supervised models or graph eigenspaces into the loop in other issues.

@yuanqing-wang
Member

So this would be a kind of regularized representation of graphs in fixed-dimensional spaces. And we might not need the actual generative part for now?

@karalets
Collaborator Author

karalets commented May 13, 2020

If you see my starting comments, I care mostly about the joint space of models that do unsupervised and self-supervised graph representation learning, and about comparing and evaluating them specifically.

Whether that is actually a graph VAE I do not care; it can be a self-supervised thing. But we need a space of graph representations that we can train from graphs alone and potentially use within a semi-supervised framework, and it is prudent to have an issue and a focused effort targeted at exploring that systematically, focusing on experiments and literature review, to create clarity about what works.

@yuanqing-wang
Member

yuanqing-wang commented May 13, 2020

Ah okay, I see your point.

To start the discussion, I think it might be helpful to list a few flavors of graph generative models.

@karalets
Collaborator Author

karalets commented May 13, 2020

Could you also please add titles, so we can quickly parse what each work is, and add a category for the self-supervised space of work?

I.e., I envision:

Graph VAEs:

  • Kipf et al. 2016 (paper link, code link)
  • ...

Self-supervised work:

  • ...

Comparisons and performance:

  • ...

I also added, above, an introductory blog post for those not familiar with all these concepts, and a recent paper I saw targeted at molecular stuff.

And I will inquire again:
This might be a very nice, modular aspect of this project, one that is very important and needs a lot of attention and specific targeted experiments. Would @dnguyen1196 be a good person to focus on this?

@yuanqing-wang
Member

I think I can take a stab at this, since I've played with these before. It's just that some of the methods here (especially the ones that encode a graph as a sequence of actions) might take a decent amount of time to get working. I'd suggest lowering the priority of testing those.

@karalets
Collaborator Author

karalets commented May 13, 2020

I know you have played with this before, but I am arguing for a full-time effort on just this narrow topic by somebody who would slot into the overall project, in order to make more measurable progress with empirical evidence. Do you want to focus on this for a few weeks?
I want to avoid having too many partial results, so that we can tick boxes and move on instead of having to revisit topics.

@yuanqing-wang
Member

How about this: I'll aim to have full characterization results for the simpler models here before our discussion on Friday. After that we can plan accordingly, if we decide to also include more sophisticated algorithms.

@karalets
Collaborator Author

> How about this: I'll aim to have full characterization results for the simpler models here before our discussion on Friday. After that we can plan accordingly, if we decide to also include more sophisticated algorithms.

Sounds reasonable.

@yuanqing-wang
Member

I realized that Kipf and Welling's variational graph autoencoder (VGAE, https://arxiv.org/abs/1611.07308) and this paragraph-vector-based semi-supervised model (https://arxiv.org/pdf/1711.10168.pdf) might be the same thing with particular choices of pooling function and training schedule.
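
For reference, the VGAE from the first link factorizes as below (notation as in Kipf & Welling); the comparison would then hinge on whether particular pooling and training choices reduce the paragraph-vector model to this form, which is a conjecture to verify, not an established equivalence.

```latex
% VGAE (Kipf & Welling, 2016): GCN encoder, inner-product decoder.
q(\mathbf{Z} \mid \mathbf{X}, \mathbf{A}) = \prod_{i=1}^{N} \mathcal{N}\!\big(\mathbf{z}_i \,\big|\, \boldsymbol{\mu}_i,\, \mathrm{diag}(\boldsymbol{\sigma}_i^2)\big),
\qquad \boldsymbol{\mu} = \mathrm{GCN}_{\mu}(\mathbf{X}, \mathbf{A}),\;\; \log\boldsymbol{\sigma} = \mathrm{GCN}_{\sigma}(\mathbf{X}, \mathbf{A})

p(\mathbf{A} \mid \mathbf{Z}) = \prod_{i,j} \sigma(\mathbf{z}_i^{\top}\mathbf{z}_j)^{A_{ij}} \big(1 - \sigma(\mathbf{z}_i^{\top}\mathbf{z}_j)\big)^{1 - A_{ij}}
```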
