
[Work-In-Progress] A different implementation of GVAE #34

Closed

Conversation

@dnguyen1196 (Collaborator) commented May 20, 2020:

Reference issues: #26

A more modular implementation of GVAE that is more compatible with our Net class. Adapted from https://github.com/zfjsail/gae-pytorch/tree/master/gae

A quick look at the interface:

class GCNModelVAE(nn.Module):
    def __init__(self, input_feat_dim, hidden_dim1, hidden_dim2, dropout):
        super(GCNModelVAE, self).__init__()
        self.output_regression = torch.nn.ModuleList([  # one can add more GCN layers
            GraphConvolution(hidden_dim1, hidden_dim2, dropout, act=lambda x: x),  # mean head
            GraphConvolution(hidden_dim1, hidden_dim2, dropout, act=lambda x: x),  # log-scale head
        ])

    def forward(self, x, adj):
        """Returns the latent representation of the nodes, so that when Net
        calls representation(g), it invokes this function."""

    def condition(self, x, adj):
        """Analogous to Net.condition: returns the approximate posterior
        distribution q(z | x, adj)."""

    def encode_and_decode(self, x, adj):
        """Calls forward, then condition, then samples from the approximate
        posterior; used during training to optimize the ELBO."""
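For concreteness, here is a minimal sketch (my reading, not the PR's exact code) of what condition returns, assuming the two output_regression heads produce the mean and log-scale of q(z | x, adj), consistent with the loc=theta[0], scale=torch.exp(theta[1]) snippet quoted later in this review:

import torch
import torch.distributions as dist

def condition_sketch(theta):
    # theta = [mu, log_sigma]: the outputs of the two output_regression heads
    return dist.Normal(loc=theta[0], scale=torch.exp(theta[1]))

# toy usage: 5 nodes, 8 latent dimensions
q = condition_sketch([torch.zeros(5, 8), torch.zeros(5, 8)])
z = q.rsample()  # reparametrized sample of node latents, used in the ELBO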

Training code is in gvae_exp.py and train.py. The current training scheme loops through each molecule (1186 of them in the ESOL data set). However, most of these molecules form small graphs; as a result there is overfitting: training loss can be driven to 0 while validation and test "scores" remain low.

Also comes with a utility function that partitions the graph's edges into train/validation/test sets. Note that this implementation of VGAE (and the formulation introduced in the Kipf and Welling 2016 paper) was geared towards link prediction, so various utilities are implemented to facilitate that task.
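For illustration, a common shape for such an edge split in link prediction looks roughly like the sketch below; partition_edges and its arguments are hypothetical names, not this PR's actual API:

import numpy as np

def partition_edges(edges, val_frac=0.05, test_frac=0.10, seed=0):
    # edges: (n_edges, 2) array; hold out positive edges for validation
    # and test, train on the rest (negative edges are sampled separately)
    rng = np.random.default_rng(seed)
    edges = rng.permutation(edges)
    n_val = int(len(edges) * val_frac)
    n_test = int(len(edges) * test_frac)
    return (edges[n_val + n_test:],        # train
            edges[:n_val],                 # validation
            edges[n_val:n_val + n_test])   # test

train, val, test = partition_edges(np.array([[i, i + 1] for i in range(100)]))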

@dnguyen1196 added the enhancement label on May 20, 2020
@dnguyen1196 changed the title from "A different implementation of GVAE" to "[Work-In-Progress] A different implementation of GVAE" on May 20, 2020
@karalets (Collaborator) commented:

Great, thanks. I will review tomorrow with a bit more time on my hands.

@yuanqing-wang (Member) commented:

Thanks, this looks nice! Let me take a look at the details in just a bit.

@yuanqing-wang (Member) left a comment:

Thanks a lot for this. Some general comments:

  • The training and test loops look great. For now, I think it would be more focused if we put more functionality into the object rather than supporting it with a script; we are going to discuss ways to integrate generative experiments into our pipeline shortly, I think.
  • Some functionality here is already implemented, specifically the transform from a graph to a latent code. I'd suggest calling something like pinot.Net to get that done; we should work on improving it if there are things you can't do with the current implementation.
  • I saw you mentioned that the size of the graphs is tricky since they're too small. Would batching small graphs together help (see the dgl.batch sketch below)? If not, let's grab a dataset from ZINC (https://zinc.docking.org), since the dataset size there is a lot larger.
  • Let's brainstorm a bit about what tests we should do!
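On the batching point above, a minimal sketch of what that could look like with dgl (rand_graph is just a stand-in for real molecule graphs):

import dgl
import torch

# batch many small molecular graphs into one disconnected graph so a
# single forward pass covers all of them
graphs = [dgl.rand_graph(6, 12) for _ in range(32)]
for g in graphs:
    g.ndata['h'] = torch.randn(g.num_nodes(), 16)  # dummy node features
bg = dgl.batch(graphs)
# ... run the representation / model on bg here ...
graphs_back = dgl.unbatch(bg)  # recover the individual graphs afterwards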

from torch.nn.parameter import Parameter


class GraphConvolution(Module):
Member commented:
I think this bit is a little repetitive. The graph -> latent code transform is taken care of by pinot.representation, with an API to dgl models and so on. Right now this doesn't look too different from a vanilla GN, but if you feel the need to implement a new representation architecture, maybe we should do it in pinot.representation.

@karalets (Collaborator) commented May 20, 2020:

I'd be happy to wait a while before fusing pinot.representation with this. Ultimately, yes, we want the same things. For now, I just would like some empirical feedback on how well these different unsupervised choices work, without being bogged down by pinot engineering.

The reason: we'll have to see if DGL is always the right thing for us (as currently assumed); maybe down the line there will be great code in one of these graph models that is not in dgl and that we may want to use. So let's first explore the space of those models before making decisions on the API.

@dnguyen1196 (Collaborator, author) commented May 20, 2020:

I thought a lot of the code was somewhat repetitive too, but it happens to be this way because one half of the GVAE (the encoder) is essentially doing what representation does.

And one can certainly see that GVAE is very similar to our Net in functionality. Both have an "encode" function that maps a graph to a latent node representation. Net then uses the latent representation to parameterize a predictive distribution over some measurement; GVAE uses a Gaussian distribution, chosen for convenience, to represent the approximate posterior. So the interfaces look similar, but I would argue they are doing different things for different purposes.

Therefore, I tried to make the interface for GVAE closely match that of Net, so that in the future, if we want to pretrain the representation component before jointly training with regression, we can do:

representation = GVAE(params_here)
train_unsupervised(representation)

output_regression = initialize_some_output_regression()
net = Net(representation, output_regression)

Ideally, I would also agree that we could "fuse" GVAE into Net, so that calling Net.forward(g) without giving it any measurement y automatically invokes the unsupervised training functionality of GVAE.

Member commented:

The .forward method you ported from the reference repo isn't what defines a VGAE, or even the encoder part for that matter. The three-layer simple graph conv is chosen simply as one of the simplest implementations of graph conv, or more generally of a graph embedding (g -> h). What distinguishes these models, from my understanding, is the transform from latent space to a graph, usually an adjacency matrix plus some attributes (let's call it h -> g).

So I would strongly suggest we implement only the h -> g part of these generative models, since g -> h can easily be replaced by our previously defined models.

In this case, it seems the .forward function could be replaced by:

pinot.representation.Sequential(
    layer=pinot.representation.dgl_legacy.gn('GraphConv'),
    config=[...],  # some dimensions and activations
)

Collaborator (author) commented:

Right, the .forward function from the reference repo has been renamed to encode_and_decode. The .forward function in this implementation does what you're suggesting: mapping g -> h. I was aware of this difference, so I modified the interface to match more closely what we expect from the interface of Net.

        loc=theta[0],
        scale=torch.exp(theta[1]))

    return distribution, theta[0], theta[1]
Member commented:

Same here. It seems to me that this should already be taken care of if you just call pinot.Net.

@karalets (Collaborator) commented May 20, 2020:

Same point as above: I strongly prefer the unsupervised thread to be self-sustaining for a bit, until we design the interfaces with pinot later on. That does not make your point invalid; I just think it is too early for this integration.

It's good to think ahead about how one would do it, though.

Member commented:

The thing is, if we want to integrate these two to share a representation layer, we'll have to make the g -> h part universal. That part should also be easy to check and rewrite if we integrate them later; right now it's a one-line API.

@yuanqing-wang (Member) commented May 20, 2020:

Another thing is that hand-written graph convolutional nets (I wrote a lot of them before) are a bit error-prone and should be avoided. So I highly recommend we use the modular implementation in dgl, even if for some reason we shouldn't use the existing one-line API.

Collaborator (author) commented:

Okay, so it seems what I can do with the current implementation, as a compromise, is this:

The .forward function should do what is expected when one calls representation(g) within Net, which is to map from a graph to a latent representation. This is actually the current state of the implementation: the .forward function maps from x, adj to z (the latent encodings of the nodes).

What I can do is use the DGL implementation instead of handwritten GCN code. This will make it easier to try different DGL architectures in the future (see the sketch below).
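A minimal sketch of what that swap might look like, assuming dgl's built-in GraphConv stands in for the handwritten layer (the class name and dimensions here are illustrative):

import torch
from dgl.nn import GraphConv

class DGLEncoder(torch.nn.Module):
    # g -> z with dgl's GraphConv instead of handwritten GCN code; the
    # DGLGraph g carries the structure, so no explicit adj is passed
    def __init__(self, in_feats, hidden_dim1, hidden_dim2):
        super(DGLEncoder, self).__init__()
        self.conv1 = GraphConv(in_feats, hidden_dim1, activation=torch.relu)
        self.conv2 = GraphConv(hidden_dim1, hidden_dim2)

    def forward(self, g, x):
        return self.conv2(g, self.conv1(g, x))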



if __name__ == '__main__':
    gae_for(args)
Member commented:

This training script looks great! But maybe move it somewhere in /scripts? I think @karalets would argue that we need to find ways to organically integrate these experiments.

@dnguyen1196 (Collaborator, author) commented May 20, 2020:

I have a gvae_exp.py in /scripts and will remove this one, since both are actually the same file. I thought of leaving train.py so that the module can be somewhat standalone.

            GraphConvolution(hidden_dim1, hidden_dim2, dropout, act=lambda x: x),
        ])

    def forward(self, x, adj):
Member commented:

If we were to depend on DGL anyway, we might not need to see the adjacency matrix too often in the scripts, except when we output it from the generative model.

Collaborator (author) commented:

@yuanqing-wang I'm not too sure what you mean here. A forward pass in GVAE requires the node feature vectors x and the adjacency matrix adj. We can certainly combine them into one argument, g = {'x': x, 'adj': adj}. (Is that what you were trying to point out, that adj doesn't have to be explicitly defined as a separate argument to forward?)

Member commented:

Ah, I'm not nitpicking the way we feed arguments into functions. My point is that adj is required here only because the GitHub repo you ported chose to express GraphConv this way. It could be expressed a lot more simply if we used dgl.

Member commented:

Also, it would cause some minor headaches if you wanted to batch and unbatch adjacency matrices (you'd need to either slice the sparse matrix or do some sort of slicing along the diagonal). Nothing too complicated, but DGL does that for you; a sketch of the manual route is below.
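To illustrate the headache, a sketch of the manual route that dgl.batch spares us (hypothetical scipy code, not from this PR):

import scipy.sparse as sp

# batching by hand: stack per-molecule adjacency matrices into one
# block-diagonal sparse matrix
adjs = [sp.random(n, n, density=0.3, format='csr') for n in (5, 7, 4)]
big_adj = sp.block_diag(adjs, format='csr')

# unbatching means slicing each block back out along the diagonal
offsets = [0, 5, 12, 16]
blocks = [big_adj[i:j, i:j] for i, j in zip(offsets, offsets[1:])]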

@karalets (Collaborator) commented May 20, 2020:

Guys, I will restate this: I strongly support also trying non-dgl representations in a standalone effort here. Yuanqing, what Duc is doing is exploring different models and codebases and writing them up. I explicitly want to let him explore what is out there, in terms of both models and code that runs, to explore these APIs. I think what you are suggesting is quite a likely outcome of this process, but I want it to be the outcome, not an axiom.

Dgl, like any library, may have edge cases where it cannot express things we may find we need. Let Duc play with this stuff for a while with a standalone module that just tests different models and codebases.

Collaborator commented:

Also, there is PyTorch Geometric and a variety of other ways to express graphs. We should not prematurely restrict our explorations to dgl, though I like dgl and was the person who originally suggested it as a quick way to get into PyTorch graph work, if you remember.

@dnguyen1196 (Collaborator, author) commented May 20, 2020:

@yuanqing-wang I see what you mean now. I will certainly try to see whether DGL can be incorporated here. That would probably make the code more robust, since DGL is well tested and offers more graph neural network architectures to choose from. Good idea.

Collaborator commented:

@dnguyen1196 before you refactor out of this: can you please record empirical information (i.e., numbers) about how well this handwritten code works vis-à-vis a library like dgl? You would be surprised how often I have seen results diverge between low-level pytorch code and the output of libraries.

@yuanqing-wang (Member) commented May 20, 2020:

@dnguyen1196 @karalets

How about this:

  • For this model and future generative models, we don't explicitly depend on dgl or anything else; rather, we leave the representation (g -> h) bit empty and only implement methods like generate and the inference methods. If there's anything unique to the g -> h part of some model, leave that function out of the main object so that we can plug in our favorite representations.
  • As of now, the only parts that invoke dgl are the representation part. In general, this should be a neurally parametrized, trainable function that takes a graph as input and gives a (n_nodes, hidden_dimension) representation as output; any function satisfying this could be plugged in as the representation. (Of course, right now the graph itself is also a DGL object, but we can implement alternatives later.)

My main point is that generative models typically refer to, and are characterized by, the function that translates a latent code to a graph, so they should be structured as such.
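To make the h -> g part concrete, a minimal sketch of the inner-product decoder from the Kipf and Welling 2016 formulation, i.e. the piece that would live inside the generative model object while g -> h stays pluggable (a sketch, not this PR's code):

import torch

class InnerProductDecoder(torch.nn.Module):
    # h -> g: reconstruct edge probabilities from latent node codes z,
    # with p(A_ij = 1 | z) = sigmoid(z_i . z_j)
    def __init__(self, dropout=0.0):
        super(InnerProductDecoder, self).__init__()
        self.dropout = torch.nn.Dropout(dropout)

    def forward(self, z):
        z = self.dropout(z)
        return torch.sigmoid(z @ z.t())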

@yuanqing-wang (Member) commented:

btw, all the macOS tests are failing because of a RAM issue or something. Let's just disable them since it's distracting.

@dnguyen1196 force-pushed the generative-experiment branch 2 times, most recently from 1bf1dbc to f8f97f0 on May 25, 2020 17:08
@dnguyen1196 mentioned this pull request May 25, 2020
@dnguyen1196 deleted the generative-experiment branch May 26, 2020 16:09