[Work-In-Progress] A different implementation of GVAE #34
Conversation
Great, thanks, I will review tomorrow with a bit more time on my hands.
Thanks, this looks nice! Let me take a look at the details in just a bit.
Thanks a lot for this. Some general comments:

- The training and test loop all look great. For now I think it would be more focused if we put more functionality into the object rather than supporting it with a script. We are going to discuss ways to integrate generative experiments into our pipeline in a short while, I think.
- Some functionalities here are already implemented, specifically the transform from a graph to a latent code. I'd suggest calling something like `pinot.Net` to get those done. We should work on improving that if there are things you couldn't do with the current implementation.
- I saw you mentioned that the size of the graphs is tricky since they're too small. Would batching small graphs together help (see the sketch after this list)? If not, let's grab a dataset from ZINC (https://zinc.docking.org), since the dataset size there is a lot larger.
- Let's brainstorm a bit about what tests we should do!
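A minimal sketch of the batching idea, assuming dgl, with toy graphs standing in for small molecules (all names here are illustrative, not from this PR):

```python
import dgl
import torch

# Two toy graphs standing in for small molecules.
g1 = dgl.graph(([0, 1], [1, 2]), num_nodes=3)
g2 = dgl.graph(([0], [1]), num_nodes=2)
g1.ndata['h'] = torch.randn(3, 8)
g2.ndata['h'] = torch.randn(2, 8)

# dgl.batch merges them into one disconnected graph, so a single
# forward pass covers the whole minibatch of small graphs.
bg = dgl.batch([g1, g2])
print(bg.num_nodes())  # 5

# dgl.unbatch recovers the individual graphs afterwards.
g1_back, g2_back = dgl.unbatch(bg)
```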
```python
from torch.nn.parameter import Parameter

class GraphConvolution(Module):
```
I think this bit is a little repetitive. The graph -> latent code transform is taken care of by `pinot.representation`, which has an API to dgl models and so on. Right now this doesn't look too different from a vanilla GN, but if you feel the need to implement a new representation architecture, maybe we should do it in `pinot.representation`.
I'd be happy to wait a while before fusing `pinot.representation` with this.
Ultimately yes, we want the same things.
For now, I would just like some empirical feedback on how well these different unsupervised choices work, without being bogged down by pinot engineering.
The reason: we'll have to see if DGL is always the right thing for us (as currently assumed); maybe down the line there will be great code in one of these graph models that is not in dgl and that we may want to use. So let's first explore the space of those models before making decisions on the API.
I thought that a lot of the code was somewhat repetitive too, but it just happens to be this way because one half of the GVAE (the encoder part) is essentially doing what `representation` does.
One can certainly see that GVAE is very similar to our `Net` in functionality. Both have an "encode" function that maps a graph to a latent node representation. `Net` then uses the latent representation to parameterize a predictive distribution over some measurement, whereas GVAE uses a Gaussian distribution, for convenience, to represent the approximate posterior distribution. So the interfaces look similar, but I would argue that they are doing different things for different purposes.
Therefore, I tried to make the interface for GVAE closely match that of `Net`. The reason is that in the future, if we want to pretrain the `representation` component before jointly training with `regression`, we can do:
```python
representation = GVAE(params_here)
train_unsupervised(representation)
output_regression = initialize_some_output_regression()
Net(representation, output_regression)
```
Ideally, I would also like it if we could "fuse" GVAE into `Net`, so that if we call `Net.forward(g)` without giving it any measurement `y`, we automatically invoke the unsupervised training functionality of `GVAE`.
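A minimal sketch of what that dispatch could look like; everything here (in particular the `unsupervised_loss` method) is a hypothetical name, not the PR's actual API:

```python
import torch

class Net(torch.nn.Module):
    def __init__(self, representation, output_regression):
        super().__init__()
        self.representation = representation  # e.g. a pretrained GVAE
        self.output_regression = output_regression

    def forward(self, g, y=None):
        if y is None:
            # No measurement supplied: invoke the unsupervised GVAE
            # objective (hypothetical method name).
            return self.representation.unsupervised_loss(g)
        # Measurement supplied: map g -> latent representation, then
        # parameterize a predictive distribution over the measurement.
        h = self.representation(g)
        return self.output_regression(h)
```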
The `.forward` method you ported from the reference repo isn't what defines a VGAE, or even the encoder part for that matter. The three-layer simple graph conv is chosen simply as one of the simplest implementations of graph convolution, or in general, of a graph embedding (g -> h). What distinguishes these models, from my understanding, is the transform from latent space to a graph, usually an adjacency matrix plus some attributes (let's call it h -> g).
So I would strongly suggest we only implement the h -> g part of these generative models, as g -> h could easily be replaced by our previously defined models (see the decoder sketch below).
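For reference, the h -> g part in the Kipf and Welling VGAE is just an inner-product decoder; a minimal sketch (variable names are illustrative):

```python
import torch

def inner_product_decoder(z):
    """Map latent node codes z of shape (n_nodes, latent_dim) to edge
    probabilities: A_hat[i, j] = sigmoid(z_i . z_j)."""
    return torch.sigmoid(z @ z.t())
```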
In this case it seems that the `.forward` function could be replaced by

```python
pinot.representation.Sequential(
    layer=pinot.representation.dgl_legacy.gn('GraphConv'),
    config=[...],  # some dimensions and activations
)
```
Right, the `.forward` function from the reference repo has been renamed to `encode_and_decode`. The `.forward` function in this implementation is doing what you're suggesting, mapping from g -> h. I was aware of this difference, so I modified the interface so that it matches more closely what we expect from the interface of `Net`.
```python
    loc=theta[0],
    scale=torch.exp(theta[1]))

return distribution, theta[0], theta[1]
```
Same here. It seems to me that these should already be taken care of if you would just call `pinot.Net`.
Same point as above: I strongly prefer the unsupervised thread to be self-sustaining for a bit, until we design the interfaces with `pinot` later on.
Which does not make your point invalid; I just think it is too early for this integration.
It's good to think ahead about how one would do it, though.
The thing is that if we want to somehow integrate these two to share a representation layer, we'll have to make the g -> h part universal. Also, that part should be easy to check and rewrite if we were to integrate them later. Right now it's a one-line API.
Another thing is that hand-written graph convolutional nets (I did a lot before) are a bit error-prone and should be avoided. So I highly recommend we use the modular implementation of dgl, even if for some reason we shouldn't use the existing one-line API.
Okay, so it seems what I can do with the current implementation, as a compromise, is this:
The `.forward` function should do what is expected when one calls `representation(g)` within `Net`, which is to map from a graph to a latent representation. And this is actually the current state of the implementation: the `.forward` function maps from `x, adj` to `z` (latent encodings of the nodes).
What I can do is use the DGL implementation instead of handwritten GCN code (see the sketch below). That will make it easier to try different DGL architectures in the future.
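A minimal sketch of what that encoder could look like with DGL's built-in layers, assuming node features live in `g.ndata['h']` (all names here are illustrative, not the PR's code):

```python
import torch
from dgl.nn.pytorch import GraphConv

class GVAEEncoder(torch.nn.Module):
    """g -> (mu, logvar), using DGL GraphConv layers instead of a
    handwritten GCN."""
    def __init__(self, in_dim, hidden_dim, latent_dim):
        super().__init__()
        self.conv = GraphConv(in_dim, hidden_dim)
        self.conv_mu = GraphConv(hidden_dim, latent_dim)
        self.conv_logvar = GraphConv(hidden_dim, latent_dim)

    def forward(self, g):
        h = torch.relu(self.conv(g, g.ndata['h']))
        # Parameters of the approximate posterior over node codes.
        return self.conv_mu(g, h), self.conv_logvar(g, h)
```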
```python
if __name__ == '__main__':
    gae_for(args)
```
This training script looks great! But maybe move it to somewhere in `/scripts`? I think @karalets would argue that we need to find ways to organically integrate these experiments.
I have a `gvae_exp.py` in `/scripts` and will remove this one, since both are actually the same file. I had thought of leaving `train.py` so that the module can act as a sort of standalone module.
```python
    GraphConvolution(hidden_dim1, hidden_dim2, dropout, act=lambda x: x),
])

def forward(self, x, adj):
```
If we were to depend on DGL anyway, we might not need to see the adjacency matrix too often in the scripts, except when we output it from the generative model.
@yuanqing-wang I'm not too sure what you mean here. A forward pass in GVAE requires the node feature vectors `x` and the adjacency matrix `adj`. We can certainly combine them into one argument, `g = {'x': x, 'adj': adj}` (is that what you were trying to point out, that `adj` doesn't have to be explicitly defined as a separate argument to pass into `forward`?).
Ah, I'm not nitpicking the way we feed arguments into functions. My point is that `adj` is required here only because the GitHub repo you ported chose to express GraphConv in this way. This could be expressed a lot more simply if we used `dgl`.
Also, it would cause some minor headaches if you wanted to batch and unbatch adjacency matrices (you'd need to either slice the sparse matrix or do some sort of slicing along the diagonal). Nothing too complicated, but DGL does that for you, as sketched below.
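A minimal sketch of the manual bookkeeping being avoided here (illustrative only): batching adjacency matrices means building a block-diagonal matrix and slicing the blocks back out.

```python
import torch

# Two toy adjacency matrices (3 and 2 nodes).
adj1 = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
adj2 = torch.tensor([[0., 1.], [1., 0.]])

# "Batching" by hand: stack them along the diagonal.
batched_adj = torch.block_diag(adj1, adj2)

# "Unbatching" by hand: slice each block back out along the diagonal.
sizes, offsets = [3, 2], [0, 3]
blocks = [batched_adj[o:o + n, o:o + n] for o, n in zip(offsets, sizes)]
# dgl.batch / dgl.unbatch handle this bookkeeping for you.
```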
Guys, I will restate this:
I strongly support also trying non-dgl representations for a standalone effort here.
Yuanqing, what Duc is doing is exploring different models and codebases in order to write them up.
I explicitly want to let him explore what is out there, in terms of both models and code that runs, to explore these APIs. I think what you are suggesting is quite a likely outcome of this process, but I want it to be the outcome, not an axiom.
dgl, like any library, may have edge cases where it cannot express stuff that we may find we need. Let Duc play with this stuff for a while with a standalone module that just tests different models and codebases.
Also, there is `pytorch geometric` and a variety of other ways to express graphs. We should not prematurely restrict our explorations to dgl, though I like dgl and was the person who originally suggested it as a quick way to get into pytorch graph stuff, if you remember.
@yuanqing-wang I see what you mean now. I will certainly try to see if DGL can be incorporated here. That would probably make the code more robust, since DGL has been tested well and we would have more graph neural network architectures to choose from. Good idea.
@dnguyen1196 before you refactor out of this:
Can you please leave empirical information about how well this handwritten code works (empirically, i.e. numbers) vis-à-vis a library like dgl?
You would be surprised how often I have seen things diverge between low-level pytorch code and the output of libraries.
How about this:

- For this model and the future generative models, we don't explicitly depend on dgl or anything. Rather, we leave the representation (**g -> h**) bit empty and only implement methods like `generate` and `inference`. If there's anything unique to the **g -> h** part of some model, leave that function out of the main object so that we can plug in our favorite representations.
- As of now, the only parts that invoke `dgl` libraries are the representation part. In general, this should be a neurally parametrized, trainable function that takes a graph as input and gives a `(n_nodes, hidden_dimension)` representation as output. As long as a function satisfies this, it could be plugged in as the representation. (Of course, right now the graph itself is also a DGL object, but we can implement alternatives later.)

My main point is that generative models typically refer to, and are characterized by, the function that translates a latent code to a graph, so they should be structured as such (see the sketch below).
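A minimal sketch of that structure; every name here is an illustrative assumption, not the PR's API:

```python
import abc
import torch

class GraphGenerativeModel(torch.nn.Module, abc.ABC):
    """Generative model with the g -> h part left pluggable."""
    def __init__(self, representation):
        super().__init__()
        # Any callable mapping a graph to a
        # (n_nodes, hidden_dimension) tensor can be plugged in here.
        self.representation = representation

    @abc.abstractmethod
    def inference(self, h):
        """h -> parameters of the approximate posterior over codes."""

    @abc.abstractmethod
    def generate(self, z):
        """Latent code -> graph (e.g. an adjacency matrix)."""
```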
btw all the macOS tests are failing because of a RAM issue or something. Let's just disable them since it's distracting.
Reference issues: #26
A more modular implementation of GVAE that is more compatible with our `Net` class. Adapted from https://github.com/zfjsail/gae-pytorch/tree/master/gae

A quick look at the interface:

- A training file in `gvae_exp.py` and `train.py`. The current training scheme loops through each molecule (1186 of them in the esol data set). However, most of these molecules make up small graphs. As a result, there is overfitting going on: training loss can be driven to 0, but validation and testing "scores" are low.
- Also comes with a utility function that performs a "partition" of the graph into train/test/validation edges (see the sketch below). Note that this implementation of VGAE (and the formulation introduced in the Kipf and Welling 16 paper) was geared towards link prediction. Therefore, various utilities are implemented to facilitate this task.