dataset representation #101

Open
6 tasks
yuanqing-wang opened this issue Jul 6, 2020 · 3 comments

@yuanqing-wang
Member

yuanqing-wang commented Jul 6, 2020

What features do we want for the Dataset object?

Currently we have the following functionality:

  • CSV import
  • batch
  • split
  • temporal splitting
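To make the four existing features concrete, here is a minimal sketch of what such a Dataset might look like. It is not the project's actual class: the `Record` fields, the CSV column names (`smiles`, `y`, `timestamp`) and all method names are assumptions for illustration, and only the Python standard library is used.

```python
import csv
import io
import random
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence


@dataclass
class Record:
    """One row: a SMILES string, a target value and a timestamp (assumed columns)."""
    smiles: str
    y: float
    timestamp: int  # used only by temporal_split


class Dataset:
    """Hypothetical minimal Dataset: CSV import, batching, random and temporal splits."""

    def __init__(self, records: Iterable[Record]):
        self.records = list(records)

    def __len__(self) -> int:
        return len(self.records)

    @classmethod
    def from_csv(cls, text: str) -> "Dataset":
        # Expects a header row naming the columns smiles, y, timestamp.
        reader = csv.DictReader(io.StringIO(text))
        return cls(
            Record(row["smiles"], float(row["y"]), int(row["timestamp"]))
            for row in reader
        )

    def batch(self, batch_size: int) -> Iterator[List[Record]]:
        # Consecutive chunks; the last batch may be smaller than batch_size.
        for start in range(0, len(self.records), batch_size):
            yield self.records[start:start + batch_size]

    def split(self, fractions: Sequence[float], seed: int = 0) -> List["Dataset"]:
        # Random split: shuffle a copy with a fixed seed, then cut into pieces.
        shuffled = self.records[:]
        random.Random(seed).shuffle(shuffled)
        return _cut(shuffled, fractions)

    def temporal_split(self, fractions: Sequence[float]) -> List["Dataset"]:
        # Temporal split: sort by timestamp so earlier records fill earlier pieces.
        ordered = sorted(self.records, key=lambda r: r.timestamp)
        return _cut(ordered, fractions)


def _cut(records: List[Record], fractions: Sequence[float]) -> List[Dataset]:
    # Fractions are normalized; the last piece takes any rounding remainder.
    total = sum(fractions)
    pieces, start = [], 0
    for i, frac in enumerate(fractions):
        if i == len(fractions) - 1:
            end = len(records)
        else:
            end = start + int(len(records) * frac / total)
        pieces.append(Dataset(records[start:end]))
        start = end
    return pieces


data = Dataset.from_csv("smiles,y,timestamp\nCCO,1.0,3\nC,0.5,1\nCC,0.7,2\nCCC,0.9,4\n")
train, test = data.temporal_split([0.5, 0.5])
print([r.smiles for r in train.records])  # ['C', 'CC']
```

Temporal splitting differs from the random split only in sorting by timestamp before cutting, so the earliest measurements end up in the first (training) partition.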

We should consider adding the following:

  • extra features (at which level?)
  • different node-representation inputs
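One reading of "which level" is whether an extra feature is attached per node, per edge, or to the whole graph. Assuming that reading, a sketch of a graph record that keeps the three levels separate and checks lengths could look like this; `Graph` and `add_extra_feature` are hypothetical names, not existing project code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple, Union


@dataclass
class Graph:
    """Hypothetical graph record that keeps extra features separate by level."""
    n_nodes: int
    edges: List[Tuple[int, int]]
    node_features: Dict[str, List[float]] = field(default_factory=dict)
    edge_features: Dict[str, List[float]] = field(default_factory=dict)
    graph_features: Dict[str, float] = field(default_factory=dict)


def add_extra_feature(g: Graph, level: str, name: str,
                      value: Union[float, Sequence[float]]) -> None:
    """Attach one extra feature at "node", "edge" or "graph" level."""
    if level == "node":
        # One value per node.
        if len(value) != g.n_nodes:
            raise ValueError(f"expected {g.n_nodes} node values, got {len(value)}")
        g.node_features[name] = [float(v) for v in value]
    elif level == "edge":
        # One value per edge, in the same order as g.edges.
        if len(value) != len(g.edges):
            raise ValueError(f"expected {len(g.edges)} edge values, got {len(value)}")
        g.edge_features[name] = [float(v) for v in value]
    elif level == "graph":
        # A single scalar describing the whole molecule.
        g.graph_features[name] = float(value)
    else:
        raise ValueError(f"unknown level: {level!r}")


# Heavy-atom graph of ethanol (C-C-O).
ethanol = Graph(n_nodes=3, edges=[(0, 1), (1, 2)])
add_extra_feature(ethanol, "node", "is_oxygen", [0, 0, 1])
add_extra_feature(ethanol, "graph", "n_heavy_atoms", 3)
```

Keeping one store per level means a model can ask for exactly the level it consumes, and a length mismatch is caught when the feature is attached rather than at training time.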
@miretchin
Collaborator

Elaborating on "different node representation input", based on our earlier discussion:

  1. Have the Dataset object annotate its inputs with different styles of graph representation, i.e., different preprocessing steps that yield different representations, such as dataset.smiles_representation or some other representation.
  2. Make datasets typed according to the input representation they assume, and pass a flag forward to the model (by extension, the models should be typed as well).
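The two options above can be combined: each dataset carries a tag for the representation it holds, and each model declares the tag it accepts, so a mismatch fails before training starts. This is a sketch under assumed names (`Representation`, `TypedDataset`, `Model.accepts`, `check`); the project may express the flag differently.

```python
from enum import Enum
from typing import Any, List


class Representation(Enum):
    """Hypothetical tags for how each molecule in a dataset is encoded."""
    SMILES = "smiles"
    GRAPH = "graph"
    JUNCTION_TREE = "junction_tree"


class TypedDataset:
    """A dataset that records which input representation its items use."""

    def __init__(self, items: List[Any], representation: Representation):
        self.items = items
        self.representation = representation


class Model:
    """Base model that declares the representation it accepts."""

    accepts = Representation.GRAPH

    def check(self, dataset: TypedDataset) -> None:
        # Fail early instead of silently feeding the wrong input format.
        if dataset.representation is not self.accepts:
            raise TypeError(
                f"{type(self).__name__} expects {self.accepts.value} input, "
                f"got {dataset.representation.value}"
            )


class SmilesModel(Model):
    accepts = Representation.SMILES


smiles_data = TypedDataset(["CCO", "c1ccccc1"], Representation.SMILES)
SmilesModel().check(smiles_data)  # passes
# Model().check(smiles_data) would raise TypeError (graph expected, smiles given).
```

A class-level `accepts` attribute keeps each model's expected input visible in its definition, which is one way of making models "typed" as suggested in point 2.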

@dnguyen1196
Collaborator

Potentially, we might want to do something like this paper, in which the input includes both the graph representation and a junction tree representation (I dug into the code, and it's possible to process molecules into junction trees either on the fly or as part of preprocessing):

https://arxiv.org/pdf/2006.12179.pdf
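The choice between building junction trees on the fly and building them during preprocessing can be hidden behind one dataset class that yields (graph, tree) pairs. In this sketch, `PairedDataset` and the `to_tree` callable are hypothetical; `to_tree` stands in for a real junction-tree decomposition, which is not implemented here.

```python
from typing import Any, Callable, List, Optional, Tuple


class PairedDataset:
    """Yields (graph, junction_tree) pairs; to_tree is a caller-supplied stand-in."""

    def __init__(self, graphs: List[Any], to_tree: Callable[[Any], Any],
                 precompute: bool = True):
        self.graphs = graphs
        self.to_tree = to_tree
        # Preprocessing path: build every tree once, up front.
        self._trees: Optional[List[Any]] = (
            [to_tree(g) for g in graphs] if precompute else None
        )

    def __len__(self) -> int:
        return len(self.graphs)

    def __getitem__(self, i: int) -> Tuple[Any, Any]:
        graph = self.graphs[i]
        if self._trees is not None:
            return graph, self._trees[i]
        # On-the-fly path: build the tree each time the item is requested.
        return graph, self.to_tree(graph)


def toy_tree(graph):
    # Stand-in only: a real version would decompose the molecule into a junction tree.
    return ("tree", graph)


pre = PairedDataset(["CCO", "c1ccccc1"], toy_tree, precompute=True)
fly = PairedDataset(["CCO", "c1ccccc1"], toy_tree, precompute=False)
assert pre[1] == fly[1]  # same pair either way; only the timing of the work differs
```

Precomputing trades memory for speed; computing on the fly keeps memory low but repeats the decomposition on every access unless a cache is added.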

@yuanqing-wang
Member Author
