
Training from scratch and model configuration #21

Open · sancarlim opened this issue Aug 1, 2022 · 2 comments

sancarlim commented Aug 1, 2022

Thanks for the great work and the repository!
I have trained the model from scratch and it yields similar results (a bit worse, but almost the same). However, shouldn't we pre-train for 100 epochs and then fine-tune for another 100, as stated in the paper? If so, I think it would be good to indicate this in the README.
It would also be good to indicate how to combine the different enc/agg/dec modules to reproduce the models in the benchmark, or which configurations are possible at all, since some aggregator outputs would not match some decoders. Maybe you could provide different .yml files?
Thank you!

nachiket92 (Owner) commented

It turns out that training from scratch, without pre-training with ground-truth traversals, leads to the same results and makes training simpler. That's why I decided to just drop the pre-training step. If you still want to pre-train, you can set the pre_train flag to True on line 48 in the config file.
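As a rough sketch, the relevant line of the config would look something like this; only the pre_train flag itself is the actual setting referenced above:

```yaml
# Line 48 of the config file: enables the pre-training stage with
# ground-truth traversals before the main training run.
# Set to false (the current default) to train from scratch.
pre_train: true
```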

Regarding combinations of enc/agg/dec: you're right, not every decoder will work with every aggregator. There are many possible combinations, though. I'll share more configs after running some tests.
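To give an idea of what such a config could look like, here is a hypothetical sketch of one combination; the module names and argument keys below are placeholders, not the repo's actual schema:

```yaml
# Hypothetical sketch of a model config combining one encoder, one
# aggregator, and one decoder. All type names and argument keys here
# are illustrative placeholders; the real configs may differ.
encoder_type: pgp_encoder          # placeholder graph encoder
encoder_args:
  node_emb_size: 32                # placeholder hyperparameter
aggregator_type: global_attention  # placeholder aggregator
aggregator_args:
  num_heads: 4                     # placeholder hyperparameter
decoder_type: mtp                  # placeholder regression decoder
decoder_args:
  num_modes: 10                    # placeholder number of predicted modes
```

The main constraint is the one noted above: the aggregator's output has to match what the decoder expects as input.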

sancarlim (Author) commented Aug 3, 2022

Perfect, thanks for the quick reply! In the meantime, could you provide the config you used for the decoder ablation described in the paper? Thank you!
