Presentations I made while studying as an undergraduate intern at DILab, Korea University.
The VCR and ViLBERT presentations were made by me and Noah Lee (Korea University, Statistics). The other presentations were made by me alone.
Thanks to all the authors of the papers.
This presentation is for UNITER: UNiversal Image-TExt Representation Learning.
UNITER is a universal model for many V + L tasks.
Source code for UNITER is publicly available here.
@inproceedings{chen2020uniter,
title={UNITER: UNiversal Image-TExt Representation Learning},
author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and El Kholy, Ahmed and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
booktitle={ECCV},
year={2020}
}
This presentation is for VisualCOMET: Reasoning about the Dynamic Context of a Still Image.
VisualCOMET is a new task for visual commonsense reasoning that covers what might have happened before the scene, what might happen after it, and the intents of the people in it.
Project page and source code for VisualCOMET.
@InProceedings{park2020visualcomet,
author = {Park, Jae Sung and Bhagavatula, Chandra and Mottaghi, Roozbeh and Farhadi, Ali and Choi, Yejin},
title = {VisualCOMET: Reasoning about the Dynamic Context of a Still Image},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020}
}
Presentation for From Recognition to Cognition: Visual Commonsense Reasoning.
VCR presents a new task for visual commonsense reasoning that broadens the horizon, requiring not only recognition but also cognition.
The datasets and leaderboard are on this page.
Source code for VCR is here.
@inproceedings{zellers2019vcr,
author = {Zellers, Rowan and Bisk, Yonatan and Farhadi, Ali and Choi, Yejin},
title = {From Recognition to Cognition: Visual Commonsense Reasoning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
Presentation for ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.
ViLBERT solves V + L tasks by applying BERT's model structure. The authors also present a co-attentional layer for mixing vision and language information.
Source code is publicly available here.
@article{lu2019vilbert,
title={ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks},
author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
journal={arXiv preprint arXiv:1908.02265},
year={2019}
}
This presentation is for COMET: Commonsense Transformers for Automatic Knowledge Graph Construction.
COMET shows that a transformer-based model, starting from seed data, can automatically construct novel and diverse knowledge graphs.
The source code is available here.
@inproceedings{Bosselut2019COMETCT,
title={COMET: Commonsense Transformers for Automatic Knowledge Graph Construction},
author={Antoine Bosselut and Hannah Rashkin and Maarten Sap and Chaitanya Malaviya and Asli Çelikyilmaz and Yejin Choi},
booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
year={2019}
}