Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
[paper]
Plan-And-Write: Towards Better Automatic Storytelling (AAAI2019)
Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, Rui Yan
[paper]
A sequential guiding network with attention for image captioning
Daouda Sow, Zengchang Qin, Mouhamed Niasse, Tao Wan
[paper]
Engaging Image Captioning Via Personality
Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston
[paper]
Image Specificity (CVPR2015)
Mainak Jas, Devi Parikh
[paper]
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions (LREC2018)
Albert Gatt, Marc Tanti, Adrian Muscat, Patrizia Paggio, Reuben A. Farrugia, Claudia Borg, Kenneth P. Camilleri, Mike Rosner, Lonneke van der Plas
[paper]
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text (CVPR2018)
Alexander Mathews, Lexing Xie, Xuming He
[paper]
Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present (CVPR2018)
Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu
[paper]
[code]
Convolutional Image Captioning (CVPR2018)
Jyoti Aneja, Aditya Deshpande, Alexander Schwing
[paper]
Nonparametric Method for Data-driven Image Captioning (ACL2014)
[paper]
Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun
[paper]
Learning to Guide Decoding for Image Captioning (AAAI2018)
Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu
[paper]
Boosting Image Captioning with Attributes
Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
[paper]
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training (ICCV2017)
Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele
[paper]
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
[paper]
Show, Reward and Tell: Automatic Generation of Narrative Paragraph from Photo Stream by Adversarial Training (AAAI2018)
Jing Wang, Jianlong Fu, Jinhui Tang, Zechao Li, Tao Mei
[paper]
Towards Diverse and Natural Image Descriptions via a Conditional GAN(ICCV2017)
Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin
[paper]
[slide]
Contrastive Learning for Image Captioning
[paper]
Describing Natural Images Containing Novel Objects with Knowledge Guided Assistance
Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger
[paper]
A Hierarchical Approach for Generating Descriptive Image Paragraphs(CVPR2017)
[paper]
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition(CVPR2017)
Yufei Wang, Zhe Lin, Xiaohui Shen, Scott Cohen, Garrison W. Cottrell
[paper]
[supp]
Top-Down Visual Saliency Guided by Captions(CVPR2017)
Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko
[paper]
[supp]
Self-Critical Sequence Training for Image Captioning(CVPR2017)
Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, Vaibhava Goel
[paper]
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning(CVPR2017)
Qing Sun, Stefan Lee, Dhruv Batra
[paper]
Beyond Instance-Level Image Retrieval: Leveraging Captions to Learn a Global Visual Representation for Semantic Retrieval(CVPR2017)
Albert Gordo, Diane Larlus
[paper]
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects(CVPR2017)
Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
[paper]
Video Captioning With Transferred Semantic Attributes(CVPR2017)
Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
[paper]
Captioning Images With Diverse Objects(CVPR2017)
Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko
[paper]
[supp]
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning(CVPR2017)
Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua
[paper]
Semantic Compositional Networks for Visual Captioning(CVPR2017)
Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng
[paper]
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering(CVPR2017)
Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim
[paper]
StyleNet: Generating Attractive Visual Captions With Styles(CVPR2017)
Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng
[paper]
Dense Captioning With Joint Inference and Visual Context(CVPR2017)
Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li
[paper]
Weakly Supervised Dense Video Captioning(CVPR2017)
Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue
[paper]
[supp]
Hierarchical Boundary-Aware Neural Encoder for Video Captioning(CVPR2017)
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
[paper]
[supplement]
Attend to You: Personalized Image Captioning With Context Sequence Memory Networks(CVPR2017)
Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim
[paper]
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning(CVPR2017)
Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
[paper]
Deep Reinforcement Learning-Based Image Captioning With Embedding Reward(CVPR2017)
Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li
[paper]
Controlling Linguistic Style Aspects in Neural Language Generation
Jessica Ficler, Yoav Goldberg
[paper]
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data(CVPR2016)
[paper]
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition(CVPR2013)
[paper]
Teaching Machines to Describe Images via Natural Language Feedback
Huan Ling, Sanja Fidler
[paper]
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
[paper]
Generating Sentences from a Continuous Space
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
[paper]
FOIL it! Find One mismatch between Image and Language caption
Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
[paper]
Sort Story: Sorting Jumbled Images and Captions into Stories
Harsh Agrawal, Arjun Chandrasekaran, Dhruv Batra, Devi Parikh, Mohit Bansal
[paper]
Dense-Captioning Events in Videos
Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
[paper]
[project]
[ArXivTimes]
Prominent Object Detection and Recognition: A Saliency-based Pipeline
Hamed R. Tavakoli, Jorma Laaksonen
[paper]
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang, Yin Li, Svetlana Lazebnik
[paper]
# Image to Caption Generation
Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths
Yanan Li, Donghui Wang, Huanhang Hu, Yuetan Lin, Yueting Zhuang
[paper]
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
[paper]
[arXivTimes]
Controllable Text Generation(ICML2017)
Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing
[paper]
Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations
Brent Harrison, Upol Ehsan, Mark O. Riedl
[paper]
Comparative Study of CNN and RNN for Natural Language Processing
[paper]
Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues(ICCV2017)
Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik
[paper]
Reference Based LSTM for Image Captioning(AAAI2017)
[paper]
Text-guided Attention Model for Image Captioning(AAAI2017)
Jonghwan Mun, Minsu Cho, Bohyung Han
[paper]
Attention Correctness: Machine Perception vs Human Annotations in Neural Image Captioning(AAAI2017)
[paper]
ImageNet MPEG-7 Visual Descriptors - Technical Report
Frédéric Rayar
[paper]
Learning to Decode for Future Success
[paper]
Not image captioning, but potentially useful for it.
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
[paper]
Prof. Ushiku's bookmarks
[link]
Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation
Gwangbeen Park, Woobin Im
[paper]
Visual Storytelling(NAACL2016)
[paper]
Stating the Obvious: Extracting Visual Common Sense Knowledge(NAACL2016)
[paper]
Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings(NAACL2016)
[paper]
Black Holes and White Rabbits: Metaphor Identification with Visual Features(NAACL2016 best long paper)
[paper]
Rich Image Captioning in the Wild(CVPR2016workshop)
[paper]
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language(CVPR2016)
Jun Xu, Tao Mei, Ting Yao, Yong Rui
[paper]
Jointly Modeling Embedding and Translation to Bridge Video and Language(CVPR2016)
Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui
[paper]
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks(CVPR2016)
Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wei Xu
[paper]
Unsupervised Learning from Narrated Instruction Videos(CVPR2016)
Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic
[paper]
Movie Description
Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele
[paper]
Sequence to Sequence -- Video to Text
Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko
[code]
Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics
[paper]
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures(arXiv)
[paper]
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books (arXiv, 2015)
Yukun Zhu@1, Ryan Kiros@1, Richard Zemel@1, Ruslan Salakhutdinov@1, Raquel Urtasun@1, Antonio Torralba@2, Sanja Fidler@1 (@1: University of Toronto, @2: Massachusetts Institute of Technology)
[paper]
[code]
[data]
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, Yoshua Bengio
[paper]
[supplementary]
[code]
[video introduction]
[slide]
[chainer]
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models (TACL, 2015)
Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel
[paper]
[demo]
[code]
Zero-Shot Learning by Convex Combination of Semantic Embeddings(ICLR2014)
[paper]
DeViSE: A Deep Visual-Semantic Embedding Model(NIPS2013)
[paper]
[slide(ja)](http://www.slideshare.net/beam2d/nips2013-devise)
## Survey
[link]
[link2]
A Survey of Current Datasets for Vision and Language Research
[paper]
## Generation from root words
A Deep Learning Approach for Arabic Caption Generation using Roots-Words(AAAI2017)