Video Fill In the Blank

2016 – 2017

Bidirectional LSTMs with spatial-temporal attention to predict missing words in video descriptions.

vision-and-language, video, attention, LSTM

Tackles the Video-Fill-In-the-Blank (VFIB) problem: predicting a missing word in a video's description. The proposed framework encodes the sentence fragments around the blank with two LSTMs, one reading left-to-right and one right-to-left, integrated with an external memory, and encodes the video with spatial and temporal attention models. The attention models select discriminative visual features for predicting the missing word.
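To make the two ingredients above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It only illustrates (1) splitting a sentence at the blank into a left fragment read left-to-right and a right fragment read right-to-left, and (2) a generic soft temporal attention over per-frame features. The blank marker, the function names, and the dot-product scoring are all assumptions made for illustration; the paper's attention models are learned, and the LSTMs and external memory are omitted here.

```python
import math

BLANK = "_____"  # hypothetical blank marker, chosen for this sketch


def split_at_blank(tokens, blank=BLANK):
    """Split a tokenized sentence at the blank.

    Returns the left fragment in reading order (input for a
    left-to-right encoder) and the right fragment reversed
    (input for a right-to-left encoder).
    """
    i = tokens.index(blank)
    left = tokens[:i]
    right = tokens[i + 1:][::-1]
    return left, right


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(query, frame_features):
    """Generic soft attention over frames.

    Scores each frame by its dot product with the query (an
    assumption of this sketch), normalizes the scores with softmax,
    and returns the weights plus the weighted sum of frame features.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context


# Toy usage with made-up data: a 5-token sentence and three 2-D frame features.
left, right = split_at_blank("a man is _____ a guitar".split())
weights, context = temporal_attention(
    [1.0, 0.0],                             # stand-in for a text encoding
    [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]],   # stand-in for frame features
)
```

In this toy run the second frame aligns best with the query, so it receives the largest attention weight. In a full model the query would come from the textual encoders and the scoring function would be trained jointly with the rest of the network.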

Publication

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

ICCV 2017

Amir Mazaheri, Dong Zhang, Mubarak Shah

arXiv
