Master's Thesis
ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet
Date
2021-07-01
Registered in:
Byrd Suárez, E. (2021). ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet [Unpublished master's thesis]. Instituto Tecnológico de Estudios Superiores de Monterrey.
Author
GONZALEZ MENDOZA, MIGUEL; 123361
Byrd Suárez, Emmanuel
Institution
Abstract
Activity Recognition and Classification in video sequences is a research area that has received increasing attention in recent years. However, video processing is computationally expensive, and its advances have not been as remarkable as those of Image Captioning. Working within a computationally limited environment, this work learns an Image Captioning transformation of the ActivityNet-Captions video dataset that can be used for either Video Captioning or Video Storytelling. Different Data Augmentation techniques for Natural Language Processing are explored and applied to the generated dataset in an effort to increase its validation scores. Our proposal includes an Image Captioning dataset obtained from ActivityNet, with features generated by Bottom-Up attention, and a model built with OSCAR to predict its captions. Our captioning scores are slightly better than those of S2VT, yet achieved with a much simpler pipeline, providing a starting point for future research on our approach in both Video Captioning and Video Storytelling. Finally, we propose several lines of research along which this work can be further extended and improved.