dc.contributor | González Mendoza, Miguel | |
dc.contributor | School of Engineering and Sciences | |
dc.contributor | Ochoa Ruiz, Gilberto | |
dc.contributor | Marín Hernández, Antonio | |
dc.contributor | Chang Fernández, Leonardo | |
dc.contributor | Campus Estado de México | |
dc.creator | GONZALEZ MENDOZA, MIGUEL; 123361 | |
dc.creator | Byrd Suárez, Emmanuel | |
dc.date.accessioned | 2023-04-26T17:36:14Z | |
dc.date.accessioned | 2023-07-19T19:21:17Z | |
dc.date.available | 2023-04-26T17:36:14Z | |
dc.date.available | 2023-07-19T19:21:17Z | |
dc.date.created | 2023-04-26T17:36:14Z | |
dc.date.issued | 2021-07-01 | |
dc.identifier | Byrd Suárez, E. (2021). ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. | |
dc.identifier | https://hdl.handle.net/11285/650436 | |
dc.identifier | https://orcid.org/0000-0002-9614-8944 | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/7715995 | |
dc.description.abstract | Activity Recognition and Classification in video sequences is a research area that has
received increasing attention recently. However, video processing is computationally expensive, and its
advances have not kept pace with those of Image Captioning. This work uses a computationally limited
environment to learn an Image Captioning transformation of the ActivityNet-Captions video dataset that
can be used for either Video Captioning or Video Storytelling. Different Data Augmentation techniques
for Natural Language Processing are explored and applied to the generated dataset in an effort to
increase its validation scores. Our proposal comprises an Image Captioning dataset derived from
ActivityNet, with features extracted by Bottom-Up attention, and an OSCAR-based model that predicts
its captions. Our captioning scores are slightly better than those of S2VT, yet achieved with a much
simpler pipeline, providing a starting point for future research using our approach. Finally, we
propose several lines of research through which this work can be further expanded and improved. | |
dc.language | eng | |
dc.publisher | Instituto Tecnológico y de Estudios Superiores de Monterrey | |
dc.relation | draft | |
dc.relation | REPOSITORIO NACIONAL CONACYT | |
dc.relation | CONACYT | |
dc.rights | http://creativecommons.org/licenses/by-nc-nd/4.0 | |
dc.rights | openAccess | |
dc.title | ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet | |
dc.type | Tesis de Maestría / Master's Thesis | |