dc.contributor | González Mendoza, Miguel | |
dc.contributor | School of Engineering and Sciences | |
dc.contributor | Ochoa Ruiz, Gilberto | |
dc.contributor | Marín Hernández, Antonio | |
dc.contributor | Chang Fernández, Leonardo | |
dc.contributor | Campus Estado de México | |
dc.creator | GONZALEZ MENDOZA, MIGUEL; 123361 | |
dc.creator | Byrd Suárez, Emmanuel | |
dc.date.accessioned | 2023-04-26T17:36:14Z | |
dc.date.accessioned | 2023-07-19T19:21:17Z | |
dc.date.available | 2023-04-26T17:36:14Z | |
dc.date.available | 2023-07-19T19:21:17Z | |
dc.date.created | 2023-04-26T17:36:14Z | |
dc.date.issued | 2021-07-01 | |
dc.identifier | Byrd Suárez, E. (2021). ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. | |
dc.identifier | https://hdl.handle.net/11285/650436 | |
dc.identifier | https://orcid.org/0000-0002-9614-8944 | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/7715995 | |
dc.description.abstract | Activity Recognition and Classification in video sequences is a research area that has
received increasing attention recently. However, video processing is computationally expensive, and its
advances have not kept pace with those of Image Captioning. This work uses a computationally limited
environment to learn an Image Captioning transformation of the ActivityNet-Captions video dataset that
can be used for either Video Captioning or Video Storytelling. Different Data Augmentation techniques
for Natural Language Processing are explored and applied to the generated dataset in an effort to
increase its validation scores. Our proposal comprises an Image Captioning dataset derived from
ActivityNet, with features extracted by Bottom-Up attention, and an OSCAR-based model that predicts
its captions. Our captioning scores are slightly better than those of S2VT, yet achieved with a much
simpler pipeline, providing a starting point for future research using our approach. Finally, we
propose several lines of research through which this work can be further expanded and improved. | |
dc.language | eng | |
dc.publisher | Instituto Tecnológico y de Estudios Superiores de Monterrey | |
dc.relation | draft | |
dc.relation | REPOSITORIO NACIONAL CONACYT | |
dc.relation | CONACYT | |
dc.rights | http://creativecommons.org/licenses/by-nc-nd/4.0 | |
dc.rights | openAccess | |
dc.title | ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet | |
dc.type | Tesis de Maestría / Master's Thesis | |