dc.contributorGonzález Mendoza, Miguel
dc.contributorSchool of Engineering and Sciences
dc.contributorOchoa Ruiz, Gilberto
dc.contributorMarín Hernandez, Antonio
dc.contributorChang Fernández, Leonardo
dc.contributorCampus Estado de México
dc.creatorGONZALEZ MENDOZA, MIGUEL; 123361
dc.creatorByrd Suárez, Emmanuel
dc.date.accessioned2023-04-26T17:36:14Z
dc.date.accessioned2023-07-19T19:21:17Z
dc.date.available2023-04-26T17:36:14Z
dc.date.available2023-07-19T19:21:17Z
dc.date.created2023-04-26T17:36:14Z
dc.date.issued2021-07-01
dc.identifierByrd Suárez, E. (2021). ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey.
dc.identifierhttps://hdl.handle.net/11285/650436
dc.identifierhttps://orcid.org/0000-0002-9614-8944
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/7715995
dc.description.abstractActivity Recognition and Classification in video sequences is a research area that has recently received considerable attention. However, video processing is computationally expensive, and its advances have lagged behind those of Image Captioning. This work uses a computationally limited environment to learn an Image Captioning transformation of the ActivityNet-Captions video dataset that can be used for either Video Captioning or Video Storytelling. Different Data Augmentation techniques for Natural Language Processing are explored and applied to the generated dataset in an effort to improve its validation scores. Our proposal comprises an Image Captioning dataset derived from ActivityNet, with features generated by Bottom-Up attention, and a model to predict its captions, built with OSCAR. Our captioning scores are slightly better than those of S2VT, but with a much simpler pipeline, providing a starting point for future research using our approach. Finally, we propose several lines of research along which this work can be further expanded and improved.
dc.languageeng
dc.publisherInstituto Tecnológico y de Estudios Superiores de Monterrey
dc.relationdraft
dc.relationREPOSITORIO NACIONAL CONACYT
dc.relationCONACYT
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/4.0
dc.rightsopenAccess
dc.titleANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet
dc.typeMaster's Thesis