dc.contributorConant Pablos, Santiago Enrique
dc.contributorCampus Monterrey
dc.creatorSánchez Pámanes, Roberto
dc.date.accessioned2019-08-29T23:20:32Z
dc.date.accessioned2022-10-13T21:34:09Z
dc.date.available2019-08-29T23:20:32Z
dc.date.available2022-10-13T21:34:09Z
dc.date.created2019-08-29T23:20:32Z
dc.identifierSánchez-Pámanes, R. (2019). Learning temporal features of facial action units using deep learning (Master's Thesis). Tecnologico de Monterrey.
dc.identifierhttp://hdl.handle.net/11285/633039
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/4220969
dc.description.abstractFacial expressions are an important aspect of human life, and research on this topic has led to real-world technological applications. Recognizing facial states is part of a range of challenging tasks, including assisting the elderly and infants and enhancing pedagogical exercises. Rather than categorizing faces into emotions, the Facial Action Coding System (FACS) encodes ambiguous expressions by analyzing small facial differences based on muscle movements called action units (AUs). By analyzing AU co-occurrences, human coders can represent virtually any anatomically possible facial configuration; this coding is independent of interpretation and can serve as a tool for higher-level decision processes. The automatic detection of AUs in videos has recently attracted the interest of the deep learning community, since deep models have dramatically improved performance on image-related tasks. The state-of-the-art proposals on the benchmark database FERA17 are currently vanilla convolutional neural networks that model AU occurrence while ignoring its temporal features. However, facial movements are not single snapshots: the occurrence of independent facial movements changes over time in response to information dynamically gathered from the environment, so these deep models cannot fully capture the complex dynamic context in which AUs occur. Researchers have developed other deep learning methods that can learn features across sequences of images. These methods fall into three categories: 1) methods that extend image-based architectures with aggregation methods, 2) methods that add recurrent units, and 3) methods that process spatiotemporal features natively. All of them offer the possibility of capturing AU dynamics and improving AU detection.
However, these methods have frequently been overlooked by the facial expression recognition community, particularly for AU occurrence detection, and to date it remains unclear whether deep learning models that incorporate temporal features actually outperform those that do not. This work analyzes the effects of incrementally adding temporal capabilities to the spatial model ResNet50 when predicting the occurrence of a single AU from the FERA17 database. The configurations evaluated include inflating the model's kernels to create a three-dimensional version of ResNet50, adding a recurrent layer to encode long-term dependencies, and including a dense optical flow representation of two consecutive time periods. Results show that adding recurrent units to a spatial model outperforms the other temporal paradigms and the ResNet50 baseline by 7.4% in terms of F1 score. The findings of this thesis can help define initial deep learning implementations for projects related to facial expression recognition, and knowing the extent to which each temporal paradigm can capture the dynamics inherent to AU occurrence can help improve future research projects.
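The abstract mentions inflating the kernels of ResNet50 to obtain a three-dimensional model. The following is a minimal sketch, assuming the I3D-style inflation of Carreira and Zisserman (each pretrained 2D kernel is repeated along a new temporal axis and divided by its temporal size); the abstract does not state which initialization the thesis actually used. The sketch checks the property that motivates this scheme: on a clip made of identical frames, the inflated 3D kernel reproduces the 2D response. The helper names `inflate_kernel`, `conv2d_valid` and `conv3d_valid` are hypothetical, and the naive loops stand in for a deep learning framework's convolution layers.

```python
import numpy as np


def inflate_kernel(w2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate a 2D kernel (out_c, in_c, kh, kw) to 3D (out_c, in_c, t, kh, kw).

    Assumption: I3D-style inflation, i.e. repeat the 2D weights t times along a
    new temporal axis and divide by t, so a static clip gives the 2D response.
    """
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], t, axis=2)
    return w3d / t


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive 'valid' 2D cross-correlation. x: (in_c, H, W), w: (out_c, in_c, kh, kw)."""
    out_c, _, kh, kw = w.shape
    _, h, wd = x.shape
    y = np.zeros((out_c, h - kh + 1, wd - kw + 1))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            # Sum over input channels and kernel height/width for every output channel.
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y


def conv3d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive 'valid' 3D cross-correlation. x: (in_c, T, H, W), w: (out_c, in_c, kt, kh, kw)."""
    out_c, _, kt, kh, kw = w.shape
    _, t, h, wd = x.shape
    y = np.zeros((out_c, t - kt + 1, h - kh + 1, wd - kw + 1))
    for s in range(t - kt + 1):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                patch = x[:, s:s + kt, i:i + kh, j:j + kw]
                # Sum over input channels, time, height and width of the kernel.
                y[:, s, i, j] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return y


# Demo: a "static video" of 5 identical frames (3 channels, 8x8 pixels).
rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 3, 3, 3))                # hypothetical pretrained 2D kernel
frame = rng.standard_normal((3, 8, 8))
clip = np.repeat(frame[:, np.newaxis], 5, axis=1)      # shape (3, 5, 8, 8)

w3d = inflate_kernel(w2d, t=3)
y2 = conv2d_valid(frame, w2d)                          # shape (4, 6, 6)
y3 = conv3d_valid(clip, w3d)                           # shape (4, 3, 6, 6)
print("max |3D - 2D| at each time step:",
      [float(np.abs(y3[:, s] - y2).max()) for s in range(y3.shape[1])])
```

Because the check only uses identical frames, it says nothing about how the inflated network behaves on real facial motion; it only shows that this inflation scheme preserves the pretrained 2D behaviour as a starting point before training on video.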
dc.publisherInstituto Tecnológico y de Estudios Superiores de Monterrey
dc.relationpublished version
dc.relation2019-05-21
dc.rightshttp://creativecommons.org/publicdomain/zero/1.0/
dc.rightsOpen Access
dc.subjectENGINEERING AND TECHNOLOGY
dc.subjectENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY
dc.titleLearning temporal features of facial action units using deep learning
dc.typeMaster's Thesis