Thesis
Deep-based recurrent approaches for gesture recognition
Date
2020-06-12
Author
Igor Leonardo Oliveira Bastos
Institution
Abstract
The recognition of gestures corresponds to the mathematical interpretation of human motion by a machine. It involves different aspects and parts of the human body, such as variations in the positioning of hands and arms, facial and body expressions, head positioning, and trunk posture. Since gesture recognition takes into account both appearance (of body parts, for example) and movement, it is tied to the extraction of spatiotemporal information from videos and supports a wide range of applications. Consequently, many approaches address this topic, varying in the features and learning algorithms they employ. Despite this breadth, gaps remain regarding scalability (in terms of the number of gestures), the time required to incorporate new gestures, and operation on unsegmented videos, i.e., videos containing multiple gestures with no information about where each gesture starts and ends.

This work presents strategies that fill these gaps along two lines: (i) the creation of scalable models for incremental application to large databases; (ii) the formulation of a model that detects and recognizes gestures simultaneously in unsegmented videos. For efficient performance on gesture videos, it is important to exploit the well-defined temporal structure of gestures, which presupposes the existence of ordered sub-events. To handle this ordering, we propose models that extract spatiotemporal information while weighing the temporal structure, taking into account the contribution of previous inputs (earlier video snippets) when evaluating subsequent ones. In this way, our models correlate information from different parts of a video, producing richer gesture representations that support more accurate recognition.

Finally, we evaluate the proposed models on widely used databases, adopting the performance metric standard for each of them. On ChaLearn LAP Isolated Gestures (ChaLearn IsoGD) and Sheffield Kinect Gestures (SKIG), the proposed method achieved 69.44% and 99.53% accuracy, respectively. On ChaLearn Multimodal Gesture Recognition (ChaLearn Montalbano) and ChaLearn Continuous Gestures (ChaLearn ConGD), it obtained Jaccard scores of 0.919 and 0.623, respectively. Comparisons with approaches from the literature evidence the good performance of the proposed methods, which rival state-of-the-art results on all evaluated databases.
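The abstract does not fix a specific architecture, but the core idea it describes (encoding each video snippet and then letting a recurrent layer carry information from earlier snippets into the evaluation of later ones) can be sketched as follows. This is a minimal illustration in PyTorch, not the thesis's actual model: the shallow CNN backbone, GRU, layer sizes, snippet length, and number of classes are all hypothetical placeholders.

    import torch
    import torch.nn as nn

    class SnippetRecurrentClassifier(nn.Module):
        """Encodes each snippet independently, then correlates snippets
        over time with a GRU so that earlier snippets inform the
        interpretation of subsequent ones."""
        def __init__(self, feat_dim=128, hidden_dim=256, num_classes=20):
            super().__init__()
            # Per-frame appearance encoder (stand-in for a deeper CNN).
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            # Recurrent layer over the ordered sequence of snippet features.
            self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, num_classes)

        def forward(self, video):
            # video: (batch, snippets, frames, 3, H, W)
            b, s, f, c, h, w = video.shape
            frames = video.view(b * s * f, c, h, w)
            # Average frame features within each snippet.
            feats = self.cnn(frames).view(b, s, f, -1).mean(dim=2)
            # The GRU hidden state carries earlier snippets forward in time.
            out, _ = self.gru(feats)
            # Classify from the last time step, which has seen the whole video.
            return self.head(out[:, -1])

    if __name__ == "__main__":
        model = SnippetRecurrentClassifier()
        clip = torch.randn(2, 8, 4, 3, 64, 64)  # 2 videos, 8 snippets of 4 frames
        print(model(clip).shape)  # torch.Size([2, 20])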
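The Jaccard scores reported for ChaLearn Montalbano and ChaLearn ConGD measure the frame-level overlap between predicted and annotated gesture intervals. As a reference for how such a score is obtained for a single gesture instance, a minimal sketch is shown below; the official ChaLearn protocol additionally averages over gesture instances and videos, which this toy example omits.

    def jaccard_score(pred_frames, gt_frames):
        """Frame-level Jaccard index between a predicted and a ground-truth
        gesture segment, each given as a set of frame indices:
        |intersection| / |union|."""
        inter = len(pred_frames & gt_frames)
        union = len(pred_frames | gt_frames)
        return inter / union if union else 0.0

    # Example: prediction spans frames 10-49, ground truth spans 20-59.
    pred = set(range(10, 50))
    gt = set(range(20, 60))
    print(jaccard_score(pred, gt))  # 0.6 (30 shared frames / 50 in the union)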