dc.contributor: William Robson Schwartz
dc.contributor: http://lattes.cnpq.br/0704592200063682
dc.contributor: Erickson Rangel do Nascimento
dc.contributor: Guillermo Camara Chávez
dc.contributor: Leandro Augusto Frata Fernandes
dc.contributor: Ricardo da Silva Torres
dc.creator: Igor Leonardo Oliveira Bastos
dc.date.accessioned: 2022-01-17T16:48:08Z
dc.date.accessioned: 2022-10-04T00:59:05Z
dc.date.available: 2022-01-17T16:48:08Z
dc.date.available: 2022-10-04T00:59:05Z
dc.date.created: 2022-01-17T16:48:08Z
dc.date.issued: 2020-06-12
dc.identifier: http://hdl.handle.net/1843/39110
dc.identifier: https://orcid.org/0000-0001-6998-4771
dc.identifier.uri: http://repositorioslatinoamericanos.uchile.cl/handle/2250/3838002
dc.description.abstract: Gesture recognition corresponds to the mathematical interpretation of human motion by a machine. It involves different aspects and parts of the human body, such as variations in the positioning of hands and arms, facial and body expressions, head positioning, and trunk posture. Since gesture recognition takes into account both appearance (of body parts, for example) and movement, it relies on the extraction of spatiotemporal information from videos and supports a wide range of applications. Consequently, many approaches address this topic, varying in the features and learning algorithms they employ. However, despite this wide range of approaches, gaps remain regarding aspects such as scalability (in terms of the number of gestures), the time required to incorporate new gestures, and operation on unsegmented videos, i.e., videos containing multiple gestures with no information about where each gesture starts and ends. This work therefore presents strategies to fill these gaps along two lines: (i) the creation of scalable models for incremental application on large databases; and (ii) the formulation of a model that detects and recognizes gestures concomitantly in unsegmented videos. For efficient performance on gesture videos, it is important to exploit the well-defined temporal structure of gestures, which implies an ordering of sub-events. To handle this ordering, we propose models that extract spatiotemporal information and also weigh this temporal structure, taking into account the contribution of previous inputs (earlier video snippets) when evaluating subsequent ones. In this way, our models correlate information from different parts of a video, producing richer gesture representations that support more accurate recognition.
Finally, to evaluate the proposed approach, we present the results obtained by applying the models described in this document. These results come from tests on widely used databases, using the evaluation metric standard for each. On ChaLearn LAP Isolated Gestures (ChaLearn IsoGD) and Sheffield Kinect Gestures (SKIG), the proposed method achieved 69.44% and 99.53% accuracy, respectively. On ChaLearn Multimodal Gesture Recognition (ChaLearn Montalbano) and ChaLearn Continuous Gestures (ChaLearn ConGD), it obtained Jaccard scores of 0.919 and 0.623, respectively. Comparisons with approaches from the literature evidence the good performance of the proposed methods, which rival the state of the art on all evaluated databases.
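The abstract's core idea, folding each video snippet into a running summary so that earlier snippets inform the evaluation of later ones, can be sketched with a minimal, untrained GRU recurrence over per-snippet feature vectors. All dimensions, parameters, and names below are illustrative assumptions for a sketch, not the thesis's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 snippets per video, 16-D snippet features,
# 32-D hidden state, 4 gesture classes.
T, D, H, C = 8, 16, 32, 4

# Randomly initialized GRU parameters (illustrative only, untrained).
Wz, Uz, bz = rng.standard_normal((H, D)) * 0.1, rng.standard_normal((H, H)) * 0.1, np.zeros(H)
Wr, Ur, br = rng.standard_normal((H, D)) * 0.1, rng.standard_normal((H, H)) * 0.1, np.zeros(H)
Wh, Uh, bh = rng.standard_normal((H, D)) * 0.1, rng.standard_normal((H, H)) * 0.1, np.zeros(H)
Wo, bo = rng.standard_normal((C, H)) * 0.1, np.zeros(C)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_classify(snippets):
    """Fold snippet features in temporal order; each hidden state mixes the
    current snippet with the accumulated summary of earlier snippets."""
    h = np.zeros(H)
    for x in snippets:                      # one feature vector per snippet
        z = sigmoid(Wz @ x + Uz @ h + bz)   # update gate: how much history to keep
        r = sigmoid(Wr @ x + Ur @ h + br)   # reset gate: how much history to read
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)
        h = (1 - z) * h + z * h_tilde
    logits = Wo @ h + bo
    p = np.exp(logits - logits.max())       # stable softmax over gesture classes
    return p / p.sum()

# One video = a sequence of T snippet feature vectors (random stand-ins here).
probs = gru_classify(rng.standard_normal((T, D)))
```

The gates make the temporal weighting explicit: at each snippet, the model decides how much of the accumulated history to retain and how much to read, which is one standard way to let earlier video parts shape the evaluation of later ones.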
dc.publisher: Universidade Federal de Minas Gerais
dc.publisher: Brasil
dc.publisher: ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
dc.publisher: Programa de Pós-Graduação em Ciência da Computação
dc.publisher: UFMG
dc.rights: http://creativecommons.org/licenses/by-nc-sa/3.0/pt/
dc.rights: Acesso Aberto (Open Access)
dc.subject: Gesture recognition
dc.subject: Recurrent neural networks
dc.subject: Spatiotemporal information
dc.subject: Isolated gestures
dc.subject: Unsegmented videos
dc.title: Deep-based recurrent approaches for gesture recognition
dc.type: Tese (Thesis)

