dc.contributorhttps://orcid.org/0000-0002-7337-8974
dc.contributorhttps://orcid.org/0000-0002-8060-6170
dc.creatorBecerra de la Rosa, Aldonso
dc.creatorDe la Rosa Vargas, José Ismael
dc.creatorGonzález Ramírez, Efrén
dc.creatorPedroza Ramírez, Ángel David
dc.creatorMartínez, Juan Manuel
dc.creatorEscalante, Nivia
dc.date.accessioned2020-05-06T20:42:07Z
dc.date.available2020-05-06T20:42:07Z
dc.date.created2020-05-06T20:42:07Z
dc.date.issued2017-11
dc.identifier2573-0770
dc.identifierhttp://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/1894
dc.identifierhttps://doi.org/10.48779/9ds7-t936
dc.description.abstractThis paper presents two new variations of the frame-level cost function for training a deep neural network, with the goal of achieving lower word error rates in speech recognition. The objective function minimized during training is a central design choice in machine learning, and its improvement is an ongoing process. In the first proposed method, the conventional cross-entropy function is mapped to a non-uniform loss function based on its corresponding extropy (a complementary dual function), emphasizing frames whose assignment to specific senones (tied-triphone states in a hidden Markov model) is ambiguous. The second proposal fuses the proposed mapped cross-entropy with the boosted cross-entropy function, which emphasizes frames with low target posterior probability. The proposed approaches were evaluated on a custom mid-vocabulary, speaker-independent speech corpus. This dataset is used to recognize digit strings and personal name lists in Spanish as spoken in north-central Mexico, on a connected-words phone dialing task. Relative word error rate improvements of 12.3% and 10.7% are obtained with the two proposed approaches, respectively, over the conventional, well-established cross-entropy objective function.
dc.languageeng
dc.publisherIEEE
dc.relationgeneralPublic
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.rightsAttribution-NonCommercial-NoDerivs 3.0 United States
dc.sourceProc. of the IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC 2017), Ixtapa, Mexico, pp. 1-6, 2017.
dc.titleSpeech recognition using deep neural networks trained with non-uniform frame-level cost functions
dc.typeinfo:eu-repo/semantics/conferencePaper

