info:eu-repo/semantics/article
Speech recognition in a dialog system: from conventional to deep processing. A case study applied to Spanish
Date
2018-08
Registered in:
1380-7501
1573-7721
Authors
Becerra, Aldonso
De la Rosa Vargas, José Ismael
González Ramírez, Efrén
Institution
Abstract
This paper presents an overview of the automatic speech recognition
(ASR) module in a spoken dialog system and how it has evolved from the conventional
GMM-HMM (Gaussian mixture model - hidden Markov model) architecture toward the
recent nonlinear DNN-HMM (deep neural network - hidden Markov model) scheme. GMMs
long dominated as the baseline of speech recognition, but in recent years, with the
resurgence of artificial neural networks (ANNs), they have been surpassed in most
recognition tasks. A notable property of ANN-based acoustic models is that their
weights can be adjusted in two training steps: (i) initialization of the weights
(with or without pre-training) and (ii) fine-tuning.
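
The two training steps named in the abstract can be illustrated with a minimal sketch. It assumes a small feed-forward network trained on synthetic toy data; it is not the acoustic model, data, or training recipe used in the paper. The function names (init_weights, fine_tune), the layer sizes, and the learning rate are hypothetical choices made only for illustration, and step (i) is shown in its "without pre-training" variant (plain random initialization).

```python
import numpy as np

def init_weights(layer_sizes, rng):
    """Step (i): random initialization, i.e. the variant without pre-training."""
    return [
        (rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x):
    """Return the activations of every layer (sigmoid hidden, softmax output)."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        if i < len(params) - 1:
            acts.append(1.0 / (1.0 + np.exp(-z)))
        else:
            z = z - z.max(axis=1, keepdims=True)  # numerical stability
            e = np.exp(z)
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts

def cross_entropy(params, x, y):
    """Mean negative log-likelihood of the correct class."""
    probs = forward(params, x)[-1]
    return float(-np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))

def fine_tune(params, x, y, lr=0.5, epochs=200):
    """Step (ii): supervised fine-tuning with full-batch gradient descent."""
    params = [(w.copy(), b.copy()) for w, b in params]
    n = len(y)
    for _ in range(epochs):
        acts = forward(params, x)
        # Gradient of the cross-entropy w.r.t. the softmax inputs.
        delta = acts[-1].copy()
        delta[np.arange(n), y] -= 1.0
        delta /= n
        for i in reversed(range(len(params))):
            w, b = params[i]
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Backpropagate through the sigmoid of the previous layer.
                delta = (delta @ w.T) * acts[i] * (1.0 - acts[i])
            params[i] = (w - lr * grad_w, b - lr * grad_b)
    return params

# Toy two-class data (an assumption; real ASR inputs would be acoustic features).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(int)

params0 = init_weights([4, 8, 2], rng)   # step (i)
loss_before = cross_entropy(params0, x, y)
params1 = fine_tune(params0, x, y)       # step (ii)
loss_after = cross_entropy(params1, x, y)
print(loss_before, loss_after)
```

In the pre-training variant of step (i), the random initialization would be replaced by weights learned beforehand, while the fine-tuning step would keep the same form.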