Reconstructing fundamental frequency from noisy speech using initialized autoencoders

Zeledón Córdoba, Marisol; Sánchez Solís, Joseline; Coto Jiménez, Marvin

dc.creator	Zeledón Córdoba, Marisol
dc.creator	Sánchez Solís, Joseline
dc.creator	Coto Jiménez, Marvin
dc.date.accessioned	2022-03-22T16:20:29Z
dc.date.accessioned	2022-10-19T23:22:50Z
dc.date.available	2022-03-22T16:20:29Z
dc.date.available	2022-10-19T23:22:50Z
dc.date.created	2022-03-22T16:20:29Z
dc.date.issued	2020-10
dc.identifier	https://ieeexplore.ieee.org/abstract/document/9387643
dc.identifier	1548-0992
dc.identifier	https://hdl.handle.net/10669/86261
dc.identifier	10.1109/TLA.2020.9387643
dc.identifier	322-B9-105
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4516097
dc.description.abstract	In this paper, we present a new approach to fundamental frequency (f0) detection in noisy speech, based on Long Short-Term Memory (LSTM) neural networks. f0 is one of the most important parameters of human speech. Its detection is relevant in many areas of speech signal processing and remains an important challenge for severely degraded signals. In previous work on f0 detection for speech enhancement and noise reduction tasks, the LSTM has been initialized with random weights, which are then adjusted by backpropagation through time. We propose a more efficient alternative: initializing the LSTM with the weights of an autoassociative network (autoencoder). This initialization provides a better starting point for f0 detection in noisy speech. We show the advantages of this pre-training using objective measures of the estimated parameter and of the training process, with artificial and natural noise added at different signal-to-noise ratios. Results show that the pre-trained LSTM outperforms its randomly initialized counterpart, a significant improvement over the traditional initialization of neural networks for f0 detection in noisy conditions.
dc.language	eng
dc.source	IEEE Latin America Transactions, vol.18(10), pp.1724-1731.
dc.subject	Deep learning
dc.subject	Fundamental frequency
dc.subject	Long short-term memory (LSTM)
dc.subject	Neural networks
dc.title	Reconstructing fundamental frequency from noisy speech using initialized autoencoders
dc.type	scientific article

This item belongs to the following institution

Universidad de Costa Rica