dc.creatorCoto Jiménez, Marvin
dc.date.accessioned2022-03-25T20:04:54Z
dc.date.accessioned2022-10-20T01:20:34Z
dc.date.available2022-03-25T20:04:54Z
dc.date.available2022-10-20T01:20:34Z
dc.date.created2022-03-25T20:04:54Z
dc.date.issued2018
dc.identifierhttps://ieeexplore.ieee.org/document/8464204
dc.identifier978-1-5386-7506-9
dc.identifierhttps://hdl.handle.net/10669/86291
dc.identifier10.1109/IWOBI.2018.8464204
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/4539525
dc.description.abstractSeveral attempts to enhance statistical parametric speech synthesis have relied on deep-learning-based postfilters, which learn a mapping from the synthetic speech parameters to the natural ones, reducing the gap between them. In this paper, we introduce a new pre-training approach for neural networks, applied to LSTM-based postfilters for speech synthesis, with the objective of enhancing the quality of the synthesized speech more efficiently. Our approach begins with the auto-regressive training of one LSTM network, which is then used as an initialization for postfilters based on a denoising autoencoder architecture. We show the advantages of this initialization on a set of multi-stream postfilters, which comprise a collection of denoising autoencoders for the MFCC and fundamental frequency parameters of the artificial voice. Results show that the initialization lowers the training time of the LSTM networks and, in most cases, achieves better results in enhancing the statistical parametric speech than the common random initialization of the networks.
dc.languageeng
dc.sourceIEEE International Work Conference on Bioinspired Intelligence (IWOBI). San Carlos, Costa Rica. July 18-20, 2018
dc.subjectDeep learning
dc.subjectDenoising autoencoders
dc.subjectLong short-term memory (LSTM)
dc.subjectMachine learning
dc.subjectSignal processing
dc.subjectSpeech synthesis
dc.titlePre-training Long Short-term Memory neural networks for efficient regression in artificial speech postfiltering
dc.typeconference paper

