Emilia: A speech corpus for Argentine Spanish text to speech synthesis

Torres, Humberto Maximiliano; Gurlekian, Jorge Alberto; Evin, Diego Alexis; Cossio Mercado, Christian Gustavo

dc.creator	Torres, Humberto Maximiliano
dc.creator	Gurlekian, Jorge Alberto
dc.creator	Evin, Diego Alexis
dc.creator	Cossio Mercado, Christian Gustavo
dc.date.accessioned	2020-08-31T14:09:11Z
dc.date.accessioned	2022-10-15T12:22:23Z
dc.date.available	2020-08-31T14:09:11Z
dc.date.available	2022-10-15T12:22:23Z
dc.date.created	2020-08-31T14:09:11Z
dc.date.issued	2019-09
dc.identifier	Torres, Humberto Maximiliano; Gurlekian, Jorge Alberto; Evin, Diego Alexis; Cossio Mercado, Christian Gustavo; Emilia: A speech corpus for Argentine Spanish text to speech synthesis; Springer; Language Resources and Evaluation; 53; 3; 9-2019; 419-447
dc.identifier	1574-020X
dc.identifier	http://hdl.handle.net/11336/112712
dc.identifier	1574-0218
dc.identifier	CONICET Digital
dc.identifier	CONICET
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4385483
dc.description.abstract	This paper introduces Emilia, a speech corpus created to build a female voice of the Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit selection text-to-speech system that uses diphones as synthesis units. The key requirement and design criterion for Emilia was to synthesize any Spanish text as high-quality speech from a minimum-size corpus. The text corpus was designed to guarantee phonetic and prosodic coverage. A three-stage strategy was used. In the first stage, 741 sentences were designed to contain all the syllables of Spanish spoken in Argentina, stressed and unstressed, in every position within the word. In the second stage, 852 sentences were added to balance the distribution of diphones. In the third and final stage, following a perceptual evaluation of the synthesized speech quality, 625 sentences were added to achieve the specified unit coverage and to introduce sentences with more complex syntactic and prosodic structures. Issues arising in all three corpus-building stages are reported. The paper also presents the results of the perceptual quality evaluations of the synthesized voice. Emilia has a total duration of 3 hours and 15 minutes. The quality of the speech synthesized from it with the Aromo system is similar to that of commercial systems, with a real-time ratio below one.
dc.language	eng
dc.publisher	Springer
dc.relation	info:eu-repo/semantics/altIdentifier/url/http://link.springer.com/10.1007/s10579-019-09447-7
dc.relation	info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1007/s10579-019-09447-7
dc.rights	https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.subject	ARGENTINE SPANISH
dc.subject	PHONETIC CORPUS
dc.subject	PHONETIC TRANSCRIPTION
dc.subject	SPEECH CORPUS DESIGN
dc.subject	TEXT-TO-SPEECH
dc.title	Emilia: A speech corpus for Argentine Spanish text to speech synthesis
dc.type	info:eu-repo/semantics/article
dc.type	info:ar-repo/semantics/artículo
dc.type	info:eu-repo/semantics/publishedVersion

This item belongs to the following institution

Consejo Nacional de Investigaciones Científicas y Tecnológicas (Argentina)