dc.creatorMendonça, Gustavo
dc.creatorAluisio, Sandra Maria
dc.date.accessioned2015-05-29T14:36:19Z
dc.date.accessioned2018-07-04T17:05:16Z
dc.date.available2015-05-29T14:36:19Z
dc.date.available2018-07-04T17:05:16Z
dc.date.created2015-05-29T14:36:19Z
dc.date.issued2014-09
dc.identifierAnnual Conference of the International Speech Communication Association, 15th, 2014, Singapore.
dc.identifier1990-9770
dc.identifierhttp://www.producao.usp.br/handle/BDPI/48885
dc.identifierhttp://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_1278.pdf
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/1644477
dc.description.abstractThis paper describes the method employed to build a machine-readable pronunciation dictionary for Brazilian Portuguese. The dictionary makes use of a hybrid approach for converting graphemes into phonemes, based on both manual transcription rules and machine learning algorithms. It makes use of a word list compiled from the Portuguese Wikipedia dump. Wikipedia articles were transformed into plain text and tokenized, and word types were extracted. A language identification tool was developed to detect loanwords in the data. Words’ syllable boundaries and stress were identified. The transcription task was carried out in a two-step process: i) words are submitted to a set of transcription rules, in which predictable graphemes (mostly consonants) are transcribed; ii) a machine learning classifier is used to predict the transcription of the remaining graphemes (mostly vowels). The method was evaluated through 5-fold cross-validation; results show an F1-score of 0.98. The dictionary and all the resources used to build it were made publicly available.
dc.languageeng
dc.publisherChinese and Oriental Languages Information Processing Society - COLIPS
dc.publisherInstitute for Infocomm Research - I2R
dc.publisherInternational Speech Communication Association - ISCA
dc.publisherSingapore
dc.relationAnnual Conference of the International Speech Communication Association, 15th
dc.rightsCopyright ISCA
dc.rightsopenAccess
dc.subjectpronunciation dictionary
dc.subjectgrapheme to phoneme conversion
dc.subjecttext to speech
dc.titleUsing a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese
dc.typeConference proceedings

