dc.creator | Mendonça, Gustavo | |
dc.creator | Aluisio, Sandra Maria | |
dc.date.accessioned | 2015-05-29T14:36:19Z | |
dc.date.accessioned | 2018-07-04T17:05:16Z | |
dc.date.available | 2015-05-29T14:36:19Z | |
dc.date.available | 2018-07-04T17:05:16Z | |
dc.date.created | 2015-05-29T14:36:19Z | |
dc.date.issued | 2014-09 | |
dc.identifier | Annual Conference of the International Speech Communication Association, 15th, 2014, Singapore. | |
dc.identifier | 1990-9770 | |
dc.identifier | http://www.producao.usp.br/handle/BDPI/48885 | |
dc.identifier | http://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_1278.pdf | |
dc.identifier.uri | http://repositorioslatinoamericanos.uchile.cl/handle/2250/1644477 | |
dc.description.abstract | This paper describes the method employed to build a machine-readable pronunciation dictionary for Brazilian Portuguese. The dictionary uses a hybrid approach to converting graphemes into phonemes, based on both manual transcription rules and machine learning algorithms. Its word list was compiled from the Portuguese Wikipedia dump: Wikipedia articles were transformed into plain text and tokenized, and word types were extracted. A language identification tool was developed to detect loanwords in the data. Words’ syllable boundaries and stress were identified. The transcription task was carried out in a two-step process: i) words are submitted to a set of transcription rules, in which predictable graphemes (mostly consonants) are transcribed; ii) a machine learning classifier is used to predict the transcription of the remaining graphemes (mostly vowels). The method was evaluated through 5-fold cross-validation; results show an F1-score of 0.98. The dictionary and all the resources used to build it were made publicly available. | |
dc.language | eng | |
dc.publisher | Chinese and Oriental Languages Information Processing Society - COLIPS | |
dc.publisher | Institute for Infocomm Research - I2R | |
dc.publisher | International Speech Communication Association - ISCA | |
dc.publisher | Singapore | |
dc.relation | Annual Conference of the International Speech Communication Association, 15th | |
dc.rights | Copyright ISCA | |
dc.rights | openAccess | |
dc.subject | pronunciation dictionary | |
dc.subject | grapheme to phoneme conversion | |
dc.subject | text to speech | |
dc.title | Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese | |
dc.type | Conference proceedings | |