Un método para la identificación automática del lenguaje hablado basado en características suprasegmentales

ANA LILIA REYES HERRERA

info:eu-repo/semantics/doctoralThesis

Record at:

http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/646

https://repositorioslatinoamericanos.uchile.cl/handle/2250/7805864

Author

ANA LILIA REYES HERRERA

Institution

Instituto Nacional de Astrofísica, Óptica y Electrónica (México)

Abstract

Automatic language identification is the problem of recognizing, by computational means, the language spoken in a speech sample from an unknown speaker. It has several applications. For example, telephone companies want an efficient language identifier for foreign speakers so that calls can be routed to an operator who understands the language; a multilingual translation system covering more than two or three languages needs automatic language identification as a first step to select the appropriate translation system; and governments around the world have long been interested in language identification for monitoring and security purposes. At present, the best automatic language identification systems use the phonotactic content of each language, that is, they depend on a phonetic representation of the speech signal. Despite their good results, these methods require a prior linguistic study of each language to be identified. They segment the speech signal into phonemes and use language models that capture the possible phoneme combinations of a particular language in order to determine which language is being spoken. This doctoral research proposes a method for language identification based on information extracted directly from the speech signal, without requiring a phonetic processing module. The outcome is two new methods for extracting acoustic features directly from the speech signal for language identification. The first is based on the Fourier transform, specifically on Mel Frequency Cepstral Coefficients (MFCC); the second is based on the Wavelet transform. Both methods outperform the state-of-the-art methods.
Finally, the new methods were applied, with very good results, to the identification of Mexican native languages, which lack phonetic transcriptions. These results open the possibility of automatic language identification for Mexico's native languages.

Automatic spoken language identification consists of determining, by computational means, the language being spoken based solely on a speech sample, regardless of the speaker or of what is being said. It has a wide range of applications. For example, telephone companies would like an efficient language identifier for foreign speakers so that calls can be forwarded to an operator who understands the language; a multilingual translation system covering more than two or three languages needs spoken language identification as a first step to select the appropriate translation system; in addition, governments around the world have long been interested in language identification systems for monitoring and security purposes. At present, the best automatic spoken language identification systems use linguistic information to characterize the language, that is, they depend on the phonetic representation of the speech signal. Despite the good results of these methods, they depend on a prior linguistic study of each language to be identified, and the values of the identification parameters are established on the basis of that study. This doctoral research proposes a method for spoken language identification based on information extracted directly from the acoustic signal, without requiring a phonetic processing module. Two new methods are proposed for extracting acoustic features directly from the speech signal for language identification: one based on the Fourier transform, specifically on Mel Frequency Cepstral Coefficients (MFCC), and a second based on the Wavelet transform.
The results obtained with these two new methods surpassed those reported in the state of the art. These results are very encouraging, as they open the possibility of building automatic identification systems for languages with scarce linguistic resources, as is the case for most of the indigenous languages of Mexico.
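
To make the idea of wavelet-based features taken directly from the speech signal more concrete, the following is a minimal sketch in Python. It is an illustration only and not the thesis's method: the abstract does not state the wavelet family, the number of decomposition levels, or the features derived from the coefficients. The Haar wavelet, the three levels, the log-energy features, and the names haar_step and subband_log_energies are all assumptions made here, and the input frame is a synthetic sine wave standing in for real speech.

```python
import math


def haar_step(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail


def subband_log_energies(frame, levels=3, eps=1e-12):
    """Log energy of each detail band plus the final approximation band.

    Hypothetical feature vector (an assumption, not taken from the thesis):
    `levels` detail-band energies followed by one approximation-band energy.
    """
    if len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be divisible by 2**levels")
    features = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(math.log(sum(d * d for d in detail) + eps))
    features.append(math.log(sum(a * a for a in approx) + eps))
    return features


# Synthetic 64-sample frame (stand-in for a short window of speech).
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = subband_log_energies(frame, levels=3)
print(features)
```

In a complete system, vectors like this one would be computed per frame and passed to a classifier trained on labeled recordings of each language; that stage is outside the scope of this sketch.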

Subjects

info:eu-repo/classification/Reconocimiento de voz/Speech recognition

info:eu-repo/classification/Transformada wavelet/Wavelet transform

info:eu-repo/classification/Procesamiento del discurso/Processing speech

info:eu-repo/classification/cti/7

info:eu-repo/classification/cti/33

info:eu-repo/classification/cti/3304

info:eu-repo/classification/cti/330413
