Master's thesis
Machine translation strategies for low-resource Colombian indigenous languages
Date
2022-07-27
Record in:
instname:Universidad de los Andes
reponame:Repositorio Institucional Séneca
Author
Salazar Cárdenas, Iván David
Abstract
Low-resource languages are a challenging field for machine translation and natural language
processing. In recent years, considerable effort has gone into finding strategies that can
counter the scarcity of written and spoken material for these languages. Among these efforts,
the Transformer architecture and Transfer Learning have been used as strategies for working
in low-resource settings, but the results are not conclusive about their effectiveness.
American indigenous languages are good examples of low-resource languages, since they lack
large amounts of written and spoken sources and obtaining such material is particularly
difficult. In this thesis, we experiment with the Transformer architecture and Transfer
Learning, using two Colombian indigenous languages as case studies. We aim to find which
combination of strategies most benefits the translation scores of the models. In this way,
we hope to contribute to the preservation of these endangered languages.