dc.contributor | Manrique Piramanrique, Rubén Francisco | |
dc.contributor | Cardozo Álvarez, Nicolás | |
dc.contributor | Moreno Barbosa, Andrés Darío | |
dc.contributor | FLAG | |
dc.creator | Salazar Cárdenas, Iván David | |
dc.date.accessioned | 2022-10-28T16:19:56Z | |
dc.date.accessioned | 2023-09-06T23:41:37Z | |
dc.date.available | 2022-10-28T16:19:56Z | |
dc.date.available | 2023-09-06T23:41:37Z | |
dc.date.created | 2022-10-28T16:19:56Z | |
dc.date.issued | 2022-07-27 | |
dc.identifier | http://hdl.handle.net/1992/62941 | |
dc.identifier | instname:Universidad de los Andes | |
dc.identifier | reponame:Repositorio Institucional Séneca | |
dc.identifier | repourl:https://repositorio.uniandes.edu.co/ | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/8726743 | |
dc.description.abstract | Low-resource languages pose a persistent challenge for machine translation and natural language
processing. In recent years, considerable effort has gone into strategies
that can counter the scarcity of written and spoken material for these languages. Among these
efforts, the Transformer architecture and Transfer Learning have been used as strategies for working
in low-resource settings, but the results on their effectiveness are not conclusive.
American indigenous languages are good examples of low-resource languages, since they have a
limited amount of written and spoken sources, and obtaining them is particularly difficult. In
this thesis, we experiment with the Transformer architecture and Transfer Learning, using
two Colombian indigenous languages as a case study. We aim to find which combination of strategies
is most beneficial to the translation scores of the models. In this way, we can contribute to the task
of preserving endangered languages. | |
dc.language | eng | |
dc.publisher | Universidad de los Andes | |
dc.publisher | Maestría en Ingeniería de Sistemas y Computación | |
dc.publisher | Facultad de Ingeniería | |
dc.publisher | Departamento de Ingeniería de Sistemas y Computación | |
dc.relation | Alp Öktem, Mirko Plitt, and Grace Tang. Tigrinya neural machine translation with transfer
learning for humanitarian response, 2020. URL https://arxiv.org/abs/2003.11523 | |
dc.relation | José Álvarez. Curso Inicial de Lengua Wayuu: Lectoescritura y gramática básica, 2008 | |
dc.relation | Rui Wang, Xu Tan, Renqian Luo, Tao Qin, and Tie-Yan Liu. A survey on low-resource
neural machine translation, 2021. URL https://arxiv.org/abs/2107.04239 | |
dc.relation | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N.
Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017. URL
https://arxiv.org/abs/1706.03762 | |
dc.relation | Elan van Biljon, Arnu Pretorius, and Julia Kreutzer. On optimal transformer depth for
low-resource language translation, 2020. URL https://arxiv.org/abs/2004.04418 | |
dc.relation | United Nations High Commissioner for Refugees UNHCR. Comunidades indígenas en
Colombia, 2011 | |
dc.relation | Matt Post. A call for clarity in reporting bleu scores, 2018. URL https://arxiv.org/abs/1804.08771 | |
dc.relation | Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan,
Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas
Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy,
Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style,
high-performance deep learning library, 2019. URL https://arxiv.org/abs/1912.01703 | |
dc.relation | NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj
Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers,
Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No language left behind: Scaling human-centered machine translation, 2022. URL https://arxiv.org/abs/2207.0467 | |
dc.relation | Nitika Mathur, Timothy Baldwin, and Trevor Cohn. Tangled up in bleu: Reevaluating
the evaluation of automatic machine translation evaluation metrics, 2020. URL https://arxiv.org/abs/2006.06264 | |
dc.relation | Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer,
Graham Neubig, and Katharina Kann, editors. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, Online, June
2021. Association for Computational Linguistics. URL https://aclanthology.org/2021.americasnlp-1.0 | |
dc.relation | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training
of deep bidirectional transformers for language understanding, 2018. URL https://arxiv.org/abs/1810.04805 | |
dc.relation | Ife Adebara, Muhammad Abdul-Mageed, and Miikka Silfverberg. Translating the unseen?
yoruba-english mt in low-resource, morphologically-unmarked settings, 2021. URL https://arxiv.org/abs/2103.04225 | |
dc.relation | Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing, pages 1568-1575, Austin, Texas, November 2016.
Association for Computational Linguistics. DOI 10.18653/v1/D16-1163. URL https://aclanthology.org/D16-1163 | |
dc.relation | Delfino Zacarias and Ivan Meza. Ayuuk-Spanish Neural Machine Translator. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 168-172. Association for Computational Linguistics, 2021 | |
dc.relation | Raúl Vázquez, Yves Scherrer, Sami Virpioja, and Jörg Tiedemann. The Helsinki submission to the AmericasNLP shared task. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 255-264. Association for Computational Linguistics, 2021 | |
dc.relation | Enrique Uribe-Jongbloed and Carl Anderson. Indigenous and minority languages in colombia: The current situation. Zeszyty Luzyckie, 48, 12 | |
dc.relation | Jörg Tiedemann and Santhosh Thottingal. OPUS-MT - building open translation services for the world. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 479-480, Lisboa, Portugal, November 2020. European Association for Machine Translation. URL https://aclanthology.org/2020.eamt-1.61 | |
dc.relation | Luz Marina Sierra Martínez, Carlos Alberto Cobos, Juan Carlos Corrales, Tulio Rojas, and Luis Carlos Gómez. Sistema de Recuperación de Información para Apoyar la Revitalización del Nasa Yuwe. Iberian Journal of Information Systems and Technologies, 17:407-422, 2019 | |
dc.relation | Luz Marina Sierra Martínez, Carlos Alberto Cobos, Juan Carlos Corrales Muñoz, Tulio Rojas Curieux, Enrique Herrera-Viedma, and Diego Hernán Peluffo-Ordóñez. Building a Nasa Yuwe Language Corpus and Tagging with a Metaheuristic Approach. Computación y Sistemas, 22:881-894, 09 2018. ISSN 1405-5546. URL http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462018000300881&nrm=iso | |
dc.relation | Luz Marina Sierra Martínez, Carlos Cobos, and Juan Corrales. Tokenizer adapted for the nasa yuwe language. Computacion y Sistemas, 20:355-364, 09 2016. DOI 10.13053/CyS-20-3-2455 | |
dc.relation | Luz Marina Sierra, Carlos Alberto Cobos, Juan Carlos Corrales, and Tulio Rojas Curieux. Building a nasa yuwe language test collection. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, pages 112-123, Cham, 2015. Springer International Publishing. ISBN 978-3-319-18111-0 | |
dc.relation | Michael Przystupa and Muhammad Abdul-Mageed. Neural machine translation of low-resource and similar languages with backtranslation. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 224-235, Florence, Italy, August 2019. Association for Computational Linguistics. DOI 10.18653/v1/W19-5431. URL https://aclanthology.org/W19-5431 | |
dc.relation | Maja Popovic. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392-395, Lisbon, Portugal, September 2015. Association for Computational Linguistics. DOI 10.18653/v1/W15-3049. URL https://aclanthology.org/W15-3049 | |
dc.relation | Shantipriya Parida, Subhadarshi Panda, Amulya Dash, Esau Villatoro-Tello, A. Seza Dogruöz, Rosa M. Ortega-Mendoza, Amadeo Hernández, Yashvardhan Sharma, and Petr Motlicek. Open Machine Translation for Low Resource South American Languages. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 218-223. Association for Computational Linguistics, 2021 | |
dc.relation | Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, USA, 2002. Association for Computational Linguistics. DOI 10.3115/1073083.1073135. URL https://doi.org/10.3115/1073083.1073135 | |
dc.relation | John Ortega and Krishnan Pillaipakkamnatt. Using morphemes from agglutinative languages like Quechua and Finnish to aid in low-resource translation. In Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018), pages 1-11, Boston, MA, March 2018. Association for Machine Translation in the Americas. URL https://aclanthology.org/W18-2201 | |
dc.relation | Arturo Oncevay. Peru is Multilingual, Its Machine Translation Should Be Too. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 194-200. Association for Computational Linguistics, 2021 | |
dc.relation | Toan Q. Nguyen and David Chiang. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 296-301, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing. URL https://aclanthology.org/I17-2050 | |
dc.relation | Tan Ngoc Le and Fatiha Sadat. Revitalization of indigenous languages through pre-processing and neural machine translation: The case of Inuktitut. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4661-4666, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. DOI 10.18653/v1/2020.coling-main.410. URL https://aclanthology.org/2020.coling-main.410 | |
dc.relation | Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddee Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkabir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Espoir Murhabazi, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Emezue, Bonaventure Dossou, Blessing Sibanda, Blessing Itoro Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Adewale Akinfaderin, and Abdallah Bashir. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2144-2160. Association for Computational Linguistics, 2020 | |
dc.relation | Oscar Moreno. The REPUcs' Spanish-Quechua Submission to the AmericasNLP 2021 Shared Task on Open Machine Translation. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 241-246. Association for Computational Linguistics, 2021 | |
dc.relation | Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra, and Ivan Meza-Ruiz. Challenges of language technologies for the indigenous languages of the Americas. In Proceedings of the 27th International Conference on Computational Linguistics, pages 55-69, Santa Fe, New Mexico, USA, August 2018. Association for Computational Linguistics. URL https://aclanthology.org/C18-1006 | |
dc.relation | Tom Kocmi and Ondrej Bojar. Trivial transfer learning for low-resource neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers. Association for Computational Linguistics, 2018. DOI 10.18653/v1/w18-6325. URL https://doi.org/10.18653/v1/w18-6325 | |
dc.relation | Rebecca Knowles, Darlene Stewart, Samuel Larkin, and Patrick Littell. NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 224-230. Association for Computational Linguistics, 2021 | |
dc.relation | Dayana Fernandez, Jose Atencia, Ornela Gamboa, and Óscar Bedoya. Design and implementation of an "web api" for the automatic translation colombia's language pairs: Spanish-wayuunaiki case. In Communications and Computing (COLCOM), 2013 IEEE Colombian Conference on, pages 1-9, 05 2013. ISBN 978-1-4799-0366-5 | |
dc.relation | Isaac Feldman and Rolando Coto-Solano. Neural machine translation models with back-translation for the extremely low-resource indigenous language Bribri. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3965-3976, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. DOI 10.18653/v1/2020.coling-main.351. URL https://aclanthology.org/2020.coling-main.351 | |
dc.relation | Marcel Bollmann, Rahul Aralikatte, Héctor Murrieta Bello, Daniel Hershcovich, Miryam de Lhoneux, and Anders Søgaard. Moses and the character-based random babbling baseline: CoAStaL at AmericasNLP 2021 shared task. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 248-254, Online, June 2021. Association for Computational Linguistics. DOI 10.18653/v1/2021.americasnlp-1.28. URL https://aclanthology.org/2021.americasnlp-1.28 | |
dc.relation | El Moatez Billah-Nagoudi, Wei-Rui Chen, Muhammad Abdul-Mageed, and Hasan Cavusoglu. IndT5: A Text-to-Text Transformer for 10 Indigenous Languages. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 265-271. Association for Computational Linguistics, 2021 | |
dc.relation | Rachel Bawden, Alexandra Birch, Radina Dobreva, Arturo Oncevay, Antonio Valerio Miceli Barone, and Philip Williams. The University of Edinburgh's English-Tamil and English-Inuktitut submissions to the WMT20 news translation task. In Proceedings of the Fifth Conference on Machine Translation, pages 92-99, Online, November 2020. Association for Computational Linguistics. URL https://aclanthology.org/2020.wmt-1.5 | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
dc.rights | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | http://purl.org/coar/access_right/c_abf2 | |
dc.title | Machine translation strategies for low-resource Colombian indigenous languages | |
dc.type | Master's thesis | |