dc.contributorManrique Piramanrique, Rubén Francisco
dc.contributorCardozo Álvarez, Nicolás
dc.contributorMoreno Barbosa, Andrés Darío
dc.contributorFLAG
dc.creatorSalazar Cárdenas, Iván David
dc.date.accessioned2022-10-28T16:19:56Z
dc.date.accessioned2023-09-06T23:41:37Z
dc.date.available2022-10-28T16:19:56Z
dc.date.available2023-09-06T23:41:37Z
dc.date.created2022-10-28T16:19:56Z
dc.date.issued2022-07-27
dc.identifierhttp://hdl.handle.net/1992/62941
dc.identifierinstname:Universidad de los Andes
dc.identifierreponame:Repositorio Institucional Séneca
dc.identifierrepourl:https://repositorio.uniandes.edu.co/
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/8726743
dc.description.abstractLow-resource languages are a challenging field for machine translation and natural language processing. In recent years, considerable effort has gone into finding strategies that counter the scarcity of written and spoken material for these languages. Among these efforts, the Transformer architecture and Transfer Learning have been used to work in the low-resource setting, but the results on their effectiveness are not conclusive. American indigenous languages are good examples of low-resource languages, since they have few written and spoken sources and obtaining them is particularly complicated. In this thesis, we experiment with the Transformer architecture and Transfer Learning, using two Colombian indigenous languages as a case study. We aim to find which combination of strategies is most beneficial to the translation scores of the models. In this way, we can help in the task of preserving endangered languages.
dc.languageeng
dc.publisherUniversidad de los Andes
dc.publisherMaestría en Ingeniería de Sistemas y Computación
dc.publisherFacultad de Ingeniería
dc.publisherDepartamento de Ingeniería de Sistemas y Computación
dc.relationAlp Öktem, Mirko Plitt, and Grace Tang. Tigrinya neural machine translation with transfer learning for humanitarian response, 2020. URL https://arxiv.org/abs/2003.11523
dc.relationJosé Álvarez. Curso Inicial de Lengua Wayuu: Lectoescritura y gramática básica, 2008
dc.relationRui Wang, Xu Tan, Renqian Luo, Tao Qin, and Tie-Yan Liu. A survey on low-resource neural machine translation, 2021. URL https://arxiv.org/abs/2107.04239
dc.relationAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017. URL https://arxiv.org/abs/1706.03762
dc.relationElan van Biljon, Arnu Pretorius, and Julia Kreutzer. On optimal transformer depth for low-resource language translation, 2020. URL https://arxiv.org/abs/2004.04418
dc.relationUnited Nations High Commissioner for Refugees UNHCR. Comunidades indígenas en Colombia, 2011
dc.relationMatt Post. A call for clarity in reporting bleu scores, 2018. URL https://arxiv.org/abs/1804.08771
dc.relationAdam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library, 2019. URL https://arxiv.org/abs/1912.01703
dc.relationNLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No language left behind: Scaling human-centered machine translation, 2022. URL https://arxiv.org/abs/2207.0467
dc.relationNitika Mathur, Timothy Baldwin, and Trevor Cohn. Tangled up in bleu: Reevaluating the evaluation of automatic machine translation evaluation metrics, 2020. URL https://arxiv.org/abs/2006.06264
dc.relationManuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, and Katharina Kann, editors. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, Online, June 2021. Association for Computational Linguistics. URL https://aclanthology.org/2021.americasnlp-1.0
dc.relationJacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2018. URL https://arxiv.org/abs/1810.04805
dc.relationIfe Adebara, Muhammad Abdul-Mageed, and Miikka Silfverberg. Translating the unseen? yoruba-english mt in low-resource, morphologically-unmarked settings, 2021. URL https://arxiv.org/abs/2103.04225
dc.relationBarret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1568-1575, Austin, Texas, November 2016. Association for Computational Linguistics. DOI 10.18653/v1/D16-1163. URL https://aclanthology.org/D16-1163
dc.relationDelfino Zacarias and Ivan Meza. Ayuuk-Spanish Neural Machine Translator. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 168-172. Association for Computational Linguistics, 2021
dc.relationRaúl Vázquez, Yves Scherrer, Sami Virpioja, and Jörg Tiedemann. The Helsinki submission to the AmericasNLP shared task. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 255-264. Association for Computational Linguistics, 2021
dc.relationEnrique Uribe-Jongbloed and Carl Anderson. Indigenous and minority languages in colombia: The current situation. Zeszyty Luzyckie, 48, 12
dc.relationJörg Tiedemann and Santhosh Thottingal. OPUS-MT - building open translation services for the world. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 479-480, Lisboa, Portugal, November 2020. European Association for Machine Translation. URL https://aclanthology.org/2020.eamt-1.61
dc.relationLuz Marina Sierra Martínez, Carlos Alberto Cobos, Juan Carlos Corrales, Tulio Rojas, and Luis Carlos Gómez. Sistema de Recuperación de Información para Apoyar la Revitalización del Nasa Yuwe. Iberian Journal of Information Systems and Technologies, 17:407-422, 2019
dc.relationLuz Marina Sierra Martínez, Carlos Alberto Cobos, Juan Carlos Corrales Muñoz, Tulio Rojas Curieux, Enrique Herrera-Viedma, and Diego Hernán Peluffo-Ordóñez. Building a Nasa Yuwe Language Corpus and Tagging with a Metaheuristic Approach. Computación y Sistemas, 22:881-894, 09 2018. ISSN 1405-5546. URL http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462018000300881&nrm=iso
dc.relationLuz Marina Sierra Martínez, Carlos Cobos, and Juan Corrales. Tokenizer adapted for the nasa yuwe language. Computación y Sistemas, 20:355-364, 09 2016. DOI 10.13053/CyS-20-3-2455
dc.relationLuz Marina Sierra, Carlos Alberto Cobos, Juan Carlos Corrales, and Tulio Rojas Curieux. Building a nasa yuwe language test collection. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, pages 112-123, Cham, 2015. Springer International Publishing. ISBN 978-3-319-18111-0
dc.relationMichael Przystupa and Muhammad Abdul-Mageed. Neural machine translation of low-resource and similar languages with backtranslation. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 224-235, Florence, Italy, August 2019. Association for Computational Linguistics. DOI 10.18653/v1/W19-5431. URL https://aclanthology.org/W19-5431
dc.relationMaja Popovic. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392-395, Lisbon, Portugal, September 2015. Association for Computational Linguistics. DOI 10.18653/v1/W15-3049. URL https://aclanthology.org/W15-3049
dc.relationShantipriya Parida, Subhadarshi Panda, Amulya Dash, Esau Villatoro-Tello, A. Seza Dogruöz, Rosa M. Ortega-Mendoza, Amadeo Hernández, Yashvardhan Sharma, and Petr Motlicek. Open Machine Translation for Low Resource South American Languages. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 218-223. Association for Computational Linguistics, 2021
dc.relationKishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, USA, 2002. Association for Computational Linguistics. DOI 10.3115/1073083.1073135. URL https://doi.org/10.3115/1073083.1073135
dc.relationJohn Ortega and Krishnan Pillaipakkamnatt. Using morphemes from agglutinative languages like Quechua and Finnish to aid in low-resource translation. In Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018), pages 1-11, Boston, MA, March 2018. Association for Machine Translation in the Americas. URL https://aclanthology.org/W18-2201
dc.relationArturo Oncevay. Peru is Multilingual, Its Machine Translation Should Be Too. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 194-200. Association for Computational Linguistics, 2021
dc.relationToan Q. Nguyen and David Chiang. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 296-301, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing. URL https://aclanthology.org/I17-2050
dc.relationTan Ngoc Le and Fatiha Sadat. Revitalization of indigenous languages through pre-processing and neural machine translation: The case of Inuktitut. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4661-4666, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. DOI 10.18653/v1/2020.coling-main.410. URL https://aclanthology.org/2020.coling-main.410
dc.relationWilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddee Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkabir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Espoir Murhabazi, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Emezue, Bonaventure Dossou, Blessing Sibanda, Blessing Itoro Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Adewale Akinfaderin, and Abdallah Bashir. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2144-2160. Association for Computational Linguistics, 2020
dc.relationOscar Moreno. The REPUcs' Spanish-Quechua Submission to the AmericasNLP 2021 Shared Task on Open Machine Translation. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 241-246. Association for Computational Linguistics, 2021
dc.relationManuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra, and Ivan Meza-Ruiz. Challenges of language technologies for the indigenous languages of the Americas. In Proceedings of the 27th International Conference on Computational Linguistics, pages 55-69, Santa Fe, New Mexico, USA, August 2018. Association for Computational Linguistics. URL https://aclanthology.org/C18-1006
dc.relationTom Kocmi and Ondrej Bojar. Trivial transfer learning for low-resource neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers. Association for Computational Linguistics, 2018. DOI 10.18653/v1/w18-6325. URL https://doi.org/10.18653/v1/w18-6325
dc.relationRebecca Knowles, Darlene Stewart, Samuel Larkin, and Patrick Littell. NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 224-230. Association for Computational Linguistics, 2021
dc.relationDayana Fernandez, Jose Atencia, Ornela Gamboa, and Óscar Bedoya. Design and implementation of an "web api" for the automatic translation colombia's language pairs: Spanish-wayuunaiki case. In Communications and Computing (COLCOM), 2013 IEEE Colombian Conference on, pages 1-9, 05 2013. ISBN 978-1-4799-0366-5
dc.relationIsaac Feldman and Rolando Coto-Solano. Neural machine translation models with back-translation for the extremely low-resource indigenous language Bribri. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3965-3976, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. DOI 10.18653/v1/2020.coling-main.351. URL https://aclanthology.org/2020.coling-main.351
dc.relationMarcel Bollmann, Rahul Aralikatte, Héctor Murrieta Bello, Daniel Hershcovich, Miryam de Lhoneux, and Anders Søgaard. Moses and the character-based random babbling baseline: CoAStaL at AmericasNLP 2021 shared task. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 248-254, Online, June 2021. Association for Computational Linguistics. DOI 10.18653/v1/2021.americasnlp-1.28. URL https://aclanthology.org/2021.americasnlp-1.28
dc.relationEl Moatez Billah-Nagoudi, Wei-Rui Chen, Muhammad Abdul-Mageed, and Hasan Cavusoglu. IndT5: A Text-to-Text Transformer for 10 Indigenous Languages. In Proceedings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas, pages 265-271. Association for Computational Linguistics, 2021
dc.relationRachel Bawden, Alexandra Birch, Radina Dobreva, Arturo Oncevay, Antonio Valerio Miceli Barone, and Philip Williams. The University of Edinburgh's English-Tamil and English-Inuktitut submissions to the WMT20 news translation task. In Proceedings of the Fifth Conference on Machine Translation, pages 92-99, Online, November 2020. Association for Computational Linguistics. URL https://aclanthology.org/2020.wmt-1.5
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightshttp://purl.org/coar/access_right/c_abf2
dc.titleMachine translation strategies for low-resource Colombian indigenous languages
dc.typeMaster's thesis