dc.contributorDr. Grigori Sidorov
dc.contributorDr. Jiménez Salazar, Héctor
dc.creatorPosadas Durán, Juan Pablo Francisco
dc.date.accessioned2013-01-10T16:35:05Z
dc.date.available2013-01-10T16:35:05Z
dc.date.created2013-01-10T16:35:05Z
dc.date.issued2011-06-09
dc.identifierhttp://www.repositoriodigital.ipn.mx/handle/123456789/9239
dc.description.abstractOne line of research in Natural Language Processing focuses on the alignment of parallel texts. Aligned parallel texts are useful because they show explicitly the relationship between the elements of a text in one language and the elements of the same text translated into another language. In this thesis, we propose a method for sentence alignment in parallel texts written in Spanish and English that uses lexical and statistical information in a dynamic programming framework. The lexical information used is that contained in a limited (incomplete), general-purpose bilingual Spanish-English dictionary, together with sentence length measured in words and in characters. The proposed method was tested on a corpus of unbalanced literary texts (texts in which multiple alignments, omissions and insertions are more frequent), where it reaches a precision above 90%. We compared the results of the proposed method against those obtained by the Vanilla aligner system (which uses a statistical approach) on the same corpus and found that the developed method is superior, particularly in cases of multiple alignments, omissions and insertions. The results show that combining the lexical information contained in a general-purpose bilingual dictionary with statistical information makes this method more robust than purely statistical methods for sentence alignment in texts that do not have a technical translation.
dc.languagees
dc.subjectparallel corpus
dc.subjectSpanish-English
dc.titleCompilación de un corpus paralelo español-inglés alineado a nivel de oraciones
dc.typeThesis

