dc.contributorDeise Prina Dutra
dc.contributorCrysttian Arantes Paixão
dc.contributorBarbara Malveira Orfano
dc.creatorAndressa Rodrigues Gomide
dc.date.accessioned2019-08-14T21:55:50Z
dc.date.accessioned2022-10-03T23:37:24Z
dc.date.available2019-08-14T21:55:50Z
dc.date.available2022-10-03T23:37:24Z
dc.date.created2019-08-14T21:55:50Z
dc.date.issued2016-03-21
dc.identifierhttp://hdl.handle.net/1843/MGSS-A9KGY5
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/3825541
dc.description.abstractThis master's thesis deals with the technical and methodological aspects of creating, cleaning and processing a Brazilian university-level learner corpus, the Corpus do Inglês sem Fronteiras (CorIsF) v 1.0. The two main goals of this study are to make the processing of CorIsF replicable and to investigate and describe the variation of some linguistic characteristics across different learner groups, tasks and genres. The procedure was carried out in R, a free software environment for statistical computing and graphics, and was divided into four parts: dataset compilation and preprocessing; dataset processing; extraction of the key features; and data visualization. The first step deals with the method used to collect the data and to perform the initial cleaning, such as eliminating unwanted data and keeping the relevant data. In the following step, CorIsF was subset into five small corpora covering different learner profiles, two different tasks, and one genre, and annotated with a part-of-speech (POS) tagger. In the third step, the variability of POS within subcorpora, the frequency of types and tokens, and the usage of n-grams were investigated. In the final step, some exploratory data visualization was performed through the creation and analysis of plots and word clouds. After the preparation of the data, the language used in each subcorpus was contrasted and analysed, suggesting that task, genre and student background are likely to influence learners' written production.
dc.publisherUniversidade Federal de Minas Gerais
dc.publisherUFMG
dc.rightsOpen Access
dc.subjectEnglish for academic purposes
dc.subjectLearner corpus
dc.subjectCorpus design
dc.titleProcessing a learner corpus to identify differences: the influence of task, genre and student background
dc.typeMaster's thesis

