Master's Thesis
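Desenvolvimento, implementação e teste de ferramentas integradas para análise textual e tratamento estatístico de dados em pesquisas linguísticas (Development, implementation and testing of integrated tools for textual analysis and statistical treatment of data in linguistic research)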
Date
2016-02-15
Author
Rodrigo Araujo e Castro
Institution
Abstract
This thesis reports on a study aimed at developing, applying and testing a set of tools for the pre-processing and analysis of structured (spreadsheet) and unstructured data by means of scripts written in the R software environment. The study contributes to Translation Studies, within the scope of appliable linguistics (Halliday, 1985) as conceived of by Systemic Functional Linguistics (Halliday and Matthiessen, 2014), and draws on Corpus Linguistics, data and text mining, and descriptive and multivariate statistics. Scripts were written and tested on data retrieved from a study carried out at the Laboratory for Experimentation in Translation, Arts Faculty, Federal University of Minas Gerais, in which four nuclear scientists from the Center for the Development of Nuclear Energy and four professional translators were asked to produce a translation in an experimental setting. The data sets selected were (i) subjects' sociodemographic data and their answers to a questionnaire on their reading and writing habits and proficiency in L1 and L2 (structured data in spreadsheets); and (ii) unstructured data (text) retrieved from recall protocols carried out by subjects upon task completion. Structured data were pre-processed in the R environment through purpose-written scripts. The analysis focused on summarizing the subjects' data, which were triangulated with the clustering results generated through a multivariate analysis technique. Unstructured data were pre-processed in the Notepad++ text editor and through purpose-written scripts in order to analyze the pronouns eu ("I") and a gente (colloquial "we") and the verbs co-occurring with them as realizations of PARTICIPANT and PROCESS categories within the TRANSITIVITY system ascribable to instances of subjects' metareflection on their task. Structured data analysis allowed for clustering subjects and obtaining dendrograms. Unstructured data analysis generated frequency lists, word clouds, Keywords in Context and lists of collocates.
The results of the implementation study showed which subjects were more similar in each group and in the sample as a whole. They also showed that the most frequent verbs co-occurring with the selected pronouns were those realizing material and relational PROCESSES (associated with subjects' representation of their task as doing and attributing activities), followed by mental PROCESSES (including instances of interpersonal metaphors), which, according to Magalhães and Alves (2006), tend to relate, more deliberately, to subjects' metareflection.
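
As an illustration of the kind of unstructured-data output mentioned in this abstract (frequency lists, Keywords in Context and collocates of the pronouns eu and a gente), the following is a minimal sketch. It is written in Python rather than R and is not the thesis's own code. The function names (`tokenize`, `find_node`, `kwic`, `collocates`), the window sizes and the sample sentence are all assumptions made up for this example; the sentence is not taken from the recall protocols.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def find_node(tokens, node):
    """Return the start index of every occurrence of `node` (a list of tokens)."""
    n = len(node)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == node]


def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for each occurrence of the node."""
    lines = []
    for i in find_node(tokens, node):
        end = i + len(node)
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[end:end + width])
        lines.append((left, " ".join(node), right))
    return lines


def collocates(tokens, node, span=1):
    """Count the tokens found up to `span` positions to the right of the node."""
    counts = Counter()
    for i in find_node(tokens, node):
        end = i + len(node)
        counts.update(tokens[end:end + span])
    return counts


# Hypothetical toy sentence, invented for this sketch only.
SAMPLE = ("Eu li o texto e depois eu traduzi. "
          "A gente usa o dicionário quando a gente precisa.")

tokens = tokenize(SAMPLE)
frequency_list = Counter(tokens).most_common()

# Multi-word nodes such as "a gente" are passed as a list of tokens.
for node in (["eu"], ["a", "gente"]):
    print(" ".join(node), dict(collocates(tokens, node)))
    for left, keyword, right in kwic(tokens, node, width=2):
        print(f"{left:>20} | {keyword} | {right}")
```

A right-hand window of one token is a deliberately crude way of catching the verb that follows the pronoun. A real analysis would need part-of-speech information to separate verbs from other collocates, and the article a would have to be distinguished from the a in a gente.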