A process for generating linguistic resources applicable to scientific writing support tools.
MARQUIAFÁVEL, Vanessa Silva. Um processo para a geração de recursos lingüísticos aplicáveis em ferramentas de auxílio à escrita científica. 2007. 276 f. Dissertation (Master's in Human Sciences) - Universidade Federal de São Carlos, São Carlos, 2007.
Marquiafável, Vanessa Silva
Within academic research, English is the lingua franca of many scientific disciplines. It is also widely acknowledged that producing an acceptable academic text is anything but a simple task. The difficulty is particularly acute when the author is a novice researcher and English is not his or her first language. One possible way to reduce this difficulty is the use of writing tools that assist novice researchers during different stages of the writing process. This could involve, for instance, quick and easy access to a collection of authentic linguistic resources extracted from published scientific papers. AMADEUS (Amiable Article Development for User Support) and SciPo (Scientific Portuguese) are good examples of this type of writing tool. AMADEUS was designed to help non-native English users write academic texts and focuses specifically on the fields of Physics and Computer Science. SciPo is a Web-based critiquing system for writing theses in Portuguese and focuses on the discipline of Computer Science. A variation of SciPo is SciPo-Farmácia, a web-based tool that assists non-native speakers of English in writing scientific papers in the field of Pharmaceutical Sciences. The main purpose of this dissertation is to elaborate a semi-automatic process for generating the English linguistic resources required by writing support tools such as the ones mentioned above. The primary aim is to enable researchers from various disciplines to develop their own writing support tool, customized to their specific field, with no need to turn to linguists, computer scientists and/or academic writing specialists for help. The semi-automatic process proposed here has been designed to include the knowledge that these specialists would provide. The main methodology adopted in this research derives from the discipline of Corpus Linguistics (we have used both corpus-based and corpus-driven approaches).
This choice relies on the assumption that the success of such tools is strongly related to the corpus from which users collect well-written text extracts to recycle and reuse in the text being produced. The semi-automatic process was evaluated in two ways: i) clarity and completeness of the manuals describing the linguistic resources and ii) quality of the linguistic resources generated and estimated time for developing all the necessary linguistic resources. To measure the quality of the two evaluation stages, we used the Kappa statistic. The results ranged from k=0.72 to k=1.0. These figures can be interpreted as indicating a good understanding of the tasks described in the manuals evaluated. The present research is relevant in a number of respects. It opens up the possibility of generating a computational tool to assist non-native English speakers in writing academic texts in any experimental field, using only the knowledge embedded in the semi-automatic process. It also promotes the use of writing support tools as a didactic resource for teaching and learning scientific English, and the use of metrics to evaluate rhetorical structure models. Last but not least, it produces a rhetorically annotated corpus which may be used for teaching and learning purposes or in natural language processing.
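As an illustration of the agreement measure mentioned above, the following is a minimal sketch of how a Kappa value could be computed for two annotators. The abstract does not say which Kappa variant was used, so Cohen's kappa is an assumption here, and the function name and the rhetorical labels in the usage example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal categories. The dissertation
    may have used a different Kappa variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical rhetorical labels assigned to six sentences.
    annotator_1 = ["background", "gap", "purpose", "method", "result", "result"]
    annotator_2 = ["background", "gap", "purpose", "method", "result", "method"]
    print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")
```

A value of 1.0 means the annotators agree on every item, and a value of 0 means agreement no better than chance, which is why figures between 0.72 and 1.0 are read as good agreement.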