Conference proceedings
Similarity-based support for text reuse in technical writing
Date
2015-09
Registered in:
ACM Symposium on Document Engineering, 15th, 2015, Lausanne.
9781450333078
Authors
Soto, Axel J.
Mohammad, Abidalrahman
Albert, Andrew
Islam, Aminul
Milios, Evangelos
Doyle, Michael
Minghim, Rosane
Oliveira, Maria Cristina Ferreira de
Institution
Abstract
Technical writing in professional environments, such as user manual authoring for new products, is a task that relies heavily on reuse of content. Therefore, technical content is typically created following a strategy where modular units of text have references to each other. One of the main challenges faced by technical authors is to avoid duplicating existing content, as this adds unnecessary effort, generates undesirable inconsistencies, and dramatically increases maintenance and translation costs. However, there are few computational tools available to support this activity. This paper investigates the use of different similarity methods for identifying reuse opportunities in technical writing. We evaluate our results using existing ground truth as well as feedback from technical authors. Finally, we propose a tool that combines text similarity algorithms with interactive visualizations to help authors understand differences in a collection of topics and identify reuse opportunities.