Pareador de termos para pesquisa clínica: integrate paired toll - IPT

Damasceno, Thaynã Nhaara Oliveira

Master's thesis

Date

2018-12-18

Record at:

DAMASCENO, Thaynã Nhaara Oliveira. Pareador de termos para pesquisa clínica: integrate paired toll - IPT. 2018. 70f. Dissertação (Mestrado em Bioinformática) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2018.

https://repositorio.ufrn.br/jspui/handle/123456789/26912

http://repositorioslatinoamericanos.uchile.cl/handle/2250/3959774

Author

Damasceno, Thaynã Nhaara Oliveira

Institution

Universidade Federal do Rio Grande do Norte (Brazil)

Abstract

Big Data is a term used to characterize the growing volume of existing data on different topics, biomedical or otherwise. Given the enormous volume of biological and biomedical data generated daily, one of the main barriers is the analysis of these data. This calls for the development and use of computational tools that allow data to be analyzed through techniques such as Text Mining. Text Mining, a branch of Data Mining, can be defined as a method for extracting relevant information contained in text. To allow a differentiated analysis of data, clinical or not, a simple algorithm was developed that analyzes the data without requiring correlation with existing databases or the creation of new ones. From this algorithm, a web tool was developed so that anyone, even without knowledge of computational techniques, can access the algorithm and analyze their own data. The Integrate Paired Tool (IPT) algorithm was written in the R programming language and uses Data Mining and Text Mining techniques to analyze clinical data, although its analyses are not restricted to such data. IPT pairs terms by analyzing the frequency with which pairs of data occur in a user-supplied .csv file. The web tool itself was developed in JavaScript, HTML5, CSS and PHP. The algorithm reads the .csv file and passes through it pairing its terms two by two, regardless of whether the columns have different sizes or are incomplete, until all columns have been paired. After all the groupings, a value is assigned to each pair, occurrences of the same pair are summed, and another .csv file is generated containing the existing interactions and their respective frequencies.
Once the relations and their frequencies of occurrence have been computed, a graph of interactions (generated in R) is shown on the web tool screen so that users can carry out their analyses, alongside the .csv file with all interactions and frequencies. The content of this graph and this table varies with the percentage the user chooses in IPT. The .csv file with interaction and frequency data can also be used in other network visualization tools, such as Gephi. For testing purposes, a neonatal dataset was used. IPT proved to work well and met the objectives of the research. Future goals include hosting the tool on the page of the Postgraduate Program in Bioinformatics of UFRN, analyzing other datasets, and possibly integrating data pre-processing into IPT itself.
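
The pairing step described in the abstract can be illustrated with a minimal sketch. This is not the IPT source code: the thesis implementation is in R, while the sketch below uses Python, and every name in it (`pair_terms`, `keep_top_percent`, `write_pairs`, the sample columns and values) is hypothetical. It assumes that terms are paired within each row of the .csv file, that empty cells are skipped, and that the order of the two terms in a pair does not matter. The percentage filter is only one possible reading of "the percentage that the user chooses": here it keeps the most frequent share of pairs.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def pair_terms(rows):
    """Count how often each unordered pair of terms occurs in the same row."""
    counts = Counter()
    for row in rows:
        # Skip empty cells so incomplete rows and short columns are tolerated.
        terms = [cell.strip() for cell in row if cell and cell.strip()]
        # Pair the terms two by two; sorting makes (a, b) and (b, a) one pair.
        for a, b in combinations(terms, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def keep_top_percent(counts, percent):
    """Assumed filter: keep the most frequent `percent` of pairs (at least one)."""
    ranked = counts.most_common()
    if not ranked:
        return []
    keep = max(1, round(len(ranked) * percent / 100))
    return ranked[:keep]


def write_pairs(pairs, out):
    """Write (pair, frequency) entries as CSV; the header names are assumptions."""
    writer = csv.writer(out)
    writer.writerow(["term_a", "term_b", "frequency"])
    for (a, b), n in pairs:
        writer.writerow([a, b, n])


# Made-up sample input standing in for a user-supplied .csv file.
sample = (
    "diagnosis,drug,outcome\n"
    "sepsis,ampicillin,discharged\n"
    "sepsis,ampicillin,\n"
    "jaundice,,discharged\n"
)
rows = list(csv.reader(io.StringIO(sample)))[1:]  # drop the header row
counts = pair_terms(rows)

buffer = io.StringIO()
write_pairs(keep_top_percent(counts, 50), buffer)
print(buffer.getvalue())
```

An output file of this shape (two term columns plus a frequency) is an edge list, which is the kind of table that network visualization tools typically accept for import.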

Subjects

Text mining

Bioinformatics

Biomedical text mining

Graphs