dc.description.abstract | Big Data is a term used to characterize the growing volume of existing data on
different topics, whether biomedical or not. Given the enormous volume of biological and
biomedical data generated daily, one of the main barriers is the analysis of these data.
Overcoming it requires the development and use of computational tools that allow the analysis of data through
techniques such as Text Mining. Text Mining, a branch of Data Mining, can be defined as a
method that allows the extraction of relevant information contained in text. In order to allow a
differentiated analysis of data, whether clinical or not, a simple algorithm was
developed that allows these data to be analyzed without the need for correlation with existing
databases or the creation of new ones. From this algorithm, a web tool was developed
so that anyone can access the algorithm (even without knowledge of computational
techniques) and analyze their own data. The Integrate Paired Tool (IPT) algorithm
was written in the R programming language and uses Data Mining and Text Mining techniques
to analyze clinical data, although its analyses are not restricted to this type of data. IPT
pairs terms by analyzing the frequency with which data pairs occur in a
user-supplied .csv file. In addition, the web tool was developed using
JavaScript, HTML5, CSS and PHP. The algorithm reads the .csv file and passes through it,
pairing its terms two by two, regardless of whether the columns are of different sizes or
incomplete, until all columns are paired. After all the groupings, a value is assigned to each
grouped pair, adding all pairs with the same frequencies and generating another .csv file
containing the existing interactions and their respective frequencies. After the relations and
their frequencies of occurrence are computed, a graph of interactions (in R) is shown on the web
tool screen, together with the .csv file of all interactions and frequencies, so that users can
carry out their analyses. The information in this graph and table varies, depending on the
percentage that the user chooses in IPT. The .csv file with interaction and frequency
data can also be used in other network visualization tools, such as Gephi.
For testing purposes, neonatal data were used. IPT worked
well and met the objectives of the research. As future goals, we plan to host
the tool on the page of the Postgraduate Program in Bioinformatics of UFRN, analyze
other data, and possibly integrate data pre-processing into IPT itself. | |