Thesis
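Organização da informação: um modelo semiautomático de classificação de atrações em perfis turísticos usando aprendizado de máquina [Organization of information: a semi-automatic model for classifying attractions into tourist profiles using machine learning]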
Date
2021-03-09
Author
Amarildo Martins de Magalhães
Institution
Abstract
The paradigm of technological evolution has brought a disruptive change in the behavior
of people, who now make decisions based on the content they consume on the Internet.
This aspect is no different in the Tourism industry, where new technologies and the
sharing of reviews allow users to seek information to support decisions such as
choosing a destination, accommodation, attractions, food, among others. Reviews
provide an important source of information; however, their volume can make it difficult
to extract knowledge and use it effectively. How can one find out whether a particular
point of interest with more than 100,000 opinions written in unstructured text matches
what a tourist is looking for? This question motivates this research, whose direct
objective is to create a model that transforms the reviews written by users into
tourist classes (profiles). In the literature, some works try to address the
problem of point of interest classification using reviews. In addition, the use of profiles
in tourism is common, as a way of classifying destinations and tourists. In this sense,
this study offers an additional view on both aspects, while allowing tourist profiles
to be joined with review information. The work is applied research, based on
Pragmatism, of a hybrid nature and with an exploratory objective. It uses the
organization of reviews, qualitative in nature, as a source for a quantitative
exploratory analysis.
The methodology presents the creation and validation of a classification model at three
levels. At the Conceptual Level, knowledge is elicited from domain experts, including
the creation of a set of 12 tourist profiles and the definition of the destinations to
be used in the research. At this level, 3.4 million tourist reviews written in
Portuguese are also collected. At the Technological Level, information is organized
and represented, and an automatic text classification process is carried out using
different Machine Learning
techniques. The Validation Level presents a comparison between automatic methods
and a classification carried out by specialists. The best performing method is used to
explore compatibility between destinations, attractions, states, countries and profiles,
as well as the differences between the popularity and similarity of destinations with a
profile. It also explores the similarity between destinations and the profile variation of
the most visited destinations. The specific results present interesting discoveries in
tourism, such as the identification of the best destinations for each profile, the most
popular destinations that are not the most relevant for a profile or the identification of
a very high degree of similarity between national and international destinations. The
model's performance, above 70% accuracy and obtained by combining technology and
specialists, offers an important alternative for models of knowledge organization,
mainly due to the dynamism and exponential growth of content on the Internet. The
results can help
tourists looking for certain experiences, governments promoting tourism to a specific
audience, or private companies that aim to offer targeted products and services.
Regardless of the actor in the process, the organization and classification of tourist
information make the decision-making process easier.
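
The idea of turning reviews into tourist profiles, and then summarizing a destination by
the share of its reviews assigned to each profile, can be illustrated with a minimal
sketch. This is not the thesis's model: it assumes a toy bag-of-words nearest-centroid
classifier with cosine similarity, and the profile names, sample reviews, and function
names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercase bag-of-words term counts for one review."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_reviews):
    """Sum the term counts of the labeled reviews of each profile."""
    centroids = {}
    for text, profile in labeled_reviews:
        centroids.setdefault(profile, Counter()).update(vectorize(text))
    return centroids


def classify(text, centroids):
    """Return the profile whose centroid is most similar to the review."""
    vec = vectorize(text)
    return max(centroids, key=lambda p: cosine(vec, centroids[p]))


def profile_shares(reviews, centroids):
    """Share of a destination's reviews assigned to each profile."""
    counts = Counter(classify(r, centroids) for r in reviews)
    return {p: counts[p] / len(reviews) for p in centroids}


# Hypothetical expert-labeled examples; the thesis uses 12 profiles
# and reviews written in Portuguese.
LABELED = [
    ("beautiful beach with calm sea and sand", "sun_and_beach"),
    ("great waves and warm sea for swimming", "sun_and_beach"),
    ("historic museum with colonial art and architecture", "cultural"),
    ("old church and museum full of history", "cultural"),
]

centroids = train_centroids(LABELED)

# Hypothetical reviews of a single destination.
destination_reviews = [
    "the sea was warm and the beach was clean",
    "lovely sand and calm waves",
    "small museum about local history",
]
shares = profile_shares(destination_reviews, centroids)
```

Here `shares` maps each profile to the fraction of the destination's reviews assigned to
it, which is one simple way to compare a destination's compatibility with different
profiles. A realistic pipeline would add text preprocessing, a stronger classifier, and
validation against expert labels, as the abstract describes.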