Finding topics in reviews of hotel services

  • Нађа Булајић QCerris
  • Sandro Radovanovic Faculty of Organizational Sciences, University of Belgrade
Keywords: Topic Modelling, Latent Dirichlet Allocation, Machine learning

Abstract

Latent Dirichlet Allocation (LDA) is one approach to the statistical modeling of topics. Within the LDA
model, documents are represented as mixtures of topics, while each topic consists of words, each with a probability that the word
belongs to the given topic. In this paper, LDA is introduced through an example of topic identification
in a set of reviews left by hotel visitors. The reviews are taken from the Tripadvisor application and
collected in a data set called "Trip Advisor Hotel Review", where a detailed analysis of over 20,000 reviews
reveals the main themes that most often characterize hotels. The data set was preprocessed and adapted to the needs of the research in order to obtain better results, after which a dictionary and a corpus were formed and used as input parameters for model building. The resulting model consists of a list of topics, and after the model was visualized, each of those topics was named. The paper concludes by describing the situations in which this model can be applied and the benefits it brings to both the users of the application and the hotel management itself.
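
The dictionary and corpus mentioned above can be illustrated with a minimal sketch. It assumes a simple bag-of-words representation; the sample reviews, the stop-word list, and the helper names (`tokenize`, `build_dictionary`, `to_bow`) are hypothetical and are not taken from the paper, whose exact preprocessing steps are not stated in this abstract.

```python
from collections import Counter

# Hypothetical stand-ins for hotel reviews; not taken from the data set.
reviews = [
    "The room was clean and the staff was friendly",
    "Great location, friendly staff, clean room",
    "The breakfast was cold and the room was noisy",
]

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "was", "and", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text, strip surrounding punctuation, drop stop words."""
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_dictionary(token_lists):
    """Map each distinct token to an integer id, in order of first appearance."""
    token2id = {}
    for tokens in token_lists:
        for tok in tokens:
            token2id.setdefault(tok, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Represent one document as a sorted list of (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


tokenized = [tokenize(r) for r in reviews]
dictionary = build_dictionary(tokenized)               # token -> id
corpus = [to_bow(t, dictionary) for t in tokenized]    # one bag of words per review

for review, bow in zip(reviews, corpus):
    print(review, "->", bow)
```

A dictionary and corpus of this shape are what LDA implementations typically take as input. In the gensim library, for example, `corpora.Dictionary` and its `doc2bow` method play these two roles, and the result is passed to `models.LdaModel` together with the chosen number of topics; whether the authors used gensim is an assumption.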

Published
2023-08-16
Section
Information engineering