Semantic analysis of text

  • Ana Bojanić, Bravo Systems d.o.o. Banja Luka, RS, BiH / Elektrotehnički fakultet, Univerzitet u Banjoj Luci, RS, BiH
  • Zoran Đurić, Elektrotehnički fakultet, Univerzitet u Banjoj Luci, RS, BiH
Keywords: Text categorization, Supervised and unsupervised machine learning, Natural Language Processing

Abstract

Text categorization using machine learning methods has become one of the key techniques for extracting and summarizing valuable information from text documents. This paper describes text pre-processing steps and analyzes supervised and unsupervised machine learning approaches to text categorization. Five algorithms are evaluated on five standard text categorization datasets. For the majority of the applied algorithms, the achieved precision and recall fall in the range of 70-90% on all datasets. In terms of the predefined metrics, the supervised algorithms perform better on four datasets, while the unsupervised approach shows better results on one dataset. The main advantage of the unsupervised approach compared to the supervised ones is also emphasized, and some suggestions for further research in this area are given.

Author Biographies

Ana Bojanić, Bravo Systems d.o.o. Banja Luka, RS, BiH / Elektrotehnički fakultet, Univerzitet u Banjoj Luci, RS, BiH

dipl. ing.

Zoran Đurić, Elektrotehnički fakultet, Univerzitet u Banjoj Luci, RS, BiH

prof. dr

Published
2021-07-01
Section
Information technologies