MODEL SISTEMA ZA EKSTRAKCIJU INFORMACIJA IZ TEKSTOVA PISANIH NA SRPSKOM JEZIKU

Staša Vujičić Stanković

University of Belgrade, Faculty of Mathematics

Keywords: Ontology-Based Information Extraction, Semantic Web, Natural Language Processing, Serbian

Abstract

This paper motivates enriching Information Extraction systems with Semantic Web data and technologies, and stresses the existing problems specific to Serbian. Serbian is a morphologically rich language, which raises a range of issues in natural language processing tasks, and in Information Extraction in particular; it also lacks semantic resources and tools. Techniques for overcoming these problems are therefore needed, with particular emphasis on taking advantage of existing English resources and tools. A model for Ontology-Based Information Extraction is proposed to address the shortcomings of Information Extraction for Serbian and to enhance the core Information Extraction process by incorporating the semantic knowledge encapsulated in ontologies. The paper describes this model, which is based on: integrating existing resources developed for processing texts in Serbian with those intended for processing texts in English; adapting existing techniques and tools; and devising new ways of implementing them in order to overcome significant challenges. We believe this model will encourage the development of Ontology-Based Information Extraction systems for specific domains and applications, dealing with the increasing volume of data in Serbian.

MODEL OF A SYSTEM FOR INFORMATION EXTRACTION FROM SERBIAN WRITTEN TEXTS

Abstract