DESIGNING THE CLUSTERING PROCESS USING PATTERNS

Kathrin Kirchner; Boris Delibašić; Milan Vukićević

Kathrin Kirchner, Faculty of Economics and Business Administration, Friedrich Schiller University Jena
Boris Delibašić, Faculty of Organizational Sciences
Milan Vukićević, Faculty of Organizational Sciences

Keywords: Clustering, data mining, patterns, CRISP-DM

Abstract

A typical data mining process, as described, e.g., in the CRISP-DM approach, consists of several phases, starting with business and data understanding and proceeding with preprocessing, modeling, and evaluation. For each of these phases, several generic tasks are described that have to be carried out. In practice, however, it is difficult to decide which specialized task best solves a given generic task. There are at least three reasons for this. First, a plethora of specialized tasks has been proposed in the literature and is available in data mining software. Second, many of these tasks are encapsulated in algorithms and cannot be used independently of them. Third, specialized tasks (reusable components, RCs) are poorly organized, i.e., it is not easy to select the appropriate RC for a generic task (sub-problem). In this paper, we propose a white-box modeling methodology that supports the design of the data mining process. The paper concentrates on clustering algorithms only. Accordingly, we propose RCs for commonly appearing sub-problems in clustering, as well as pre- and post-processing RCs.
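The abstract does not list the concrete RCs, so the following is only an illustrative sketch of the white-box idea, not the authors' method. It assumes a partitioning (k-means-like) skeleton whose sub-problems (initializing representatives, measuring distance, updating representatives) are filled by interchangeable components, plus one pre-processing RC (min-max normalization) and one post-processing RC (within-cluster cost). Every function name here (`partitioning_clustering`, `init_first_k`, `mean_update`, and so on) is hypothetical and not taken from the paper.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Point = List[float]

# Hypothetical pre-processing RC: scale each attribute to [0, 1].
def min_max_normalize(data: Sequence[Point]) -> List[Point]:
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)] for row in data]

# Hypothetical RCs for the "initialize representatives" sub-problem.
def init_first_k(data, k, rng):
    return [list(p) for p in data[:k]]

def init_random(data, k, rng):
    return [list(p) for p in rng.sample(list(data), k)]

# Hypothetical RCs for the "measure distance" sub-problem.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical RCs for the "update representatives" sub-problem.
def mean_update(members):
    return [sum(c) / len(c) for c in zip(*members)]

def median_update(members):
    return [statistics.median(c) for c in zip(*members)]

# Generic skeleton: the algorithm is defined by which RCs are plugged in.
def partitioning_clustering(data, k, init: Callable, distance: Callable,
                            update: Callable, max_iter: int = 100,
                            seed: int = 0) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    centers = init(data, k, rng)
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment: each point joins its nearest representative.
        new_labels = [min(range(k), key=lambda j: distance(p, centers[j]))
                      for p in data]
        # Update: recompute each representative; keep it if its cluster is empty.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == j]
            new_centers.append(update(members) if members else centers[j])
        if new_labels == labels and new_centers == centers:
            break  # converged: neither assignments nor representatives changed
        labels, centers = new_labels, new_centers
    return labels, centers

# Hypothetical post-processing RC: total distance of points to their representative.
def within_cluster_cost(data, labels, centers, distance) -> float:
    return sum(distance(p, centers[lab]) for p, lab in zip(data, labels))

data = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5], [1.2, 2.2], [8.5, 9.0]]
X = min_max_normalize(data)
# k-means-like: first-k initialization + Euclidean distance + mean update.
labels_a, centers_a = partitioning_clustering(X, 2, init_first_k, euclidean, mean_update)
# k-medians-like: same skeleton with two RCs swapped.
labels_b, centers_b = partitioning_clustering(X, 2, init_first_k, manhattan, median_update)
cost_a = within_cluster_cost(X, labels_a, centers_a, euclidean)
print(labels_a, labels_b, round(cost_a, 4))
```

The design choice being illustrated is that switching from a k-means-like to a k-medians-like algorithm only means swapping two components, while the skeleton and the pre- and post-processing RCs stay unchanged; `init_random` could be passed instead of `init_first_k` in the same way.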

DESIGNING THE CLUSTERING PROCESS WITH REUSABLE COMPONENTS

Abstract