Quantification of categorical attributes for the application of machine learning algorithms

Aleksa Milosavljević; Milan Vukićević

Aleksa Milosavljević, Faculty of Organizational Sciences, University of Belgrade
Milan Vukićević, Faculty of Organizational Sciences, University of Belgrade

Keywords: machine learning, categorical attributes, encoding, algorithm performance, classification, data preprocessing

Abstract

This paper examines methods for encoding categorical attributes so that machine learning algorithms can be applied to them. Categorical attributes describe qualitative characteristics and are common in real-world datasets; examples include gender, color, educational level, and item category. Encoding matters because most machine learning algorithms require numerical inputs, so leaving categorical attributes unencoded can significantly affect an algorithm's efficiency and performance. The paper presents several quantification methods, including One-hot encoding, Target encoding, and Count encoding, among others. These methods are applied to several datasets containing different types of categorical attributes, and the performance of the machine learning algorithms trained on the encoded data is then evaluated. The results provide a deeper understanding of encoding methods and their effects on algorithm performance.

Encoding of categorical attributes for machine learning algorithms

Abstract