Encoding of categorical attributes for machine learning algorithms
Abstract
This paper explores methods for encoding categorical attributes so that machine learning algorithms can be applied to them. Categorical attributes describe qualitative characteristics and are common in real-world datasets; examples include gender, color, educational level, and item category. Encoding such attributes is important because most machine learning algorithms require numerical inputs, and leaving categorical attributes unencoded can significantly affect the efficiency and performance of the algorithm. The paper presents several encoding methods, including one-hot encoding, target encoding, and count encoding, among others. These methods are applied to several datasets containing different types of categorical attributes, and the performance of machine learning algorithms trained on the encoded data is then evaluated. The results provide a deeper understanding of encoding methods and their effects on algorithm performance.
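
To make the three encoding methods named above concrete, the following is a minimal sketch in plain Python. It is not the implementation used in this paper. The toy attribute `colors`, the binary `target`, and the helper function names are hypothetical and chosen only for illustration. One-hot encoding maps each category to an indicator vector, count encoding replaces each category with its frequency, and target encoding replaces each category with the mean target value observed for it.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one categorical attribute and a binary target.
colors = ["red", "green", "red", "blue", "green", "red"]
target = [1, 0, 1, 0, 1, 0]


def one_hot_encode(values):
    """Return one 0/1 indicator vector per value, one position per category."""
    categories = sorted(set(values))  # fixed column order: alphabetical
    return [[1 if v == c else 0 for c in categories] for v in values]


def count_encode(values):
    """Replace each value with the number of times its category occurs."""
    counts = Counter(values)
    return [counts[v] for v in values]


def target_encode(values, y):
    """Replace each value with the mean of y over rows sharing its category.

    This is the plain, unsmoothed variant, computed on the same rows it
    encodes, so it is only suitable as an illustration.
    """
    sums = defaultdict(float)
    counts = Counter(values)
    for v, t in zip(values, y):
        sums[v] += t
    return [sums[v] / counts[v] for v in values]


one_hot = one_hot_encode(colors)       # columns: blue, green, red
counts = count_encode(colors)          # red occurs 3 times, green 2, blue 1
means = target_encode(colors, target)  # per-category mean of the target
```

In practice, target encoding is usually combined with smoothing or computed out-of-fold, because encoding a row with a statistic that includes its own target value leaks label information into the features.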