DETECTING MALICIOUS URLS USING MACHINE LEARNING METHODS
Abstract
In this paper, we describe methods for detecting malicious URLs using machine learning algorithms, with the purpose of discovering rules in data that the traditional blacklist approach cannot detect. A particular challenge was to analyze the data sets and choose the most appropriate features. To show that malicious URLs can be predicted efficiently from the URL alone, without the page content, we focus on lexical and host-based features. Further, we implement automated services for gathering and generating the values of all proposed features. We explore six binary classifiers on two data sets. The experimental results show that combining the proposed URL features with these classifiers achieves an accuracy of 96-99%. We also discuss open issues and indicate some important problems for further research.
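The idea that a URL can be classified from its string alone can be illustrated with a minimal sketch of lexical feature extraction. The feature names below are hypothetical, chosen only for illustration, and are not the paper's actual feature set. Host-based features (for example WHOIS or DNS data) need network lookups and are deliberately left out of this sketch.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Illustrative lexical features computed from the URL string only.

    Hypothetical feature set; no page content or network lookup is used.
    """
    # urlparse needs a scheme to recognise the host, so assume http if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at_symbol": int("@" in url),
        # A raw IPv4 address instead of a domain name is a common warning sign.
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login.php?user=a"))
```

Each dictionary would become one feature vector row for a binary classifier; in practice these lexical values would be combined with the host-based features the paper describes.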