APACHE HADOOP ANALYSIS AND USAGE POSSIBILITIES

  • Dragan Šušak, diva-e Platforms
  • Zoran Đurić, Elektrotehnički fakultet, Univerzitet u Banjoj Luci, RS, BiH
Keywords: Apache Hadoop, Apache Spark, machine learning, news classification

Abstract

This paper analyzes the Apache Hadoop application framework, some of its accompanying tools, and their possible uses. Apache Spark, an alternative framework for data processing in distributed environments, is examined in more detail. The paper also describes the hardware, software, and security requirements for setting up a typical Hadoop cluster, according to the Cloudera reference specification. As a practical example, a relatively complex application for Web news classification is implemented using the Apache Spark application framework and machine learning principles. The paper includes the UML diagrams used during implementation and, as an illustration, a few examples of classifying previously unclassified Web news.

Published
2020-01-20
Section
Information technologies