APACHE HADOOP ANALYSIS AND USAGE POSSIBILITIES
Abstract
This paper analyzes the Apache Hadoop application framework, some of its accompanying tools, and their usage possibilities. Apache Spark, an alternative framework for data processing in distributed environments, is examined in more detail. The paper also describes the hardware, software, and security requirements for setting up a typical Hadoop cluster, according to the Cloudera reference specification. As a practical example, a relatively complex application for Web news classification is implemented using the Apache Spark application framework and machine learning principles. The paper also contains the UML diagrams used during the implementation process. As an illustration, a few examples of classifying previously unclassified Web news are also presented.