Abstract: The problems faced by the airline market are not unique. However, the large volume of unstructured data and incomplete information that airlines hold makes analysis difficult, and analyzing such complex unstructured data with traditional tools and techniques is expensive. Airlines need reliable analysis results to grow their market and reduce expenses. In this paper, an airline data set is analyzed using Spark-SQL and Hive, with Hadoop running in the background. HDFS is used to store the large volume of airline data, and Hive and Spark are used to query it: Hive uses HiveQL statements, which run on the MapReduce framework, and Spark uses Spark-SQL, which runs on the Spark framework. The data is visualized by exporting the output of the Hive and Spark queries to Excel and plotting it as line and bar charts. The visualizations reveal patterns in the delays of different airlines caused by weather, security, NAS delay, late aircraft delay, etc.
Keywords: Hadoop, airline datasets, big data, MapReduce, HDFS, Spark, data analysis, data visualization
DOI: 10.17148/IJARCCE.2020.9537
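As an illustration of the kind of query the abstract describes, the following minimal sketch averages each delay cause per carrier. It makes several assumptions: Python's built-in sqlite3 module stands in for Hive and Spark-SQL (the same SELECT ... GROUP BY form is expected to carry over to HiveQL and Spark-SQL), the table name `flights` and its column names are hypothetical, and the rows are toy values invented for the example, not taken from the airline data set.

```python
import sqlite3

# Toy rows for illustration only (NOT from the paper's data set).
# Hypothetical layout: carrier, weather, security, NAS, late-aircraft delay (minutes).
ROWS = [
    ("AA", 10, 0, 5, 20),
    ("AA", 0, 0, 15, 0),
    ("DL", 5, 2, 0, 10),
]


def average_delays_by_carrier(rows):
    """Return the per-carrier average of each delay cause, sorted by carrier."""
    # sqlite3 is only a stand-in for Hive / Spark-SQL in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights ("
        "carrier TEXT, weather_delay INTEGER, security_delay INTEGER, "
        "nas_delay INTEGER, late_aircraft_delay INTEGER)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT carrier,
               AVG(weather_delay),
               AVG(security_delay),
               AVG(nas_delay),
               AVG(late_aircraft_delay)
        FROM flights
        GROUP BY carrier
        ORDER BY carrier
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in average_delays_by_carrier(ROWS):
    print(row)
```

A result set of this shape, one row per carrier with one column per delay cause, is what would then be exported to a spreadsheet and plotted as line or bar charts.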