Abstract: Large collections of data sets include structured, unstructured, and semi-structured data. Such data is categorized as “Big Data” because of its sheer volume, variety, and velocity. Traditional data management, warehousing, and analysis systems lack the tools to analyze this data, and its volume exceeds the capacity of traditional databases to capture, manage, and process it. Because of this specific nature of Big Data, in this paper we first describe how Big Data is stored in distributed file system architectures. Apache Hadoop and its Hadoop Distributed File System (HDFS) are widely used for storing and managing Big Data, and the data is processed with the MapReduce framework. Processing and analyzing this huge amount of data to extract meaningful information remains a challenging task.
Keywords: Big Data, HDFS, MapReduce, Cluster.
DOI: 10.17148/IJARCCE.2020.9533