Abstract: We live in an on-demand, on-command digital universe, with data proliferating from institutions, individuals, and machines at a very high rate. This data is categorized as “Big Data” due to its sheer volume, variety, and velocity. Most of this data is unstructured, quasi-structured, or semi-structured, and it is heterogeneous in nature. The volume and heterogeneity of the data, together with the speed at which it is generated, make it difficult for present computing infrastructure to manage Big Data. Traditional data management, warehousing, and analysis systems lack the tools to analyze this data. Because of its specific nature, Big Data is stored in distributed file system architectures. Apache Hadoop and its distributed file system, HDFS, are widely used for storing and managing Big Data. Analyzing Big Data is a challenging task, as it involves large distributed file systems.
Keywords: Big Data, HDFS, MapReduce, Cluster
DOI: 10.17148/IJARCCE.2019.8242