International Journal of Advanced Research in Computer and Communication Engineering

A monthly peer-reviewed online and print journal

ISSN Online 2278-1021
ISSN Print 2319-5940

Abstract: Human life depends on demand. This data is categorized as "Big Data" because of its three Vs: Volume, Variety and Velocity. Most of this data is unstructured, quasi-structured or semi-structured, and it is heterogeneous in nature. The volume and heterogeneity of the data, together with the speed at which it is generated, make it difficult for present computing infrastructure to manage Big Data. Because of these characteristics, Big Data is stored in distributed file system architectures. Apache Hadoop and HDFS are widely used for storing and managing Big Data. Analyzing Big Data is a challenging task because it involves large distributed file systems that must be fault tolerant, flexible and scalable. MapReduce is widely used for the efficient analysis of Big Data. Traditional DBMS techniques such as joins and indexing, and other techniques such as graph search, are used for classification and grouping of Big Data. These techniques are being adapted for use in MapReduce. In this paper we present a basic overview of Hadoop and the Hadoop Distributed File System (HDFS).

Keywords: Hadoop Distributed File System (HDFS), Relational Databases, Non-structured or semi-structured data model (NoSQL)


DOI: 10.17148/IJARCCE.2018.71146