International Journal of Advanced Research in Computer and Communication Engineering

A monthly peer-reviewed online and print journal

ISSN Online 2278-1021
ISSN Print 2319-5940

Since 2012

Abstract: Data analytics and machine learning are emerging as leading technologies of the 21st century. Acquired data usually exists in a raw state and must be prepared before it can be analysed and conclusions drawn from it. Data pre-processing puts the data into a usable form by removing redundancy and noise, eliminating missing and non-numerical values, and normalizing the data. Data analysis and visualization are then carried out to support the statistical analysis of the data. In this paper, a data set covering different car manufacturers in the automobile industry is taken and analysed. Logistic regression is applied because the data set contains many columns with categorical values, and the accuracy, precision, and F1 score of the model are measured. Linear regression is also applied to the data set and the R-squared values are noted. R-squared is a statistical measure of how close the data are to the fitted regression line; it is also known as the coefficient of determination, or the coefficient of multiple determination in multiple regression. Various conclusions can be drawn from this interdependent data set, and the results can be stored as historical data for future analysis. The resulting ML model, which employs both logistic regression and linear regression, serves as a Business Intelligence tool that can help manufacturers and sales departments identify their products' place in the 21st-century market.
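
The evaluation measures named in the abstract can be illustrated with a minimal sketch. The function names `r_squared` and `classification_scores` are hypothetical and are not taken from the paper. The sketch assumes binary class labels and uses the standard textbook definitions of each measure, and it does not reproduce the paper's data or results.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: distance of the data from the fitted values.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: distance of the data from its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, and F1 score for binary labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}
```

A perfect fit gives `r_squared` equal to 1, while a model that always predicts the mean of the observed values gives 0, which matches the reading of R-squared as closeness of the data to the fitted regression line.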

Keywords: Data Analytics, Machine Learning, Data Pre-processing, Logistic Regression, Accuracy, Precision, F1 Score, Linear Regression, Categorical Values, Data Analysis and Visualization, R-squared, Business Intelligence



DOI: 10.17148/IJARCCE.2019.8712