Abstract: This study explores the performance of multiple machine learning regression algorithms for house price prediction, with the aim of improving predictive precision and offering practical implications for stakeholders in the real estate industry. Using a dataset collected from real estate platforms, property records, and fresh data obtained directly from real estate agencies, the study examines Random Forest, Gradient Boosting Machines (GBM), XGBoost, Support Vector Regression (SVR), and Neural Networks. The workflow involves extensive data preprocessing, feature construction, and computationally expensive steps such as hyperparameter tuning to achieve high accuracy. Residual plots reveal, to varying degrees, the prediction accuracy of each of the 23 models as well as the weaknesses of the various methods employed. For example, Random Forest and XGBoost capture non-linear patterns well, but their residuals show some degree of heteroscedasticity. In contrast, simpler models such as SVR with a linear kernel fail to capture the intertwined patterns in the data, resulting in systematic biases. The study therefore stresses that the choice of model should depend on the properties of the dataset and on the market conditions under consideration. This research contributes to the literature on machine learning in real estate by offering a step-by-step comparison of five advanced regression techniques, clarifying how effective each is at predicting housing prices. The findings are expected to help real estate agents, investors, and policy-makers increase market transparency and efficiency.
Keywords: Mortgage prediction, GBM, SVR, XGBoost, Elastic Net
DOI: 10.17148/IJARCCE.2024.13902
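The abstract distinguishes two residual failure modes: heteroscedasticity (residual spread that changes with the predicted price, as reported for Random Forest and XGBoost) and systematic bias (residuals that are consistently off in one direction, as reported for linear-kernel SVR). The sketch below is not the authors' procedure. It illustrates, under stated assumptions, how each mode could be summarized numerically. The function names `pearson` and `residual_diagnostics` are hypothetical. The mean residual serves as a crude bias indicator, and the correlation between fitted values and absolute residuals serves as an informal heteroscedasticity proxy, not a formal test such as Breusch-Pagan.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def residual_diagnostics(y_true, y_pred):
    """Hypothetical helper summarizing residuals (actual minus predicted price).

    mean_residual: far from 0 suggests systematic over- or under-prediction.
    spread_trend: clearly positive suggests residual size grows with the
    predicted price (an informal sign of heteroscedasticity).
    """
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return {
        "mean_residual": mean(residuals),
        "spread_trend": pearson(y_pred, [abs(r) for r in residuals]),
    }


if __name__ == "__main__":
    preds = [100, 200, 300, 400]
    # Errors that widen as predicted price rises (heteroscedastic pattern).
    print(residual_diagnostics([101, 198, 306, 392], preds))
    # Constant under-prediction by 10 (systematic bias, constant spread).
    print(residual_diagnostics([110, 210, 310, 410], preds))
```

In the first toy case the spread trend is strongly positive while the mean residual stays near zero. In the second case the mean residual is 10 and the spread trend is 0. Residual plots, as used in the study, convey the same information visually and in more detail.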