Abstract: Accurate prediction of healthcare insurance costs is important for cost management, policy design, and healthcare planning. This study compares several machine learning (ML) algorithms for forecasting healthcare insurance expenditures and identifies the most suitable model for reliable cost estimation. A publicly available dataset containing demographic and lifestyle attributes (age, sex, body mass index (BMI), number of children, smoking status, and region) was used. Six regression models were implemented and compared: Linear Regression (LR), Support Vector Regression (SVR), Random Forest Regressor (RFR), XGBoost Regressor (XGBR), LightGBM (LGBM), and Gradient Boosted Regression (GBR). GBR outperformed the other models, achieving the lowest mean squared error (MSE = 18,153,562.14) and mean absolute error (MAE = 2,270.97), along with the highest coefficient of determination (R² = 0.87), peak signal-to-noise ratio (PSNR = 22.97), and signal-to-noise ratio (SNR = 9.97). Cross-validation further supported its robustness, with the tenth fold achieving an R² of 0.91. To improve interpretability, the explainable artificial intelligence (XAI) tools SHAP and LIME were applied to the final GBR model, revealing that “region” and “smoker” were the most influential factors affecting insurance costs. The findings indicate that GBR, combined with XAI techniques, offers a robust, transparent, and reliable approach to predicting healthcare insurance costs. Future work will focus on integrating more advanced explainability frameworks and real-world healthcare datasets to further improve reliability and applicability.
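The abstract reports five evaluation metrics without stating their formulas. The sketch below shows one plausible way to compute them for a regression model, using only the Python standard library. MSE, MAE, and R² follow their standard definitions. The PSNR and SNR definitions are assumptions borrowed from signal processing (peak taken as the largest true value, signal power taken as the sum of squared true values); the paper may define them differently. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R^2, PSNR (dB) and SNR (dB) for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n                         # mean squared error
    mae = sum(abs(e) for e in errors) / n    # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all true values are identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # ASSUMPTION: PSNR uses the largest true value as the "peak".
    # ASSUMPTION: SNR compares the power of the true values to the error power.
    if ss_res == 0:
        psnr = snr = float("inf")            # perfect predictions
    else:
        peak = max(y_true)
        psnr = 10.0 * math.log10(peak ** 2 / mse)
        snr = 10.0 * math.log10(sum(t * t for t in y_true) / ss_res)

    return {"mse": mse, "mae": mae, "r2": r2, "psnr": psnr, "snr": snr}


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    charges_true = [100.0, 200.0, 300.0, 400.0]
    charges_pred = [110.0, 190.0, 310.0, 390.0]
    for name, value in regression_metrics(charges_true, charges_pred).items():
        print(f"{name}: {value:.4f}")
```

Lower MSE and MAE indicate a better fit, while higher R², PSNR, and SNR do, which matches the direction of the comparison reported in the abstract.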

Keywords: Healthcare insurance cost prediction; machine learning; explainable artificial intelligence (XAI); regression models; gradient boosting


DOI: 10.17148/IJARCCE.2025.141249

How to Cite:

[1] Md. Shahidur Rahman Saklain, Antar Sarker, Md. Sadiq Iqbal, "Optimized Ensemble Regression with Explainable AI for Interpretable Healthcare Cost Prediction," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.141249
