Abstract: Income prediction from demographic data remains challenging because of inherent class imbalance and the black-box nature of modern machine learning algorithms. This study develops an explainable AI (XAI) framework that predicts income levels on the Adult Income dataset while addressing its 76/24 class distribution skew. The research implements and compares four state-of-the-art algorithms (XGBoost, LightGBM, RandomForest, and CatBoost), each enhanced with SMOTE balancing and optimal threshold selection. SHAP, LIME, and permutation importance are applied systematically to make the models' predictions interpretable. Results show that LightGBM performs best, reaching an F1 score of 72.91% and a balanced accuracy of 82.23% after threshold optimization, a significant improvement over baseline models. The XAI analysis identifies marital status and capital gains as the dominant predictive features, with strong consensus across the explainability methods. Learning curve analysis confirms model convergence at approximately 35,000 samples, with overfitting gaps below 3%. The framework's novelty lies in combining multiple explainability techniques with systematic threshold optimization for imbalanced data. These findings have important implications for fair and transparent automated decision-making in financial services, lending, and human resources, where understanding model reasoning is crucial.

Keywords: Explainable artificial intelligence, income prediction, class imbalance, threshold optimization, SHAP analysis, machine learning interpretability.
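
The threshold optimization mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple grid search over candidate decision thresholds that maximizes F1 on held-out scores; the abstract does not state the search procedure actually used in the study, so the function names (`f1_score`, `balanced_accuracy`, `best_threshold`), the grid, and the toy data below are all hypothetical.

```python
# Illustrative sketch only: grid search for the decision threshold that
# maximizes F1. The procedure, names, and data are assumptions, not the
# study's implementation.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2

def best_threshold(y_true, scores, candidates=None):
    """Return (threshold, f1) for the candidate threshold with the highest F1."""
    if candidates is None:
        # Hypothetical grid: 0.05, 0.06, ..., 0.95
        candidates = [i / 100 for i in range(5, 96)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:  # strict comparison keeps the first (lowest) maximizer
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy, imbalanced example (made-up scores, not results from the study).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.35, 0.60]

default_preds = [1 if s >= 0.5 else 0 for s in scores]
default_f1 = f1_score(y_true, default_preds)
tuned_t, tuned_f1 = best_threshold(y_true, scores)
tuned_preds = [1 if s >= tuned_t else 0 for s in scores]
tuned_bal_acc = balanced_accuracy(y_true, tuned_preds)
print(default_f1, tuned_t, tuned_f1, tuned_bal_acc)
```

In practice the threshold would be chosen on a validation split rather than on the test data, so that the reported F1 and balanced accuracy are not optimistically biased.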


Downloads: PDF | DOI: 10.17148/IJARCCE.2025.14801

How to Cite:

[1] May Stow, "Explainable Machine Learning Framework for Income Prediction with Class Imbalance Optimization," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.14801
