Volume 15, Issue 6, June 2026
This work is licensed under a Creative Commons Attribution 4.0 International License.
MACHINE LEARNING TECHNIQUES FOR DIABETES PREDICTION: A COMPREHENSIVE REVIEW
Tanu Sharma, Naveen Sharma
Abstract: Diabetes mellitus is one of the most common and burdensome non-communicable diseases globally, affecting around 537 million adults worldwide in 2021, a figure expected to rise to 783 million by 2045. Early and accurate diabetes prediction is essential for prompt clinical intervention, both to minimize diabetes complications and to decrease healthcare costs. Machine learning (ML) algorithms have become a powerful approach for identifying complex, non-linear patterns in clinical and demographic data, enabling early-stage risk stratification.
The aim of this paper is to systematically and comprehensively review the current state-of-the-art machine learning approaches used for diabetes prediction. It critically synthesises results from over 60 peer-reviewed research studies published between 2015 and 2024, compares the performance of the models, and presents a geographically focused case study analysing the use of such a model in the Punjab region of India, where diabetes prevalence has exceeded the national average.
The literature search was performed using the PubMed, IEEE Xplore, Scopus, and Web of Science databases, following the PRISMA guidelines. The algorithms examined include logistic regression, support vector machines, decision trees, random forests, XGBoost, Naive Bayes, k-nearest neighbour, and deep learning architectures such as artificial neural networks, convolutional neural networks and long short-term memory networks. Accuracy, sensitivity, specificity, F1-score and area under the receiver operating characteristic curve (AUC) are systematically compared.
On benchmark datasets such as the PIMA Indian Diabetes Database and CDC BRFSS, XGBoost and deep learning architectures consistently achieve the best predictive performance (AUC 91%-96%). Ensemble methods generalise better than single classifiers.
In the Punjab region case study, a Random Forest model trained on regional electronic health records (EHR) reached an accuracy of 89.4% and an AUC of 0.93, with glucose level, BMI, age and family history identified as the most predictive features. Diabetes prediction using machine learning offers substantial clinical and public health benefits, especially when models are adapted to regional epidemiological characteristics. Data sparsity, class imbalance and model explainability remain significant challenges that must be overcome to enable responsible clinical use. Future work should focus on federated learning, explainable artificial intelligence (XAI), and the integration of multimodal data sources.
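For readers unfamiliar with the evaluation metrics compared in the abstract (accuracy, sensitivity, specificity, F1-score and AUC), the following is a minimal sketch of how they can be computed for a binary diabetes classifier. It is illustrative only: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for this example and are not taken from the reviewed studies, and the rank-based AUC formulation is one common choice, not necessarily the one used in any given paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and F1-score as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not from any dataset in the review).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(classification_metrics(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

Unlike the other four metrics, AUC is computed from the predicted scores, not the thresholded labels, so it does not depend on the choice of decision threshold.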
Keywords: diabetes mellitus; machine learning; deep learning; XGBoost; random forest; PIMA dataset; clinical decision support; Punjab; predictive modelling; artificial intelligence
How to Cite:
[1] Tanu Sharma, Naveen Sharma, "MACHINE LEARNING TECHNIQUES FOR DIABETES PREDICTION: A COMPREHENSIVE REVIEW," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15646
