Abstract: This study investigates the application of machine learning algorithms to predict cancer risk levels from a dataset of patient risk factors. Four classification models, namely Support Vector Machines (SVM), Logistic Regression, Random Forest, and XGBoost, were trained and evaluated on a dataset containing patient information and associated risk factors. The data were preprocessed to handle categorical features and scale numerical features, and then split into training and testing sets. The models were trained on the training set, and their performance was assessed by accuracy on the test set. Logistic Regression achieved the highest test accuracy (0.9000), followed by SVM (0.8800), XGBoost (0.8775), and Random Forest (0.8575). The results demonstrate the potential of machine learning models, particularly Logistic Regression, for predicting cancer risk levels from the provided factors. Such models can help identify individuals at higher risk and may facilitate early intervention strategies.
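The workflow in the abstract (prepare categorical and numerical features, split into training and test sets, fit a classifier, report test accuracy) can be illustrated with a minimal sketch. It covers only the Logistic Regression branch and uses standard-library Python so that it runs without extra packages. It is not the authors' code. The dataset, the column names (`age`, `smoking`, `pollution`, `gender`), the labelling rule, the 80/20 split, the learning rate and the epoch count are all hypothetical. The study predicts risk *levels*, and this sketch simplifies them to a binary low/high label. The sketch also splits first and learns the scaling statistics from the training rows only. This is a common precaution against test-set leakage and deliberately departs from the abstract, which describes preprocessing before the split.

```python
import math
import random

# Hypothetical column layout; not taken from the study's dataset.
NUMERIC = ["age", "smoking", "pollution"]
CATEGORICAL = ["gender"]


def make_toy_data(n, seed=0):
    """Generate synthetic patient records with a binary risk label (0 = low, 1 = high)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.uniform(20, 80)
        smoking = rng.randint(1, 8)
        pollution = rng.randint(1, 8)
        gender = rng.choice(["F", "M"])
        # Hypothetical linear risk score plus Gaussian noise; gender has no effect by design.
        score = 0.04 * (age - 50) + 0.6 * (smoking - 4.5) + 0.5 * (pollution - 4.5) + rng.gauss(0, 1)
        rows.append({"age": age, "smoking": smoking, "pollution": pollution, "gender": gender})
        labels.append(1 if score > 0 else 0)
    return rows, labels


def train_test_split(rows, labels, test_frac=0.2, seed=42):
    """Shuffle indices and hold out test_frac of the records for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - int(len(idx) * test_frac)
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])


def fit_preprocessor(rows):
    """Learn numeric mean/std and categorical vocabularies from training rows only."""
    stats = {}
    for col in NUMERIC:
        vals = [r[col] for r in rows]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats[col] = (mean, std)
    vocab = {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}
    return stats, vocab


def transform(rows, stats, vocab):
    """Standardize numeric columns and one-hot encode categorical columns."""
    out = []
    for r in rows:
        x = [(r[c] - stats[c][0]) / stats[c][1] for c in NUMERIC]
        for c in CATEGORICAL:
            x.extend(1.0 if r[c] == v else 0.0 for v in vocab[c])
        out.append(x)
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by full-batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]


def accuracy(y_true, y_pred):
    """Fraction of test records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rows, labels = make_toy_data(500)
tr_rows, y_tr, te_rows, y_te = train_test_split(rows, labels)
stats, vocab = fit_preprocessor(tr_rows)
X_tr = transform(tr_rows, stats, vocab)
X_te = transform(te_rows, stats, vocab)
w, b = fit_logistic(X_tr, y_tr)
acc = accuracy(y_te, predict(X_te, w, b))
print(f"Logistic Regression test accuracy on toy data: {acc:.4f}")
```

The other three models would reuse the same split and feature matrices and differ only in the fitting step. A library such as scikit-learn or XGBoost would normally supply those classifiers, so their accuracies on this toy data say nothing about the figures reported above.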

Keywords: Random Forest, XGBoost, Healthcare Analytics, Risk Factors, Data Science.


DOI: 10.17148/IJARCCE.2025.1411113

How to Cite:

[1] Afrin Mubarak Shaikh, Mr. Deepak Singh, "A Comparative Analysis of SVM, Logistic Regression, Random Forest, and XGBoost for Cancer Risk Prediction," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.1411113.
