Abstract: The thyroid gland produces the thyroid hormones thyroxine (abbreviated T4) and triiodothyronine (abbreviated T3). These hormones play an important role in protein synthesis, body temperature regulation, and overall energy production and regulation. Many disorders affect the thyroid gland, and some of them, such as hypothyroidism and hyperthyroidism, are very common. Thyroid disorders (TD) affect 42 million individuals in India; hypothyroidism is the most common, affecting one in every ten adults. According to a study published in The Lancet in February 2022,¹ type 1 diabetes among people under the age of 25 accounted for at least 73.7% of the 16,300 diabetes fatalities in this age group in 2019, even though this illness is largely treatable. To reduce the burden of TD, early detection of the disease is essential. Building a fast, accurate, and interpretable machine learning model for this task remains an active research subject. Using fewer features reduces the computational effort and improves interpretability. A 3-stage hybrid feature selection approach and several classification models are evaluated on the TD dataset obtained from kaggle.com, which has 29 features and one outcome variable. Stage-1 uses a Genetic Algorithm and Logistic Regression Architecture for Feature Selection and selects 13 features that are well correlated with the class but not with one another. Stage-2 applies the same Genetic Algorithm and Logistic Regression Architecture for Feature Selection to select 11 features. In Stage-3, Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Extra Trees (ET), Random Forest (RF), and Gradient Boosting (GDB) are used with the 11 features to identify patients with or without TD. The comparative analysis uses data splitting, 10-fold cross-validation, several performance metrics, and statistical tests. LR, NB, SVM, ET, RF, and GDB all improve across performance measures when the number of features is reduced to 11. 
Compared with prior research, performance metrics including accuracy, sensitivity, specificity, F-measure, AUC, and the kappa statistic showed superior outcomes with fewer features. The proposed ensemble model achieved 100% classification results. When the results were compared with those of previous research on the same dataset, the proposed model was the most successful across all performance measures.
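As an illustration of the evaluation measures named above, the following minimal Python sketch computes accuracy, sensitivity, specificity, F-measure, and Cohen's kappa from the four cells of a binary confusion matrix. The function names `binary_metrics` and `_safe_div` and the example counts are hypothetical and are not taken from the study; AUC is left out because it needs ranked prediction scores rather than counts.

```python
def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = _safe_div(tp + tn, total)
    sensitivity = _safe_div(tp, tp + fn)   # recall / true positive rate
    specificity = _safe_div(tn, tn + fp)   # true negative rate
    precision = _safe_div(tp, tp + fp)
    f_measure = _safe_div(2 * precision * sensitivity, precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_pos = _safe_div(tp + fp, total) * _safe_div(tp + fn, total)
    p_neg = _safe_div(fn + tn, total) * _safe_div(fp + tn, total)
    p_chance = p_pos + p_neg
    kappa = _safe_div(accuracy - p_chance, 1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=5, fn=10, tn=45).items():
        print(f"{name}: {value:.3f}")
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when the classes are imbalanced.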
1. https://www.downtoearth.org.in/news/health/1-in-10-indians-have-hypothyroidism-61693
Keywords: Thyroid Disorders, Machine Learning Classifiers, Feature Selection, Genetic Algorithm (GA), Extra Trees (ET), Gradient Boosting (GDB), Random Forest (RF)
DOI: 10.17148/IJARCCE.2022.11341