Abstract: Diabetes mellitus is a group of metabolic disorders that affects millions of people. Detection of diabetes is of great significance because the disease can lead to serious complications. Many studies have addressed the diagnosis of diabetes, and most of them are based on one particular data set, the Pima Indian diabetes data set. This data set comes from a study, begun in 1965, of women of the Pima Indian population, a Native American community near Phoenix, Arizona, in which the onset rate of diabetes is relatively high. Most earlier studies focused primarily on one or two specialized, complex techniques, while a comprehensive study of several general techniques is missing. In this work, we extensively explore the most popular Machine Learning techniques used to identify diabetes (e.g. the KNN algorithm), together with data pre-processing methods. We evaluate these techniques by their cross-validation accuracy on the data set from the UCI ML repository.

Keywords: Machine learning, Classification, KNN, Diabetes.
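
As an illustration of the evaluation protocol named in the abstract, the following is a minimal sketch of a KNN classifier scored by k-fold cross-validation accuracy. It is not the implementation used in this work: the function names (`knn_predict`, `cross_val_accuracy`, `make_toy_data`), the choice of k = 5 neighbours and 10 folds, and the synthetic two-cluster data standing in for the Pima Indian records are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    # Euclidean distance from x to every training point, nearest first.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=5, folds=10, seed=0):
    """Mean accuracy of KNN over `folds` disjoint held-out subsets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Every index lands in exactly one fold.
    fold_sets = [idx[i::folds] for i in range(folds)]
    scores = []
    for held_out in fold_sets:
        held = set(held_out)
        train_X = [X[i] for i in idx if i not in held]
        train_y = [y[i] for i in idx if i not in held]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n=200, seed=1):
    """Hypothetical stand-in data: two Gaussian clusters, NOT the Pima records."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold CV accuracy: {cross_val_accuracy(X, y):.3f}")
```

To apply the same procedure to real data, `X` would hold the pre-processed feature rows and `y` the diabetes labels. Because KNN is distance-based, features would normally be scaled to comparable ranges first.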

