Nikolaos Baroutis, "Adaptive neuro-fuzzy inference systems (ANFIS) applied on medical diagnosis", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2017
https://doi.org/10.26233/heallink.tuc.69917
Over the last thirty years, Artificial Intelligence (AI) and Machine Learning (ML) have been used in computer systems to make fast, inexpensive, non-invasive medical predictions, and they are of crucial importance as supporting tools for doctors. Since 2013, cardiovascular disease (CVD) has been the number one cause of death in the world, accounting for 31% of deaths globally, and it also requires very costly and time-consuming hospital treatment. Of the CVD deaths, 42% are due to coronary heart disease (CHD), which we study in this thesis, using AI and/or ML to build a Computer Aided Diagnosis (CAD) system that offers optimal predictive performance. CHD is the cause of many other CVDs and is also implicated in brain stroke. CHD is the stenosis of the main heart arteries, caused when a waxy substance called plaque builds up inside the coronary arteries, narrowing them and reducing the blood flow to the heart, which leads to serious heart problems or heart failure. The danger of the disease lies in its silent onset. The risk factors are: age, sex, high cholesterol levels, angina, abnormal blood pressure, years as a smoker, number of cigarettes smoked per day, family history, high fasting blood sugar, anxiety and lack of exercise.
In this thesis we examine the problem of Computer Aided Diagnosis (CAD) of Coronary Heart Disease (CHD): classifying patients as accurately as possible while minimizing the cost of diagnosis, the time required, and the stress and pain for the patients. Using AI and/or ML techniques, our goal is to classify patients into three levels of risk (Absence, Medium high, Very high), which differentiates our research from previous studies since 1988, where the classification was binary (absence or presence).
Then, to achieve better results, we went deeper into data science: using various data preprocessing techniques, we constructed different datasets of patient diagnosis data in order to find which dataset offers the best result. Furthermore, building on this concept, we set our method apart even more by proposing a new dataset of patient diagnosis data that differs from the data of previous studies. To achieve this, we consulted a cardiologist and used data preprocessing techniques. We used the database from the University of Cleveland, which includes 298 patient cases with 13 parameters per patient and has been in use since 1988. Moreover, we used the patient datasets of the University of California Irvine (UCI) machine learning repository, which have 4% missing data in 15% of the patient cases. In order to enlarge the Cleveland database, we recovered the missing data of the UCI database using statistical data preprocessing. As a result, the Cleveland dataset increased by 21%. In collaboration with the cardiologist, we constructed and proposed a new diagnosis dataset that includes, for each patient, a subset of the parameters used until now: data from the interview answers, the biochemical blood test and the electrocardiograph (ECG) test, excluding the parameters of the stress test and the fluoroscopy test. We applied statistical preprocessing to the data and processed them with the following AI and ML techniques: A) Adaptive Neuro-Fuzzy Inference Systems (ANFIS) based on i) Subtractive Clustering, ii) Fuzzy C-Means, iii) Particle Swarm Optimization, iv) Genetic Algorithm, and v) all of the above techniques again using datasets from PCA; B) Artificial Neural Networks (ANN). The aim was to find which strategy yields diagnoses with the best accuracy. After multiple adjustments to the above techniques, a multilayer neural network proved to be the best.
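The abstract does not say which statistical method was used to recover the missing UCI values. As a minimal sketch only, assuming simple per-column imputation (mean for numeric parameters, mode for categorical ones), the step might look like the following. The function names, the `None` marker for missing entries and the toy patient records are all hypothetical and are not taken from the thesis.

```python
from statistics import mean, mode

MISSING = None  # assumption: a missing entry is represented as None


def impute_column(values, categorical=False):
    """Fill missing entries with the column mode (categorical) or mean (numeric)."""
    observed = [v for v in values if v is not MISSING]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is MISSING else v for v in values]


def impute_records(records, categorical_columns=()):
    """Impute every column of a list of equal-length patient records."""
    columns = list(zip(*records))
    filled = [
        impute_column(list(col), categorical=(i in categorical_columns))
        for i, col in enumerate(columns)
    ]
    # Transpose back to one row per patient
    return [list(row) for row in zip(*filled)]


# Toy records (age, resting blood pressure, sex); values are illustrative only
patients = [
    [63, 145, 1],
    [67, None, 1],
    [41, 130, None],
    [56, 120, 0],
]
completed = impute_records(patients, categorical_columns={2})
```

Records completed in this way could then be appended to the complete cases, which is one way a base dataset can grow as described above. More careful approaches, such as class-conditional or regression-based imputation, would fit the same interface.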
We created a unique, appropriate weight initialization for the feed-forward pass and for the scaled conjugate gradient descent algorithm, and also adjusted the layers, the nodes and the split ratio. The result is 74% accuracy (mean value over the three classes). Specifically, the class Absence, which is the most important for the patient's safety, has Very Good credibility on the credibility scale based on ROC performance {Almost excellent, Very Good, Good, Mediocre, Worthless}. The classes Medium high and Very high risk have Good credibility. The supporting diagnosis system uses data from basic questions to the patient, a simple biochemical examination and an ECG, excluding invasive, expensive and time-consuming examinations such as the stress test and fluoroscopy.
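The per-class credibility rating can be illustrated with a one-vs-rest ROC analysis. The sketch below computes the area under the ROC curve (AUC) for each class as the probability that a patient of that class receives a higher score than a patient of another class, and then maps the AUC to the five labels of the scale. The cut-off values in `SCALE` are hypothetical: the abstract names the labels but does not give the AUC ranges behind them. The function names and the toy scores are likewise assumptions.

```python
def roc_auc(scores, is_positive):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cut-offs: the source gives the labels, not the AUC ranges
SCALE = [(0.9, "Almost excellent"), (0.8, "Very Good"), (0.7, "Good"), (0.6, "Mediocre")]


def credibility(auc):
    """Map an AUC value to a label of the credibility scale."""
    for threshold, label in SCALE:
        if auc >= threshold:
            return label
    return "Worthless"


def one_vs_rest_report(class_scores, true_labels):
    """class_scores maps a class name to the per-patient scores for that class."""
    report = {}
    for name, scores in class_scores.items():
        auc = roc_auc(scores, [t == name for t in true_labels])
        report[name] = (auc, credibility(auc))
    return report


# Toy example with four patients; the scores are illustrative only
labels = ["Absence", "Absence", "Medium high", "Very high"]
report = one_vs_rest_report({"Absence": [0.9, 0.6, 0.7, 0.1]}, labels)
```

With real classifier outputs, `class_scores` would hold one score list per risk class (Absence, Medium high, Very high), giving one rating per class as reported above.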