Analysis of the SMOTE, ADASYN, and K-Means SMOTE Methods for Handling Data Imbalance in Diabetes Classification
Abstract
This study examines the effect of the SMOTE, ADASYN, and K-Means SMOTE methods on data imbalance in a diabetes dataset. The effect was measured using two machine learning algorithms, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). Experiments used training-to-test split ratios of 7:3, 8:2, and 9:1, together with the K/N parameter of each method. Without any oversampling method, both SVM and KNN produced recall lower than precision. With oversampling, recall increased by 59% to 75% for each algorithm, and recall for KNN reached 100% with SMOTE and ADASYN. This gain in recall came at a cost: accuracy dropped by up to 17%. Of the three methods, K-Means SMOTE yielded a larger improvement than SMOTE and ADASYN, as shown by the KNN algorithm, which achieved 98% accuracy, 97% precision, and a 98% F1-score.
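The interpolation idea shared by the three oversampling methods can be illustrated with a minimal sketch. It is not the implementation used in the study: the function name `smote_sketch`, the toy points, and the values of `k` and `seed` are assumptions made for illustration only. Each synthetic sample is placed at a random position on the segment between a minority-class point and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; it is not the study's code.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # gap in [0, 1) sets the position along the base-to-neighbour segment.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class samples with two features each (made-up values).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls inside the region those samples already span. ADASYN and K-Means SMOTE differ mainly in how they choose where to generate such points, not in the interpolation step itself.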