Analysis of the SMOTE, ADASYN, and K-Means SMOTE Methods for Handling Data Imbalance in Diabetes Classification
Abstract
This study examines the impact of the SMOTE, ADASYN, and K-Means SMOTE methods on data imbalance in a diabetes dataset. To measure that impact, two machine learning algorithms were used: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The trials split the data into training and test sets at ratios of 7:3, 8:2, and 9:1, and included the K/N parameter of each method. In the trials without any of these methods, both SVM and KNN produced a recall lower than their precision. After the methods were applied, there was an increase in recall of 59% to 75% for each algorithm. Recall for KNN even reached 100% with the SMOTE and ADASYN methods. Although the methods increased recall, they reduced accuracy by up to 17%. Of the three methods, K-Means SMOTE produced a larger improvement than SMOTE and ADASYN. This is shown by the KNN algorithm, which achieved 98% accuracy, 97% precision, and a 98% F1-score.
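For readers unfamiliar with the oversampling methods named in the abstract, the following is a minimal sketch of the interpolation step that basic SMOTE relies on: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the implementation used in the study, which the abstract does not describe. The function name `smote`, its parameters, and the toy data are hypothetical, and the neighbour count `k` is an assumption that may not correspond to the K/N parameter mentioned above. ADASYN and K-Means SMOTE differ mainly in how they choose where to generate samples (by local difficulty and by cluster, respectively) and are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest minority neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two features, 8 majority and 3 minority samples.
majority = [[float(x), float(x % 3)] for x in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because each synthetic point is interpolated between two existing minority samples, it always lies within the range those samples already span. In an experiment like the one described, such resampling would normally be applied to the training split only, so that the test split keeps its original class distribution.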