Implementation of Liver Disease Classification with the Imbalanced Data Handling Technique Synthetic Minority Oversampling Technique – Edited Nearest Neighbor
Publisher
Fakultas Ilmu Komputer
Abstract
Liver disease is a general term for disorders or abnormalities affecting the liver, including fatty liver disease, cirrhosis, hepatitis, liver cancer, and liver tumors. Because its symptoms are non-specific, liver disease is difficult to diagnose at an early stage, yet early diagnosis matters because it enables timely treatment and prevents progression to more severe stages. Machine learning classification methods can support early diagnosis, but a model trained on imbalanced data can become biased toward the majority class. In this research, liver disease classification models are built using the k-Nearest Neighbor (k-NN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) algorithms, with Synthetic Minority Oversampling Technique - Edited Nearest Neighbors (SMOTE-ENN) applied to handle the imbalanced data. SMOTE-ENN was applied with several values of the SMOTE sampling_strategy hyperparameter, and Grid Search was used to tune the hyperparameters of both SMOTE-ENN and the classification algorithms. Based on accuracy, precision, recall, and F1-score, the RF model trained on a 90:10 data split performed best under two SMOTE sampling_strategy settings: the default and {0:475}. Both configurations achieved accuracy, precision, recall, and F1-score of 88%, 93%, 90%, and 92%, respectively. After further evaluation using learning curves, cross-validation, a confusion matrix heatmap, the Receiver Operating Characteristic (ROC) curve, and the Precision-Recall (PR) curve, the RF model with SMOTE sampling_strategy = {0:475} was selected for implementation in the web application.
Description
Approved by Teddy
