Implementation of Liver Disease Classification with the Imbalanced-Data Handling Technique Synthetic Minority Oversampling Technique – Edited Nearest Neighbor

Nurifatul Laily

Files

Skripsi Repository.pdf (1.42 MB)

Date

2026-04-27

Authors

Nurifatul Laily

Publisher

Faculty of Computer Science (Fakultas Ilmu Komputer)

Abstract

Liver disease is a general term for the disorders and abnormalities that affect the liver, including fatty liver disease, cirrhosis, hepatitis, liver cancer, and liver tumors. Because its symptoms are non-specific, liver disease is difficult to diagnose at an early stage. Early diagnosis is nevertheless important, because it enables timely treatment that prevents the disease from progressing to more severe stages. Machine learning classification methods can support the early diagnosis of liver disease; however, when the training data are imbalanced, the classification model can become biased toward the majority class. In this research, liver disease classification models are built with the k-Nearest Neighbor (k-NN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) algorithms, using Synthetic Minority Oversampling Technique - Edited Nearest Neighbors (SMOTE-ENN) to handle the imbalanced data. SMOTE-ENN was applied with several values of the SMOTE sampling_strategy hyperparameter, and Grid Search was used to tune the hyperparameters of both SMOTE-ENN and the classification algorithms. In terms of accuracy, precision, recall, and F1-score, the best performance was achieved by the RF model using a 90:10 data split with sampling_strategy (SMOTE) set to either the default value or {0:475}. Both configurations achieved an accuracy of 88%, a precision of 93%, a recall of 90%, and an F1-score of 92%. After further evaluation with learning curves, cross-validation, confusion matrix heatmaps, Receiver Operating Characteristic (ROC) curves, and Precision-Recall (PR) curves, the RF model with sampling_strategy (SMOTE) = {0:475} was selected for implementation in the web application.
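To make the resampling step described in the abstract concrete, here is a minimal from-scratch sketch of the SMOTE-ENN idea in plain Python (standard library only). It is an illustration, not the author's code and not imbalanced-learn's implementation. The function names `smote`, `enn`, and `smote_enn` are hypothetical, and the following are assumptions: Euclidean distance, k = 5 neighbours for SMOTE and k = 3 for ENN (the defaults in imbalanced-learn), and a simple majority-vote ENN rule (library implementations may use a different selection rule). The dict form of `sampling_strategy` follows imbalanced-learn's SMOTE convention, where `{0: 475}` means class 0 should hold 475 samples after oversampling, and `"auto"` raises every smaller class to the majority count.

```python
import math
import random
from collections import Counter


def _distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def _nearest(X, i, candidates, k):
    """Indices of the k candidate rows closest to X[i]."""
    return sorted(candidates, key=lambda j: _distance(X[i], X[j]))[:k]


def smote(X, y, sampling_strategy="auto", k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper).

    sampling_strategy is "auto" (raise every smaller class to the majority
    count) or a dict {label: desired_count_after_oversampling}.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    if sampling_strategy == "auto":
        majority = max(counts.values())
        sampling_strategy = {lab: majority for lab, c in counts.items() if c < majority}
    X_out, y_out = [list(x) for x in X], list(y)
    for label, target in sampling_strategy.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        n_new = target - counts[label]
        if n_new < 0:
            raise ValueError(f"target for class {label} is below its current count")
        if n_new > 0 and len(idx) < 2:
            raise ValueError(f"class {label} needs at least two samples to interpolate")
        for _ in range(n_new):
            i = rng.choice(idx)
            # Pick one of the k nearest same-class neighbours of X[i].
            j = rng.choice(_nearest(X, i, [m for m in idx if m != i], k))
            gap = rng.random()  # uniform in [0, 1)
            # The synthetic point lies on the segment between X[i] and X[j].
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbours, majority-vote variant: drop every sample
    whose label disagrees with the majority label of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i in range(len(X)):
        neighbours = _nearest(X, i, [j for j in range(len(X)) if j != i], k)
        vote = Counter(y[j] for j in neighbours).most_common(1)[0][0]
        if vote == y[i]:
            keep_X.append(X[i])
            keep_y.append(y[i])
    return keep_X, keep_y


def smote_enn(X, y, sampling_strategy="auto", k_smote=5, k_enn=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning of the combined set."""
    X_res, y_res = smote(X, y, sampling_strategy, k=k_smote, seed=seed)
    return enn(X_res, y_res, k=k_enn)


# Toy usage: 10 majority samples (class 1) and 4 minority samples (class 0).
X_demo = [[float(i), 0.0] for i in range(10)] + [[float(i), 10.0] for i in range(4)]
y_demo = [1] * 10 + [0] * 4
_, y_bal = smote_enn(X_demo, y_demo, sampling_strategy={0: 10})
print(sorted(Counter(y_bal).items()))  # → [(0, 10), (1, 10)]
```

The toy classes are well separated, so ENN removes nothing here; with overlapping classes it would discard samples near the class boundary, which is why SMOTE-ENN can end up with fewer rows than the `sampling_strategy` target. Resampling of this kind is normally applied only to the training portion of a split so that the test set keeps its original class distribution.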

Description

Approved by Teddy

Keywords

Liver Disease, SMOTE-ENN, k-Nearest Neighbor, Random Forest, Extreme Gradient Boosting

URI

https://repository.unej.ac.id/handle/123456789/9932

Collections

UT-Faculty of Computer Science
