Liver Disease Classification Using the XGBoost, Neural Network, and Decision Tree Algorithms with Least Absolute Shrinkage and Selection Operator Feature Selection

KURNIASARI, Ila Rahayu

Date

2024-07-25

Author

KURNIASARI, Ila Rahayu

Abstract

Liver disease is a health problem that significantly impairs liver function and quality of life. Common causes include infection, injury, certain medications, exposure to harmful substances, and genetic factors. This study classifies liver disease using three machine learning algorithms and compares them: XGBoost, Neural Network (NN), and Decision Tree (DT). Three datasets are used: the Indian Liver Patient Dataset (ILPD) from the UCI Machine Learning Repository, the BUPA Medical Research Liver Disorder dataset, and the Liver Patient Dataset from Kaggle. The research begins with data pre-processing, which cleans and normalizes the data to prepare it for further analysis. Two classification scenarios are used. In the first scenario, the pre-processed dataset undergoes feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO) to identify the attributes most correlated with liver disease. In the second scenario, the pre-processed dataset goes directly to classification. The processed dataset is then used to train and test the XGBoost, NN (Multilayer Perceptron), and DT models. With LASSO feature selection, the Kaggle dataset gives the best results, with XGBoost achieving an accuracy of 1.0, followed by the ILPD dataset using XGBoost with an accuracy of 0.70 and the Liver Disorder dataset using NN with an accuracy of 0.68. Without feature selection, XGBoost on the Kaggle dataset again reaches an accuracy of 1.0, followed by NN on the Kaggle dataset with an accuracy of 0.82; the lowest result among the three datasets is the Decision Tree on the ILPD dataset, with an accuracy of 0.66.
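
The two scenarios described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the author's implementation. Several choices are assumptions made for illustration: the data is synthetic (`make_classification`) and not one of the three liver datasets, min-max scaling stands in for the unspecified normalization, the LASSO penalty `alpha`, the MLP layer size, and the 80/20 split are arbitrary, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example needs no extra package. The function names `build_models` and `run_scenarios` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return fresh, unfitted classifiers (hyperparameters are assumptions)."""
    return {
        # Stand-in for XGBoost: a gradient-boosted tree ensemble from scikit-learn.
        "boosted_trees": GradientBoostingClassifier(random_state=random_state),
        "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                             random_state=random_state),
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
    }


def run_scenarios(X, y, alpha=0.005, random_state=0):
    """Fit each model with and without LASSO selection; return test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    results = {}
    for scenario in ("lasso", "no_selection"):
        for name, model in build_models(random_state).items():
            steps = [MinMaxScaler()]  # normalization step (assumed min-max)
            if scenario == "lasso":
                # Keep only features whose LASSO coefficient is non-zero.
                steps.append(SelectFromModel(Lasso(alpha=alpha)))
            steps.append(model)
            pipeline = make_pipeline(*steps)
            pipeline.fit(X_train, y_train)
            predictions = pipeline.predict(X_test)
            results[(scenario, name)] = accuracy_score(y_test, predictions)
    return results


if __name__ == "__main__":
    # Synthetic binary-classification data, used only to make the sketch runnable.
    X_demo, y_demo = make_classification(
        n_samples=300, n_features=10, n_informative=4, random_state=0
    )
    for (scenario, name), acc in run_scenarios(X_demo, y_demo).items():
        print(f"{scenario:>12}  {name:<14} accuracy={acc:.2f}")
```

In this sketch the scaler and the LASSO selector are fitted on the training split only, inside a pipeline, so the test split does not influence which features are kept; whether the study did the same is not stated in the abstract. If the `xgboost` package is installed, `xgboost.XGBClassifier` could replace the boosted-tree stand-in.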

URI

https://repository.unej.ac.id/xmlui/handle/123456789/124006

Collections

UT-Faculty of Computer Science [1056]