Liver Disease Classification Using the XGBoost, Neural Network, and Decision Tree Algorithms with Least Absolute Shrinkage and Selection Operator Feature Selection
Abstract
Liver disease is a health problem that significantly affects liver function and
quality of life. Common causes include infection, injury, certain medications,
exposure to harmful substances, and genetic factors. This study aims to classify
liver diseases using three machine learning algorithms for comparison: XGBoost,
Neural Network (NN), and Decision Tree (DT). Three datasets are used: the Indian
Liver Patient Dataset (ILPD) from the UCI Machine Learning Repository, the BUPA
Medical Research Liver Disorder dataset, and the Liver Patient Dataset from
Kaggle. The study begins with data pre-processing, which cleans and normalizes
the data for further analysis. Two scenarios are used for classification. In the first scenario,
the pre-processed dataset undergoes feature selection using the Least Absolute
Shrinkage and Selection Operator (LASSO) to identify the attributes most
correlated with liver disease. In the second scenario, the pre-processed dataset goes
directly to classification. The processed dataset is then used to train and test the
XGBoost, NN (Multilayer Perceptron), and DT models. The results show that
prediction accuracy varies with both the dataset and the algorithm. With LASSO
feature selection, the Kaggle dataset gives the best result, with XGBoost
achieving an accuracy of 1.0, followed by the ILPD dataset using XGBoost with an
accuracy of 0.70 and the Liver Disorder dataset using NN with an accuracy of
0.68. Without feature selection, XGBoost on the Kaggle dataset again reaches an
accuracy of 1.0, followed by NN on the Kaggle dataset with an accuracy of 0.82;
the lowest result among the three datasets is DT on the ILPD dataset, with an
accuracy of 0.66.
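
The LASSO feature selection step in the first scenario can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python coordinate descent on synthetic data, and the function names, the penalty strength `alpha=0.3`, and the five-column toy dataset are all assumptions made for illustration. It shows only how the L1 penalty drives the coefficients of uninformative attributes to exactly zero, so that the remaining attributes can be passed on to a classifier.

```python
import random


def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        scaled.append([(v - mean) / std for v in col])
    return scaled


def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(columns, target, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 for standardized columns."""
    n = len(target)
    y_mean = sum(target) / n
    residual = [t - y_mean for t in target]  # all weights start at zero
    weights = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual, with the feature's
            # own current contribution added back in.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha)  # unit-variance column: no rescaling
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for x, r in zip(col, residual)]
                weights[j] = new_w
    return weights


# Synthetic stand-in for a pre-processed dataset (hypothetical, not ILPD/BUPA/Kaggle):
# five attributes, of which only columns 0 and 2 influence the target.
rng = random.Random(0)
n_samples, n_features = 200, 5
raw = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_features)]
X = standardize(raw)
y = [2.0 * X[0][i] - 1.5 * X[2][i] + rng.gauss(0.0, 0.1) for i in range(n_samples)]

weights = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("weights:", [round(w, 3) for w in weights])
print("selected feature indices:", selected)
```

In a full pipeline, the columns listed in `selected` would be the only ones passed to the XGBoost, NN, and DT models in the first scenario, while the second scenario would use all columns. A larger `alpha` removes more attributes; the value here is arbitrary and would normally be tuned, for example by cross-validation.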