Classification of Nutritional Deficiency Anemia Using a Majority Voting Ensemble
Abstract
Nutritional deficiency anemia, caused by inadequate absorption of the substances essential for red blood cell formation, affects about 1.5 billion people globally, with varying prevalence rates. Diagnosis typically relies on the Complete Blood Count (CBC) test, which alone cannot identify the type of anemia, and the advanced tests that can are often unavailable in smaller healthcare centers because of their high cost. Cost-effective diagnostic methods are therefore needed. This study develops a decision support system that uses machine learning to classify nutritional deficiency anemia types from CBC results. A majority voting ensemble combines decision tree, random forest, and logistic regression classifiers, each optimized by hyperparameter tuning with GridSearchCV. Two voting methods, hard voting and soft voting, were applied. The dataset contains blood test results from 15,300 patients collected from 2013 to 2018 and provided by Gaziosmanpaşa University, Tokat, Turkey. The results show that data preprocessing and hyperparameter tuning significantly improve the performance of the algorithms; baseline models without these optimizations performed worse. The hard voting ensemble achieved 99.94% on all metrics, while the soft voting ensemble achieved 99.95%. Combining the optimized models in an ensemble improves both classification performance and consistency. The study concludes that the soft voting ensemble significantly enhances the accuracy of nutritional deficiency anemia diagnosis from CBC results, offering an effective and efficient alternative to advanced tests, especially in resource-limited settings.
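The difference between the two voting methods named above can be illustrated with a minimal sketch. It is not the study's implementation: the class labels, the probability values, and the helper names `hard_vote` and `soft_vote` are hypothetical, chosen only to show a case where the two rules disagree. In hard voting each base model casts one vote for its most probable class; in soft voting the predicted class probabilities are averaged across models before the class is chosen.

```python
from collections import Counter

# Hypothetical class labels, for illustration only (not taken from the dataset).
CLASSES = ["class_a", "class_b", "class_c"]


def argmax(values):
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(probas, classes):
    """Each model votes for its most probable class; the most common vote wins.

    On a tie, Counter.most_common keeps the class that was voted for first.
    """
    votes = [classes[argmax(p)] for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas, classes):
    """Average class probabilities across models; the highest mean wins."""
    n_models = len(probas)
    mean = [sum(p[i] for p in probas) / n_models for i in range(len(classes))]
    return classes[argmax(mean)]


# Assumed per-model class probabilities for a single patient sample.
sample_probas = [
    [0.55, 0.45, 0.00],  # decision tree: weakly prefers class_a
    [0.52, 0.48, 0.00],  # random forest: weakly prefers class_a
    [0.05, 0.95, 0.00],  # logistic regression: strongly prefers class_b
]

hard_result = hard_vote(sample_probas, CLASSES)
soft_result = soft_vote(sample_probas, CLASSES)
print("hard voting:", hard_result)
print("soft voting:", soft_result)
```

In this constructed case two models prefer one class by a narrow margin while the third prefers another with high confidence, so hard voting follows the majority and soft voting follows the averaged probabilities. Soft voting therefore requires base models that output class probabilities, which decision trees, random forests, and logistic regression all can.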