Improving Code Smell Recognition Performance in the Python Programming Language Using SMOTEENN and Decision Tree Pruning
Abstract
This research improves code smell recognition for two common smell types,
Large Class and Long Method, using machine learning techniques. The study
first outlines what code smells are, how they degrade software quality, and
why efficient detection methods matter. It then builds two datasets, one for
Large Class and one for Long Method, through a data collection process that
represents each sample with numerical features derived from program
metrics. The preprocessing stage involves normalization and feature
selection, refining the datasets for subsequent analysis. The SMOTEENN
technique addresses class imbalance, yielding a more balanced class
distribution and a stronger correlation between features and target
variables. Model development combines hyperparameter optimization using
RandomizedSearchCV with cross-validation, producing tuned decision tree,
random forest, and boosting models. The resulting models outperform previous
studies, with accuracy and Matthews Correlation Coefficient (MCC) reaching up
to 99.69% and 99.38%, respectively, for Long Method detection using XGBoost
with SMOTEENN. Overall, the findings underscore the efficacy of ensemble
methods and data augmentation techniques in improving code smell recognition
models.
The research concludes that these results are a substantial improvement over
prior studies and suggests that future work cover additional code smell types
to broaden dataset diversity and improve the robustness of the models.
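
The abstract reports both accuracy and the Matthews Correlation Coefficient.
As a minimal sketch of why the two are reported together on imbalanced data,
the following Python snippet computes both from confusion-matrix counts. The
function names and all counts are hypothetical and chosen only for
illustration; they are not taken from the study's datasets or results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention),
    e.g. when the classifier predicts only one class.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced test set: 10 smelly samples, 990 clean samples.
# Classifier A always predicts "clean" and never flags a smell.
always_clean = dict(tp=0, tn=990, fp=0, fn=10)
# Classifier B finds 8 of the 10 smelly samples with 5 false alarms.
useful = dict(tp=8, tn=985, fp=5, fn=2)

for name, counts in [("always_clean", always_clean), ("useful", useful)]:
    print(name, round(accuracy(**counts), 4), round(mcc(**counts), 4))
```

In this illustration both classifiers reach an accuracy of at least 0.99, but
only the second has an MCC above zero, which is why MCC is a useful companion
metric when one class is rare.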