Performance Analysis of Combining EfficientNet-B0 and BiLSTM for Human Action Recognition in Video Data
Publisher
Fakultas Ilmu Komputer
Abstract
Human Action Recognition (HAR) based on video data is a significant research
area in computer vision, particularly for surveillance, human–computer interaction,
and activity monitoring applications. This study proposes a video-based human
action recognition model that integrates EfficientNet-B0 as a spatial feature
extractor and Bidirectional Long Short-Term Memory (BiLSTM) for temporal
modeling, evaluated on the J-HMDB dataset. Each video is represented as a
fixed-length sequence of 32 frames obtained through uniform temporal sampling. Frame
preprocessing includes resizing to 160×160 pixels and normalization using
ImageNet statistics to ensure compatibility with the pretrained EfficientNet-B0
model. Several modeling scenarios are evaluated, including the baseline
EfficientNet-B0 + BiLSTM, the addition of self-attention, the application of center
loss, and their combination. Model performance is assessed using accuracy,
precision, recall, F1-score, and confusion matrix analysis. The baseline model
achieves 95% on all evaluation metrics. Adding self-attention raises every
metric to 96%, suggesting more effective weighting of temporal features. In
contrast, center loss alone reduces performance to 93% accuracy, 92% recall,
and 91% F1-score, indicating that it does not improve results overall. Combining
self-attention with center loss restores performance to 96% but does not
surpass the self-attention-only configuration.
Overall, the results indicate that EfficientNet-B0 combined with BiLSTM and
self-attention is the best-performing configuration, achieving stable and competitive
performance on the J-HMDB dataset while maintaining balanced spatial–temporal
modeling.
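The frame sampling and normalization steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact sampling formula, so the evenly spaced integer indexing below is an assumption, as are the function names `uniform_frame_indices` and `normalize_pixel`. Scaling pixel values to [0, 1] before normalization is likewise assumed. The mean and standard deviation are the per-channel ImageNet statistics commonly used with pretrained ImageNet models.

```python
# Per-channel ImageNet statistics (RGB order), applied to values in [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def uniform_frame_indices(num_frames, num_samples=32):
    """Return num_samples frame indices spread evenly over the video.

    Assumed scheme: the first and last frames are always included, and
    clips shorter than num_samples repeat some frames.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    # Integer arithmetic avoids floating-point rounding at the end points.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize per channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


if __name__ == "__main__":
    print(uniform_frame_indices(40))
    print(normalize_pixel((128, 128, 128)))
```

In a full pipeline the same normalization would be applied to every pixel of each frame after resizing to 160×160, typically with an array library rather than per-pixel Python code.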
Description
:: Repository File Finalization 25 May 2026_Kurnadi
