Peningkatan Performa Clustering pada Large Text Dataset Menggunakan Stamantic Spherical K-Means

ADRIENUFA, Kharisma Trinanda

Please use this identifier to cite or link to this item: https://repository.unej.ac.id/xmlui/handle/123456789/123395

Full metadata record

DC Field	Value	Language
dc.contributor.author	ADRIENUFA, Kharisma Trinanda	-
dc.date.accessioned	2024-08-12T02:31:58Z	-
dc.date.available	2024-08-12T02:31:58Z	-
dc.date.issued	2023-07-26	-
dc.identifier.nim	192410103019	en_US
dc.identifier.uri	https://repository.unej.ac.id/xmlui/handle/123456789/123395	-
dc.description.abstract	Document clustering cannot avoid the problem of high dimensionality, which can be mitigated by combining the advantages of statistical and semantic features. This study compares the clustering performance of the Stamantic (statistical and semantic) feature extraction technique against several Bag Of Words (BOW) models (Bag Of All Word, Bag Of Noun, Bag Of Noun and Adjective), and also compares the Spherical K-Means and K-Means++ clustering algorithms. Stamantic feature extraction uses the WordNet (Wn) database to form semantic features, while statistical features are obtained from the TF-IDF (Term Frequency-Inverse Document Frequency) of word senses. The clustering results were evaluated with several metrics. The highest Silhouette score is 0.162213, on the BONA feature from the Pubmed dataset clustered with the K-Means++ algorithm. The highest Purity score is about 0.949643, on the BONA feature from the Scopus dataset with the Spherical K-Means algorithm. The highest AMI (Adjusted Mutual Information) score is 0.880835, on the BONA feature from the Scopus dataset with the Spherical K-Means algorithm. The test results show that the Stamantic feature is outperformed by all BOW features, owing to the loss of information caused by the use of the Wn library and an inappropriate disambiguation process.	en_US
dc.description.sponsorship	Mr. Achmad Maududie, ST., M.Sc.; Mr. Tio Dharmawan, S.Kom., M.Kom	en_US
dc.language.iso	other	en_US
dc.publisher	Fakultas Ilmu Komputer	en_US
dc.subject	Clustering	en_US
dc.subject	Stamantic Spherical K-Means	en_US
dc.title	Peningkatan Performa Clustering pada Large Text Dataset Menggunakan Stamantic Spherical K-Means	en_US
dc.title.alternative	Clustering Enhancement Of Large Text Dataset Using Stamantic Spherical K-Means	en_US
dc.type	Skripsi	en_US
dc.identifier.prodi	Informatika	en_US
dc.identifier.pembimbing1	Achmad Maududie ST, M.Sc.	en_US
dc.identifier.pembimbing2	Tio Dharmawan, S.Kom., M.Kom	en_US
dc.identifier.validator	validasi_repo_ratna_juni_2024	en_US
dc.identifier.finalization	0a67b73d_2024_07_tanggal 10	en_US
Appears in Collections:	UT-Faculty of Computer Science

Files in This Item:

File	Description	Size	Format
skripsi repo.pdf Until 2028-01-27		1.02 MB	Adobe PDF
