Please use this identifier to cite or link to this item: https://repository.unej.ac.id/xmlui/handle/123456789/112593
Title: Forming Dataset of The Undergraduate Thesis using Simple Clustering Methods
Authors: DHARMAWAN, Tio
CANDRAMAYA, Chinta ’Aliyyah
WIDHARTA, Vandha Pradwiyasma
Keywords: Document Clustering
Text Mining
Relevant Term
Information Retrieval
Topic Identification
Issue Date: 1-Jan-2023
Publisher: INTERNATIONAL JOURNAL OF INNOVATION IN ENTERPRISE SYSTEM
Abstract: Each university collects a large amount of undergraduate thesis data but has yet to process it in a way that makes it easier for students to find the references they need. This study aims to group thesis documents and to compare the grouping produced by experts with the grouping produced by simple clustering methods. Experts established the ground truth using OR Boolean Retrieval and keyword generation. The best cluster was identified through experiments with the K-Means, K-Medoids, and DBSCAN clustering methods and the Euclidean, Manhattan, City Block, and Cosine Similarity metrics. The cluster with the best Silhouette Score was then compared against the ground truth to measure the accuracy of each document's categorization. The K-Means clustering method with the Cosine Similarity metric gave the best result, with a Silhouette Score of 0.105534. The comparison between the ground truth and the best cluster result shows an accuracy of 33.42%. The results indicate that simple clustering methods cannot handle data with Negative Skewness and Leptokurtic Kurtosis.
URI: https://repository.unej.ac.id/xmlui/handle/123456789/112593
Appears in Collections: LSP-Jurnal Ilmiah Dosen

Files in This Item:
File: FASILKOM_Forming Dataset of The Undergraduate Thesis using Simple.pdf
Size: 1.17 MB
Format: Adobe PDF

