Comparative Analysis of the TF-IDF and Word2Vec Methods with the Use of N-grams for Scientific Article Search in a Digital Repository
Abstract
A paper search system helps researchers quickly find the articles they need among a number of articles that grows every year. This research compares TF-IDF and Word2Vec as text representation methods and evaluates which performs better in a search system for scientific articles, using data scraped from journals managed by BRIN. An N-gram method is also used to unify the language of the article data by detecting each article's language and translating it into English. In the experiments, TF-IDF achieves higher precision than Word2Vec. The highest precision for TF-IDF is 96.27%, obtained from the 9:1 ratio model with max_features set to 25000 and Top-N set to 5. The highest precision for Word2Vec is 91.60%, obtained from the 9:1 ratio model with vector_size set to 100, window set to 5, and Top-N set to 5.
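To make the TF-IDF side of the comparison concrete, the following is a minimal sketch of Top-N retrieval scored by precision. It is not the authors' implementation. It assumes whitespace tokenization, a smoothed IDF, cosine similarity for ranking, and precision defined as the fraction of the Top-N results that are relevant. The function names (build_tfidf, search, precision_at_n) and the toy documents are hypothetical. The max_features argument mirrors the parameter named in the abstract by capping the vocabulary at the most frequent terms.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def vectorize(tokens, idf):
    # TF-IDF weights for known terms, L2-normalized so a dot product equals cosine similarity.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_tfidf(docs, max_features=None):
    tokenized = [tokenize(d) for d in docs]
    # Keep only the max_features most frequent terms (all terms when None).
    corpus_freq = Counter(t for toks in tokenized for t in toks)
    vocab = {t for t, _ in corpus_freq.most_common(max_features)}
    # Document frequency: number of documents containing each vocabulary term.
    df = Counter(t for toks in tokenized for t in set(toks) if t in vocab)
    n = len(docs)
    # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return idf, [vectorize(toks, idf) for toks in tokenized]


def search(query, docs, top_n=5, max_features=None):
    # Rank documents by cosine similarity to the query and return the Top-N indices.
    idf, doc_vecs = build_tfidf(docs, max_features)
    q = vectorize(tokenize(query), idf)
    scores = [
        (sum(w * v.get(t, 0.0) for t, w in q.items()), i)
        for i, v in enumerate(doc_vecs)
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))  # ties broken by document index
    return [i for _, i in scores[:top_n]]


def precision_at_n(retrieved, relevant):
    # Fraction of retrieved documents that are in the relevant set.
    if not retrieved:
        return 0.0
    return sum(1 for i in retrieved if i in relevant) / len(retrieved)


# Hypothetical toy corpus, for illustration only.
DOCS = [
    "deep learning for image classification",
    "image segmentation with neural networks",
    "soil nutrient analysis in rice fields",
    "rice yield prediction",
]

if __name__ == "__main__":
    hits = search("rice fields soil", DOCS, top_n=2)
    print(hits, precision_at_n(hits, relevant={2, 3}))
```

A Word2Vec-based variant would replace the sparse TF-IDF vectors with dense embeddings while leaving the ranking and the precision computation unchanged, which is what makes the two representations directly comparable at the same Top-N.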