Comparison of Naive Bayes, K-Nearest Neighbor, and Support Vector Machine Classification Methods in Semi-Supervised Learning for Sentiment Analysis of Kereta Cepat Jakarta Bandung (KCJB)

Authors

  • Muhammad Farhan, Politeknik Statistika STIS
  • Renata De La Rosa Manik, Politeknik Statistika STIS
  • Hana Raihanatul Jannah, Politeknik Statistika STIS
  • Lya Hulliyyatus Suadaa, Politeknik Statistika STIS

DOI:

https://doi.org/10.34123/icdsos.v2023i1.332

Keywords:

supervised learning, Naive Bayes, K-NN, SVM, Sentiment Analysis, KCJB, Semi-Supervised Learning

Abstract

Transportation technology has developed rapidly in the 21st century, and high-speed rail is one example. The Indonesian government is currently building the Kereta Cepat Jakarta-Bandung (KCJB) high-speed railway in collaboration with China. The project has attracted a wide range of comments and opinions from the public on Twitter and other social media. This research compares the Naïve Bayes, K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) classification methods for classifying sentiment in tweets about the high-speed train, obtained by scraping Twitter. The comparison was carried out using semi-supervised learning. The semi-supervised SVM model performed best, with an average accuracy of 86%, followed by the semi-supervised Naïve Bayes and semi-supervised K-NN models, with average accuracies of 81% and 58%, respectively. Overall, the predictions of all three models indicate that tweets with negative sentiment outnumber those with positive or neutral sentiment.
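
The abstract does not say which semi-supervised scheme was used. As a rough illustration only, the sketch below assumes self-training (pseudo-labeling): a classifier is fitted on a small labeled set, its confident predictions on unlabeled tweets are added as labels, and the model is refitted. The base classifier here is a small hand-written multinomial Naive Bayes; the function names, the confidence threshold, and the toy English token lists are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {label: posterior probability} for one tokenized document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing over the vocabulary.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Convert log scores to normalized probabilities.
    top = max(log_scores.values())
    exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {lab: v / norm for lab, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Self-training loop (an assumed scheme, not the paper's stated one)."""
    docs = [tokens for tokens, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    model = train_nb(docs, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for tokens in pool:
            probs = predict_proba(model, tokens)
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                # Accept the confident prediction as a pseudo-label.
                docs.append(tokens)
                labels.append(label)
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:
            break  # nothing confident enough; stop early
        model = train_nb(docs, labels)
    return model, pool


# Hypothetical toy data standing in for preprocessed tweets.
labeled = [
    ("fast comfortable train great".split(), "positive"),
    ("project expensive debt bad".split(), "negative"),
    ("train opens in october".split(), "neutral"),
]
unlabeled = [
    "great comfortable ride".split(),
    "debt expensive waste".split(),
]

model, leftover = self_train(labeled, unlabeled, threshold=0.6)
probs = predict_proba(model, ["great"])
print(max(probs, key=probs.get), len(leftover))
```

The same loop could wrap a K-NN or SVM classifier in place of the Naive Bayes functions, provided the classifier exposes some confidence score for thresholding. How the paper's models were actually configured is not stated in the abstract.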

Published

2023-12-29

How to Cite

Muhammad Farhan, Renata De La Rosa Manik, Hana Raihanatul Jannah, & Lya Hulliyyatus Suadaa. (2023). Comparison of Naive Bayes, K-Nearest Neighbor, and Support Vector Machine Classification Methods in Semi-Supervised Learning for Sentiment Analysis of Kereta Cepat Jakarta Bandung (KCJB). Proceedings of The International Conference on Data Science and Official Statistics, 2023(1), 109–120. https://doi.org/10.34123/icdsos.v2023i1.332