A Sentiment Analysis and Topic Modelling of The Socio-Economic Registration 2022


  • Indah Simbolon, Politeknik Statistika STIS
  • Nicholas H Manurung, Politeknik Statistika STIS
  • Sukma Andini, Politeknik Statistika STIS
  • Lya Hulliyyatus Suadaa, Politeknik Statistika STIS




Keywords: Regsosek, Sentiment Analysis, Topic Modeling


Socio-Economic Registration (Regsosek) is an activity of Statistics Indonesia (BPS) that collects data on the profile, social and economic conditions, and welfare levels of all residents in 514 regencies/cities in Indonesia. One indicator of the success of Regsosek 2022 is the community's response to and opinion of the activity. These responses and opinions provide an overview of how Regsosek 2022 was implemented, which can serve as lessons learned for subsequent population data collection. This study analyzes community responses and opinions on Regsosek posted on Twitter. Sentiment is classified with four techniques (Naïve Bayes, Nearest Centroid, K-Nearest Neighbors, and Support Vector Machine), and the performance of the four is compared. Topic modeling is carried out with two techniques, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Data were collected by web scraping. In the sentiment classification, Nearest Centroid gives the best results, with relatively high and balanced F1-scores of 59% for positive and 66% for negative sentiment. For topic modeling, LDA gives better results than LSA.
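The abstract does not describe the implementation, so the following is only a minimal sketch of how a Nearest Centroid sentiment classifier and a per-class F1-score could be computed. It assumes bag-of-words counts and Euclidean distance, which may differ from the features and settings used in the study. The example sentences, labels, and all function names are invented for illustration and are not taken from the Regsosek data.

```python
from collections import Counter
import math


def vectorize(text, vocab):
    """Turn a text into a list of word counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def fit_centroids(vectors, labels):
    """Compute the mean vector (centroid) of each class."""
    sums = {}
    sizes = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def f1_per_class(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data (hypothetical, not from the study).
train_texts = [
    "officers were friendly and helpful",
    "the registration went smoothly and quickly",
    "data collection was confusing and slow",
    "worried about data privacy and misuse",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = ["officers were helpful", "confusing and slow process"]
test_labels = ["positive", "negative"]

# Vocabulary is built from the training texts only; unseen words are ignored.
vocab = sorted({word for text in train_texts for word in text.lower().split()})
centroids = fit_centroids([vectorize(t, vocab) for t in train_texts], train_labels)
predictions = [predict(centroids, vectorize(t, vocab)) for t in test_texts]

for label in ("positive", "negative"):
    print(label, round(f1_per_class(test_labels, predictions, label), 2))
```

Reporting F1 separately for each class, as the abstract does, shows whether a classifier performs comparably on positive and negative sentiment instead of favoring one of them.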




How to Cite

Simbolon, I., Manurung, N. H., Andini, S., & Suadaa, L. H. (2023). A Sentiment Analysis and Topic Modelling of The Socio-Economic Registration 2022. Proceedings of The International Conference on Data Science and Official Statistics, 2023(1), 73–83. https://doi.org/10.34123/icdsos.v2023i1.301