Business Description Categorization to the Five-Digit Indonesian Standard Classification of Business Field (KBLI) Using Machine Learning and Transfer Learning

Authors

  • Muh. Alfian Amnur (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • La Ode Muhammad Gazali (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Amir Mumtaz Siregar (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Faruq Ariya Jalaksana (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Made Nisa Rahayu Ananda Suwendra (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Nurul Fadila Utami (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Alif Median Ramadhan (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Elisse Krisela Fabrianne (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Eurorea Wirata Raja Panjaitan (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Fitri Aini Izzati (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Jernita Bintang Yuliani Manalu (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Muhammad Gilang Hidayat (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Lya Hulliyyatus Suadaa (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Budi Yuniarto (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)
  • Setia Pramana (STIS Polytechnic of Statistics, Indonesia, Jakarta, Indonesia)

DOI:

https://doi.org/10.34123/icdsos.v2025i1.719

Keywords:

IndoBERT, KBLI, machine learning, transfer learning, text classification

Abstract

The Indonesian Standard Classification of Business Fields (KBLI) is essential for economic statistics, yet manual classification of business descriptions to five-digit KBLI codes is time-consuming and prone to inconsistencies. This study develops and compares machine learning models (Support Vector Machine and Random Forest) and a transfer learning model (IndoBERT) for automating KBLI classification, supported by the preparation of synthetic and real-world datasets for model training. The synthetic data were generated using large language models and validated through human majority voting, then complemented with real-world data from the National Labor Force Survey (Sakernas) and the Micro and Small Industry Survey (IMK). The findings indicate that fine-tuned IndoBERT performed best, with an F1-score of 92.99% and an accuracy of 93.40% on synthetic data, and top-1, top-5, and top-10 accuracies of 32.93%, 54.71%, and 63.24% on real-world data. The deployment of fine-tuned IndoBERT as a RESTful API demonstrates its scalability and efficiency, presenting a reliable solution for large-scale KBLI classification in official statistics.
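
For readers unfamiliar with the top-k accuracies reported above, the following minimal sketch shows how such a metric is commonly computed: a prediction counts as correct when the true code is among the k highest-scoring candidate codes. The function name, the score matrix, and the five-digit codes are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the authors' actual evaluation code may differ.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[str],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of samples whose true label is among
    the k labels with the highest scores for that sample."""
    if not true_labels or len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must be non-empty and equal in length")
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Indices of the k highest-scoring labels for this sample.
        ranked = sorted(range(len(labels)), key=lambda i: row[i], reverse=True)[:k]
        if truth in {labels[i] for i in ranked}:
            hits += 1
    return hits / len(true_labels)


# Made-up placeholder codes and scores (assumptions, not real data).
labels = ["00001", "00002", "00003"]
scores = [
    [0.7, 0.2, 0.1],  # best guess: 00001
    [0.1, 0.3, 0.6],  # best guess: 00003, second: 00002
    [0.2, 0.5, 0.3],  # best guess: 00002
]
true_labels = ["00001", "00002", "00002"]

top1 = top_k_accuracy(scores, labels, true_labels, k=1)
top2 = top_k_accuracy(scores, labels, true_labels, k=2)
```

In this toy example the second sample is missed at k=1 but recovered at k=2, which is the same effect that separates the reported top-1 figure from the top-5 and top-10 figures.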

Published

2025-12-22

How to Cite

Amnur, M. A., Muhammad Gazali, L. O., Mumtaz Siregar, A., Ariya Jalaksana, F., Nisa Rahayu Ananda Suwendra, M., Fadila Utami, N., Median Ramadhan, A., Krisela Fabrianne, E., Wirata Raja Panjaitan, E., Aini Izzati, F., Bintang Yuliani Manalu, J., Gilang Hidayat, M., Hulliyyatus Suadaa, L., Yuniarto, B., & Pramana, S. (2025). Business Description Categorization to the Five-Digit Indonesian Standard Classification of Business Field (KBLI) Using Machine Learning and Transfer Learning. Proceedings of The International Conference on Data Science and Official Statistics, 2025(1), 558–575. https://doi.org/10.34123/icdsos.v2025i1.719