Job Competency Extraction in Information and Technology Sector Using K-Means and Non-Negative Matrix Factorization (NMF) Algorithms

Authors

  • Alfitra Rifa Geandra, Politeknik Statistika STIS, Jakarta, Indonesia
  • Amir Mumtaz Siregar, Politeknik Statistika STIS, Jakarta, Indonesia
  • Rani Nooraeni, Politeknik Statistika STIS, Jakarta, Indonesia

DOI:

https://doi.org/10.34123/icdsos.v2025i1.684

Keywords:

competency extraction, digital labor market, K-Means, NMF, text mining

Abstract

The advancement of information technology has led to a surge in online job vacancy data, which contains valuable information about skill demands in the digital labor market. This study aims to extract job competencies in the information and technology sector using a combination of K-Means clustering and Non-Negative Matrix Factorization (NMF). A total of 350 job postings were collected from the Kalibrr platform through web scraping, then cleaned through text preprocessing and represented as TF-IDF features. The clustering results indicate that the optimal configuration consists of 10 clusters, as evaluated using the Silhouette Score and the Davies-Bouldin Index. Each cluster represents a specific job topic, such as backend development, data science, QA automation, cybersecurity, and digital marketing. The results offer a structured overview of digital skill demands and can be used by educational institutions, training providers, and labor policy makers. However, the dataset’s limited size, its reliance on a single job platform, and the use of traditional machine learning techniques mean that the results may not capture all semantic variations and complexities present in the broader job market. Consequently, future work should involve larger and more diverse datasets, as well as advanced deep learning text representation approaches, to enhance the robustness and generalizability of the results.

Published

2025-12-22

How to Cite

Rifa Geandra, A., Mumtaz Siregar, A., & Nooraeni, R. (2025). Job Competency Extraction in Information and Technology Sector Using K-Means and Non-Negative Matrix Factorization (NMF) Algorithms. Proceedings of The International Conference on Data Science and Official Statistics, 2025(1), 527–542. https://doi.org/10.34123/icdsos.v2025i1.684