Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos en-US Mon, 22 Dec 2025 10:12:22 +0000 OJS 3.2.1.4 http://blogs.law.harvard.edu/tech/rss 60 Correlation Analysis of Land Surface Temperature (LST) and Vegetation Density Using Landsat 8 and 5 Imagery in Purwakarta Regency https://proceedings.stis.ac.id/icdsos/article/view/528 <p>Urbanization and industrial development in urban areas have led to a decrease in vegetation and an increase in land surface temperature. This phenomenon impacts microclimate change and environmental quality, as seen in Purwakarta Regency. The conversion of vegetated land into industrial and residential areas reduces the vegetation index. This vegetation index can be measured using the Normalized Difference Vegetation Index (NDVI) method. Meanwhile, monitoring the increase in surface temperature can be calculated using the Land Surface Temperature (LST) method, which can indicate physical changes on the Earth's surface. The purpose of this study is to analyze the relationship between vegetation density and the increase in surface temperature using remote sensing and Geographic Information System (GIS) methods. The analysis results show that vegetated land area decreased significantly from 67,564.8 ha (2004) to 44,970 ha (2024), while built-up land increased threefold. In the same period, the average surface temperature increased from 37.31°C to 40.41°C. 
The correlation analysis shows a strong positive correlation between the decrease in NDVI and the increase in LST, with a correlation coefficient of 0.707 in 2024.</p> Aida Ainulmila, S Tiana, K N Mumtaz, D S F Azhari, F Ibrahim, T S Anggraini Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/528 Mon, 22 Dec 2025 00:00:00 +0000 Forest and Land Fire Severity Analysis in 2022-2023 in Hulu Sungai Selatan Regency Using the NBR (Normalized Burn Ratio) Method https://proceedings.stis.ac.id/icdsos/article/view/626 <p>Forest and land fires are recurring disasters in Indonesia that cause environmental, health, and socio-economic losses. Hulu Sungai Selatan Regency, South Kalimantan, is among the affected regions, particularly during 2022–2023 when the El Niño phenomenon and flammable peatlands increased fire risk. This study analyzes the spatial extent and severity of fires and their potential impact on local communities by integrating remote sensing and demographic data. The Normalized Burn Ratio (NBR) and Difference Normalized Burn Ratio (dNBR) derived from Landsat 8 and 9 imagery (2021–2023) were used to map fire severity, supported by hotspot data from the Ministry of Environment and Forestry and settlement data from the Geospatial Information Agency. Population data from the Central Bureau of Statistics (BPS) were incorporated to develop a Fire Vulnerability Index (FVI) representing community exposure to fire-prone areas. The results show that burned areas in 2023 expanded compared to 2022, with increases in the low-to-moderate severity classes. Subdistricts with dense populations, such as Kandangan and Angkinang, showed higher fire vulnerability values, indicating potential socio-environmental risks. 
These findings emphasize the importance of integrating remote sensing and statistical data to support effective fire mitigation and risk reduction in vulnerable regions.</p> Desti Meirisa Putri, Muhammad Refa, Sheren Siti Salamah, Tania Septi Anggraini, Shafira Himayah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/626 Mon, 22 Dec 2025 00:00:00 +0000 Detecting Marine Debris Using Sentinel-2 Satellite Images https://proceedings.stis.ac.id/icdsos/article/view/552 <p>Plastic waste pollution in the oceans remains a global problem. Kuta Beach is one of Bali's tourist destinations that has been affected by plastic waste pollution. This runs counter to Sustainable Development Goal (SDG) 14, which includes preventing and reducing marine debris pollution. However, the marine debris monitoring process carried out by the Ministry of Environment and Forestry requires officers to conduct direct monitoring in the field, which incurs higher costs. Therefore, satellite imagery can be an alternative option for more effective and efficient marine debris detection. This study aims to detect marine debris on Kuta Beach using machine learning algorithms, namely Random Forest (RF), XGBoost, and LightGBM. This study uses the Marine Debris Archive (MARIDA) dataset, which has marine debris labels, and Sentinel-2 images of Kuta Beach from 2019–2023. The LightGBM algorithm provided the best performance in detecting marine debris with an F1-score of 95.16%. The areas detected as marine debris on Kuta Beach in 2019, 2020, 2021, 2022, and 2023 were 500 m<sup>2</sup>, 0 m<sup>2</sup>, 100 m<sup>2</sup>, 300 m<sup>2</sup>, and 400 m<sup>2</sup>, respectively. 
Based on these results, marine debris is generally detected around the coastline, particularly in the southern area of Kuta Beach, which is located near a shopping center.</p> Fadiah Faradinah Nasir, Robert Kurniawan Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/552 Mon, 22 Dec 2025 00:00:00 +0000 Fault Modeling to Determine the Reliability Status of Rotating Machines Using Deep Learning Methods Based on Vibrations from Acoustic Emissions from Cooling Fans https://proceedings.stis.ac.id/icdsos/article/view/569 <p>Modern industrial production acknowledges the increasing significance of maintenance. As of right now, maintenance is seen as a service that aims to maintain the effectiveness of systems and installations while adhering to quality, energy efficiency, and protection standards. An inventive technique to automate rotating machine maintenance procedures has been created in this study. To identify failures and flaws in the motors through their supports, where the fan blades are attached, a technique based on capturing the noises produced by their cooling fans and utilizing deep learning to diagnose problems was investigated. Two operational circumstances were envisioned: the absence of fault and the presence of fault. The machine is correctly powered and running in ideal circumstances when it is not having any issues. In contrast, failures were gradually created purposefully and then documented in order to better understand the faults. Utilizing a pre-trained network (SqueezeNet) built on the ImageNet database, the convolutional neural network (CNN)-based technique was constructed. 
Applying transfer learning to the spectrograms obtained from the sound emission recordings of our machine's fan in both working modes demonstrated outstanding performance (accuracy = 0.987), confirming the methodology's outstanding quality.</p> FERNAND JOSEPH TOUKAP NONO, DIANORE TOKOUE NGATCHA, Florence OFFOLE, FRANCELIN NDI, Marcelin MOUZONG PEMI Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/569 Mon, 22 Dec 2025 00:00:00 +0000 The Impact of Training-Testing Proportion on Forecasting Accuracy: A Case of Agricultural Export in Indonesia https://proceedings.stis.ac.id/icdsos/article/view/649 <p>Accurate forecasting of agricultural exports is crucial for supporting trade policy and ensuring economic stability in Indonesia. This study investigates the impact of training–testing proportions on the forecasting accuracy of six models: linear regression, decision tree, optimized decision tree, neural network, Autoregressive Integrated Moving Average (ARIMA), and exponential smoothing. Using Indonesia’s agricultural export data, model performance was evaluated under two data-splitting schemes (80%:20% and 75%:25%) with error metrics including MAE, MSE, RMSE, and MAPE. The results consistently show that statistical time series models outperform regression-based and machine learning approaches. In particular, simple exponential smoothing (SES) achieved the lowest forecasting errors across all evaluation criteria, with MAPE values as low as 0.93%, followed by ARIMA as the second-best performer. Machine learning models, on the other hand, produced relatively higher error values, suggesting their limited ability to capture temporal dependencies in the data. Importantly, the choice of training–testing proportion did not significantly alter the ranking of model performance, indicating that model selection plays a more critical role than data partitioning. 
Overall, this study highlights the robustness of exponential smoothing methods as reliable forecasting tools for Indonesia’s agricultural exports and provides evidence-based insights for policymakers in designing effective trade strategies.</p> Tri Wijayanti Septiarini, Made Diyah Putri Martinasari, Eka Pariyanti Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/649 Mon, 22 Dec 2025 00:00:00 +0000 An Intelligent Conversational Agent Using Self-Reflective Retrieval-Augmented Generation for Enhanced Large Language Model Support in National Accounts Learning https://proceedings.stis.ac.id/icdsos/article/view/575 <p>BPS Statistics Indonesia plays a strategic role in compiling balance sheet statistics as the foundation for national policy analysis. This role requires a deep understanding of the concepts, definitions, and compilation standards outlined in the System of National Accounts (SNA) manual. However, in practice, comprehending such complex technical documents is not always straightforward. To address this challenge, this study proposes the development of an intelligent conversational agent in the form of a chatbot that implements the Self-Multimodal RAG approach. This approach integrates self-reflection mechanisms to generate more accurate and relevant responses. The evaluation was conducted using the LLM-as-a-Judge framework across four metrics: answer correctness, answer relevancy, context relevancy, and context faithfulness. Experimental results demonstrate that the Self-Reflective RAG achieved a score of 80% on the answer correctness metric, with competitive performance in terms of relevancy and faithfulness. 
From the chatbot implementation perspective, black-box testing confirmed that all functionalities operated as expected, while system usability testing using the CSUQ instrument yielded a score of 74.704%, indicating that the chatbot is well-accepted by users.</p> Farhan, Yunofri ., Etjih Tasriah, Lya Hulliyyatus Suadaa, Setia Pramana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/575 Mon, 22 Dec 2025 00:00:00 +0000 Classification of Urban and Rural Villages with Machine Learning on Satellite Image Data and Points of Interest https://proceedings.stis.ac.id/icdsos/article/view/495 <p>An evaluation of the Sustainable Development Goals with data disaggregated by residential area, namely urban and rural areas, is essential. This study proposes the use of satellite imagery and point of interest (POI) data with machine learning methods to classify urban and rural villages, specifically in North Sumatra Province. The data used includes satellite imagery from various sources, such as NOAA-20, Sentinel-2, Sentinel-5P, and Terra, as well as Google Maps, covering various variables including NTL, NDVI, NDBI, NDWI, NO₂, CO, and LST, along with POIs categorized under education, economy, health, and entertainment. The machine learning methods used were Decision Tree and Support Vector Machine, with data imbalance addressed through resampling techniques such as Random Undersampling (RUS). The results of the study show that the Support Vector Machine model with RUS produced the best weighted average F1-score of 87.74% for the classification of urban and rural villages, with NTL being the most important feature in the model formation. 
This study is expected to be an alternative for BPS in the classification of urban and rural villages.</p> Bony Parulian Josaphat, Alvandi Syukur Rahmat Zega Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/495 Mon, 22 Dec 2025 00:00:00 +0000 GIS-Based Analytical Hierarchy Process Flood Hazard Mapping in Deli Serdang, Indonesia Using Satellite Images https://proceedings.stis.ac.id/icdsos/article/view/680 <p class="Abstract" style="margin-bottom: 28.35pt;">As one of the regions with a high frequency and significant impact of flood disasters, Deli Serdang in North Sumatera, Indonesia, requires spatial-based hazard mapping as a foundation for mitigation efforts. This study aims to map the flood hazard levels by integrating the Analytical Hierarchy Process (AHP) and Geographic Information Systems (GIS). Five parameters were analyzed to construct the model: elevation, slope, rainfall, Normalized Difference Vegetation Index (NDVI), and Normalized Difference Built-up Index (NDBI), with data acquired through the Google Earth Engine platform. The AHP weighting results indicate that rainfall is the most dominant factor (40%) influencing the hazard level. The resulting hazard map identifies a clear spatial pattern with a north-to-south gradation, where 50.17% of the total area falls into the high-hazard category, 47.57% into the moderate category, and the remainder into the low-hazard category. A significant finding reveals that all sub-districts within the study area are classified as either moderate or high hazard, confirming the northern coastal zone as the most critical area. 
The results of this research can serve as a scientific basis for local government in formulating more adaptive and targeted disaster mitigation policies and spatial planning.</p> Zaidan Hafizhahurrahman, Shafnanda Aulia Kamal Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/680 Mon, 22 Dec 2025 00:00:00 +0000 Development of Village Administrative Data Management System Through PAPEDA (Village Population Administration Development Application) in Pitu Village, North Halmahera Regency https://proceedings.stis.ac.id/icdsos/article/view/504 <p>This study discusses the utilization of technology in managing village administrative data, improving public service systems, and providing base data for local government decision-making. Using qualitative methods for data collection and the SDLC Waterfall Model for system development, this research analyzes the benefits of PAPEDA (Aplikasi Pembangunan Administrasi Kependudukan Desa), an output of the Desa CANTIK program, on village administrative data management and public services. Based on the evaluation results using Black Box Testing and User Satisfaction Surveys, this study shows that technology utilization in villages positively impacts the community. The use of PAPEDA not only makes it easier for village officials to manage village administrative data but also accelerates the public service process in the village. Residents can access various administrative services online, anytime, and anywhere. Additionally, the village monographs and stunting monitoring data give local governments a basis for development planning. However, uneven internet connectivity hinders technology utilization, emphasizing the need for local governments to improve internet infrastructure.</p> R A D Ikram, A M Kahar, Gusrizal . 
Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/504 Mon, 22 Dec 2025 00:00:00 +0000 AI-Driven Transformation in the Textile Industry: A Bibliometric Analysis and Scoping Review https://proceedings.stis.ac.id/icdsos/article/view/516 <p>Artificial Intelligence (AI) is rapidly reshaping the global textile industry, driving efficiency, precision, and sustainability across its value chain. Yet despite growing enthusiasm, the integration of AI remains fragmented, with limited statistical understanding of where, how, and why these technologies take root. This study addresses that gap by combining bibliometric network analysis and systematic scoping review to map and statistically interpret two decades (2003–2023) of research on AI applications in textiles. Using association strength normalization, VOS modularity clustering, and thematic centrality density mapping, we identified eight manufacturing clusters that structure the field, ranging from fabric defect detection and supply chain optimization to textile waste management and sustainability. The novelty of this work lies in repositioning bibliometric analysis as a statistical instrument, not merely a descriptive tool. Keyword co-occurrence networks and citation trajectories are translated into evidence-based research agendas, connecting cluster signals to methodological pathways such as regression modeling, support vector machines, neural networks, and hybrid ML-statistical frameworks. This statistical logic is used to surface gaps, particularly in empirical validation, predictive modeling, and cross-cluster integration, and to chart future directions for data-driven textile innovation. By grounding future agendas in measurable statistical patterns rather than narrative interpretation alone, this study offers a rigorous analytical framework that links research structure to methodological opportunity. 
The resulting roadmap invites scholars and practitioners to bridge AI, textile engineering, and applied statistics, shifting the field from fragmented experimentation toward coherent, evidence-based innovation.</p> Fajar Pitarsi Dharma, Moses Laksono Singgih, Dedy Dwi Prastyo Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/516 Mon, 22 Dec 2025 00:00:00 +0000 Optimized Feature Engineering for Transaction Fraud Detection Using Sequential and HMM-Based Features https://proceedings.stis.ac.id/icdsos/article/view/529 <p>Fraud detection in financial transactions remains a major challenge because fraudulent activities are extremely rare, often described as finding a “needle in a haystack”, and must be detected in real time. This study presents a hybrid feature engineering framework that integrates lightweight sequential indicators with Hidden Markov Model (HMM)-based behavioural features to improve accuracy and interpretability. Using the PaySim dataset containing 2.77 million transactions (0.2965% fraud), we extracted 22 sequential and 14 HMM-based features, from which 28 highly discriminative variables were retained. To address class imbalance, a batch-wise SMOTETomek approach was applied, expanding 1.94 million clean samples to 3.86 million balanced samples. Experimental results show that HMM-based features alone yield moderate performance (ROC AUC = 0.778, F2 = 0.051), but the combined ensemble of tuned XGBoost and LightGBM achieves superior accuracy (ROC AUC = 0.9983, F2 = 0.8431, MCC = 0.827). SHAP analysis identifies HMM-derived entropy and state likelihoods, together with transaction amount dynamics, as key predictors. 
The results demonstrate that optimized feature engineering plays a crucial role in achieving accurate, scalable, and interpretable fraud detection.</p> Kaung Wai Thar, Thinn Thinn Wai Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/529 Mon, 22 Dec 2025 00:00:00 +0000 Business Description Categorization to the Five-Digit Indonesian Standard Classification of Business Field (KBLI) Using Machine Learning and Transfer Learning https://proceedings.stis.ac.id/icdsos/article/view/719 <div>The Indonesian Standard Classification of Business Fields (KBLI) is essential for economic statistics, yet manual classification of business descriptions to five-digit KBLI codes is time-consuming and prone to inconsistencies. This study aims to develop and compare machine learning (Support Vector Machine and Random Forest) and transfer learning </div> <div>(IndoBERT) models for automating KBLI classification, supported by the preparation of synthetic and real-world datasets for model training. The synthetic data were generated using large language models, validated through human majority voting, and complemented with real-world data from the National Labor Force Survey (Sakernas) and the Micro and Small Industry Survey (IMK). The findings indicate that fine-tuned IndoBERT performed best, achieving an F1-score of 92.99% and an accuracy of 93.40% on synthetic data, alongside top-1, top-5, and top-10 accuracies of 32.93%, 54.71%, and 63.24% on real-world data. The deployment of fine-tuned IndoBERT as a RESTful API demonstrates its scalability and efficiency, presenting a reliable solution for large-scale KBLI classification in official statistics. </div> Muh. 
Alfian Amnur, La Ode Muhammad Gazali, Amir Mumtaz Siregar, Faruq Ariya Jalaksana, Made Nisa Rahayu Ananda Suwendra, Nurul Fadila Utami, Alif Median Ramadhan, Elisse Krisela Fabrianne, Eurorea Wirata Raja Panjaitan, Fitri Aini Izzati, Jernita Bintang Yuliani Manalu, Muhammad Gilang Hidayat, Lya Hulliyyatus Suadaa, Budi Yuniarto, Setia Pramana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/719 Mon, 22 Dec 2025 00:00:00 +0000 Investigating the Profile of Digital Readiness and Sustainability Development: An Explainable Clustering https://proceedings.stis.ac.id/icdsos/article/view/545 <p>The level of digital readiness within Islamic Higher Education Institutions (IHEIs) has emerged as a critical concern, drawing increasing scholarly and institutional attention over the past five years. This study aims to examine the empirical relationship between two key dimensions: digital readiness, as reflected by the National Readiness Index (NRI), and progress toward the Sustainable Development Goals (SDGs). Data were collected from more than 20 IHEIs between 2023 and 2024 to support a sequential analytical approach. Pearson’s correlation coefficient was employed to identify associations between NRI-based digital readiness and SDG performance within the IHEI context. Subsequently, cluster analysis was conducted using the Duda–Hart Index, while the Pseudo T² statistic was applied to validate the robustness of the clustering outcomes. A cartographic visualization was also generated to illustrate variations across readiness and sustainability clusters. The results indicate a considerable disparity between digital readiness and sustainability among IHEIs. 
Only a limited number of institutions demonstrate consistent performance in both areas, suggesting that effective leadership and strategic investment in digital infrastructure are essential prerequisites for achieving sustainable institutional transformation.</p> Agus Pamuji, Aries Susanty, Budi Warsito Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/545 Mon, 22 Dec 2025 00:00:00 +0000 The Digital Footprint of Public Attention: Forecasting Indonesian Gold Prices using Google Trends Index and Optimized Support Vector Regression https://proceedings.stis.ac.id/icdsos/article/view/730 <p>To provide actionable forecasting insights for gold prices in Indonesia’s public sentiment-driven market, this study developed a machine learning framework using the Google Trends Index (GTI) as a sentiment proxy. We employed an Optuna-optimized Support Vector Regression (SVR) model to comparatively evaluate three feature sets (GTI, historical Lag, and a Mix) across seven forecasting horizons (t+1 to t+30). A key advantage of our approach was the identification of horizon-dependent predictor dynamics: results revealed that while historical data excelled for short-term forecasts (MAPE 0.50% at t+5), the contribution of GTI became vital for long-term accuracy, where the hybrid model achieved its peak performance (MAPE 1.92% at t+30). Notably, the GTI-only model showed solid standalone potential (MAPE &lt; 20%). We conclude that a hybrid approach is most effective, validating GTI as a relevant predictor for Indonesia. 
Furthermore, the proposed SVR-Optuna framework offers a generalizable methodology for forecasting other sentiment-driven assets, providing a clear, actionable guide for model selection based on forecasting horizons.</p> Muhammad Restu Ilahi, Arie Wahyu Wijayanto Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/730 Mon, 22 Dec 2025 00:00:00 +0000 Unsupervised YouTube Video Segmentation of “Bendera One Piece” Content Using Medoid-Based Clustering with Statistical Significance Testing https://proceedings.stis.ac.id/icdsos/article/view/639 <p>The curse of dimensionality and sparsity are well-documented phenomena in applied statistics where the data’s dimensionality (number of features) far outnumbers the observations. This work aims to present an integrated applied statistics framework to distill semantic structure from high-dimensional data by combining pre-processing, dimensionality reduction via principal component analysis, medoid-based clustering (partitioning around medoids and simple k medoids), and a modified Statistical Significance Clustering (SigClust) test for validation and inference in the context of viral media. In this case study, we demonstrate an approach that segments and interprets YouTube videos from the lens of the Indonesian viral media “Bendera One Piece” through its user commentary. The PCA-based dimensionality reduction helped resolve the curse of dimensionality, where the first principal component alone explained 80% of the variance in text-based features and captured a dominant socio-political pattern. Internal validation and the SigClust test agreed on the presence of a statistically significant three-cluster solution that could be labelled as the audiences of “Pop-Culture Enthusiasts”, “Cautious Observers”, and “Political Protesters”. 
The study presents the utility of integrating established statistical methods with a modified validation step for high-dimensional text data analysis and pattern recognition.</p> Weksi Budiaji, Patricia Kumenap, M Fabian Delano, Ferdian Wijaya, Rifqi Riyanto Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/639 Mon, 22 Dec 2025 00:00:00 +0000 The Digital Frontline: A Thematic Analysis of User Grievances and Satisfaction Drivers for Indonesian Public Service Apps https://proceedings.stis.ac.id/icdsos/article/view/738 <div>This research assesses Indonesia's digital public service ecosystem by analyzing 50 mobile applications from a wide range of state agencies. Using a computational content analysis of metadata and user reviews from the Google Play Store, this study presents a dual-faceted evaluation. First, a thematic analysis of negative reviews (1-2 stars) reveals that user grievances are overwhelmingly dominated by foundational issues, such as login/access problems, slow performance, and technical glitches, rather than a lack of advanced features. Second, a corresponding analysis of positive reviews (5 stars) identifies that user satisfaction is primarily driven by high-quality features, ease of use, and overall application reliability. Quantitative findings show significant performance disparities across institutional categories, with Ministry-developed apps receiving the lowest average user satisfaction. An Importance-Performance Quadrant Analysis further uncovers a critical paradox: many high-download, mandatory apps suffer from low user ratings, indicating a clear disconnect between enforced adoption and user-centric quality. The research concludes that enhancing digital public services requires a strategic shift from feature proliferation to foundational reliability. 
Ensuring robust core functionalities is paramount to building citizen trust and achieving a successful digital transformation.</div> Ferdian Bangkit Wijaya, Weksi Budiaji, Rafly Priyantama Ramadhan Bagaskara, Zilda Ainun Tazkia, Dinda Dwi Anugrah Pertiwi Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/738 Mon, 22 Dec 2025 00:00:00 +0000 Two-Stage RFM and Macroeconomics Interaction Model for Accurate CLV Prediction in Direct Sales https://proceedings.stis.ac.id/icdsos/article/view/642 <p>This study introduces a two-stage predictive model integrating Recency, Frequency, Monetary (RFM) metrics with macroeconomic indicators to estimate Customer Lifetime Value (CLV) in direct sales, addressing dynamic customer behavior in volatile markets. Data from the Halalmart Sales Integrated System (January 2023–July 2025, 29,893 transactions, ~431 unique customers monthly) were combined with Indonesian macroeconomic indicators (Consumer Confidence Index, Consumer Expectation Index) from Bank Indonesia and inflation data from the Central Bureau of Statistics (BPS). The first stage uses CatBoost classification, achieving 89.3% accuracy to identify active customers, followed by an ensemble regression (CatBoost, XGBoost, LightGBM, Ridge, RandomForest), yielding an R<sup>2</sup> of 0.894 for CLV prediction. RFM features contribute 40.3% to classification and 16.2% to regression variance, while macroeconomic interactions dominate, contributing 59.7% and 83.8%, respectively. A key interaction, Monetary and Consumer Confidence Index, shows a 0.773 correlation with CLV. SHAP analysis enhances model interpretability. 
Despite a skewed dataset with approximately 65% zero CLV, the model supports targeted marketing strategies, offering valuable insights for strategic decision-making in direct sales environments.</p> Unung Istopo Hartanto, I Gusti Putu Asto Buditjahjanto, Wiyli Yustanti Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/642 Mon, 22 Dec 2025 00:00:00 +0000 Water Quality Measurement in Illegal Gold Mining Areas Using Sentinel-2A MSI Satellite Images of the Batanghari River, Tebo Tengah District https://proceedings.stis.ac.id/icdsos/article/view/570 <p>Water quality in Indonesian rivers has declined due to pollution from solid and liquid waste from industrial and domestic sources. The Batanghari River, the longest river on the island of Sumatra, faces various environmental problems, including pollution from illegal mining activities. Artisanal and small-scale gold mining (ASGM) contributes to mercury release, contaminating water and soil and posing health risks to communities. Conventional monitoring methods have limitations in coverage and efficiency. Therefore, this study utilizes Sentinel-2A MSI satellite imagery to assess and map water quality conditions around illegal gold mining areas along the Batanghari River in Tebo Tengah District. The developed model uses K-Means, Fuzzy C-Means (FCM), Principal Component Analysis (PCA), and Weighted Arithmetic Water Quality Index (WAWQI) to extract water quality features. The findings indicate that WAWQI provides a more representative quantitative assessment, revealing that areas near illegal gold mining sites in the Batanghari River exhibit moderately to heavily polluted water quality. 
This approach is expected to support water quality monitoring and assist policymakers in managing water resources and the environment.</p> Baginda Sinaga, Robert Kurniawan Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/570 Mon, 22 Dec 2025 00:00:00 +0000 Clustering of Junior High School Education in West Java Based on Density and Dropout Ratios Using Quartile and KMeans Methods https://proceedings.stis.ac.id/icdsos/article/view/662 <div>Education disparities across regions often reflect differences in school density, teacher availability, and student dropout rates. This study aims to classify junior high school education in West Java into more homogeneous groups to better understand these disparities. Two clustering approaches were applied: quartile grouping and the K-Means algorithm. Quartile grouping provided a simple categorization of each indicator into four levels (very high, high, low, very low), while K-Means offered a more flexible and data-driven segmentation. The K-Means algorithm produced three distinct clusters: (1) Balanced and Stable regions with proportional ratios and low dropout rates, (2) High-Density but Stable regions concentrated in urban and peri-urban areas with high student-teacher and student-school ratios but controlled dropout levels, and (3) Elevated Dropout Risk regions, mostly in rural and southern areas, with lower density but higher dropout rates. The comparison shows that quartile grouping is easy to interpret for individual indicators, while K-Means provides more comprehensive insights into multidimensional patterns. This research highlights the potential of clustering methods to guide policymakers in designing differentiated strategies, from infrastructure expansion in dense regions to social support programs in dropout-prone areas. 
</div> Eva Nurkhofifah, Dwilaras Athina, Arna Ristiyanti Tarida, Friska Amelia Pratiwi Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/662 Mon, 22 Dec 2025 00:00:00 +0000 Forecasting Composite Stock Price Index on Indonesia Stock Exchange Using Extreme Learning Machine https://proceedings.stis.ac.id/icdsos/article/view/496 <p>Technological advances have driven active participation in digital economic activities, including capital market investment. Stocks remain a dominant instrument, with the Composite Stock Price Index or Indeks Harga Saham Gabungan (IHSG) serving as a primary benchmark for investment decisions in Indonesia. However, its high volatility, driven by economic, political, global, and market sentiment factors, demands accurate forecasting methods. Traditional approaches such as ARIMA and linear regression are limited in capturing the non-linear and complex patterns of stock market data. This study proposes the use of the Extreme Learning Machine (ELM), an artificial intelligence method considered more adaptive to market dynamics. To enhance prediction accuracy, hyperparameter optimization was performed using the grid search method. The research forecasts IHSG performance by incorporating exogenous variables, namely gold prices, the US dollar to rupiah exchange rate, and a COVID-19 dummy variable. The optimal model used a single hidden layer of nine neurons. Evaluation results indicate that the ELM models effectively perform multi-horizon forecasting (t+1 to t+5), as evidenced by low MAE, MAPE, and RMSE values across horizons.
The five-day IHSG forecasts are 7,242.28, 7,228.42, 7,211.02, 7,192.67, and 7,174.06, demonstrating the model’s potential in supporting investment decision-making with high accuracy.</p> Bony Parulian Josaphat, Dhevri Leonardo Hutajulu Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/496 Mon, 22 Dec 2025 00:00:00 +0000 The Application of Retrieval-Augmented Generation (RAG) in Developing an Intelligent Risk Management Platform: A Case Study at Statistics Jawa Timur https://proceedings.stis.ac.id/icdsos/article/view/591 <div>Risk management is a crucial element in the governance of modern organizations, especially for public institutions such as Statistics Indonesia (BPS), which is responsible for providing official state statistics. Currently, the conventional methodology at Statistics Jawa Timur remains manual, relying on spreadsheet software, which results in slow and unresponsive processes for addressing dynamic risks. This condition reduces the effectiveness of internal controls, particularly with a massive strategic agenda like the 2026 Economic Census (SE2026) approaching. To address these limitations, this research proposes the development of Kadiri: A Risk Management Information System and Worksheet, an intelligent system that integrates Artificial Intelligence (AI) technology, specifically Large Language Models using the Retrieval-Augmented Generation (RAG) method. The Kadiri system is designed to transform risk management from a reactive to a proactive process, accelerating risk identification, analysis, and mitigation recommendations by leveraging the BPS internal knowledge base. The RAG methodology enables an AI model, such as Google Gemini, to provide contextual and relevant suggestions based on the organization's historical data.
The outcome of this development is a digital platform that speeds up risk analysis, enhances accountability, and aligns with the bureaucracy reform agenda.</div> I Putu Agus Wahyu Dupayana, Eko Hardiyanto Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/591 Mon, 22 Dec 2025 00:00:00 +0000 Impact of Land Use Changes Due to Tourism on Ecosystem Services Using InVEST https://proceedings.stis.ac.id/icdsos/article/view/607 <p>Ecosystem services play a vital role in supporting human life and environmental sustainability. However, tourism activities in Badung Regency, Bali, have led to significant changes in land cover and use, impacting the function of ecosystem services. This study integrates remote sensing, machine learning, and InVEST technology to understand the impact of Land Use/Land Cover (LULC) changes on ecosystem services in Badung Regency. The results show a decrease in non-agricultural vegetation area from 17,659.65 hectares in 2014 to 11,405.84 hectares in 2024. Meanwhile, built-up land experienced a drastic increase from 15,074.47 hectares in 2014 to 22,134.06 hectares in 2024. In addition, the InVEST model shows a decrease in carbon stock of 1,379,841.68 tons over the period 2014 to 2024. Meanwhile, water yield, nitrogen export, and sediment export increased, reflecting a relationship between tourism development and the decline in ecosystem services. Correlation analysis shows a consistent negative correlation between water yield and carbon stock, as well as a positive correlation between nitrogen export and sediment export.
The results of this study are expected to serve as a reference for further studies on the dynamics of ecosystem services and support sustainable environmental management efforts in areas with rapidly growing tourism activity.</p> Atanasius Alfandi, Yuliagnis Transver Wijaya Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/607 Mon, 22 Dec 2025 00:00:00 +0000 Machine Learning Framework for Early Detection of Mental Health Conditions from Textual Data https://proceedings.stis.ac.id/icdsos/article/view/613 <p>Mental health disorders significantly affect global populations, placing heavy burdens on healthcare systems worldwide. Traditional diagnostic methods, mainly clinical assessments and self-reports, lack real-time monitoring, are prone to biases, and often result in delayed interventions. Recent advancements in machine learning (ML) offer promising opportunities to enhance mental health detection through behavioural and physiological data analysis. This study evaluates four widely used machine learning algorithms—Support Vector Machines (SVM), Logistic Regression, Naïve Bayes, and Random Forests—in identifying early indicators of mental health conditions from textual data. A dataset of 27,978 textual records from the “Analysis and Modelling on Mental Health Corpus” was analysed. Data preprocessing involved normalization, stop word removal, lemmatization, and TF–IDF vectorization to prepare robust features for model training. Model performance was assessed using accuracy, precision, recall, and F1-score metrics. Results showed that SVM and Logistic Regression outperformed other models, achieving accuracy rates of 92% and 91%. These findings demonstrate the potential of ML-based frameworks to support earlier and more accurate mental health interventions. 
Integrating such techniques into clinical practice can improve diagnostic accuracy, reduce healthcare workload, and enhance patient outcomes.</p> Basheer Riskhan, Abdullah Al Hadi, S M Asiful Islam Saky, Md Saiful Arefin, Khalid Hussain Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/613 Mon, 22 Dec 2025 00:00:00 +0000 Equipment Borrowing and Room Booking Information System at the Politeknik Statistika STIS https://proceedings.stis.ac.id/icdsos/article/view/624 <p>The management of goods and space lending services at the Politeknik Statistika STIS is currently still done manually, resulting in various operational constraints such as limited access to information, inefficient processes, and potential errors in recording. This impacts the quality of service and the effectiveness of campus asset utilization. This study aims to design and build a website-based goods and space lending information system to address these issues. The system developers aimed to provide users with access to information on goods and space availability, simplify the loan application process, and improve the accuracy of inventory data. The system was developed using the SDLC method with a prototyping approach, while the researchers carried out the evaluation using Black Box Testing and a PSSUQ survey to measure ease of use and user satisfaction.
The developers successfully built the system and confirmed through Black Box Testing that all features operate correctly, and the PSSUQ evaluation shows an average score of 1.69, indicating that this system is well received and provides a high level of satisfaction for users.</p> Setya Hadi Nugroho, Waris Marsisno Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/624 Mon, 22 Dec 2025 00:00:00 +0000 Comparative Study of Autoencoder and LSTM-AE for Extreme Temperature Anomaly Detection in Semarang https://proceedings.stis.ac.id/icdsos/article/view/549 <p>Climate change has increased the frequency and intensity of extreme weather events, including heatwaves and cold spells, posing critical risks to public health and urban infrastructure. This study proposes and compares two deep learning frameworks based on Autoencoders, namely the Long Short-Term Memory Autoencoder (LSTM-AE) and the standard Autoencoder (AE), for detecting extreme temperature anomalies using historical daily data from 2005 to 2025 in Semarang City. Unlike conventional anomaly detection methods, the LSTM-AE introduces temporal learning through recurrent memory cells, enabling it to capture sequential temperature dependencies that static AE models cannot. Both models are trained to reconstruct “normal” temperature patterns, with anomalies identified when reconstruction errors exceed the 95th percentile threshold. The results demonstrate that the LSTM-AE more consistently identifies significant heatwave and cold spell events, with seasonal alarm rates that closely align with local climatic transitions. Several detected peaks coincide with historically documented events such as the 2015–2019 El Niño and 2019–2020 transition periods reported by BMKG, confirming climatological relevance. 
In contrast, the standard AE detects a higher number of anomalies (726 vs 366 from the LSTM-AE) but tends to generate false alarms outside transitional periods. Model performance is evaluated using reconstruction error distributions, Jaccard similarity indices, and monthly alarm rates. This study highlights the potential of LSTM-based architectures for improving anomaly detection in climate data and contributes to developing data-driven strategies for urban climate resilience in tropical regions.</p> Galih Kusuma Wijaya, Aliyya Anggraeni, Tsalisa Chulaili Sahri Nova, Muhammad Alifian yusuf, Iqbal Kharisudin Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/549 Mon, 22 Dec 2025 00:00:00 +0000 Spatio-Temporal Modeling of Agricultural Drought in Indramayu Using the NDDI Index (2015-2024) https://proceedings.stis.ac.id/icdsos/article/view/635 <div>This study examines the spatio-temporal patterns of agricultural drought in Indramayu Regency, Indonesia, using the Normalized Difference Drought Index (NDDI) derived from Landsat imagery between 2015 and 2024. The analysis employed spatial autocorrelation techniques, including Global Moran’s I and Local Indicators of Spatial Association (LISA), to identify spatial clustering and persistence of drought conditions. The results show consistent spatial vulnerability, with the southern region forming stable High-High drought clusters across multiple years, while the northern region remains dominated by Low-Low clusters. These findings indicate that drought distribution in Indramayu demonstrates strong spatial persistence and temporal continuity, reflecting long-term environmental and land-use characteristics. A supporting correlation analysis between NDDI and rice productivity (?
= 0.164; p-value = 0.651) revealed no significant relationship, suggesting that effective irrigation systems have mitigated the impact of meteorological drought on agricultural output. Overall, the study highlights the need for location-specific drought management in spatially vulnerable southern areas to enhance agricultural resilience and regional food security.</div> Sypa Septiani, Irene, Hilya, Dela Oktaviani, Fifin Trisulistiani, Salwa Alifia, Tiara Handayani, Siti Zahrotunnisa Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/635 Mon, 22 Dec 2025 00:00:00 +0000 Evaluating the Impact of Ibu Kota Nusantara (IKN) Development on Land Cover Using Machine Learning-Based Sentinel-2A Satellite Image Classification https://proceedings.stis.ac.id/icdsos/article/view/431 <p>The development of Ibu Kota Nusantara (IKN) in East Kalimantan as Indonesia's new capital city has the potential to cause significant changes to land cover patterns, especially in tropical rainforest areas. This study aims to evaluate the impact of IKN development on land cover using Sentinel-2A satellite image data and a machine learning approach. The study area is focused on the IKN Core Urban Area by comparing land cover conditions in 2022 before development and 2024 after development. Three classification methods were used including Random Forest, Support Vector Machines, and Classification and Regression Trees. The results showed that the RF model had the best accuracy with an overall accuracy value above 93% in both time periods. Spatial analysis showed a decrease in vegetation area and an increase in open land as an indication of intensive land clearing activities. These findings emphasize the importance of continuous land cover monitoring to support IKN's vision as a green city and achieve sustainable development targets (SDGs 11 and 15). 
This research is expected to serve as a reference for the formulation of adaptive and environmentally friendly spatial policies.</p> Wisnu Aimariyadi, Adinda Batrisybazla, Vanessa Ruth Evelyn Tobing, Robert Kurniawan Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/431 Mon, 22 Dec 2025 00:00:00 +0000 Real-Time Vibration Fault Detection in Rotating Machines Using Transformers to Minimize Production Losses in Industry 5.0: VIBT https://proceedings.stis.ac.id/icdsos/article/view/566 <p>Quickly identifying anomalies in rotating machinery is crucial for safety and profitability in contemporary industry (Industry 5.0). Unidentified failures can cause costly malfunctions and production interruptions. This research proposes an innovative strategy based on Transformer for the analysis of multidimensional vibration events (VIBT), with a view to early and accurate detection of anomalies in rotating machinery. The goal is to minimize production interruptions in Industry 5.0. The study highlights the limitations of conventional vibration analysis approaches and traditional deep learning techniques, emphasizing the need for innovative solutions. VIBT incorporates transformers and a filter bank convolution (FBC) module for effective denoising, as well as an adaptive wavelet transformation (WTA) mechanism for dynamic feature fusion at various scales, thereby addressing the challenges posed by non-stationary and noisy signals. Extensive testing on the Mafaulda dataset reveals that VIBT achieves 98.1% precision and 98.8% accuracy, significantly outperforming existing standard models. 
The results suggest that VIBT not only improves fault detection capabilities but also optimizes maintenance strategies in industrial applications, paving the way for future research on semi-supervised learning based on transformers and the integration of intermodal data.</p> FERNAND JOSEPH TOUKAP NONO, DIANORE TOKOUE NGATCHA, Florence OFFOLE, Steyve Nyatte, Marcelin MOUZONG PEMI Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/566 Mon, 22 Dec 2025 00:00:00 +0000 Satellite-Based Detection of Floating Plastic Debris in Jakarta Bay (2021–2024) https://proceedings.stis.ac.id/icdsos/article/view/573 <p data-start="237" data-end="746">Plastic waste is a critical environmental issue in Jakarta Bay, causing ecosystem degradation and challenging coastal management. This study analyzes seasonal dynamics and spatial impacts of floating plastic debris using Sentinel-2 imagery from July 2021 to November 2024. The Floating Debris Index (FDI) and Normalized Difference Vegetation Index (NDVI) were applied, with optimum thresholds determined through ROC curve analysis. Monthly median composites were processed to minimize atmospheric noise. The results show a recurring seasonal pattern, with debris consistently peaking in June, likely influenced by monsoon-driven runoff and human activities. A clear increasing trend from 2021 to 2023 was followed by a decline in 2024, coinciding with the implementation of the National Ocean Love Month program. Buffer analysis indicated that most debris accumulates within 500 m of the shoreline, particularly near river mouths, ports, and settlements, while Thiessen Polygon analysis revealed hotspots concentrated along the eastern and western coasts.
These findings highlight that floating plastic debris in Jakarta Bay is strongly shaped by seasonal cycles and land-based inputs, providing critical insights for designing targeted, evidence-based waste management policies.</p> Marchadha Santi Wilda, Ernawati Pasaribu Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/573 Mon, 22 Dec 2025 00:00:00 +0000 Spatial Analysis of Food Security Index and Its Factor to Support Program Priority Area in Central Java, Indonesia https://proceedings.stis.ac.id/icdsos/article/view/652 <p>Food security is a global issue influenced by ecological and socio-economic factors. It is a condition in which people can meet their food needs. Therefore, it is necessary to identify the conditions of food security and the factors that can influence it as a first step in overcoming food insecurity. The study area of this research is Central Java. This study uses the spatial autocorrelation method, which can determine patterns or correlations between study locations using Moran’s I and LISA. This method also provides information on the relationship between poverty distribution characteristics across locations in Central Java. This study also analyzes the Food Security Index (FSI) in Central Java Province by integrating drought parameters (Normalized Difference Drought Index), poverty levels, food expenditure, and open unemployment rates. The results of the analysis show a correlation between ecological conditions and FSI achievements. These results confirm that the FSI level in the study area does not only depend on natural resources but is also influenced by socioeconomic factors.
Thus, the results of this analysis may be beneficial as recommendations for policymakers through a spatial-based approach to provide strategies for improving food security, especially in Central Java.</p> Saskia Syafinda Fyndiani, Hanung Putri Titisari, Muhammad Fadhiil Al-Ghifaary, Tiara Handayani, Achmad Fadhilah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/652 Mon, 22 Dec 2025 00:00:00 +0000 Predictive Insights: Unmasking Breast Cancer Biomarkers through machine learning and Systems Biology https://proceedings.stis.ac.id/icdsos/article/view/493 <p>Breast cancer is a complex and heterogeneous disease with high rates of metastasis and recurrence that cause significant morbidity and mortality. Despite the improved treatment options offered by new medical therapies, a proper understanding of the molecular mechanisms of breast cancer development and progression is essential. Hence, we conducted a comprehensive analysis of transcriptomic profiling combined with SHAP feature importance calculation in an attempt to find potential molecular targets. Among the 9 machine learning models generated, the random forest model achieved an accuracy of 0.96 for breast cancer prediction. KRT17, KRT5 and FABP5 were the prognostic biomarkers identified by both the DGE and feature selection approaches. Furthermore, gene enrichment and functional annotation reveal the importance of these key genes in breast cancer progression. The survival analysis confirms the risk associated with the key genes in breast cancer patients.
Therefore, these findings show the effectiveness of machine learning combined with DGE in biomarker discovery, and experimental validation of these genes would be a promising approach to eliminate clinical complications during breast cancer treatment.</p> A A Zainulabidin, A J Sufyan, M K Thirunavukkarasu Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/493 Mon, 22 Dec 2025 00:00:00 +0000 Detection and Mapping of Invasive Alien Plant Water Hyacinth using Satellite Imagery and Machine Learning (Case Study: Rawa Pening Lake, Indonesia) https://proceedings.stis.ac.id/icdsos/article/view/580 <p>Rawa Pening Lake, one of the 15 national priority lakes in Indonesia, faces a significant threat from invasive water hyacinth (Eichhornia crassipes). This plant once covered up to 70% of the lake's surface and continued to cause ecological and socio-economic impacts as of 2024, necessitating periodic monitoring to prevent future blooms. This study aimed to identify the optimal features to characterize water hyacinth, determine the most effective classification model, and map the plant’s distribution. Adopting the CRISP-DM framework, the study utilized Sentinel-1 (radar) and Sentinel-2 (optical) satellite imagery with multispectral band features, radar bands, and composite indexes. Feature selection was performed using Jenks Natural Breaks, and classification modeling was conducted using Random Forest and Convolutional Neural Network (CNN). The results demonstrated that the CNN achieved higher accuracy in distinguishing among land cover classes. The final mapping identified water hyacinth covering 34,775 pixels, 32,627 pixels, and 34,175 pixels in June, July, and August, respectively.
This approach offers a reliable method for periodic monitoring of water hyacinths in Rawa Pening Lake.</p> Adib Sulthon Muammal, waris marsisno Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/580 Mon, 22 Dec 2025 00:00:00 +0000 Extracting Information on Aspects of Sustainable Tourism in ASEAN Using Named Entity Recognition (NER) https://proceedings.stis.ac.id/icdsos/article/view/601 <p>Sustainable tourism is an important issue in the ASEAN region, which has experienced rapid growth in the tourism sector but faces challenges in maintaining a balance between economic, social, and environmental aspects. Information on sustainability practices is scattered across various forms of text, making it difficult to analyze manually. This study aims to extract information on aspects of sustainability in tourism using a transformer-based Named Entity Recognition (NER) approach. Three data sources were used: government websites, online news, and travel reviews on TripAdvisor. Five transformer models were compared, namely BERT, ALBERT, DistilBERT, ELECTRA, and RoBERTa, to evaluate entity extraction performance. The dataset was divided using an 80:10:10 ratio for training, validation, and testing. The results showed that DistilBERT provided the best performance with a balance of accuracy and computational efficiency. In addition, an analysis of the distribution of sustainability aspects in ASEAN countries and Indonesia in particular was conducted to identify practices that have already been implemented. 
These findings are expected to contribute to the development of more sustainable tourism policies and practices in the ASEAN region and Indonesia.</p> Sisilia Manalu, Yuliagnis Transver Wijaya Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/601 Mon, 22 Dec 2025 00:00:00 +0000 Job Competency Extraction in Information and Technology Sector Using K-Means and Non-Negative Matrix Factorization (NMF) Algorithms https://proceedings.stis.ac.id/icdsos/article/view/684 <div>The advancement of information technology has led to a surge in online job vacancy data, which contains valuable information about the skill demands in the digital labor market. This study aims to extract job competencies in the information and technology sector using a combination of K-Means clustering and Non-Negative Matrix Factorization (NMF). A total of 350 job postings were collected from the Kalibrr platform and processed through web scraping, text preprocessing, and feature representation using TF-IDF. The clustering results indicate that the optimal configuration consists of 10 clusters, as evaluated using the Silhouette Score and Davies-Bouldin Index. Each cluster represents a specific job topic, such as backend development, data science, QA automation, cybersecurity, and digital marketing. The results offer a structured overview of digital skill demands and can be utilized by educational institutions, training providers, and labor policy makers. However, the dataset’s limited size, reliance on a single job platform, and the use of traditional machine learning techniques may not capture all semantic variations and complexities present in the broader job market. Consequently, future work should involve larger and more diverse datasets as well as advanced deep learning text representation approaches to enhance the robustness and generalizability of the results.
</div> Alfitra Rifa Geandra, Amir Mumtaz Siregar, Rani Nooraeni Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/684 Mon, 22 Dec 2025 00:00:00 +0000 Enhanced EV Battery Degradation Modeling in Tropical Environments via CVAE-GRU for Sustainable Transportation https://proceedings.stis.ac.id/icdsos/article/view/610 <div>Electric Vehicle (EV) battery degradation in tropical environments remains poorly understood, with traditional linear models like OLS facing significant challenges such as multicollinearity, leading to unreliable insights into influential factors. This study aims to experimentally characterize lithium-ion battery degradation and comprehensively evaluate the influence of local climatic (temperature, humidity, dust) and driving conditions (road quality, mileage) in a Cameroonian tropical context, addressing the limitations of conventional statistical approaches. Our unique contribution involves providing empirical real-world data from a sub-Saharan environment and applying a novel hybrid CVAE-GRU methodology to capture complex non-linear and temporal dependencies. An embedded system continuously collected battery parameters (SoH, internal resistance) alongside environmental and driving data. The CVAE learns robust latent representations from these correlated inputs, while the GRU models their temporal dynamics for degradation prediction. Results confirm progressive SoH degradation, significantly accelerated by high temperatures, humidity, dust, and poor road quality. The CVAE-GRU approach effectively mitigates multicollinearity, offering superior accuracy and deeper insights into these influences.
This work highlights the critical impact of tropical conditions on EV battery aging, providing crucial findings for developing adapted Battery Management Systems and fostering sustainable mobility in similar regions.</div> Hervé LOTCHOUANG FUSTE, Kibong Marius, Nyatte Steyve, Sapnken Emmanuel, Mewoli Edwige, Tamba Gaston Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/610 Mon, 22 Dec 2025 00:00:00 +0000 Analysis of the Effectiveness of Iterative Prompts in the Integration of Classification and Summarization of User Reports Based on NLP https://proceedings.stis.ac.id/icdsos/article/view/510 <p>User reports submitted through feedback features or ticketing systems provide valuable insights for improving mobile applications. However, the high volume of reports creates challenges for review and decision-making. Effective classification and summarization are therefore essential to manage this information efficiently, allowing developers to quickly identify recurring issues and support data-driven development strategies. This study automates large-scale user feedback processing using Natural Language Processing (NLP) and evaluates multiple language models. The Bigbird-Small model achieved the highest agreement with the majority (81.51%) due to its ability to process long-text contexts. XLM-R-Base performed competitively (78.08%), while BERT-Base and Roberta-Base showed stable performance (75.68% and 74.32%). Distilbert-Base, though more computationally efficient, had slightly lower accuracy (74.32%). For summarization, Simple Prompt and Iterative Prompt approaches were compared. The Iterative Prompt with four iterations performed best, achieving similarity 0.911, compression 0.846, keyword overlap 0.624, and redundancy 0.070. 
These results demonstrate that combining automated classification with iterative summarization can significantly improve both efficiency and accuracy in managing user reports, supporting better decision-making and enhanced mobile app development.</p> Sulisetyo Puji Widodo, Ilmi Aulia Akbar, Waiz Al Qorni, Rifqi Ramadhan, Febi Dwi Haryono Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/510 Mon, 22 Dec 2025 00:00:00 +0000 From Noisy Data to Insight: SOM Filtering Implementation For Improving the Machine Learning Model https://proceedings.stis.ac.id/icdsos/article/view/614 <p>Filtering representative training data from Big Data is a critical step in developing machine learning models, particularly for official statistics. This study demonstrates the application of Self-Organizing Map (SOM) filtering for enhancing training data quality in remote sensing-based classification of paddy phenological stages using satellite data. By clustering the data, SOM identifies and retains representative samples, thereby removing noisy and irrelevant ones. Following the filtering, models developed under several purity threshold schemes are compared with a model developed on the unfiltered dataset. Findings reveal that increasing the purity threshold, which makes the filtering stricter, consistently improves classification performance and accuracy.
The results demonstrate SOM filtering as an effective strategy for improving the representativeness and reliability of training datasets in remote sensing applications, while emphasizing the trade-offs when optimizing machine learning model robustness and generalizability.</p> Achmad Firmansyah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/614 Mon, 22 Dec 2025 00:00:00 +0000 Harnessing the Potential of the Blue Economy in Central Java https://proceedings.stis.ac.id/icdsos/article/view/708 <p>This study pioneers the mapping and analysis of the blue economy's potential across the 35 regencies/municipalities of Central Java by constructing a novel Blue Economy Index (BEI). Notably, this research is among the first in Indonesia to build the BEI using granular satellite data and digital sensor information, and to apply the Two-Step System GMM approach to dynamically analyze the factors influencing its development. This combination provides unprecedented subnational detail and robust insights into effective policy levers. The findings reveal significant disparities among the southern coastal, northern coastal, and non-coastal areas. The southern coastal regions exhibit higher BEI values compared to their northern coastal and non-coastal counterparts, which fall below the average. Results from the Two-Step System GMM regression analysis indicate that internet usage, infrastructure, and the COVID-19 period exert significant effects on the BEI. Specifically, infrastructure development, proxied by Nighttime Light (NTL), demonstrates a negative impact on the BEI, suggesting that environmentally unsustainable infrastructure may undermine the sustainability of the blue economy. Meanwhile, access to digital technology through internet usage plays a crucial role in fostering inclusive blue economy growth.
Based on these findings, the proposed policy recommendations include optimizing environmentally friendly infrastructure development, leveraging digital technology to expand market access, and strengthening the resilience of the blue economy through Adaptive-Responsive-Innovative (ARI) crisis policies. Consequently, the development of the blue economy in Central Java is expected to enhance the sustainable welfare of coastal communities while fully optimizing the potential of coastal areas.</p> Almira Ajeng Pangestika, Dwi Wahyudi, Ridson Al Farizal Pulungan Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/708 Mon, 22 Dec 2025 00:00:00 +0000 Small Area Estimation of Extreme Poverty Using Zero-Inflated Binomial GLMM: A District-Level Case Study in North Sumatra 2024 https://proceedings.stis.ac.id/icdsos/article/view/714 <p>Eradicating extreme poverty is a key objective of Sustainable Development Goal (SDG) 1, with a global benchmark of reducing the proportion of people living below the US$1.90 PPP poverty line. However, in 2024, Indonesia—particularly North Sumatra Province—continues to face persistent challenges in achieving this target. Direct estimation based on the Foster-Greer-Thorbecke (FGT) formula using SUSENAS microdata suffers from large sampling errors (RSE &gt; 25 percent) and zero estimates in multiple districts due to small or absent samples, indicating serious issues of zero inflation and overdispersion. To overcome these limitations, this study applies a model-based Small Area Estimation (SAE) approach using the Zero-Inflated Binomial Generalized Linear Mixed Model (ZIB-GLMM). This method incorporates auxiliary variables from the 2024 PODES dataset and effectively addresses the dual complexities of excess zeros and inter-district variability. 
Simulation results show that ZIB-GLMM outperforms conventional SAE models in terms of predictive accuracy and model stability. The proposed method offers realistic and policy-relevant district-level estimates of extreme poverty, providing robust evidence to inform targeted interventions and strengthen Indonesia’s national agenda to eradicate extreme poverty.</p> Marta Desna Fitria Br. Lumban Gaol, Beta Septi Iryani, Eni Lestariningsih Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/714 Mon, 22 Dec 2025 00:00:00 +0000 Data Collection for Nearest Public Facility Using Ball Tree Algorithm and Google Maps API https://proceedings.stis.ac.id/icdsos/article/view/541 <p>Accessibility to public facilities is a crucial factor in regional development, including<br />at the village level as the smallest administrative unit. The Central Bureau of Statistics (BPS)<br />currently collects data on public facilities and their distances to village offices through<br />interviews, making the results dependent on respondents’ perceptions. This research aims to<br />measure the nearest distance from village offices to public schools by utilizing the BallTree<br />algorithm and the Google Maps API. The dataset consists of 128 village offices and a list of<br />public schools classified into four categories. BallTree was used to filter the nearest school<br />candidates within a given radius, after which the route distance of the ten nearest candidates was<br />calculated using the Google Maps Distance Matrix API to identify the school with the shortest<br />route distance based on the road network. The findings show that straight-line distance often<br />aligns with route distance, although not in all cases, highlighting the importance of Google Maps route<br />calculation. 
This research concludes that combining BallTree and the Google Maps API<br />improves computational efficiency while providing objective and reliable information.</p> Handika Ramadhan Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/541 Mon, 22 Dec 2025 00:00:00 +0000 Disaggregating the Hidden: Small Area Estimates of Child Labor in Bali Province https://proceedings.stis.ac.id/icdsos/article/view/433 <p><span class="NormalTextRun SCXW145894874 BCX0">Child labor remains a critical concern in Indonesia, including in Bali Province, which exhibits a higher prevalence than the national average. However, efforts to formulate effective local policies are often hindered by the unreliability of child labor statistics at the regency/municipality level, primarily due to high Relative Standard Error (RSE) values. This study seeks to produce more reliable estimates of the proportion of child labor at the regency level in Bali through the application of Small Area Estimation (SAE). The analysis utilizes data from the August 2024 Sakernas survey, supplemented with contextual variables from the 2024 PODES dataset. The SAE approach employed was the Hierarchical Bayes method with a Beta distribution (HB-Beta). The findings indicate that the HB-Beta model yields more accurate estimates, as evidenced by RSE values below 25% across all regencies. 
This demonstrates that the HB-Beta model can produce more accurate estimates than direct estimation, as it better reflects differences between regencies and can help design more effective local policies to reduce child labor.</span></p> Ahmad Nadifa Al Agung, Arlita Dwina Firlana Sari, Clarissa Azarine, Lisda Oktaviana, Zidan Akbar Al Aqsha, Nofita Istiana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/433 Mon, 22 Dec 2025 00:00:00 +0000 Logical Modelling of Statistical Data Using the SDMX Standard: Case Study on the Quarterly Gross Regional Domestic Product Table https://proceedings.stis.ac.id/icdsos/article/view/641 <p>Poverty, as a national issue, necessitates data-driven policy planning informed by<br />accurate and consistent statistics. To ensure the optimal quality and consistency of statistical data<br />reporting across diverse regions, the adoption of an international standard is crucial. The<br />Statistical Data and Metadata Exchange (SDMX) standard facilitates the structured exchange of<br />data and metadata. This study aims to design and implement a statistical indicator data model<br />using the SDMX standard to improve table consistency. We utilized Quarterly Provincial Gross<br />Regional Domestic Product (GRDP) data as a case study and applied the Design Science<br />Research Method (DSRM) as the methodology. 
The results demonstrate that modeling the<br />GRDP data using SDMX yields a uniform and highly consistent table structure, significantly<br />enhancing the consistency of statistical data reporting across regions.</p> Kartika Amandasari, Nano Yulian Pratama, Farhan Satria Aditama, Waris Marsisno Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/641 Mon, 22 Dec 2025 00:00:00 +0000 Application of the Geographically Weighted Negative Binomial Regression (GWNBR) Method to Tuberculosis Cases in North Sumatra Province in 2024 https://proceedings.stis.ac.id/icdsos/article/view/474 <p>Tuberculosis is one of the leading causes of death worldwide, with approximately 1.2 million deaths occurring annually. According to the World Health Organization (WHO), Indonesia ranks second in the world for tuberculosis after India, with a 10% prevalence rate (WHO, 2024). According to Ministry of Health data, in 2024, North Sumatra was the province with the highest number of TB cases on Sumatra Island, with a number of cases above the national average, ranking third in Indonesia. The number of tuberculosis cases in North Sumatra is count data that is overdispersed and subject to spatial influences. Therefore, the method used is Geographically Weighted Negative Binomial Regression (GWNBR), which produces local parameters. The results show that GWNBR forms eight regional groups based on significant variables. Rainfall and per capita expenditure have a significant influence in all districts/cities, and the percentage of BCG immunizations and the percentage of the population that smokes have a significant influence in almost all regions. Meanwhile, health fund allocation only shows a significant influence in several districts/cities. The AIC value of the GWNBR is not smaller than the AIC value of the negative binomial regression. 
However, the GWNBR model can be used to examine the influence of independent variables on tuberculosis cases spatially in North Sumatra.</p> Titin Julianti Br Tinambunan, Nisa Hayatun Nufus, Nadia Lutfi Meilawati, Rezky Rahma, Febri Wicaksono Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/474 Mon, 22 Dec 2025 00:00:00 +0000 Development of Portal Pintar Utilization Evaluation Dashboard (Case Study: BPS Province of Bengkulu) https://proceedings.stis.ac.id/icdsos/article/view/581 <p>BPS Statistics Province of Bengkulu (BPS Provinsi Bengkulu) plays a role in<br />supporting statistical operations in the Province of Bengkulu. As a vertical agency of Statistics<br />Indonesia (BPS), BPS Province of Bengkulu also holds an important role in providing statistical<br />data at the regional level. BPS Province of Bengkulu therefore requires an integrated<br />system to facilitate all activities, such as providing easier and faster access to information for all<br />employees, both in reporting work progress and in monitoring the implementation of activities<br />such as agenda planning, facility usage, facility loan management, and cross-unit coordination.<br />Portal Pintar is a portal used to facilitate the management of various activities in BPS Province<br />of Bengkulu. By using Portal Pintar, users can access and manage various types of information<br />and documents, such as activity agendas, correspondence, and facility loan applications. BPS<br />Province of Bengkulu then produces periodic evaluations of Portal Pintar’s utilization, which are<br />distributed to all employees. However, these evaluations are not yet visualized<br />automatically and in real time, hence the need to develop a Portal Pintar Utilization Evaluation<br />Dashboard in which visualizations are generated automatically and connected to Portal Pintar’s<br />API. 
Through the development of this dashboard, it is expected that the evaluation of Portal<br />Pintar’s utilization will become more integrated.</p> Bony Parulian Josaphat, Rifka Humaira Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/581 Mon, 22 Dec 2025 00:00:00 +0000 Do Extracurricular Activities give ‘Extra’ on Academic Performance? Evidence from Propensity Score Matching Methods https://proceedings.stis.ac.id/icdsos/article/view/498 <p>This study compares different statistical methods to determine whether participating<br />in extracurricular activities helps improve students’ academic performance. Using a dataset<br />of 1,000 students, the study balances students who did and did not take part in extracurriculars<br />by adjusting for factors like study hours and attendance. It compares Nearest Mahalanobis<br />Distance, Nearest Neighbor Matching (with and without a caliper), Optimal Pair Matching,<br />Optimal Full Matching, Coarsened Exact Matching (CEM), and Inverse Probability Weighting<br />(IPW) based on covariate balance, sample retention, and average treatment effect. Results reveal<br />that IPW performs best on covariate balance, reducing nearly all standardized mean<br />differences to near zero while retaining the majority of the dataset. Nearest Neighbor Matching<br />with Caliper and Optimal Pair Matching also perform well, with significant treatment effect<br />estimates and relatively strong model fits. However, each method involves trade-offs:<br />IPW excels in covariate balance but has a higher AIC, a slight compromise in model fit, while<br />Nearest Neighbor Matching with Caliper offers a balance between precision, model fit, and<br />sample retention. 
In contrast, CEM provides strong covariate balance for categorical variables<br />but results in significant sample loss, demonstrating the trade-off between strict matching criteria<br />and practical applicability. Conversely, Nearest Neighbor Matching without Caliper performed<br />poorly in balancing covariates. As evidenced by the average treatment effect estimates derived<br />from the propensity score matching (PSM) methods, this study concludes that participation in<br />extracurricular activities has a positive and significant impact on students' academic<br />performance, with study hours, attendance, and resource accessibility emerging as critical factors<br />as well. The novelty of this study is in comparing multiple statistical matching approaches side<br />by side in an educational context, providing guidance for researchers and policymakers.</p> Bryan Nozaleda Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/498 Mon, 22 Dec 2025 00:00:00 +0000 Determinants of Comprehensive Understanding of Stunting among Indonesian Pregnant Women and Mothers of Toddlers Aged 0–23 Months in 2023 https://proceedings.stis.ac.id/icdsos/article/view/688 <p>Stunting is a chronic nutritional disorder that remains a priority in Indonesia. In line with<br />the second goal of the SDGs (zero hunger), the Ministry of Health (MoH) has implemented a<br />communication strategy for behavioural change and community empowerment through a class<br />program for pregnant women and mothers of toddlers using the Maternal and Child Health<br />(MCH) book. However, the program has not yet been optimal in increasing the understanding of stunting. The 2023<br />Indonesian Health Survey (IHS) shows that women in Indonesia still have a poor comprehensive<br />understanding of stunting. This includes pregnant women and breastfeeding mothers, who are key<br />target groups for stunting reduction. 
This study aims to describe and analyse the characteristics<br />of Indonesian pregnant women and mothers of toddlers aged 0–23 months that significantly<br />influence their level of comprehensive understanding of stunting. Data from the 2023 IHS were<br />analysed using descriptive statistics with graphs and tables, together with inferential analysis<br />through ordinal logistic regression using the Proportional Odds Model (POM). The results show<br />that the majority of these mothers have a poor level of comprehensive understanding of stunting,<br />with five variables having a significant influence, namely: access to information, education level,<br />employment status, socioeconomic status, and residence area.</p> Agnes Rosihan Kristianti Silalahi, Rini Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/688 Mon, 22 Dec 2025 00:00:00 +0000 Did the Digital Push Last? E-Commerce and Rural Agricultural Earnings in Indonesia During and After COVID-19, Evidence from Sakernas https://proceedings.stis.ac.id/icdsos/article/view/623 <p>This paper examines the impact of e-commerce adoption on earnings and income<br />distribution among rural agricultural employers in Indonesia, both during and after the COVID-19 pandemic. Using microdata from the National Labour Force Survey/Sakernas (2018–2024)<br />and applying probit, OLS, Propensity Score Matching, and quantile regression models, we<br />identify the determinants of adoption and its impact on earnings. Adoption was strongly driven<br />by education, training, and enterprise characteristics, while older age and reliance on unpaid<br />household labor constrained uptake. Results show that e-commerce adopters earned substantially<br />more than non-adopters (by more than 30 percent) both during and after the pandemic, confirming<br />sustained income gains beyond the crisis. 
Quantile regressions reveal that the lowest-income<br />employers benefited most, with earnings gains exceeding 50 percent at the bottom quantile<br />during the pandemic. Although relative advantages shifted toward higher earners after the<br />pandemic, large and significant effects remained for the lowest-income groups. These findings<br />indicate that e-commerce not only enhances market access but also contributes to improving<br />income distribution. Policy interventions to strengthen digital literacy, rural infrastructure, and<br />financial access are essential to preserve its inclusive role and ensure that vulnerable agricultural<br />employers continue to benefit disproportionately.</p> Kadir Ruslan, Weni Lidya Sukma Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/623 Mon, 22 Dec 2025 00:00:00 +0000 The Individual and Contextual Factors of Precarious Employee Status of Youth Workers in Indonesia 2024: Application Multilevel Binary Logistic Regression https://proceedings.stis.ac.id/icdsos/article/view/561 <p>Human resources are a strategic component for countries in achieving development<br />goals and promoting progress. Among age groups, youth play an important role as drivers of a<br />country's development. However, the challenge of obtaining decent work is a serious problem<br />that forces many young people in Indonesia into precarious employment. Within the last<br />four years, the Precarious Employment Rate (PER) of young people in Indonesia in 2024<br />increased dramatically compared to the previous year, even becoming the highest among all age<br />groups. This study aims to describe the general picture and analyze the individual and<br />contextual factors that influence the status of precarious employees among youth workers in<br />Indonesia. The analysis method used is multilevel binary logistic regression. 
The results of the<br />study show that 85.97 percent of youth workers in Indonesia have precarious employee status.<br />The analysis shows that individual factors such as gender, marital status, education level,<br />participation in training, regional classification, employment sector, labor union membership,<br />and contextual factors such as the provincial minimum wage have a significant effect on the<br />precarious employee status of youth workers in Indonesia in 2024.</p> Arya Samuel Mandy, Sugiarto . Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/561 Mon, 22 Dec 2025 00:00:00 +0000 Regional Clustering of Food Insecurity to Support the Attainment of SDG 2: Zero Hunger through Machine Learning Approaches https://proceedings.stis.ac.id/icdsos/article/view/475 <p>Food security remains a persistent development challenge in Indonesia, with regional disparities posing significant barriers to achieving equitable access to nutritious and sufficient food. This study aims to classify and cluster districts and cities in Indonesia based on their food security vulnerability levels, thereby supporting the attainment of SDG 2: Zero Hunger. We employed a machine learning approach using a dataset of 514 regions and nine food security indicators sourced from national databases. The classification phase compared three algorithms, Random Forest, XGBoost, and LightGBM, under multiple data preprocessing scenarios, including outlier handling (IQR and Isolation Forest) and class balancing (SMOTE). LightGBM with IQR preprocessing delivered the best performance, achieving an accuracy and F1-score of 0.984. For clustering, DBSCAN and HDBSCAN were applied using the six most important features identified by the classifier. 
DBSCAN showed slightly better performance based on Silhouette Score (0.5639), resulting in three regional groupings: food-secure, highly vulnerable, and outlier regions. The analysis revealed that socio-economic factors and access to basic infrastructure remain critical determinants of food insecurity. The results underscore the importance of data-driven approaches in policy formulation and highlight the value of machine learning in producing more targeted, efficient, and adaptive food security interventions in Indonesia.</p> Siti Nuradilla, Wawan Saputra, Muhammad Rizal Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/475 Mon, 22 Dec 2025 00:00:00 +0000 Promoting Peaceful and Inclusive Information Security Compliance: A Systematic Review of Assurance Behavior in IT Employees within the Context of SDG-16 in Malaysia https://proceedings.stis.ac.id/icdsos/article/view/508 <p>This systematic review examines the alignment between IT employees' desire,<br />intention, and compliance with information security protocols, a critical issue in Malaysia where<br />human error is a leading cause of data breaches. Situated within the context of Sustainable<br />Development Goal 16 (SDG-16), the study analyzes 30 peer-reviewed articles to identify key<br />behavioral factors. Findings indicate that while training improves knowledge, its impact on long-term behavior is limited. A significant compliance gap is driven by psychological factors like<br />work overload and optimism bias, as well as organizational elements such as culture and<br />management support. The review concludes that effective information security assurance<br />requires a holistic strategy integrating tailored, ethical training with strong organizational support<br />to mitigate psychological strain and foster a robust security culture. 
This approach is essential<br />not only for strengthening cybersecurity but also for supporting Malaysia's commitment to digital<br />resilience and the principles of SDG-16.</p> Aziela Isma Zarilla Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/508 Mon, 22 Dec 2025 00:00:00 +0000 Enhancing Poverty Rates Reliability Using Small Area Estimation https://proceedings.stis.ac.id/icdsos/article/view/695 <p>This study systematically compares the performance of three Small Area Estimation<br />(SAE) methods, namely Empirical Best Linear Unbiased Predictor (EBLUP), Hierarchical Bayes (HB)<br />Beta, and HB Flexible Beta, using two different auxiliary data sources: Village Potential<br />(Podes) and Socio-Economic Registration (Regsosek) data. The SAE methodologies were<br />applied in a case study focusing on Java Island, Indonesia. Direct estimates still have high<br />Relative Standard Errors (RSE) above 25%, indicating low reliability. EBLUP methods<br />improved estimate reliability but still produced some unreliable estimates. The HB Beta method<br />further reduced RSE values, while the HB Flexible Beta model achieved the lowest RSE,<br />eliminating all unreliable estimates. Moreover, Socio-Economic Registration data consistently<br />resulted in lower RSE values compared to Village Potential data, particularly when used with<br />the HB Flexible Beta model. 
These results highlight that integrating advanced SAE models such<br />as HB Flexible Beta with high-quality administrative data such as Socio-Economic Registration<br />data is crucial for producing reliable and precise poverty estimates for more targeted and<br />effective poverty alleviation policies.</p> Novia Permatasari Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/695 Mon, 22 Dec 2025 00:00:00 +0000 Estimating the Unemployment Rate at Sub-District Level in West Java Province in 2024 Using Hierarchical Bayesian Approach with Cluster Information https://proceedings.stis.ac.id/icdsos/article/view/518 <p>Unemployment is a substantial obstacle to growth in Indonesia, affecting both social<br />and economic stability. The unemployment rate is a crucial metric that quantifies the proportion<br />of the labor force actively pursuing work opportunities. It serves as a<br />critical indicator of labor market imbalances, essential for labor policy formulation and<br />assessment. Nonetheless, unemployment data has limitations, particularly at the micro-level,<br />owing to sample constraints. Small Area Estimation (SAE) can address these constraints. This<br />study estimates the unemployment rate at the sub-district level in West Java province for 2024<br />utilizing the Hierarchical Bayes Beta methodology and clustering techniques. 
The modeling<br />results indicate that most sub-districts exhibit a low to medium unemployment rate; however, 21<br />locations show a very high unemployment rate, ranging from 23.00 percent to 48.06<br />percent.</p> Randy Daffa Aditya, Awika Yuliati Zukhrufah, Eksis Auliya, Dyah Widyastuti, Adrian Kesar Pratama Lubis, Anggie Dwi Nugraha, Siti Muchlisoh Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/518 Mon, 22 Dec 2025 00:00:00 +0000 A Hybrid Method for Standardising Civil Registration and Vital Statistics (CRVS) Location Data https://proceedings.stis.ac.id/icdsos/article/view/618 <p>Civil Registration and Vital Statistics (CRVS) systems in archipelagic contexts like<br />Indonesia face persistent challenges in location data standardisation due to free-text entries that<br />vary in spelling, formatting, and granularity. This study introduces a multi-stage hybrid<br />framework that systematically converts these unstructured entries into official administrative<br />codes using deterministic matching, fuzzy probabilistic matching, and geocoding. This study<br />processed 841,126 birth and death records using Python (Pandas, RapidFuzz, Geopy).<br />Cumulatively, all stages achieved a combined match rate of 85.44% for births and 67.12% for<br />deaths. The layered pipeline ensured speed, precision, and coverage for real-world CRVS data.<br />The findings demonstrate enhanced geographic precision in vital statistics, enabling more<br />reliable public health and demographic applications. Future improvements may include<br />transformer-based embeddings, active learning for ambiguous records, and uncertainty-aware<br />geocoding techniques. 
This framework establishes a scalable, robust pathway for elevating the<br />granularity and reliability of geolocated vital event data.</p> Ignatius Sandyawan, Yeni Rimawati, Ari Rismansyah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/618 Mon, 22 Dec 2025 00:00:00 +0000 Comparison of Imputation Methods: Traditional, Machine Learning, and Deep Learning on Multivariate Time Series with MCAR and MNAR https://proceedings.stis.ac.id/icdsos/article/view/707 <p>This study compares the methods of Linear Interpolation, Kalman Filtering, SVR, and RNN-GRU for multivariate time series that exhibit linear trends and seasonality. Synthetic data for three variables were generated for small, medium, and large sample sizes. Missing values were systematically inserted using Missing Completely at Random (MCAR) and Missing Not at Random (MNAR) patterns with proportions of 10%, 20%, and 35%. The accuracy of imputation was evaluated using RMSE, MAPE, and R² over 150 simulation repetitions per scenario. The results indicate that each method has advantages under certain conditions. Linear Interpolation is suitable for data with linear trends, small sample sizes, and low to moderate missingness levels, and is effective for both MCAR and MNAR patterns. Kalman Filtering is optimal for medium to large datasets, particularly in handling linear and seasonal trend patterns with high proportions of missing data due to MCAR. SVR excels in large seasonal data scenarios with MNAR missingness patterns. RNN-GRU performs well under low missingness conditions, particularly for small seasonal datasets with MNAR patterns. 
These findings emphasise that the choice of imputation method should consider data size, trend patterns, and the missing data mechanism to minimise bias and preserve the integrity of the temporal structure.</p> Ferigo Taufani Tri Hakiki, Naufal Luthfan Tasbihi, Akila Akhtar El Dafi, Nurfaudzan ., Andi Shahifah Muthahharah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/707 Mon, 22 Dec 2025 00:00:00 +0000 The Influence of Child, Households, and Villages/Sub-Districts Characteristics on The Working Status of Children in East Nusa Tenggara Province 2024 https://proceedings.stis.ac.id/icdsos/article/view/629 <p>The percentage of the poor population in East Nusa Tenggara Province was the<br />fourth highest in Indonesia in 2024, yet the province had the highest percentage of child labor in Indonesia. The<br />purpose of this study is to describe child labor in East Nusa Tenggara Province in 2024, the factors<br />that influence it, and the tendencies of those factors. The unit of analysis was children aged 10-17<br />years who were unmarried and not heads of household, with a sample of 9,117 children from<br />6,123 households and 1,165 villages/sub-districts. The data used are the March Susenas Kor and Module data,<br />as well as Podes 2024, sourced from BPS. The analysis method in this study is multilevel<br />binary logistic regression. The results of the study show that working children tend to be boys aged<br />15-17 years. They tend to live in households whose head has a low level of education and<br />works in the agricultural sector, that have few members of productive age, and that own<br />micro and small enterprises, and in villages/sub-districts with many micro and small<br />industries where agriculture is the main source of income for most of the population.</p> Angga Prayoga, Budiasih . 
Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/629 Mon, 22 Dec 2025 00:00:00 +0000 Estimation of Energy Transition Index based on Official Statistics and Satellite Imagery Data https://proceedings.stis.ac.id/icdsos/article/view/724 <p>Energy plays a crucial role in sustaining human life, and its use should be optimized based on the principles of sustainable development through a shift from non-renewable to renewable sources. To monitor this shift, the World Economic Forum (WEF) developed the Energy Transition Index (ETI), which measures national-level transitions using conventional statistical data. However, the ETI is limited to the country level, while more detailed assessments are needed at smaller administrative scales such as regencies and cities to capture regional specificities. This study addresses the gap by constructing an energy transition index at the regency/city level in Indonesia for 2024. The analysis integrates official statistics with satellite imagery data to overcome limitations in subnational data availability. Methodologically, Exploratory Factor Analysis and uncertainty analysis were applied. Among the five uncertainty analysis scenarios tested, scenario 1, featuring min-max normalization, unequal weighting across indicators and factors, and linear aggregation, produced the most reliable results. The findings reveal that the index is composed of four main factors. Overall, Indonesia’s energy transition index values show a relatively even distribution, yet disparities remain evident across islands and between regencies/cities. 
Higher scores are concentrated in the western regions, while lower scores dominate the eastern parts of the country.</p> Sabilla Hamda Syahputri, Waris Marsisno Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/724 Mon, 22 Dec 2025 00:00:00 +0000 Analysis of Factors Affecting Deforestation in Riau From 2001 To 2023 Using The ARDL Approach https://proceedings.stis.ac.id/icdsos/article/view/556 <p>Forests are one of the most important elements for human life. One of Indonesia's problems for decades has been high rates of deforestation. Riau is the province with the highest total deforestation in Indonesia in the last 23 years. The government has implemented various measures to achieve both short-term and long-term targets related to reducing deforestation. Therefore, this study aims to analyze the variables suspected of influencing deforestation in the short and long term using the Autoregressive Distributed Lag (ARDL) approach. The results of the study indicate that the variables influencing deforestation in Riau Province in the short term are the GDP of the agriculture, forestry, and fisheries sectors and forest and land fires. In the long term, the significant influencing variables are the GDP of the agriculture, forestry, and fisheries sectors, the implementation of Law No. 18 of 2013, and the extent of forest and land fires. Based on these findings, in the short term, the government is expected to transform the agricultural sector economy toward a more sustainable direction and halt the clearing of forest areas for oil palm plantations, especially those conducted through forest burning. 
In the long term, the government should further strengthen the implementation of the law.</p> I Wayan Divandra Maharesandya Sukajaya, Efri Diah Utami Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/556 Mon, 22 Dec 2025 00:00:00 +0000 Spatial Model for Food Security in Eastern Indonesia 2024 https://proceedings.stis.ac.id/icdsos/article/view/468 <p>Food security is the condition of meeting food needs for the country down to the individual level, as measured by the availability, affordability, utilization, and stability of food. Despite being a basic human need, food security in Indonesia is not evenly distributed, especially in Eastern Indonesia. Given this disparity, this study aims to determine the general picture of food security and the factors influencing it in districts/cities in Eastern Indonesia in 2024. The method used is the Spatial Durbin Model (SDM) with an inverse distance weighting matrix. 
The results show that the variables Distribution of GRDP of Sector Agriculture, Forestry and Fishing, Poverty Rate, Average Years of Schooling, Lag of Food Security Index, Lag of Open Unemployment Rate, and Lag of Poverty Rate have a significant influence on the Food Security Index variable in districts/cities in Eastern Indonesia in 2024.</p> Fathiyah Nur Shohwah, Imam Fathoni Arufi, Mohammad Iqbal Wicaksono, Nadia Lutfi Meilawati, Nilam Cahya Meilani, Gama Putra Danu Sohibien Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/468 Mon, 22 Dec 2025 00:00:00 +0000 Improving The Accuracy of Area Sampling Frame Estimators for Agricultural Surveys Using Unequal Clustered Segment Sampling: The Case of Indonesia https://proceedings.stis.ac.id/icdsos/article/view/477 <p>Accurate rice production data are vital for maintaining national food security and formulating effective agricultural policies. In Indonesia, the Area Sampling Frame (KSA) method has been widely implemented to estimate rice harvest areas using segments of 300 meters×300 meters represented by nine observation points. However, this approach faces limitations, particularly the risk of undercoverage bias when estimating areas across different rice growth stages, especially if the observation points fall outside the target rice-growing regions as population area. To address this issue, the present study introduces the Unequal Clustered Segment Sampling method as an alternative to the traditional KSA approach. The Unequal Clustered Segment Sampling method improves estimation accuracy by refining the sampling frame and excluding non-target segments, spatial points located outside actual rice-growing regions. Through a design-based estimation framework, the proposed method accounts for unequal cluster sizes, allowing a more representative depiction of field conditions. 
The empirical results demonstrate that the Unequal Clustered Segment Sampling method significantly reduces bias and enhances the precision of rice area estimates compared to the conventional KSA. These findings suggest that incorporating unequal clustered segment sampling designs into KSA-based surveys can yield more reliable and representative estimates, particularly in heterogeneous or fragmented agricultural landscapes.</p> Hazanul Zikra, Widyo Pura Buana, Yocco Bimarta, Nurina Paramitasari Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/477 Mon, 22 Dec 2025 00:00:00 +0000 A Multi-Temporal Remote Sensing Approach to Quantify Land Cover Change and its Impact on Ecosystem Sustainability in Riau, Indonesia https://proceedings.stis.ac.id/icdsos/article/view/691 <p>This study analyzes land cover change in Riau Province from 2015 to 2024, focusing on deforestation and degradation as indicators of ecosystem sustainability. Landsat 8 OLI/TIRS and Landsat 9 OLI-2 imagery processed in Google Earth Engine (GEE), combined with MODIS hotspot data (MOD14A1) and socioeconomic indicators, namely Gross Regional Domestic Product (GRDP) and Open Unemployment Rate (OUR) from Statistics Indonesia (BPS), were used to assess spatiotemporal patterns. The Normalized Difference Vegetation Index (NDVI) was applied with thresholds for deforestation (NDVI &lt; –0.3) and degradation (–0.3 ≤ NDVI ≤ –0.1). Results show that 2015 was the most severe period, dominated by peatland fires, while 2019 recorded forest loss at a lower intensity and 2020–2024 indicated partial vegetation recovery linked to restoration efforts. Pelalawan, Indragiri Hilir, and Kampar were the most affected districts. 
Correlation analysis revealed that fire hotspots had the strongest association with land cover change, while economic and social indicators showed weaker relationships. Peatland fires remain the main driver of land degradation, emphasizing the need to strengthen fire management, peatland protection, and sustainable plantation governance to support Sustainable Development Goal (SDG) 15 on Life on Land, particularly the target of Land Degradation Neutrality (15.3.1) by 2030.</p> Novri, Fitri, Muqiit Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/691 Mon, 22 Dec 2025 00:00:00 +0000 Intersectoral Linkages and Spillover Effects in South Sumatra’s Economy: Evidence from the 2016 Interregional Input–Output Table and 2024 Input–Output Table https://proceedings.stis.ac.id/icdsos/article/view/631 <p>This study examines South Sumatra’s economic structure using interregional input–output analysis to identify key sectors and quantify spillover effects. A dual-dataset approach employs the 2016 IRIO table for interprovincial trade dynamics and the 2024 IO table for current sectoral analysis. Results indicate a domestically oriented economy, with 88.45% of supply met by internal production. Manufacturing and construction emerge as central hubs with strong intersectoral linkages, supported by agriculture and mining as upstream suppliers. Interregional trade is concentrated with nearby Sumatran provinces and Java’s industrial centers. Spillover effects benefit Jambi, Bengkulu, and Banten, while feedback effects show dependency on Java. Output multipliers highlight electricity and gas as key growth drivers, whereas agriculture and real estate contribute most to local income. These patterns reveal a structural divergence between growth and inclusivity. 
To address this, the study recommends a dual-track strategy: scale up manufacturing and energy to drive aggregate output, while modernizing agriculture and high-value services to support income distribution. Strengthening interprovincial corridors and deepening local supply chains can further enhance resilience and expand the province’s role in national development.</p> Marpaleni, Mardiana, Anggi Dwi Puspita, Indhira Putri Rama Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/631 Mon, 22 Dec 2025 00:00:00 +0000 Dynamic Linkages and Monetary Policy Transmission in the Cryptocurrency Market: A Vector Autoregressive Study of Bitcoin, Ethereum, and The Fed's Interest Rate https://proceedings.stis.ac.id/icdsos/article/view/727 <p>The cryptocurrency market, characterized by high volatility, has evolved into a significant financial asset class, attracting both retail and institutional investors. Understanding its interconnectedness with macroeconomic factors is crucial for risk management and financial stability. This study empirically analyzes the dynamic relationships between two primary crypto assets, Bitcoin (BTC) and Ethereum (ETH), and the monetary policy shifts of the U.S. Federal Reserve (The Fed). Using a Vector Autoregression (VAR) model on daily time-series data from January 1, 2022, to June 16, 2025, this research investigates the short-term dynamics, Granger causality, and shock transmissions within this system. The findings reveal a significant one-way causal relationship from The Fed's interest rate changes to both Bitcoin and Ethereum returns, challenging the weak-form Efficient Market Hypothesis. Furthermore, Impulse Response Function (IRF) and Forecast Error Variance Decomposition (FEVD) analyses provide robust evidence of Bitcoin's market leadership, with shocks in Bitcoin explaining nearly 70% of the variance in Ethereum's movements. 
These results highlight a clear hierarchical structure: The Fed influences broad market sentiment, while Bitcoin leads internal market dynamics, offering critical insights for investors and policymakers navigating the digital asset ecosystem.</p> Muhammad Zaki Azhari, M A A Ghiffari, A Ghiffari Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/727 Mon, 22 Dec 2025 00:00:00 +0000 Parameter Estimation in Hierarchical Models: A Comparison of Bayesian and SGD-Adam Approaches on Biomass Data of Lutjanidae https://proceedings.stis.ac.id/icdsos/article/view/428 <p>Hierarchical statistical models are widely used to analyse data with nested structures or repeated measurements, allowing variability across levels to be partitioned and providing more accurate parameter estimation than standard regression models. In the Bayesian framework, parameter estimation often uses Markov Chain Monte Carlo (MCMC), which accommodates complex structures and yields full posterior distributions. However, MCMC is computationally intensive, limiting scalability for large datasets. Recent advances in optimization methods, such as Hierarchical Stochastic Gradient Descent (HSGD) with Adaptive Moment Estimation (Adam), offer a faster and more efficient alternative for hierarchical models. This study applies Hierarchical Bayesian and HSGD-Adam approaches to fish biomass data of the family Lutjanidae from seven Marine Protected Areas (MPAs) in Raja Ampat, Indonesia. The model incorporates ecological predictors such as hard coral cover, distance to the nearest village and period of monitoring, with random effects for area of MPA. Comparison of predictive performance showed that the Bayesian model performed slightly better in RMSE, indicating its ability to capture extreme biomass variations, while SGD-Adam model achieved a lower MAE, reflecting greater stability in prediction. 
These findings demonstrate that advanced hierarchical modelling methods can enhance ecological data analysis and provide timely, data-driven insights for sustainable marine conservation policy.</p> Dariani Matualage, K Sadik, A Kurnia, H F Monim, F Pakiding Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/428 Mon, 22 Dec 2025 00:00:00 +0000 Correlation Analysis of Seasonal Changes on Aerosol Concentration Using Remote Sensing in Java Island https://proceedings.stis.ac.id/icdsos/article/view/636 <p>Aerosols are small particles in the atmosphere that affect the climate through direct and indirect mechanisms. Aerosols can influence the climate and play a role in cloud formation and precipitation. This study aims to analyze the relationship between seasonal changes and aerosol concentrations, and to identify parameters that influence aerosol concentrations in Java Island using remote sensing. The method used in this study is the Pearson correlation test to determine the relationship between seasonal changes and aerosol concentrations in the atmosphere. The results show that there is a relationship between Aerosol Optical Depth (AOD) and rainfall with a correlation value (R) of 0.8. This result indicates a significant relationship between the two variables. Meanwhile, the analysis results between Aerosol Optical Depth (AOD) and wind speed show a correlation value (R) of 0.05. 
This result indicates that the relationship between Aerosol Optical Depth (AOD) and wind speed is very weak.</p> Garda Asa Muhammad, Annisa Amaanah, Vanya Chathy Kemala Dewi Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/636 Mon, 22 Dec 2025 00:00:00 +0000 Extreme Value Theory: Modelling Catastrophic Losses In Sports Injury https://proceedings.stis.ac.id/icdsos/article/view/736 <p>Using Extreme Value Theory with a peaks-over-threshold method, we modelled the top 2% of sports-injury losses from 200,000 simulated claims. A generalized Pareto fit via MLE yielded a positive shape (ξ = 0.783), indicating a fat tail where rare injuries dominate severity. Q–Q and P–P diagnostics show good agreement between model and data. The implied 100-year loss is around 3.31 billion (currency units), and TVaR confirms that, conditional on approaching the tail, predicted losses increase quickly. These findings support the need for capital buffers to mitigate costly injuries, severe-scenario stress testing, and pricing loadings that specifically account for costly but rare injuries.</p> Adriano Juwono Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/736 Mon, 22 Dec 2025 00:00:00 +0000 Deciphering Student Academic Success: Bayesian Analytical Insights https://proceedings.stis.ac.id/icdsos/article/view/559 <p>This study delves into the factors influencing students’ academic achievement utilizing Bayesian mixed effect models. It presents five distinct models, each integrating various fixed variables such as gender, playing hours, stress level, and travelling hours, alongside random variables such as school level and type of school. 
These models are evaluated using the Leave-One-Out Information Criterion (LOOIC) to gauge their adequacy in fitting the data and predicting outcomes. The findings unveil that the inclusion of additional factors, such as school characteristics and students' activities, modifies the relationship between gender and academic success, with gender exerting a diminishing influence as more variables are incorporated. Additionally, stress level and travelling hours emerge as noteworthy predictors of average marks. Among the models assessed, the one incorporating gender, playing hours, and stress level as fixed effects, alongside school level and type as random effects, demonstrates superior fit and predictive capability. This underscores the significance of considering both individual traits and contextual elements in comprehending academic performance.</p> V Suriya Kannan, S Lakshmi, Reshmavathi . Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/559 Mon, 22 Dec 2025 00:00:00 +0000 Spatial Spillover Effects in Food Security: A Spatial Lag Fixed Effects Model for Regencies and Cities in West Sumatra (2019–2023) https://proceedings.stis.ac.id/icdsos/article/view/485 <p>Food security is a key pillar of national development, reflecting a region’s ability to sustain food availability, accessibility, utilization, and stability. The Food Security Index (FSI) serves as a crucial measure of this capability. Based on 2023 data, West Sumatra Province achieved the highest FSI score on the island of Sumatra. This study analyzes food security in 19 regencies and cities of West Sumatra from 2019 to 2023 using a Spatial Lag Fixed Effects Model. The research integrates spatial analysis and panel data approaches to identify determinants of the FSI and assess spatial spillover effects between regions. 
Secondary data were obtained from the Statistics Agency (BPS) and the National Food Agency. The results reveal significant spatial autocorrelation in most years, except 2023. The best-fitting model is the Spatial Lag Fixed Effects Model. Changes in land area, food expenditure, and rice productivity significantly improve FSI, while non-food expenditure and economic growth do not show a positive effect. The findings emphasize the importance of incorporating spatial dependencies in regional food security policies. Moreover, significant spillover effects indicate that improvements in one area can influence neighboring regions. Therefore, inter-regional cooperation and integrated food distribution policies are essential to achieving sustainable food security.</p> Fadhel Imam Haichal Tanjung, Erwin Tanur Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/485 Mon, 22 Dec 2025 00:00:00 +0000 Spatial Modelling of the Relationship Between the Characteristics of Vegetation Index, Life Expectancy and Fertility Rate in Banten Province https://proceedings.stis.ac.id/icdsos/article/view/659 <p>Rapid urbanization in Banten Province has reduced green open spaces, impacting environmental sustainability and demographic dynamics. This study analyzes the spatial relationship between vegetation index, life expectancy (LE), and total fertility rate (TFR) using Landsat 8 imagery (2020–2024) and demographic data from the Central Bureau of Statistics (BPS). The vegetation index, measured using the Normalized Difference Vegetation Index (NDVI), was examined alongside LE and TFR through Pearson correlation and Moran’s I spatial autocorrelation. The results indicate a moderate negative correlation between NDVI and LE (r = -0.561, p &lt; 0.05) and a strong negative correlation between LE and TFR (r ≈ -0.94). 
Urban areas such as Tangerang City and South Tangerang City, despite having low vegetation cover, recorded higher LE due to adequate healthcare access. Conversely, rural areas with greater vegetation tended to have lower LE. Spatial analysis identified urban centers as hotspots with high LE, while rural regions appeared as coldspots. These findings confirm that healthcare access and socioeconomic factors can compensate for limited vegetation, while demographic transitions contribute to fertility decline, ultimately supporting sustainable development in Banten Province.</p> Ahmad Syuhada Islami Asyari, Diana Sumirah, Syaefunnisa ., Achmad Fadhilah, Andika Permadi Putra Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/659 Mon, 22 Dec 2025 00:00:00 +0000 The Influences of Climate Change and Social Vulnerability on Dengue Fever Incidence Rate in West Java Province 2019–2023 https://proceedings.stis.ac.id/icdsos/article/view/606 <p>In Indonesia, dengue fever is a serious public health problem. The increase in dengue fever cases is influenced by climate change and social vulnerability factors. This study focuses on West Java Province in 2019–2023, aiming to describe the spatial-temporal pattern of dengue fever incidence and analyze the influence of climate factors and social vulnerability using a spatial-temporal model, namely Geographically Temporally Weighted Regression (GTWR). The exploration results show a high concentration of dengue fever incidence rates in 2019, while in 2023, the intensity of dengue fever incidence decreases. 
The GTWR model produces local parameters across various regions and time periods, indicating that in most regencies/cities, rainfall, population density, access to inadequate sanitation, health facility ratio, and education level have a positive effect on dengue fever incidence rates, while land surface temperature and the percentage of poor people have a negative effect. From the GTWR model results, areas with high levels of dengue fever vulnerability can be identified as priorities for dengue fever management interventions. Therefore, this study contributes to early warning research and dengue fever control program planning by considering the risk of dengue fever vulnerability in each region.</p> Alwan Nabil Hanif, Gama Putra Danu Sohibien, Ika Yuni Wulansari Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/606 Mon, 22 Dec 2025 00:00:00 +0000 Applied Bayesian Analysis of Intergenerational Fingerprint Pattern Similarity https://proceedings.stis.ac.id/icdsos/article/view/611 <p>This research reports on the inheritance of fingerprint types across three generations of families. Bayesian statistical analysis indicates a moderate transference of loops and whorls between generations (grandfather, father, son), with negligible transference for arches and only joint moderate evidence across all three generations. A total of 150 samples from 50 family trios were analyzed, with fingerprints classified as Arch, Ulnar/Radial Loop, Composite, and Whorl. Cross-tabulation showed the highest transference in Ulnar/Radial Loops, followed by Whorls, with minimal transference for Arches and Composites. The Bayesian correlation analysis of father &amp; grandfather and son &amp; father showed strong similarities between generations (father &amp; grandfather - Pearson r = 0.283, BF₁₀ = 44.74; Kendall’s τB = 0.255, BF₁₀ = 4650.48) and substantial evidence for the association between sons and fathers. 
The analysis showed negligible transference between sons and grandfathers. Bayesian regression and model comparisons supported the null model, with very low R² values (0.003–0.012), indicating minimal predictive influence of parental patterns on the son’s fingerprint phenotype. Overall, the findings indicate moderate hereditary continuity of fingerprint patterns between successive generations, but weak evidence for transmission across all three generations. This suggests that fingerprint inheritance is complex, influenced by both genetic and developmental-environmental factors affecting dermatoglyphic patterns.</p> Aswini N K , RUDRANK SHUKLA, M C Janaki Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/611 Mon, 22 Dec 2025 00:00:00 +0000 Implementing LSTM-Based Deep Learning for Forecasting Food Commodity Prices with High Volatility: A Case Study in East Java Province https://proceedings.stis.ac.id/icdsos/article/view/692 <p>Accurate food price forecasting is essential for maintaining market stability and food security. East Java Province was selected as the study area because it is one of Indonesia’s main food production centers and a major contributor to national inflation. This study compares three deep learning architectures (LSTM, Bi-LSTM, and hybrid CNN-LSTM) to forecast the prices of four key food commodities (red chili, shallots, medium-grade rice, and beef) in East Java. Hyperparameter tuning was performed using grid search, and performance was evaluated using MAPE, MAE, and RMSE. The results show that the Bi-LSTM model consistently provides the best performance compared to LSTM and CNN-LSTM across the four analyzed commodities. Based on MAPE, MAE, and RMSE values, Bi-LSTM achieved the lowest forecasting errors for all commodities. 
The MAPE values of Bi-LSTM were 1.73% for red chili, 0.60% for shallots, 0.23% for medium-grade rice, and 0.08% for beef, all of which were lower than those of LSTM and CNN-LSTM models. These findings highlight Bi-LSTM’s bidirectional architecture, which leverages contextual information from both past and future data sequences, making it the most robust and effective model for forecasting food prices under varying volatility. The study provides practical insights for policymakers and supply chain stakeholders in supporting price stability and food security.</p> Andi Illa Erviani Nensi, Windi Pangesti, Nabila Syukri, Mahda Al Maida, Khairil Anwar Notodiputro Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/692 Mon, 22 Dec 2025 00:00:00 +0000 Predicting Bronchopulmonary Dysplasia in Infants: A Comparative Evaluation of Probit and Machine Learning Models https://proceedings.stis.ac.id/icdsos/article/view/617 <p>This study compares the predictive performance of traditional Probit regression and several machine learning models in predicting Bronchopulmonary Dysplasia (BPD) among preterm infants. The models were evaluated using standard performance metrics, including accuracy, precision, specificity, sensitivity, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Among all models, the Random Forest demonstrated superior predictive performance with the highest accuracy (86.36%), precision (85.71%), specificity (87.50%), sensitivity (85.71%), F1-score (0.8571), and AUC (0.92), indicating a strong discriminative ability. Birth weight and postnatal weight at four weeks emerged as the most significant predictors of BPD. 
The findings suggest that machine learning approaches, particularly the Random Forest algorithm, provide a more robust predictive framework than the conventional Probit regression model for early detection of BPD risk in preterm infants.</p> Shazali Umar Madaki, Abba Bello Muhammad , Hamisu Ahmad Hamisu Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/617 Mon, 22 Dec 2025 00:00:00 +0000 Role of Agricultural Sector and Quality of Its Production Factor in Indonesia: An Application of Input-Output Analysis and Panel Model https://proceedings.stis.ac.id/icdsos/article/view/705 <p>Indonesia has been known as the largest agricultural country in Southeast Asia. However, the sector’s contribution to national output has declined. This indicates a low interconnection between agriculture and the other sectors despite the sector’s significant potential to stimulate other industries’ output through strong backward and forward linkages. This condition is caused by the role of production factors that determine agricultural output. Therefore, the research aims to analyse agriculture’s linkages with other sectors and to assess the effects of its production factors on agricultural output. Using Input–Output multiplier analysis, the study finds that the agriculture, forestry, and fisheries sector is the largest absorber of labour in Indonesia. The sector’s output is predominantly consumed directly by households. Meanwhile, panel model results for 2010–2024 show that increases in labour without accompanying improvements in quality have a negative effect, whereas investment and credit, as manifestations of capital, have positive effects on agricultural gross value added. 
Policy implications include prioritizing skills development and improving access to credit and investment to foster adoption of productivity-enhancing technologies, thereby enabling the agricultural sector to grow and exert greater influence on other sectors and on the national economy.</p> Anugerah Surya Pramana, Ditto Satrio Wicaksono, Huda M. Fajar Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/705 Mon, 22 Dec 2025 00:00:00 +0000 Impact of the Family Hope Program (PKH) on Household Expenditure in East Java, 2024 https://proceedings.stis.ac.id/icdsos/article/view/522 <p>Poverty remains a development challenge in Indonesia, particularly in East Java, which contributes substantially to the national poverty rate. Household expenditure, which reflects a household’s ability to meet basic needs and maintain living standards, is widely used as a proxy for welfare and poverty. Assessing how social assistance programs influence expenditure is therefore crucial to understand their impact in improving welfare. The Family Hope Program (Program Keluarga Harapan/PKH), a conditional cash transfer initiative, aims to improve household welfare and reduce poverty. This study describes the characteristics of PKH recipients and evaluates the program’s impact on household expenditure as an indicator of welfare in East Java. This analysis uses data from the March 2024 Susenas survey on households that meet the PKH criteria, with separate analyses by household poverty levels. The Propensity Score Matching method was used to address selection bias resulting from non-random recipient selection. The results show that PKH recipients generally face limitations in housing, basic access, and socio-economic conditions. Overall, PKH has not increased total expenditures, but there has been an increase in food expenditures among extremely-poor households. 
Policy adjustments are needed to better align with the needs and characteristics of each group.</p> Elvika Nanda Nurdiana, Anugerah Karta Monika Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/522 Mon, 22 Dec 2025 00:00:00 +0000 Clustering of Cities/Regencies in East Java Province Based on the Number of Health Workers Using K-Means Clustering Analysis https://proceedings.stis.ac.id/icdsos/article/view/710 <p>This study aims to classify cities/regencies in East Java Province based on the availability of health workers using the K-Means clustering analysis method. Secondary data was obtained from BPS East Java for the year 2024, covering 12 variables of health worker types. The analysis process included data standardization, determination of the optimal number of clusters using the Silhouette method, and the application of the K-Means algorithm. The analysis results show that the optimal number of clusters is two. Cluster 1 exclusively consists of the City of Surabaya, characterized by a high concentration of modern and technical health workers but lower in community-based health workers. Cluster 2 includes the other 37 cities/regencies, showing a greater dependence on basic health workers such as midwives and nutritionists, with limited access to specialist medical personnel. 
This study recommends strengthening community health workers in Surabaya and increasing the availability of professional medical personnel in other regions to reduce health service disparities in East Java.</p> Farras Ijlal Nashir, N R Safitri, D O C Salsabilla Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/710 Mon, 22 Dec 2025 00:00:00 +0000 Analysis and Prediction of Green GRDP in Indonesia with Ecosystem Service Value Approach https://proceedings.stis.ac.id/icdsos/article/view/627 <p>Gross Regional Domestic Product (GRDP) as a measure of economic output in each region has not reflected sustainability because it overlooks the environmental impacts caused. Green GRDP is an important innovation that integrates environmental aspects into sustainable development. Indonesia has committed through TAP MPR IX/2001, Indonesia Emas 2045, and the SDGs to implement sustainable development. This study analyzes and projects Indonesia’s Green GRDP using the Ecosystem Service Value (ESV) approach. Satellite imagery data from MODIS MCD12Q1 and the Cellular Automata–Artificial Neural Network (CA-ANN) method are employed to predict land cover changes, while time series models are applied to forecast GRDP. Variations in provincial ESV are strongly influenced by land cover composition. In 2001, Papua recorded the highest Green GRDP and ESV contribution, whereas by 2020 (projected to 2030), Jakarta leads in Green GRDP but exhibits the lowest ESV contribution percentage. Throughout the period 2001–2030, Papua consistently maintains the highest ESV proportion relative to its Green GRDP. The findings highlight the importance of incorporating ecosystem service values into regional and national economic planning to ensure that economic growth inherently reflects environmental sustainability. 
This effort should be supported by spatially differentiated development strategies aligned with each region’s ecological capacity.</p> Ibnu Gata, Ernawati Pasaribu Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/627 Mon, 22 Dec 2025 00:00:00 +0000 The Gath–Geva Algorithm for Clustering Spatial Inequality of Stunting in East Nusa Tenggara Province https://proceedings.stis.ac.id/icdsos/article/view/634 <p>Stunting remains a critical public health issue in Indonesia, particularly in East Nusa Tenggara (NTT), where prevalence rates are among the highest nationally. This study aims to classify districts and municipalities in East Nusa Tenggara Province based on socioeconomic and health-related indicators associated with stunting vulnerability. Using the Gath–Geva (Fuzzy K-Means Entropy) clustering algorithm, four key variables were analyzed, including poverty rate, access to proper housing, open unemployment rate, and number of health facilities. The results identified three distinct clusters with different regional characteristics. Cluster 1 consists of areas with low poverty and well-developed health infrastructure but relatively high unemployment rates. Cluster 2 represents the most vulnerable regions characterized by high poverty, poor housing access, and limited health facilities, while Cluster 3 comprises more stable areas with better housing, low unemployment, and adequate healthcare services. The silhouette coefficient value of 0.41 indicates that the three-cluster structure provides a reasonably good level of separation and internal consistency. These findings highlight that stunting vulnerability is strongly influenced by socioeconomic disparities and the distribution of health infrastructure. 
Therefore, intervention strategies should be tailored to the characteristics of each cluster, emphasizing integrated actions in high-risk regions and preventive measures in more stable areas to accelerate stunting reduction across East Nusa Tenggara Province.</p> Mitha Rabiyatul Nufus Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/634 Mon, 22 Dec 2025 00:00:00 +0000 Mapping Regional Economic Resilience of Indonesian Provinces Through PCA and K-Means Analysis to Support Regional Development Policy Optimization https://proceedings.stis.ac.id/icdsos/article/view/430 <p>In Indonesia’s post-decentralization era, assessing regional economic resilience is critical to promoting inclusive development. This study constructs a composite resilience index using seven indicators: Human Development Index (HDI), Open Unemployment Rate, GRDP per capita, Gini Ratio, Economic Growth, Capital Expenditure, and Own-Source Revenue (OSR), across 34 provinces from 2020 to 2024. Principal Component Analysis (PCA) and K-Means clustering are applied to identify resilience patterns and classify provinces into high, moderate, and low resilience categories. The findings reveal significant interprovincial disparities. Provinces such as DKI Jakarta (HDI: 81.65), Bali (HDI: 76.54), and DI Yogyakarta (HDI: 80.22) consistently demonstrate high resilience, supported by low unemployment (e.g., Jakarta: 5.78%) and robust fiscal capacity (e.g., OSR share: Jakarta 58.29%). In contrast, Papua and West Papua exhibit lower resilience scores, characterized by HDI below 65, limited OSR below 15%, and economic growth volatility. Correlation analysis indicates a strong positive association between HDI and fiscal indicators (r = 0.82), while OLS regression confirms OSR and Capital Expenditure as significant predictors of resilience (p &lt; 0.05). 
Spatial mapping highlights geographic clustering of resilience, with Western Indonesia outperforming the Eastern region, underscoring persistent spatial inequalities. These findings reinforce the necessity for regionally differentiated policies. The study recommends enhancing fiscal autonomy, investing in human capital, and integrating Fintech-based financial inclusion, especially for lagging regions. This framework supports evidence-based policies aligned with Indonesia’s SDG and post-2024 development goals.</p> Bella Cindy Thalita, Kevina Alal A'la Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/430 Mon, 22 Dec 2025 00:00:00 +0000 The Impact of the Job Creation Law and Other Variables on Indonesia's FDI from 2018 to 2024 https://proceedings.stis.ac.id/icdsos/article/view/555 <p>Although national Foreign Direct Investment (FDI) realization in Indonesia increased following the enactment of the Job Creation Law in 2021, regional FDI realization declined in 17 of Indonesia's 34 provinces. Reviews from international organizations such as the World Bank and the World Trade Organization (WTO) suggest the need for analysis to examine the influence of investment-supporting variables on FDI in Indonesia, including the Job Creation Law policy. Therefore, the objective of this study is to analyze the variables influencing regional FDI realization in 34 provinces for the 2018-2024 period. The method used is panel data regression, with the Random Effect Model (REM) as the selected model. 
The results show that Household Consumption Expenditure (HCE) as a proxy for market size, non-oil and gas exports as a proxy for openness of market access, the mining sector's GRDP as a proxy for natural resource potential, and the Job Creation Law have a positive effect on regional FDI realization. These results align with Dunning's eclectic theory. Disparities in FDI realization were also found: regions outside Java Island that experienced high FDI realization did so partly because of internal factors such as abundant natural resources, the presence of industrial areas, and product diversification.</p> Apriani Sofiana, Gama Putra Danu Sohibien Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/555 Mon, 22 Dec 2025 00:00:00 +0000 Panel Data Regression Modelling on The Analysis of The Influence Of Fiscal Decentralization to Poverty In Maluku In 2020-2024 https://proceedings.stis.ac.id/icdsos/article/view/443 <p>Maluku Province persistently records one of the highest poverty rates in Indonesia, despite sustained fiscal transfers from the central government. This study examines the relationship between fiscal decentralization and poverty reduction in Maluku from 2020 to 2024 through a panel data regression approach, enabling simultaneous analysis of spatial and temporal variations across districts. Poverty data were sourced from Badan Pusat Statistik (BPS) and fiscal variables from Direktorat Jenderal Perimbangan Keuangan (DJPK). The empirical results demonstrate that Regional Original Revenue (PAD), general allocation funds (DAU), and village funds (DD) exert statistically significant negative effects on poverty rates, with DD showing the strongest marginal impact. 
By focusing on a structurally disadvantaged province, this study contributes to the empirical literature by providing region-specific evidence on the effectiveness of fiscal decentralization mechanisms in reducing poverty. The findings underscore the importance of strengthening local fiscal capacity and optimizing the allocation of intergovernmental transfers to achieve more equitable and sustainable poverty alleviation.</p> Bayu Aji Bachtiar, Miftahus Sa'adah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/443 Mon, 22 Dec 2025 00:00:00 +0000 Strategic Expansion of Digital Payments in Papua and West Papua: Individual Character Analysis Using Random Over and Under Sampling CART https://proceedings.stis.ac.id/icdsos/article/view/651 <p class="Abstract" style="margin-bottom: 6.0pt;">This study examines the characteristics and influencing factors of digital payment usage among individuals in Papua and West Papua. Understanding these characteristics enables stakeholders to design effective strategies for promotion, socialization, and education to support the expansion of digital payment adoption. The analysis uses data from the March 2023 National Socio-Economic Survey conducted by BPS, involving 52,081 respondents aged 17 years and older. A Classification and Regression Trees (CART) approach was applied with random oversampling and undersampling techniques to handle data imbalance. The results reveal that business fields, types of residential areas, and education levels are key determinants of digital payment usage. 
Three primary user profiles were identified: (1) individuals aged 17+ working outside the agricultural sector with at least a high school education; (2) individuals aged 17+ working outside agriculture, with junior high school education or below, residing in urban areas; and (3) individuals aged 17+ working in agriculture or unemployed, living in urban areas, and having completed high school or higher. These findings suggest that stakeholders should tailor promotional strategies and educational programs based on individual characteristics to effectively increase digital payment adoption in Papua and West Papua.</p> Reni Amelia, Akhmad Mun'im Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/651 Mon, 22 Dec 2025 00:00:00 +0000 Spillover Impacts of Informal Employment on Indonesia's Food Security https://proceedings.stis.ac.id/icdsos/article/view/486 <p>This study analyzes the impact of informal employment on household food security in Indonesia, focusing on regional disparities in provinces with high concentrations of informal workers. Using national socioeconomic survey data, logistic regression models initially assessed the associations between informal employment and food security outcomes. To strengthen causal inferences and mitigate selection bias, a comprehensive Propensity Score Matching (PSM) analysis was subsequently conducted. The findings from both approaches consistently link informal employment to adverse food security outcomes, including food availability concerns, limited access to nutritious food, and lower dietary diversity. Provinces with a high prevalence of informal workers consistently demonstrate poorer food security metrics, with the PSM analysis revealing more pronounced negative impacts in these regions, indicating significant spillover effects. 
Factors such as tertiary education, internet access, and health insurance are positively associated with improved food security, highlighting the critical role of human capital and resource access. These results underscore the importance of employment stability and regional labor market structures in shaping food security. Policies promoting formal employment and stronger social safety nets are critical for equitable food security across Indonesia.</p> Rizki Tri Anggara, Elsya Gumayanti Alfahma Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/486 Mon, 22 Dec 2025 00:00:00 +0000 Shadow Economy Estimation Across ASEAN Member States: MIMIC Model Approach https://proceedings.stis.ac.id/icdsos/article/view/578 <p>As a measure of official output, GDP remains incomplete, omitting the substantial economic transactions that occur within the shadow economy. The shadow economy reduces government tax revenues and weakens fiscal capacity. It also contributes to the underestimation of macroeconomic indicators. This study estimates the size of the shadow economy in ASEAN member states (AMS) using the Multiple Indicators and Multiple Causes (MIMIC) model. The model employs three causal variables and two indicator variables to capture the latent construct. Inflation, unemployment rate, and GDP per capita growth are identified as the main causal determinants. Economic growth and M2 growth are validated as significant indicators constructed for the shadow economy. The estimation covers the period from 2000 to 2023 and reveals an upward trend in the shadow economy across ten AMS, with an average size of 37.75 percent of GDP. 
These findings emphasize the need for policy actions that focus on maintaining price stability, promoting inclusive economic growth, and expanding formal employment opportunities to mitigate the expansion of the shadow economy.</p> Ahmad Nadifa Al Agung, Neli Agustina Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/578 Mon, 22 Dec 2025 00:00:00 +0000 Application of K-Medoids for Regional Classification Based on Quality, Access, and Governance of Education in Indonesia https://proceedings.stis.ac.id/icdsos/article/view/682 <p>Education is a fundamental foundation for individuals, yet substantial disparities persist across Indonesia, including both 3T (Disadvantaged, Frontier, and Outermost) and non-3T regions. Addressing the limited research on systematic regional mapping based on education indicators, this study analyzes 514 regencies/cities at the senior secondary level using 13 indicators covering three latent dimensions identified through Factor Analysis: education quality, quality of the learning process, and governance and educational participation. Data were processed through outlier detection, standardization, dimensionality reduction using Principal Component Analysis, factor score extraction, and K-Medoids clustering in RStudio. The optimal solution with three clusters was validated with a Davies–Bouldin Index of 1.44, confirming its effectiveness in capturing regional variation. Results reveal distinct spatial patterns in educational characteristics, where some 3T regions perform comparably to non-3T areas, while certain remote regions face challenges across all dimensions. These findings provide a basis for targeted, cluster-based policy interventions to improve education quality, expand access, and strengthen governance, supporting equitable educational development nationwide. 
The study demonstrates the utility of combining dimensionality reduction and clustering for evidence-based policy planning and highlights the importance of addressing regional disparities in education.</p> Silfi Robiati, Abdul Hakim, Goldy Dharmawan, Chusnul Khotimah Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/682 Mon, 22 Dec 2025 00:00:00 +0000 The Influence of Village Funds, HDI, GRDP, and Unemployment on Poverty in Sulawesi 2017-2024 Using Panel Data Regression https://proceedings.stis.ac.id/icdsos/article/view/499 <p>Poverty in Indonesia remains a significant problem. Generally, rural poverty is higher than urban poverty. Therefore, the government has enacted a village fund policy through Law Number 6 of 2024 to assist development efforts that can reduce rural poverty. However, despite a decline in national poverty, the poverty rate in Sulawesi has fluctuated. In addition to village funds, other variables influence poverty, such as the human development index (HDI), gross regional domestic product (GRDP) per capita, and the unemployment rate. The purpose of this study is to determine the effect of village funds, HDI, GRDP per capita, and unemployment on poverty rates in 70 districts in Sulawesi from 2017 to 2024. The data are sourced from the Directorate General of Fiscal Balance (DJPK) for village funds and from BPS for the other variables. Panel data regression analysis is used to identify variables that influence poverty rates. Based on the Fixed Effect Model (FEM), HDI and GRDP per capita have a negative and significant effect on poverty rates in Sulawesi Island. Village funds are insignificant in reducing poverty due to differences in development levels across regions. 
Therefore, equitable development and incre</p> Muhammad Reza Ramadhani, Agung Priyo Utomo Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/499 Mon, 22 Dec 2025 00:00:00 +0000 Monetary Policy Analysis in Indonesia: The Dynamic Relationship Between the BI Rate, Inflation, and the Rupiah Exchange Rate https://proceedings.stis.ac.id/icdsos/article/view/689 <p>Monetary policy is crucial for sustaining Indonesia's macroeconomic stability, especially through the benchmark interest rate (BI Rate), which serves as the primary tool of Bank Indonesia. This research revisits the transmission of monetary policy within a contemporary framework marked by post-pandemic recovery, global monetary tightening, and domestic policy shifts under the new administration in 2024. Utilizing monthly time series data from January 2010 to March 2025, this study applies the Vector Autoregression (VAR) and Vector Error Correction Model (VECM) methodologies to examine the dynamic relationships among inflation, the exchange rate (USD/IDR), and the BI Rate. The results affirm the presence of long-term relationships among the three variables, aligning with earlier research, while also revealing significant short-term dynamics that indicate an increased sensitivity of the exchange rate and inflation to interest rate changes during times of global uncertainty. By extending the analysis period to 2025 and considering the context of post-pandemic recovery and policy transitions, this study offers updated empirical insights into the changing effectiveness of Indonesia's monetary policy transmission mechanism. 
The findings provide important implications for policymakers in developing interest rate strategies aimed at achieving a balance between inflation control, exchange rate stability, and economic recovery.</p> Izzat Muhammad Akhsan, A S Maharani, I N Baity Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/689 Mon, 22 Dec 2025 00:00:00 +0000 Spatial Determinants of CO2 Emissions on Java Island: STIRPAT Framework and SAR Model https://proceedings.stis.ac.id/icdsos/article/view/525 <p>Java Island, Indonesia’s economic and population hub, faces intense environmental pressure from CO2 concentration, exhibiting strong spatial dependence across its 118 regencies and cities. This study examines the determinants of CO2 concentration and their spillover effects using an extended STIRPAT framework and a Spatial Autoregressive (SAR) model, applied to 2024 secondary data from BPS-Statistics Indonesia and Google Earth Engine (GEE). The SAR model outperforms OLS, with lower AIC (364.8979 vs. 489.0563) and BIC (387.0634 vs. 508.4551), confirming spatial effects. In SAR models, interpretation relies on decomposing estimated coefficients into direct effects (impacts within a region) and indirect or spillover effects (impacts transmitted to neighboring regions), allowing a more nuanced understanding of spatial influence. Population density and manufacturing sector GRDP increase CO2 concentration, while NDVI and HDI reduce it. Notably, indirect (spillover) effects consistently surpass direct effects, driven by commuter flows in urban hubs like Jabodetabek and industrial pollution spillovers. 
These findings inform regional climate strategies, emphasizing cross-regency reforestation and emission controls to support Indonesia’s Enhanced Nationally Determined Contribution (ENDC) goals.</p> M. Hafidz Habibullah, Bunga Musva Cotva, Hafidh Rean Putra, Agustin Kurnia Sari, Sarni Maniar Berliana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/525 Mon, 22 Dec 2025 00:00:00 +0000 Determination of Inflation Sistercity in Riau Province by Using K-Means Clustering Method https://proceedings.stis.ac.id/icdsos/article/view/711 <p>At the present time, the government is placing a significant emphasis on the regulation of inflationary pressures. The government's approach is multifaceted, ranging from the Minister of Home Affairs' direct leadership of coordination meetings on Monday mornings to providing fiscal incentives for regions that can control inflation and removing local government officials who cannot. However, note that BPS-Statistics Indonesia (BPS) does not calculate inflation in all Indonesian regencies and cities. The calculation of inflation only includes four out of the 12 regencies/cities in Riau Province. Therefore, we must establish an inflation sister city to allow regencies/cities not included in BPS's calculations to independently calculate the inflation rate. This study is pioneering in its analysis of Sister City Inflation in Riau Province. The k-means cluster analysis indicates that the city of Pekanbaru and the city of Tembilahan form distinct clusters, with no regencies or cities within their respective clusters that are associated with either of the two cities. Subsequently, the Dumai cluster forms a cluster with Bengkalis, Siak, and Pelalawan. Conversely, Kampar Regency formed a cluster with Kuantan Singingi, Indragiri Hilir, Indragiri Hulu, Rokan Hulu, Rokan Hilir, and the Meranti Islands. 
Consequently, regions that are not included in the inflation calculation may utilize the data from the cost of living survey in inflation regencies/cities within the same cluster to perform their calculations. Furthermore, if the local government requires the inflation rate as a reference for determining the regional minimum wage, it may employ it from the sister cities that have been established.</p> M Nata Kesuma, Pedro Rahmat Yufa, Fitri Hariyanti Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/711 Mon, 22 Dec 2025 00:00:00 +0000 Unlocking the Potential of Input-Output Tables for Spatial Analysis Using the Miyazawa Model: A Case Study of East Java Province https://proceedings.stis.ac.id/icdsos/article/view/530 <p>East Java Province holds a strategic role in the national economy, serving as the second-largest contributor to GDP after Jakarta and as a key trade hub to Eastern Indonesia. Yet regional disparities remain substantial, particularly reflected in the economic underdevelopment and weak logistics connectivity of Madura Island, which lies adjacent to the Gerbangkertosusila growth corridor. Addressing this gap requires a deeper understanding of sectoral and spatial linkages that shape Madura’s growth trajectory. This study applies the Miyazawa Input-Output Model for East Java Province, integrating 17 economic sectors and 38 regencies/municipalities to enable simultaneous sectoral and regional analysis. The simulations assess the effects of increasing household income in Madura, spillover from surrounding regions, and the combined role of strengthening the Transportation and Warehousing sector alongside Agriculture and Manufacturing. The findings show that the logistics sector in Madura, when considered independently, has limited impact; however, its significance rises when complemented by productive local sectors. 
Moreover, spillover from surrounding regions into Madura proves weaker than spillover directed outside Madura, underscoring the island’s fragile spatial connectivity. These results highlight the urgency of affirmative policies that strengthen productive sectors, enhance interregional linkages, and ensure Madura’s integration into East Java’s broader economic development.</p> Ahmadi Murjani, Budhi Fatanza Wiratama Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/530 Mon, 22 Dec 2025 00:00:00 +0000 Analyzing Infectious Disease in Multiple District in East Nusa Tenggara (ENT) using K-Means Clustering and Correspondence Analysis https://proceedings.stis.ac.id/icdsos/article/view/426 <p>Infectious diseases remain a major public health concern in Indonesia, particularly in East Nusa Tenggara (ENT), where tuberculosis (TBC), dengue haemorrhagic fever (DHF), and HIV/AIDS record high case numbers. These diseases are influenced not only by individual and environmental factors but also by spatial characteristics such as population distribution and regional infrastructure. Therefore, analyzing spatial factors is crucial to better understand and manage the spread of infectious diseases in ENT. This study uses data from 2023 to 2024 across 22 districts in ENT, focusing on the prevalence of TBC, DHF, and HIV/AIDS. K-means clustering is first applied to classify the districts into three groups based on area size and population, aiming to identify spatial patterns of disease severity. The clustering process yields a silhouette coefficient of 0.48, indicating moderately valid group separation. Subsequently, correspondence analysis is used to examine the relationship between the resulting clusters and the three diseases. The results reveal that Cluster A, which has the highest population density, shows a strong association with all three infectious diseases. 
These findings suggest that population density plays a significant role in the transmission of infectious diseases and should be considered in future health intervention strategies.</p> Fadlan Adhari, Gabriela Lintang Sulistyoreni, Jessica Jocelyn Jakson, Angelina Sekar Larissa, Yuli Sri Afrianti, Fadhil Hanif Sulaiman Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/426 Mon, 22 Dec 2025 00:00:00 +0000 Unlocking Renewable Energy Potential: The Nexus Between Financial Inclusion and Renewable Energy in Indonesia https://proceedings.stis.ac.id/icdsos/article/view/733 <p>Indonesia has pledged to achieve net-zero emissions by 2060. The energy transition can be achieved through financial inclusion. Based on the Environmental Kuznets Curve (EKC) theory, financial inclusion can be a catalyst in reducing environmental impacts if a country has reached the EKC turning point. This study investigates the impact of financial inclusion on the consumption of renewable energy in Indonesia. The data used in this study are the percentage of renewable energy consumption and the financial inclusion index from the International Monetary Fund for 2004 to 2021. Additionally, economic growth and the number of internet users are included as control variables. This study utilizes the Error Correction Model and finds that financial inclusion and internet usage have a significant negative effect on the percentage of renewable energy consumption in the long run. Based on these findings, it can be concluded that, according to EKC theory, Indonesia is still in an early stage of development, where increasing financial inclusion and technology still have a negative impact on the environment. Policymakers are encouraged to develop targeted financial inclusion strategies to enhance environmental sustainability. 
Green finance and green investment are critical solutions to support Indonesia's energy transition.</p> Byun Jiye Primasrani, Okta Parina Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/733 Tue, 23 Dec 2025 00:00:00 +0000 Revealing Competitiveness and Key Drivers of Nickel (HS 75) Exports: Evidence from Seven Major Destinations, 2014–2023 https://proceedings.stis.ac.id/icdsos/article/view/640 <p>The downstream policy is implemented to encourage Indonesia’s processed nickel products. Processed nickel under Harmonized System (HS) 75 is a value-added product that has potential for the Indonesian economy. Globally, Indonesia’s exports of nickel HS 75 have increased significantly. This increase occurred after the implementation of the downstream policy. However, the increase in export volume did not occur uniformly across all trading partner countries; hence, further analysis of the implemented downstream policy is necessary. This study aims to analyse the effect of the downstream policy and macroeconomic variables on the export volume of nickel (HS 75). The destination country’s GDP per capita, real prices, exchange rate, and the RCA index significantly affect the export volume of nickel (HS 75), while population and the downstream policy do not have a significant effect. These findings indicate that the downstream policy has not yet effectively increased export volumes to trading partner countries.</p> Vendredy P. 
Lucasio Siahaan, Fitri Kartiasih Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/640 Mon, 22 Dec 2025 00:00:00 +0000 Analysis of the Determinants of Poverty Among the Productive-Age Population in Rokan Hilir Regency https://proceedings.stis.ac.id/icdsos/article/view/742 <p>Poverty is a condition in which individuals are unable to meet a decent standard of living, and it remains a major development issue in many regions. Efforts to reduce poverty are often difficult to achieve if inappropriate approaches are applied. Therefore, optimizing the potential of the population, particularly those in the productive age group, is a key strategy in poverty alleviation. This study aims to analyze the effects of education level, employment status, savings ownership, mobile phone ownership, and health complaints on the poverty level of the productive-age population in Rokan Hilir Regency in 2024. The method employed is binary logistic regression. The findings reveal that employment status and health complaints have a significant influence on poverty. Individuals who work in non-casual employment are less likely to experience poverty compared to casual workers. In addition, productive-age individuals without health complaints are also less likely to fall into poverty. Based on these findings, it is recommended that the local government increase the creation of formal job opportunities and strengthen public health services, particularly for the productive-age population. 
Such policies are expected to sustainably reduce poverty rates in Rokan Hilir Regency.</p> Rafqi Ardiansyah Surya Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/742 Mon, 22 Dec 2025 00:00:00 +0000 Unveiling Regional Disparities in Indonesia: Clustering Provinces by Development Indicators https://proceedings.stis.ac.id/icdsos/article/view/645 <p>Indonesia’s pursuit of sustainable development—integrating economic, social, and environmental dimensions—remains challenged by persistent regional disparities. In 2022, only four of seven national priority indicators were achieved, while 21 provinces failed to meet more than three targets. To capture these disparities more precisely, this study applies hierarchical and non-hierarchical clustering to classify 34 provinces based on seven development indicators. The comparative approach enhances robustness: hierarchical clustering reveals inter-provincial linkages, while non-hierarchical clustering improves internal consistency. Validation tests identify Ward’s method as optimal, yielding four distinct clusters. Cluster 1 includes four eastern provinces with multidimensional inequality—high stunting (31.43%), early marriage (10.37%), and low literacy (36.44%). Cluster 2 comprises 20 provinces with structural stagnation, marked by persistent stunting (24.80%) and reliance on primary sectors. Cluster 3 consists of seven industrial provinces with strong economic performance (manufacturing 33.59% of GDP) and improving social indicators. Cluster 4 includes three service-based provinces excelling in social outcomes—lowest stunting (13.07%) and highest literacy (78.46%)—but facing environmental challenges. 
These findings highlight the urgency of region-specific, evidence-based policy interventions to promote equitable and sustainable development.</p> Akbarrullah Yusman, Sarni Maniar Berliana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/645 Mon, 22 Dec 2025 00:00:00 +0000 Development of Best Beta CAPM with Adjustment of Sharia Elements: A Case Study on Sharia Stocks in Indonesia https://proceedings.stis.ac.id/icdsos/article/view/663 <p>This paper introduces the Best Sharia-based Capital Asset Pricing Model (BSCAPM), a modification of the BCAPM model integrating Islamic finance principles. This study focuses on optimizing the beta parameter within the model by integrating Sharia-compliant factors such as zakat and purification, while excluding short-selling practices. Using data from the Jakarta Islamic Index (JII) from June 2020 to November 2024, the BSCAPM portfolio outperforms the BCAPM portfolio in terms of the Sharpe ratio. The findings indicate that the BSCAPM serves as a viable alternative framework for Islamic investment modelling, providing Muslim investors with a Sharia-compliant, optimal portfolio formation model. The research contributes to the underexplored domain of portfolio selection modelling in the Islamic sector, enriching references on asset pricing in Sharia portfolios, particularly in the Indonesian Sharia stock market.</p> Abdul Aziz, Supriyanto, Abdurakhman 
Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/663 Mon, 22 Dec 2025 00:00:00 +0000 Application of Small Area Estimation for Estimating Households Living in Adequate Housing at the Subdistrict Level in DKI https://proceedings.stis.ac.id/icdsos/article/view/497 <p>Access to adequate housing is a right of all Indonesian citizens guaranteed by the 1945 Constitution and is part of the Sustainable Development Goals (SDGs), specifically Goal 11. DKI Jakarta is the province with the second-lowest percentage of households living in adequate housing in Indonesia. Estimation at the subdistrict level is needed to support the policy on affordable vertical housing development initiated by the DKI Jakarta Department of Public Housing and Settlement Areas. Direct estimation at the subdistrict level based on the Susenas sampling design would result in inaccurate estimators. To address this issue, this study applies the Small Area Estimation (SAE) method using the Empirical Best Linear Unbiased Prediction (EBLUP) model and the Hierarchical Bayes (HB) Beta model, which leverage auxiliary variables to improve precision. The findings reveal that the HB Beta model provides the best estimates in measuring the percentage of households living in adequate housing in DKI Jakarta in 2024, producing accurate estimates across all subdistricts.</p> Muhammad Akbar, Nofita Istiana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/497 Mon, 22 Dec 2025 00:00:00 +0000 Spatial Analysis of Pneumonia in Toddlers on Sumatra Island Using Geographically Weighted Poisson Regression https://proceedings.stis.ac.id/icdsos/article/view/500 <p>Pneumonia remains a leading cause of mortality among toddlers (aged 1 to less than 5 years) in Indonesia, with notable spatial disparities across Sumatra Island. 
This study examines factors influencing pneumonia incidence in toddlers using a Geographically Weighted Poisson Regression (GWPR) model to capture local variations in the effects of community health centers, complete basic immunization coverage, exclusive breastfeeding rates, and low birth weight (LBW) prevalence. Analyzing 2022 cross-sectional data from 154 districts/cities on Sumatra, the global Poisson regression model confirmed all predictors as statistically significant at the 5% level. The GWPR model with a fixed Gaussian kernel outperformed the global model, revealing five regional clusters with distinct combinations of significant variables. The dominant cluster (140 locations) showed significant effects from all predictors, while smaller clusters (14 locations) highlighted localized patterns, such as reliance on immunization and breastfeeding in rural areas like Rejang Lebong. These findings underscore the need for tailored interventions to address regional disparities in toddler pneumonia.</p> Ruth Natasya Sepbrina Br Lumban Gaol, Maura Bintang Potenza, Nur Faqih Ihsan, Galang Ali Fazral Pratama, Sarni Maniar Berliana Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/500 Mon, 22 Dec 2025 00:00:00 +0000 Public Infrastructure Accessibility and Property Price Disparities in Jakarta: A Composite Index and Spatial Regression Approach https://proceedings.stis.ac.id/icdsos/article/view/701 <p>This study analyzes spatial inequality in public infrastructure accessibility and property prices in Jakarta Province using a Composite Index and spatial econometric modeling. A data-driven spatial approach is employed to examine the distribution of property prices and accessibility to health, education, and transportation facilities. 
Accessibility is measured using the Entropy Weight Method, while spatial inequality patterns are assessed through Moran’s I and Local Indicators of Spatial Association (LISA). Results reveal significant clustering of high property prices and accessibility in central Jakarta, contrasted with low values in peripheral areas, indicating pronounced spatial disparities. Furthermore, Geographically Weighted Regression (GWR) and the Spatial Lag Model (SLM) demonstrate that improved accessibility is positively associated with higher property prices, although the magnitude of this effect varies spatially. These findings provide empirical evidence to support data-based spatial planning and infrastructure development policies aimed at reducing urban spatial disparities and promoting more equitable urban growth in Jakarta.</p> Khairul Anam, Ai Sulastri, Alvin Anugrah Putra, Annisa Purnama Sari, Friscka Fitri Aditama Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/701 Mon, 22 Dec 2025 00:00:00 +0000 Forecasting Indonesian Monthly Rice Prices at Milling Level Using Google Trends and Official Statistics Data https://proceedings.stis.ac.id/icdsos/article/view/521 <p>Hunger is a very complex social issue to address. Alleviating hunger is closely related to achieving food security, which is central to the second Sustainable Development Goal (SDG), zero hunger. The food commodity most frequently consumed by the Indonesian population is rice, which has fluctuating prices in the market. Therefore, price forecasting is necessary so that the government can take preventive measures against rice price increases at certain times. Research on rice price forecasting using big data from Google Trends is still very rare in Indonesia, even though Google Trends has great potential to reflect the public's search popularity for certain keywords. 
Therefore, this study aims to forecast the monthly medium rice price in Indonesia at the milling level using exogenous variables of dried milled grain prices and the popularity index of related keywords on Google Trends. The forecasting is conducted using Seasonal Autoregressive Integrated Moving Average (SARIMA), SARIMA with Exogenous Variables (SARIMAX), and Extreme Gradient Boosting (XGBoost) models. The SARIMAX model has the best performance in forecasting rice prices, with a Root Mean Squared Error (RMSE) of 941.6933, Mean Absolute Error (MAE) of 817.9021, and Mean Absolute Percentage Error (MAPE) of 0.0620.</p> I Bagus Putu Swardanasuta, Wahyuni Andriana Sofa, Siti Muchlisoh, Arie Wahyu Wijayanto Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/521 Mon, 22 Dec 2025 00:00:00 +0000 Identifying Stratifications of Cancer Patient Visits: Approach of Clustering Using PCA of Mixed Data https://proceedings.stis.ac.id/icdsos/article/view/622 <p>Cancer is a significant contributor to the burden of non-communicable diseases and one of the diseases with the highest costs in Indonesia’s health insurance system. Understanding key factors influencing cancer patient visits and risk groups under national health insurance supports evidence-based and sustainable cancer care financing. This study aims to identify key factors influencing inpatient visits among cancer survivors and to map risk patterns to improve cancer health service policies, using a 1% sample of claim data from the national health insurance (JKN) program. The PCA of mixed data analysis revealed that cost-severity level and contribution-ward classes jointly influenced the visits. After PCA, K-Means was applied and 4 clusters were obtained. 
K-Means clustering gives a better understanding of patient visits, in particular the need for distinct strategies for each group so that the burden of cancer financing under the national health insurance program can be reduced.</p> Kristiana Yunitaningtyas, Herianti Herianti Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics https://proceedings.stis.ac.id/icdsos/article/view/622 Mon, 22 Dec 2025 00:00:00 +0000