https://proceedings.stis.ac.id/icdsos/issue/feed
Proceedings of The International Conference on Data Science and Official Statistics
2025-12-22T10:12:22+00:00
Open Journal Systems

https://proceedings.stis.ac.id/icdsos/article/view/486
Spillover Impacts of Informal Employment on Indonesia's Food Security
2025-09-26T03:31:27+00:00
Rizki Tri Anggara (rizki.anggara@bps.go.id)
Elsya Gumayanti Alfahma (elsya.gumayanti@bps.go.id)
<p>This study analyzes the impact of informal employment on household food security in Indonesia, focusing on regional disparities in provinces with high concentrations of informal workers. Using national socioeconomic survey data, logistic regression models initially assessed the associations between informal employment and food security outcomes. To strengthen causal inferences and mitigate selection bias, a comprehensive Propensity Score Matching (PSM) analysis was subsequently conducted. The findings from both approaches consistently link informal employment to adverse food security outcomes, including food availability concerns, limited access to nutritious food, and lower dietary diversity. Provinces with a high prevalence of informal workers consistently demonstrate poorer food security metrics, with the PSM analysis revealing more pronounced negative impacts in these regions, indicating significant spillover effects. Factors such as tertiary education, internet access, and health insurance are positively associated with improved food security, highlighting the critical role of human capital and resource access. These results underscore the importance of employment stability and regional labor market structures in shaping food security.
Policies promoting formal employment and stronger social safety nets are critical for equitable food security across Indonesia.</p>
2025-12-22T00:00:00+00:00
Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics

https://proceedings.stis.ac.id/icdsos/article/view/578
Shadow Economy Estimation Across ASEAN Member States: MIMIC Model Approach
2025-09-12T16:20:49+00:00
Ahmad Nadifa Al Agung (212111861@stis.ac.id)
Neli Agustina (neli@stis.ac.id)
<p>As a measure of official output, GDP remains incomplete, omitting the substantial economic transactions that occur within the shadow economy. The shadow economy reduces government tax revenues and weakens fiscal capacity. It also contributes to the underestimation of macroeconomic indicators. This study estimates the size of the shadow economy in ASEAN member states (AMS) using the Multiple Indicators and Multiple Causes (MIMIC) model. The model employs three causal variables and two indicator variables to capture the latent construct. Inflation, unemployment rate, and GDP per capita growth are identified as the main causal determinants. Economic growth and M2 growth are validated as significant indicators of the shadow economy. The estimation covers the period from 2000 to 2023 and reveals an upward trend in the shadow economy across ten AMS, with an average size of 37.75 percent of GDP.
These findings emphasize the need for policy actions that focus on maintaining price stability, promoting inclusive economic growth, and expanding formal employment opportunities to mitigate the expansion of the shadow economy.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/682
Application of K-Medoids for Regional Classification Based on Quality, Access, and Governance of Education in Indonesia
2025-09-12T03:47:41+00:00
Silfi Robiati (silfirobiati43@gmail.com)
Abdul Hakim (abdulhakim@kemendikdasmen.go.id)
Goldy Dharmawan (goldy.farizdharmawan01@kemendikdasmen.go.id)
Chusnul Khotimah (chusnul.khotimah@kemdikbud.go.id)
<p>Education is a fundamental foundation for individuals, yet substantial disparities persist across Indonesia, affecting both 3T (Disadvantaged, Frontier, and Outermost) and non-3T regions. Addressing the limited research on systematic regional mapping based on education indicators, this study analyzes 514 regencies/cities at the senior secondary level using 13 indicators covering three latent dimensions identified through Factor Analysis: education quality, quality of the learning process, and governance and educational participation. Data were processed through outlier detection, standardization, dimensionality reduction using Principal Component Analysis, factor score extraction, and K-Medoids clustering in RStudio. The optimal three-cluster solution was validated with a Davies–Bouldin Index of 1.44, confirming its effectiveness in capturing regional variation. Results reveal distinct spatial patterns in educational characteristics: some 3T regions perform comparably to non-3T areas, while certain remote regions face challenges across all dimensions.
These findings provide a basis for targeted, cluster-based policy interventions to improve education quality, expand access, and strengthen governance, supporting equitable educational development nationwide. The study demonstrates the utility of combining dimensionality reduction and clustering for evidence-based policy planning and highlights the importance of addressing regional disparities in education.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/499
The Influence of Village Funds, HDI, GRDP, and Unemployment on Poverty in Sulawesi 2017-2024 Using Panel Data Regression
2025-09-22T05:46:14+00:00
Muhammad Reza Ramadhani (muhammadrezaramadhani06@gmail.com)
Agung Priyo Utomo (agung@stis.ac.id)
<p>Poverty in Indonesia remains a significant problem. Generally, rural poverty is higher than urban poverty. Therefore, the government has enacted a village fund policy through Law Number 6 of 2024 to support development efforts that can reduce rural poverty. However, despite a decline in national poverty, the poverty rate in Sulawesi has fluctuated. In addition to village funds, other variables influence poverty, such as the human development index (HDI), gross regional domestic product (GRDP) per capita, and the unemployment rate. The purpose of this study is to determine the effect of village funds, HDI, GRDP per capita, and unemployment on poverty rates in 70 districts in Sulawesi from 2017 to 2024. Data on village funds are sourced from the Directorate General of Fiscal Balance (DJPK), and data on the other variables from BPS. Panel data regression analysis is used to identify the variables that influence poverty rates. The fixed effects model (FEM) shows that HDI and GRDP per capita have a negative and significant effect on poverty rates on Sulawesi Island. Village funds have no significant effect in reducing poverty, owing to differences in development levels across regions.
Therefore, equitable development and incre</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/689
Monetary Policy Analysis in Indonesia: The Dynamic Relationship Between the BI Rate, Inflation, and the Rupiah Exchange Rate
2025-10-04T07:31:26+00:00
Izzat Muhammad Akhsan (akhsanizma@gmail.com)
A S Maharani (akhsanizma@gmail.com)
I N Baity (akhsanizma@gmail.com)
<p>Monetary policy is crucial for sustaining Indonesia's macroeconomic stability, especially through the benchmark interest rate (BI Rate), which serves as the primary tool of Bank Indonesia. This research revisits the transmission of monetary policy within a contemporary framework marked by post-pandemic recovery, global monetary tightening, and domestic policy shifts under the new administration in 2024. Utilizing monthly time series data from January 2010 to March 2025, this study applies the Vector Autoregression (VAR) and Vector Error Correction Model (VECM) methodologies to examine the dynamic relationships among inflation, the exchange rate (USD/IDR), and the BI Rate. The results affirm the presence of long-term relationships among the three variables, aligning with earlier research, while also revealing significant short-term dynamics that indicate an increased sensitivity of the exchange rate and inflation to interest rate changes during times of global uncertainty. By extending the analysis period to 2025 and considering the context of post-pandemic recovery and policy transitions, this study offers updated empirical insights into the changing effectiveness of Indonesia's monetary policy transmission mechanism.
The findings provide important implications for policymakers in developing interest rate strategies aimed at achieving a balance between inflation control, exchange rate stability, and economic recovery.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/525
Spatial Determinants of CO2 Emissions on Java Island: STIRPAT Framework and SAR Model
2025-09-12T06:50:53+00:00
M. Hafidz Habibullah (212212710@stis.ac.id)
Bunga Musva Cotva (212212541@stis.ac.id)
Hafidh Rean Putra (212212631@stis.ac.id)
Agustin Kurnia Sari (212212457@stis.ac.id)
Sarni Maniar Berliana (sarni@stis.ac.id)
<p>Java Island, Indonesia’s economic and population hub, faces intense environmental pressure from CO2 concentrations, which exhibit strong spatial dependence across its 118 regencies and cities. This study examines the determinants of CO2 concentration and their spillover effects using an extended STIRPAT framework and a Spatial Autoregressive (SAR) model, applied to 2024 secondary data from BPS-Statistics Indonesia and Google Earth Engine (GEE). The SAR model outperforms OLS, with lower AIC (364.8979 vs. 489.0563) and BIC (387.0634 vs. 508.4551), confirming spatial effects. In SAR models, interpretation relies on decomposing estimated coefficients into direct effects (impacts within a region) and indirect or spillover effects (impacts transmitted to neighboring regions), allowing a more nuanced understanding of spatial influence. Population density and manufacturing sector GRDP increase CO2 concentration, while NDVI and HDI reduce it. Notably, indirect (spillover) effects consistently surpass direct effects, driven by commuter flows in urban hubs such as Jabodetabek and by industrial pollution spillovers.
These findings inform regional climate strategies, emphasizing cross-regency reforestation and emission controls to support Indonesia’s Enhanced Nationally Determined Contribution (ENDC) goals.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/711
Determination of Inflation Sistercity in Riau Province by Using K-Means Clustering Method
2025-09-16T13:06:38+00:00
M Nata Kesuma (natakesuma05@gmail.com)
Pedro Rahmat Yufa (pedro.rahmat04@gmail.com)
Fitri Hariyanti (v3hyanti@gmail.com)
<p>The government currently places significant emphasis on controlling inflationary pressures. Its approach is multifaceted, ranging from the Minister of Home Affairs' direct leadership of coordination meetings on Monday mornings to providing fiscal incentives for regions that can control inflation and removing local government officials who cannot. However, BPS-Statistics Indonesia (BPS) does not calculate inflation for every regency and city in Indonesia: in Riau Province, only four of the 12 regencies/cities are covered. Therefore, an inflation sister city must be established so that regencies/cities not covered by BPS can calculate their own inflation rates. This study is a pioneering analysis of inflation sister cities in Riau Province. The k-means cluster analysis shows that Pekanbaru City and Tembilahan each form a separate cluster that contains no other regency or city. Dumai forms a cluster with Bengkalis, Siak, and Pelalawan, while Kampar Regency forms a cluster with Kuantan Singingi, Indragiri Hilir, Indragiri Hulu, Rokan Hulu, Rokan Hilir, and the Meranti Islands.
Consequently, regions not covered by the inflation calculation may use cost of living survey data from the inflation regency/city in the same cluster to perform their calculations. Furthermore, if a local government needs an inflation rate as a reference for setting the regional minimum wage, it may use the rate of its designated sister city.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/530
Unlocking the Potential of Input-Output Tables for Spatial Analysis Using the Miyazawa Model: A Case Study of East Java Province
2025-09-12T06:59:46+00:00
Ahmadi Murjani (amurjani@bps.go.id)
Budhi Fatanza Wiratama (budhi.wiratama@bps.go.id)
<p>East Java Province holds a strategic role in the national economy, serving as the second-largest contributor to GDP after Jakarta and as a key trade hub for Eastern Indonesia. Yet regional disparities remain substantial, as reflected particularly in the economic underdevelopment and weak logistics connectivity of Madura Island, which lies adjacent to the Gerbangkertosusila growth corridor. Addressing this gap requires a deeper understanding of the sectoral and spatial linkages that shape Madura’s growth trajectory. This study applies the Miyazawa Input-Output Model for East Java Province, integrating 17 economic sectors and 38 regencies/municipalities to enable simultaneous sectoral and regional analysis. The simulations assess the effects of increasing household income in Madura, spillover from surrounding regions, and the combined role of strengthening the Transportation and Warehousing sector alongside Agriculture and Manufacturing. The findings show that the logistics sector in Madura, when considered independently, has limited impact; however, its significance rises when complemented by productive local sectors.
Moreover, spillover from surrounding regions into Madura is weaker than spillover flowing out of Madura, underscoring the island’s fragile spatial connectivity. These results highlight the urgency of affirmative policies that strengthen productive sectors, enhance interregional linkages, and ensure Madura’s integration into East Java’s broader economic development.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/426
Analyzing Infectious Disease in Multiple District in East Nusa Tenggara (ENT) using K-Means Clustering and Correspondence Analysis
2025-10-04T07:18:29+00:00
Fadlan Adhari (fadlanadhari20@gmail.com)
Gabriela Lintang Sulistyorenigab.lintang@gmail.com
Jessica Jocelyn Jakson (jessicajocelyn2113@gmail.com)
Angelina Sekar Larissa (angelinasekarr12@gmail.com)
Yuli Sri Afrianti (yuli.afrianti@itb.ac.id)
Fadhil Hanif Sulaiman (20924011@mahasiswa.itb.ac.id)
<p>Infectious diseases remain a major public health concern in Indonesia, particularly in East Nusa Tenggara (ENT), where tuberculosis (TBC), dengue haemorrhagic fever (DHF), and HIV/AIDS have high case counts. These diseases are influenced not only by individual and environmental factors but also by spatial characteristics such as population distribution and regional infrastructure. Therefore, analyzing spatial factors is crucial to better understand and manage the spread of infectious diseases in ENT. This study uses data from 2023 to 2024 across 22 districts in ENT, focusing on the prevalence of TBC, DHF, and HIV/AIDS. K-means clustering is first applied to classify the districts into three groups based on area size and population, aiming to identify spatial patterns of disease severity. The clustering process yields a silhouette coefficient of 0.48, indicating moderate group separation.
Subsequently, correspondence analysis is used to examine the relationship between the resulting clusters and the three diseases. The results reveal that Cluster A, which has the highest population density, shows a strong association with all three infectious diseases. These findings suggest that population density plays a significant role in the transmission of infectious diseases and should be considered in future health intervention strategies.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/733
Unlocking Renewable Energy Potential: The Nexus Between Financial Inclusion and Renewable Energy in Indonesia
2025-09-11T08:17:33+00:00
Byun Jiye Primasrani (byunjiye2605@gmail.com)
Okta Parina (oktaparina3@gmail.com)
<p>Indonesia has pledged to achieve net-zero emissions by 2060. The energy transition can be achieved through financial inclusion. Based on the Environmental Kuznets Curve (EKC) theory, financial inclusion can be a catalyst in reducing environmental impacts if a country has reached the EKC turning point. This study investigates the impact of financial inclusion on the consumption of renewable energy in Indonesia. The data used in this study are the percentage of renewable energy consumption and the financial inclusion index from the International Monetary Fund, covering 2004 to 2021. Additionally, economic growth and the number of internet users are included as control variables. This study utilizes the Error Correction Model and finds that financial inclusion and internet usage have a significant negative effect on the percentage of renewable energy consumption in the long run. According to EKC theory, these findings indicate that Indonesia is still at an early stage of development, where increasing financial inclusion and technology still have a negative impact on the environment.
Policymakers are encouraged to develop targeted financial inclusion strategies to enhance environmental sustainability. Green finance and green investment are critical solutions to support Indonesia's energy transition.</p>
2025-12-23T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/640
Revealing Competitiveness and Key Drivers of Nickel (HS 75) Exports: Evidence from Seven Major Destinations, 2014–2023
2025-09-12T01:46:05+00:00
Vendredy P. Lucasio Siahaan (112212906@stis.ac.id)
Fitri Kartiasih (fkartiasih@stis.ac.id)
<p>The downstream policy is implemented to encourage the production of processed nickel products in Indonesia. Processed nickel under Harmonized System (HS) 75 is a value-added product that has potential for the Indonesian economy. Globally, Indonesia’s exports of nickel HS 75 have increased significantly since the implementation of the downstream policy. However, the increase in export volume did not occur uniformly across all trading partner countries; hence, further analysis of the implemented downstream policy is necessary. This study analyses the effect of the downstream policy and macroeconomic variables on the export volume of nickel (HS 75) to seven major destinations over 2014–2023. The results show that the destination country’s GDP per capita, real prices, exchange rate, and the RCA index significantly affect the export volume of nickel (HS 75), while population and the downstream policy do not have a significant effect.
These findings indicate that the downstream policy has not yet effectively increased export volumes to trading partner countries.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/742
Analysis of the Determinants of Poverty Among the Productive-Age Population in Rokan Hilir Regency
2025-09-11T08:27:50+00:00
Rafqi Ardiansyah Surya (rafqiardiansyahsurya@gmail.com)
<p>Poverty is a condition in which individuals are unable to meet a decent standard of living, and it remains a major development issue in many regions. Poverty reduction is difficult to achieve when inappropriate approaches are applied. Therefore, optimizing the potential of the population, particularly those in the productive age group, is a key strategy in poverty alleviation. This study aims to analyze the effects of education level, employment status, savings ownership, mobile phone ownership, and health complaints on the poverty level of the productive-age population in Rokan Hilir Regency in 2024. The method employed is binary logistic regression. The findings reveal that employment status and health complaints have a significant influence on poverty. Individuals in non-casual employment are less likely to experience poverty than casual workers. In addition, productive-age individuals without health complaints are also less likely to fall into poverty. Based on these findings, it is recommended that the local government expand formal job opportunities and strengthen public health services, particularly for the productive-age population.
Such policies are expected to sustainably reduce poverty rates in Rokan Hilir Regency.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/645
Unveiling Regional Disparities in Indonesia: Clustering Provinces by Development Indicators
2025-09-12T02:12:23+00:00
Akbarrullah Yusman (112212471@stis.ac.id)
Sarni Maniar Berliana (sarni@stis.ac.id)
<p>Indonesia’s pursuit of sustainable development—integrating economic, social, and environmental dimensions—remains challenged by persistent regional disparities. In 2022, only four of seven national priority indicators were achieved, while 21 provinces failed to meet more than three targets. To capture these disparities more precisely, this study applies hierarchical and non-hierarchical clustering to classify 34 provinces based on seven development indicators. The comparative approach enhances robustness: hierarchical clustering reveals inter-provincial linkages, while non-hierarchical clustering improves internal consistency. Validation tests identify Ward’s method as optimal, yielding four distinct clusters. Cluster 1 includes four eastern provinces with multidimensional inequality—high stunting (31.43%), early marriage (10.37%), and low literacy (36.44%). Cluster 2 comprises 20 provinces with structural stagnation, marked by persistent stunting (24.80%) and reliance on primary sectors. Cluster 3 consists of seven industrial provinces with strong economic performance (manufacturing 33.59% of GDP) and improving social indicators. Cluster 4 includes three service-based provinces excelling in social outcomes—lowest stunting (13.07%) and highest literacy (78.46%)—but facing environmental challenges.
These findings highlight the urgency of region-specific, evidence-based policy interventions to promote equitable and sustainable development.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/663
Development of Best Beta CAPM with Adjustment of Sharia Elements: A Case Study on Sharia Stocks in Indonesia
2025-09-19T03:26:24+00:00
Abdul Aziz (abdul.math@unsoed.ac.id)
Supriyanto (abdul.math@unsoed.ac.id)
Abdurakhman (abdul.math@unsoed.ac.id)
<p>This paper introduces the Best Sharia-based Capital Asset Pricing Model (BSCAPM), a modification of the BCAPM model integrating Islamic finance principles. This study focuses on optimizing the beta parameter within the model by integrating Sharia-compliant factors such as zakat and purification, while excluding short-selling practices. Using data from the Jakarta Islamic Index (JII) from June 2020 to November 2024, the study finds that the BSCAPM portfolio outperforms the BCAPM portfolio in terms of the Sharpe ratio. The findings indicate that the BSCAPM serves as a viable alternative framework for Islamic investment modelling, providing Muslim investors with a Sharia-compliant, optimal portfolio formation model.
The research contributes to the underexplored domain of portfolio selection modelling in the Islamic sector, enriching references on asset pricing in Sharia portfolios, particularly in the Indonesian Sharia stock market.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/497
Application of Small Area Estimation for Estimating Households Living in Adequate Housing at the Subdistrict Level in DKI
2025-09-16T07:53:38+00:00
Muhammad Akbar (212112202@stis.ac.id)
Nofita Istiana (nofita@stis.ac.id)
<p>Access to adequate housing is a right of all Indonesian citizens guaranteed by the 1945 Constitution and is part of the Sustainable Development Goals (SDGs), specifically Goal 11. DKI Jakarta is the province with the second-lowest percentage of households living in adequate housing in Indonesia. Estimation at the subdistrict level is needed to support the policy on affordable vertical housing development initiated by the DKI Jakarta Department of Public Housing and Settlement Areas. Direct estimation at the subdistrict level based on the Susenas sampling design would produce imprecise estimates. To address this issue, this study applies the Small Area Estimation (SAE) method using the Empirical Best Linear Unbiased Prediction (EBLUP) model and the Hierarchical Bayes (HB) Beta model, which leverage auxiliary variables to improve precision.
The findings reveal that the HB Beta model provides the best estimates of the percentage of households living in adequate housing in DKI Jakarta in 2024, producing accurate estimates across all subdistricts.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/500
Spatial Analysis of Pneumonia in Toddlers on Sumatra Island Using Geographically Weighted Poisson Regression
2025-09-22T05:44:08+00:00
Ruth Natasya Sepbrina Br Lumban Gaol (212212864@stis.ac.id)
Maura Bintang Potenza (212212723@stis.ac.id)
Nur Faqih Ihsan (212212806@stis.ac.id)
Galang Ali Fazral Pratama (212212622@stis.ac.id)
Sarni Maniar Berliana (sarni@stis.ac.id)
<p>Pneumonia remains a leading cause of mortality among toddlers (aged 1 to less than 5 years) in Indonesia, with notable spatial disparities across Sumatra Island. This study examines factors influencing pneumonia incidence in toddlers using a Geographically Weighted Poisson Regression (GWPR) model to capture local variations in the effects of community health centers, complete basic immunization coverage, exclusive breastfeeding rates, and low birth weight (LBW) prevalence. In an analysis of 2022 cross-sectional data from 154 districts/cities on Sumatra, the global Poisson regression model found all predictors statistically significant at the 5% level. The GWPR model with a fixed Gaussian kernel outperformed the global model, revealing five regional clusters with distinct combinations of significant variables. The dominant cluster (140 locations) showed significant effects from all predictors, while the smaller clusters (14 locations in total) highlighted localized patterns, such as reliance on immunization and breastfeeding in rural areas like Rejang Lebong.
These findings underscore the need for tailored interventions to address regional disparities in toddler pneumonia.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/701
Public Infrastructure Accessibility and Property Price Disparities in Jakarta: A Composite Index and Spatial Regression Approach
2025-10-02T04:39:40+00:00
Khairul Anam (khairulanam@upi.edu)
Ai Sulastri (ai.sulastrii@upi.edu)
Alvin Anugrah Putra (alvinanugrahputra18@upi.edu)
Annisa Purnama Sari (annisapurnamasari88@upi.edu)
Friscka Fitri Aditama (frisckaa22@upi.edu)
<p>This study analyzes spatial inequality in public infrastructure accessibility and property prices in Jakarta Province using a composite index and spatial econometric modeling. A data-driven spatial approach is employed to examine the distribution of property prices and of accessibility to health, education, and transportation facilities. Accessibility is measured using the Entropy Weight Method, while spatial inequality patterns are assessed through Moran’s I and Local Indicators of Spatial Association (LISA). Results reveal significant clustering of high property prices and high accessibility in central Jakarta, in contrast to low values in peripheral areas, indicating pronounced spatial disparities. Furthermore, Geographically Weighted Regression (GWR) and the Spatial Lag Model (SLM) show that improved accessibility is positively associated with higher property prices, although the magnitude of this effect varies spatially.
These findings provide empirical evidence to support data-based spatial planning and infrastructure development policies aimed at reducing urban spatial disparities and promoting more equitable urban growth in Jakarta.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/521
Forecasting Indonesian Monthly Rice Prices at Milling Level Using Google Trends and Official Statistics Data
2025-09-12T06:30:40+00:00
I Bagus Putu Swardanasuta (222112096@stis.ac.id)
Wahyuni Andriana Sofa (anasofa@stis.ac.id)
Siti Muchlisoh (sitim@stis.ac.id)
Arie Wahyu Wijayanto (ariewahyu@stis.ac.id)
<p>Hunger is a very complex social issue to address. Alleviating hunger is closely related to achieving food security, which is part of the second Sustainable Development Goal (SDG 2), zero hunger. Rice, the food commodity most frequently consumed by the Indonesian population, has fluctuating market prices. Therefore, price forecasting is necessary so that the government can take preventive measures against rice price increases at certain times. Research on rice price forecasting using big data from Google Trends is still very rare in Indonesia, even though Google Trends has great potential to reflect the public's search popularity for certain keywords. Therefore, this study aims to forecast the monthly medium rice price in Indonesia at the milling level, using dried milled grain prices and the Google Trends popularity index of related keywords as exogenous variables. The forecasting is conducted using Seasonal Autoregressive Integrated Moving Average (SARIMA), SARIMA with Exogenous Variables (SARIMAX), and Extreme Gradient Boosting (XGBoost) models.
The SARIMAX model has the best performance in forecasting rice prices, with a Root Mean Squared Error (RMSE) of 941.6933, Mean Absolute Error (MAE) of 817.9021, and Mean Absolute Percentage Error (MAPE) of 0.0620.</p>
2025-12-22T00:00:00+00:00

https://proceedings.stis.ac.id/icdsos/article/view/622
Identifying Stratifications of Cancer Patient Visits: Approach of Clustering Using PCA of Mixed Data
2025-09-22T14:27:36+00:00
Kristiana Yunitaningtyas (kristianatyas@gmail.com)
Herianti Herianti (heriantisamsu@yahoo.com)
<p>Cancer is a significant contributor to the burden of non-communicable diseases and one of the diseases with the highest costs in Indonesia’s health insurance system. Understanding key factors influencing cancer patient visits and risk groups under national health insurance supports evidence-based and sustainable cancer care financing. This study aims to identify key factors influencing inpatient visits among cancer survivors and to map risk patterns to improve cancer health service policies, using a 1% sample of claims data from the national health insurance (JKN) program. The PCA of mixed data revealed that cost-severity level and contribution/ward class jointly influence the visits. After PCA, K-Means clustering was applied, yielding four clusters.
The K-Means clusters provide a better understanding of patient visits and highlight the need for distinct strategies for each group to reduce the burden of cancer financing under the national health insurance program.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/631Intersectoral Linkages and Spillover Effects in South Sumatra’s Economy: Evidence from the 2016 Interregional Input–Output Table and 2024 Input–Output Table2025-09-11T08:06:41+00:00Marpalenimarpaleni@gmail.comMardianamardiana@bps.go.idAnggi Dwi Puspitaanggidp17@gmail.comIndhira Putri Ramaindhira.rama@gmail.com<p>This study examines South Sumatra’s economic structure using interregional input–output analysis to identify key sectors and quantify spillover effects. A dual-dataset approach uses the 2016 IRIO table for interprovincial trade dynamics and the 2024 IO table for current sectoral analysis. Results indicate a domestically oriented economy, with 88.45% of supply met by internal production. Manufacturing and construction emerge as central hubs with strong intersectoral linkages, supported by agriculture and mining as upstream suppliers. Interregional trade is concentrated with nearby Sumatran provinces and Java’s industrial centers. Spillover effects benefit Jambi, Bengkulu, and Banten, while feedback effects show dependency on Java. Output multipliers highlight electricity and gas as key growth drivers, whereas agriculture and real estate contribute most to local income. These patterns reveal a structural divergence between growth and inclusivity. To address this, the study recommends a dual-track strategy: scale up manufacturing and energy to drive aggregate output, while modernizing agriculture and high-value services to support income distribution. 
Strengthening interprovincial corridors and deepening local supply chains can further enhance resilience and expand the province’s role in national development.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/727Dynamic Linkages and Monetary Policy Transmission in the Cryptocurrency Market: A Vector Autoregressive Study of Bitcoin, Ethereum, and The Fed's Interest Rate2025-10-04T07:40:31+00:00Muhammad Zaki Azharimuhzaki46@gmail.comM A A Ghiffarimuhzaki46@gmail.comA Ghiffarimuhzaki46@gmail.com<p>The cryptocurrency market, characterized by high volatility, has evolved into a significant financial asset class that attracts both retail and institutional investors. Understanding its interconnectedness with macroeconomic factors is crucial for risk management and financial stability. This study empirically analyzes the dynamic relationships between two primary crypto assets, Bitcoin (BTC) and Ethereum (ETH), and the monetary policy shifts of the U.S. Federal Reserve (the Fed). Using a Vector Autoregression (VAR) model on daily time-series data from January 1, 2022, to June 16, 2025, this research investigates the short-term dynamics, Granger causality, and shock transmission within this system. The findings reveal significant one-way Granger causality from the Fed's interest rate changes to both Bitcoin and Ethereum returns, challenging the weak-form Efficient Market Hypothesis. Furthermore, Impulse Response Function (IRF) and Forecast Error Variance Decomposition (FEVD) analyses provide robust evidence of Bitcoin's market leadership, with shocks in Bitcoin explaining nearly 70% of the variance in Ethereum's movements. 
These results highlight a clear hierarchical structure: the Fed influences broad market sentiment, while Bitcoin leads internal market dynamics, offering critical insights for investors and policymakers navigating the digital asset ecosystem.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/428Parameter Estimation in Hierarchical Models: A Comparison of Bayesian and SGD-Adam Approaches on Biomass Data of Lutjanidae2025-09-22T06:06:05+00:00Dariani Matualagebiakdariani@apps.ipb.ac.idK Sadikbiakdariani@apps.ipb.ac.idA Kurniabiakdariani@apps.ipb.ac.idH F Monimbiakdariani@apps.ipb.ac.idF Pakidingbiakdariani@apps.ipb.ac.id<p>Hierarchical statistical models are widely used to analyse data with nested structures or repeated measurements, allowing variability to be partitioned across levels and providing more accurate parameter estimates than standard regression models. In the Bayesian framework, parameter estimation often uses Markov Chain Monte Carlo (MCMC), which accommodates complex structures and yields full posterior distributions. However, MCMC is computationally intensive, which limits its scalability to large datasets. Recent advances in optimization methods, such as Hierarchical Stochastic Gradient Descent (HSGD) with Adaptive Moment Estimation (Adam), offer a faster and more efficient alternative for hierarchical models. This study applies Hierarchical Bayesian and HSGD-Adam approaches to fish biomass data of the family Lutjanidae from seven Marine Protected Areas (MPAs) in Raja Ampat, Indonesia. The model incorporates ecological predictors such as hard coral cover, distance to the nearest village, and monitoring period, with random effects for each MPA. 
Comparison of predictive performance showed that the Bayesian model performed slightly better in RMSE, indicating its ability to capture extreme biomass variations, while the SGD-Adam model achieved a lower MAE, reflecting greater stability in prediction. These findings demonstrate that advanced hierarchical modelling methods can enhance ecological data analysis and provide timely, data-driven insights for sustainable marine conservation policy.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/636Correlation Analysis of Seasonal Changes on Aerosol Concentration Using Remote Sensing in Java Island2025-09-17T07:48:40+00:00Garda Asa Muhammadgardaasamuhammad@upi.eduAnnisa Amaanahannisaamaanah@upi.eduVanya Chathy Kemala Dewivanyakemala@upi.edu<p>Aerosols are small atmospheric particles that affect the climate through direct and indirect mechanisms, including their role in cloud formation and precipitation. This study aims to analyze the relationship between seasonal changes and aerosol concentrations, and to identify parameters that influence aerosol concentrations on Java Island using remote sensing. The study applies the Pearson correlation test to determine the relationship between seasonal changes and atmospheric aerosol concentrations. The results show a correlation (R) of 0.8 between Aerosol Optical Depth (AOD) and rainfall, indicating a strong and significant relationship between the two variables. By contrast, the correlation between AOD and wind speed is only 0.05. 
This result indicates a very weak relationship between Aerosol Optical Depth (AOD) and wind speed.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/736Extreme Value Theory: Modelling Catastrophic Losses In Sports Injury2025-09-15T02:08:35+00:00Adriano Juwono23102210001@student.prasetiyamulya.ac.id<p>Using Extreme Value Theory with a peaks-over-threshold method, we modelled the top 2% of sports-injury losses from 200,000 simulated claims. A generalized Pareto fit via MLE yielded a positive shape parameter (ξ = 0.783), indicating a fat tail in which rare injuries dominate severity. Q–Q and P–P diagnostics show good agreement between model and data. The implied 100-year loss is around 3.31 billion (currency units), and TVaR confirms that, conditional on losses reaching the tail, expected losses increase rapidly. These findings support the need for capital buffers against costly injuries, severe-scenario stress testing, and pricing loadings that specifically account for costly but rare injuries.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/559Deciphering Student Academic Success: Bayesian Analytical Insights2025-10-04T07:25:48+00:00V Suriya Kannansuriya.v@sdnbvc.edu.inS Lakshmisuriya.v@sdnbvc.edu.inReshmavathi .suriya.v@sdnbvc.edu.in<p>This study investigates the factors influencing students’ academic achievement using Bayesian mixed-effects models. It presents five distinct models, each integrating fixed effects such as gender, playing hours, stress level, and travelling hours alongside random effects such as school level and type of school. 
These models are evaluated using the Leave-One-Out Information Criterion (LOOIC) to gauge their adequacy in fitting the data and predicting outcomes. The findings show that including additional factors, such as school characteristics and students' activities, modifies the relationship between gender and academic success, with the influence of gender diminishing as more variables are incorporated. Additionally, stress level and travelling hours emerge as noteworthy predictors of average marks. Among the models assessed, the one incorporating gender, playing hours, and stress level as fixed effects, alongside school level and type as random effects, demonstrates superior fit and predictive capability. This underscores the importance of considering both individual traits and contextual factors in understanding academic performance.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/485Spatial Spillover Effects in Food Security: A Spatial Lag Fixed Effects Model for Regencies and Cities in West Sumatra (2019–2023)2025-09-16T07:31:30+00:00Fadhel Imam Haichal Tanjungfadhel.tanjung@bps.go.idErwin Tanurwintanoer@bps.go.id<p>Food security is a key pillar of national development, reflecting a region’s ability to sustain food availability, accessibility, utilization, and stability. The Food Security Index (FSI) serves as a crucial measure of this capability. Based on 2023 data, West Sumatra Province achieved the highest FSI score on the island of Sumatra. This study analyzes food security in 19 regencies and cities of West Sumatra from 2019 to 2023 using a Spatial Lag Fixed Effects Model. The research integrates spatial analysis and panel data approaches to identify determinants of the FSI and assess spatial spillover effects between regions. Secondary data were obtained from Statistics Indonesia (BPS) and the National Food Agency. 
The results reveal significant spatial autocorrelation in most years, except 2023. The best-fitting model is the Spatial Lag Fixed Effects Model. Changes in land area, food expenditure, and rice productivity significantly improve the FSI, whereas non-food expenditure and economic growth show no positive effect. The findings emphasize the importance of incorporating spatial dependencies into regional food security policies. Moreover, significant spillover effects indicate that improvements in one area can influence neighboring regions. Therefore, inter-regional cooperation and integrated food distribution policies are essential to achieving sustainable food security.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/659Spatial Modelling of the Relationship Between the Characteristics of Vegetation Index, Life Expectancy and Fertility Rate in Banten Province2025-09-17T07:36:52+00:00Ahmad Syuhada Islami Asyarisyuhadahmad97@upi.eduDiana Sumirahdianasumirah08@upi.eduSyaefunnisa .syaefunnisa.18@upi.eduAchmad Fadhilahachmadfadhilah@upi.eduAndika Permadi Putraandikapp@upi.edu<p>Rapid urbanization in Banten Province has reduced green open spaces, affecting environmental sustainability and demographic dynamics. This study analyzes the spatial relationship between vegetation index, life expectancy (LE), and total fertility rate (TFR) using Landsat 8 imagery (2020–2024) and demographic data from the Central Bureau of Statistics (BPS). The vegetation index, measured using the Normalized Difference Vegetation Index (NDVI), was examined alongside LE and TFR through Pearson correlation and Moran’s I spatial autocorrelation. The results indicate a moderate negative correlation between NDVI and LE (r = -0.561, p < 0.05) and a strong negative correlation between LE and TFR (r ≈ -0.94). 
Urban areas such as Tangerang City and South Tangerang City recorded higher LE despite low vegetation cover, owing to adequate healthcare access. Conversely, rural areas with greater vegetation tended to have lower LE. Spatial analysis identified urban centers as hotspots of high LE, while rural regions appeared as coldspots. These findings suggest that healthcare access and socioeconomic factors can compensate for limited vegetation, while demographic transitions contribute to fertility decline, ultimately supporting sustainable development in Banten Province.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/606The Influences of Climate Change and Social Vulnerability on Dengue Fever Incidence Rate in West Java Province 2019–20232025-09-12T16:26:37+00:00Alwan Nabil Hanif212111879@stis.ac.idGama Putra Danu Sohibiengamaputra@stis.ac.idIka Yuni Wulansariikayuni@stis.ac.id<p>Dengue fever is a serious public health problem in Indonesia. The increase in dengue fever cases is influenced by climate change and social vulnerability factors. This study focuses on West Java Province in 2019–2023. It aims to describe the spatial-temporal pattern of dengue fever incidence and to analyze the influence of climate factors and social vulnerability using a spatial-temporal model, namely Geographically Temporally Weighted Regression (GTWR). Exploratory analysis shows a high concentration of dengue fever incidence in 2019 and a decrease in incidence in 2023. 
The GTWR model produces local parameters across regions and time periods. These indicate that in most regencies/cities, rainfall, population density, access to inadequate sanitation, health facility ratio, and education level have a positive effect on dengue fever incidence rates, while land surface temperature and the percentage of poor people have a negative effect. Based on the GTWR results, areas with high dengue fever vulnerability can be identified as priorities for dengue fever management interventions. This study therefore contributes to early warning research and dengue fever control program planning by considering the risk of dengue fever vulnerability in each region.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/611Applied Bayesian Analysis of Intergenerational Fingerprint Pattern Similarity2025-09-22T02:18:48+00:00 Aswini NKAswinink3@gmail.comRUDRANK SHUKLArudrankshukla9@gmail.comJanaki M C drjanakimc@gmail.com<p>This research reports on the inheritance of fingerprint types across three generations of families. Bayesian statistical analysis indicates moderate transference of loops and whorls between generations (grandfather, father, son), negligible transference of arches, and only moderate joint evidence across all three generations. A total of 150 samples from 50 family trios were analyzed, with fingerprints classified as Arch, Ulnar/Radial Loop, Composite, or Whorl. Cross-tabulation showed the highest transference in Ulnar/Radial Loops, followed by Whorls, with minimal transference for Arches and Composites. Bayesian correlation analyses of father–grandfather and son–father pairs showed strong evidence of similarity between generations (father and grandfather: Pearson r = 0.283, BF₁₀ = 44.74; Kendall’s τB = 0.255, BF₁₀ = 4650.48) and substantial evidence for the association between sons and fathers. The analysis showed negligible transference between sons and grandfathers. Bayesian regression and model comparisons supported the null model, with very low R² values (0.003–0.012), indicating minimal predictive influence of parental patterns on the son’s fingerprint phenotype. Overall, the findings indicate moderate hereditary continuity of fingerprint patterns between successive generations, but weak evidence for transmission across all three generations. This suggests that fingerprint inheritance is complex, influenced by both genetic and developmental-environmental factors affecting dermatoglyphic patterns.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/692Implementing LSTM-Based Deep Learning for Forecasting Food Commodity Prices with High Volatility: A Case Study in East Java Province2025-09-16T06:35:12+00:00Andi Illa Erviani Nensiandiillaervianinensi@apps.ipb.ac.idWindi Pangestiwindipangesti7@gmail.comNabila Syukrinabilasyukrilela@gmail.comMahda Al Maidamahda.almaidah@gmail.comKhairil Anwar Notodiputrokhairil@apps.ipb.ac.id<p>Accurate food price forecasting is essential for maintaining market stability and food security. East Java Province was selected as the study area because it is one of Indonesia’s main food production centers and a major contributor to national inflation. This study compares three deep learning architectures (LSTM, Bi-LSTM, and hybrid CNN-LSTM) to forecast the prices of four key food commodities (red chili, shallots, medium-grade rice, and beef) in East Java. Hyperparameter tuning was performed using grid search, and performance was evaluated using MAPE, MAE, and RMSE. The results show that the Bi-LSTM model consistently outperforms LSTM and CNN-LSTM across the four analyzed commodities. 
Based on MAPE, MAE, and RMSE values, Bi-LSTM achieved the lowest forecasting errors for all commodities. The MAPE values of Bi-LSTM were 1.73% for red chili, 0.60% for shallots, 0.23% for medium-grade rice, and 0.08% for beef, all lower than those of the LSTM and CNN-LSTM models. These results are attributed to Bi-LSTM’s bidirectional architecture, which processes each input sequence in both forward and backward directions to capture richer temporal context. This makes it the most robust and effective of the three models for forecasting food prices under varying volatility. The study provides practical insights for policymakers and supply chain stakeholders in supporting price stability and food security.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/617Predicting Bronchopulmonary Dysplasia in Infants: A Comparative Evaluation of Probit and Machine Learning Models2025-09-11T07:33:55+00:00Shazali Umar Madakishazaliumar6@gmail.comAbba Bello Muhammad mrwudil@gmail.comHamisu Ahmad Hamisu ahamisu437@gmail.com<p>This study compares the predictive performance of traditional Probit regression and several machine learning models in predicting Bronchopulmonary Dysplasia (BPD) among preterm infants. The models were evaluated using standard performance metrics, including accuracy, precision, specificity, sensitivity, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Among all models, the Random Forest demonstrated superior predictive performance with the highest accuracy (86.36%), precision (85.71%), specificity (87.50%), sensitivity (85.71%), F1-score (0.8571), and AUC (0.92), indicating strong discriminative ability. Birth weight and postnatal weight at four weeks emerged as the most significant predictors of BPD. 
The findings suggest that machine learning approaches, particularly the Random Forest algorithm, provide a more robust predictive framework than the conventional Probit regression model for early detection of BPD risk in preterm infants.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/705Role of Agricultural Sector and Quality of Its Production Factor in Indonesia: An Application of Input-Output Analysis and Panel Model2025-10-04T07:35:17+00:00Anugerah Surya Pramanaanugerah.surya@bps.go.idDitto Satrio Wicaksonodittosatrio@bps.go.idHuda M. Fajarhudamfajar@gmail.com<p>Indonesia is known as the largest agricultural country in Southeast Asia. However, the sector’s contribution to national output has declined. This indicates weak interconnection between agriculture and other sectors, despite the sector’s significant potential to stimulate other industries’ output through strong backward and forward linkages. This condition is attributed to the production factors that determine agricultural output. The research therefore aims to analyse agriculture’s linkages with other sectors and to assess the effects of its production factors on agricultural output. Input–Output multiplier analysis shows that the agriculture, forestry, and fisheries sector is the largest absorber of labour in Indonesia. Its output is predominantly consumed directly by households. Meanwhile, panel model results for 2010–2024 show that increases in labour without accompanying improvements in labour quality have a negative effect, whereas investment and credit, as manifestations of capital, have positive effects on agricultural gross value added. 
Policy implications include prioritizing skills development and improving access to credit and investment to foster adoption of productivity-enhancing technologies, thereby enabling the agricultural sector to grow and exert greater influence on other sectors and on the national economy.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/522Impact of the Family Hope Program (PKH) on Household Expenditure in East Java, 20242025-09-12T06:33:01+00:00Elvika Nanda Nurdianaelvikanurdiana@gmail.comAnugerah Karta Monikaak.monika@stis.ac.id<p>Poverty remains a development challenge in Indonesia, particularly in East Java, which contributes substantially to the national poverty rate. Household expenditure, which reflects a household’s ability to meet basic needs and maintain living standards, is widely used as a proxy for welfare and poverty. Assessing how social assistance programs influence expenditure is therefore crucial to understanding their impact on welfare. The Family Hope Program (Program Keluarga Harapan/PKH), a conditional cash transfer initiative, aims to improve household welfare and reduce poverty. This study describes the characteristics of PKH recipients and evaluates the program’s impact on household expenditure as an indicator of welfare in East Java. The analysis uses data from the March 2024 Susenas survey on households that meet the PKH criteria, with separate analyses by household poverty level. Propensity Score Matching was used to address selection bias arising from non-random recipient selection. The results show that PKH recipients generally face limitations in housing, access to basic services, and socio-economic conditions. Overall, PKH did not increase total expenditure, although it increased food expenditure among extremely poor households. 
Policy adjustments are needed to better align the program with the needs and characteristics of each group.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/710Clustering of Cities/Regencies in East Java Province Based on the Number of Health Workers Using K-Means Clustering Analysis2025-09-16T06:50:14+00:00Farras Ijlal Nashirfinchipernoi@gmail.comN R Safitrifinchipernoi@gmail.comD O C Salsabillafinchipernoi@gmail.com<p>This study aims to classify cities/regencies in East Java Province based on the availability of health workers using K-Means clustering analysis. Secondary data were obtained from BPS East Java for 2024, covering 12 variables representing types of health workers. The analysis included data standardization, determination of the optimal number of clusters using the Silhouette method, and application of the K-Means algorithm. The results show that the optimal number of clusters is two. Cluster 1 consists solely of the City of Surabaya, characterized by a high concentration of modern and technical health workers but fewer community-based health workers. Cluster 2 includes the other 37 cities/regencies, showing a greater dependence on basic health workers such as midwives and nutritionists, with limited access to specialist medical personnel. 
This study recommends strengthening community health workers in Surabaya and increasing the availability of professional medical personnel in other regions to reduce health service disparities in East Java.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/627Analysis and Prediction of Green GRDP in Indonesia with Ecosystem Service Value Approach2025-10-03T09:30:16+00:00Ibnu Gataibnugata.27@gmail.comErnawati Pasaribuernapasaribu@stis.ac.id<p>As a measure of regional economic output, Gross Regional Domestic Product (GRDP) does not reflect sustainability because it overlooks the environmental impacts of economic activity. Green GRDP is an important innovation that integrates environmental aspects into sustainable development. Through TAP MPR IX/2001, Indonesia Emas 2045, and the SDGs, Indonesia has committed to implementing sustainable development. This study analyzes and projects Indonesia’s Green GRDP using the Ecosystem Service Value (ESV) approach. Satellite imagery data from MODIS MCD12Q1 and the Cellular Automata–Artificial Neural Network (CA-ANN) method are used to predict land cover changes, while time series models are applied to forecast GRDP. Variations in provincial ESV are strongly influenced by land cover composition. In 2001, Papua recorded the highest Green GRDP and ESV contribution, whereas in 2020 and in the 2030 projection, Jakarta leads in Green GRDP but has the lowest ESV contribution percentage. Throughout 2001–2030, Papua consistently maintains the highest ESV proportion relative to its Green GRDP. The findings highlight the importance of incorporating ecosystem service values into regional and national economic planning so that economic growth inherently reflects environmental sustainability. 
This effort should be supported by spatially differentiated development strategies aligned with each region’s ecological capacity.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/634The Gath–Geva Algorithm for Clustering Spatial Inequality of Stunting in East Nusa Tenggara Province2025-09-19T03:27:41+00:00Mitha Rabiyatul Nufusmhytha.nufus88@gmail.com<p>Stunting remains a critical public health issue in Indonesia, particularly in East Nusa Tenggara (NTT), where prevalence rates are among the highest nationally. This study aims to classify districts and municipalities in East Nusa Tenggara Province based on socioeconomic and health-related indicators associated with stunting vulnerability. Using the Gath–Geva (Fuzzy K-Means Entropy) clustering algorithm, four key variables were analyzed: poverty rate, access to proper housing, open unemployment rate, and number of health facilities. The results identified three clusters with distinct regional characteristics. Cluster 1 consists of areas with low poverty and well-developed health infrastructure but relatively high unemployment rates. Cluster 2 represents the most vulnerable regions, characterized by high poverty, poor housing access, and limited health facilities, while Cluster 3 comprises more stable areas with better housing, low unemployment, and adequate healthcare services. The silhouette coefficient of 0.41 indicates that the three-cluster structure provides reasonable separation and internal consistency. These findings highlight that stunting vulnerability is strongly influenced by socioeconomic disparities and the distribution of health infrastructure. 
Therefore, intervention strategies should be tailored to the characteristics of each cluster, emphasizing integrated actions in high-risk regions and preventive measures in more stable areas to accelerate stunting reduction across East Nusa Tenggara Province.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/430Mapping Regional Economic Resilience of Indonesian Provinces Through PCA and K-Means Analysis to Support Regional Development Policy Optimization2025-09-16T10:30:57+00:00Bella Cindy Thalitabellacindyt22@gmail.comKevina Alal A'lakevinaakla15@gmail.com<p>In Indonesia’s post-decentralization era, assessing regional economic resilience is critical to promoting inclusive development. This study constructs a composite resilience index using seven indicators: Human Development Index (HDI), Open Unemployment Rate, GRDP per capita, Gini Ratio, Economic Growth, Capital Expenditure, and Own-Source Revenue (OSR), across 34 provinces for 2020–2024. Principal Component Analysis (PCA) and K-Means clustering are applied to identify resilience patterns and classify provinces into high, moderate, and low resilience categories. The findings reveal significant interprovincial disparities. Provinces such as DKI Jakarta (HDI: 81.65), Bali (HDI: 76.54), and DI Yogyakarta (HDI: 80.22) consistently demonstrate high resilience, supported by low unemployment (e.g., Jakarta: 5.78%) and robust fiscal capacity (e.g., OSR share: Jakarta 58.29%). In contrast, Papua and West Papua exhibit lower resilience scores, characterized by HDI below 65, OSR shares below 15%, and volatile economic growth. Correlation analysis indicates a strong positive association between HDI and fiscal indicators (r = 0.82), while OLS regression confirms OSR and Capital Expenditure as significant predictors of resilience (p < 0.05). 
Spatial mapping highlights geographic clustering of resilience, with Western Indonesia outperforming the Eastern region, underscoring persistent spatial inequalities. These findings reinforce the necessity for regionally differentiated policies. The study recommends enhancing fiscal autonomy, investing in human capital, and integrating Fintech-based financial inclusion, especially for lagging regions. This framework supports evidence-based policies aligned with Indonesia’s SDG and post-2024 development goals.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/555The Impact of the Job Creation Law and Other Variables on Indonesia's FDI from 2018 to 20242025-09-29T15:38:26+00:00Apriani Sofiana212112295@stis.ac.idGama Putra Danu Sohibiengamaputra@stis.ac.id<p>Although national Foreign Direct Investment (FDI) realization in Indonesia increased following the enactment of the Job Creation Law in 2021, regional FDI realization actually declined in 17 of Indonesia's 34 provinces. Reviews by international organizations such as the World Bank and the World Trade Organization (WTO) point to the need to examine how investment-supporting variables, including the Job Creation Law, influence FDI in Indonesia. The objective of this study is therefore to analyze the variables influencing regional FDI realization in 34 provinces over the 2018-2024 period. The method is panel data regression, with the Random Effects Model (REM) selected. 
The results show that four factors have a positive effect on regional FDI realization: household consumption expenditure (HCE), a proxy for market size; non-oil and gas exports, a proxy for openness of market access; the mining sector's GRDP, a proxy for natural resource potential; and the Job Creation Law. These results align with Dunning's eclectic theory. Disparities in FDI realization were also found. High FDI realization in some regions outside Java was partly driven by internal factors such as abundant natural resources, the presence of industrial estates, and product diversification.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/443Panel Data Regression Modelling on The Analysis of The Influence Of Fiscal Decentralization to Poverty In Maluku In 2020-20242025-09-22T06:01:08+00:00Bayu Aji Bachtiarbachtiarbayy15@gmail.comMiftahus Sa'adahmiftahus786@gmail.com<p>Maluku Province persistently records one of the highest poverty rates in Indonesia, despite sustained fiscal transfers from the central government. This study examines the relationship between fiscal decentralization and poverty reduction in Maluku from 2020 to 2024 through a panel data regression approach, enabling simultaneous analysis of spatial and temporal variations across districts. Poverty data were sourced from Badan Pusat Statistik (BPS) and fiscal variables from Direktorat Jenderal Perimbangan Keuangan (DJPK). The empirical results demonstrate that Regional Original Revenue (PAD), the General Allocation Fund (DAU), and the Village Fund (DD) exert statistically significant negative effects on poverty rates, with DD showing the strongest marginal impact. 
By focusing on a structurally disadvantaged province, this study contributes to the empirical literature by providing region-specific evidence on the effectiveness of fiscal decentralization mechanisms in reducing poverty. The findings underscore the importance of strengthening local fiscal capacity and optimizing the allocation of intergovernmental transfers to achieve more equitable and sustainable poverty alleviation.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/651Strategic Expansion of Digital Payments in Papua and West Papua: Individual Character Analysis Using Random Over and Under Sampling CART2025-09-12T02:36:05+00:00Reni Ameliareniamelia3006@gmail.comAkhmad Mun'imakhmad.munim@gmail.com<p class="Abstract" style="margin-bottom: 6.0pt;">This study examines the characteristics of digital payment users, and the factors influencing usage, among individuals in Papua and West Papua. Understanding these characteristics enables stakeholders to design effective promotion, outreach, and education strategies that support wider adoption of digital payments. The analysis uses data from the March 2023 National Socio-Economic Survey conducted by BPS, covering 52,081 respondents aged 17 years and older. A Classification and Regression Trees (CART) approach was applied, with random oversampling and undersampling used to handle class imbalance. The results reveal that business field (sector of employment), type of residential area, and education level are key determinants of digital payment usage. 
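The abstract does not say whether oversampling and undersampling were combined or which software was used. As an illustrative NumPy sketch (the function names are hypothetical), random oversampling duplicates randomly drawn rows of the smaller classes. Random undersampling discards randomly drawn rows of the larger classes.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly drawn rows of each smaller class until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Draw (largest count - n) extra indices from this class, with replacement.
        extra = members[rng.integers(0, n, size=counts.max() - n)]
        keep.append(np.concatenate([members, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class, down to the smallest class size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=counts.min(), replace=False)
        for c in classes
    ])
    return X[idx], y[idx]
```

Either function would normally be applied to the training split only, so that duplicated respondents do not leak into the evaluation data. The balanced arrays can then be passed to any CART implementation.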
Three primary user profiles were identified: (1) individuals aged 17+ working outside the agricultural sector with at least a high school education; (2) individuals aged 17+ working outside agriculture, with junior high school education or below, residing in urban areas; and (3) individuals aged 17+ working in agriculture or unemployed, living in urban areas, and having completed high school or higher. These findings suggest that stakeholders should tailor promotional strategies and educational programs to individual characteristics to effectively increase digital payment adoption in Papua and West Papua.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/662Clustering of Junior High School Education in West Java Based on Density and Dropout Ratios Using Quartile and KMeans Methods2025-09-12T02:51:18+00:00Eva Nurkhofifahevaanurkhofifah@gmail.comDwilaras Athinadwilaras@outlook.comArna Ristiyanti Taridaarna.ristiyanti@kemendikdasmen.go.idFriska Amelia Pratiwifriska.amelia@kemendikdasmen.go.id<div>Education disparities across regions often reflect differences in school density, teacher availability, and student dropout rates. This study aims to classify junior high school education in West Java into more homogeneous groups to better understand these disparities. Two clustering approaches were applied: quartile grouping and the K-Means algorithm. Quartile grouping provides a simple categorization of each indicator into four levels (very high, high, low, very low), while K-Means offers a more flexible, data-driven segmentation. 
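The abstract does not say how values lying exactly on a quartile are assigned. The NumPy sketch below assumes each indicator is split at its own 25th, 50th, and 75th percentiles, with ties going to the upper group. Under that assumption, quartile grouping takes only a few lines (the function name is illustrative).

```python
import numpy as np

def quartile_group(values):
    """Assign each value one of four levels using that indicator's own quartiles."""
    v = np.asarray(values, dtype=float)
    q1, q2, q3 = np.percentile(v, [25, 50, 75])
    levels = np.array(["very low", "low", "high", "very high"])
    # np.digitize with bins [q1, q2, q3] returns 0 for v < q1, 1 for q1 <= v < q2,
    # 2 for q2 <= v < q3 and 3 for v >= q3.
    return levels[np.digitize(v, [q1, q2, q3])]
```

Each indicator is grouped separately, so a region can be "very high" on the student-teacher ratio and "low" on dropout at the same time. K-Means instead clusters regions on all indicators jointly.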
The K-Means algorithm produced three distinct clusters:

1. Balanced and Stable regions, with proportional ratios and low dropout rates.
2. High-Density but Stable regions, concentrated in urban and peri-urban areas, with high student-teacher and student-school ratios but controlled dropout levels.
3. Elevated Dropout Risk regions, mostly in rural and southern areas, with lower density but higher dropout rates.

The comparison shows that quartile grouping is easy to interpret for individual indicators, while K-Means provides more comprehensive insights into multidimensional patterns. This research highlights the potential of clustering methods to guide policymakers in designing differentiated strategies, from infrastructure expansion in dense regions to social support programs in dropout-prone areas. </div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/496Forecasting Composite Stock Price Index on Indonesia Stock Exchange Using Extreme Learning Machine2025-09-22T05:49:11+00:00Bony Parulian Josaphatbonyp@stis.ac.idDhevri Leonardo Hutajulu222111987@stis.ac.id<p>Technological advances have driven active participation in digital economic activities, including capital market investment. Stocks remain a dominant instrument, with the Composite Stock Price Index or Indeks Harga Saham Gabungan (IHSG) serving as a primary benchmark for investment decisions in Indonesia. However, its high volatility, driven by economic, political, global, and market-sentiment factors, demands accurate forecasting methods. Traditional approaches such as ARIMA and linear regression have limited ability to capture the complex, non-linear patterns of stock market data. This study proposes the Extreme Learning Machine (ELM), a machine learning method based on a single-hidden-layer feedforward neural network that is considered more adaptive to market dynamics. 
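In an ELM, the hidden-layer weights are drawn at random and left fixed. Only the output weights are fitted, in closed form, by least squares. The NumPy class below is a minimal sketch of that idea, not the authors' code. The sigmoid activation and Gaussian initialisation are assumptions for illustration. The default of nine hidden neurons follows the size the abstract reports as optimal.

```python
import numpy as np

class ELMRegressor:
    """Minimal Extreme Learning Machine: random hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=9, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of a fixed random projection; W and b are never trained.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights: least-squares solution of H @ beta = y via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

For IHSG forecasting, each row of `X` would hold lagged index values plus the exogenous variables (gold price, USD/IDR rate, COVID-19 dummy). A grid search would then loop over `n_hidden` and any other settings. Both of those steps are left out of this sketch.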
Hyperparameters were tuned by grid search to improve prediction accuracy. The model forecasts IHSG using exogenous variables, namely gold prices, the US dollar to rupiah exchange rate, and a COVID-19 dummy variable. The optimal model used a hidden layer of nine neurons. Evaluation results indicate that the ELM models perform well in multi-horizon forecasting (t+1 to t+5), with low MAE, MAPE, and RMSE values across horizons. The five-day IHSG forecasts are 7,242.28, 7,228.42, 7,211.02, 7,192.67, and 7,174.06, demonstrating the model’s potential to support investment decision-making with accurate forecasts.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/591The Application of Retrieval-Augmented Generation (RAG) in Developing an Intelligent Risk Management Platform: A Case Study at Statistics Jawa Timur2025-09-29T15:45:46+00:00I Putu Agus Wahyu Dupayanawahyu.dupayana@bps.go.idEko Hardiyantoeko.hardi@bps.go.id<div>Risk management is a crucial element in the governance of modern organizations, especially for public institutions such as Statistics Indonesia (BPS), which is responsible for producing official statistics. At Statistics Jawa Timur, risk management is still performed manually in spreadsheet software, making the process slow and poorly responsive to dynamic risks. This weakens internal controls, particularly as a major strategic undertaking, the 2026 Economic Census (SE2026), approaches. 
To address these limitations, this research proposes Kadiri, a Risk Management Information System and Worksheet. Kadiri is an intelligent system that integrates Artificial Intelligence (AI), specifically Large Language Models using the Retrieval-Augmented Generation (RAG) method. It is designed to shift risk management from a reactive to a proactive process. It accelerates risk identification, risk analysis, and the formulation of mitigation recommendations by leveraging the BPS internal knowledge base. RAG enables an AI model such as Google Gemini to provide contextual, relevant suggestions grounded in the organization's historical data. The outcome is a digital platform that speeds up risk analysis, enhances accountability, and aligns with the bureaucratic reform agenda.</div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/607Impact of Land Use Changes Due to Tourism on Ecosystem Services Using InVEST2025-09-12T16:08:00+00:00Atanasius Alfandi222111929@stis.ac.idYuliagnis Transver Wijayayuliagnis@stis.ac.id<p>Ecosystem services play a vital role in supporting human life and environmental sustainability. However, tourism activities in Badung Regency, Bali, have led to significant changes in land cover and use, impairing the function of ecosystem services. This study integrates remote sensing, machine learning, and the InVEST models to assess the impact of Land Use/Land Cover (LULC) changes on ecosystem services in Badung Regency. The results show a decrease in non-agricultural vegetation area from 17,659.65 hectares in 2014 to 11,405.84 hectares in 2024. Meanwhile, built-up land increased sharply from 15,074.47 hectares in 2014 to 22,134.06 hectares in 2024. In addition, the InVEST model shows a decrease in carbon stock of 1,379,841.68 tons between 2014 and 2024. 
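InVEST's carbon storage model assigns each LULC class a carbon density for four pools: above-ground, below-ground, soil, and dead organic matter. It sums density times area across the map. The change in stock between two dates is the difference of the two totals. The NumPy sketch below reproduces that bookkeeping on a toy raster. The class codes, carbon densities, and 0.01 ha pixel area (a 10 m pixel) are hypothetical placeholders, not the values used for Badung Regency. The pool keys only follow the naming of InVEST's biophysical table.

```python
import numpy as np

def total_carbon(lulc, carbon_table, pixel_area_ha):
    """Sum carbon (Mg) over a LULC raster: pixel count x pixel area x per-class carbon density."""
    lulc = np.asarray(lulc)
    total = 0.0
    for code, pools in carbon_table.items():
        # Density in Mg C per hectare = above-ground + below-ground + soil + dead matter.
        density = pools["c_above"] + pools["c_below"] + pools["c_soil"] + pools["c_dead"]
        total += np.count_nonzero(lulc == code) * pixel_area_ha * density
    return total
```

Applied to classified rasters for two years, the later total minus the earlier total gives a carbon-stock change of the same kind as the decrease reported for 2014 to 2024.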
Meanwhile, water yield, nitrogen export, and sediment export increased, reflecting the link between tourism development and declining ecosystem services. Correlation analysis shows a consistent negative correlation between water yield and carbon stock, as well as a positive correlation between nitrogen export and sediment export. The results of this study are expected to serve as a reference for further studies on the dynamics of ecosystem services and to support sustainable environmental management in areas with rapidly growing tourism.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/613Machine Learning Framework for Early Detection of Mental Health Conditions from Textual Data2025-10-04T07:27:31+00:00Basheer Riskhanb.riskhan@aiu.edu.myAbdullah Al Hadiabdullah.hadi@student.aiu.edu.myS M Asiful Islam Sakysaky.aiu22@gmail.comMd Saiful Arefinsaiful.arefin@student.aiu.edu.myKhalid Hussainkhalid.hussain@aiu.edu.my<p>Mental health disorders significantly affect global populations, placing heavy burdens on healthcare systems worldwide. Traditional diagnostic methods, mainly clinical assessments and self-reports, lack real-time monitoring, are prone to biases, and often result in delayed interventions. Recent advances in machine learning (ML) offer promising opportunities to enhance mental health detection through the analysis of behavioural and physiological data. This study evaluates four widely used machine learning algorithms for identifying early indicators of mental health conditions from textual data: Support Vector Machines (SVM), Logistic Regression, Naïve Bayes, and Random Forests. A dataset of 27,978 textual records from the “Analysis and Modelling on Mental Health Corpus” was analysed. 
Data preprocessing involved normalization, stop-word removal, lemmatization, and TF-IDF vectorization to prepare features for model training. Model performance was assessed using accuracy, precision, recall, and F1-score. Results showed that SVM and Logistic Regression outperformed the other models, achieving accuracies of 92% and 91%, respectively. These findings demonstrate the potential of ML-based frameworks to support earlier and more accurate mental health interventions. Integrating such techniques into clinical practice could improve diagnostic accuracy, reduce healthcare workload, and enhance patient outcomes.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/624Equipment Borrowing and Room Booking Information System at the Politeknik Statistika STIS2025-09-20T14:39:19+00:00Setya Hadi Nugroho222112358@stis.ac.idWaris Marsisnowaris@stis.ac.id<p>Equipment lending and room booking services at Politeknik Statistika STIS are still managed manually. This causes operational constraints such as limited access to information, inefficient processes, and potential recording errors, which affect service quality and the effective use of campus assets. This study designs and builds a web-based equipment and room lending information system to address these issues. The system is intended to give users access to equipment and room availability information, simplify the borrowing request process, and improve the accuracy of inventory data. It was developed following the SDLC with a prototyping approach. It was evaluated using Black Box Testing and a Post-Study System Usability Questionnaire (PSSUQ) survey measuring ease of use and user satisfaction. 
Black Box Testing confirmed that all features of the completed system operate correctly. The PSSUQ evaluation yielded an average score of 1.69 (on the PSSUQ scale, lower scores indicate greater satisfaction), indicating that the system is well received and that users are highly satisfied.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/549Comparative Study of Autoencoder and LSTM-AE for Extreme Temperature Anomaly Detection in Semarang2025-09-12T06:13:16+00:00Galih Kusuma Wijayagalihkjaya15@students.unnes.ac.idAliyya Anggraenialiyyaanggraeni@students.unnes.ac.idTsalisa Chulaili Sahri Novasasanovacollage@students.unnes.ac.idMuhammad Alifian yusufalfinyus01@students.unnes.ac.idIqbal Kharisudiniqbalkharisudin@mail.unnes.ac.id<p>Climate change has increased the frequency and intensity of extreme weather events, including heatwaves and cold spells, posing critical risks to public health and urban infrastructure. This study proposes and compares two Autoencoder-based deep learning frameworks for detecting extreme temperature anomalies in Semarang City: the Long Short-Term Memory Autoencoder (LSTM-AE) and the standard Autoencoder (AE). Both use historical daily data from 2005 to 2025. Unlike conventional anomaly detection methods, the LSTM-AE learns temporal structure through recurrent memory cells, enabling it to capture sequential temperature dependencies that static AE models cannot. Both models are trained to reconstruct “normal” temperature patterns. Anomalies are flagged when the reconstruction error exceeds a 95th-percentile threshold. The results demonstrate that the LSTM-AE identifies significant heatwave and cold spell events more consistently, with seasonal alarm rates that closely align with local climatic transitions. 
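The thresholding rule shared by both models does not depend on the network architecture: compute a per-day reconstruction error and flag days above the 95th percentile. The sketch below assumes squared error and computes the percentile over the same series being scored. The abstract does not say whether the threshold was taken from training-period errors or from the full record. The reconstructions `x_hat` would come from the trained AE or LSTM-AE; the function name is illustrative.

```python
import numpy as np

def flag_anomalies(x, x_hat, q=95.0):
    """Flag time steps whose reconstruction error exceeds the q-th percentile of all errors."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    # Mean squared reconstruction error per time step (works for 1-D or multi-feature rows).
    err = ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)
    threshold = np.percentile(err, q)
    return err > threshold, threshold
```

With a percentile rule, roughly 5% of the scored days are flagged by construction. Differences between the AE and LSTM-AE therefore show up mainly in *which* days are flagged, which a set-overlap measure such as the Jaccard index can quantify.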
Several detected peaks coincide with historically documented events reported by BMKG, such as the 2015–2019 El Niño and the 2019–2020 transition periods, confirming their climatological relevance. In contrast, the standard AE detects more anomalies (726 vs. 366 for the LSTM-AE) but tends to generate false alarms outside transitional periods. Model performance is evaluated using reconstruction error distributions, Jaccard similarity indices, and monthly alarm rates. This study highlights the potential of LSTM-based architectures to improve anomaly detection in climate data. It also contributes to data-driven strategies for urban climate resilience in tropical regions.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/635Spatio-Temporal Modeling of Agricultural Drought in Indramayu Using the NDDI Index (2015-2024)2025-09-17T07:41:25+00:00Sypa Septianisyfaa.septiani@gmail.comIrene Siahanireneshaann@gmail.comHilya Talitha Aqilahhilyaaqilah63@gmail.comDela Oktavianidelaoktavia3@upi.eduFifin Trisulistianififintrisulistiani21@upi.eduSalwa Alifiasalwaalifia@upi.eduTiara Handayanihandayani@upi.eduSiti Zahrotunnisasitizahrotunnisa@upi.edu<div>This study examines the spatio-temporal patterns of agricultural drought in Indramayu Regency, Indonesia, using the Normalized Difference Drought Index (NDDI) derived from Landsat imagery between 2015 and 2024. The analysis employed spatial autocorrelation techniques, including Global Moran’s I and Local Indicators of Spatial Association (LISA), to identify spatial clustering and persistence of drought conditions. The results show consistent spatial vulnerability. The southern region forms stable High-High drought clusters across multiple years, while the northern region remains dominated by Low-Low clusters. 
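Global Moran's I compares each unit's deviation from the mean with those of its neighbours:

I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2

Here S0 is the sum of all spatial weights and z is the deviation from the mean. Values near +1 indicate clustering of similar values, and values near -1 indicate a checkerboard pattern. The expectation under no autocorrelation is -1/(n - 1). The NumPy sketch below computes I for a user-supplied weights matrix. It also labels LISA quadrants (High-High, Low-Low, and so on) from each unit's value and its neighbours' average. It is an illustration only: the weights scheme used for Indramayu is not stated in the abstract, and the permutation tests LISA uses to decide which clusters are significant are omitted.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and an n-by-n spatial weights matrix W (zero diagonal)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def lisa_quadrants(x, W):
    """Label each unit HH, LL, HL or LH from its value and its neighbours' mean (no significance test)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    W = W / W.sum(axis=1, keepdims=True)  # row-standardize: spatial lag = average of neighbours
    z = x - x.mean()
    lag = W @ z
    high, high_lag = z > 0, lag > 0
    return np.where(high & high_lag, "HH",
                    np.where(~high & ~high_lag, "LL",
                             np.where(high, "HL", "LH")))
```

On a simple chain of six units with values [1, 1, 1, 5, 5, 5], Moran's I is 0.6, reflecting a block of low values next to a block of high values.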
These findings indicate that drought in Indramayu shows strong spatial persistence and temporal continuity, reflecting long-term environmental and land-use characteristics. A supporting correlation analysis between NDDI and rice productivity (correlation coefficient = 0.164; p-value = 0.651) revealed no significant relationship. This suggests that effective irrigation systems have mitigated the impact of meteorological drought on agricultural output. Overall, the study highlights the need for location-specific drought management in the spatially vulnerable southern areas to enhance agricultural resilience and regional food security.</div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/431Evaluating the Impact of Ibu Kota Nusantara (IKN) Development on Land Cover Using Machine Learning-Based Sentinel-2A Satellite Image Classification2025-09-11T01:21:14+00:00Wisnu Aimariyadi212212918@stis.ac.idAdinda Batrisybazla212212444@stis.ac.idVanessa Ruth Evelyn Tobing212212904@stis.ac.idRobert Kurniawanrobertk@stis.ac.id<p>The development of Ibu Kota Nusantara (IKN) in East Kalimantan as Indonesia's new capital city has the potential to cause significant changes to land cover patterns, especially in tropical rainforest areas. This study aims to evaluate the impact of IKN development on land cover using Sentinel-2A satellite image data and a machine learning approach. The study focuses on the IKN Core Urban Area, comparing land cover in 2022 (before development) and 2024 (after development). Three classification methods were used: Random Forest (RF), Support Vector Machines (SVM), and Classification and Regression Trees (CART). The RF model achieved the best accuracy, with an overall accuracy above 93% in both periods. 
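Overall accuracy, the metric quoted for the classified maps, is the proportion of validation samples whose predicted class matches the reference class. Equivalently, it is the trace of the confusion matrix divided by its total. The minimal NumPy sketch below assumes integer-coded classes; the function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are reference (ground-truth) classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def overall_accuracy(cm):
    """Share of validation samples on the diagonal, i.e. correctly classified."""
    return np.trace(cm) / cm.sum()
```

An overall accuracy above 0.93 therefore means that more than 93% of validation samples fell on the diagonal. The off-diagonal cells show which land cover classes (for example, vegetation versus open land) are confused with each other.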
Spatial analysis showed a decrease in vegetation area and an increase in open land, indicating intensive land clearing. These findings emphasize the importance of continuous land cover monitoring to support IKN's vision as a green city and to achieve Sustainable Development Goals (SDGs) 11 and 15. This research is expected to serve as a reference for formulating adaptive and environmentally friendly spatial policies.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/566Real-Time Vibration Fault Detection in Rotating Machines Using Transformers to Minimize Production Losses in Industry 5.0: VIBT2025-09-11T07:03:48+00:00FERNAND JOSEPH TOUKAP NONOtoukap_nono@enspd-udo.cmDIANORE TOKOUE NGATCHA nonojoseph18@yahoo.comFlorence OFFOLEflorenceoffole33@gmail.comSteyve Nyattessteyve@gmail.comMarcelin MOUZONG PEMInonofernand18@gmail.com<p>Quickly identifying anomalies in rotating machinery is crucial for safety and profitability in contemporary industry (Industry 5.0). Undetected failures can cause costly malfunctions and production interruptions. This research proposes VIBT, a Transformer-based approach for analyzing multidimensional vibration events. It aims at early and accurate detection of anomalies in rotating machinery and at minimizing production interruptions in Industry 5.0. The study highlights the limitations of conventional vibration analysis approaches and traditional deep learning techniques. VIBT combines Transformers with a filter bank convolution (FBC) module for denoising and an adaptive wavelet transformation (WTA) mechanism for dynamic multi-scale feature fusion, addressing the challenges posed by non-stationary, noisy signals. 
Extensive testing on the Mafaulda dataset shows that VIBT achieves 98.1% precision and 98.8% accuracy, significantly outperforming existing baseline models. The results suggest that VIBT not only improves fault detection but can also support better maintenance strategies in industrial applications. This paves the way for future research on Transformer-based semi-supervised learning and the integration of intermodal data.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/573Satellite-Based Detection of Floating Plastic Debris in Jakarta Bay (2021–2024)2025-09-12T06:25:09+00:00Marchadha Santi Wildamarchadhasantiwilda@gmail.comErnawati Pasaribuernapasaribu@stis.ac.id<p data-start="237" data-end="746">Plastic waste is a critical environmental issue in Jakarta Bay, causing ecosystem degradation and complicating coastal management. This study analyzes the seasonal dynamics and spatial impacts of floating plastic debris using Sentinel-2 imagery from July 2021 to November 2024. The Floating Debris Index (FDI) and Normalized Difference Vegetation Index (NDVI) were applied, with optimum thresholds determined through ROC curve analysis. Monthly median composites were computed to minimize atmospheric noise. The results show a recurring seasonal pattern, with debris consistently peaking in June, likely influenced by monsoon-driven runoff and human activities. A clear increasing trend from 2021 to 2023 was followed by a decline in 2024, coinciding with the implementation of the National Ocean Love Month program. Buffer analysis indicated that most debris accumulates within 500 m of the shoreline, particularly near river mouths, ports, and settlements. Thiessen Polygon analysis revealed hotspots concentrated along the eastern and western coasts. 
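The abstract states that the FDI and NDVI thresholds were chosen by ROC analysis but not which criterion defined the optimum. One common choice is Youden's J statistic (true-positive rate minus false-positive rate). The sketch below assumes that criterion, using pixel index values as scores and labelled debris/non-debris reference pixels as labels. It is illustrative and not the authors' procedure.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Pick the cut-off maximizing Youden's J = TPR - FPR, predicting debris when score >= cut-off."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):  # each observed score is a candidate point on the ROC curve
        pred = scores >= t
        tpr = np.sum(pred & labels) / np.sum(labels)
        fpr = np.sum(pred & ~labels) / np.sum(~labels)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j
```

Sweeping every observed score traces the empirical ROC curve point by point. The returned cut-off is the point farthest above the chance diagonal.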
These findings highlight that floating plastic debris in Jakarta Bay is strongly shaped by seasonal cycles and land-based inputs, providing critical insights for designing targeted, evidence-based waste management policies.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/652Spatial Analysis of Food Security Index and Its Factor to Support Program Priority Area in Central Java, Indonesia2025-09-12T02:25:04+00:00Saskia Syafinda Fyndianissyafindaorfyn@upi.eduHanung Putri Titisarihanungtitisari05@upi.eduMuhammad Fadhiil Al-Ghifaaryfadilfary17@upi.eduTiara Handayanihandayani@upi.eduAchmad Fadhilahachmadfadhilah@upi.edu<p>Food security is a global issue influenced by ecological and socio-economic factors. It is the condition in which people can meet their food needs. Identifying food security conditions and the factors that influence them is therefore a necessary first step in overcoming food insecurity. This research focuses on Central Java and applies spatial autocorrelation analysis, using Moran’s I and LISA to detect spatial patterns and dependence among study locations. The method also shows how poverty characteristics are distributed and related across locations in Central Java. The study also analyzes the Food Security Index (FSI) in Central Java Province by integrating drought parameters (Normalized Difference Drought Index), poverty levels, food expenditure, and open unemployment rates. The results show a correlation between ecological conditions and FSI attainment. This confirms that the FSI level in the study area depends not only on natural resources but also on socioeconomic factors. 
These results can therefore inform policymakers in designing spatially targeted strategies to improve food security, particularly in Central Java.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/493Predictive Insights: Unmasking Breast Cancer Biomarkers through machine learning and Systems Biology2025-09-15T01:54:19+00:00A A Zainulabidinaaliyuzainulabidin@gmail.comA J Sufyant.muthukumar1996@gmail.comM K Thirunavukkarasut.muthukumar1996@gmail.com<p>Breast cancer is a complex, heterogeneous disease with high rates of metastasis and recurrence that cause significant morbidity and mortality. Despite improved treatment options from new medical therapies, a proper understanding of the molecular mechanisms of breast cancer development and progression remains essential. We therefore combined transcriptomic profiling with SHAP feature importance analysis to identify potential molecular targets. Among the nine machine learning models built, the random forest model achieved an accuracy of 0.96 for breast cancer prediction. KRT17, KRT5, and FABP5 emerged as prognostic biomarkers common to both the differential gene expression (DGE) and feature selection analyses. Furthermore, gene enrichment and functional annotation reveal the importance of these key genes in breast cancer progression. Survival analysis confirms the risk associated with these genes in breast cancer patients. 
These findings show the effectiveness of combining machine learning with DGE analysis for biomarker discovery. Experimental validation of these genes would be a promising step toward reducing clinical complications during breast cancer treatment.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/580Detection and Mapping of Invasive Alien Plant Water Hyacinth using Satellite Imagery and Machine Learning (Case Study: Rawa Pening Lake, Indonesia)2025-09-20T14:59:45+00:00Adib Sulthon Muammal222111840@stis.ac.idwaris marsisnowaris@stis.ac.id<p>Rawa Pening Lake, one of the 15 national priority lakes in Indonesia, faces a significant threat from invasive water hyacinth (Eichhornia crassipes). This plant once covered up to 70% of the lake's surface and was still causing ecological and socio-economic impacts in 2024, necessitating periodic monitoring to prevent future blooms. This study aimed to identify the optimal features to characterize water hyacinth, determine the most effective classification model, and map the plant’s distribution. Following the CRISP-DM framework, the study used Sentinel-1 (radar) and Sentinel-2 (optical) satellite imagery, with multispectral bands, radar bands, and composite indices as features. Feature selection was performed using Jenks Natural Breaks, and classification models were built using Random Forest and a Convolutional Neural Network (CNN). The results demonstrated that the CNN achieved higher accuracy in distinguishing among land cover classes. The final mapping identified water hyacinth covering 34,775, 32,627, and 34,175 pixels in June, July, and August, respectively. 
This approach offers a reliable method for periodic monitoring of water hyacinth in Rawa Pening Lake.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/601Extracting Information on Aspects of Sustainable Tourism in ASEAN Using Named Entity Recognition (NER)2025-09-12T16:03:45+00:00Sisilia Manalu222112372@stis.ac.idYuliagnis Transver Wijayayuliagnis@stis.c.id<p>Sustainable tourism is an important issue in the ASEAN region, which has experienced rapid growth in the tourism sector but faces challenges in balancing economic, social, and environmental aspects. Information on sustainability practices is scattered across various forms of text, making it difficult to analyze manually. This study aims to extract information on sustainability aspects of tourism using a transformer-based Named Entity Recognition (NER) approach. Three data sources were used: government websites, online news, and TripAdvisor travel reviews. Five transformer models (BERT, ALBERT, DistilBERT, ELECTRA, and RoBERTa) were compared on entity extraction performance. The dataset was split 80:10:10 into training, validation, and test sets. DistilBERT provided the best performance, balancing accuracy and computational efficiency. The distribution of sustainability aspects across ASEAN countries, and in Indonesia in particular, was also analyzed to identify practices that have already been implemented. 
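An 80:10:10 split like the one described can be made by shuffling item indices once and slicing. The sketch assumes the unit being split is a whole text (a sentence or document), so that tokens from one text never appear in two subsets. The seed value and function name are arbitrary, illustrative choices.

```python
import numpy as np

def split_80_10_10(n_items, seed=42):
    """Shuffle item indices once and cut them into train/validation/test sets (about 80/10/10 %)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)
    n_train = int(round(0.8 * n_items))
    n_val = int(round(0.1 * n_items))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The validation indices would serve model selection and early stopping for each transformer. The test indices stay untouched until the final comparison, so all five models are scored on the same held-out texts.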
These findings are expected to contribute to the development of more sustainable tourism policies and practices in the ASEAN region and Indonesia.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/684Job Competency Extraction in Information and Technology Sector Using K-Means and Non-Negative Matrix Factorization (NMF) Algorithms2025-09-22T01:48:08+00:00Alfitra Rifa Geandra222212476@stis.ac.idAmir Mumtaz Siregar222212493@stis.ac.idRani Nooraeniraninoor@stis.ac.id<div>The advancement of information technology has led to a surge in online job vacancy data, which contains valuable information about skill demands in the digital labor market. This study aims to extract job competencies in the information and technology sector using a combination of K-Means clustering and Non-Negative Matrix Factorization (NMF). A total of 350 job postings were collected from the Kalibrr platform through web scraping, then processed through text preprocessing and TF-IDF feature representation. The clustering results indicate that the optimal configuration consists of 10 clusters, as evaluated using the Silhouette Score and Davies-Bouldin Index. Each cluster represents a specific job topic, such as backend development, data science, QA automation, cybersecurity, and digital marketing. The results offer a structured overview of digital skill demands and can be used by educational institutions, training providers, and labor policymakers. However, the dataset’s limited size, its reliance on a single job platform, and the use of traditional machine learning techniques mean that the results may not capture all the semantic variations and complexities of the broader job market. 
Consequently, future work should involve larger and more diverse datasets as well as advanced deep learning text representation approaches to enhance the robustness and generalizability of the results. </div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/610Enhanced EV Battery Degradation Modeling in Tropical Environments via CVAE-GRU for Sustainable Transportation2025-09-15T02:02:46+00:00Hervé LOTCHOUANG FUSTEhervefustelotchouang2015@yahoo.frKibong Mariuskibongmariustony1@gmail.comNyatte Steyvessteyve@gmail.comSapnken Emmanuelsapnken.emmanuel@gmail.comMewoli Edwigemewoliarmel2@yahoo.frTamba Gastontambajeangaston1@yahoo.fr<div>Electric Vehicle (EV) battery degradation in tropical environments remains poorly understood, with traditional linear models like OLS facing significant challenges such as multicollinearity, leading to unreliable insights into influential factors. This study aims to experimentally characterize lithium-ion battery degradation and comprehensively evaluate the influence of local climatic (temperature, humidity, dust) and driving conditions (road quality, mileage) in a Cameroonian tropical context, addressing the limitations of conventional statistical approaches. Our unique contribution involves providing empirical real-world data from a subSaharan environment and applying a novel hybrid CVAE-GRU methodology to capture complex non-linear and temporal dependencies. An embedded system continuously collected battery parameters (SoH, internal resistance) alongside environmental and driving data. The CVAE learns robust latent representations from these correlated inputs, while the GRU models their temporal dynamics for degradation prediction. Results confirm progressive SoH degradation, significantly accelerated by high temperatures, humidity, dust, and poor road quality. 
The CVAE-GRU approach effectively mitigates multicollinearity, offering superior accuracy and deeper insights into these influences. This work highlights the critical impact of tropical conditions on EV battery aging, providing findings that can inform Battery Management Systems adapted to such conditions and foster sustainable mobility in similar regions.</div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/510Analysis of the Effectiveness of Iterative Prompts in the Integration of Classification and Summarization of User Reports Based on NLP2025-09-22T05:41:59+00:00Sulisetyo Puji Widodosulisetyo.widodo@gmail.comIlmi Aulia Akbarakbar.ilmiaulia@gmail.comWaiz Al Qorniwaiz.alqorni@bps.go.idRifqi Ramadhan rifqi.ramadhan@bps.go.idFebi Dwi Haryono febi.dwi@bps.go.id<p>User reports submitted through feedback features or ticketing systems provide valuable insights for improving mobile applications. However, the high volume of reports creates challenges for review and decision-making. Effective classification and summarization are therefore essential for managing this information efficiently, allowing developers to quickly identify recurring issues and pursue data-driven development strategies. This study automates large-scale processing of user feedback using Natural Language Processing (NLP) and evaluates multiple language models. For classification, the BigBird-Small model achieved the highest agreement with the majority (81.51%), which is attributed to its ability to process long-text contexts. XLM-R-Base performed competitively (78.08%), while BERT-Base and RoBERTa-Base showed stable performance (75.68% and 74.32%, respectively). DistilBERT-Base, although more computationally efficient, scored slightly lower than BERT-Base (74.32%). For summarization, Simple Prompt and Iterative Prompt approaches were compared. 
The Iterative Prompt with four iterations performed best, achieving a similarity of 0.911, a compression of 0.846, a keyword overlap of 0.624, and a redundancy of 0.070. These results demonstrate that combining automated classification with iterative summarization can significantly improve both efficiency and accuracy in managing user reports, supporting better decision-making and enhanced mobile app development.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/614From Noisy Data to Insight: SOM Filtering Implementation For Improving the Machine Learning Model2025-09-12T16:10:45+00:00Achmad Firmansyahachmad.firmansyah@bps.go.id<p>Filtering representative training data from big data is a critical step in developing machine learning models, particularly for official statistics. This study demonstrates the application of Self-Organizing Map (SOM) filtering to enhance training data quality for the remote sensing-based classification of paddy phenological stages from satellite data. By clustering the data, the SOM identifies and retains representative samples, thereby removing noisy and irrelevant ones. After filtering, models trained on data filtered under several purity-threshold schemes were compared with a model trained on the unfiltered dataset. The findings reveal that raising the purity threshold, which makes filtering stricter, consistently improves classification performance and accuracy. 
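The purity-threshold filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: each sample is assumed to have already been mapped to its best-matching node by a trained SOM (the SOM training itself is omitted), a node's purity is taken to be the share of its majority label, and only majority-label samples from nodes that meet the threshold are kept. The function name `purity_filter`, the node IDs, and the growth-stage labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity_filter(node_ids, labels, threshold):
    """Return indices of samples kept after SOM purity filtering.

    Assumption: node_ids[i] is the best-matching SOM node of sample i,
    produced by an already-trained map; labels[i] is its class label.
    A node's purity is the fraction of its samples carrying the node's
    majority label. Only majority-label samples from nodes whose purity
    reaches the threshold are kept.
    """
    members_by_node = defaultdict(list)
    for idx, (node, label) in enumerate(zip(node_ids, labels)):
        members_by_node[node].append((idx, label))

    kept = []
    for members in members_by_node.values():
        counts = Counter(label for _, label in members)
        majority_label, majority_count = counts.most_common(1)[0]
        purity = majority_count / len(members)
        if purity >= threshold:
            kept.extend(idx for idx, label in members if label == majority_label)
    return sorted(kept)


# Hypothetical node assignments and paddy growth-stage labels
nodes = [0, 0, 0, 1, 1, 2, 2, 2, 2]
stages = ["vegetative", "vegetative", "vegetative",
          "vegetative", "harvest",
          "generative", "generative", "generative", "vegetative"]

for t in (0.7, 0.9):
    print(t, purity_filter(nodes, stages, t))
```

A higher threshold keeps fewer but cleaner samples, which is one way to read the trade-off between data quality and model generalizability that the abstract mentions.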
The results demonstrate that SOM filtering is an effective strategy for improving the representativeness and reliability of training datasets in remote sensing applications, while highlighting the trade-offs involved in optimizing model robustness and generalizability.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/708Harnessing the Potential of the Blue Economy in Central Java2025-09-16T09:24:17+00:00Almira Ajeng Pangestikaaapangestika@gmail.comDwi Wahyudidwi.wahyudi@bps.go.idRidson Al Farizal Pulunganridsonap@bps.go.id<p>This study pioneers the mapping and analysis of the blue economy's potential across the 35 regencies/municipalities of Central Java by constructing a novel Blue Economy Index (BEI). Notably, this research is among the first in Indonesia to build the BEI using granular satellite data and digital sensor information, and to apply the Two-Step System Generalized Method of Moments (GMM) approach to dynamically analyze the factors influencing its development. This combination provides unprecedented subnational detail and robust insights into effective policy levers. The findings reveal significant disparities among the southern coastal, northern coastal, and non-coastal areas. The southern coastal regions exhibit higher BEI values than their northern coastal and non-coastal counterparts, which fall below the average. Results from the Two-Step System GMM regression analysis indicate that internet usage, infrastructure, and the COVID-19 period have significant effects on the BEI. Specifically, infrastructure development, proxied by Nighttime Light (NTL), has a negative effect on the BEI, suggesting that environmentally unsustainable infrastructure may undermine the sustainability of the blue economy. Meanwhile, access to digital technology through internet usage plays a crucial role in fostering inclusive blue economy growth. 
Based on these findings, the proposed policy recommendations include optimizing environmentally friendly infrastructure development, leveraging digital technology to expand market access, and strengthening the resilience of the blue economy through Adaptive-Responsive-Innovative (ARI) crisis policies. With these measures, the development of the blue economy in Central Java is expected to enhance the sustainable welfare of coastal communities while fully optimizing the potential of coastal areas.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/528Correlation Analysis of Land Surface Temperature (LST) and Vegetation Density Using Landsat 8 and 5 Imagery in Purwakarta Regency2025-09-12T05:58:44+00:00Aida Ainulmilaaida.amila19@upi.eduS Tianataniasepti@upi.eduK N Mumtaztaniasepti@upi.eduD S F Azharitaniasepti@upi.eduF Ibrahimtaniasepti@upi.eduT S Anggrainitaniasepti@upi.edu<p>Urbanization and industrial development have reduced vegetation cover and increased land surface temperature in urban areas. These changes affect the microclimate and environmental quality, as observed in Purwakarta Regency. The conversion of vegetated land into industrial and residential areas lowers the vegetation index, which can be measured with the Normalized Difference Vegetation Index (NDVI). Meanwhile, increases in surface temperature can be monitored by estimating Land Surface Temperature (LST), which reflects physical changes on the Earth's surface. The purpose of this study is to analyze the relationship between vegetation density and rising surface temperature using remote sensing and Geographic Information System (GIS) methods. The results show that vegetated land decreased substantially, from 67,564.8 ha (2004) to 44,970 ha (2024), while built-up land increased threefold. 
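NDVI itself is a simple band ratio. The sketch below applies the standard formula to hypothetical surface-reflectance values; it assumes the red and near-infrared bands are already atmospherically corrected (bands 4 and 5 on Landsat 8 OLI, bands 3 and 4 on Landsat 5 TM) and is not the authors' processing chain.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Values range from -1 to 1; dense, healthy vegetation reflects strongly
    in the near-infrared and gives values close to 1.
    """
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


# Hypothetical (NIR, Red) reflectance pairs for three pixels:
# dense vegetation, sparse vegetation, and a built-up surface.
pixels = [(0.45, 0.05), (0.30, 0.20), (0.25, 0.22)]
values = [round(ndvi(nir, red), 3) for nir, red in pixels]
print(values)
```

In a raster workflow the same expression is applied pixel by pixel to the two band arrays.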
In the same period, the average surface temperature increased from 37.31°C to 40.41°C. The correlation analysis shows a strong positive correlation between the decrease in NDVI and the increase in LST, with a correlation coefficient of 0.707 in 2024.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/626Forest and Land Fire Severity Analysis in 2022-2023 in Hulu Sungai Selatan Regency Using the NBR (Normalized Burn Ratio) Method2025-09-20T14:35:45+00:00Desti Meirisa Putridestimeirisa30@upi.eduMuhammad Refamuhammadrefa@upi.eduSheren Siti Salamahsherens3@upi.eduTania Septi Anggrainitaniasepti@upi.eduShafira Himayahshafirahimayah@upi.edu<p>Forest and land fires are recurring disasters in Indonesia that cause environmental, health, and socio-economic losses. Hulu Sungai Selatan Regency, South Kalimantan, is among the affected regions, particularly during 2022–2023, when the El Niño phenomenon and flammable peatlands increased fire risk. This study analyzes the spatial extent and severity of fires and their potential impact on local communities by integrating remote sensing and demographic data. The Normalized Burn Ratio (NBR) and differenced Normalized Burn Ratio (dNBR) derived from Landsat 8 and 9 imagery (2021–2023) were used to map fire severity, supported by hotspot data from the Ministry of Environment and Forestry and settlement data from the Geospatial Information Agency. Population data from the Central Bureau of Statistics (BPS) were incorporated to develop a Fire Vulnerability Index (FVI) representing community exposure to fire-prone areas. The results show that the burned area expanded in 2023 compared with 2022, with the low-to-moderate severity classes increasing. Densely populated subdistricts, such as Kandangan and Angkinang, showed higher fire vulnerability values, indicating potential socio-environmental risks. 
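The NBR and dNBR calculations can be sketched as below. The band assignment (NIR = band 5, SWIR2 = band 7 on Landsat 8/9 OLI) is standard; the severity class breaks are the widely cited USGS dNBR thresholds and are an assumption here, since the abstract does not state which thresholds the authors used. Some workflows also scale dNBR by 1000. The reflectance values are hypothetical.

```python
def nbr(nir: float, swir2: float) -> float:
    """Normalized Burn Ratio: (NIR - SWIR2) / (NIR + SWIR2)."""
    return (nir - swir2) / (nir + swir2)


# Commonly cited USGS dNBR class breaks as (lower bound, label), checked from
# the top down. Assumed for illustration; the paper's thresholds may differ.
SEVERITY_BREAKS = [
    (0.66, "high"),
    (0.44, "moderate-high"),
    (0.27, "moderate-low"),
    (0.10, "low"),
    (-0.10, "unburned"),
    (-0.25, "enhanced regrowth (low)"),
]


def severity(dnbr: float) -> str:
    """Map a dNBR value to a burn-severity class."""
    for lower, label in SEVERITY_BREAKS:
        if dnbr >= lower:
            return label
    return "enhanced regrowth (high)"


pre = nbr(nir=0.40, swir2=0.10)   # hypothetical pre-fire pixel (healthy vegetation)
post = nbr(nir=0.20, swir2=0.25)  # same pixel after the fire (burned surface)
dnbr = pre - post                 # dNBR = NBR_pre - NBR_post
print(round(dnbr, 3), severity(dnbr))
```

A positive dNBR means the pixel lost NIR reflectance and gained SWIR reflectance between the two dates, which is the spectral signature of burning.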
These findings emphasize the importance of integrating remote sensing and statistical data to support effective fire mitigation and risk reduction in vulnerable regions.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/552Detecting Marine Debris Using Sentinel-2 Satellite Images2025-09-29T01:59:13+00:00Fadiah Faradinah Nasir222112030@stis.ac.idRobert Kurniawanrobertk@stis.ac.id<p>Plastic waste pollution in the oceans remains a global problem. Kuta Beach, one of Bali's tourist destinations, has been affected by plastic waste pollution. This runs counter to Sustainable Development Goal (SDG) 14, which aims to prevent and reduce marine debris pollution. However, the marine debris monitoring carried out by the Ministry of Environment and Forestry requires officers to conduct direct field surveys, which is costly. Satellite imagery therefore offers a more effective and efficient alternative for marine debris detection. This study aims to detect marine debris on Kuta Beach using the machine learning algorithms Random Forest (RF), XGBoost, and LightGBM. It uses the Marine Debris Archive (MARIDA) dataset, which provides marine debris labels, and Sentinel-2 images of Kuta Beach from 2019–2023. LightGBM provided the best performance in detecting marine debris, with an F1-score of 95.16%. The areas detected as marine debris on Kuta Beach from 2019 to 2023 were 500 m<sup>2</sup>, 0 m<sup>2</sup>, 100 m<sup>2</sup>, 300 m<sup>2</sup>, and 400 m<sup>2</sup>, respectively. 
Based on these results, marine debris is generally detected around the coastline, particularly in the southern area of Kuta Beach, which is located near a shopping center.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/569Fault Modeling to Determine the Reliability Status of Rotating Machines Using Deep Learning Methods Based on Vibrations from Acoustic Emissions from Cooling Fans2025-09-11T07:06:37+00:00FERNAND JOSEPH TOUKAP NONOnonofernand18@gmail.comDIANORE TOKOUE NGATCHA nonojoseph18@yahoo.comFlorence OFFOLEflorenceoffole33@gmail.comFRANCELIN NDIedgarfrancelin307@gmail.comMarcelin MOUZONG PEMImouz55@gmail.com<p>Maintenance is increasingly recognized as important in modern industrial production. Today, maintenance is seen as a service that keeps systems and installations effective while meeting quality, energy-efficiency, and protection standards. This study develops a novel technique for automating the maintenance of rotating machines. To identify failures and defects in the motors through their supports, where the fan blades are attached, we investigated a technique that captures the noise produced by the cooling fans and uses deep learning to diagnose problems. Two operating conditions were considered: fault-free and faulty. In the fault-free condition, the machine is correctly powered and running under ideal conditions. In the faulty condition, failures were deliberately introduced step by step and recorded to better understand the faults. The convolutional neural network (CNN)-based technique was built on SqueezeNet, a network pre-trained on the ImageNet database. 
Applying transfer learning to spectrograms of the sound-emission recordings from the machine's fan in both operating modes yielded an accuracy of 0.987, confirming the effectiveness of the methodology.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/649The Impact of Training-Testing Proportion on Forecasting Accuracy: A Case of Agricultural Export in Indonesia2025-09-12T02:19:04+00:00Tri Wijayanti Septiarinitri.wijayanti@ecampus.ut.ac.idMade Diyah Putri Martinasarimade.diyah@ecampus.ut.ac.idEka Pariyantieka.pariyanti@ecampus.ut.ac.id<p>Accurate forecasting of agricultural exports is crucial for supporting trade policy and ensuring economic stability in Indonesia. This study investigates the impact of training–testing proportions on the forecasting accuracy of six models: linear regression, decision tree, optimized decision tree, neural network, Auto Regressive Integrated Moving Average (ARIMA), and exponential smoothing. Using Indonesia’s agricultural export data, the models were evaluated under two data-splitting schemes (80%:20% and 75%:25%) with error metrics including MAE, MSE, RMSE, and MAPE. The results consistently show that statistical time series models outperform regression-based and machine learning approaches. In particular, simple exponential smoothing (SES) achieved the lowest forecasting errors across all evaluation criteria, with MAPE values as low as 0.93%, followed by ARIMA as the second-best performer. Machine learning models, by contrast, produced relatively higher errors, suggesting a limited ability to capture temporal dependencies in the data. Importantly, the choice of training–testing proportion did not significantly alter the ranking of model performance, indicating that model selection plays a more critical role than data partitioning. 
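A minimal sketch of the evaluation design described above, assuming a univariate series split chronologically (no shuffling) at 80% or 75%, simple exponential smoothing with a fixed smoothing parameter `alpha` (in practice it would be estimated, for example by minimizing in-sample error), and the usual definitions of MAE, MSE, RMSE, and MAPE. The export values are hypothetical, not the paper's data.

```python
import math


def chronological_split(series, train_frac):
    """Split a time series in time order: the first part trains, the rest tests."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level.

    The h-step-ahead forecast is flat and equal to the last smoothed level.
    """
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def error_metrics(actual, predicted):
    """MAE, MSE, RMSE, and MAPE (in percent); assumes no zero actual values."""
    n = len(actual)
    diffs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / n
    mse = sum(d * d for d in diffs) / n
    mape = 100 * sum(abs(d / a) for d, a in zip(diffs, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


# Hypothetical monthly export values (20 observations)
exports = [100, 102, 101, 104, 103, 105, 106, 104, 107, 108,
           107, 109, 110, 108, 111, 112, 111, 113, 114, 112]

results = {}
for frac in (0.80, 0.75):
    train, test = chronological_split(exports, frac)
    forecast = ses_forecast(train, alpha=0.5, horizon=len(test))
    results[frac] = error_metrics(test, forecast)
    print(f"{frac:.0%} train:", {k: round(v, 3) for k, v in results[frac].items()})
```

The split is done in time order because shuffling a time series would let a model see future values during training.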
Overall, this study highlights the robustness of exponential smoothing methods as reliable forecasting tools for Indonesia’s agricultural exports and provides evidence-based insights for policymakers in designing effective trade strategies.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/575An Intelligent Conversational Agent Using Self-Reflective Retrieval-Augmented Generation for Enhanced Large Language Model Support in National Accounts Learning2025-09-12T06:26:41+00:00Muhammad Farhanfarhan082002@gmail.comYunofri .yunofri@bps.go.idEtjih Tasriahtasriah@bps.go.idLya Hulliyyatus Suadaalya@stis.ac.idSetia Pramanasetia.pramana@stis.ac.id<p>BPS Statistics Indonesia plays a strategic role in compiling balance sheet statistics as the foundation for national policy analysis. This role requires a deep understanding of the concepts, definitions, and compilation standards outlined in the System of National Accounts (SNA) manual. However, in practice, comprehending such complex technical documents is not always straightforward. To address this challenge, this study proposes the development of an intelligent conversational agent in the form of a chatbot that implements the Self-Multimodal RAG approach. This approach integrates self-reflection mechanisms to generate more accurate and relevant responses. The evaluation was conducted using the LLM-as-a-Judge framework across four metrics: answer correctness, answer relevancy, context relevancy, and context faithfulness. Experimental results demonstrate that the Self-Reflective RAG achieved a score of 80% on the answer correctness metric, with competitive performance in terms of relevancy and faithfulness. 
From the chatbot implementation perspective, black-box testing confirmed that all functionalities operated as expected, while system usability testing using the CSUQ instrument yielded a score of 74.704%, indicating that the chatbot is well-accepted by users.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/495Classification of Urban and Rural Villages with Machine Learning on Satellite Image Data and Points of Interest2025-09-19T05:37:21+00:00Bony Parulian Josaphatbonyp@stis.ac.idAlvandi Syukur Rahmat Zega222111878@stis.ac.id<p>Evaluating the Sustainable Development Goals with data disaggregated by area of residence, namely urban and rural areas, is essential. This study proposes the use of satellite imagery and point of interest (POI) data with machine learning methods to classify urban and rural villages, specifically in North Sumatra Province. The data include satellite imagery from sources such as NOAA-20, Sentinel-2, Sentinel-5P, and Terra, as well as Google Maps, covering variables including NTL, NDVI, NDBI, NDWI, NO<sub>2</sub>, CO, and LST, along with POIs categorized under education, economy, health, and entertainment. The machine learning methods used were Decision Tree and Support Vector Machine, with data imbalance addressed through resampling techniques such as Random Undersampling (RUS). The results show that the Support Vector Machine model with RUS produced the best weighted average F1-score of 87.74% for classifying urban and rural villages, with NTL being the most important feature in the model. 
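Random undersampling (RUS), mentioned above, balances classes by discarding randomly chosen majority-class samples until every class matches the minority-class count. The sketch below is a generic standard-library version with hypothetical village labels and a single hypothetical feature; the `imbalanced-learn` library offers an equivalent `RandomUnderSampler`, and the authors' exact configuration is not stated in the abstract.

```python
import random
from collections import defaultdict


def random_undersample(features, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    n_min = min(len(items) for items in by_class.values())

    xs, ys = [], []
    for label, items in by_class.items():
        for x in rng.sample(items, n_min):  # sampling without replacement
            xs.append(x)
            ys.append(label)
    return xs, ys


# Hypothetical data: 8 rural and 3 urban villages, one feature each (e.g. NTL)
ntl = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 5.0, 6.5, 7.2]
status = ["rural"] * 8 + ["urban"] * 3
x_bal, y_bal = random_undersample(ntl, status)
print({c: y_bal.count(c) for c in sorted(set(y_bal))})
```

Undersampling is normally applied to the training split only, so that the test set keeps the real class distribution.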
This approach is expected to provide BPS with an alternative method for classifying urban and rural villages.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/680GIS-Based Analytical Hierarchy Process Flood Hazard Mapping in Deli Serdang, Indonesia Using Satellite Images2025-09-17T07:34:17+00:00Zaidan Hafizhahurrahmanzaipanadansonic@gmail.comShafnanda Aulia Kamal222212878@stis.ac.id<p class="Abstract" style="margin-bottom: 28.35pt;">As one of the regions with frequent, high-impact flood disasters, Deli Serdang in North Sumatera, Indonesia, strongly needs spatial hazard mapping as a foundation for mitigation efforts. This study aims to map flood hazard levels by integrating the Analytical Hierarchy Process (AHP) and Geographic Information Systems (GIS). Five parameters were analyzed to construct the model: elevation, slope, rainfall, Normalized Difference Vegetation Index (NDVI), and Normalized Difference Built-up Index (NDBI), with data acquired through the Google Earth Engine platform. The AHP weighting results indicate that rainfall is the dominant factor (40%) influencing the hazard level. The resulting hazard map shows a clear north-to-south gradation, with 50.17% of the total area in the high-hazard category, 47.57% in the moderate category, and the remainder in the low-hazard category. Notably, all sub-districts in the study area are classified as either moderate or high hazard, with the northern coastal zone emerging as the most critical area. 
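The AHP weighting step can be illustrated as follows. This is a sketch under stated assumptions: the pairwise comparison matrix is hypothetical (it is not the authors' matrix), the priority vector is approximated by normalized row geometric means rather than the exact principal eigenvector, and consistency is checked with Saaty's consistency ratio using his random index for n = 5 (RI = 1.12). A CR below 0.10 is conventionally treated as acceptable.

```python
import math

CRITERIA = ["rainfall", "elevation", "slope", "NDBI", "NDVI"]

# Hypothetical pairwise comparisons on Saaty's 1-9 scale: entry [i][j] says
# how much more criterion i contributes to flood hazard than criterion j.
PAIRWISE = [
    [1,     2,     3,     4,     5],
    [1 / 2, 1,     2,     3,     4],
    [1 / 3, 1 / 2, 1,     2,     3],
    [1 / 4, 1 / 3, 1 / 2, 1,     2],
    [1 / 5, 1 / 4, 1 / 3, 1 / 2, 1],
]

# Saaty's random consistency index for matrices of size n
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights(matrix):
    """Approximate the AHP priority vector by normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    weighted = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(aw / w for aw, w in zip(weighted, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


weights = ahp_weights(PAIRWISE)
for name, w in zip(CRITERIA, weights):
    print(f"{name:>9}: {w:.1%}")
print(f"CR = {consistency_ratio(PAIRWISE, weights):.3f}")
```

In a GIS overlay, each reclassified parameter raster would then be multiplied by its weight and the results summed to give the hazard score.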
The results of this research can serve as a scientific basis for local governments in formulating more adaptive and targeted disaster mitigation policies and spatial planning.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/504Development of Village Administrative Data Management System Through PAPEDA (Village Population Administration Development Application) in Pitu Village, North Halmahera Regency2025-09-15T05:41:11+00:00R A D Ikramriva.adli@bps.go.idA M Kaharipin@bps.go.idGusrizal .gusrizal060862@gmail.com<p>This study discusses the use of technology in managing village administrative data, improving public service systems, and providing base data for local government decision-making. Using qualitative methods for data collection and the SDLC Waterfall Model for system development, this research analyzes the benefits of PAPEDA (Aplikasi Pembangunan Administrasi Kependudukan Desa), an output of the Desa CANTIK program, for village administrative data management and public services. Evaluation through black-box testing and user satisfaction surveys shows that the use of technology in villages has a positive impact on the community. PAPEDA not only makes it easier for village officials to manage village administrative data but also accelerates public service delivery in the village. Residents can access various administrative services online, anytime, and anywhere. Additionally, the village monograph and stunting-monitoring features give local governments a basis for development planning. 
However, uneven internet connectivity hinders the use of technology, underscoring the need for local governments to improve internet infrastructure.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/516AI-Driven Transformation in the Textile Industry: A Bibliometric Analysis and Scoping Review2025-09-29T16:18:10+00:00Fajar Pitarsi Dharmafajarpd93@gmail.comMoses Laksono Singgihmoseslsinggih@its.ac.idDedy Dwi Prastyodd.prastyo@its.ac.id<p>Artificial Intelligence (AI) is rapidly reshaping the global textile industry, driving efficiency, precision, and sustainability across its value chain. Yet despite growing enthusiasm, the integration of AI remains fragmented, with limited statistical understanding of where, how, and why these technologies take root. This study addresses that gap by combining bibliometric network analysis with a systematic scoping review to map and statistically interpret two decades (2003–2023) of research on AI applications in textiles. Using association strength normalization, VOS modularity clustering, and thematic centrality-density mapping, we identified eight manufacturing clusters that structure the field, ranging from fabric defect detection and supply chain optimization to textile waste management and sustainability. The novelty of this work lies in repositioning bibliometric analysis as a statistical instrument, not merely a descriptive tool. Keyword co-occurrence networks and citation trajectories are translated into evidence-based research agendas, connecting cluster signals to methodological pathways such as regression modeling, support vector machines, neural networks, and hybrid ML-statistical frameworks. This statistical logic is used to surface gaps, particularly in empirical validation, predictive modeling, and cross-cluster integration, and to chart future directions for data-driven textile innovation. 
By grounding future agendas in measurable statistical patterns rather than narrative interpretation alone, this study offers a rigorous analytical framework that links research structure to methodological opportunity. The resulting roadmap invites scholars and practitioners to bridge AI, textile engineering, and applied statistics, shifting the field from fragmented experimentation toward coherent, evidence-based innovation.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/529Optimized Feature Engineering for Transaction Fraud Detection Using Sequential and HMM-Based Features2025-09-11T06:59:58+00:00Kaung Wai Tharkaungwai.thar@outlook.comThinn Thinn Waithinnthinnwai@uit.edu.mm<p>Fraud detection in financial transactions remains a major challenge because fraudulent activities are extremely rare (often likened to finding a “needle in a haystack”) and must be detected in real time. This study presents a hybrid feature engineering framework that integrates lightweight sequential indicators with Hidden Markov Model (HMM)-based behavioural features to improve accuracy and interpretability. Using the PaySim dataset containing 2.77 million transactions (0.2965% fraud), we extracted 22 sequential and 14 HMM-based features, from which 28 highly discriminative variables were retained. To address class imbalance, a batch-wise SMOTETomek approach was applied, expanding 1.94 million clean samples to 3.86 million balanced samples. Experimental results show that HMM-based features alone yield moderate performance (ROC AUC = 0.778, F2 = 0.051), but the combined ensemble of tuned XGBoost and LightGBM achieves superior accuracy (ROC AUC = 0.9983, F2 = 0.8431, MCC = 0.827). SHAP analysis identifies HMM-derived entropy and state likelihoods, together with transaction amount dynamics, as key predictors. 
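For reference, the F2 score and MCC reported above can be computed from a binary confusion matrix as sketched below. These are the standard definitions; F2 is the F-beta score with beta = 2, which weights recall more heavily than precision, a common choice when a missed fraud is costlier than a false alarm. The counts in the example are hypothetical.

```python
import math


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 favours recall over precision."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts on a highly imbalanced test set
tp, fp, fn, tn = 80, 20, 20, 99_880
print(f"F2 = {f_beta(tp, fp, fn):.4f}, MCC = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, neither metric is inflated by the very large number of true negatives, which is why they suit data with about 0.3% fraud.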
The results demonstrate that optimized feature engineering plays a crucial role in achieving accurate, scalable, and interpretable fraud detection.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/719Business Description Categorization to the Five-Digit Indonesian Standard Classification of Business Field (KBLI) Using Machine Learning and Transfer Learning2025-09-16T05:49:27+00:00Muh. Alfian Amnur222212736@stis.ac.idLa Ode Muhammad Gazali222212696@stis.ac.idAmir Mumtaz Siregar222212493@stis.ac.idFaruq Ariya Jalaksana222212600@stis.ac.idMade Nisa Rahayu Ananda Suwendra222212718@stis.ac.idNurul Fadila Utami222212810@stis.ac.idAlif Median Ramadhan222212480@stis.ac.idElisse Krisela Fabrianne222212580@stis.ac.idEurorea Wirata Raja Panjaitan222212586@stis.ac.idFitri Aini Izzati222212614@stis.ac.idJernita Bintang Yuliani Manalu222212680@stis.ac.idMuhammad Gilang Hidayat222212752@stis.ac.idLya Hulliyyatus Suadaalya@stis.ac.idBudi Yuniartobyuniarto@stis.ac.idSetia Pramanasetia.pramana@stis.ac.id<div>The Indonesian Standard Classification of Business Fields (KBLI) is essential for economic statistics, yet manual classification of business descriptions to five-digit KBLI codes is time-consuming and prone to inconsistencies. This study aims to develop and compare machine learning (Support Vector Machine and Random Forest) and transfer learning (IndoBERT) models for automating KBLI classification, supported by the preparation of synthetic and real-world datasets for model training. The synthetic data were generated using large language models, validated through human majority voting, and complemented with real-world data from the National Labor Force Survey (Sakernas) and the Micro and Small Industry Survey (IMK). 
The findings indicate that fine-tuned IndoBERT performed best, achieving an F1-score of 92.99% and an accuracy of 93.40% on synthetic data, along with top-1, top-5, and top-10 accuracies of 32.93%, 54.71%, and 63.24% on real-world data. Deploying the fine-tuned IndoBERT as a RESTful API demonstrates its scalability and efficiency, presenting a reliable solution for large-scale KBLI classification in official statistics. </div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/545Investigating the Profile of Digital Readiness and Sustainability Development: An Explainable Clustering2025-09-12T06:09:09+00:00Agus Pamujiaguspamuji@students.undip.ac.idAries Susantyariessusanty@lecturer.undip.ac.idBudi Warsitobudiwarsito@lecturer.undip.ac.id<p>The level of digital readiness within Islamic Higher Education Institutions (IHEIs) has emerged as a critical concern, drawing increasing scholarly and institutional attention over the past five years. This study aims to examine the empirical relationship between two key dimensions: digital readiness, as reflected by the National Readiness Index (NRI), and progress toward the Sustainable Development Goals (SDGs). Data were collected from more than 20 IHEIs between 2023 and 2024 to support a sequential analytical approach. Pearson’s correlation coefficient was employed to identify associations between NRI-based digital readiness and SDG performance within the IHEI context. Subsequently, cluster analysis was conducted using the Duda–Hart Index, while the Pseudo T² statistic was applied to validate the robustness of the clustering outcomes. A cartographic visualization was also generated to illustrate variations across readiness and sustainability clusters. The results indicate a considerable disparity between digital readiness and sustainability among IHEIs. 
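The correlation step can be sketched with Pearson's r and the usual t statistic for testing r = 0, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The readiness and SDG scores below are hypothetical placeholders, not the study's data, and the p-value lookup is left to a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical scores for six institutions
readiness = [42.0, 55.5, 61.0, 48.5, 70.0, 66.0]
sdg_score = [50.0, 58.0, 57.5, 52.0, 71.5, 60.0]
r = pearson_r(readiness, sdg_score)
print(f"r = {r:.3f}, t = {t_statistic(r, len(readiness)):.2f} (df = {len(readiness) - 2})")
```

With only around 20 institutions, the degrees of freedom are small, so a moderate r may not reach significance; reporting the test statistic alongside r makes that explicit.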
Only a limited number of institutions demonstrate consistent performance in both areas, suggesting that effective leadership and strategic investment in digital infrastructure are essential prerequisites for achieving sustainable institutional transformation.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/730The Digital Footprint of Public Attention: Forecasting Indonesian Gold Prices using Google Trends Index and Optimized Support Vector Regression2025-10-04T07:42:10+00:00Muhammad Restu Ilahi222112222@stis.ac.idArie Wahyu Wijayantoariewahyu@stis.ac.id<p>To provide actionable forecasting insights for gold prices in Indonesia’s public sentiment-driven market, this study developed a machine learning framework using the Google Trends Index (GTI) as a sentiment proxy. We employed an Optuna-optimized Support Vector Regression (SVR) model to comparatively evaluate three feature sets (GTI, historical Lag, and a Mix) across seven forecasting horizons (t+1 to t+30). A key advantage of our approach was the identification of horizon-dependent predictor dynamics: results revealed that while historical data excelled for short-term forecasts (MAPE 0.50% at t+5), the contribution of GTI became vital for long-term accuracy, where the hybrid model achieved its peak performance (MAPE 1.92% at t+30). Notably, the GTI-only model showed solid standalone potential (MAPE < 20%). We conclude that a hybrid approach is most effective, validating GTI as a relevant predictor for Indonesia. 
Furthermore, the proposed SVR-Optuna framework offers a generalizable methodology for forecasting other sentiment-driven assets, providing a clear, actionable guide for model selection based on forecasting horizons.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/639Unsupervised YouTube Video Segmentation of “Bendera One Piece” Content Using Medoid-Based Clustering with Statistical Significance Testing2025-09-11T04:38:31+00:00Weksi Budiajibudiaji@untirta.ac.idPatricia Kumenappingkankumenap.04@gmail.comM Fabian Delanof.reinhard0502@gmail.comFerdian Wijayaferdian.bangkit@untirta.ac.idRifqi Riyantorifqi.ar@untirta.ac.id<p>The curse of dimensionality and sparsity are well-documented phenomena in applied statistics that arise when the number of features far exceeds the number of observations. This work presents an integrated applied statistics framework for distilling semantic structure from high-dimensional data by combining pre-processing, dimensionality reduction via principal component analysis (PCA), medoid-based clustering (partitioning around medoids and simple k-medoids), and a modified Statistical Significance of Clustering (SigClust) test for validation and inference in the context of viral media. In this case study, we segment and interpret YouTube videos on the Indonesian viral “Bendera One Piece” phenomenon through their user comments. PCA-based dimensionality reduction mitigated the curse of dimensionality: the first principal component alone explained 80% of the variance in the text-based features and captured a dominant socio-political pattern. 
Internal validation and the SigClust test agreed on a statistically significant three-cluster solution whose audiences could be labelled “Pop-Culture Enthusiasts”, “Cautious Observers”, and “Political Protesters”. The study demonstrates the utility of integrating established statistical methods with a modified validation step for high-dimensional text data analysis and pattern recognition.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/738The Digital Frontline: A Thematic Analysis of User Grievances and Satisfaction Drivers for Indonesian Public Service Apps2025-09-15T02:09:35+00:00Ferdian Bangkit Wijayaferdian.bangkit@untirta.ac.idWeksi Budiajibudiaji@untirta.ac.idRafly Priyantama Ramadhan Bagaskarapriyantamarafly@gmail.comZilda Ainun Tazkiazildaainun@gmail.comDinda Dwi Anugrah Pertiwidindanugrah16@gmail.com<div>This research assesses Indonesia's digital public service ecosystem by analyzing 50 mobile applications from a wide range of state agencies. Using a computational content analysis of metadata and user reviews from the Google Play Store, this study presents a two-part evaluation. First, a thematic analysis of negative reviews (1–2 stars) reveals that user grievances are overwhelmingly dominated by foundational issues, such as login/access problems, slow performance, and technical glitches, rather than a lack of advanced features. Second, a corresponding analysis of positive reviews (5 stars) shows that user satisfaction is driven primarily by high-quality features, ease of use, and overall application reliability. Quantitative findings show significant performance disparities across institutional categories, with Ministry-developed apps receiving the lowest average user satisfaction. 
An Importance-Performance Quadrant Analysis further uncovers a critical paradox: many high-download, mandatory apps suffer from low user ratings, indicating a clear disconnect between enforced adoption and user-centric quality. The research concludes that enhancing digital public services requires a strategic shift from feature proliferation to foundational reliability. Ensuring robust core functionality is paramount to building citizen trust and achieving a successful digital transformation.</div>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/642Two-Stage RFM and Macroeconomics Interaction Model for Accurate CLV Prediction in Direct Sales2025-09-19T03:26:07+00:00Unung Istopo Hartanto24051905001@mhs.unesa.ac.idI Gusti Putu Asto Buditjahjantoasto@unesa.ac.idWiyli Yustantiwiyliyustanti@unesa.ac.id<p>This study introduces a two-stage predictive model that integrates Recency, Frequency, Monetary (RFM) metrics with macroeconomic indicators to estimate Customer Lifetime Value (CLV) in direct sales, addressing dynamic customer behavior in volatile markets. Data from the Halalmart Sales Integrated System (January 2023–July 2025, 29,893 transactions, ~431 unique customers monthly) were combined with Indonesian macroeconomic indicators (Consumer Confidence Index, Consumer Expectation Index) from Bank Indonesia and inflation data from the Central Bureau of Statistics (BPS). The first stage uses a CatBoost classifier to identify active customers, achieving 89.3% accuracy; the second stage applies an ensemble regression (CatBoost, XGBoost, LightGBM, Ridge, RandomForest) to predict CLV, yielding an R<sup>2</sup> of 0.894. RFM features account for 40.3% of the contribution in classification and 16.2% in regression, while macroeconomic interaction features dominate, contributing 59.7% and 83.8%, respectively. 
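For readers unfamiliar with RFM features, a minimal Python sketch of how Recency, Frequency, and Monetary values are typically derived from a transaction log follows. The tuple layout, the reference date, the Consumer Confidence Index value of 120.0, and the product-form interaction are all assumptions for illustration; the abstract does not specify how the authors constructed their interaction features.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, reference_date):
    """Recency (days since last purchase), Frequency (count) and Monetary (total) per customer.

    `transactions` is an iterable of (customer_id, transaction_date, amount) tuples.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, tx_date, amount in transactions:
        if customer not in last_seen or tx_date > last_seen[customer]:
            last_seen[customer] = tx_date
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency": (reference_date - last_seen[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_seen
    }


def add_monetary_cci_interaction(features, cci):
    """Hypothetical interaction feature: Monetary multiplied by a Consumer Confidence Index value."""
    for values in features.values():
        values["monetary_x_cci"] = values["monetary"] * cci
    return features


transactions = [
    ("A", date(2025, 7, 1), 100.0),
    ("A", date(2025, 7, 20), 50.0),
    ("B", date(2025, 6, 1), 80.0),
]
features = add_monetary_cci_interaction(rfm_features(transactions, date(2025, 7, 31)), cci=120.0)
print(features["A"])  # {'recency': 11, 'frequency': 2, 'monetary': 150.0, 'monetary_x_cci': 18000.0}
```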
A key interaction term, Monetary × Consumer Confidence Index, shows a correlation of 0.773 with CLV. SHAP analysis enhances model interpretability. Despite a skewed dataset in which approximately 65% of CLV values are zero, the model supports targeted marketing strategies, offering valuable insights for strategic decision-making in direct sales environments.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/570Water Quality Measurement in Illegal Gold Mining Areas Using Sentinel-2A MSI Satellite Images of the Batanghari River, Tebo Tengah District2025-09-16T10:18:29+00:00Baginda Sinagabagindasng@gmail.comRobert Kurniawanrobertk@stis.ac.id<p>Water quality in Indonesian rivers has declined because of pollution from industrial and domestic solid and liquid waste. The Batanghari River, the longest river on the island of Sumatra, faces various environmental problems, including pollution from illegal mining activities. Artisanal and small-scale gold mining (ASGM) contributes to mercury release, contaminating water and soil and posing health risks to communities. Conventional monitoring methods are limited in coverage and efficiency. Therefore, this study uses Sentinel-2A MSI satellite imagery to assess and map water quality around illegal gold mining areas along the Batanghari River in Tebo Tengah District. The developed model uses K-Means, Fuzzy C-Means (FCM), Principal Component Analysis (PCA), and the Weighted Arithmetic Water Quality Index (WAWQI) to extract water quality features. The findings indicate that WAWQI provides a more representative quantitative assessment, revealing that areas near illegal gold mining sites along the Batanghari River have moderately to heavily polluted water. 
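The abstract does not state the WAWQI formula. The commonly cited weighted arithmetic formulation is given below as an assumption about the variant used, where V_i is the measured value of parameter i, V_o its ideal value (0 for most parameters, commonly 7 for pH), S_i its permissible standard, and n the number of parameters:

```latex
\[
q_i = 100 \times \frac{V_i - V_o}{S_i - V_o}, \qquad
w_i = \frac{K}{S_i}, \qquad
K = \frac{1}{\sum_{j=1}^{n} 1/S_j}, \qquad
\mathrm{WAWQI} = \frac{\sum_{i=1}^{n} q_i\, w_i}{\sum_{i=1}^{n} w_i}
\]
```

Because K normalises the weights so that they sum to 1, the denominator of the index equals 1 under this formulation, and the index reduces to the weighted sum of the quality ratings q_i.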
This approach is expected to support water quality monitoring and assist policymakers in managing water resources and the environment.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/508Promoting Peaceful and Inclusive Information Security Compliance: A Systematic Review of Assurance Behavior in IT Employees within the Context of SDG-16 in Malaysia2025-10-04T07:24:10+00:00Aziela Isma Zarillaazielaisma@gmail.com<p>This systematic review examines the alignment between IT employees' desire, intention, and compliance with information security protocols, a critical issue in Malaysia, where human error is a leading cause of data breaches. Situated within the context of Sustainable Development Goal 16 (SDG-16), the study analyzes 30 peer-reviewed articles to identify key behavioral factors. Findings indicate that while training improves knowledge, its impact on long-term behavior is limited. A significant compliance gap is driven by psychological factors such as work overload and optimism bias, as well as by organizational elements such as culture and management support. The review concludes that effective information security assurance requires a holistic strategy that integrates tailored, ethical training with strong organizational support to mitigate psychological strain and foster a robust security culture. 
This approach is essential not only for strengthening cybersecurity but also for supporting Malaysia's commitment to digital resilience and the principles of SDG-16.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/695Enhancing Poverty Rates Reliability Using Small Area Estimation2025-09-29T15:24:37+00:00Novia Permatasarinoviaprmatasari@gmail.com<p>This study systematically compares the performance of three Small Area Estimation (SAE) methods, namely the Empirical Best Linear Unbiased Predictor (EBLUP), Hierarchical Bayes (HB) Beta, and HB Flexible Beta, using two auxiliary data sources: Village Potential (Podes) and Socio-Economic Registration (Regsosek) data. The SAE methods were applied in a case study of Java Island, Indonesia. Direct estimates have high Relative Standard Errors (RSE) above 25%, indicating low reliability. The EBLUP method improved reliability but still produced some unreliable estimates. The HB Beta method further reduced RSE values, while the HB Flexible Beta model achieved the lowest RSE and eliminated all unreliable estimates. Moreover, Socio-Economic Registration data consistently yielded lower RSE values than Village Potential data, particularly when used with the HB Flexible Beta model. 
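The reliability criterion used above can be made concrete. As a minimal Python sketch, assuming RSE is the standard error divided by the estimate (in percent) and using the 25% cut-off mentioned in the abstract (the function names and sample values are illustrative):

```python
def relative_standard_error(estimate, standard_error):
    """RSE in percent: standard error relative to the (absolute) estimate."""
    if estimate == 0:
        raise ValueError("RSE is undefined for a zero estimate")
    return 100.0 * standard_error / abs(estimate)


def is_reliable(estimate, standard_error, threshold=25.0):
    """Treat an estimate as reliable when its RSE does not exceed `threshold` percent."""
    return relative_standard_error(estimate, standard_error) <= threshold


# Illustrative values: a 12% poverty rate with two different standard errors.
print(is_reliable(0.12, 0.036))  # False (RSE = 30%)
print(is_reliable(0.12, 0.018))  # True (RSE = 15%)
```

The zero-estimate guard matters in small-area work: a direct estimate of zero (for example, from a district with no sampled poor households) has no defined RSE, so such areas cannot be judged reliable by this criterion at all.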
These results highlight that integrating advanced SAE models such as HB Flexible Beta with high-quality administrative data such as Socio-Economic Registration data is crucial for producing reliable and precise poverty estimates that support more targeted and effective poverty alleviation policies.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/518Estimating the Unemployment Rate at Sub-District Level in West Java Province in 2024 Using Hierarchical Bayesian Approach with Cluster Information2025-09-12T07:13:10+00:00Randy Daffa Adityarandydaffaa@gmail.comAwika Zukhrufahawikayz@gmail.comEksis Auliyaauliyaeksis@gmail.comDyah Widyastutidyahwdy22@gmail.comAdrian Lubislubisadrian06@gmail.comAnggie Nugrahanugrahanggie48@gmail.comSiti Muchlisohsitim@stis.ac.id<p>Unemployment is a substantial obstacle to growth in Indonesia, affecting both social and economic stability. The unemployment rate, which quantifies the proportion of the labor force that is unemployed and actively seeking work, is a critical indicator of labor market imbalances and is essential for labor policy formulation and assessment. Nonetheless, unemployment data are limited at the micro level owing to small sample sizes. Small Area Estimation (SAE) can address this limitation. This study estimates the unemployment rate at the sub-district level in West Java Province for 2024 using the Hierarchical Bayes Beta method combined with clustering techniques. 
The modeling results indicate that most sub-districts have low to medium unemployment rates; however, 21 sub-districts have very high unemployment rates, ranging from 23.00 percent to 48.06 percent.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/618A Hybrid Method for Standardising Civil Registration and Vital Statistics (CRVS) Location Data2025-09-11T08:14:34+00:00Ignatius Sandyawanisandyawan@student.unimelb.edu.auYeni Rimawatiyeni.rima@bps.go.idAri Rismansyahari.rismansyah@bps.go.id<p>Civil Registration and Vital Statistics (CRVS) systems in archipelagic contexts such as Indonesia face persistent challenges in standardising location data because free-text entries vary in spelling, formatting, and granularity. This study introduces a multi-stage hybrid framework that systematically converts these unstructured entries into official administrative codes using deterministic matching, fuzzy probabilistic matching, and geocoding. The framework was applied to 841,126 birth and death records using Python (Pandas, RapidFuzz, Geopy). Across all stages, the combined match rate was 85.44% for births and 67.12% for deaths. The layered pipeline provided speed, precision, and coverage on real-world CRVS data. The findings demonstrate enhanced geographic precision in vital statistics, enabling more reliable public health and demographic applications. Future improvements may include transformer-based embeddings, active learning for ambiguous records, and uncertainty-aware geocoding techniques. 
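The layered matching idea can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: it uses the standard-library `difflib` in place of RapidFuzz, a 0.85 similarity cut-off chosen arbitrarily, and hypothetical administrative codes, and it leaves out the geocoding stage.

```python
import difflib
import re


def normalise(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def build_lookup(official):
    """Map normalised official names to their administrative codes."""
    return {normalise(name): code for name, code in official.items()}


def match_location(raw, lookup, cutoff=0.85):
    """Return (code, stage) for a free-text entry, or (None, "unmatched")."""
    key = normalise(raw)
    # Stage 1: deterministic match on the normalised string.
    if key in lookup:
        return lookup[key], "deterministic"
    # Stage 2: fuzzy match; difflib stands in for RapidFuzz in this sketch.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if close:
        return lookup[close[0]], "fuzzy"
    # Stage 3 (geocoding) is not sketched; such entries stay unmatched here.
    return None, "unmatched"


# Hypothetical codes, for illustration only.
lookup = build_lookup({"Kota Bandung": "X01", "Kabupaten Bandung": "X02"})
print(match_location("KOTA BANDUNG.", lookup))  # ('X01', 'deterministic')
print(match_location("Kota Bandng", lookup))    # ('X01', 'fuzzy')
```

Running the cheap exact lookup first means the slower fuzzy comparison only touches entries that fail it, which is the same ordering logic behind the speed and coverage trade-off described in the abstract.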
This framework establishes a scalable, robust pathway for improving the granularity and reliability of geolocated vital event data.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/707Comparison of Imputation Methods: Traditional, Machine Learning, and Deep Learning on Multivariate Time Series with MCAR and MNAR2025-10-04T07:37:02+00:00Ferigo Taufani Tri Hakikifferigo36@gmail.comNaufal Luthfan Tasbihi6003242014@student.its.ac.idAkila Akhtar El Dafi6003251026@student.its.ac.idNurfaudzan .faudzan.46@gmail.comAndi Shahifah Muthahharahshahifahm@gmail.com<p>This study compares Linear Interpolation, Kalman Filtering, Support Vector Regression (SVR), and RNN-GRU as imputation methods for multivariate time series that exhibit linear trends and seasonality. Synthetic data for three variables were generated for small, medium, and large sample sizes. Missing values were systematically inserted under Missing Completely at Random (MCAR) and Missing Not at Random (MNAR) patterns with proportions of 10%, 20%, and 35%. Imputation accuracy was evaluated using RMSE, MAPE, and R² over 150 simulation repetitions per scenario. The results indicate that each method has advantages under certain conditions. Linear Interpolation is suitable for data with linear trends, small sample sizes, and low to moderate levels of missingness, and is effective under both MCAR and MNAR patterns. Kalman Filtering performs best for medium to large datasets, particularly for linear and seasonal trend patterns with high proportions of MCAR missing data. SVR excels in large seasonal data scenarios with MNAR missingness. RNN-GRU performs well under low missingness, particularly for small seasonal datasets with MNAR patterns. 
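Of the four methods, linear interpolation is simple enough to show directly. A minimal Python sketch for a single variable, assuming missing values are encoded as `None` (for a multivariate series, apply it to each variable separately); it illustrates the technique and is not the study's implementation:

```python
def linear_interpolate(series):
    """Fill interior gaps (None) by straight-line interpolation between known neighbours.

    Leading and trailing gaps are left as None because they lack a second anchor point.
    """
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left)
    return values


print(linear_interpolate([1.0, None, None, 4.0, None]))  # [1.0, 2.0, 3.0, 4.0, None]
```

Because each gap is filled with a straight line between its observed neighbours, the method tracks a linear trend closely but cannot recover a seasonal peak that falls inside a gap, which fits the abstract's observation that it suits linear-trend data.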
These findings emphasise that the choice of imputation method should consider data size, trend patterns, and the missing data mechanism to minimise bias and preserve the integrity of the temporal structure.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/629The Influence of Child, Households, and Villages/Sub-Districts Characteristics on The Working Status of Children in East Nusa Tenggara Province 20242025-09-20T13:57:05+00:00Angga Prayogaanggaprayoga1019@gmail.comBudiasih .budiasih@stis.ac.id<p>In 2024, East Nusa Tenggara Province had the fourth-highest percentage of poor population in Indonesia but the highest percentage of child labor in the country. This study aims to describe child labor in East Nusa Tenggara Province in 2024, identify the factors influencing it, and examine the direction of their effects. The unit of analysis is unmarried children aged 10–17 years who are not heads of household, with a sample of 9,117 children from 6,123 households in 1,165 villages/sub-districts. The data are drawn from the March Susenas Core (Kor) and Module surveys and from Podes 2024, all sourced from BPS. The analysis uses multilevel binary logistic regression. The results show that working children tend to be boys aged 15–17 years. 
These children tend to live in households whose heads have a low level of education and work in the agricultural sector, that have few members of productive age, and that run micro and small enterprises; they also tend to live in villages/sub-districts with many micro and small industries, where agriculture is the main source of income for most of the population.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/724Estimation of Energy Transition Index based on Official Statistics and Satellite Imagery Data 2025-09-22T07:30:16+00:00Sabilla Hamda Syahputri222112344@stis.ac.idWaris Marsisnowaris@stis.ac.id<p>Energy plays a crucial role in sustaining human life, and its use should be optimized according to the principles of sustainable development through a shift from non-renewable to renewable sources. To monitor this shift, the World Economic Forum (WEF) developed the Energy Transition Index (ETI), which measures national-level transitions using conventional statistical data. However, the ETI is limited to the country level, while more detailed assessments are needed at smaller administrative scales such as regencies and cities to capture regional specificities. This study addresses the gap by constructing an energy transition index at the regency/city level in Indonesia for 2024. The analysis integrates official statistics with satellite imagery data to overcome limitations in subnational data availability. Methodologically, Exploratory Factor Analysis and uncertainty analysis were applied. Among the five uncertainty analysis scenarios tested, scenario 1, featuring min-max normalization, unequal weighting across indicators and factors, and linear aggregation, produced the most reliable results. The findings reveal that the index is composed of four main factors. 
Overall, Indonesia’s energy transition index values show a relatively even distribution, yet disparities remain evident across islands and between regencies/cities. Higher scores are concentrated in the western regions, while lower scores dominate the eastern parts of the country.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/556Analysis of Factors Affecting Deforestation in Riau From 2001 To 2023 Using The ARDL Approach2025-09-12T07:17:02+00:00I Wayan Divandra Maharesandya Sukajayaiwayandivandra@gmail.comEfri Diah Utamiefridiah@stis.ac.id<p>Forests are among the most important elements for human life, yet high rates of deforestation have been one of Indonesia's problems for decades. Riau is the province with the highest total deforestation in Indonesia over the last 23 years. The government has implemented various measures to meet both short-term and long-term targets for reducing deforestation. This study analyzes the variables suspected of influencing deforestation in the short and long term using the Autoregressive Distributed Lag (ARDL) model. The results indicate that, in the short term, deforestation in Riau Province is influenced by the GDP of the agriculture, forestry, and fisheries sectors and by forest and land fires. In the long term, the significant variables are the GDP of the agriculture, forestry, and fisheries sectors, the implementation of Law No. 18 of 2013, and the extent of forest and land fires. Based on these findings, in the short term the government is expected to steer the agricultural economy in a more sustainable direction and halt the clearing of forest areas for oil palm plantations, especially clearing carried out through forest burning. 
In the long term, the government should further strengthen the implementation of the law.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/468Spatial Model for Food Security in Eastern Indonesia 20242025-10-04T07:29:35+00:00Fathiyah Nur Shohwah212212601@stis.ac.idImam Fathoni Arufi212212662@stis.ac.idMohammad Iqbal Wicaksono212212734@stis.ac.idNadia Lutfi Meilawati212212779@stis.ac.idNilam Cahya Meilani212212795@stis.ac.idGama Putra Danu Sohibiengamaputra@stis.ac.id<p>Food security is the condition in which food needs are met from the national down to the individual level, as measured by the availability, affordability, utilization, and stability of food. Although food is a basic human need, food security in Indonesia is unevenly distributed, particularly in Eastern Indonesia. This study therefore aims to describe food security and identify the factors influencing it in districts/cities in Eastern Indonesia in 2024. The method used is the Spatial Durbin Model (SDM) with an inverse distance weighting matrix. 
The results show that the share of GRDP from the agriculture, forestry, and fishing sector, the poverty rate, average years of schooling, and the spatial lags of the Food Security Index, the open unemployment rate, and the poverty rate have a significant influence on the Food Security Index of districts/cities in Eastern Indonesia in 2024.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/477Improving The Accuracy of Area Sampling Frame Estimators for Agricultural Surveys Using Unequal Clustered Segment Sampling: The Case of Indonesia2025-09-15T06:11:16+00:00Hazanul Zikrahazanul.zikra@bps.go.idWidyo Pura Buanawiwid@bps.go.idYocco Bimartayocco.bimarta@bps.go.idNurina Paramitasarinurina@bps.go.id<p>Accurate rice production data are vital for maintaining national food security and formulating effective agricultural policies. In Indonesia, the Area Sampling Frame (KSA) method has been widely implemented to estimate rice harvest areas using 300 m × 300 m segments, each represented by nine observation points. However, this approach has limitations, particularly the risk of undercoverage bias when estimating areas across different rice growth stages, especially when observation points fall outside the target rice-growing population area. To address this issue, the present study introduces the Unequal Clustered Segment Sampling method as an alternative to the traditional KSA approach. The method improves estimation accuracy by refining the sampling frame and excluding non-target segments, that is, spatial points located outside actual rice-growing regions. Within a design-based estimation framework, the proposed method accounts for unequal cluster sizes, allowing a more representative depiction of field conditions. 
The empirical results demonstrate that the Unequal Clustered Segment Sampling method significantly reduces bias and enhances the precision of rice area estimates compared with the conventional KSA. These findings suggest that incorporating unequal clustered segment sampling designs into KSA-based surveys can yield more reliable and representative estimates, particularly in heterogeneous or fragmented agricultural landscapes.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/691A Multi-Temporal Remote Sensing Approach to Quantify Land Cover Change and its Impact on Ecosystem Sustainability in Riau, Indonesia2025-09-16T08:11:00+00:00Novrian Maria Purba212212803@stis.ac.idFitri Hariyantiv3hyanti@gmail.comAndriansyah Muqiit Wardoyo Saputramuqiitsaputra@bps.go.id<p>This study analyzes land cover change in Riau Province from 2015 to 2024, focusing on deforestation and degradation as indicators of ecosystem sustainability. Landsat 8 OLI/TIRS and Landsat 9 OLI-2 imagery processed in Google Earth Engine (GEE) was combined with MODIS hotspot data (MOD14A1) and socioeconomic indicators from Statistics Indonesia (BPS), namely Gross Regional Domestic Product (GRDP) and the Open Unemployment Rate (OUR), to assess spatiotemporal patterns. The Normalized Difference Vegetation Index (NDVI) was applied with thresholds for deforestation (NDVI < –0.3) and degradation (–0.3 ≤ NDVI ≤ –0.1). Results show that 2015 was the most severe period, dominated by peatland fires, whereas 2019 recorded forest loss at a lower intensity and 2020–2024 showed partial vegetation recovery linked to restoration efforts. Pelalawan, Indragiri Hilir, and Kampar were the most affected districts. 
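A minimal Python sketch of the threshold rule follows. NDVI is computed as (NIR − Red) / (NIR + Red); for Landsat 8/9 OLI these are bands 5 and 4. Because the quoted thresholds are negative, this sketch assumes they are applied to the change in NDVI between two dates rather than to single-date NDVI; the abstract does not say which, and the label for values above −0.1 is hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def ndvi_change(nir_before, red_before, nir_after, red_after):
    """Assumed input to the thresholds: NDVI at the later date minus NDVI at the earlier date."""
    return ndvi(nir_after, red_after) - ndvi(nir_before, red_before)


def classify_change(delta):
    """Apply the abstract's thresholds to one NDVI-change value."""
    if delta < -0.3:
        return "deforestation"
    if delta <= -0.1:  # here -0.3 <= delta <= -0.1
        return "degradation"
    return "other"  # hypothetical label for values outside both classes


# Illustrative reflectances: dense vegetation before, sparse cover after.
delta = ndvi_change(0.40, 0.05, 0.20, 0.15)
print(classify_change(delta))  # deforestation
```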
Correlation analysis revealed that fire hotspots had the strongest association with land cover change, while economic and social indicators showed weaker relationships. Peatland fires remain the main driver of land degradation, emphasizing the need to strengthen fire management, peatland protection, and sustainable plantation governance to support Sustainable Development Goal (SDG) 15 on Life on Land, particularly target 15.3 on Land Degradation Neutrality by 2030 (indicator 15.3.1).</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/714Small Area Estimation of Extreme Poverty Using Zero-Inflated Binomial GLMM: A District-Level Case Study in North Sumatra 20242025-09-16T07:06:07+00:00Marta Desna Fitria Br. Lumban Gaolmartadesnafitria@gmail.comBeta Septi Iryanibeta@bps.go.idEni Lestariningsihelen@bps.go.id<p>Eradicating extreme poverty is a key objective of Sustainable Development Goal (SDG) 1, with a global benchmark of reducing the proportion of people living below the US$1.90 PPP poverty line. However, in 2024 Indonesia, and North Sumatra Province in particular, continues to face persistent challenges in achieving this target. Direct estimation based on the Foster-Greer-Thorbecke (FGT) formula using SUSENAS microdata suffers from large sampling errors (RSE > 25 percent) and zero estimates in multiple districts because of small or absent samples, indicating serious problems of zero inflation and overdispersion. To overcome these limitations, this study applies a model-based Small Area Estimation (SAE) approach using the Zero-Inflated Binomial Generalized Linear Mixed Model (ZIB-GLMM). This method incorporates auxiliary variables from the 2024 PODES dataset and effectively addresses both excess zeros and inter-district variability. 
Simulation results show that ZIB-GLMM outperforms conventional SAE models in terms of predictive accuracy and model stability. The proposed method offers realistic and policy-relevant district-level estimates of extreme poverty, providing robust evidence to inform targeted interventions and strengthen Indonesia’s national agenda to eradicate extreme poverty.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/541Data Collection for Nearest Public Facility Using Ball Tree Algorithm and Google Maps API2025-09-12T07:14:53+00:00Handika Ramadhanhandikaramadhan7@gmail.com<p>Accessibility to public facilities is a crucial factor in regional development, including at the village level, the smallest administrative unit. The Central Bureau of Statistics (BPS) currently collects data on public facilities and their distances to village offices through interviews, which makes the results dependent on respondents’ perceptions. This research aims to measure the distance from each village office to the nearest public school using the BallTree algorithm and the Google Maps API. The dataset consists of 128 village offices and a list of public schools classified into four categories. BallTree was used to filter candidate schools within a given radius, after which the route distances to the ten nearest candidates were calculated with the Google Maps Distance Matrix API to identify the school with the shortest route over the road network. The findings show that straight-line distance often, but not always, agrees with route distance, highlighting the importance of route calculation with Google Maps. 
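The two-step logic (straight-line shortlist, then road-network distance) can be sketched in Python as follows. A brute-force haversine scan stands in for the BallTree query (scikit-learn's `BallTree` with `metric="haversine"` could replace the linear scan for large inputs), and `route_distance_fn` is a hypothetical callable standing in for a Google Maps Distance Matrix request. The 10 km radius and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_by_route(office, schools, route_distance_fn, radius_km=10.0, k=10):
    """Shortlist up to k schools within radius_km in a straight line, then pick the shortest route.

    office: (lat, lon); schools: {school_id: (lat, lon)};
    route_distance_fn(origin, destination): hypothetical road-distance callable.
    """
    by_distance = sorted(
        (haversine_km(*office, *location), school_id)
        for school_id, location in schools.items()
    )
    shortlist = [school_id for dist, school_id in by_distance if dist <= radius_km][:k]
    if not shortlist:
        return None
    return min(shortlist, key=lambda school_id: route_distance_fn(office, schools[school_id]))


# Illustrative data: S1 is nearer in a straight line, S2 has the shorter road route.
schools = {"S1": (0.0, 0.01), "S2": (0.0, 0.02), "S3": (1.0, 1.0)}
fake_route_km = {(0.0, 0.01): 5.0, (0.0, 0.02): 2.5, (1.0, 1.0): 200.0}
print(nearest_by_route((0.0, 0.0), schools, lambda o, d: fake_route_km[d]))  # S2
```

In the sample call, S1 is the closest school in a straight line but S2 has the shorter route, which is the situation the abstract describes; S3 lies outside the radius and is never sent to the routing step.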
This research concludes that combining BallTree and the Google Maps API improves computational efficiency while providing objective and reliable information.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/433Disaggregating the Hidden: Small Area Estimates of Child Labor in Bali Province2025-09-19T04:51:52+00:00Ahmad Nadifa Al Agung212111861@stis.ac.idArlita Dwina Firlana Sari212111923@stis.ac.idClarissa Azarine212111973@stis.ac.idLisda Oktaviana212112158@stis.ac.idZidan Akbar Al Aqsha212112432@stis.ac.idNofita Istiananofita@stis.ac.id<p><span class="NormalTextRun SCXW145894874 BCX0">Child labor remains a critical concern in Indonesia, including in Bali Province, which has a higher prevalence than the national average. However, efforts to formulate effective local policies are often hindered by unreliable child labor statistics at the regency/municipality level, primarily because of high Relative Standard Error (RSE) values. This study seeks to produce more reliable estimates of the proportion of child labor at the regency level in Bali by applying Small Area Estimation (SAE). The analysis uses data from the August 2024 Sakernas survey, supplemented with contextual variables from the 2024 PODES dataset. The SAE approach employed was the Hierarchical Bayes method with a Beta distribution (HB-Beta). The findings indicate that the HB-Beta model yields more accurate estimates, with RSE values below 25% in all regencies. 
This demonstrates that the HB-Beta model produces more accurate estimates than direct estimation, as it better reflects differences between regencies and can help design more effective local policies to reduce child labor.</span></p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/641Logical Modelling of Statistical Data Using the SDMX Standard: Case Study on the Quarterly Gross Regional Domestic Product Table2025-09-12T01:56:30+00:00Kartika Amandasarikartikamandasari49@gmail.comNano Yulian Pratamanano@bps.go.idFarhan Satria Aditamafarhan.satria@bps.go.idWaris Marsisnowaris@stis.ac.id<p>Poverty, as a national issue, necessitates data-driven policy planning informed by accurate and consistent statistics. To ensure high quality and consistency in statistical data reporting across diverse regions, adopting an international standard is crucial. The Statistical Data and Metadata Exchange (SDMX) standard facilitates the structured exchange of data and metadata. This study designs and implements a statistical indicator data model using the SDMX standard to improve table consistency. We used Quarterly Provincial Gross Regional Domestic Product (GRDP) data as a case study and applied the Design Science Research Method (DSRM) as the methodology. 
The results demonstrate that modeling the<br />GRDP data using SDMX yields a uniform and highly consistent table structure, significantly<br />enhancing the consistency of statistical data reporting across regions.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/474Application of the Geographically Weighted Negative Binomial Regression (GWNBR) Method to Tuberculosis Cases in North Sumatra Province in 20242025-09-16T07:54:49+00:00Titin Julianti Br Tinambunan212212899@stis.ac.idNisa Hayatun Nufus212212798@stis.ac.idNadia Lutfi Meilawati212212779@stis.ac.idRezky Rahma212212847@stis.ac.idFebri Wicaksonofebri@stis.ac.id<p>Tuberculosis is one of the leading causes of death worldwide, with approximately 1.2 million deaths annually. According to the World Health Organization (WHO), Indonesia has the second-highest tuberculosis burden after India, accounting for 10% of global cases (WHO, 2024). According to Ministry of Health data, in 2024 North Sumatra had the highest number of TB cases on Sumatra Island, with case numbers above the national average, ranking third in Indonesia. Tuberculosis case counts in North Sumatra are census data that are overdispersed and exhibit spatial effects. Therefore, this study uses Geographically Weighted Negative Binomial Regression (GWNBR), which produces local (location-specific) parameter estimates. The results show that GWNBR forms eight regional groups based on their significant variables. Rainfall and per capita expenditure have a significant influence in all districts/cities, and the percentage of BCG immunization and the percentage of the population who smoke have a significant influence in almost all regions. Meanwhile, health fund allocation is significant only in some districts/cities. 
Although the AIC of the GWNBR model is not smaller than that of the negative binomial regression, the GWNBR model can still be used to examine the spatially varying influence of the independent variables on tuberculosis cases in North Sumatra.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/581Development of Portal Pintar Utilization Evaluation Dashboard (Case Study: BPS Province of Bengkulu)2025-10-04T07:45:30+00:00Bony Parulian Josaphatbonyp@stis.ac.idRifka Humaira222112321@stis.ac.id<p>BPS Statistics Province of Bengkulu (BPS Provinsi Bengkulu) plays a role in<br />supporting statistical operations in the Province of Bengkulu. As a vertical agency of Statistics<br />Indonesia (BPS), BPS Province of Bengkulu also holds an important role in providing statistical<br />data at the regional level. Accordingly, BPS Province of Bengkulu requires an integrated<br />system to facilitate all activities, such as providing easier and faster access to information for all<br />employees, both in reporting work progress and in monitoring the implementation of activities<br />such as agenda planning, facility usage, facility loan management, and cross-unit coordination.<br />Portal Pintar is a portal used to facilitate the management of various activities in BPS Province<br />of Bengkulu. By using Portal Pintar, users can access and manage various types of information<br />and documents, such as activity agendas, correspondence, and facility loan applications. BPS<br />Province of Bengkulu then produces periodic evaluations of Portal Pintar’s utilization, which are<br />distributed to all employees. 
However, the evaluations conducted are not yet visualized<br />automatically or in real time, hence the need to develop a Portal Pintar Utilization Evaluation<br />Dashboard in which visualizations are generated automatically through a connection to Portal Pintar’s<br />API. This dashboard is expected to make the evaluation of Portal<br />Pintar’s utilization more integrated.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/498Do Extracurricular Activities give ‘Extra’ on Academic Performance? Evidence from Propensity Score Matching Methods2025-10-04T07:22:04+00:00Bryan Nozaledabmnozaleda@up.edu.ph<p>This study compares different statistical methods to determine whether participating<br />in extracurricular activities helps improve students’ academic performance. Utilizing a dataset<br />of 1,000 students, the study balances students who did and did not take part in extracurriculars<br />by adjusting for factors like study hours and attendance. It compares Nearest Mahalanobis<br />Distance Matching, Nearest Neighbor Matching (with and without a caliper), Optimal Pair Matching,<br />Optimal Full Matching, Coarsened Exact Matching (CEM), and Inverse Probability Weighting<br />(IPW) based on covariate balance, sample retention, and average treatment effect. Results reveal<br />that IPW performs best in covariate balance, reducing nearly all standardized mean<br />differences to near zero while retaining the majority of the dataset. Nearest Neighbor Matching<br />with Caliper and Optimal Pair Matching also perform well, with significant treatment effect<br />estimates and relatively strong model fits. 
However, each method involves trade-offs:<br />IPW excels in covariate balance but has a higher AIC, indicating a slight compromise in model fit, while<br />Nearest Neighbor Matching with Caliper offers a balance between precision, model fit, and<br />sample retention. In contrast, CEM provides strong covariate balance for categorical variables<br />but results in significant sample loss, demonstrating the trade-off between strict matching criteria<br />and practical applicability. Nearest Neighbor Matching without Caliper, on the other hand, performs<br />poorly in balancing covariates. As evidenced by the average treatment effect estimates derived<br />from the propensity score matching (PSM) methods, this study concludes that participation in<br />extracurricular activities has a positive and significant impact on students' academic<br />performance, with study hours, attendance, and resource accessibility emerging as critical factors<br />as well. The novelty of this study lies in comparing multiple statistical matching approaches side<br />by side in an educational context, providing guidance for researchers and policymakers.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/688Determinants of Comprehensive Understanding of Stunting among Indonesian Pregnant Women and Mothers of Toddlers Aged 0–23 Months in 20232025-09-16T07:19:36+00:00Agnes Rosihan Kristianti Silalahi112212456@stis.ac.idRini Rahanirinirahani@stis.ac.id<p>Stunting is a chronic nutritional disorder that remains a priority in Indonesia. In line with<br />the second goal of the SDGs (zero hunger), the Ministry of Health (MoH) has implemented a<br />communication strategy for behavioural change and community empowerment through classes<br />for pregnant women and mothers of toddlers that use the Maternal and Child Health<br />(MCH) book. 
However, these efforts have not yet been optimal in improving understanding of stunting. The 2023<br />Indonesian Health Survey (IHS) shows that women in Indonesia still have a poor comprehensive<br />understanding of stunting. This group includes pregnant women and breastfeeding mothers, who are key<br />target groups for stunting reduction. This study aims to describe and analyse the characteristics<br />of Indonesian pregnant women and mothers of toddlers aged 0–23 months that significantly<br />influence their level of comprehensive understanding of stunting. Data from the 2023 IHS were<br />analysed using descriptive statistics (graphs and tables), together with inferential analysis<br />through ordinal logistic regression using the Proportional Odds Model (POM). The results show<br />that the majority of these mothers have a poor level of comprehensive understanding of stunting,<br />with five variables having a significant influence, namely: access to information, education level,<br />employment status, socioeconomic status, and residence area.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/623Did the Digital Push Last? E-Commerce and Rural Agricultural Earnings in Indonesia During and After COVID-19, Evidence from Sakernas2025-09-22T13:47:53+00:00Kadir Ruslankadirsst@gmail.comWeni Lidya Sukmawenilidya@bps.go.id<p>This paper examines the impact of e-commerce adoption on earnings and income<br />distribution among rural agricultural employers in Indonesia, both during and after the COVID-19 pandemic. Using microdata from the National Labour Force Survey/Sakernas (2018–2024)<br />and applying probit, OLS, Propensity Score Matching, and quantile regression models, we<br />identify the determinants of adoption and its impact on earnings. 
Adoption was strongly driven<br />by education, training, and enterprise characteristics, while older age and reliance on unpaid<br />household labor constrained uptake. Results show that e-commerce adopters earned substantially<br />more than non-adopters (over 30 percent more) both during and after the pandemic, confirming<br />sustained income gains beyond the crisis. Quantile regressions reveal that the lowest-income<br />employers benefited most, with earnings gains exceeding 50 percent at the bottom quantile<br />during the pandemic. Although relative advantages shifted toward higher earners after the<br />pandemic, large and significant effects remained for the lowest-income groups. These findings<br />indicate that e-commerce not only enhances market access but also contributes to improving<br />income distribution. Policy interventions to strengthen digital literacy, rural infrastructure, and<br />financial access are essential to preserve its inclusive role and ensure that vulnerable agricultural<br />employers continue to benefit disproportionately.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/561The Individual and Contextual Factors of Precarious Employee Status of Youth Workers in Indonesia 2024: Application Multilevel Binary Logistic Regression2025-09-12T07:20:55+00:00Arya Samuel Mandy112212519@stis.ac.idSugiarto .soegie@stis.ac.id<p>Human resources are a strategic component for countries in achieving development<br />goals and promoting progress. Among age groups, youth play an important role as drivers of a<br />country's development. However, the difficulty of obtaining decent work is a serious problem<br />that forces many young people in Indonesia into precarious employment. 
Within the last<br />four years, the Precarious Employment Rate (PER) of young people in Indonesia increased<br />dramatically in 2024 compared to the previous year, becoming the highest among all age<br />groups. This study aims to describe the general picture and analyze the individual and<br />contextual factors that influence the precarious employee status of youth workers in<br />Indonesia. The analysis method used is multilevel binary logistic regression. The results of the<br />study show that 85.97 percent of youth workers in Indonesia have precarious employee status.<br />The analysis shows that individual factors such as gender, marital status, education level,<br />participation in training, regional classification, employment sector, and labor union membership,<br />as well as contextual factors such as the provincial minimum wage, have a significant effect on the<br />precarious employee status of youth workers in Indonesia in 2024.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statisticshttps://proceedings.stis.ac.id/icdsos/article/view/475Regional Clustering of Food Insecurity to Support the Attainment of SDG 2: Zero Hunger through Machine Learning Approaches2025-09-15T14:30:13+00:00Siti Nuradillasiti.nuradilla@apps.ipb.ac.idWawan Saputraw2nwawan@apps.ipb.ac.idMuhammad Rizalmmmdrizal@apps.ipb.ac.id<p>Food security remains a persistent development challenge in Indonesia, with regional disparities posing significant barriers to achieving equitable access to nutritious and sufficient food. This study aims to classify and cluster districts and cities in Indonesia based on their food security vulnerability levels, thereby supporting the attainment of SDG 2: Zero Hunger. We employed a machine learning approach using a dataset of 514 regions and nine food security indicators sourced from national databases. 
The classification phase compared three algorithms (Random Forest, XGBoost, and LightGBM) under multiple data preprocessing scenarios, including outlier handling (IQR and Isolation Forest) and class balancing (SMOTE). LightGBM with IQR preprocessing delivered the best performance, achieving both an accuracy and an F1-score of 0.984. For clustering, DBSCAN and HDBSCAN were applied using the six most important features identified by the classifier. DBSCAN performed slightly better based on the Silhouette Score (0.5639), yielding three regional groupings: food-secure, highly vulnerable, and outlier regions. The analysis revealed that socio-economic factors and access to basic infrastructure remain critical determinants of food insecurity. The results underscore the importance of data-driven approaches in policy formulation and highlight the value of machine learning in producing more targeted, efficient, and adaptive food security interventions in Indonesia.</p>2025-12-22T00:00:00+00:00Copyright (c) 2025 Proceedings of The International Conference on Data Science and Official Statistics