Performance Comparison of Hot-Deck Imputation, K-Nearest Neighbor Imputation, and Predictive Mean Matching in Missing Value Handling, Case Study: March 2019 SUSENAS Kor Dataset

Tsasya Raudhatunnisa; Nori Wilantika

doi:10.34123/icdsos.v2021i1.93

Authors

Tsasya Raudhatunnisa, Polytechnic of Statistic STIS
Nori Wilantika, Polytechnic of Statistic STIS

DOI:

https://doi.org/10.34123/icdsos.v2021i1.93

Keywords:

Missing value, Hot-Deck Imputation, KNNI, PMM, comparison

Abstract

Missing values can cause bias and make a dataset unrepresentative of the actual situation. The choice of method for handling missing values matters because it affects the resulting estimates. This study therefore compares three imputation methods for handling missing values: Hot-Deck Imputation, K-Nearest Neighbor Imputation (KNNI), and Predictive Mean Matching (PMM). Because the three methods work differently, they produce different estimates. The methods are compared on Root Mean Squared Error (RMSE), Unsupervised Classification Error (UCE), Supervised Classification Error (SCE), and the time needed to run the algorithm. The study consists of two analyses: a comparative analysis and a scoring analysis. The comparative analysis applies a simulation that accounts for the missing-value mechanism: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). The scoring analysis then narrows down the results of the comparative analysis by assigning a score to the imputation results of each method. Based on these scores, the results suggest that Hot-Deck Imputation performs best in handling missing values.
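
The simulation idea in the abstract (mask known values, impute them, then score the imputations by RMSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the synthetic data, the 20% MCAR rate, the random-donor variant of hot-deck, the single-predictor KNN with k = 5, and all function names are hypothetical. PMM, UCE, SCE, the MAR and MNAR mechanisms, and the SUSENAS data are not covered, so the numbers printed say nothing about the paper's findings.

```python
import math
import random


def make_mcar(values, rate, rng):
    """Mask roughly `rate` of the entries with None, independently of the data (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def hot_deck_impute(column, rng):
    """Random hot-deck (assumed variant): fill each gap with a randomly drawn observed donor."""
    donors = [v for v in column if v is not None]
    if not donors:
        raise ValueError("no observed donors available")
    return [rng.choice(donors) if v is None else v for v in column]


def knn_impute(target, predictor, k):
    """Fill each gap with the mean target of the k complete rows closest in `predictor`."""
    complete = [(x, y) for x, y in zip(predictor, target) if y is not None]
    if not complete:
        raise ValueError("no complete rows available")
    filled = []
    for x, y in zip(predictor, target):
        if y is not None:
            filled.append(y)
            continue
        nearest = sorted(complete, key=lambda row: abs(row[0] - x))[:k]
        filled.append(sum(row[1] for row in nearest) / len(nearest))
    return filled


def rmse(truth, imputed, masked):
    """Root mean squared error, computed only over the entries that were masked."""
    idx = [i for i, v in enumerate(masked) if v is None]
    if not idx:
        return 0.0
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))


# Synthetic data (hypothetical): y depends linearly on x plus Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]

y_missing = make_mcar(y, 0.2, rng)
y_hot_deck = hot_deck_impute(y_missing, rng)
y_knn = knn_impute(y_missing, x, k=5)

print("RMSE hot-deck:", rmse(y, y_hot_deck, y_missing))
print("RMSE KNN:     ", rmse(y, y_knn, y_missing))
```

Scoring only the masked entries keeps the error from being diluted by values that were never missing. A MAR or MNAR variant would replace `make_mcar` with a mask whose probability depends on `x` or on `y` itself.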

Published

2022-01-04

How to Cite

Raudhatunnisa, T., & Wilantika, N. (2022). Performance Comparison of Hot-Deck Imputation, K-Nearest Neighbor Imputation, and Predictive Mean Matching in Missing Value Handling, Case Study: March 2019 SUSENAS Kor Dataset. Proceedings of The International Conference on Data Science and Official Statistics, 2021(1), 753–770. https://doi.org/10.34123/icdsos.v2021i1.93

Issue

Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official Statistics (ICDSOS)

Section

Official Statistics
