A Hybrid Method for Standardising Civil Registration and Vital Statistics (CRVS) Location Data
DOI:
https://doi.org/10.34123/icdsos.v2025i1.618
Keywords:
CRVS standardisation, fuzzy record linkage, geocoding, spatial proximity analysis, vital statistics
Abstract
Civil Registration and Vital Statistics (CRVS) systems in archipelagic contexts like
Indonesia face persistent challenges in location data standardisation due to free-text entries that
vary in spelling, formatting, and granularity. This study introduces a multi-stage hybrid
framework that systematically converts these unstructured entries into official administrative
codes using deterministic matching, fuzzy probabilistic matching, and geocoding. The pipeline
processed 841,126 birth and death records using Python (Pandas, RapidFuzz, Geopy).
Across all stages, the combined match rate reached 85.44% for births and 67.12% for
deaths. The layered pipeline balanced speed, precision, and coverage on real-world CRVS data.
The findings demonstrate enhanced geographic precision in vital statistics, enabling more
reliable public health and demographic applications. Future improvements may include
transformer-based embeddings, active learning for ambiguous records, and uncertainty-aware
geocoding techniques. This framework establishes a scalable, robust pathway for elevating the
granularity and reliability of geolocated vital event data.
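
The staged fallback described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tiny lookup table with placeholder codes, a similarity threshold of 0.85, and the helper names `normalise`, `exact_match`, `fuzzy_match`, `geocode_stub`, and `standardise`, all of which are hypothetical. The standard-library `difflib.SequenceMatcher` stands in for RapidFuzz so the example runs without third-party packages, and the geocoding stage is a stub rather than a call to Geopy.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: normalised official name -> administrative code.
# The codes are illustrative placeholders, not taken from the study.
OFFICIAL_CODES = {
    "jakarta selatan": "3171",
    "bandung": "3273",
    "surabaya": "3578",
}


def normalise(text):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return " ".join(cleaned.split())


def exact_match(name):
    """Stage 1: deterministic lookup on the normalised string."""
    return OFFICIAL_CODES.get(name)


def fuzzy_match(name, threshold=0.85):
    """Stage 2: accept the most similar official name if it clears the threshold."""
    best_name, best_score = None, 0.0
    for candidate in OFFICIAL_CODES:
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is not None and best_score >= threshold:
        return OFFICIAL_CODES[best_name]
    return None


def geocode_stub(name):
    """Stage 3 placeholder: a real pipeline would query a geocoder here and
    map the returned coordinates to an administrative code."""
    return None


def standardise(raw):
    """Try each stage in order and report which one produced the code."""
    name = normalise(raw)
    stages = (
        ("exact", exact_match),
        ("fuzzy", fuzzy_match),
        ("geocode", geocode_stub),
    )
    for stage, matcher in stages:
        code = matcher(name)
        if code is not None:
            return code, stage
    return None, "unmatched"


if __name__ == "__main__":
    for entry in ["  Jakarta  Selatan ", "Surabya", "Unknown place"]:
        print(repr(entry), standardise(entry))
```

Running the cheap exact lookup first means the more expensive fuzzy and geocoding stages only see records that earlier stages could not resolve. Recording the stage that produced each code also makes it possible to report per-stage and cumulative match rates.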