Big Data for Small Area Estimation: Happiness Index with Twitter Data


  • Sheerin Dahwan Aziz, Badan Pusat Statistik
  • Azka Ubaidillah



Data availability at the small-area level is one of the keys to successful regional development. However, direct estimates for small areas can have large errors because sample sizes are inadequate, so the estimates are unreliable. One alternative solution is the Small Area Estimation (SAE) method, which improves precision by "borrowing strength" from information on related regions or from auxiliary variables that are strongly related to the response variable. This study uses two SAE models: the EBLUP Fay-Herriot model with auxiliary variables from Podes data, and an SAE model with measurement error that uses Twitter data as the auxiliary variable. The SAE estimates are better than the direct estimates: the relative standard error (RSE) produced by both the EBLUP model and the measurement error model is smaller than that of the direct estimates. Therefore, big data can be used as an alternative auxiliary variable in SAE models because it is available in real time, covers even the smallest areas, and is relatively low cost.
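
To make the idea of "borrowing strength" concrete, the following is a minimal sketch of an area-level Fay-Herriot predictor with a single auxiliary variable. It is not the authors' implementation. It assumes the area-effect variance `sigma_u2` is already known (a true EBLUP estimates it, for example by REML), it does not cover the measurement error model, and it uses only the leading MSE term, so the model RSE it reports is optimistic. The function name, argument names, and the numbers in the usage lines are hypothetical.

```python
from math import sqrt


def fay_herriot_blup(y, x, psi, sigma_u2):
    """Area-level Fay-Herriot predictor with one covariate plus intercept.

    y        : direct estimates per area
    x        : auxiliary variable per area
    psi      : sampling variances of the direct estimates (assumed known)
    sigma_u2 : variance of the area random effect (ASSUMED known here;
               an EBLUP would estimate it from the data)
    """
    # GLS weights: inverse of total variance (model + sampling) per area.
    w = [1.0 / (sigma_u2 + p) for p in psi]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Weighted least squares slope and intercept (closed form for one covariate).
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    results = []
    for yi, xi, p in zip(y, x, psi):
        gamma = sigma_u2 / (sigma_u2 + p)      # shrinkage factor in (0, 1)
        synthetic = b0 + b1 * xi               # regression-based prediction
        estimate = gamma * yi + (1.0 - gamma) * synthetic
        mse_g1 = gamma * p                     # leading MSE term only
        results.append({
            "estimate": estimate,
            "synthetic": synthetic,
            "gamma": gamma,
            "mse_g1": mse_g1,
            "rse_direct": 100.0 * sqrt(p) / yi,
            "rse_model": 100.0 * sqrt(mse_g1) / estimate,
        })
    return results


# Hypothetical inputs for illustration only; not data from the study.
direct = [70.2, 68.5, 72.9, 66.1]
aux = [0.61, 0.55, 0.70, 0.48]
samp_var = [9.0, 16.0, 4.0, 25.0]
for row in fay_herriot_blup(direct, aux, samp_var, sigma_u2=4.0):
    print(row)
```

Areas with a large sampling variance get a small `gamma`, so their estimate leans more on the regression prediction. Because `gamma * psi` is always below `psi`, the leading-term MSE is smaller than the direct sampling variance, which is the mechanism behind the lower RSE reported for the SAE models.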