Entity Matching of Shop Accounts in Online Commerce Portals
Online marketplace data are a valuable source for many kinds of analysis. During data collection, however, we found duplicate shop accounts, which bias the analysis. This study develops a mechanism for identifying duplicate entities, i.e., shop accounts, across different online marketplaces, a task commonly known as entity matching. Word similarity algorithms form the core of our approach. We then build an entity matching model by comparing logistic regression, naive Bayes, and random forest classifiers to find the best model for classifying shop account similarity. The study covers the top online marketplaces in Indonesia and is limited to one developing municipality, Sleman, DI Yogyakarta. The best model achieves an accuracy of 0.961, a precision of 0.963, a recall of 0.958, and an F1-score of 0.962. These results indicate that the approach is suitable for identifying duplicate shop accounts.
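
To illustrate how word similarity scores can serve as input features for such a classifier, the following is a minimal sketch in Python. The abstract does not name the specific similarity algorithms used, so the two measures here (normalized Levenshtein similarity and token-level Jaccard similarity) are assumptions chosen for illustration, as are the function names and the example shop names.

```python
from typing import List


def normalize(name: str) -> str:
    """Lowercase a shop name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1], where 1 means identical."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct word tokens that the two names have in common."""
    tokens_a = set(normalize(a).split())
    tokens_b = set(normalize(b).split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similarity_features(a: str, b: str) -> List[float]:
    """Feature vector for one candidate pair of shop accounts."""
    return [edit_similarity(a, b), jaccard_similarity(a, b)]


# Hypothetical shop names from two different marketplaces.
features = similarity_features("Toko Batik Sleman", "batik sleman store")
```

Each candidate pair of accounts yields one such feature vector, which could then be passed, together with a match or non-match label, to any of the classifiers compared in this study.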