Optimized Feature Engineering for Transaction Fraud Detection Using Sequential and HMM-Based Features

Authors

  • Kaung Wai Thar, Faculty of Computer Science, University of Information Technology, Yangon, Myanmar
  • Thinn Thinn Wai, Faculty of Computer Science, University of Information Technology, Yangon, Myanmar

DOI:

https://doi.org/10.34123/icdsos.v2025i1.529

Keywords:

Ensemble Methods, Explainable AI, Feature Engineering, Fraud Detection, Hidden Markov Model, Imbalanced Learning, Sequential Features

Abstract

Fraud detection in financial transactions remains a major challenge because fraudulent activities are extremely rare (often described as finding a “needle in a haystack”) and must be detected in real time. This study presents a hybrid feature engineering framework that integrates lightweight sequential indicators with Hidden Markov Model (HMM)-based behavioural features to improve accuracy and interpretability. Using the PaySim dataset containing 2.77 million transactions (0.2965% fraud), we extracted 22 sequential and 14 HMM-based features, from which 28 highly discriminative variables were retained. To address class imbalance, a batch-wise SMOTETomek approach was applied, expanding 1.94 million clean samples to 3.86 million balanced samples. Experimental results show that HMM-based features alone yield moderate performance (ROC AUC = 0.778, F2 = 0.051), but the combined ensemble of tuned XGBoost and LightGBM achieves superior accuracy (ROC AUC = 0.9983, F2 = 0.8431, MCC = 0.827). SHAP analysis identifies HMM-derived entropy and state likelihoods, together with transaction amount dynamics, as key predictors. The results demonstrate that optimized feature engineering plays a crucial role in achieving accurate, scalable, and interpretable fraud detection.
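
For readers less familiar with the F2 and MCC scores reported above, the following is a minimal sketch of how both are computed from confusion-matrix counts, using their standard textbook definitions. It is not the authors' evaluation code; the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts.

    With beta = 2 (the F2 score), recall is weighted more heavily than
    precision, which suits fraud detection where missed fraud is costly.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion counts.

    Uses all four cells of the confusion matrix, so it stays informative
    under heavy class imbalance. Returns 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    tp, fp, fn, tn = 8, 2, 2, 88
    print(f"F2  = {f_beta(tp, fp, fn):.4f}")
    print(f"MCC = {mcc(tp, fp, fn, tn):.4f}")
```

Both metrics ignore the sheer number of correctly classified legitimate transactions that would dominate plain accuracy, which is presumably why they are reported alongside ROC AUC for a dataset with 0.2965% fraud.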

Published

2025-12-22

How to Cite

Wai Thar, K., & Thinn Wai, T. (2025). Optimized Feature Engineering for Transaction Fraud Detection Using Sequential and HMM-Based Features. Proceedings of The International Conference on Data Science and Official Statistics, 2025(1), 126–140. https://doi.org/10.34123/icdsos.v2025i1.529