Automated Indonesian Text Augmentation with Web-Based Application Using Flask Framework


  • Iftitah Athiyyah Rahma, Politeknik Statistika STIS
  • Lya Hulliyyatus Suadaa, Politeknik Statistika STIS



Keywords: text augmentation, text classification, imbalanced data, web application


In real-world settings, the data and resources available for text classification are limited. One common issue with labelled data is class imbalance. Imbalanced data degrades model performance because the model tends to focus on the majority label, so accuracy alone cannot describe the true quality of the model. One way to overcome this is oversampling, which for text data is known as text augmentation. However, NLP resources for Indonesian, especially for text augmentation, are still limited. This research therefore develops a web application that augments Indonesian text automatically. The application was built using the prototype method. It was successfully built and lets users augment all texts in a dataset automatically. Users select their preferred augmentation technique and upload a dataset as input. The output is the same dataset file as the input, with an additional column containing the synthetic text generated by the application. This application can contribute to further research on text augmentation for Indonesian.
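The input and output described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the augmentation techniques or the file format, so the two techniques (random word swap and random word deletion), the CSV format, the column name `augmented_text`, and the function names are all assumptions made for illustration. The Flask upload handling is omitted; only the step of adding a synthetic-text column to the dataset is shown.

```python
import csv
import io
import random


def random_swap(text, rng):
    """Swap two randomly chosen words (hypothetical technique, not from the paper)."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def random_deletion(text, rng, p=0.2):
    """Drop each word with probability p (hypothetical technique, not from the paper)."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() > p]
    # Keep at least one word so the synthetic text is never empty.
    return " ".join(kept) if kept else rng.choice(words)


# The user's "preferred augmentation technique" is modelled as a key in this table.
TECHNIQUES = {"swap": random_swap, "delete": random_deletion}


def augment_csv(csv_text, text_column, technique, seed=0):
    """Return the same CSV with an extra 'augmented_text' column (assumed name)."""
    rng = random.Random(seed)
    augment = TECHNIQUES[technique]
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["augmented_text"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["augmented_text"] = augment(row[text_column], rng)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "text,label\nsaya suka makan nasi goreng,positif\n"
    print(augment_csv(sample, "text", "swap"))
```

In a web application, a function like `augment_csv` would typically be called from the route that receives the uploaded file, with the returned text offered back to the user as a download.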




How to Cite

Rahma, I. A., & Suadaa, L. H. (2023). Automated Indonesian Text Augmentation with Web-Based Application Using Flask Framework. Proceedings of The International Conference on Data Science and Official Statistics, 2023(1), 96–108.