Automated Indonesian Text Augmentation with Web-Based Application Using Flask Framework

Authors

  • Iftitah Athiyyah Rahma, Politeknik Statistika STIS
  • Lya Hulliyyatus Suadaa, Politeknik Statistika STIS

DOI:

https://doi.org/10.34123/icdsos.v2023i1.324

Keywords:

text augmentation, text classification, imbalanced data, web application

Abstract

In the real world, the data and resources available for text classification are limited. One common issue with labelled data is class imbalance. Imbalanced data degrades the performance and accuracy of a model because the model tends to focus on the majority label. As a result, the accuracy measure alone cannot describe the true quality of the model. To overcome this, an oversampling approach can be applied; oversampling of text data is known as text augmentation. However, NLP resources for Indonesian, especially for text augmentation, are still limited. Therefore, this research develops a web application that augments Indonesian text automatically. The application was built using the prototype method. It was built successfully and lets users perform augmentation automatically for all texts in a dataset. Users select their preferred augmentation technique and are required to upload a dataset as input. The output of the application is the same dataset file as the input, with an additional column containing the synthetic text generated by the application. This application can contribute to further research on text augmentation for Indonesian.
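The input/output contract described in the abstract (a dataset goes in, the same dataset comes out with one extra column of synthetic text) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_swap` and `augment_csv`, the column name `augmented_text`, and the CSV format are all assumptions made for illustration. Random word swapping is a placeholder technique, since the abstract does not say which augmentation techniques the application offers. The Flask upload and download handling is omitted.

```python
import csv
import io
import random


def random_swap(text, n_swaps=1, seed=None):
    """Placeholder augmentation: swap two randomly chosen words, n_swaps times."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        # Nothing to swap in an empty or single-word text.
        return text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_csv(csv_text, text_column, augment=random_swap,
                new_column="augmented_text"):
    """Return the same CSV content with one extra column of augmented text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"column {text_column!r} not found in dataset")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + [new_column],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Original columns are kept unchanged; only the new column is added.
        row[new_column] = augment(row[text_column])
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical two-row dataset with a text column and a label column.
    sample = "text,label\nsaya suka makan nasi goreng,positif\nburuk,negatif\n"
    print(augment_csv(sample, "text"))
```

Passing the augmentation function as a parameter mirrors the idea that the user picks a technique: any callable that maps a string to a string could be substituted for `random_swap`.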

Published

2023-12-29

How to Cite

Rahma, I. A., & Suadaa, L. H. (2023). Automated Indonesian Text Augmentation with Web-Based Application Using Flask Framework. Proceedings of The International Conference on Data Science and Official Statistics, 2023(1), 96–108. https://doi.org/10.34123/icdsos.v2023i1.324