Application of Resampling and Boosting Methods Using the C5.0 Algorithm
Case Study: Indonesia Family Survey Data
Hypertension is a non-communicable disease characterized by systolic blood pressure above 140 mmHg and/or diastolic blood pressure above 90 mmHg. It deserves more attention because it causes complications in target organs, yet it typically shows no significant symptoms in its early stages, which is why it is called a "silent disease". This study integrates resampling and boosting to predict hypertension status with the C5.0 algorithm. Applying resampling to C5.0 classification improved specificity and AUC: with random oversampling (ROS), specificity reached 95.67% and AUC 91.11%; with random over-under sampling (ROUS), specificity reached 88.84% and AUC 87.13%. In addition, applying boosting to the C5.0 algorithm trained on the resampled data improved accuracy, to 93.86% with ROS and 89.98% with ROUS. The predictor variables that contributed most were high cholesterol and heart problems, and they consistently ranked at the top across every combination of resampling and boosting.
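The resampling schemes and evaluation metrics named above can be illustrated with a short, dependency-free sketch. It is not the study's pipeline: C5.0 and its boosting are not reproduced, and the exact sampling ratios used in the study are not stated. ROS here duplicates random minority-class rows until the classes are equal in size. ROUS is assumed to oversample the minority class and undersample the majority class down to the mean class size. Coding the positive (hypertensive) class as `1` and the negative class as `0` is also an assumption. The function names are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):          # draw with replacement
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def random_over_under_sample(X, y, seed=0):
    """ROUS (assumed variant): bring every class to the mean class size by
    undersampling larger classes and oversampling smaller ones."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = len(y) // len(counts)           # assumed target size per class
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        if len(rows) >= target:
            picked = rng.sample(rows, target)    # undersample without replacement
        else:
            extra = [rng.choice(rows) for _ in range(target - len(rows))]
            picked = rows + extra                # keep originals, add duplicates
        X_out.extend(picked)
        y_out.extend([label] * target)
    return X_out, y_out

def specificity(y_true, y_pred, negative=0):
    """True-negative rate: TN / (TN + FP)."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == negative]
    return sum(p == negative for p in preds_on_negatives) / len(preds_on_negatives)

def auc(y_true, scores, positive=1):
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (not from the survey): 2 positive and 8 negative cases.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8
print(sorted(Counter(random_oversample(X, y)[1]).items()))         # [(0, 8), (1, 8)]
print(sorted(Counter(random_over_under_sample(X, y)[1]).items()))  # [(0, 5), (1, 5)]
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))                    # 0.75
```

In general, resampling should be applied only to the training split. If duplicated rows leak into the test set, the reported specificity, AUC, and accuracy will be optimistic.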