Evaluation of Resampling Techniques for Class Balancing in Bi-LSTM-Based Mental Health Sentiment Analysis

Authors

  • Cindy Sintiya, Mikroskil University
  • Grace Helena Hutagaol, Mikroskil University
  • David Bate'e, Mikroskil University
  • Syanti Irviantina, Mikroskil University

DOI:

https://doi.org/10.55601/jsm.v26i2.1799

Keywords:

Class Balancing, Resampling, Sentiment Analysis, Mental Health, LSTM

Abstract

Imbalanced data is a persistent challenge in sentiment analysis, as machine learning models tend to neglect minority classes that often carry the most critical information. This study evaluates the effectiveness of class balancing via two resampling techniques, Random Under-Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE), in improving the performance of a Long Short-Term Memory (LSTM) model for mental health sentiment analysis. The results show that SMOTE raises the F1-score of the "Suicidal" class from 64.4% to 72.6%, although the F1-score of the "Depression" class drops from 73.4% to 59.9%. RUS, by contrast, tends to degrade overall performance, reducing model accuracy from 80.5% to 77.8%, because randomly discarding samples impoverishes the data representation. The study concludes that while resampling techniques can balance a dataset, applying them to text data requires care to avoid harming model performance.
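To illustrate the two resampling strategies compared above, the following numpy-only sketch contrasts them on hypothetical toy data (the class sizes, feature vectors, and the simplified `smote_like` interpolation are assumptions for illustration, not the paper's pipeline): RUS shrinks every class to the size of the smallest class by discarding samples, while SMOTE-style oversampling grows each minority class to the majority size by interpolating between a sample and one of its nearest same-class neighbours.

```python
import numpy as np

def random_under_sample(X, y, rng):
    """RUS: randomly drop samples so every class matches the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

def smote_like(X, y, rng, k=5):
    """Simplified SMOTE: grow each minority class to the majority size by
    interpolating between a random sample and one of its k nearest
    same-class neighbours."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_out, y_out = [X], [y]
    for c in classes:
        Xc = X[y == c]
        for _ in range(n_max - len(Xc)):
            i = rng.integers(len(Xc))
            d = np.linalg.norm(Xc - Xc[i], axis=1)
            nn = np.argsort(d)[1:k + 1]        # k nearest neighbours, excluding self
            j = rng.choice(nn)
            lam = rng.random()                 # interpolation factor in [0, 1)
            X_out.append((Xc[i] + lam * (Xc[j] - Xc[i]))[None, :])
            y_out.append(np.array([c]))
    return np.concatenate(X_out), np.concatenate(y_out)

rng = np.random.default_rng(0)
# Hypothetical imbalanced set: 100 "Normal", 30 "Depression", 10 "Suicidal"
X = rng.normal(size=(140, 4))
y = np.array([0] * 100 + [1] * 30 + [2] * 10)

X_rus, y_rus = random_under_sample(X, y, rng)
X_sm, y_sm = smote_like(X, y, rng)
print(np.bincount(y_rus))  # [10 10 10] -- every class cut to the minority size
print(np.bincount(y_sm))   # [100 100 100] -- minority classes grown synthetically
```

The printed class counts make the trade-off reported in the abstract concrete: RUS balances the data by throwing information away, while SMOTE balances it by fabricating synthetic points whose quality depends on how well interpolation suits the feature space, which is one reason it can help one minority class yet hurt another.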



Published

31-10-2025