Bangla Suicidal Ideation Detection: Performance and Efficiency Benchmark of Simple and Complex Classifiers on Social Media Data

Keywords: Bangla, Suicide Detection, NLP, Low-Resource Language, Ridge Classifier, Social Media, Mental Health

Authors

  • Jahangir Hussen Department of Computer Science and Engineering, Sonargaon University (SU), Dhaka, Bangladesh
  • Mohammad Rashed Hasan Polas
    rashedhasanpalash@gmail.com
    Department of Business Administration, Sonargaon University (SU), Dhaka, Bangladesh https://orcid.org/0000-0002-6080-1075
  • MD Yusuf Mia Department of Computer Science and Engineering, Sonargaon University (SU), Dhaka, Bangladesh
  • Mst Kohily Department of Computer Science and Engineering, Sonargaon University (SU), Dhaka, Bangladesh
  • Shahariar Halim Department of Computer Science and Engineering, Sonargaon University (SU), Dhaka, Bangladesh
  • Imran Hossen Department of Computer Science and Engineering, Sonargaon University (SU), Dhaka, Bangladesh
Vol. 9 No. 1 (2025)
Original Research
March 1, 2025

Suicide is a persistent global public health crisis that calls for scalable early-detection systems outside traditional clinical settings. Despite significant computational advances for high-resource languages, the large Bengali (Bangla) speaking population remains severely underrepresented because of data scarcity, morphological complexity, and widespread code-mixing (Banglish), all of which substantially hinder standard Natural Language Processing (NLP) approaches. This work addresses that gap by developing and testing a clinically sound Bangla Suicide Risk Classification System on a socially labeled social media corpus. Empirical findings indicate that feature engineering combined with Character N-gram TF-IDF vectorization works best for low-resource languages (LRLs), remaining highly robust to linguistic noise and sparsity. An extensive benchmark of eleven Machine Learning (ML) and Deep Learning (DL) models shows that, although the Bidirectional Long Short-Term Memory (BiLSTM) model achieves the best predictive performance (Accuracy: 0.9228, F1: 0.9211), its high latency (≈5.23 seconds per post) makes it infeasible for real-time triage. In contrast, the lightweight Ridge Classifier (RC) achieves a comparable Accuracy of 0.9150 (F1: 0.9150) with low latency (≈0.32 seconds), which amounts to approximately 16× faster inference. The study concludes that the RC is the best deployable triage model because it balances predictive performance and computational efficiency. In addition, ethical deployment is supported by Explainable AI (XAI), which surfaces high-weighted n-grams (e.g., "মর", "যন্ত্রর", "শেষ"), and by Dynamic Threshold Tuning (Human-in-the-Loop) for adaptive sensitivity, together forming an efficient and sustainable suicide prevention system for the Bangla-speaking community.
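The two mechanisms named in the abstract, character n-gram TF-IDF and dynamic threshold tuning, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The n-gram range (2 to 4), the smoothed IDF formula (the same smoothed form scikit-learn uses by default), the function names, and the Banglish example strings are all assumptions chosen for illustration. The Ridge Classifier itself is omitted; the threshold helper only assumes that some linear model has already produced one decision score per post.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """All character n-grams of length n_min..n_max, spaces included."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """Sparse, L2-normalised TF-IDF vectors (dicts) over character n-grams."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF (assumed form): ln((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def threshold_for_recall(scores, labels, target_recall):
    """Highest score threshold at which recall on label 1 meets the target.

    A post is flagged when score >= threshold, so lowering the threshold
    raises sensitivity at the cost of more false alarms.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    if not positives:
        raise ValueError("no positive examples to calibrate on")
    k = max(1, math.ceil(target_recall * len(positives)))
    return positives[k - 1]


# Hypothetical Banglish posts: two spelling variants and an unrelated one.
posts = ["valo lagche na", "bhalo lagche na", "aj cricket khela dekhbo"]
vecs = tfidf_vectors(posts)
print("variant similarity:  ", cosine(vecs[0], vecs[1]))
print("unrelated similarity:", cosine(vecs[0], vecs[2]))
```

The two spelling variants share no first word at the token level ("valo" vs "bhalo") but overlap in most of their character n-grams, which is one plausible reason a character-level representation tolerates code-mixed and inconsistently spelled text. In a human-in-the-loop setting, a reviewer could call `threshold_for_recall` again on a held-out labeled set whenever the desired sensitivity changes, without retraining the classifier.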