medRxiv

Detecting Mental Disorders in Social Media Using a Transformer-Based Ensemble of Binary Classifiers

Oleksandr Ovcharuk, Olexander Mazurets, Maryna Molchanova, Alexander Kirpich, Pavel Skums, Olena Sobko, Olexander Barmak, Iurii Krak, Sergiy Yakovlev
doi: https://doi.org/10.64898/2025.12.16.25342390
Oleksandr Ovcharuk
1Department of Computer Science, Khmelnytskyi National University, Khmelnytskyi, Ukraine
Olexander Mazurets
1Department of Computer Science, Khmelnytskyi National University, Khmelnytskyi, Ukraine
Maryna Molchanova
1Department of Computer Science, Khmelnytskyi National University, Khmelnytskyi, Ukraine
Alexander Kirpich
2Department of Population Health Sciences, Georgia State University, Atlanta, GA, USA
Pavel Skums
5School of Computing, University of Connecticut, Storrs, CT, USA
Olena Sobko
1Department of Computer Science, Khmelnytskyi National University, Khmelnytskyi, Ukraine
Olexander Barmak
1Department of Computer Science, Khmelnytskyi National University, Khmelnytskyi, Ukraine
  • For correspondence: barmako{at}khmnu.edu.ua
Iurii Krak
3Department of Theoretical Cybernetics, Taras Shevchenko National University, Kyiv, Ukraine
4Laboratory of Communicative Information Technologies, V.M. Glushkov Institute of Cybernetics, Kyiv, Ukraine
Sergiy Yakovlev
6Institute of Computer Science and Artificial Intelligence, V.N. Karazin Kharkiv National University, Kharkiv, Ukraine
7Institute of Mathematics, Lodz University of Technology, Lodz, Poland

Abstract

This study introduces a novel transformer-based ensemble framework for the multi-label detection of mental health disorders from social media posts. Unlike traditional multi-class approaches that often struggle with comorbidity, the proposed method employs a binary relevance strategy using fine-tuned DistilBERT models to identify co-occurring conditions, including depression, anxiety, and narcissistic personality disorder. To address class imbalance and optimize decision boundaries, the framework integrates a composite loss function (focal, dice, and log loss) and utilizes Youden’s J statistic for threshold calibration. Validation on textual datasets demonstrates the efficacy of this approach, with an overall F1-score of 0.930 and AUC values exceeding 0.89. Comparative analysis suggests that decomposing complex diagnostic tasks into independent binary problems significantly reduces inter-class confusion relative to standard multi-class baselines. Furthermore, a qualitative error analysis highlights specific linguistic challenges, such as contextual polarity shifting, metaphorical ambiguity, and colloquial usage, that impact model specificity. The findings demonstrate the potential of the proposed framework as a robust screening tool for online mental health monitoring, while underscoring the necessity of human oversight to mitigate linguistic misinterpretations.
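The composite loss named above (focal, dice, and log loss) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the three terms are weighted or parameterized, so the equal weights, the focal parameters `gamma` and `alpha`, and the dice smoothing constant are all assumptions, as are the function names.

```python
import math
from typing import Sequence

EPS = 1e-7  # clipping constant to keep log() finite (assumption)


def log_loss(p: float, y: int) -> float:
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss: down-weights examples the model already classifies well.

    gamma and alpha are commonly used defaults, not values from the paper.
    """
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def dice_loss(probs: Sequence[float], labels: Sequence[int], smooth: float = 1.0) -> float:
    """Soft dice loss over a batch: 1 - (2*overlap + s) / (sum(p) + sum(y) + s)."""
    overlap = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * overlap + smooth) / (sum(probs) + sum(labels) + smooth)


def composite_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    w_focal: float = 1.0,  # hypothetical weights; the paper's values are not given here
    w_dice: float = 1.0,
    w_log: float = 1.0,
) -> float:
    """Weighted sum of mean focal loss, batch dice loss, and mean log loss."""
    n = len(probs)
    mean_focal = sum(focal_loss(p, y) for p, y in zip(probs, labels)) / n
    mean_log = sum(log_loss(p, y) for p, y in zip(probs, labels)) / n
    return w_focal * mean_focal + w_dice * dice_loss(probs, labels) + w_log * mean_log


if __name__ == "__main__":
    labels = [1, 0, 1]
    print(composite_loss([0.9, 0.1, 0.8], labels))  # confident, mostly correct
    print(composite_loss([0.1, 0.9, 0.2], labels))  # confident, mostly wrong
```

The intuition behind combining the terms is that log loss provides a well-behaved per-example gradient, focal loss shifts emphasis toward hard examples, and dice loss scores overlap at the batch level, which is less dominated by the majority class.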

Author summary

Mental health disorders such as depression, anxiety, and narcissistic personality disorder represent a major global health challenge. This work proposes a method that uses transformer-based deep learning models to analyze social media posts for mental health assessment. A significant hurdle in automated diagnosis is that these conditions often occur together (comorbidity), whereas many existing Artificial Intelligence (AI) systems are designed to detect only a single disorder at a time. To address this, the study uses a “multi-label” deep learning framework: rather than relying on a single multi-class classifier, the approach uses an ensemble of specialized binary models, each trained to detect indicators of a specific disorder.
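The ensemble-of-binary-models idea can be illustrated with a short sketch. Each disorder gets its own scorer and its own decision threshold, and a post receives every label whose scorer clears its threshold, so co-occurring conditions can be reported together. The keyword-based scorers below are hypothetical stand-ins so that the example runs without model weights; in the described framework each scorer would be a fine-tuned DistilBERT classifier. The names `keyword_scorer`, `predict_labels`, `SCORERS`, and `THRESHOLDS`, and the threshold value 0.5, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], float]  # maps a post to a probability-like score


def keyword_scorer(keywords: Tuple[str, ...]) -> Scorer:
    """Hypothetical placeholder for a fine-tuned binary classifier."""
    def score(text: str) -> float:
        lowered = text.lower()
        return 0.9 if any(k in lowered for k in keywords) else 0.1
    return score


# One independent binary scorer per disorder (binary relevance).
SCORERS: Dict[str, Scorer] = {
    "depression": keyword_scorer(("hopeless", "empty")),
    "anxiety": keyword_scorer(("worry", "panic")),
    "narcissistic_personality_disorder": keyword_scorer(("admire me",)),
}

# Per-label thresholds; 0.5 is an illustrative default, not a calibrated value.
THRESHOLDS: Dict[str, float] = {label: 0.5 for label in SCORERS}


def predict_labels(
    text: str,
    scorers: Dict[str, Scorer],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return every label whose scorer meets or exceeds its own threshold."""
    return [
        label
        for label, scorer in scorers.items()
        if scorer(text) >= thresholds[label]
    ]


if __name__ == "__main__":
    post = "I feel hopeless and I worry all the time"
    print(predict_labels(post, SCORERS, THRESHOLDS))
```

Because each label is decided independently, the labels do not compete for a single softmax output as they would in a multi-class classifier, which is the mechanism the summary credits for handling comorbidity.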

This design reduces classification confusion between clinically similar conditions, such as depression and anxiety, and separates them better than traditional methods. The method was evaluated on publicly available datasets and achieved an F1-score of 0.930, outperforming existing approaches. Crucially, the analysis went beyond standard statistical metrics and examined the specific mistakes the models made. While the presented AI model is highly sensitive, it can be confused by features of language such as metaphors (e.g., “feeling like a pressure cooker”), negations (e.g., “I am not worried”), and colloquial use of clinical terms. These results show that AI is a powerful tool for early screening and continuous monitoring on social media, but one that still requires careful calibration and human oversight to distinguish genuine symptoms from everyday emotional expression.
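The calibration step can be made concrete with Youden's J statistic, which the abstract names as the threshold-selection criterion. J is defined as sensitivity + specificity - 1, and the chosen threshold is the one that maximizes J on held-out scores. The sketch below is a plain-Python illustration under stated assumptions: the function name `youden_threshold`, the convention that a score at or above the threshold counts as positive, and the toy scores are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A score >= threshold is treated as a positive prediction. Candidate
    thresholds are the distinct observed scores; ties keep the lowest one.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute Youden's J")

    best_threshold, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Toy validation scores for one binary classifier (illustrative only).
    scores: List[float] = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
    labels: List[int] = [0, 0, 0, 1, 1, 1]
    print(youden_threshold(scores, labels))
```

In a binary-relevance ensemble this selection would be run once per disorder, so each classifier gets its own operating point rather than a shared 0.5 cutoff.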

The findings demonstrate that analyzing social media texts with advanced machine learning techniques can serve as a powerful complementary tool to clinical diagnostics. While not intended to completely replace professional evaluation, the proposed approach can help identify potential risks, promote earlier detection of mental health disorders, support preventive interventions, and ultimately improve access to care.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

All data are fully available without restriction: The datasets underlying the results of this study are available on Kaggle: the Text Classification dataset (https://www.kaggle.com/datasets/comsys/text-classification) and the Depression: Reddit Dataset (Cleaned) (https://www.kaggle.com/datasets/infamouscoder/depression-reddit-cleaned). The source code and pre-processing scripts are available on GitHub at https://github.com/oovcharuk/MentalHealthTextClassifier.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data presented in this paper are contained in the manuscript and in references provided in the manuscript.

https://www.kaggle.com/datasets/comsys/text-classification

https://www.kaggle.com/datasets/infamouscoder/depression-reddit-cleaned

https://github.com/oovcharuk/MentalHealthTextClassifier

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted December 18, 2025.
Subject Area

  • Health Informatics