Abstract
Sleep assessment is fundamental to understanding sleep architecture, identifying sleep disorders, and advancing personalized sleep medicine. However, current clinical sleep assessment methods rely on time-consuming and often costly procedures, limiting their accessibility and scalability. This study introduces SleepGPT, the first GPT-based language model for efficient sleep assessment encompassing both sleep staging and disorder identification. SleepGPT leverages the sequential structure of sleep hypnograms, recognizing strong correlations between successive sleep stages to extract relevant patterns and transitions. Following self-supervised pretraining on large-scale, manually annotated whole-night hypnograms, SleepGPT yielded consistent performance gains in sleep staging and disorder diagnosis across five publicly available datasets, with successful blinded replications on three independent datasets. Notably, experiments on established sleep staging benchmarks validate SleepGPT as a robust add-on module that reliably enhances the performance of existing methods. Furthermore, SleepGPT-powered models achieved comparable sleep staging accuracy using wearable EEG and polysomnography (PSG) in a dataset recorded simultaneously with both modalities. Moreover, a SleepGPT-powered transformer model substantially surpassed state-of-the-art performance in classifying abnormal sleep stage sequences and diagnosing Type-1 narcolepsy. These findings underscore the potential of SleepGPT-powered models as clinically translatable and scalable artificial intelligence (AI) tools for sleep assessment, opening new avenues for advancing precision medicine for sleep disorders.
Competing Interest Statement
W.W. reports equity from Alto Neuroscience. None of the other authors has financial disclosures to report.
Funding Statement
This work was supported in part by STI2030-Major Projects under Grant 2022ZD0211700, the National Natural Science Foundation of China under Grants 62376098, 62276102, and U22A20293, and the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515011983.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study on the private HANG7 dataset was conducted at Zhejiang University with Institutional Review Board approval, and written consent was obtained from all participants or their caregivers.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Experiments on wearable EEG data added; sleep stage-specific accuracies added; writing improved.
1 Hypnodensity-based XGBoost results were unavailable due to the absence of hypnodensity data for the CAP dataset.
Data Availability
The SHHS and MNC datasets are provided by the National Sleep Research Resource with appropriate deidentification. Permission and access for these datasets can be obtained via the online portal: https://www.sleepdata.org. The SleepEDF, Physio2018, and CAP datasets are available from PhysioNet at https://physionet.org/content/sleep-edfx/1.0.0/, https://physionet.org/content/challenge-2018/1.0.0/, and https://physionet.org/content/capslpdb/1.0.0/, respectively. The MASS dataset is available at http://ceams-carsm.ca/mass/. The BOAS dataset can be accessed on OpenNeuro at https://openneuro.org/datasets/ds005555/versions/1.0.0. The ISRUC dataset can be accessed at https://sleeptight.isr.uc.pt/. Access to the HANG7 dataset is governed by data-use agreements, and it is therefore not publicly available.