Abstract
Timely prognosis of type 2 diabetes (T2D) complications is critical for effective intervention and for reducing the economic burden. AI-driven large language models (LLMs) offer potential for extracting clinical insights but are challenged by the sparse, high-dimensional nature of longitudinal medical records. This study demonstrates the utility of LLMs in medical time series prediction by preprocessing the data with a missing mask, adding an embedding layer to a pretrained LLM, and fine-tuning both components. On the DPV registry dataset of 449,185 T2D patients, the fine-tuned model outperformed baselines in predicting both HbA1c and LDL levels, achieving Pearson’s correlations of 0.749 and 0.754, with improvements of 0.253 and 0.259, respectively. The model also demonstrated robust long-term prediction of HbA1c over 554.3 days (95% CI: [547.0, 561.5]), with a 9% improvement in MSE over last-observation-based methods. Integrated gradient analysis identified significant clinical features and visits, revealing potential biomarkers for early intervention. Overall, the results show that the predictive power of LLMs can be leveraged for T2D prognosis from sparse medical time series, assisting clinical prognosis and biomarker discovery and ultimately advancing precision medicine.
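As a rough illustration of the missing-mask preprocessing mentioned in the abstract, the following Python sketch pairs each clinical feature with a binary indicator of whether it was observed at a given visit. The function names, the zero fill value, the concatenated layout, and the example values are all assumptions made for illustration; the abstract does not specify how the mask is actually constructed or passed to the embedding layer.

```python
import numpy as np

def add_missing_mask(visits, fill_value=0.0):
    """Split a [n_visits, n_features] array with NaN gaps into (filled, mask).

    Hypothetical helper: mask is 1.0 where a value was observed, 0.0 where missing.
    """
    visits = np.asarray(visits, dtype=float)
    mask = (~np.isnan(visits)).astype(float)
    # Replace missing entries with a placeholder so the array is fully numeric.
    filled = np.where(np.isnan(visits), fill_value, visits)
    return filled, mask

def to_model_input(visits, fill_value=0.0):
    """Concatenate filled values and mask along the feature axis (assumed layout)."""
    filled, mask = add_missing_mask(visits, fill_value)
    return np.concatenate([filled, mask], axis=1)

# Made-up example: 3 visits, 2 features (HbA1c in %, LDL in mg/dL).
example = np.array([
    [7.2, np.nan],
    [np.nan, 110.0],
    [6.9, 105.0],
])
features = to_model_input(example)
```

With this layout, a downstream embedding layer can in principle distinguish a value that was measured as zero from one that was never recorded, which is the usual motivation for carrying the mask alongside the data.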
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The institutional ethics committee of Ulm University, Germany, approved the analysis of anonymized DPV data on August 25, 2021 (issue 314/21).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability
The data are not publicly accessible because of patient confidentiality. Researchers who wish to reproduce the results may contact the data access committee of the DPV Initiative at Ulm University. Detailed information is available at https://buster.zibmt.uni-ulm.de/.