Enhancing Early Detection of Cognitive Decline in the Elderly: A Comparative Study Utilizing Large Language Models in Clinical Notes

Background: Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records is less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes, comparing their error profiles with those of traditional models. The insights gained will inform strategies for performance enhancement.
Methods: This study, conducted at Mass General Brigham in Boston, MA, analyzed clinical notes from the four years prior to a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. We used a random annotated sample of 4,949 note sections, filtered with keywords related to cognitive functions, for model development. For testing, a random annotated sample of 1,996 note sections without keyword filtering was utilized. We developed prompts for two LLMs, Llama 2 and GPT-4, on HIPAA-compliant cloud-computing platforms using multiple approaches (e.g., both hard and soft prompting and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. Subsequently, we constructed an ensemble of the three models using a majority vote approach.
Results: GPT-4 demonstrated superior accuracy and efficiency compared to Llama 2, but did not outperform traditional models. The ensemble model outperformed the individual models, achieving a precision of 90.3%, a recall of 94.2%, and an F1-score of 92.2%. Notably, the ensemble model showed a significant improvement in precision, increasing from a range of 70%-79% to above 90%, compared to the best-performing single model. Error analysis revealed that 63 samples were incorrectly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them.
Conclusions: LLMs and traditional machine learning models trained using local EHR data exhibited diverse error profiles. The ensemble of these models was found to be complementary, enhancing diagnostic performance. Future research should investigate integrating LLMs with smaller, localized models and incorporating medical data and domain knowledge to enhance performance on specific tasks.
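As an illustration of the majority-vote ensemble and the reported metrics, the following is a minimal sketch rather than the study's implementation. The function names and the toy labels are hypothetical; the only figures taken from the abstract are the precision (90.3%) and recall (94.2%), whose harmonic mean reproduces the reported F1-score of about 92.2%.

```python
def majority_vote(*model_predictions):
    """Combine binary labels (0/1) from an odd number of models, sample by sample."""
    return [int(sum(votes) * 2 > len(votes)) for votes in zip(*model_predictions)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy labels (hypothetical, not study data): each model errs on different samples.
y_true = [1, 0, 1, 1, 0, 0]
llm = [1, 1, 1, 0, 0, 0]
dnn = [1, 0, 0, 1, 0, 1]
xgb = [0, 0, 1, 1, 1, 0]

ensemble = majority_vote(llm, dnn, xgb)  # recovers y_true in this toy case
print(precision_recall_f1(y_true, ensemble))
print(round(f1_score(0.903, 0.942), 3))  # consistency check on the reported metrics
```

In the toy data each model makes two errors, but never on the same sample as another model, so the vote corrects all of them. This mirrors, in a simplified way, why diverse error profiles make an ensemble complementary.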


Effective and parseable response
The LLM's response answers both questions (whether cognitive decline was identified and which keywords were used for the decision) using a standard JSON format.
Output (JSON format): { "label": "No", "keywords": ["HTN", "prediabetes", "anemia", "Menieres disease", "hearing loss"] } Explanation: After analyzing the clinical text, I have determined that there is no evidence of cognitive decline in this case. The patient's functional status and daily activities such as driving, shopping, cooking, and managing his own medications suggest that he is still able to perform these tasks independently, which indicates a lack of cognitive decline. Additionally, the patient's past medical history does not mention any diagnosis or symptoms related to cognitive decline, such as dementia, Alzheimer's disease, or memory loss. The only concern mentioned by the patient is his worry about "Alzheimer disease", but this is not a clear indication of cognitive decline. Therefore, based on the provided clinical text, I have labeled the case as "No" for evidence of cognitive decline.

Response that is effective but not parseable
The LLM's response answers both questions, but it doesn't adhere to the standard JSON format.
Based on the clinical text, it appears that the patient is experiencing mild cognitive decline due to multiple etiologies, including migraines and OSA, which are important keywords that appeared in the clinical text. Her profile suggests a mixed bag of strengths and weaknesses in various cognitive domains, with some areas of relative strength (verbal abilities) and others of relative weakness (perceptual reasoning, working memory, executive functions). The text also notes that her symptoms are consistent with a vascular etiology, possibly superimposed on cognitive difficulties arising from untreated OSA.

Response that is not effective
The LLM's response fails to answer either of the two questions.
Please provide your output in JSON format as requested above. Also, please include an explanation for your analysis below the output. Thank you!

*Although the example for the "response that is not effective" may appear to be a prompt, all content in the "Example" column consists of original responses from the large language model (LLM).
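To make the "parseable" category concrete, the sketch below shows one way a response could be checked automatically. It is an assumption-laden illustration, not the study's parsing code: `parse_llm_response` is a hypothetical helper that scans the response for the first JSON object containing a `label` field and a `keywords` list, tolerating surrounding text such as "Output (JSON format):" or a trailing explanation.

```python
import json


def parse_llm_response(text):
    """Return (label, keywords) from the first suitable JSON object in text, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "label" in obj and isinstance(obj.get("keywords"), list):
            return obj["label"], obj["keywords"]
        start = text.find("{", start + 1)
    return None


parseable = (
    'Output (JSON format): { "label": "No", "keywords": ["HTN", "prediabetes"] } '
    "Explanation: After analyzing the clinical text, ..."
)
not_parseable = "Based on the clinical text, it appears that the patient is experiencing mild cognitive decline."

print(parse_llm_response(parseable))
print(parse_llm_response(not_parseable))
```

A check of this kind can only separate parseable from non-parseable responses. Deciding whether a non-parseable response is still effective (it answers both questions in free text) or not effective requires reading the response itself.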
Table S3. Detailed Information on Prompt Templates.*
Prompt Template

Content Description
Template 1 Using the clinical text provided below, please analyze, and determine whether there is evidence of cognitive decline in the patient.
For your analysis, provide the output in a structured JSON format, which should include the following fields: -'label': Indicate 'Yes' if evidence of cognitive decline is found, or 'No' if not.
-'keywords': Provide all distinct words in the clinical text that help you make your judgment.
Please also find the definition of cognitive decline that may help you to make your analysis: Determining the presence of cognitive decline aimed to identify patients at any stage of cognitive decline, ranging from SCD to MCI to dementia. Therefore, cognitive decline can be captured by the mention of a cognitive concern, symptoms (e.g., memory loss), diagnosis (e.g., MCI, AD dementia), cognitive assessments (e.g., Mini-Cog) (including patients with normal performance but with a note indicating a cognitive concern), or cognitive-related therapy or treatments (e.g., cognitive-linguistic therapy). We focused on progressive cognitive decline that is likely to be consistent with or lead to MCI. Cases that were less likely progressive (e.g., cognitive function has improved), transient (e.g., temporarily forgetful, or occasional memory loss due to medication intake [e.g., codeine]), or reversible (e.g., cognitive function affected soon after some event [e.g., surgery, injury, or stroke]) were considered negative for cognitive decline. We also labeled sections of notes as negative when the record showed broader or uncertain indication of cognitive decline.
In addition, please also add an explanation for your analysis below the output.
Here
Please also find the definition of cognitive decline that may help you to make your analysis: Determining the presence of cognitive decline aimed to identify patients at any stage of cognitive decline, ranging from SCD to MCI to dementia. Therefore, cognitive decline can be captured by the mention of a cognitive concern, symptoms (e.g., memory loss), diagnosis (e.g., MCI, AD dementia), cognitive assessments (e.g., Mini-Cog) (including patients with normal performance but with a note indicating a cognitive concern), or cognitive-related therapy or treatments (e.g., cognitive-linguistic therapy). We focused on progressive cognitive decline that is likely to be consistent with or lead to MCI. Cases that were less likely progressive (e.g., cognitive function has improved), transient (e.g., temporarily forgetful, or occasional memory loss due to medication intake [e.g., codeine]), or reversible (e.g., cognitive function affected soon after some event [e.g., surgery, injury, or stroke]) were considered negative for cognitive decline. We also labeled sections of notes as negative when the record showed broader or uncertain indication of cognitive decline.
In addition, please also add an explanation for your analysis below the output.
Here
Template 4 Please determine whether the provided section of note contains indication of patient's cognitive decline ranging from subjective cognitive concerns to objective cognitive decline. Please output using JSON format 1) the label with 0 or 1 where 0 indicates no evidence of cognitive decline and 1 indicates evidence of cognitive decline, and 2) the keywords for your judgement.
Please also find the definition of cognitive decline that may help you to make your analysis: Determining the presence of cognitive decline aimed to identify patients at any stage of cognitive decline, ranging from SCD to MCI to dementia. Therefore, cognitive decline can be captured by the mention of a cognitive concern, symptoms (e.g., memory loss), diagnosis (e.g., MCI, AD dementia), cognitive assessments (e.g., Mini-Cog) (including patients with normal performance but with a note indicating a cognitive concern), or cognitive-related therapy or treatments (e.g., cognitive-linguistic therapy). We focused on progressive cognitive decline that is likely to be consistent with or lead to MCI. Cases that were less likely progressive (e.g., cognitive function has improved), transient (e.g., temporarily forgetful, or occasional memory loss due to medication intake [e.g., codeine]), or reversible (e.g., cognitive function affected soon after some event [e.g., surgery, injury, or stroke]) were considered negative for cognitive decline. We also labeled sections of notes as negative when the record showed broader or uncertain indication of cognitive decline.
In addition, please also add an explanation for your analysis below the output.
Here
Please also find the definition of cognitive decline that may help you to make your analysis: Determining the presence of cognitive decline aimed to identify patients at any stage of cognitive decline, ranging from SCD to MCI to dementia. Therefore, cognitive decline can be captured by the mention of a cognitive concern, symptoms (e.g., memory loss), diagnosis (e.g., MCI, AD dementia), cognitive assessments (e.g., Mini-Cog) (including patients with normal performance but with a note indicating a cognitive concern), or cognitive-related therapy or treatments (e.g., cognitive-linguistic therapy). We focused on progressive cognitive decline that is likely to be consistent with or lead to MCI. Cases that were less likely progressive (e.g., cognitive function has improved), transient (e.g., temporarily forgetful or occasional memory loss due to medication intake [e.g., codeine]), or reversible (e.g., cognitive function affected soon after some event [e.g., surgery, injury, or stroke]) were considered negative for cognitive decline. We also labeled sections of notes as negative when the record showed broader or uncertain indication of cognitive decline.
This template is designed for five-shot prompting. As illustrated in Supplementary Figure S1, it includes the task description section, the prompt augmentation section, and the additional task guidance section.
In addition, please also add an explanation for your analysis below the output.
Please find the definition of cognitive decline that may help you to make your analysis: Determining the presence of cognitive decline aimed to identify patients at any stage of cognitive decline, ranging from SCD to MCI to dementia. Therefore, cognitive decline can be captured by the mention of a cognitive concern, symptoms (e.g., memory loss), diagnosis (e.g., MCI, AD dementia), cognitive assessments (e.g., Mini-Cog) (including patients with normal performance but with a note indicating a cognitive concern), or cognitive-related therapy or treatments (e.g., cognitive-linguistic therapy). We focused on progressive cognitive decline that is likely to be consistent with or lead to MCI. Cases that were less likely progressive (e.g., cognitive function has improved), transient (e.g., temporarily forgetful, or occasional memory loss due to medication intake [e.g., codeine]), or reversible (e.g., cognitive function affected soon after some event [e.g., surgery, injury, or stroke]) were considered negative for cognitive decline. We also labeled sections of notes as negative when the record showed broader or uncertain indication of cognitive decline.
Please also keep in mind: 1) short-term symptom like "confusion", "delirium", "change of mental status", "altered mental status" are not considered cognitive decline since they may disappear soon; 2) family history of cognitive decline also does not indicate the patient has cognitive decline; 3) swallowing issue is a strong indicator of cognitive decline, especially when combining with other relevant symptoms; 4) depression and anxiety are not considered a sign of cognitive decline; 5) cognitive decline is long-term, and any symptoms that could disappear in a short-term would not be a sign of cognitive decline; 6) cognitive decline related symptoms that are caused by other factors like head injury or bad health condition are not considered a sign of cognitive decline.
In addition, please also add an explanation for your analysis below the output.
Here is the clinical text for analysis: ******[note]******

*The "[note]" represents the actual clinical note section filled in the prompt template; "[Example Note]", "[Label]", and "[Keywords]" represent an example note along with their ground truth label and keywords, which were used for prompt augmentation.

The patient is taking memantine (NAMENDA), which is a medication used to treat moderate to severe Alzheimer's disease. This suggests that the patient may be experiencing cognitive decline.

The structure includes a required section: task description, and optional sections: prompt augmentation, error analysis-based instructions, and additional task guidance. The task description and additional task guidance can be written without using data samples. However, the prompt augmentation and error analysis-based instructions require data samples; prompt augmentation necessitates examples for inclusion in the prompt, and error analysis-based instructions are derived from a summary of incorrectly predicted training samples. Notably, Template 1 in Supplementary Table S3 includes the task description section and additional task guidance; Templates 2 and 5 include only the task description section; Templates 3 and 4 include both the task description and additional task guidance; Template 6 includes the task description, prompt augmentation, and additional task guidance; Template 7 includes the task description, error analysis-based instructions, and additional task guidance.

We observed that the error profiles of GPT-4 models with different prompting strategies were less diverse compared to the error profiles among GPT-4, the attention-based DNN, and XGBoost. There were a total of 76 errors. Dynamic five-shot prompting accounted for 69 errors. Notably, 23 (30.3%) of these errors were common across all prompting strategies.
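The prompt structure described above (a required task description plus optional prompt augmentation, error analysis-based instructions, and additional task guidance) can be sketched as a small assembly function. This is a hypothetical illustration: the function name `build_prompt`, its parameters, the ordering of the sections, and the formatting of the few-shot examples are assumptions, and only the closing "Here is the clinical text for analysis" line is taken from the templates.

```python
def build_prompt(task_description, note, guidance=None, examples=None, error_instructions=None):
    """Assemble a prompt from one required section and up to three optional sections."""
    sections = [task_description]  # required: task description
    if guidance:
        sections.append(guidance)  # optional: additional task guidance (e.g., a definition)
    if error_instructions:
        sections.append(error_instructions)  # optional: error analysis-based instructions
    for example_note, label, keywords in examples or []:
        # optional: prompt augmentation with annotated example notes
        sections.append(
            f"Example note: {example_note}\nLabel: {label}\nKeywords: {', '.join(keywords)}"
        )
    sections.append(f"Here is the clinical text for analysis: ******{note}******")
    return "\n\n".join(sections)


prompt = build_prompt(
    "Determine whether there is evidence of cognitive decline in the patient.",
    "[note]",
    guidance="Cognitive decline ranges from SCD to MCI to dementia.",
    examples=[("[Example Note]", "Yes", ["memory loss"])],
)
print(prompt)
```

Keeping the optional sections as separate arguments makes it straightforward to reproduce the different template variants: passing only the task description corresponds to the simplest templates, while adding examples or error analysis-based instructions corresponds to the augmented ones.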

Figure S2. Error Profiles for Different Prompting Strategies.
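The overlap figures quoted for the error profiles (for example, 23 of 76 errors, or 30.3%, shared by all prompting strategies) amount to a comparison of sets of misclassified sample IDs. The sketch below illustrates that computation under stated assumptions: `error_overlap` is a hypothetical helper, and the sample IDs and strategy names are toy values, not study data.

```python
def error_overlap(errors_by_strategy):
    """Return (all_errors, common_errors, share), where share = |common| / |all|."""
    error_sets = [set(ids) for ids in errors_by_strategy.values()]
    all_errors = set.union(*error_sets)  # wrong under at least one strategy
    common_errors = set.intersection(*error_sets)  # wrong under every strategy
    share = len(common_errors) / len(all_errors) if all_errors else 0.0
    return all_errors, common_errors, share


# Toy misclassified sample IDs per strategy (hypothetical).
toy_errors = {
    "zero_shot": [1, 2, 3],
    "five_shot": [2, 3, 4],
    "error_instructions": [3, 5],
}
all_errors, common_errors, share = error_overlap(toy_errors)
print(len(all_errors), len(common_errors), f"{share:.1%}")

# Arithmetic check of the percentage reported for the prompting strategies.
print(round(100 * 23 / 76, 1))
```

A low share of common errors indicates diverse error profiles, which is the property that makes combining models worthwhile; a high share, as among the GPT-4 prompting strategies, suggests less to gain from combining them.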

Information of Cloud Instances.

Table S2. Examples of Different Response Categories.*

Please determine whether the provided section of note contains indication of patient's cognitive decline, and answer with JSON format including two fields: 'label': Indicate 'Yes' if evidence of cognitive decline is found, or 'No' if not, and 'keywords': Provide all distinct words in the clinical text that helped you make your judgment.
Now, please analyze this clinical note: ******[Note]******
Template 7 Using the clinical text provided below, please analyze and determine if there is evidence of cognitive decline in the patient.
For your analysis, provide the output in a structured JSON format, which should include the following fields: -'label': Indicate 'Yes' if evidence of cognitive decline is found, or 'No' if not.
-'keywords': Provide all distinct words in the clinical text that help you make your judgment.

Table S4. Preliminary Results of Manual Template Engineering on Ten Training Cases.

Table S5. Accuracy of Most Effective Responses from GPT-4 and Llama 2 on Dataset I-S.

Table S6. Testing the Impact of Prompt Augmentation on Performance in Dataset I-S.

Table S7. Testing the Impact of Adding Error Analysis-Based Instructions on Performance in Dataset I-S.

Table S8. Optimized Parameters for Traditional AI Models.

Table S9. Examples of GPT-4's Explanations for Identifying Cognitive Decline from a Long List of Medications.