Abstract
Background Lung cancer is the leading cause of cancer-related death in the United States (US), and most patients are diagnosed at a late stage (stage III or IV). Although most patients are diagnosed after presenting with symptoms, no US studies have used electronic health record (EHR) data to compare the symptoms and physical examination signs recorded at or before diagnosis between patients with and without lung cancer.
Objective To identify the symptoms and signs recorded in EHR data before a lung cancer diagnosis.
Study Design Case-control study.
Methods We studied 698 adults with primary lung cancer diagnosed between January 1, 2012 and December 31, 2019, and 6,841 controls matched on age, sex, smoking status, and type of clinic. Coded and free-text EHR data were extracted for the 2 years before the diagnosis date (cases) or index date (controls). Univariate and multivariate conditional logistic regression were used to identify symptoms and signs associated with lung cancer. Analyses were repeated after excluding symptom data recorded in the 1, 3, 6, and 12 months before the diagnosis or index date.
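The abstract does not include the authors' analysis code. As a minimal sketch, assuming exactly one case per matched set and a single binary symptom indicator, the univariate conditional logistic regression and the lag-exclusion step could look like the following. The function names, the 730-day approximation of the 2-year window, and expressing lags in days (e.g. about 183 days for 6 months) are hypothetical choices for illustration; the study's multivariate models would add further covariates, which this sketch does not.

```python
import math
from datetime import date, timedelta

def exposed(record_dates, index_date, lag_days=0, window_days=730):
    """Hypothetical helper: 1 if any symptom record lies in the look-back window, else 0.

    Records in the lag_days immediately before index_date are ignored, mimicking
    a re-run with recent symptom data excluded. window_days=730 approximates the
    2-year extraction window (assumption).
    """
    start = index_date - timedelta(days=window_days)
    cutoff = index_date - timedelta(days=lag_days)
    return int(any(start <= d <= cutoff for d in record_dates))

def _score_and_information(strata, beta):
    """Score and observed information of the conditional log-likelihood.

    Each stratum is (case_x, [control_x, ...]) with exactly one case, so its
    contribution is beta*case_x - log(sum_j exp(beta*x_j)).
    """
    score = 0.0
    info = 0.0
    for case_x, control_xs in strata:
        xs = [case_x, *control_xs]
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, xs)) / total
        mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
        score += case_x - mean
        info += mean_sq - mean * mean
    return score, info

def fit_conditional_logit(strata, tol=1e-10, max_iter=100):
    """Newton-Raphson fit for one covariate; returns (log OR, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(strata, beta)
        if info == 0.0:
            # Only concordant matched sets: the likelihood carries no information.
            raise ValueError("no discordant matched sets; OR not estimable")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the observed information at the final estimate.
    _, info = _score_and_information(strata, beta)
    return beta, 1.0 / math.sqrt(info)

if __name__ == "__main__":
    # Hypothetical 1:1 matched pairs: 6 where only the case had the symptom,
    # 2 where only the control had it, and 10 concordant pairs.
    strata = [(1, [0])] * 6 + [(0, [1])] * 2 + [(1, [1])] * 5 + [(0, [0])] * 5
    beta, se = fit_conditional_logit(strata)
    low, high = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
    print(f"OR {math.exp(beta):.2f}, 95% CI {low:.2f}-{high:.2f}")
```

For 1:1 pairs with a binary exposure, the conditional maximum-likelihood odds ratio equals the ratio of discordant pairs, so this example should report an OR of 3.00 (6/2), which is a convenient check on the fitting loop. Concordant matched sets contribute nothing to the score or information, which is why matching on age, sex, smoking status, and clinic type removes those factors from the estimate. In practice an established implementation (for example statsmodels' `ConditionalLogit` or R's `survival::clogit`) would be preferable to hand-rolled code.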
Results Eleven symptoms and signs recorded during the study period were associated with significantly higher odds of being a lung cancer case in multivariate analyses. Of these, seven remained significantly associated with lung cancer when data from the six months before diagnosis were excluded: hemoptysis (OR 3.2, 95% CI 1.9-5.3), cough (OR 3.1, 95% CI 2.4-4.0), chest crackles or wheeze (OR 3.1, 95% CI 2.3-4.1), bone pain (OR 2.7, 95% CI 2.1-3.6), back pain (OR 2.5, 95% CI 1.9-3.2), weight loss (OR 2.1, 95% CI 1.5-2.8), and fatigue (OR 1.6, 95% CI 1.3-2.1).
Conclusions Patients diagnosed with lung cancer appear to have symptoms and signs recorded in the EHR that distinguish them from similar matched patients in ambulatory care, often six months or more before their diagnosis. These findings suggest opportunities to improve the diagnostic process for lung cancer in the US.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was funded by the Gordon and Betty Moore Foundation (GBMF8837) and the CanTest Collaborative, funded by Cancer Research UK (RG85791).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Human Subjects Division/IRB of the University of Washington gave ethical approval for this work (STUDY 000013191).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Abbreviation List
- CACT: COVID-19 Annotated Clinical Text Corpus
- CPT: Current Procedural Terminology
- CRD: Chronic respiratory disease
- EDW: Enterprise-wide data warehouse
- EHR: Electronic health records
- ICD: International Classification of Diseases
- LACT: Lung Cancer Annotated Clinical Text Corpus
- LDCT: Low-dose computed tomography
- NLP: Natural language processing
- SEER: Seattle/Puget Sound Surveillance, Epidemiology, and End Results
- UWM: University of Washington Medicine