PT - JOURNAL ARTICLE
AU - Emily R. Pfaff
AU - Andrew T Girvin
AU - Tellen D. Bennett
AU - Abhishek Bhatia
AU - Ian M. Brooks
AU - Rachel R Deer
AU - Jonathan P Dekermanjian
AU - Sarah Elizabeth Jolley
AU - Michael G. Kahn
AU - Kristin Kostka
AU - Julie A McMurry
AU - Richard Moffitt
AU - Anita Walden
AU - Christopher G Chute
AU - Melissa A Haendel
TI - Who has long-COVID? A big data approach
AID - 10.1101/2021.10.18.21265168
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.10.18.21265168
4099 - http://medrxiv.org/content/early/2021/10/22/2021.10.18.21265168.short
4100 - http://medrxiv.org/content/early/2021/10/22/2021.10.18.21265168.full
AB - Background: Post-acute sequelae of SARS-CoV-2 infection (PASC), otherwise known as long-COVID, have severely impacted recovery from the pandemic for patients and society alike. This new disease is characterized by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous long-COVID definition. Electronic health record (EHR) studies are a critical element of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, which is addressing the urgent need to understand PASC, accurately identify who has PASC, and identify treatments.
Methods: Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients. We examined demographics, healthcare utilization, diagnoses, and medications for 97,995 adult COVID-19 patients. We used these features and 597 long-COVID clinic patients to train three ML models to identify potential long-COVID patients among (1) all COVID-19 patients, (2) patients hospitalized with COVID-19, and (3) patients who had COVID-19 but were not hospitalized.
Findings: Our models identified potential long-COVID patients with high accuracy, achieving areas under the receiver operator characteristic curve of 0.91 (all patients), 0.90 (hospitalized), and 0.85 (non-hospitalized).
Important features include rate of healthcare utilization, patient age, dyspnea, and other diagnosis and medication information available within the EHR. Applying the “all patients” model to the larger N3C cohort identified 100,263 potential long-COVID patients.
Interpretation: Patients flagged by our models can be interpreted as “patients likely to be referred to or seek care at a long-COVID specialty clinic,” an essential proxy for long-COVID diagnosis in the current absence of a definition. We also achieve the urgent goal of identifying potential long-COVID patients for clinical trials. As more data sources are identified, the models can be retrained and tuned based on study needs.
Funding: This study was funded by NCATS and NIH through the RECOVER Initiative.
Competing Interest Statement: AT Girvin is an employee of Palantir Technologies. ER Pfaff, JP Dekermanjian, SE Jolley, RR Deer, CG Chute, TD Bennett, JA McMurry, RA Moffitt, A Walden, MA Haendel report funding from NIH. ER Pfaff and MG Kahn report funding from PCORI. MA Haendel and JA McMurry are co-founders of Pryzm Health.
Funding Statement
Funding Sources: The analyses described in this publication were conducted with data or tools accessed through the NCATS N3C Data Enclave (covid.cd2h.org/enclave) and supported by NCATS U24 TR002306. This research was also funded in part by the National Institutes of Health (NIH) Agreement OT2HL161847-01. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH.
Data Partners with Released Data: Stony Brook University - U24TR002306; University of Oklahoma Health Sciences Center - U54GM104938: Oklahoma Clinical and Translational Science Institute (OCTSI); West Virginia University - U54GM104942: West Virginia Clinical and Translational Science Institute (WVCTSI); University of Mississippi Medical Center - U54GM115428: Mississippi Center for Clinical and Translational Research (CCTR); University of Nebraska Medical Center - U54GM115458: Great Plains IDeA-Clinical & Translational Research; Maine Medical Center - U54GM115516: Northern New England Clinical & Translational Research (NNE-CTR) Network; Wake Forest University Health Sciences - UL1TR001420: Wake Forest Clinical and Translational Science Institute; Northwestern University at Chicago - UL1TR001422: Northwestern University Clinical and Translational Science Institute (NUCATS); University of Cincinnati - UL1TR001425: Center for Clinical and Translational Science and Training; The University of Texas Medical Branch at Galveston - UL1TR001439: The Institute for Translational Sciences; Medical University of South Carolina - UL1TR001450: South Carolina Clinical & Translational Research Institute (SCTR); University of Massachusetts Medical School Worcester - UL1TR001453: The UMass Center for Clinical and Translational Science (UMCCTS); University of Southern California - UL1TR001855: The Southern California Clinical and Translational Science Institute (SC CTSI); Columbia University Irving Medical Center - UL1TR001873: Irving Institute for Clinical and Translational Research; George Washington Children's Research Institute - UL1TR001876: Clinical and Translational Science Institute at Children's National (CTSA-CN); University of Kentucky - UL1TR001998: UK Center for Clinical and Translational Science; University of Rochester - UL1TR002001: UR Clinical & Translational Science Institute; University of Illinois at Chicago - UL1TR002003: UIC Center for Clinical and Translational 
Science; Penn State Health Milton S. Hershey Medical Center - UL1TR002014: Penn State Clinical and Translational Science Institute; The University of Michigan at Ann Arbor - UL1TR002240: Michigan Institute for Clinical and Health Research; Vanderbilt University Medical Center - UL1TR002243: Vanderbilt Institute for Clinical and Translational Research; University of Washington - UL1TR002319: Institute of Translational Health Sciences; Washington University in St. Louis - UL1TR002345: Institute of Clinical and Translational Sciences; Oregon Health & Science University - UL1TR002369: Oregon Clinical and Translational Research Institute; University of Wisconsin-Madison - UL1TR002373: UW Institute for Clinical and Translational Research; Rush University Medical Center - UL1TR002389: The Institute for Translational Medicine (ITM); The University of Chicago - UL1TR002389: The Institute for Translational Medicine (ITM); University of North Carolina at Chapel Hill - UL1TR002489: North Carolina Translational and Clinical Science Institute; University of Minnesota - UL1TR002494: Clinical and Translational Science Institute; Children's Hospital Colorado - UL1TR002535: Colorado Clinical and Translational Sciences Institute; The University of Iowa - UL1TR002537: Institute for Clinical and Translational Science; The University of Utah - UL1TR002538: Uhealth Center for Clinical and Translational Science; Tufts Medical Center - UL1TR002544: Tufts Clinical and Translational Science Institute; Duke University - UL1TR002553: Duke Clinical and Translational Science Institute; Virginia Commonwealth University - UL1TR002649: C. Kenneth and Dianne Wright Center for Clinical and Translational Research; The Ohio State University - UL1TR002733: Center for Clinical and Translational Science; The University of Miami Leonard M. 
Miller School of Medicine - UL1TR002736: University of Miami Clinical and Translational Science Institute; University of Virginia - UL1TR003015: iTHRIV Integrated Translational health Research Institute of Virginia; Carilion Clinic - UL1TR003015: iTHRIV Integrated Translational health Research Institute of Virginia; University of Alabama at Birmingham - UL1TR003096: Center for Clinical and Translational Science; Johns Hopkins University - UL1TR003098: Johns Hopkins Institute for Clinical and Translational Research; University of Arkansas for Medical Sciences - UL1TR003107: UAMS Translational Research Institute; Nemours - U54GM104941: Delaware CTR ACCEL Program; University Medical Center New Orleans - U54GM104940: Louisiana Clinical and Translational Science (LA CaTS) Center; University of Colorado Denver, Anschutz Medical Campus - UL1TR002535: Colorado Clinical and Translational Sciences Institute; Mayo Clinic Rochester - UL1TR002377: Mayo Clinic Center for Clinical and Translational Science (CCaTS); Tulane University - UL1TR003096: Center for Clinical and Translational Science; Loyola University Medical Center - UL1TR002389: The Institute for Translational Medicine (ITM); Advocate Health Care Network - UL1TR002389: The Institute for Translational Medicine (ITM); OCHIN - INV-018455: Bill and Melinda Gates Foundation grant to Sage Bionetworks; The University of Texas Health Science Center at Houston - UL1TR003167: Center for Clinical and Translational Sciences (CCTS); Weill Medical College of Cornell University - UL1TR002384: Weill Cornell Medicine Clinical and Translational Science Center; Montefiore Medical Center - UL1TR002556: Institute for Clinical and Translational Research at Einstein and Montefiore; Regenstrief Institute - UL1TR002529: Indiana Clinical and Translational Science Institute; Boston University Medical Campus - UL1TR001430: Boston University Clinical and Translational Science Institute; Aurora Health Care - UL1TR002373: Wisconsin Network For 
Health Research; Brown University - U54GM115677: Advance Clinical Translational Research (Advance-CTR); Rutgers, The State University of New Jersey - UL1TR003017: New Jersey Alliance for Clinical and Translational Science; Loyola University Chicago - UL1TR002389: The Institute for Translational Medicine (ITM); UL1TR001445: Langone Health's Clinical and Translational Science Institute; University of Kansas Medical Center - UL1TR002366: Frontiers: University of Kansas Clinical and Translational Science Institute; Massachusetts General Brigham - UL1TR002541: Harvard Catalyst; University of California, Irvine - UL1TR001414: The UC Irvine Institute for Clinical and Translational Science (ICTS); University of California, San Diego - UL1TR001442: Altman Clinical and Translational Research Institute; University of California, Davis - UL1TR001860: UCDavis Health Clinical and Translational Science Center; University of California, San Francisco - UL1TR001872: UCSF Clinical and Translational Science Institute; University of California, Los Angeles - UL1TR001881: UCLA Clinical Translational Science Institute.
Additional Data Partners Who Have Signed a DTA and Whose Data Release is Pending: The Scripps Research Institute - UL1TR002550: Scripps Research Translational Institute; University of Texas Health Science Center at San Antonio - UL1TR002645: Institute for Integration of Medicine and Science; NorthShore University HealthSystem - UL1TR002389: The Institute for Translational Medicine (ITM); Yale New Haven Hospital - UL1TR001863: Yale Center for Clinical Investigation; Emory University - UL1TR002378: Georgia Clinical and Translational Science Alliance; Medical College of Wisconsin - UL1TR001436: Clinical and Translational Science Institute of Southeast Wisconsin; University of New Mexico Health Sciences Center - UL1TR001449: University of New Mexico Clinical and Translational Science Center; George Washington University - UL1TR001876: Clinical and Translational Science 
Institute at Children's National (CTSA-CN); Stanford University - UL1TR003142: Spectrum: The Stanford Center for Clinical and Translational Research and Education; Cincinnati Children's Hospital Medical Center - UL1TR001425: Center for Clinical and Translational Science and Training; The State University of New York at Buffalo - UL1TR001412: Clinical and Translational Science Institute; Children's Hospital of Philadelphia - UL1TR001878: Institute for Translational Medicine and Therapeutics; Icahn School of Medicine at Mount Sinai - UL1TR001433: ConduITS Institute for Translational Sciences; Ochsner Medical Center - U54GM104940: Louisiana Clinical and Translational Science (LA CaTS) Center; HonorHealth - None (Voluntary); University of Vermont - U54GM115516: Northern New England Clinical & Translational Research (NNE-CTR) Network; Arkansas Children's Hospital - UL1TR003107: UAMS Translational Research Institute
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol # IRB00249128 or individual site agreements with NIH.
Use of the N3C data for this study is authorized under the following IRB Protocols:
- University of Colorado (Colorado Multiple Institutional Review Board) approved 21-2759
- Johns Hopkins University (Johns Hopkins Office of Human Subjects Research - Institutional Review Board) approved IRB00249128
- University of North Carolina (University of North Carolina Chapel Hill Institutional Review Board) exempt per 21-0309
- Stony Brook University (Office of Research Compliance, Division of Human Subject Protections, Stony Brook University) exempt per IRB2021-00098
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol # IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at ncats.nih.gov/n3c/resources. Enclave data is protected and can be accessed for COVID-related research with an approved (1) IRB protocol and (2) Data Use Request (DUR).
A detailed accounting of data protections and access tiers is found in [1]. Enclave and data access instructions can be found at https://covid.cd2h.org/for-researchers; all code used to produce the analyses in this manuscript is available within the N3C Enclave to users with valid login credentials to support reproducibility.