RT Journal Article SR Electronic T1 Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2022.05.24.22275398 DO 10.1101/2022.05.24.22275398 A1 Justin T. Reese A1 Hannah Blau A1 Timothy Bergquist A1 Johanna J. Loomba A1 Tiffany Callahan A1 Bryan Laraway A1 Corneliu Antonescu A1 Elena Casiraghi A1 Ben Coleman A1 Michael Gargano A1 Kenneth J. Wilkins A1 Luca Cappelletti A1 Tommaso Fontana A1 Nariman Ammar A1 Blessy Antony A1 T. M. Murali A1 Guy Karlebach A1 Julie A McMurry A1 Andrew Williams A1 Richard Moffitt A1 Jineta Banerjee A1 Anthony E. Solomonides A1 Hannah Davis A1 Kristin Kostka A1 Giorgio Valentini A1 David Sahner A1 Christopher G. Chute A1 Charisse Madlock-Brown A1 Melissa A Haendel A1 Peter N. Robinson A1 the N3C consortium A1 the RECOVER Consortium YR 2022 UL http://medrxiv.org/content/early/2022/07/20/2022.05.24.22275398.abstract AB Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. 
There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by the National Institutes of Health awards as follows: CD2H NCATS U24 TR002306, NHLBI RECOVER Agreement OT2HL161847-01, Office of the Director Monarch Initiative R24 OD011883, and NHGRI Center of Excellence in Genome Sciences RM1 HG010860; and was conducted under the N3C DUR RP-5677B5. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH. Additionally, Justin T. Reese was supported by the Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231; Peter N. Robinson was supported by the Donald A. Roux Family Fund at the Jackson Laboratory; and Melissa A. Haendel was supported by the Marsico Family at the University of Colorado Anschutz Medical Campus. Authorship was determined using ICMJE recommendations. The analyses described in this publication were conducted with data or tools accessed through the NCATS N3C Data Enclave covid.cd2h.org/enclave and supported by NCATS U24 TR002306. 
This research was possible because of the patients whose information is included within the data from participating organizations (covid.cd2h.org/dtas) and the organizations and scientists (covid.cd2h.org/duas) who have contributed to the on-going development of this community resource. The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol #IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol #IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Data are available by application to the N3C Data Enclave, which is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources.