ABSTRACT
Acquired somatic mutations in hematopoietic stem and progenitor cells (clonal hematopoiesis or CH) are associated with advanced age, increased risk of cardiovascular and malignant diseases, and decreased overall survival.1–4 These adverse sequelae may be mediated by altered inflammatory profiles observed in patients with CH.2,5,6 A pro-inflammatory immunologic profile is also associated with worse outcomes of certain infections, including SARS-CoV-2 and its associated disease Covid-19.7,8 Whether CH predisposes to severe Covid-19 or other infections is unknown. Among 515 individuals with Covid-19 from Memorial Sloan Kettering (MSK) and the Korean Clonal Hematopoiesis (KoCH) consortia, we found that CH was associated with severe Covid-19 outcomes (OR=1.9, 95%=1.2-2.9, p=0.01). We further explored the relationship between CH and risk of other infections in 14,211 solid tumor patients at MSK. CH was significantly associated with risk of Clostridium Difficile (HR=2.0, 95% CI: 1.2-3.3, p=6×10−3) and Streptococcus/Enterococcus infections (HR=1.5, 95% CI=1.1-2.1, p=5×10−3). These findings suggest a relationship between CH and risk of severe infections that warrants further investigation.
MAIN
Acquired mutations that lead to clonal expansion are common in the normal aging hematopoietic system (clonal hematopoiesis, or CH), yet are known to alter stem/progenitor and lymphoid function and response to environmental stressors, including systemic infections5,6,9,10. The mutational events that drive CH overlap with known drivers of hematologic malignancies. However, the majority of mutations in CH appear to occur outside of canonical cancer driver genes11,12. The impact of individual mutational events on hematopoietic stem and progenitor cells differs by the nature of the genomic aberration. For example, chromosomal aneuploidies result in a predisposition for lymphoid fate specification and transformation13,14 while point mutations in DNMT3A result in increased myeloid differentiation6,15. Heterogeneity also exists across CH phenotypes by driver gene in regards to its impact on inflammatory signaling6. For example, mutations in TET2 result in heightened secretion of several cytokines including IL-1β/IL-6 signaling that may partially explain the increased risk of cardiovascular disease5,9,16. Moreover, systemic infections and the resultant inflammatory signals can lead to increased clonal fitness of TET2 mutant cells and clonal expansion10,17,18.
Despite these important insights, the relationship between different CH events, infectious risk and infectious disease severity has not been studied. The severity of Covid-19 is also associated with advanced age, cardiovascular and malignant comorbidities, and elevated circulating IL-6 levels; features which are seen with age-associated CH19–23. Given the common inflammatory profile of CH and Covid-19 infection, we investigated the relationship between CH and Covid-19 including the potential for an association of CH with increased Covid-19 disease severity.
Our study included patients from two separate cohorts. The first cohort was composed of patients with solid tumors treated at Memorial Sloan Kettering Cancer Center (MSK) with blood previously sequenced using MSK-IMPACT, a previously validated targeted gene panel capturing all commonly mutated CH-associated genes (Supplementary Table 1)24. Of these patients, 1,626 were tested for SARS-CoV-2 (the virus that causes Covid-19) RNA between March 1st 2020 and July 1st 2020; 403 (24.8%) individuals tested positive for SARS-CoV-2 (Methods and Table 1). The second cohort included 112 previously healthy individuals without cancer who were hospitalized for Covid-19 between January and April 2020 at four tertiary hospitals in South Korea (KoCH cohort). The KoCH cohort was sequenced using a custom targeted NGS panel from Agilent (89 genes) which was designed to include commonly occurring CH genes (Supplementary Table 2).
For both cohorts, the primary outcome was severe Covid-19 infection, defined as the presence of hypoxia requiring supplemental oxygen (oxygen device >1 L or hypoxia <94%). We used multivariable logistic regression adjusting for covariates including age, smoking, prior Covid-19 related comorbidities, and prior cancer treatment to determine the association between severe Covid-19 and CH in each population. We then performed a fixed-effects meta-analysis to estimate the association in the overall population. The full statistical rationale is further described in the Methods section.
Among Covid-19 positive individuals, 23% (N=94) and 61% (N=68) had severe disease in the MSK and KoCH cohorts, respectively (Table 1). Overall, CH was observed in 35% of Covid-19 positive cases at MSK and 21% in KoCH. Of note, when restricting the MSK-IMPACT panel to the 89 genes included in the KoCH panel, 20% of Covid-19 positive cases at MSK had CH. In the MSK cohort, CH was observed in 51% and 30% of patients with severe versus non-severe Covid-19, respectively (adjusted OR: 1.85, 95% CI 1.10-3.12, Figure 1). In the KoCH cohort, CH was observed in 25% and 15.9% of patients with severe versus non-severe Covid-19, respectively (adjusted OR 1.85, 95% CI 0.53-6.43, Figure 1). In a fixed effects meta-analysis of odds-ratio estimates from the multivariable logistic regression models employed in each separate cohort analysis, the presence of CH was associated with an increased risk of severe Covid-19 (OR=1.85, 95%=1.15-2.99, p=0.01) (Figure 1).
Using previously described methods24, CH mutations were classified as known or hypothesized cancer putative drivers (PD-CH) or non putative drivers (non-PD CH). The majority of CH mutations were classified as PD-CH (52% in the MSK cohort and 67% in the KoCH dataset). In order to explore the association between particular mutation types and Covid-19 severity, we performed a stratified analysis of Covid-19 severity by PD-CH versus non-PD CH status. A significant association was observed between non-PD CH and severe Covid-19 (OR=2.01, 95% CI=1.15-3.50, p=0.02), as well as between silent (synonymous) CH and severe Covid-19 (OR=2.58, 95% CI 1.01-6.61, p=0.05). There was not a statistically-significant association between PD-CH and severe Covid-19 infection (OR=1.15, 95% CI=0.61-2.02, p=0.77: Extend ed Data Figure 1). Most non-PD mutations in Covid-19 positive cases occurred in non-recurrently mutated genes (65% at MSK and 76.9% in KoCH, Supplementary Figure 1).
The strength of the association between CH and severe Covid-19 was similar among patients with one CH mutation (OR=1.78, 95% CI=1.0-3.1, p=0.04) and multiple CH mutations (OR=2.0, 95% CI=1.0-3.8, p=0.04). Patients with a maximum CH variant allele frequency (VAF) of >5% showed a significant association with severe Covid-19 (OR=1.9, 95% CI=1.0-3.4, p=0.04). This association trended towards statistical significance in patients with any CH mutation and a maximum VAF<5% (OR=1.75, 95% CI=0.97-3.17, p=0.06: Extended Data Figures 2-4). These data suggest that the presence of CH and resultant alterations in hematopoietic differentiation, and not specific mutant alleles, is predictive of Covid-19 disease severity.
Given the evidence of an association between CH and Covid-19 severity, we sought to explore the relationship between CH and other types of infections. We analyzed billing codes from 14,211 solid tumor patients treated at MSK who underwent blood sequencing by MSK-IMPACT. Using a previously established phenome-wide-association study (Phe-WAS) methodology25, we mapped patient ICD-9 and ICD-10 billing codes to categories of infectious disease. Multivariable Cox proportional hazards regression was used to estimate the hazard ratio (HR) for risk of infection among CH positive compared to CH negative individuals. Given the number of model covariates, we limited the analysis to 32 infection subclasses that affected at least 80 individuals (see Methods). Multiple infection types were associated with CH, although many associations were not statistically-significant after multiplicity adjustment (Figure 2A, Supplementary Table 3). CH was significantly (FDR-corrected p-value<0.10) associated with the onset of two infection subclasses: Clostridium difficile infection (HR=2.0, 95% CI: 1.2-3.3, p=6×10−3) and Streptococcus/Enterococcus infection (HR=1.5, 95% CI=1.1-2.1, p=5×10−3). When stratified by CH-mutation characteristics, patients with two or more CH-mutations had a stronger association with Clostridium difficile infection (OR=3.4, 95% CI=1.8-6.3, p=2×10-4) compared to patients with one CH-mutation (OR=1.4, 95% CI=0.8-2.7, p=0.28). The association between CH and Clostridium difficile infection was significant for mutations with a VAF of >5% (OR=2.5, 95% CI=1.4-4.6, p=0.002) but not mutations with a VAF of 2-5% (OR=1.6, 95% CI=0.8-3.1, p=0.17). Similar to Covid-19 severity, the association between CH and Clostridium difficile infection was significant for non-PD CH (OR=2.0, 95% CI=1.2-3.3, p=0.01) and silent mutations (OR=2.6, 95% CI=1.2-5.8, p=0.02) but not CH-PD (OR=1.4, 95% CI=0.7-2.8, p=0.39) (Figure 2B).
In summary, we show in cancer and non-cancer patients that CH is associated with increased Covid-19 severity. In a large cancer patient cohort, CH is also associated with other severe infections, namely Streptococcus/Enterococccus and Clostridium difficile infections. Our exploratory analysis suggests that the relationship between CH and Covid-19 and CH and Clostridium difficile infection may be partly driven by non-driver CH. Clonal expansions characterized by non-driver mutational events could be facilitated by multiple mechanisms. Many classes of genetic alterations, such as copy number events (CNVs), structural variants, non-coding, and epigenetic changes, are not detectable using the targeted panels included in this study. As such, the observed events that are highly enriched in CH could be ‘passenger’ mutations that co-occur with a positively selected, undetected ‘driver’ mutation such as recurrent CNVs.12,13 Alternatively, driver mutations may have been incompletely classified as “non-driver” events using our methodology. However, cancer driver genes tend to recur in multiple patients, and the majority of witnessed non-driver mutated genes in our cohort were non-recurrent suggesting that clonal expansion, and not the specific event driving clonal expansion, may be associated with Covid-19 disease severity. This will need to be further studied in larger cohorts.
The hematopoietic system is a key regulator of inflammation and immunity. A substantial body of evidence now links somatic alterations in hematopoietic stem and progenitor cells to a variety of health outcomes, with inflammation emerging as a key mediator.2–5,10–13 Our data, along with results in the accompanying manuscript by Zekavat et al., demonstrate a similar association between CH and increased infection severity. This association may be due to residual confounding by variables that are unknown and unaccounted for in our models. Alternatively, this could represent a novel pathophysiology that links CH-induced changes in hematopoietic stem, progenitor, and lymphoid cell function with immune regulation and infection response. Future investigation including functional studies will be important to clarify the mechanisms underlying the association between CH and infection risk and to develop potential interventional strategies to attenuate inflammation, clonal expansion, and infectious sequelae in patients with and without cancer.
METHODS
Sample ascertainment and clinical data extraction
MSK Cohort
The study population included 9,307 patients with non-hematologic cancers at MSKCC who were alive on May 1st 2020 and underwent matched tumor and blood sequencing before this date using the MSK-IMPACT panel on an institutional prospective tumor sequencing protocol (ClinicalTrials.gov number, NCT01775072). Subjects who had a hematologic malignancy diagnosed after MSK-IMPACT testing or who had an active hematologic malignancy at the time of blood draw were excluded. Demographics, smoking history, exposure to oncologic therapy and primary tumor site were extracted from the electronic health record. Accuracy of populated information was manually checked in the EMR by three independent physicians (K.B, M.F, A.S). The presence of co-existing medical comorbidities known to correlate with Covid-19 severity including diabetes, COPD, asthma, hypertension and cardiovascular disease, were ascertained from ICD-9 and ICD-10 billing codes. SARS-CoV-2 status was determined using RT-PCR. We defined severe Covid-19 as the presence of hypoxia requiring supplemental oxygen (supplemental oxygen device >1 L or hypoxia <94%) resulting from Covid-19 infection. There were seven subjects with Covid-19 for whom there was minimal documentation of clinical course following Covid-19 infection and these individuals were excluded. There were three individuals with metastatic cancer and progression of disease at the time of Covid-19 where it was unclear whether documented hypoxia could be attributed to Covid-19 or disease progression. These subjects were also excluded.
KoCH Cohort
Laboratory-confirmed patients with Covid-19 between January and April 2020 in four tertiary hospitals in Republic of Korea were approached for consent to this study. Blood was drawn following confirmation of Covid-19 positivity. All of four hospitals have been running national-designated isolation units, which are located in Seoul, Gyeonggi, or Daegu. These provinces had the highest numbers of Covid-19 cases during the period14. Clinical and laboratory characteristics were retrospectively reviewed using the electronic medical record systems of each institution. Hypoxia requiring supplemental oxygen was defined as supplemental oxygen device >1 L with O2 <94% resulting from Covid-19 infection. Institutional Review Boards (IRBs) of four respective hospitals approved the study (IRB No. 2003-141-1110, B-2006/616-409), and written informed consents of the patients were obtained per IRB recommendations. Subjects who had an active malignancy at the time of blood draw were excluded.
Sequencing and Variant Calling
MSK Cohort
Subjects had a tumor and blood sample (as a matched normal control) sequenced using MSK-IMPACT, an FDA-authorized hybridization capture-based next-generation sequencing assay encompassing all protein-coding exons from the canonical transcript of 341, 410, or 468 cancer-associated genes (Supplementary Table 4). MSK-IMPACT is validated and approved for clinical use by New York State Department of Health Clinical Laboratory Evaluation Program. The sequencing test utilizes genomic DNA extracted from formalin fixed paraffin embedded (FFPE) tumor tissue as well as matched patient blood samples. DNA is sheared and DNA fragments are captured using custom probes47. MSK-IMPACT contains most of the commonly reported CH genes with the exception that earlier versions of the panel did not contain PPM1D or SRSF2.
Pooled libraries were sequenced on an Illumina HiSeq 2500 with 2×100bp paired-end reads. Sequencing reads were aligned to the human genome (hg19) using BWA (0.7.5a). Reads were re-aligned around indels using ABRA (0.92), followed by base quality score recalibration with Genome Analysis Toolkit (GATK) (3.3-0). Median coverage in the blood samples was 497x, and median coverage in the tumors was 790x. Variant calling for each blood sample was performed unmatched, using a pooled control sample of DNA from 10 unrelated individuals as a comparator. Single nucleotide variants (SNVs) were called using Mutect and VarDict. Insertions and deletions were called using Somatic Indel Detector (SID) and VarDict. Variants that were called by two callers were retained. Dinucleotide substitution variants (DNVs) were detected by VarDict and retained if any base overlapped a SNV called by Mutect. All called mutations were genotyped in the patient-matched tumor sample. Mutations were annotated with VEP(version 86) and OncoKb. We applied a series of post-processing filters to further remove false positive variants caused by sequencing artifacts and putative germline polymorphisms as previously described24.
KoCH Cohort
Blood-derived DNA was sequenced using a custom panel of 89 genes frequently mutated in CH. All NGS libraries were prepared using the Agilent SureSelect XT HS and XT Low input enzymatic fragmentation kit. Pooled Libraries were sequenced on an Illumina NovaSeq6000 with 2×150bp paired-end reads. Sequencing reads were trimmed with SeqPrep (v0.3) and Sickle (v1.33) and aligned to the human genome (hg19) using BWA-MEM (v0.7.10). PICARD(v1.94) was used for duplicate marking followed by indel realignment and base quality score recalibration with GATK light(v2.3.9). The mean depth of coverage of samples was higher than 800x. Variant calling was performed using SNver(v0.4.1), LoFreq(v0.6.1), GATK UnifiedGenotyper(v2.3.9) for SNVs. For Insertions and deletions in-house InDel caller was used26. The union of all called results were filtered meeting the internal criteria (total reads >=10, Alt reads >=10, positive Alt reads >=5, negative Alt reads >=5, MQV >=30, BQV >=30) and VAF falling between 2% and 30%. Common germline variants were filtered based on genomAD, 1k Genome v3, ESP6500 and ExAC data. Lastly, technical artifact calls with maf >2% were filtered based on an internal panel of 1000 individuals who were CH negative.
Variant Annotation
Variants from the MSK and KoCH cohort were uniformly annotated according to evidence for functional relevance in cancer (putative driver or CH-PD). We annotated variants as oncogenic if they fulfilled any of the following criteria: 1) truncating variants in NF1, DNMT3A, TET2, IKZF1, RAD21, WT1, KMT2D, SH2B3, TP53, CEBPA, ASXL1, RUNX1, BCOR, KDM6A, STAG2, PHF6, KMT2C, PPM1D, ATM, ARID1A, ARID2, ASXL2, CHEK2, CREBBP, ETV6, EZH2, FBXW7, MGA, MPL, RB1, SETD2, SUZ12, ZRSR2 or in CALR exon 9; 2) any truncating mutations (nonsense, essential splice site or frameshift indel) in known tumor suppressor genes as per the Cancer Gene Census, OncoKB, or the scientific literature; 3) translation start site mutations in SH2B3; 4) TERT promoter mutations; 5) FLT3-ITDs; 6) in-frame indels in CALR, CEBPA, CHEK2, ETV6, EZH2; 7) any variant occurring in the COSMIC “haematopoietic and lymphoid” category greater than or equal to 10 times; 8) any variant reported as somatic at least 20 times in COSMIC; 9) any variant noted as potentially oncogenic in an in-house dataset of 7,000 individuals with myeloid neoplasm greater than or equal to 5 times; 10) any loci (defined by the amino acid location) reported as having at least 5 missense mutations and at least one exact mutational match in TopMed6.
Statistical analysis
CH and Covid-19 Severity
We used multivariable logistic regression to evaluate for an association between clonal hematopoiesis and Covid-19 severity adjusting for age (measured as a continuous variable), gender, race, smoking history and co-existing medical comorbidities including diabetes, COPD/asthma and cardiovascular disease all classified as per Table 1. This was done separately for the MSK and KoCH cohorts due to limitations of data sharing. For solid tumor patients at MSK we also adjusted for primary tumor site (thoracic or non-thoracic cancer) and receipt of cytotoxic chemotherapy before and after IMPACT blood draw. We performed a fixed effects meta-analysis of the MSK and KoCH cohorts to jointly estimate the odds ratio for severe Covid-19 among CH positive compared to CH negative individuals.
CH and risk of infection in the MSK cohort
We analyzed billing codes from 14,211 solid tumor patients at MSKCC who had their blood sequenced using MSK-IMPACT. We applied the phecode nomenclature developed at Vanderbilt9 to map ICD-9 and ICD-10 billing codes to infectious disease subtypes. Subjects who were billed using a ICD9/10 code within the phecode for the first time following their sequencing blood draw with evidence of CH were considered to have an incident infection. Those who were billed for an ICD9/10 code within the phecode prior to blood draw were removed from the analysis of that phecode. In order to evaluate the accuracy of the billing code data, the presence of a documented Clostridium Difficile or Streptococcus infection in an EMR physician note was manually checked for patients respectively identified by billing codes (N=525 patients) by three independent physicians using shared criteria for infection onset. Billing codes were highly accurate in identifying the presence of the respective infectious disease (concordance >95%).
We used Cox proportional hazards regression to estimate the hazard ratio for risk of infection among those with CH compared to CH negative individuals. The date of blood draw (used for MSK-IMPACT sequencing) served as the onset date for this time-to-event analysis; the end-date was the date of billing code entry for the infectious disease subtype phecode, death or last follow-up, whichever came first. All models were adjusted for age, gender, race, smoking, tumor type, and cumulative exposure to cytotoxic chemotherapy prior to blood draw and after blood draw as previously described10. Following the 10:1 rule regarding the number of covariates in a multivariable model in proportion to the number of events16, we excluded infection subclasses populated with less than 80 individuals. The analysis utilized multiplicity correction with the Benjamini-Hochberg method to establish adjusted q-values for hazard ratio with a prespecified false-discovery-rate (FDR) <0.10.
All the statistical analyses were performed with the use of the R statistical package (www.r-project.org).
Data Availability
The minimal clinical and genetic data frame used to generate all analyses presented in this paper are available by request and will be made publicly available upon acceptance to a peer-reviewed journal.
Author Contributions
K.L.B, M.F, J.J, Y.K, H.I, C.S, E.K, N.K, A.Z conceived and designed the study. K.L.B, M.F., J.J., A.S, J.M, J.L, J.J, C.K, K.S, P.C, W.P, H.K, M.O, H.S, S.K,E.K, N.K performed collection of clinical data. K.L.B, R.P, T.G, A.S, M.B,Y.K,H.I,C.S, A.Z led the generation and analysis of sequencing data. K.L.B, M.F, J.J., A.S., Y.K, H.I, C.S, S.K,H.S,A.Z, performed statistical analyses and/or participated in data interpretation. All authors contributed to the writing of the manuscript and approved it for submission.
COMPETING INTEREST DECLARATION
The authors declare the following competing interests: K.B. has received research funding from GRAIL; Y.K is a co-founder in Genome Opinion. M.F.B is on the advisory board for Roche and recieves research support from Illumina. R.L.L. is on the supervisory board of Qiagen and is a scientific advisor to Loxo, Imago, C4 Therapeutics and Isoplexis which include equity interest. He receives research support from and consulted for Celgene and Roche and has consulted for Lilly, Janssen, Astellas, Morphosys and Novartis. He has received honoraria from Roche, Lilly and Amgen for invited lectures and from Gilead for grant reviews. A.Z. received honoraria from Illumina. E.P receives research funding from Celgene.D.G and has received honoraria for speaking and scientific advisory engagements with Celgene, Prime Oncology, Novartis, Illumina and Kyowa Hakko Kirin and is a co-founder in Isabl Technologies. M. Ladanyi serves on the advisory boards for AstraZeneca, Bristol Myers Squibb, Takeda, Bayer and Merck, and has received research support from Loxo Oncology and Helsinn Therapeutics. L.A.D. is a member of the board of directors of Personal Genome Diagnostics (PGDx) and Jounce Therapeutics; is a paid consultant to PGDx and Neophore; is an uncompensated consultant for Merck (with the exception of travel and research support for clinical trials); is an inventor of multiple licensed patents related to technology for circulating tumor DNA analyses and mismatch repair deficiency for diagnosis and therapy from Johns Hopkins University, some of which are associated with equity or royalty payments directly to Johns Hopkins and L.A.D.; and holds equity in PGDx, Jounce Therapeutics, Thrive Earlier Detection and Neophore; his wife holds equity in Amgen. The terms of all of these arrangements are being managed by Johns Hopkins and Memorial Sloan Kettering in accordance with their conflict of interest policies. H.I, C.H.S, H.S, S.K are current employees of Genome Opinion and holds stock in the company.
Extended Data Figures/Tables
Supplementary Figure/Tables
Acknowledgements
This work was supported by the National Institute of Health (K08CA241318 to K.L.B, P50 CA172012 to L.B), American Society of Hematology (K.L.B), EvansMDS Foundation (K.L.B.), European Hematology Association (E.P.), Gabrielle’s Angels Foundation (E.P.), V Foundation (E.P.), Geoffrey Beene Foundation (E.P), Starr Cancer Consortium (to R.L., A.Z., M.B, R.P.), and the Cancer Colorectal Cancer Dream Team Translational Research Grant (SU2C-AACR-DT22-17 to L.D.). E.P. is a Josie Robertson Investigator. M.M is supported by funds from the Intramural Research Program of the National Cancer Institute, National Institutes of Health. The KoCH cohort was supported through a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI14C1277). We thank the Global Science experimental Data hub Center (GSDC) and Korea Research Environment Open NETwork (KREONET) service for data computing and network provided by the Korea Institute of Science and Technology Information (KISTI).Work performed at Memorial Sloan Kettering Cancer Center was supported in part by the Cancer Center Support Grant (grant no. P30 CA008748), the Marie Josee and Henry R Kravis Center for Molecular Oncology, Cycle for Survival, and MSK Molecular Diagnostics Service.
Footnotes
↵* Denotes equal contribution.