Summary
Background As the number of new and recovering COVID-19 cases continues to rise, it has become evident that patients can experience symptoms and complications after viral clearance. Clinical biomarkers characterizing patients who are likely to experience these prolonged effects are unknown.
Methods We conducted a retrospective study to compare longitudinal lab test measurements (hemoglobin, hematocrit, estimated glomerular filtration rate, serum creatinine, and blood urea nitrogen) in patients rehospitalized after PCR-confirmed SARS-CoV-2 clearance (n=104) versus patients not rehospitalized after viral clearance (n=278).
Findings Compared to patients who were not rehospitalized after PCR-confirmed viral clearance, those who were rehospitalized had lower median hemoglobin levels in the year prior to COVID-19 diagnosis (cohen’s D = -0.50; p=1.2×10−3) and during the active infection window (cohen’s D = -0.71; p=4.6×10−8). Patients hospitalized after viral clearance were also more likely to be diagnosed with moderate or severe anemia during the active infection window (OR = 2.18; p = 4.99×10−9).
Conclusions The occurrence of moderate or severe anemia in hospitalized COVID-19 patients is strongly associated with rehospitalization after viral clearance. Whether interventions to mitigate anemia can improve long term outcomes of COVID-19 patients should be further investigated.
Funding This study was funded by nference.
Introduction
Since the first diagnosed case of COVID-19 in December 2019, over 95 million people have been infected with SARS-CoV-2 worldwide resulting in over 2 million deaths. While significant progress has been made in understanding the pathogenesis of COVID-19, including the rapid development and clinical rollout of multiple vaccines (Folegatti et al., 2020, Jackson et al., 2020, Corbett et al., 2020, Mercado et al., 2020, Bos et al., 2020, Mulligan et al., 2020) along with detailed characterizations of the SARS-CoV-2 entry receptor ACE2 (Venkatakrishnan et al., 2020, Anand et al., 2020, Ziegler et al., 2020, Zhao et al., 2020, Singh et al., 2020), there are still few options available for effective treatment of patients with severe COVID-19. Further, as the pandemic has progressed, there have been reports of long-lasting effects of COVID-19 even in patients who did not experience a severe disease course during their active period of infection (Yelin et al., 2020, del Rio et al., 2020, Carfì et al., 2020). However, the clinical, molecular, and demographic biomarkers characterizing patients who are more likely to experience these lasting effects after clearing SARS-CoV-2 are not yet known.
The need to answer such questions during the rapidly evolving COVID-19 pandemic has emphasized the requirement for tools facilitating real-time analysis of patient data as it is obtained and stored in large electronic health records (EHR) systems. Specifically, clinical research efforts to understand the features defining COVID-19 patients, or subsets thereof, fundamentally require reliable systems that enable (1) conversion of unstructured information (e.g., patients notes written by healthcare professionals) into structured formats suitable for downstream analysis and (2) temporal alignment and integration of such unstructured data with the already structured information available in EHR databases (e.g., lab test results, disease diagnosis codes).
With these requirements in mind, we have previously reported the development of augmented curation methods that enable the rapid creation and comparison of defined cohorts of COVID-19 patients within a large EHR system (Pawlowski et al., 2020, Wagner et al., 2020). Here we expand on our prior textual sentiment-based analysis (Pawlowski et al., 2020) to understand the clinical features of patients likely to experience lasting effects of COVID-19, and we find that lab tests support our previously derived hypothesis that anemia and kidney malfunction during active SARS-CoV-2 infection may serve as biomarkers of patients who are more likely to be rehospitalized after PCR-confirmed viral clearance.
Results
Longitudinal analysis of laboratory measurements provides a framework to test hypotheses derived from unstructured electronic health records
Using NLP-based extraction of phenotypes from a large EHR system, we previously reported that COVID-19 patients who are rehospitalized after PCR-confirmed viral clearance were more likely to experience anemia and acute kidney injury (AKI) in the year prior to their diagnosis and during their PCR-positive phase of COVID-19 compared to patients who were not rehospitalized after clearance of SARS-CoV-2 (Pawlowski et al., 2020). Here, we sought to evaluate whether diagnostic lab tests for AKI and anemia corroborate these phenotypic associations in a larger patient cohort.
We split the set of hospitalized COVID-19 patients with confirmed viral clearance (n=382) into two groups: (1) post-clearance hospitalized (“PCH”; n=104) and (2) post-clearance non-hospitalized (“PCNH”; n=278), where viral clearance was defined by two consecutive negative SARS-CoV-2 PCR tests following a positive test (Figure 1A-B). A demographic summarization of these two cohorts is provided in Table 1. We then compared a set of selected lab test results between these cohorts during two time windows: (1) the year prior to COVID-19 diagnosis (“pre-COVID phase”) and (2) the time during which each patient was SARS-CoV-2 positive according to their PCR results (“SARS-CoV-2+ phase”). Given our previous EHR-based findings, we considered both anemia-related and kidney function lab tests including hemoglobin, hematocrit, estimated glomerular filtration rate (eGFR), serum creatinine, and serum blood urea nitrogen (BUN) levels (Figure 1C).
Rehospitalized patients show pathologic alterations in hemoglobin, hematocrit, eGFR, and BUN before COVID-19 diagnosis and during active infection
For each patient, we first considered the median values of each lab test over the designated interval. Consistent with our previous augmented curation derived findings, PCH patients had significantly lower median hemoglobin and hematocrit levels in both the pre-COVID phase (cohen’s D = -0.50, p=1.2×10−3; cohen’s D = -0.48, p=2.5×10−3) and the SARS-CoV-2+ phase (cohen’s D = -0.71, p=4.6×10−8; cohen’s D = -0.69, p=8.5×10−8) (Table 2, Figures 2A-D). Further, PCH patients had lower median eGFR and higher median BUN levels during the pre- COVID phase (cohen’s D = -0.46, p = 0.02; cohen’s D = -0.45, p=1.2×10−3) and the SARS-CoV-2+ phase (cohen’s D = 0.46, p = 0.01; cohen’s D = 0.42, p=8.9×10−6) (Table 2, Figure S1).
We also tested whether extreme (i.e. minimum or maximum) values of a given lab test over the designated periods varied between PCH and PCNH patients, as a measure of central tendency (e.g., median) may fail to capture a single occurrence of phenotypes such as anemia or AKI. PCH patients had lower minimum values of hemoglobin, hematocrit, and eGFR in both the pre-COVID phase (cohen’s D = -0.49, p=2.8×10−3; cohen’s D = -0.45, p=3.0×10−3; cohen’s D = - 0.57, p=3.0×10−3) and the SARS-CoV-2+ phase (cohen’s D = -0.85, p=1.6×10−10; cohen’s D = - 0.79, p=1.2×10−9; cohen’s D = -0.51, p=4.4×10−4). They also had higher maximum serum BUN levels during both the pre-COVID phase (cohen’s D = 0.50, p=6.6×10−4) and the SARS-CoV-2+ phase (cohen’s D = 0.60, p=5.2×10−8) (Tables 3-4, Figure 3 and Figure S2).
Taken together, these analyses corroborate our prior textual sentiment-based EHR findings, suggesting that patients who are rehospitalized after SARS-CoV-2 clearance are more likely to have pathologically altered anemia-related and renal function lab tests both prior to and during SARS-CoV-2 infection.
Post-clearance rehospitalized patients have lower hemoglobin and hematocrit before and during SARS-CoV-2 infection regardless of sex
As males and females have different normal ranges of hemoglobin and hematocrit, we performed sex-split subanalyses of anemia-related lab tests similar to those described above. Patient-level median hemoglobin and hematocrit during the pre-COVID phase were significantly lower in both the female (cohen’s D = -0.66, p=0.01; cohen’s D = -0.67, p=0.01) and male (cohen’s D = -0.42, p=0.02; cohen’s D = -0.37, p=0.05) PCH cohorts versus their PCNH counterparts (Tables 5-6, Figures 4A-D). These trends were even stronger during the SARS-CoV-2+ phase among both the female (cohen’s D = -0.85; p= 7.0×10−6; cohen’s D = -0.91, p=7.0×10−6) and male (cohen’s D = -0.60; p=8.2×10−4; cohen’s D = -0.53, p=1.8×10−3) cohorts (Tables 5-6, Figures 4E-H).
Similarly, in our analysis of extreme values, minimum hemoglobin and hematocrit measurements during the pre-COVID phase tended to be lower in both female (cohen’s D = - 0.60; p=0.01; cohen’s D = -0.58, p=0.01) and male (cohen’s D = -0.42; p=0.05; cohen’s D = -0.38, p=0.07) PCH patients (Tables 7-8, Figures 4I-L). These trends were again even stronger in the SARS-CoV-2+ phase among both females (cohen’s D = -1.02; p=7.5×10−7; cohen’s D = -0.97, p=1.8×10−6) and males (cohen’s D = -0.74; p=1.1×10−4; cohen’s D = -0.66, p=2.7×10−4) (Tables 7-8, Figures 4M-P).
Post-clearance rehospitalized patients are more likely to have experienced anemia and AKI before COVID-19 diagnosis and during active infection
We next evaluated whether outright anemia occurred more frequently in the PCH cohort than the PCNH cohort (see Methods and Figure 5A). Anemia was indeed observed more frequently in the PCH cohort during both the pre-COVID phase (39/55 [71%] vs. 61/136 [45%]; OR=1.58; p=0.001) and the SARS-CoV-2+ phase (93/102 [91%] vs. 202/272 [74%]; OR=1.23; p=2.02×10−4) (Figures 5B-C). Further, the prevalence of moderate or severe anemia was higher in the PCH cohorts during both the pre-COVID phase (13/55 [31%] vs. 19/136 [14%]; OR=1.69; p=0.13) and the SARS-CoV-2+ phase (63/102 [62%] vs. 77/272 [28%]; OR=2.182; p=4.99×10−9), although the former observation did not reach statistical significance (Figures 5D-E).
Similarly we assessed whether laboratory-diagnosed AKI occurs more frequently in the PCH cohort based on the creatinine-related components of the KDIGO (Kidney Disease: Improving Global Outcomes) criteria for diagnosis and staging of AKI in adults (Figure S3A) (Khwaja, 2012). Any stage AKI was indeed more common in the PCH cohort than the PCNH cohort in both the pre-COVID phase (30/69 [43%] vs 45/198 [23%]; OR=1.91; p=2.0×10−3) (Figure S3B) and the SARS-CoV-2+ phase (46/103 [45%] vs. 64/272 [24%]; OR=1.90; p=1.19×10−4) (Figure S3C). Further, stage 2+ AKI was more common in the PCH cohort during both the pre- COVID phase (21/69 [30%] vs. 21/198 [11%]; OR=2.87; p=2.24×10−4) and the SARS-CoV-2+phase (23/103 [22%] vs. 24/272 [9%]; OR=2.53; p=7.91×10−4) (Figure S3D-E). Intriguingly when split by sex, we found that this trend was driven by males, as PCH males were more likely to experience stage 2+ AKI in both the pre-COVID phase (14/37 [38%] vs. 13/109 [12%]; OR=3.17; p=0.001) and the SARS-CoV-2+ phase (15/53 [28%] vs. 14/158 [9%]; OR=3.19; p=9.15×10−4) (Figure S3F-G), whereas this was not true when comparing PCH and PCNH females (data not shown).
Anemia is strongly associated with post-clearance rehospitalization independent of ICU admission status and other covariates
To test whether the previous observations were affected by potential confounding demographic or clinical covariates, we performed a series of logistic regression analyses. First we evaluated the association between post-clearance rehospitalization and the following independent variables during the pre-COVID and SARS-CoV-2+ phases, separately: minimum hemoglobin, maximum BUN, sex, number of blood draws, and ICU admission status (for the SARS-CoV-2+ phase only) (Table 9). The consideration of ICU admission status as a covariate was particularly important because ICU admission was more common among PCH than PCNH patients (Table 1). While none of these variables in the pre-COVID phase showed a significant association with rehospitalization status, minimum hemoglobin during the SARS-CoV-2+ phase was singularly associated with rehospitalization (β=-0.29, p=3.0×10−5).
We then modified our logistic regression analysis by replacing the minimum hemoglobin and maximum BUN terms with binary labels of moderate/severe anemia and stage 2+ AKI, respectively (Table 10). Among the tested pre-COVID variables, only stage 2+ AKI was modestly associated with rehospitalization status (β=0.99, p=0.04). ICU admission during the SARS-CoV- 2+ phase was also modestly associated with post-clearance rehospitalization (β=0.70, p=0.05), suggesting that patients with more severe courses of initial COVID-19 illness are more likely to experience subsequent hospitalization. However, moderate/severe anemia during the SARS- CoV-2+ phase was again the most strongly associated variable with rehospitalization (β=1.16, p=6.0×10−5).
Finally we performed a split cohort subanalysis to specifically test whether the robust association between anemia and rehospitalization status applied to both patients who were and were not admitted to the ICU during their index infection. Indeed, the rate of moderate/severe anemia was significantly higher in PCH patients than PCNH patients when considering only those were were admitted to the ICU (59% [38/64] vs. 19% [10/53], OR=2.10, p=1.09×10−5) or only those who were not admitted to the ICU (33% [25/76] vs. 16% [29/181], OR=1.84, p=4.0×10−3) (Figures S4A-B).
Discussion
Approximately one year after the first confirmed case, the COVID-19 pandemic continues to ravage communities across the globe. While efforts early in the pandemic rightly focused on the acute lung inflammation caused by SARS-CoV-2, the subsequent realization that COVID-19 may have more lasting effects has mandated a better understanding of factors that predispose patients to experience long-term COVID-19 related complications. We have previously sought to address this knowledge gap using state-of-the-art NLP models deployed on a complete EHR system (Pawlowski et al., 2020), and here we have expanded this effort to include the longitudinal analysis of laboratory measurements both prior to COVID-19 diagnosis and during active SARS-CoV-2 infection.
This lab test analysis shows that anemia and renal function in the pre-COVID and SARS-CoV-2+ phases are associated with the risk of post viral clearance hospitalization. Our logistic regression analyses suggest that AKI and static renal lab measurements are not independently associated with rehospitalization, with ICU admission during COVID-19 infection representing a likely confounding factor contributing to the observed trends. Indeed, AKI has previously been reported as a common morbidity in ICU patients (Mohsenin, 2017) and was observed frequently among ICU admitted COVID-19 patients in our cohort, with 78% (94/121) and 39% (47/121) of ICU admitted patients experiencing stage 1+ and stage 2+ AKI, respectively. On the other hand, hemoglobin levels and the outright diagnosis of moderate or severe anemia are robustly associated with post-clearance rehospitalization independent of sex, number of blood draws, and ICU admission status.
While the pathophysiologic foundations for these associations are not yet clear, the findings do merit consideration in the context of COVID-19 clinical care. Indeed, pre-existing conditions are already integrated in the clinical decision-making algorithms around COVID-19 as the Center for Diseases Control (CDC) has designated various chronic conditions as risk factors for severe COVID-19 infection (e.g. cancer, chronic kidney disease, chronic obstructive pulmonary disease, and cardiovascular diseases such as heart disease, obesity, and diabetes) (CDC, 2020). However, there is much less known regarding factors or conditions that place people at risk for subsequent complications such as rehospitalization after viral clearance. Once identified, such factors and conditions should similarly be incorporated into the clinical decision-making process when treating COVID-19 patients.
Our finding that lower hemoglobin and hematocrit levels, and the outright diagnosis of moderate or severe anemia, prior to or during active SARS-CoV-2 infection is associated with post viral clearance hospitalization has not been previously reported. And while sickle cell disease is considered a risk factor for severe COVID-19, anemia itself is not considered to be such a risk factor. While a causal role for anemia in post clearance hospitalization is certainly not established by this analysis, the robust association warrants further studies to determine the whether anemia mitigating therapies (e.g., vitamin or mineral supplementation, erythropoietin administration, or blood transfusion) engender long-term benefits in select COVID-19 patients. Further, even if mitigation of anemia does not impact subsequent hospitalization, these links between anemia and COVID-19 are interesting in light of several previous lines of research.
First, several groups have reported an association between blood groups and susceptibility to or severity of COVID-19 infection (Latz et al., 2020, Ellinghaus et al., 2020). Specifically, it has been suggested that individuals with type O blood may be at lower risk for contracting COVID-19 or experiencing respiratory failure in the context of COVID-19. Whether this association reflects a direct or indirect interaction between SARS-CoV-2 and erythrocytes is not known, but it could certainly be relevant to pursue whether blood type is also associated with the occurrence or severity of anemia in the setting of COVID-19.
Second, fatigue has been commonly reported as both an acute symptom and a lasting effect of COVID-19 (Wagner et al., 2020, Pascarella et al., 2020, Townsend et al., 2020), but the mechanisms underlying this phenotype have not been established. It is worth noting that 295 of the 374 (79%) hospitalized COVID-19 patients in this study had at least mild anemia during their SARS-CoV-2+ phase, and 140 of the 374 (37%) patients had moderate or severe anemia (defined as hemoglobin < 10 g/dL) during this interval. It would be worthwhile to perform a longitudinal follow-up on these patients to determine whether they continue to experience anemia in the months following SARS-CoV-2 clearance, and whether the presence of such a post-COVID anemia is associated with reports of fatigue.
Along with our previous analysis (Pawlowski et al., 2020), this study illustrates the value of deploying sophisticated platforms across EHR systems that enable the integrated analysis of diverse data types including sentiment-laden text and laboratory test measurements. Taken together, these studies provide the first example of leveraging augmented curation methods to first identify phenotypes that distinguish defined clinical cohorts and to then cross check these phenotypic associations through a hypothesis-driven analysis of the most relevant lab tests. This framework can be effectively scaled for other clinical research efforts not only in COVID-19 but also in any other disease areas of interest.
Limitations of the Study
This study has several important limitations to consider. First, this analysis considers only patients within one EHR system; while this system does contain patient data from multiple sites of clinical care in distinct geographic locations (Minnesota, Arizona, Florida), there are still likely underlying biases in important factors such as patient demographics and tendencies around the ordering of laboratory tests by clinicians. Such biases would prevent the studied cohort and their associated data points from serving as true representative samples of all COVID-19 patients. Second, the analyzed cohort was relatively small (n = 382) as most patients diagnosed with COVID-19 do not subsequently receive two confirmatory negative PCR tests, and only a subset of these 382 patients possessed data for each lab test of interest. Finally, the definition of the SARS-CoV-2+ window is imperfect as the true date of viral clearance for a given patient would likely precede their first negative PCR test by an unknown amount of time.
Resource Availability
After publication, the data will be made available to others upon reasonable requests to the corresponding author (venky{at}nference.net). A proposal with a detailed description of study objectives and the statistical analysis plan will be needed for evaluation of the reasonability of requests. Deidentified data will be provided after approval from the corresponding author and Mayo Clinic.
Methods
Study design
This was a case-control study. The primary outcome was rehospitalization status within 30 days of PCR-confirmed SARS-CoV-2 clearance. The exposure variables were anemia and kidney dysfunction as assessed through selected lab measurements detailed below.
Selection of study participants
Cases and controls were selected from a cohort of 66,689 patients who presented to the Mayo Clinic Health System (including tertiary medical centers in Minnesota, Arizona, and Florida) and received at least one positive SARS-CoV-2 PCR test between the start of the COVID-19 pandemic and December 12, 2020 (see Figure 1B). Post clearance hospitalized (“PCH”) cases (n=104) were defined as patients who were hospitalized for COVID-19, had two documented negative SARS-CoV-2 PCR tests following their last positive test result, and were subsequently admitted to the hospital within 30 days of clearance. Post clearance non-hospitalized (“PCNH”) controls (n=278) were defined as those who were hospitalized for COVID-19, had two documented negative SARS-CoV-2 PCR tests following their last positive test result, and were not hospitalized within 30 days of clearance. Demographic and clinical features of the PCH and PCNH cohorts are summarized in Table 1.
Definition of the considered time intervals: pre-COVID phase and SARS-CoV-2+ phase
Laboratory results were assessed (1) during the year prior to COVID-19 diagnosis, referred to throughout this manuscript as the “pre-COVID phase” and (2) during the period in which a patient was positive for SARS-CoV-2 by PCR, referred to throughout this manuscript as the “SARS-CoV-2+ phase.” COVID-19 diagnosis was conferred by a positive SARS-CoV-2 PCR test, and clearance was defined as two consecutive negative SARS-CoV-2 PCR tests occurring after a positive test. The estimated viral clearance date was taken as the date of the first negative PCR test in this sequence of two consecutive negative tests.
Selection and summarization of laboratory measurements
The primary exposure variables were anemia and AKI. The selected laboratory measurements related to anemia included hemoglobin and hematocrit, and laboratory measurements related to AKI included serum creatinine, serum blood urea nitrogen (BUN), and estimated glomerular filtration rate (eGFR). The majority of eGFR measurements (∼96%) were estimated by creatinine; these tests had a maximum recorded value of 90 mL/min/BSA, which corresponds to the lower limit of normal. The remaining 4% of eGFR measurements were estimated by cystatin C levels; for these tests, a value above 90 mL/min/BSA was possible and was indeed recorded in 5 of 31 cases.
For a given lab test, we considered the median, maximum, and minimum measurements for each patient during the specified time windows (i.e. the pre-COVID and SARS-CoV-2+ phases). Histograms showing the number of measurements per patient in each time period for the selected tests are shown in Figures S5-S6. Given the directionality of these tests (i.e. anemia is defined by low hemoglobin and hematocrit, while kidney dysfunction is characterized by increases in serum creatinine and BUN but a decrease in eGFR), we were primarily interested in comparing the patient-level minimum values of hemoglobin, hematocrit, and eGFR, and maximum values of serum creatinine and BUN in each time period.
Consideration of potential confounding variables
As shown in Table 1, there were no statistically significant differences between these groups in age, relative cleared date (defined as the time to the first negative SARS-CoV-2 PCR test in a series of two consecutive negative tests after the last positive test), race, ethnicity, or sex. However, we did note that a higher fraction of PCNH cases were male as compared to PCH counterparts (58% vs. 52%). This potential confounding factor was addressed by performing (1) sex-split subgroup analyses (Tables 5-8) and (2) multivariate logistic regression (Tables 9-10; see Statistics below).
Although hospitalization during index infection was required for inclusion in both the PCH and PCNH cohorts, this criterion does not necessarily ensure comparable severities of index infection. To better assess potential differences in index infection severity, we compared the rates of ICU admission and found this to be significantly higher in the PCH cohort compared to the PCNH cohort (48/104 [46%] vs. 73/278 [26%], p=4.0×10−3; Table 1). Further, patients admitted to the ICU during index infection had slightly lower hemoglobin measurements during the SARS-CoV-2+ phase than patients not admitted to the ICU (cohen’s D = -0.27, p=0.01; Figure S7). Thus, we considered ICU admission as a potential confounding factor in our analyses. We addressed this by performing (1) subgroup analyses to determine whether differences between the PCH and PCNH cohorts were observed both in patients who were and were not admitted to the ICU (Figure S4) and (2) multivariate logistic regression (Tables 9-10; see Statistics below).
We observed that patients in the PCH cohort were more likely to experience anemia in both the pre-COVID and SARS-CoV-2+ phases than patients in the PCNH cohort. Because hospitalized patients can experience anemia due to repeated blood draws for laboratory testing, we also considered the number of blood draws per patient as a potential confounding variable. To address this, we performed multivariate logistic regression (Tables 9-10; see Statistics below).
Classification of patients using clinical diagnostic criteria for anemia and AKI
We classified patients in a binary fashion for each time window based on whether their lab tests were consistent with the clinical diagnosis of anemia or acute kidney injury. Classifications were defined according to the Mayo Clinic reference ranges for anemia and the KDIGO (Kidney Disease: Improving Global Outcomes) criteria for AKI (Khwaja, 2012) as follows (see also Figure 5A and Figure S3A):
Anemia (mild, moderate, or severe): for males, median hemoglobin < 13.5 g/dL or median hematocrit < 38.3%. For females, median hemoglobin < 12.0 g/dL or median hematocrit < 35.5%.
Anemia (moderate or severe): for both males and females, median hemoglobin < 10.0 g/dL.
Acute kidney injury (stage 1, 2, or 3): increase in serum creatinine by ≥0.3 mg/dL within 48 hours or an increase in serum creatinine to ≥1.5x the baseline value which is known or assumed to have occurred in the prior 7 days. The baseline was defined as the minimum value among all serum creatinine tests for the given patient in the prior 7 days.
Acute kidney injury (stage 2 or 3): increase in serum creatinine to ≥2x the baseline value which is known or assumed to have occurred in the prior 7 days, or a serum creatinine value of ≥4 mg/dL. The baseline was defined as the minimum value among all serum creatinine tests for the given patient in the prior 7 days.
Quantification of number of blood draws per patient
To test whether trends in anemia-related measurements could be explained by differences in the number of blood draws received in the pre-COVID phase or SARS-CoV-2+ phase, we counted the number of blood draws in these time intervals for each patient. All tests with a documented source of “Blood”, “Plasma”, or “Serum” were first collected for each patient. For a given patient on a given day, we then took the count of the most frequently obtained test as the number of blood draws for that patient on that day. For example, if the record for Patient P on Day D contained 5 serum sodium measurements, 3 hemoglobin measurements, and 1 plasma IL-6 measurement, then we inferred that Patient P received 5 blood draws on Day D.
Statistics
Laboratory values were assessed within each time interval as patient-wise medians, minima, or maxima. To perform statistical comparisons between the PCH and PCNH cohorts, one-sided Mann-Whitney U-tests and Cohen’s D were applied to continuous outcome measures, generating a p-value and an effect size measurement. The distribution of patient-wise median, minimum, and maximum values obtained for each laboratory measurement among this cohort were assessed with a Kolmogorov-Smirnov (KS) Test of Normality (Figures S8-S10). As these measurements did not follow a normal distribution (KS Test p-value < 0.05), the non-parametric Mann-Whitney U test was chosen for statistical comparisons. A one-sided test was used because these comparisons were performed as follow-up to our previous EHR-based analysis which found a higher prevalence of anemia and kidney injury in the PCH cohort (Pawlowski et al., 2020), providing a pre-supposed direction of change for each tested laboratory measurement. For each set of comparisons performed, p-values were corrected using a Benjamini-Hochberg (BH) correction for multiple hypothesis testing. Differences were considered statistically significant and biologically relevant if the BH-corrected p-value was ≤ 0.05 and the cohen’s D magnitude was ≥ 0.4. Fisher exact tests were applied to categorical outcome measures, generating a p-value and an Odds Ratio. All of the tests described above were applied using the SciPy package (Virtanen et al., 2020) in Python (version 3.5).
To address potential confounding variables that may be related to the observed trends in laboratory measurements, we performed multivariate logistic regressions for both the pre-COVID and SARS-CoV-2+ phases (Tables 9-10). For each regression, the binary dependent variable was defined as post viral clearance hospitalization status (i.e. assignment to the PCH cohort vs. PCNH cohort), and the independent variables included one anemia metric and one renal function metric along with sex (binary), number of blood draws (continuous) in the given time interval, and ICU admission status during that time interval (binary). Stated explicitly, the logistic regression equation was as follows: log(PPCH/(1 - PPCH)) = β0 + β1*(Anemia Metric) + β2*(Renal Function Metric) + β3*(Sex) + β4*(Blood Draw Count) + β5*(ICU Admission Status). In one set of logistic regressions, the anemia and renal function metrics employed were minimum hemoglobin (continuous) and maximum BUN (continuous), respectively (Table 9). In another set, the anemia and renal function metrics employed were diagnosis status for moderate or severe anemia (binary) and diagnosis status for stage 2+ AKI (binary), respectively (Table 10). Binary variables were assigned as follows: anemia: 0 = no anemia or mild anemia, 1 = moderate or severe anemia; AKI: 0 = no AKI or stage 1 AKI, 1 = stage 2+ AKI; sex: 0 = female, 1 = male; ICU admission during interval: 0 = not admitted to ICU, 1 = admitted to ICU. Of note, data regarding ICU admission status was not available prior to April 2020, so this feature was omitted from the pre-COVID regression analyses.
For each time interval, two regressions were performed: one using the continuous metrics for anemia and AKI, and one using the binarized metrics for anemia and AKI. We performed separate regressions for the continuous and binarized anemia and AKI terms rather than including all terms in one model because these metrics are strongly correlated with each other, and multicollinearity of independent variables can negatively impact the estimation of logistic regression coefficients. Each regression yielded a coefficient (log odds ratios) and p-value (calculated using the log likelihood ratio test) for each independent variable. P-values were adjusted within the output of each model using the Bonferroni correction: pBonf = 1-(1-p)n. Here, n corresponds to the number of independent variables tested (4 for pre-COVID models, 5 for SARS-CoV-2+ models). All regressions were performed using the Statsmodels package in python (Seabold and Perktold, 2010).
Study Approval
This retrospective study was reviewed and approved by the Mayo Clinic Institutional Review Board (IRB 20-003278) as a minimal risk study. Subjects were excluded if they did not have a research authorization on file.
Data Availability
After publication, the data will be made available to others upon reasonable requests to the corresponding author. A proposal with a detailed description of study objectives and the statistical analysis plan will be needed for evaluation of the reasonability of requests. Deidentified data will be provided after approval from the corresponding author and Mayo Clinic.
Author Contributions
PL, ER, AV, RM, ADB, and VS contributed to the study design and methodology. ER, GB, and JCO were responsible for data curation. ER performed the formal statistical analyses, which were reviewed by PL, AV, RM, and VS. PL drafted the manuscript with inputs from ER. AV, RM, JCO, ADB, WH, JH, and VS provided critical revisions, which were incorporated into the final manuscript by PL. All authors approved the submitted version of the manuscript.
Declaration of Interests
PL, ER, AV, GB, RM, and VS are employees of nference and have financial interests in the company. JO, WM, and JH have financial conflicts of interest in technology used in the research and with Mayo Clinic may stand to gain financially from the successful outcome of the research. ADB is a consultant for Abbvie, is on scientific advisory boards for nference and Zentalis, is founder and President of Splissen therapeutics, and has financial conflicts of interest in technology used in the research and with Mayo Clinic may stand to gain financially from the successful outcome of the research.
Supplemental Information
Acknowledgements
We thank Murali Aravamudan for his thoughtful review and feedback on this manuscript. We also thank Andrew Danielsen, Jason Ross, Jeff Anderson, and Sankar Ardhanari for their support that enabled the rapid completion of this study. The authors acknowledge funding from nference for this study.
Footnotes
Updated manuscript with a larger and more up-to-date dataset that more strongly confirms the papers originals conclusions