ABSTRACT
Objective To derive a predicted probability of death (PDeathLabs) based upon complete value sets for 11 clinical measurements (CM) obtained on patients prior to their diagnosis of COVID-19. PDeathLabs is intended for use as a summary metric for baseline metabolic status in multivariate models for COVID-19 death.
Methods Cases were identified through the COVID-19 Shared Data Resource (CSDR) of the Department of Veterans Affairs. The diagnosis required at least one positive nucleic acid amplification test (NAAT). The primary outcome was death within 60 days of the first positive test. We retrieved all values for systolic blood pressure (SBP), diastolic blood pressure (DBP), O2 saturation (O2SAT), body mass index (BMI), estimated glomerular filtration rate (EGFR), alanine aminotransferase (ALT), serum albumin (ALB), hematocrit (HCT), LDL cholesterol (LDL) hemoglobin A1c (A1C), and HDL cholesterol (HDL) if they were done at least 14 days prior to the NAAT. Clinicians evaluate several attributes of CM that are of critical importance: metabolic control, disease burden, chronicity, refractoriness, tendency to relapse, temporal trends, and lability. We derived 1-3 parameters for each of these attributes: the most recent value (metabolic control); time-weighted average and abnormal area under a severity versus time curve (disease burden); time and number of readings above or below goal (chronicity); longest abnormal cluster and time/number of consecutive readings above goal if the last value was abnormal (refractoriness); number of abnormal clusters (tendency to relapse); long- and short-term changes (temporal trends); and coefficient of variation and mean deviation between consecutive readings (lability). We created computer programs to derive cumulative values for these 13 parameters for all 11 CM as each new value is added. A fitted logistic model was developed for each CM to determine which of the 13 parameters contributed to the risk of death. A main logistic model was developed to determine which of the 13 * 11 = 143 metabolic parameters were independently predictive of death. The resulting model was used to derive PDeathLabs for each patient and the area under its receiver operating characteristic (ROC) curve calculated. Single variable logistic models were also derived for age at diagnosis, the Charlson 2-year (Charl2Yr) and lifetime (CharlEver) scores, and the Elixhauser 2-year (Elix2Yrs) and lifetime (ElixEver) scores. Stata was used to compare the ROCs for PDeathDx and each of the other metrics.
Results On September 30, 2021 there were 347,220 COVID-19 patients in the CSDR. 329,491 (94.9%) patients had CM performed at least 14 days prior to the COVID-19 diagnosis and form the basis for this report. 17,934 (5.44%) died within 60 days of the diagnosis. On the subset regressions, the number of significant parameters ranged from all 13 for SBP to 7 for HDL. 239,393 patients had complete sets of data for developing the main model. Of 143 candidate predictors, 49 parameters were identified as statistically significant, independent predictors of death. The most influential domains were the most recent value, disease burden, temporal trends, and tendency to relapse. The ROC area for PDeathLabs was 0.785 +/- 0.002. No difference was found in the ROC areas of PDeathLabs and age at diagnosis (0.783 +/- 0.002; P = NS). However, the ROC area for PDeathLabs was significantly greater than that of Charl2Yrs (0.704 +/- 0.002; P < 0.001), CharlEver (0.729 +/- 0.002; P < 0.001), Elix2Yrs (0.675 ± 0.002; P < 0.001), and ElixEver (0.707 +/- 0.002; P < 0.001). A poor prognosis was found for chronic systolic hypertension. On the other hand, a higher BMI was protective once SBP, DBP, HDL, LDL and A1C were considered.
Conclusions Our study confirms that parameters derived for 11 CM are significant determinants of COVID-19 death. The most recent value should not be selected over other parameters for multivariate modeling unless there is a physiologic basis for doing so. PDeathLabs has the same discriminating power as age at diagnosis and outperforms comorbidity indices as a summary metric for pre-existing conditions. If validated by others, this approach provides a robust approach to handling CM in multivariate models.
INTRODUCTION
Many mathematical models have been developed for predicting outcomes of COVID-19 infection (1-5). They have gained popularity because they are very useful for managing patients and allocating scarce resources. Most are composed of demographic traits, co-morbidity scores, and pre-existing conditions grouped under headings like “malignancy”. Clinical measurements (CM) have not been prominent features of these models. CM provide valuable information about the metabolic status of patients at baseline and include vital signs and routine laboratory tests. Some models have none (1, 5), while others include only one (usually the most recent) of the hundreds of measurements that might be available for each type (2-4). However, there are convincing arguments to expand the use of CM in these prediction rules:
a) A diagnosis informs us that the patient has a certain disease, while CM tell us the burden imposed by its risk factors or the disease itself. The two are not synonymous. For example, one patient with hypertension might have had severely elevated blood pressures (BPs) for decades, while another might be normotensive because of successful treatment. The first would be considered at high risk while the second is “normal”. On the other hand, a patient may not have been given the diagnosis of hypertension even though her BPs are abnormal for a term pregnancy. Classification errors such as these are very common in clinical practice.
b) CM provide some insight into the mechanism of injury. For example, suppose chronic hypoxia is more closely associated with death than acute hypoxia. This observation suggests that cumulative injury over time (such as with secondary pulmonary hypertension) is more influential than the acute metabolic burden imposed by low oxygen levels at presentation.
c) CM add explanatory variables to a prediction model. These components are critical for identifying targets for interventions to reduce the risk. For example, causality might be inferred by a dose-response effect between the CM and risk of the death. The model becomes “hypothesis-generating” and could lead to other studies confirming the mechanism of injury or clinical trials to reduce the risk. Prediction rules rarely define causality in a way that points to a further course of action.
CM are routinely included in periodic health examinations and available for testing in prediction models. They include systolic (SBP) and diastolic blood pressure (DBP), oxygen level (O2Sat), body mass index (BMI), estimated glomerular filtration rate (EGFR), alanine aminotransferase (ALT), serum albumin (ALB), hematocrit (HCT), hemoglobin A1c (A1C), and low- (LDL) and high-density lipoprotein cholesterol (HDL). The major problem is the lack of a systematic approach to selecting the appropriate parameter for each type. There is no evidence that the most recent value of any type is the most influential measurement even though it is common practice to include them in models as a representative value. For example, if microvascular disease mediates the risk posed by diabetes, the most recent A1c is the least suitable measure of tissue glycation. The purpose of this manuscript is to validate a method for summarizing the effects of hundreds of measurements done on each patient for each CM and to evaluate the effects of these parameters on prognosis.
Our approach emulates the way that clinicians interpret CM to reach conclusions about the patient’s metabolic status. Unlike prediction rules, they rarely interpret individual values for CM out of context. In fact, conclusions are often based upon a review of neighboring values or even other domains of the medical record. Moreover, clinicians often render judgements about several attributes when assessing a CM. These attributes include metabolic control, chronicity, disease burden, temporal changes, refractoriness, tendency to relapse, and lability. Each concept is independent of the others and has clinical implications of its own. For example, a diabetic patient may have an A1C that meets a metabolic target, but the entire value set could represent a substantial glycemic burden, indicate a tendency to relapse, and demonstrate a steep, upward trend. In our study, a parameter is a variable synthesized from the complete value set for a CM that represents one attribute for that CM. To reach conclusions about metabolic status requires extensive processing of all data for a CM, deriving parameters for each attribute, and testing the association between derived parameters and COVID-19 death. Our computer programs not only derive these parameters but also generate a running summary whenever a new value is added to the set. Thus, the patient’s status relative to any CM can be assessed in detail at any point in time.
METHODS
Cases were identified through VA’s COVID-19 Shared Data Resource (CSDR). Membership in this registry requires at least one positive nucleic acid amplification test. The primary outcome was death within 60 days of the diagnosis. The outcome was retrieved from the CSDR, which assigns a 1 to those who died and 0 otherwise. Data domains in the Corporate Data Warehouse (CDW) were interrogated for all measurements entered ≥ 14 days before the diagnosis of COVID-19. This precaution excludes readings that may have been taken during the pre-symptomatic phases of illness. We retrieved SBP, DBP, O2Sat, heights and weights from the file Vital.VitalSign. The latter were used to calculate BMI. We also retrieved EGFR, ALB, HCT, ALT, A1C, HDL and LDL from the file Chem.PatientLabChem. The following parameters were derived for each of the 11 CMs (above):
a) Metabolic control requires that the measurement be compared to an external standard; that both be standardized to a common scale; that the value is timely, that it has not already been treated, that enough time has elapsed since the last treatment change to reach a plateau value; and that it is not rapidly increasing or decreasing. Metabolic control is reflected in the most recent value (Value1). Our programs provide the option of raw or standardized values and require the user to specify a treatment target. They can be modified to exclude remote values but, for now, include all Value1s in the medical record regardless of their timeliness. No attempt was made to assess prior treatments. Temporal changes are handled by another module.
b) Chronicity is defined as the total number of days or measurements above target. Each reading was paired with the preceding one. Days between successive abnormal readings were considered above goal; days between normal readings as at goal; days between a normal and abnormal reading as worsening; and days between an abnormal and normal reading as improving. Days above goal were summarized across all successive pairs in the patient’s record (FUDaysAboveGoal). NumAboveGoal is the total number of measurements exceeding the target value in the patient’s record.
c) Disease burden is the degree to which an abnormality has caused end-organ damage. For chronic diseases, it is almost always a function of severity and exposure time and expressed as the area under the severity * time curve (AUC). This concept applies to SBP, DBP, BMI, A1c, LDL and LDL. Our software uses a simple trapezoidal estimation technique without interpolation, extrapolation, or curve-fitting and includes all readings from the first to last value. AUC can be large just because the patient is elderly. To eliminate this possibility, our software derives AUC only above the target specified by the user (AbnAUC). Time-weighted averaging eliminates the sampling bias that arises from a tendency to measure a CM more frequently when it is abnormal. The raw average is an over-estimate of severity when abnormal readings are more closely spaced than normal ones. TimeWtAvg is defined as: AUC/FUDays
d) Refractoriness is the repeated failure to achieve goal and is represented by consecutive abnormal values (a “cluster”). Each cluster begins with a change from normal to abnormal value and terminates with the next normal value. The assumption is that an abnormal reading should have triggered an intervention that normalized the measurement. Refractoriness is worse when there are many readings in the cluster, or its duration is long. It can result from the failure to treat, failure of the patient to adhere, failure of the disease to respond, or intolerable side-effects. Our program defines the number of clusters for each patient (NumCluster) and a start date, consecutive days above goal (TimeAboveGoal), and consecutive readings above goal (CtAboveGoal) for each cluster. The least refractory episode has a CtAboveGoal of one, and refractoriness increases as CtAboveGoal increases. TimeAboveGoal and CtAboveGoal are relevant only for patients with an abnormal CM at the time of COVID diagnosis. The term MaxClustDays refers to the longest cluster in the patient’s medical record.
e) Relapses are the occurrence of abnormal values after being at goal. They can occur once or many times. Relapses are often caused by a steady progression in disease severity (like diabetes) which requires multiple titrations of treatment. The number of relapses is represented by NumCluster. The first abnormal cluster is given the value of one – i.e., the person has deteriorated from a previously normal state. Patients with a tendency to relapse have larger values for NumCluster. Relapse and refractoriness have different clinical implications. For example, a person can relapse many times, respond quickly to each intervention, and spend little time above goal. On the other hand, a person with refractoriness can have just one cluster that lasts for years.
f) Lability refers to variations in a CM from one reading to the next. Spontaneous variation can occur when the test is imprecise or when there are fluctuations in the CM itself. Test imprecision may be due to changes in conditions of testing (often behavioral) or variability in the assay. Results of these tests should be confirmed by repeated sampling. On the other hand, physiologic variability results from a complex interplay among multiple control mechanisms and is important in several conditions. For example, labile hypertension is a well-known clinical problem. It has also been shown that hypoglycemia in diabetes results from spontaneous variation in glucose levels and can only rarely be explained by patient behaviors. The coefficient of variation (CoeffVar) for each CM is derived from the grand mean (GrandMean) and standard deviation (GrandStD) for all values in the patient record and given by the expression GrandStD/GrandMean. MeanValDiff refers to the average of the absolute difference between all consecutive pairs.
g) Temporal trends refer to changes over time from one value to another and may be long- or short-term. NetChange is a measure of long-term trend and refers to the difference in the most recent and first value for each CM. The software generates the mean of the 3 lagging values for each measurement (Lag3Mean). Lag3Dev refers to the fractional deviation of the current value from Lag3Mean and is given by the expression: (Value1 – Lag3Mean)/Lag3Mean.
Statistical methods
In summary, we analyzed the following 13 parameters for each CM: Value1, FUDaysAboveGoal, NumAboveGoal, AbnAUC, TimeWtAvg, NumClust, TimeAboveGoal, CtAboveGoal, MaxClustDays, CoeffVar, MeanValDiff, NetChange, and Lag3Dev. These values were extracted for each patient from the running summary associated with the most recent value. The default was to use all available readings in the medical record so that metabolic status was defined over the patient’s lifetime.
Because there are 4 vital signs and 7 laboratory tests of interest, the list of covariates consists of 13 * 11 = 143 items. Group differences in categorical variables were tested by chi-square analysis. Group differences in continuous variables were analyzed by student’s t-test and Mann-Whitney U-test. Because there were so many comparisons, univariate analysis is not included in this report but is available upon request.
For each CM, a subset logistic regression was done to identify which of the 13 parameters was predictive of death. A model was fitted to all patients with complete data. A variable was considered significant if the P-value associated with the coefficient was < 0.05. The subset model was then used to assign a predicted probability of death to each patient. Receiver operating characteristic curve (ROC) analysis was done to determine if that predicted probability discriminated between those who live and died.
All 143 variables were used to generate a main predictive model. Set-wise logistic regression was applied to all patients with a complete data set. The dependent variable was death within 60 days of diagnosis. Modeling started with the set of most recent values. Sets related to different criteria were added in succession, and previous variables were removed if they became insignificant. A variable was considered significant if the P-value associated with its coefficient was < 0.05. The final model was used to assign a predicted probability of death (PDeathLabs) to each patient based upon all CM. Again, ROC analysis was used to determine if PDeathLabs was able to identify those who died.
RESULTS
On September 30, 2021, there were 347,220 COVID-19 patients in VA’s COVID-19 Shared Data Resource. Of these, 329,491 patients (94.9%) had CM performed at least 14 days prior to the COVID-19 diagnosis and form the basis for this report. The mean age at the time of diagnosis was 59.1 ± 16.6 years; 85.5% were male; 23.4% were members of a racial minority; 9.2% were Hispanic; 96.4% were veterans; 0.7% were on supplemental oxygen; and 12.2% were current smokers. 9.3% had been fully vaccinated at least 14 days prior to the COVID-19 diagnosis. 21.6% acquired their infections after July 1, 2021 and were presumed to have the delta variant. Overall, 17,934 patients (5.44%) died within 60 days of their diagnosis.
Subset regressions were done for each CM to identify which of the 13 parameters were determinants of the outcome (Appendix). The model did not converge for ALT. As expected, single CM did not distinguish between surviving and dying patients very well, although some ROCs were close to those of co-morbidity scores. Nevertheless, the number of significant parameters ranged from all 13 for SBP to 7 for HDL. Value1 was significant for 7 CM, FUDaysAboveGoal for 9, NumAboveGoal for 9, AbnAUC for 9, TimeWtAvg for 9, NumClust for 10, TimeAboveGoal for 6, CtAboveGoal for 7, MaxClustDays for 5, CoeffVar for 10, MeanValDiff for 10, NetChange for 9, and Lag3Dev for 7.
239,393 patients (or 70.5% of the cohort) had complete sets of data for developing the main model. Table 1 shows the components of this model. Of 143 candidate predictors, 49 parameters were identified as statistically significant and independent predictors of death. The most influential domains were the most recent value, disease burden, temporal trends, and tendency to relapse. The main model was used to calculate a predicted probability of death (PDeathLabs) for each subject. The ROC area for PDeathLabs was 0.785 ± 0.002. For comparison, single factor logistic models were developed for age at diagnosis, the 2-year Charleson Co-morbidity Index score (Charl2Yrs), lifetime Charleson score (CharlEver), the Elixhauser 2-year score (Elix2Yrs), and lifetime Elixhauser score (ElixEver). Their predicted probabilities were used to derive an ROC area for each. No difference was found in the ROC areas of PDeathLabs and age at diagnosis (0.783 ± 0.002; P > 0.05). However, the ROC area for PDeathLabs was significantly greater than that of Charl2Yrs (0.704 ± 0.002; P < 0.001), CharlEver (0.729 ± 0.002; P < 0.001), Elix2Yrs (0.675 ± 0.002; P < 0.001), and ElixEver (0.707 ± 0.002; P < 0.001). Thus, baseline metabolic measurements outperform co-morbidity scores for pre-existing conditions in predicting COVID-19 deaths.
DISCUSSION
In this manuscript, we propose a novel method for handling pre-existing findings from vital signs and laboratory tests in models predicting COVID-19 death. Our observations confirm that they should be included because they have independent prognostic significance, generate insights into the mechanism of action, and may even be targets for interventions to mitigate the risk. By including explanatory variables, models can become hypothesis-generating and lead to future studies validating the suspected pathogenesis or trials of interventions targeting the abnormality itself.
All 13 parameters were identified as predictors of death in the subset models. These observations provide some justification to the approach that clinicians use to evaluate metabolic status. It is comforting to know that our parameters and the approach used in clinical practice converge upon a common set of criteria. Our main model showed that many parameters besides the most recent value were significant contributors to COVID-19 death. For 2 CM, disease burden contributed to the prediction while the most recent value did not. For 6 others, both current metabolic control and the long-term parameters were independently predictive of death. This finding suggests that CM have chronic and acute effects on the likelihood of recovery from COVID-19 infection and that the former should routinely be considered for prediction models. Our results confirm the importance of CM parameters for COVID-19 prognosis, validate the attributes that they represent, and point out the hazards of using the most recent value as the most representative one.
This study illustrates the complex interactions that become apparent when multiple CM are studied in context. Higher values for O2Value1 and O2TimeWtAvg in our main model independently decreased the risk of death. This observation implies that the metabolic consequences of acute hypoxia and the burden imposed by chronic hypoxia create a substrate for COVID-19 injury. The latter may have done so through secondary pulmonary hypertension or may have simply been a marker for severe lung disease. Of 4 BP parameters, only SBPTimeWtAvg was a risk factor for COVID-19 death. DBPValue1 and DBPTimeWtAvg were protective. This paradoxical effect might be explained by the fact that, for a given level of risk posed by SBP, a higher DBP is associated with a lower pulse pressure, reduced systolic wall stress, and better coronary perfusion. In a previous study, we showed that hypertension is the most important independent risk factor for COVID-19 death (Murata GH, doi pending). This analysis suggests that the most important mechanism is long-term elevation of SBP. BMI and obesity are considered risk factors for COVID-19 death (2, 5). However, we found that higher values for BMIValue1 in the main model lowered the risk of death. This discrepancy might be explained by the fact that we controlled for the mediators of injury in obesity (SBP, DBP, A1C, HDL, and LDL). Once these factors are considered, a higher BMI may only be a marker of better nutrition. This finding illustrates the hazards of drawing conclusions from incompletely specified models. Serum albumin and hematocrit are indicators of nutritional status or catabolism and are powerful predictors of recovery from acute illness. On the other hand, they do not produce cumulative organ injury over years. Accordingly, ALBValue1 and HCTValue1 were included in the main model but not their long-term parameters. A high value for A1CValue1 was identified as a risk factor while A1CTimeWtAvg and A1CAbnAUC were not. Thus, recent hyperglycemia is deleterious in COVID-19 infection but microvascular injury developing over decades may not play that much of a role. Likewise, HDLValue1 had a protective effect while the long-term parameters did not. Finally, we found that time-weighted averages and abnormal AUC were both included as measures of disease burden for EGFR, ALT, LDL, and O2SAT. For the last 3, the effects were in opposite directions. These results imply that abnormal AUC provides information about risk beyond the time weighted average. This study shows that there is no justification for selecting one arbitrary parameter of a CM for modeling over the others unless there is a physiologic basis for doing so.
We also found that temporal instability in a CM – whether reflected in long- or short-term trends, coefficient of variation, or tendency to relapse – had significant effects on prognosis. The direction and magnitude of this effect varied from CM to CM and, at times, were unexpected. Future studies should be done to identify the mechanisms by which fluctuations in CM affect recovery from COVID-19 infection. Until then, it is reasonable to include temporal changes of CM in models of COVID-19 death. This recommendation is consistent with the way that clinicians prioritize treatment of abnormal values. For example, many would consider a patient whose A1c rapidly increased from 5.0 to 8.5 to have a poorer prognosis than one whose A1c decreased from 11.0 to 9.0.
As expected, measures of chronicity added little to the model once disease burden was considered. The minimal contribution from refractoriness suggests that clustering adds very little to the prognosis compared to the same abnormalities scattered over the patient’s timeline. Nevertheless, it remains a useful clinical concept because it identifies patients for whom further treatment may be futile.
We calculated a predicted probability of death (PDeathLabs) based upon complete value sets for 11 CM. The ROC area indicated that vital signs and laboratory findings are powerful predictors of outcomes. In fact, the ROC areas for PDeathLabs and age at diagnosis were equivalent, while PDeathLabs was consistently superior to Charl2Yrs, CharlEver, Elix2Yrs, and ElixEver. In summary, this term was a very convenient way to summarize millions of observations for CM done on hundreds of thousands of patients. Its greatest utility would be as a covariate in a parent model containing many other domains.
Of course, our conclusions are limited to patients with characteristics like the veteran population. Moreover, this study was confined to patients with chronic illnesses that required long-term follow-up and periodic health screening. It is unclear if our conclusions are applicable to patients without such indications. Further studies should be done on other populations and disease states before the method should be widely applied. If validated by others, our method could revolutionize the way in which clinic measurements are handled in multivariate models.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.