## Abstract

Given a lifetime risk of ~90% by the ninth decade of life, it is unknown if there are true controls for hypertension in epidemiological and genetic studies. Here, we use the Bayesian framework to compare logistic and time-to-event approaches to modeling hypertension. Using a proportional hazards model, we explored nonparametric and parametric models of the baseline hazard function, accounting for interval censoring. In the Howard University Family Study (HUFS), a population-based study of African Americans from Washington, D.C., the median age at hypertension was 48 years, baseline hazard rates increased with age until 55 years, and the probability of being free of hypertension at 85 years of age was 12.2%. In the nationwide NHANES study, the median age at hypertension was 42 years in African Americans, in contrast to 57 years in European Americans and 56 years in Mexican Americans. Baseline hazard rates increased with age until 58 years in African Americans, comparable to 60 years in European Americans and 58 years in Mexican Americans. The probability of being free of hypertension at 85 years of age was 8.4% in African Americans, in contrast to 21.4% in European Americans and 20.6% in Mexican Americans. In all four groups, baseline hazard rates decreased but did not reach zero, consistent with the nonexistence of controls. Model fits were comparable for a proportional hazards model based on gamma-distributed hazard rates under a correlated prior process and a logistic model adjusted for age and age^{2}. Last, using an agnostic model screening approach of 38 potential covariates, we identified and replicated in all groups a model that included chloride, low-density lipoprotein cholesterol, uric acid, and weight. With age and age^{2}, the variance explained by these covariates was 40.9% in African Americans, 34.8% in European Americans, and 28.3% in Mexican Americans. Taken together, modeling of the baseline hazard function of hypertension suggests that there are no true controls and that controls in logistic regression are cases with a late age of onset. These findings shed considerable insights into the design of genetic and epidemiological studies of hypertension with implications for ethnic health disparities.

## Introduction

Hypertension, or abnormally high blood pressure, is common in the US, affecting approximately 45% of adults (Centers for Disease Control and Prevention, 2018). Hypertension is a risk factor for heart disease and stroke and causes or contributes to nearly half a million deaths a year (Centers for Disease Control and Prevention, 2019). Globally, an estimated 1.13 billion people have hypertension, and less than 20% of these people have their blood pressure under control (World Health Organization, 2019).

Systolic blood pressure (SBP) has a general tendency to increase linearly with age, across sexes and ethnic groups (Burt et al., 1995). Diastolic blood pressure (DBP) has a general tendency to increase until the end of the fifth decade of life, after which DBP either stabilizes or decreases, again across sexes and ethnic groups (Burt et al., 1995). In the Framingham Heart Study, an individual who is normotensive at 55 to 65 years of age has an 80% to 90% residual lifetime risk of developing hypertension, adjusted for competing causes of mortality (Vasan et al., 2002). Compared to age-matched European Americans, hypertension in African Americans develops at an earlier age and is more prevalent (Chobanian et al., 2003; Cooper, Liao, & Rotimi, 1996; Mozaffarian et al., 2016).

A common approach in genetic epidemiology studies of hypertension involves coding the outcome as a binary variable representing cases and controls and proceeds with logistic regression. Given that the lifetime risk is so high, we first investigated whether a proportional hazards model in time-to-event analysis yields a better fit than logistic regression. Second, as time-to-event analysis assumes that the event will occur, *i.e*., that every individual will become hypertensive if they live long enough, we investigated a proportional hazards model including a fraction of individuals that will never become hypertensive and hence are true epidemiological controls. Third, using an agnostic model screening approach, we explored the issue of what covariates to include and how much variance they explain. We performed these analyses in an observational study of African Americans and then replicated and extended our findings in a nationally representative study of African Americans, European Americans, and Mexican Americans.

## Materials and Methods

### Discovery study

The Howard University Family Study (HUFS) is a population-based observational study of African American families and unrelated individuals from Washington, D.C. (Adeyemo et al., 2009). Ethics approval for the Howard University Family Study (HUFS) was obtained from the Howard University Institutional Review Board and written informed consent was obtained from each participant. All clinical investigation was conducted according to the principles expressed in the Declaration of Helsinki. Families and individuals were not ascertained based on any phenotype. Weight was measured on an electronic scale to the nearest 0.1 kg. Height was measured on a stadiometer to the nearest 0.1 cm. Body mass index (BMI) was calculated as weight divided by the square of height (kg/m^{2}). Waist circumference was measured to the nearest 0.1 cm at the narrowest part of the torso. Hip circumference was measured to the nearest 0.1 cm at the widest part of the buttocks. The waist-hip ratio was calculated as waist circumference in cm divided by hip circumference in cm. Fat mass and fat-free mass were estimated using bioelectrical impedance analysis with a validated population-specific equation as previously described (Luke et al., 1997). Percent fat mass was defined as fat mass divided by weight ×100. Blood pressure was measured while seated using an oscillometric device (Omron Healthcare, Inc., Bannockburn, Illinois). Three readings were taken at 10-minute intervals. Reported readings were the averages of the second and third readings. Hypertension was defined as SBP ≥ 140 mm Hg, DBP ≥ 90 mm Hg, or treatment with anti-hypertensive medication. Blood was drawn after an overnight fast of at least 8 hours and all collected samples were stored at -80°C pending biochemical assay. Creatinine, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, fructosamine, glucose, alkaline phosphatase, alanine aminotransferase, total bilirubin, sodium, potassium, chloride, calcium, uric acid, urea, C-reactive protein, albumin, bicarbonate, and total protein were measured using COBAS INTEGRA tests (Roche Diagnostics, Indianapolis, Indiana). Cortisol and insulin were measured using Elecsys assays (Roche Diagnostics). Creatinine clearance was calculated using the Cockcroft-Gault equation and the estimated glomerular filtration rate (eGFR) was calculated using the 4-variable Modification of Diet in Renal Disease Study equation (National Kidney Foundation, 2002; Levey et al., 1999). T2D case status was defined as fasting plasma glucose level ≥ 126 mg/dL or treatment with anti-diabetic medication. T2D control status was defined as fasting plasma glucose ≤ 100 mg/dL and no treatment with anti-diabetic medication. The last completed grade of education and income were self-reported on a questionnaire. The proportions of African and European ancestry were estimated as described previously (Shriner, Adeyemo, & Rotimi, 2011). We extracted a subset of 1,014 unrelated individuals.

### Bayesian logistic regression and time-to-event analysis

Let *T* represent the time of an event and *S* (*t*) = Pr (*T > t*) represent the survival function, *i.e*., the probability of being event-free as a function of observation time *t*. With cross-sectional data, there is a single observation for each individual. If the individual has not yet experienced the event, then the event time is right censored, because the event is presumed to occur some unknown time after observation. If the individual has already experienced the event, then the event time is left censored, because the event occurred at some unknown time prior to observation. The combination of left and right censored data is known as interval censored data. We performed interval censored proportional hazards analysis using the R package icenReg.

To perform Bayesian modeling, we used WinBUGS, version 1.4 with the R package R2WinBUGS. We modeled logistic regression as:

In this reduced model, the prior distribution for the intercept *α* follows a diffuse normal distribution with mean 0 and variance 10^{6}. We ran three chains of 10,000 iterations, with a burn-in of 2,500 iterations and thinning of 10, yielding a posterior sample based on 2,250 iterations. We assessed convergence using the potential scale reduction factor Rhat, which should equal 1 at convergence for all monitored parameters. We assessed model fit using the deviance information criterion (DIC). We then added age to the reduced model and ran three chains of 10,000 iterations, with a burn-in of 2,500 iterations and thinning of 10, yielding a posterior sample of 2,250 iterations. Next, we added age and age^{2} to the reduced model and ran three chains of 100,000 iterations, with a burn-in of 10,000 and thinning of 50, yielding a posterior sample of 5,400 iterations. In all instances, effect sizes followed a diffuse normal prior distribution with mean 0 and variance 10^{6}.

We performed time-to-event analysis using a proportional hazards model (Congdon, 2003). The hazard function *h*(*t*) defines the instantaneous risk of the event at time *t*, conditional on being event-free at that time:

Given the hazard function *h*(*t*), the cumulative hazard function *H*(*t*) and the survival function *S*(*t*) are defined as follows:

In Cox’s proportional hazards model (Cox, 1972), the hazard function *λ*(*t*| *z*) is given by *λ*_{0}(*t*)*e ^{βz}*, in which

*λ*

_{0}(

*t*) is the baseline hazard function and z is a covariate with coefficient p. In the absence of covariates, the hazard function is equivalent to the baseline hazard function,

*λ*(

*t*)

*= λ*

_{0}(

*t*). We modeled the hazard rates as gamma-distributed with a correlated prior process (Arjas & Gasbarra, 1994; Congdon, 2006). Specifically, the hazard rates were:

In this model, the expected value of *λ*_{0}(*t+*1) equals *λ*_{0}(*t*) and hence the hazard function is a martingale. We divided time into intervals of one year and assumed that the baseline hazard was constant within intervals.

A major assumption of time-to-event analysis is that the event will occur for every individual at some point in time, although the individual may die before the event would have occurred. This assumption can be relaxed by incorporating a permanent stayer or cured fraction, *i.e*., a fraction of individuals who will never experience the event. In our context, this fraction represents individuals who never develop hypertension and hence are true epidemiological controls. Let *π* represent the permanent stayer fraction. Then, the survival function for the entire population *S _{p}*(

*t*) is the two-component mixture model

*S*(

_{p}*t*) =

*π*+(1

*−π*)

*S*(

*t*) (Gu, Sinha, & Banerjee, 2011). We assigned to

*π*the prior distribution Uniform(0,1).

### Model selection

We tested 38 covariates for inclusion in the model: sex; weight, height, hip circumference, waist circumference, waist-hip ratio, body mass index; fat mass, fat-free mass, percent fat mass; type 2 diabetes status, fasting glucose, fasting insulin, fructosamine; triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, total cholesterol; creatinine, creatinine clearance, estimated glomerular filtration rate; alkaline phosphatase, alanine aminotransferase, total bilirubin; last grade of education completed, income; percent African ancestry; calcium, chloride, potassium, sodium; albumin, carbon dioxide, C-reactive protein, total protein, uric acid, urea, and cortisol. We used a four-step forward-backward regression procedure to perform model selection. First, we fit a single regression model for each covariate. Second, we fit a multiple regression model with all significant predictors from step 1 and used backward selection to remove nonsignificant predictors. Third, starting with the final model from step 2, we reconsidered each nonsignificant covariate from step 1 using forward selection. Fourth, we performed a final pruning step on the final model from step 3. For every test, we declared a significance level of 0.05. Pseudo-*r*^{2} values were estimated using the formula , in which *D*_{1} is the deviance, *D*_{0} is the null deviance, and *n* is the sample size (Nagelkerke, 1991). Finally, we added age, age^{2}, and selected covariates to the Bayesian logistic regression model described above and ran three chains of 1,000,000 iterations, with a burn-in of 100,000 iterations and thinning of 1,000, yielding a posterior sample of 2,700 iterations. Effect sizes for all covariates followed a diffuse normal prior distribution with mean 0 and variance 10^{6}.

### Replication study

The National Center for Health Statistics of the US Centers for Disease Control and Prevention conducts the ongoing National Health and Nutrition Examination Survey (NHANES). The survey comprises an in-home interview and a clinical examination by a mobile examination center. We retrieved 16 years of examination data (from 1999 to 2014) from the CDC portal (http://www.n.cdc.gov/Nchs/Nhanes). We downloaded the variables BMXWT, BPQ020, BPQ040A, BPXDI1, BPXDI2, BPXDI3, BPXDI4, BPXSY1, BPXSY2, BPXSY3, BPXSY4, LBDLDL, LBXIN, LBXSCLSI, LBXSKSI, LBXSUA, RIDAGEEX, RIDRETH1, SDVMVPSU, SDMVSTRA, and WTSAF2YR. SBP was defined as the average of BPXSY1, BPXSY2, BPXSY3, and BPXSY4. DBP was defined as the average of BPXDI1, BPXDI2, BPXDI3, and BPXDI4. Hypertension was defined as SBP ≥ 140 mm Hg, DBP ≥ 90 mm Hg, treatment with anti-hypertensive medication, or having ever been diagnosed by a doctor. Strata, clusters, and weights were designed to make statistical estimates representative of the noninstitutionalized, civilian US population. We included the strata variable SDMVSTRA as a factor. Clusters were nested within strata but were ultimately omitted because the data were too sparse after including only those individuals with fasting samples. To account for eight survey cycles, we multiplied the weights WTSAF2YR equally by 1/8. For each value of RIDRETH, we rescaled the weights by dividing by the mean. For the survey cycle 2013–2014, we recalibrated insulin to account for changes in the protocol: (https://www.n.cdc.gov/Nchs/Nhanes/2013-2014/INS_H.htm). Across the eight survey cycles, we retrieved data for a total of 23,628 participants, including 5,146 African Americans (“non-Hispanic Blacks”), 10,023 European Americans (“non-Hispanic Whites”), and 5,059 Mexican Americans.

### Code Availability

WinBUGS code is available at https:// github.com/dshriner/Time-to-event.

## Results

### Time-to-event analysis

Across African Americans, European Americans, and Mexican Americans, median systolic blood pressure increased with age whereas median diastolic blood pressure increased and then decreased (Figure 1). In time-to-event analysis, the probability of not having hypertension decreased across the entire age range (Figure 2). However, in all three groups, there was an inflection point in middle age after which the probability of having hypertension increased at a slower rate (Figure 2). Despite this slowdown, the probability of not having hypertension was not zero, even by the middle of the ninth decade (Figure 2). In HUFS, by 85 years of age, 12.2% (95% credible interval (CI) [5.1%, 20.6%]) of African Americans remained free of hypertension (Figure 2). In NHANES, by 85 years of age, 8.4% (95% CI [5.4%, 11.6%]) of African Americans, 21.4% (95% CI [18.1%, 24.6%]) of European Americans, and 20.6% (95% CI [13.6%, 27.3%]) of Mexican Americans remained free of hypertension (Figure 2). The median age at which hypertension occurred was 48 (95% CI [45, 50]) years for African Americans in HUFS and 42 (95% CI [40, 44]) years for African Americans, 57 (95% CI [55, 59]) years for European Americans, and 56 (95% CI [54, 58]) years for Mexican Americans in NHANES (Figure 2).

We next investigated seven distributions, six parametric and one nonparametric, for the baseline hazard function in a proportional hazards model. Generalized gamma, loglogistic, and exponential distributions all yielded poor fits to the HUFS data (Figure 3). Log-normal, gamma, and Weibull distributions all yielded good fits to the HUFS data (Figure 3), with the log-normal distribution yielding the highest likelihood of the six parametric distributions (Table 1). Overall, a nonparametric hazard function yielded the highest likelihood (Table 1). However, we could not perform model comparison between the log-normal distribution of the baseline hazard function and the nonparametric hazard function in the frequentist framework because it is unclear how to measure the dimensionality of the nonparametric hazard function.

Based on a Bayesian model in which the hazard rates were gamma distributed with a correlated prior process, we estimated the underlying hazard rates. Across African Americans, European Americans, and Mexican Americans, hazard rates increased and then decreased (Figure 4). The initial hazard rate for African Americans (0.0123 (95% highest posterior density interval [0.0091, 0.0161])) was larger than the initial hazard rate for both European Americans (0.0028 (95% highest posterior density interval [0.0013, 0.0044])) and Mexican Americans (0.0036 (95% highest posterior density interval [0.0019, 0.0052])). The maximum hazard rate for African Americans (0.0630 (95% highest posterior density interval [0.0341, 0.0940])) was trending larger than the maximum hazard rate for European Americans (0.0368 (95% highest posterior density interval [0.0200, 0.0546])) and Mexican Americans (0.0417 (95% highest posterior density interval [0.0219, 0.0635])). The age of maximum hazard was 55 (95% CI [43, 72]) years in HUFS and 58 (95% CI [48, 67]) years in African Americans, 60 (95% CI [52, 70]) years in European Americans, and 58 (95% CI [48, 69]) years in Mexican Americans in NHANES (Figure 4).

### Proportional Hazards vs. Logistic Models

Using the HUFS data, we found that logistic regression yielded a better fit than the proportional hazards model based on the DIC, provided that both age and age^{2} were included as covariates in the logistic regression model (Table 2). Using logistic regression, the addition of age to the reduced (intercept-only) model resulted in a substantially lower DIC (Table 2), with a linear effect of age explaining 27.8% of the variance at the cost of one additional parameter. The addition of age^{2} further decreased the DIC (Table 2), explaining an additional 0.8% of the variance at the cost of one additional parameter. With smoothing, the effective dimensionality of the proportional hazards model was 3.3, comparable to the dimensionality of 3.0 for the logistic model adjusted for age and age^{2} (Table 2). We also found that inclusion of a permanent stayer fraction increased the DIC of the proportional hazards model, indicating that inclusion of a permanent stayer fraction was not supported (Table 2).

### Model selection

We used forward-backward regression to perform model selection on a set of 38 potential covariates. In the HUFS data set, the final model included six of these covariates: chloride, insulin, low-density lipoprotein cholesterol, potassium, uric acid, and weight. Compared to the logistic model with age and age^{2}, these six covariates improved the fit (*p* = 2.01×10^{−15}) and explained an additional 8.1% of the variance, for a total of 36.7% of variance explained. Each of these six covariates were replicated (p-values from 2.81×10^{−2} to 1.83×10^{−12}) and directionally consistent in NHANES African Americans (Table 3). Furthermore, chloride, low-density lipoprotein cholesterol, potassium, uric acid, and weight, but not insulin, were significant covariates in NHANES European Americans, whereas chloride, insulin, low-density lipoprotein cholesterol, uric acid, and weight, but not potassium, were significant covariates in NHANES Mexican Americans (Table 4). In NHANES, the variance explained by age, age^{2}, chloride, low-density lipoprotein cholesterol, uric acid, and weight was 40.9% in African Americans, 34.8% in European Americans, and 28.3% in Mexican Americans.

We further investigated two unexpected results of the model selection. One, sex was not selected in the final model. Of the selected covariates, three showed sex dimorphisms in HUFS by Welch’s t-test, with uric acid and weight higher in males and low-density lipoprotein cholesterol higher in females (Table 5). Second, across African Americans, European Americans, and Mexican Americans, increasing low-density lipoprotein cholesterol was associated with decreased risk of hypertension (Tables 3 and 4). As the direction of this effect was unexpected, we reanalyzed the HUFS African Americans accounting for lipids medications. Self-reported use of any lipid medication (coded as yes/no) was associated with increased risk of hypertension (*p* = 1.25×10^{−3}) but increasing low-density lipoprotein cholesterol remained associated with decreased risk of hypertension (p = 4.46×10^{−2}). Of the individuals in this sub-analysis, 8.3% reported use of any lipid medication, all instances of which involved statins, so drug class was not a confounder.

To investigate the possibility of time-dependent covariates, we added interaction terms to the full logistic model (with age, age^{2} and the selected covariates). For each of the selected covariates, the Akaike information criterion was larger for the model containing a term that interacted with age (Table 6). Thus, the evidence does not support time-dependence for any of the selected covariates.

### 2017 revised classification of hypertension

We reanalyzed the HUFS data based on the 2017 reclassification of hypertension as SBP ≥ 130 mm Hg or DBP ≥ 80 mm Hg (Whelton et al., 2018). Under these more stringent thresholds, the prevalence of hypertension increased from 48.3% to 66.1%, the median time to hypertension decreased to 36 years (95% CI [33, 39]), the age at maximum hazard increased to 57 (95% CI [33, 72]) years, and the lifetime risk at 85 years of age increased to 95.8% (95% CI [90.3%, 99.2%]). The 95% highest posterior density interval of the permanent stayer fraction included zero.

## Discussion

Time-to-event analysis of hypertension with a primary focus on modeling the baseline hazard function recapitulated three known health disparities. One, the baseline hazard rate at the earliest ages were higher in African Americans compared to European Americans and Mexican Americans. Two, the median age when hypertension occurred was approximately a decade earlier for African Americans compared to European Americans and Mexican Americans. Three, by the middle of the ninth decade of life, the probability of African Americans remaining free of hypertension was less than half the probability for European Americans and Mexican Americans.

The Bayesian model of the hazard rates revealed an inflection point, consistent with the lognormal parametric distribution yielding the best fit among the parametric distributions we explored. A decrease in hazard rates implies that the number of events decreases or that the number of individuals at risk increases. We suggest that the simplest explanation for the inflection point in hazard rates is the change in trajectory of DBP, with the decrease in DBP leading to a decreased hazard of hypertension and consequently a reduced number of events. We cannot rule out the possibility that there are individuals who are at less risk, although the finding that the hazard rate does not reach zero (by 85 years of age) indicates the continued presence of risk. It is also possible that mortality due to comorbidities begins to subside by around 60 years of age, leading to reduced hazard rates of hypertension in older ages.

To enable model comparison between logistic regression and proportional hazards analysis, we used the DIC, which is based on an estimate of dimensionality. The existence of the inflection point in the baseline hazard function implies an approximately quadratic effect of age, which in logistic regression is captured by an age^{2} term with a negative regression coefficient. Our results indicated that the proportional hazards model we used, which incorporated smoothing to prevent over-parameterization, and logistic regression with age and age^{2}, provide comparable fits.

In addition to non-zero hazard rates, evidence for the lack of existence of true controls comes from the modeling of a permanent stayer fraction. We found that the permanent stayer fraction essentially corresponded to the fraction of individuals who had not yet become hypertensive at the last observed age. As the permanent stayer fraction did not capture any additional information, models including the permanent stayer fraction had a worse fit compared to models without that additional parameter. A major implication for proportional hazards analysis is that the assumption that the event will occur for every individual at some point in time, unless the individual dies first, is valid. A major implication for logistic regression is that controls should be interpreted as individuals that have not yet become hypertensive, rather than as individuals who will not become hypertensive.

Traditional risk factors for hypertension include excess body weight, excess dietary sodium intake, reduced physical activity, deficiency of potassium, and excess alcohol intake (Chobanian et al., 2003). Our results have three implications regarding these risk factors. One, we confirmed that excess weight and low potassium, but not excess sodium, were risk factors in our multiple regression model. It is possible that the inclusion of chloride and/or potassium could account for the effects of sodium. Two, longitudinal analysis of 30 years of follow-up in the Framingham Heart Study showed that the incidence rate of hypertension increased faster in females than in males, with females having lower incidence under 50 years of age and higher incidence over 50 years of age (Dannenberg, Garrison, & Kannel, 1988). In contrast, we found that sex was not a significant covariate, although low-density lipoprotein cholesterol, uric acid, and weight showed sex dimorphisms. Three, we did not have data on birth weight, but our findings regarding hazardrates at early ages are consistent with low birth weight being a risk factor for hypertension (Lackland, Egan, & Ferguson, 2003) and African Americans having lower birth weight than European Americans (David & Collins, 1997).

Low-density lipoprotein cholesterol has been reported to be positively associated with hypertension, but this association generally does not remain significant after covariate adjustment (Haffner, Ferrannini, Hazuda, & Stern, 1992; Laaksonen et al., 2008; Otsuka et al., 2016; Sesso, Buring, Chown, Ridker, & Gaziano, 2005; Wildman et al., 2004). In contrast, we found that low-density lipoprotein cholesterol was negatively associated with hypertension, across African Americans, European Americans, and Mexican Americans. The explanation for this discrepancy is unclear, but we present evidence against three possibilities. We obtained a negative association in both single and multiple regression models, suggesting that the opposite direction of effect was not due to other (known) covariates. We also found that the opposite direction of effect was not due to a time-dependent effect. Furthermore, the opposite direction of effect was not due to confounding by self-reported use of medication (*i.e*., statins).

Uric acid is associated with hypertension, but whether this association is causal remains unestablished. Mendelian randomization (MR) studies have provided conflicting evidence regarding the causality of uric acid for hypertension, with evidence for no effect (Palmer et al., 2013), protection (Sedaghat et al., 2014), and risk (Parsa et al., 2012). MR studies have reported that uric acid is not causal for adiposity, chronic kidney disease, triglycerides, type 2 diabetes, or obesity (Jordan et al., 2019; Kleber et al., 2015; Lyngdoh et al., 2012; Rasheed, Hughes, Flynn, &Merriman, 2014). In contrast, higher adiposity, higher body mass index, lower high-density lipoprotein cholesterol, and higher triglycerides are causally associated with increased uric acid (Lyngdoh et al., 2012; Palmer et al., 2013; Rasheed et al., 2014; Yu, Chen, Huang, & Zeng, 2019). The association of uric acid with hypertension may reflect pleiotropy through linked metabolic pathways, perhaps those involving lipid metabolism (Li et al., 2019).

The African Americans in HUFS were all recruited and enrolled in Washington, D.C. The fact that neither the last completed grade of education nor income were significant predictors during model selection may reflect homogeneity of study participants within one city. In contrast, the African Americans in NHANES were recruited nationwide. All covariates that were significant during model selection in HUFS replicated in NHANES African Americans, indicating that the findings in HUFS were generalizable to the national level.

In summary, by Bayesian modeling of the baseline hazard function in time-to-event analysis of hypertension, we found that logistic regression, if adjusted for both linear and quadratic effects of age, yielded a fit comparable to proportional hazards regression. We found no evidence to support the existence of true controls, suggesting that if an individual lives long enough, hypertension is inevitable. Finally, we found that the combination of chloride, low-density lipoprotein cholesterol, uric acid, and weight, in addition to age and age^{2}, accounted for 40.9% of the variance of hypertension in African Americans, 34.8% in European Americans, and 28.3% in Mexican Americans, simultaneously consistent with common risk factors among the groups and heterogeneity across the groups.

## Competing Interests

The authors declare no competing interests.

## Data Availability

HUFS data are available upon request from the corresponding author. NHANES data are freely available at http://wwwn.cdc.gov/Nchs/Nhanes.

## Acknowledgements

The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official view of the National Institutes of Health. This research was supported by the Intramural Research Program of the Center for Research on Genomics and Global Health (CRGGH). The CRGGH is supported by the National Human Genome Research Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, the Center for Information Technology, and the Office of the Director at the National Institutes of Health (1ZIAHG200362).