COVID-19 Surveillance in the Biobank at the Colorado Center for Personalized Medicine

Background: Characterizing the experience and impact of the COVID-19 pandemic among various populations remains challenging due to the limitations inherent in common data sources such as the electronic health record (EHR) or convenience sample surveys.
Objective: To describe testing behaviors, symptoms, impact, vaccination status and case ascertainment during the COVID-19 pandemic using integrated data sources.
Methods: In summer 2020 and 2021, we surveyed participants enrolled in the Biobank at the Colorado Center for Personalized Medicine (CCPM, N = 180,599) about their experience with COVID-19. Prevalence of testing, symptoms, and the impacts of COVID-19 on employment, family life, and physical and mental health were calculated overall and by demographic categories. Using the Electronic Health Record (EHR), we compared COVID-19 case ascertainment and characteristics in the EHR versus the survey.
Results: Of the 25,063 survey respondents (13.9%), 42.5% had been tested for COVID-19 and of those, 12.8% tested positive. Nearly half of those tested had symptoms and/or had been exposed to someone who was infected. Young adults (18-29 years) and Hispanics were more likely to have positive tests compared to older adults and persons of other racial/ethnic groups. Mental health (54.6%) and family life (48.8%) were most negatively affected by the pandemic and more so among younger groups and women; negative impacts on employment were more commonly reported among Black respondents. After integration with EHR data up to the time of the survey completion, 4.0% of survey respondents (n=1,006) had discordant COVID-19 case status between the EHR and the survey. Using all longitudinal EHR and survey data, we identified 11,472 COVID-positive cases among Biobank participants (6.4%). In comparison to COVID-19 cases identified through the survey, EHR-identified cases were younger and more likely to be Hispanic.
Conclusions: Integrated data assets such as the Biobank at the CCPM are key resources for population health monitoring in response to public health emergencies, such as the COVID-19 pandemic.


INTRODUCTION
Survey respondents who reported receiving a positive COVID-19 test result were considered a "confirmed case" of COVID-19. Self-reported cases also reported whether the respondent tested positive for COVID-19, saw a doctor in-person or through telehealth, visited the emergency room, were hospitalized overnight, stayed home/isolated, or did nothing different. We looked at severity either in terms of hospitalization due to COVID-19 or death after COVID-19. Respondents who reported having one or more overnight stays in the hospital were considered to be 'hospitalized'.

Positive cases were identified in the EHR using ICD-10 diagnosis codes, healthcare encounter types, and encounter primary diagnoses. Participants who received an ICD-10 diagnosis code of U07.1 or at least one of 11 COVID-19 specific encounter primary diagnoses (Multimedia Appendix 4) were considered an "EHR-confirmed case". Participants who were hospitalized in a UCHealth hospital overnight during the 3 days before or up to 21 days after their COVID-19 diagnosis date and who had at least one of 64 COVID-19 related encounter primary diagnoses (Multimedia Appendix 5) were considered to be "EHR-hospitalized." To compare positive cases identified from the EHR and survey, we examined the number of hospitalized cases that were discordant between these data sources.

All-cause mortality data stored in the Health Data Compass clinical data warehouse include the cause of death as certified by a physician or coroner/medical examiner, related ICD-10 cause of death codes generated by Centers for Disease Control, and age at death. These data [...] Health and the Environment (CDPHE). Accounting for the ~3-month lag time to register certificates, map ICD-10 cause of death codes, and update the clinical databases, the ascertainment of mortality among UCHealth patients for this analysis is nearly 95% complete.

Statistical Analysis
We generated descriptive statistics to characterize our study population and responses to survey questions. We also stratified respondents with respect to COVID-19 infection status based on reported test status and symptomology. We compared COVID-19+ individuals that were identified via the survey and via the EHR by demographics and severity (overnight hospitalization and death). We investigated case status and hospitalization misclassification in both the survey and EHR by comparing those who were discordant in the survey and EHR. We calculated differences between groups using chi-square and t-test statistics for categorical and continuous measures, respectively. As expected, due to very large sample size in the study, most comparisons were statistically significant at a two-sided alpha < 0.05. Therefore, we focus results and interpretation on effect sizes and corresponding standard error of the estimate.

medRxiv preprint, this version posted February 21, 2022; https://doi.org/10.1101/2022.02.15.22271018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
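The "EHR-hospitalized" rule in the case definitions above can be sketched as a small date-window check. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the window is applied to the admission date with inclusive bounds (the text does not say how boundary days or multi-day stays were handled), and the boolean flag stands in for a lookup against the 64 encounter primary diagnoses in Multimedia Appendix 5.

```python
from datetime import date, timedelta

# Window stated in the text: 3 days before through 21 days after diagnosis.
DAYS_BEFORE = 3
DAYS_AFTER = 21


def is_ehr_hospitalized(diagnosis_date, admit_date, has_covid_related_primary_dx):
    """Return True if an overnight admission meets the EHR-hospitalized rule.

    All names here are hypothetical; the source does not describe its data
    model. Inclusive window bounds are an assumption.
    """
    window_start = diagnosis_date - timedelta(days=DAYS_BEFORE)
    window_end = diagnosis_date + timedelta(days=DAYS_AFTER)
    in_window = window_start <= admit_date <= window_end
    return in_window and has_covid_related_primary_dx


# Example: diagnosed 10 July 2020, admitted 12 July 2020 with a qualifying diagnosis.
print(is_ehr_hospitalized(date(2020, 7, 10), date(2020, 7, 12), True))
```

An admission outside the window, or one without a qualifying primary diagnosis, would not count toward EHR hospitalization under this reading.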

The most common reasons for testing were having symptoms (29.5%), exposure to someone who tested positive for COVID-19 (18.5%), doctor recommendation (14.7%), requirement of employer (8.9%), and recent international travel (3.4%). An additional 40.8% of individuals tested reported 'other' reasons for testing that included having surgery or other medical procedure, planned travel, desire or need to be around large groups or family members, and work-site offerings for testing.

Of those tested, 1366 (12.8%) tested positive for COVID-19 (Table 1) [...] to a household member who tested positive for COVID-19 were different across the three groups of those who tested positive, tested negative, and were not tested (all P<.001). Young adults (ages 18-29 years) were overrepresented among the tested-positive group, representing 10.7% of those who tested positive compared to 6.7% of those who tested negative and 5.1% of those who were not tested (P for trend <.001). Similarly, individuals of Hispanic race-ethnicity were overrepresented in the tested-positive group at 9.2%, compared to 5.7% of those who tested negative and 4.3% of those who were not tested. Individuals who tested positive were also more likely to report symptoms, household exposure to COVID-19 and poor health status (Table 1, [...]

[...] individuals had at least one of the following COVID-19 related symptoms since February 2020: [...]

[...] Comparing asymptomatic cases to symptomatic cases (at least one symptom) by C) age, D) sex, E) race/ethnicity. P = p-value from Pearson's chi-square test for different [...]

Among respondents who were not tested but reported having at least 1 COVID-related symptom, 1901 (41.9%) said they did nothing different, whereas 1920 (42.3%) stayed home and self-isolated (Figure 3B). A third (n = 1,515) sought out at least one form of medical care: 934 (20.6%) had an in-person clinic visit, 77 (17.1%) had a telehealth clinic visit, 275 (6.1%) went to the ER, and 90 (2.0%) had an overnight stay in the hospital (Figure 3B).

The impact of the COVID-19 pandemic on employment, family life, mental health or physical health was largely negative, with more than 75% of respondents (n=18,861) reporting a negative impact from the COVID-19 pandemic in at least one of these domains, compared to 23% of respondents (n=5,856) reporting a positive impact in at least one domain (P<.001).

Mental health and family life were most negatively affected by the pandemic, with 54.6% (n=13,688) and 48.8% (n=12,233) of respondents reporting a negative impact, respectively.
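As a worked check of these prevalence figures, the sketch below recomputes the two percentages from the reported counts and attaches a 95% confidence interval. Two assumptions are made here that the text does not confirm: the denominator is taken to be all 25,063 survey respondents, and the interval is a simple Wald (normal approximation) interval, which may differ from the method the authors used for their error bars. The function name is hypothetical.

```python
import math


def proportion_with_ci(count, total, z=1.96):
    """Return (proportion, lower, upper) using a Wald 95% interval by default."""
    p = count / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of a proportion
    return p, p - z * se, p + z * se


N_RESPONDENTS = 25_063  # assumed denominator: all survey respondents

for label, n in [("mental health", 13_688), ("family life", 12_233)]:
    p, lo, hi = proportion_with_ci(n, N_RESPONDENTS)
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lo:.1f}% to {100 * hi:.1f}%)")
```

With a sample this large the intervals are narrow, which is consistent with the authors' choice to emphasize effect sizes over statistical significance.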

The impact of the COVID-19 pandemic was not equal across groups by age, race-[...] Figure 4D).

[...] Figure 5A). Women were slightly less likely to receive a vaccine (4.5% of women vs. 3.3% of men were unvaccinated; P = 0.003, Figure 5B). Vaccination rate was very similar across race/ethnicity categories, with 4.1% of non-Hispanic Whites, 4.3% of non-Hispanic Blacks, 4.1% of Hispanics, and 3.3% of those in the other race category not receiving vaccines (P = 0.8, Figure 5C). The median income of the home 3-digit zip code was [...]

Figure 5: Vaccine uptake by A) age, B) sex, and C) race-ethnicity. P = p-value from Pearson's chi-square test for different distributions across impact and demographic groups. Error bars indicate the 95% confidence interval for the percent point estimate.

In comparing COVID-19 cases from the EHR to those in the survey (Figure 7), we found that cases identified in the EHR were younger, with 17.2% of individuals in the 18-29 age group compared to 10.7% in the survey group (P<.001, Figure 7A). A higher percentage of cases identified in the EHR were Hispanic compared to survey cases (14.7% vs 9.2%, respectively, P<.001, Figure 7B). The EHR cases also had a slightly lower proportion of women (61.9%) compared to the survey group (66.0%) (P = 0.003, Figure 7C). The median income for the 3-[...]

[...] from Pearson's chi-square test for different distributions across impact and demographic groups. Error bars indicate the 95% confidence interval for the percent point estimate.

A higher percent of COVID-positive cases identified from the survey were hospitalized overnight (8.3%) compared to the EHR group (6.5%) (P = 0.01, Figure 7D). Using all-cause mortality data obtained from CDPHE vital statistics, 130 (2.3%) individuals in the EHR case group died, leading to a death rate of 1.2%. Four people in the survey case group died, with a death rate of 0.22%.
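Group comparisons such as the hospitalization contrast above are reported with P values from Pearson's chi-square test. The sketch below shows that test for a 2x2 table without continuity correction. The function name is hypothetical and the counts in the usage example are made up for illustration; they are not study data, since the underlying cell counts are not given in the text. The sketch also assumes no row or column total is zero.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value). Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts only (NOT from the study): rows are two groups,
# columns are outcome present / outcome absent.
chi2, p = pearson_chi_square_2x2(10, 20, 30, 40)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

For larger tables or small expected counts, a statistics library routine would be the more usual choice than this closed-form shortcut.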

The EHR is a longitudinal data source; therefore, we can capture COVID-19 cases on a [...]

To quantify discordance of COVID-19 case status between the EHR and the survey, we looked across our entire set of survey respondents (n = 25,063). We only counted a participant as [...] happened after the survey was taken. While neither the survey nor the EHR is a gold standard for case classification, we can look at the discordance between them to identify the potential for misclassification. Overall, a total of 1,006 respondents were discordant for COVID-19 case status (4%). Of the 25,063 individuals who took the survey, 173 were identified as COVID-19+ in the EHR but negative or not tested in the survey, a discordance rate of 0.7% (Table 2). Another 833 individuals were identified as COVID-19+ in the survey but negative in the EHR, a discordance rate of 3.3%.

To quantify discordance of hospitalization status between the EHR and the survey, we restricted the analysis to individuals who responded to the survey and were COVID-19 positive in either the EHR or the survey (n = 2,273). EHR hospitalizations were only considered if they occurred before the survey was taken. There were 6 individuals who were positive for hospitalization in the EHR but negative in the survey, a discordance rate of 0.3% (Table 3). There were 59 individuals who were positive for hospitalization in the survey but negative in the EHR, a discordance rate of 2.6%.
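The discordance rates reported here follow directly from the counts in the text. The sketch below recomputes them; the function and key names are hypothetical, and the only inputs are the counts and denominators stated above (173, 833 and 25,063 for case status; 6, 59 and 2,273 for hospitalization).

```python
def discordance_summary(ehr_pos_survey_neg, survey_pos_ehr_neg, total):
    """Summarize two-way discordance between the EHR and the survey.

    Percentages use `total` (everyone compared) as the denominator.
    """
    discordant = ehr_pos_survey_neg + survey_pos_ehr_neg
    return {
        "ehr_only_pct": 100 * ehr_pos_survey_neg / total,
        "survey_only_pct": 100 * survey_pos_ehr_neg / total,
        "discordant_n": discordant,
        "discordant_pct": 100 * discordant / total,
    }


case_status = discordance_summary(173, 833, 25_063)
hospitalization = discordance_summary(6, 59, 2_273)

for name, result in [("case status", case_status), ("hospitalization", hospitalization)]:
    print(
        f"{name}: EHR-only {result['ehr_only_pct']:.1f}%, "
        f"survey-only {result['survey_only_pct']:.1f}%, "
        f"total discordant {result['discordant_n']} ({result['discordant_pct']:.1f}%)"
    )
```

The two case-status components sum to the 1,006 discordant respondents reported overall, which serves as a quick consistency check on the stated figures.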

We found that the COVID-19 pandemic has had far-reaching and varying effects among our Biobank participants. Over 84% of the 25,063 survey respondents reported having one or more COVID-related symptoms since February 2020, 40% were tested for the virus, 13% of those tested were positive, and among positive cases, 45% sought medical care following their diagnosis. Our overall case positivity rate of 13% is comparable to those reported by other EHR-based retrospective studies conducted in 2020 and 2021 [8,9]. However, our finding of higher positivity rates among our younger participants (aged 18-39 years; 20%) and Hispanics (19%) has not been reported previously and may reflect differences in reasons for testing in these groups (e.g. due to having symptoms or recent exposure vs. other reasons). Though it is not surprising that a large proportion of respondents reported having symptoms given the breadth of symptoms reported (e.g. runny nose, fever, body aches), it is notable that 40% of those with symptoms neither underwent testing nor sought medical care. It is likely that a percentage of this group had COVID-19 and would not be counted as such via public health surveillance efforts, which could lead to substantial underestimates of the true infection rate in the general population.

Most survey respondents (75%) reported a negative impact from the COVID-19 pandemic, most commonly around mental health and family life. We found that females more often reported negative impacts than males in all domains: employment, family [...] [12]. It is both notable and concerning that nearly 75% of younger adults (aged 18-29 years) reported negative impacts on their mental health, which was higher than for any other group. The younger end of this range captures members of Generation Z, who are more likely to report poor mental health compared to prior generations [13,14]. However, they are also more likely to receive mental health therapy or treatment [13], and therefore may accept interventions to address the negative mental health consequences of the COVID-19 pandemic. Further, we found that negative impacts on employment were more commonly reported among Black participants. These findings highlight the breadth of negative impacts of this pandemic in our community and reveal the disproportionate impact experienced by certain subgroups that should be targeted in future intervention efforts.

Our study population had a much higher vaccination rate compared to Colorado overall and the general US population. Over 95% of our survey participants are fully vaccinated compared to 76% of adults throughout Colorado [15]. Vaccination directly reduces likelihood of infection and severity of disease, but it also has an indirect effect on society via reduced viral transmission and herd immunity. Because of this impact on others, getting vaccinated is considered a prosocial behavior [16-18]. Being a participant in a biobank has also been positively associated with prosocial behavior, as the individuals who participate in biobanks tend to be motivated by furthering research for the greater good [19,20]. Since our study population only includes those who elected to be in the biobank and additionally, those who responded to [...] explains the high vaccination rate. Given our highly self-selected study population, results may not generalize outside of the CCPM Biobank and UCHealth population. However, our ability to incorporate EHR data allows us to build a research population of biobank participants that is more representative of the entire patient population.

Differences between data captured in the EHR vs. those captured in the survey reveal the benefit of using both sources in combination. For example, mild cases with sub-clinical manifestations of infection that did not result in seeking care may be missing from the EHR but captured in a survey. The EHR is a longitudinal data source that collects clinical information on all patients diagnosed with and/or treated for COVID-19 within the UCHealth system irrespective of proclivity to participate in research or respond to surveys. As such, the EHR captured data from Biobank participants that the survey did not. Periodic analysis of EHR data will allow us to study COVID-19 reinfection and vaccine breakthrough cases over time. However, UCHealth is not a closed system and Biobank participants can receive care outside of UCHealth, so we recognize that not all individuals who were diagnosed with or treated for COVID-19 are or will be captured in the EHR. We believe that the survey data more completely identify individuals who did not seek healthcare or sought care outside of the UCHealth system, in particular patients who reported COVID-19 infection in the survey but had no corresponding record in the EHR.

COVID-19 has variable clinical presentations ranging from asymptomatic infections to severe symptoms that require hospitalization. We expected that COVID-19 patients identified in the EHR would be sicker on average than survey-only cases, more likely to have severe COVID-19 and less likely to have asymptomatic infections [22,23]. However, we found that there was a [...] [19806848] and is a limitation of the convenience survey design. Additionally, the Hispanic population in Colorado, as in many other states, had higher incidence of COVID-19 infections, hospitalizations, and death [4, 25-28], which may explain why they are more likely to be identified through the EHR.

A key strength of this study is our ability to leverage an existing, living resource in the CCPM Biobank and survey engine to assess the health and wellbeing of our participants in ways that are not highlighted by the EHR. Further, because participants consent to re-contact, we have the opportunity to follow up with sub-populations within our cohort to collect additional information and monitor outcomes such as re-infection and vaccine uptake. Although our overall response to the survey was sizeable, we acknowledge that the composition of the underlying patient population at UCHealth who enrolled in the Biobank, and differential [...]

The combination of EHR and survey data provides a powerful opportunity to monitor and study the on-going effects of the COVID-19 pandemic in our communities. As the pandemic continues, there is a critical need for optimal COVID-19 case ascertainment in order to capture both mild and severe cases, and to monitor specific long-term outcomes such as post-acute [...]
