ABSTRACT
Introduction The Chasing COVID Cohort (C3) study is a US-based, geographically and socio-demographically diverse sample of adults (18 and older) enrolled into a prospective cohort study during the upswing of the U.S. COVID-19 pandemic.
Methods We used internet-based strategies to enroll C3 participants beginning March 28th, 2020. Following baseline questionnaire completion, study participants will be contacted monthly (for 6 months) to complete assessments of engagement in non-pharmaceutical interventions (e.g., use of cloth masks, avoiding large gatherings); COVID-19 symptoms; SARS/COV2 testing and diagnosis; hospitalizations; healthcare access; and uptake of health messaging. Dried blood spot (DBS) specimens will be collected at the first follow-up assessment (last week of April 2020) and at month 3 (last week of June 2020) and stored until a validated serologic test is available.
Results As of April 20, 2020, the number of people that completed the baseline survey and provided contact information for follow-up was 7,070. Participants resided in all 50 US states, the District of Columbia, Puerto Rico, and Guam. At least 24% of participants were frontline workers (healthcare and other essential workers). Twenty-three percent (23%) were 60+ years, 24% were Black or Hispanic, 52% were men, and 52% were currently employed. Nearly 20% reported recent COVID-like symptoms (cough, fever or shortness of breath) and a high proportion reported engaging in non-pharmaceutical interventions that reduce SARS/COV2 spread (93% avoided groups >20, 58% wore masks; 73% quarantined). More than half (54%) had higher risk for severe COVID-19 illness should they become infected with SARS/COV2 based on age, underlying health conditions (e.g., chronic lung disease), or daily smoking.
Discussion A geographically and socio-demographically diverse group of participants was rapidly enrolled in the C3 during the upswing of the SARS/COV2 pandemic. Strengths of the C3 include the potential for direct observation of, and risk factors for, seroconversion and incident COVID disease (among those with or without antibodies to SARS/COV2) in areas of active transmission.
INTRODUCTION
The Coronavirus Disease 2019 (COVID-19) pandemic has dramatically transformed life across the entire United States, resulting in medical and economic challenges and threats for many households and communities. The earliest research efforts have focused on understanding the clinical course of COVID-19 and the most effective ways of treating people with severe symptoms or illness. As the pandemic progresses, however, we must also investigate COVID-19’s evolving epidemiology and the impact of non-pharmaceutical interventions (NPIs), such as physical distancing, health messaging, and testing. Researchers and public health practitioners have called for cohort studies to describe the community attack rate, as well as how attack rates are influenced by different approaches to NPI implementation.1 Internet-based strategies, which facilitate rapid recruitment of large and diverse samples, can be leveraged to understand and inform this swiftly changing and protracted public health crisis.2,3
In response to the COVID-19 pandemic the CUNY Institute for Implementation Science in Population Health (ISPH) launched the Communities, Households and SARS/COV-2 Epidemiology (CHASING) COVID Cohort “C3” study on March 28, 2020. We sought to recruit an online prospective cohort of 7,500 adults (18 years or older) in the United States (US) and US territories in order to rapidly contribute to our understanding of the spread and impact of the SARS/COV2 pandemic within households and communities. In a prospective cohort study, we will assess the impact of implementing, and relaxing, NPIs on SARS/COV2 clinical outcomes and psychosocial outcomes such as mental health, social support, and interpersonal violence.
METHODS
We aimed to rapidly enroll a geographically and socio-demographically diverse sample of adult participants residing in the US and US territories. We applied internet-based strategies that have been demonstrated to be effective for recruiting and following large and geographically diverse online cohorts.2–4
Cohort Eligibility and Recruitment
Persons aged 18 years and above who resided in the US or US territories were eligible to join the study. Study participants were recruited via social media platforms (e.g., Facebook, Instagram, and Scruff) or via referral to the study. Anyone with knowledge of the study was allowed to invite others to participate. By tapping into participants' personal networks, we aimed to improve recruitment of persons >59 years of age, who may be less active on social media than younger persons and whose recruitment may therefore depend more on snowball sampling. The study was promoted as a way for participants to contribute to understanding the COVID-19 pandemic. Facebook and Instagram advertisements were developed in English and Spanish and were geographically targeted to people currently residing in the US and US territories who were 18 or older.
The C3 had a targeted sample size of 7,500 participants. Study staff actively monitored cohort demographics and adjusted advertisement strategies as needed to recruit a more geographically and socio-demographically diverse sample. The advertisement strategies were adaptive based on the profile of participants enrolled as of a given date. For example, strategies could shift to recruit older persons if that demographic was not well-represented.
Potential participants were directed to an enrollment survey (hosted by Qualtrics) in their web browser on a computer or mobile device.5 The consent form described the study, monthly follow-up assessments, and future study opportunities, including the possibility to receive a SARS/COV2 serologic test as part of the study. The consent form also described the incentive schedule: a drawing for $100 for the baseline survey (with 20 winners) and gift cards ranging from $5-30 for all participants for completion of subsequent surveys and antibody testing.
Enrollment and baseline assessment
Enrollment for C3 began on March 28, 2020, at which point there were 122,000 documented COVID-19 cases and 2,200 COVID-19 deaths reported in the US.64 Enrollment ended on April 20, 2020, when there were 783,000 persons diagnosed with SARS/COV2, including 42,000 deaths in the U.S. (Figure 1).6
Individuals who provided informed consent were then screened for eligibility via the Qualtrics survey. Eligible and consenting individuals were asked to provide an email address for future follow-up assessments.
Measures included on the baseline questionnaire were derived from previously published research (e.g., Together 50002, BRFSS, and H1N1 influenza studies7,8) and from other researchers who had developed surveys for understanding COVID-19 (e.g., Canadian Institutes of Health Research7 and Food Access and Food Security during COVID-199). Measures were also developed de novo in response to the novel pandemic. We deployed a second version of the baseline questionnaire on April 9, 2020, which added questions to capture healthcare and other essential worker status. The surveys are available on the C3 webpage (https://cunyisph.org/chasing-covid/).
Follow-up assessments
Questionnaires. Following the baseline assessment, C3 participants will be surveyed monthly for 6 months (until September, 2020). The follow-up assessments will gather data on symptoms, testing, hospitalizations, and other time-varying factors (e.g., NPI uptake and relaxation) (see Table 1 for survey domains).
Specimen collection. At the first follow-up assessment (end of April, 2020) and the third follow-up assessment (end of June, 2020), participants will be asked to self-collect a specimen for serologic testing. Participants will be mailed dried blood spot (DBS) self-sampling kits. Using the provided lancet, they will prick the side of their finger and provide a sample of blood on the provided card. To facilitate self-sampling procedures, all participants will receive printed instructions demonstrating procedures for DBS collection and instructions to contact the study team if they have questions.10 DBS cards will be returned via US Postal Service (self-addressed, stamped envelope containing EBF Foil biohazard bag™) to the C3 study laboratory (Molecular Testing Labs)11, where they will be banked at −80°C for future serologic testing for IgM and IgG antibodies to SARS/COV2 via a suitable validated assay. We will monitor the emerging pipeline of serology test systems. Participants who do not submit a DBS card at the end of Month 1 will be sent a reminder and will be allowed to submit it at any time during follow-up. Participants will receive $20 upon returning a valid specimen to the study laboratory.
Daily symptom tracking. Monthly assessments are supplemented by voluntary daily symptom tracking via an innovative COVID-19 symptom tracker12 that we have deployed in our cohort. The Coronavirus Pandemic Epidemiology (COPE) consortium has developed the COVID Symptom Tracker app, downloadable for free in the Apple and Android App Stores, which enables individuals to self-report information on COVID-19 exposure and infections. On first use, the app queries location, age, and core risk factors and comorbidities. Daily prompts query for updates on interim symptoms, health care visits, and COVID-19 testing results. The C3 Study joined the COPE consortium on April 6th, allowing our cohort members who use the app to consent to share their data with us so that it can be linked to the larger C3 cohort data. Consent to merge data from the symptom tracker app with other C3 data is being solicited at the Month 1 follow-up assessment.
Outcomes
Incidence of SARS/COV2 infection. The cumulative incidence of SARS/COV2 infection at the end of Month 1 (~end of April, 2020) and Month 3 (~end of June, 2020) will be defined as the proportion of persons at each time point with SARS/COV2 infection as confirmed by serologic testing: IgM and/or IgG positive. For persons testing positive at Month 1, we will assume infection occurred at the midpoint between 5 days before the county’s first reported SARS/COV2 diagnosis and the date of the serologic test, unless we can assign a more accurate date based on reported symptoms and/or presence of IgM/IgG. For those who test negative at Month 1 and positive at Month 3, we will assume infection occurred at the midpoint between the two tests (if no symptoms) or 5 days before a documented report of onset of COVID-like symptoms.
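The date-assignment rules above can be sketched in code. The following is a minimal illustration, not the study's actual analysis code: the function names and example dates are hypothetical, and the refinement based on reported symptoms or IgM/IgG pattern for Month 1 positives is not implemented.

```python
from datetime import date, timedelta
from typing import Optional

# The 5-day offset stated in the text (applied before the county's first
# reported diagnosis, or before documented symptom onset).
OFFSET = timedelta(days=5)


def midpoint(start: date, end: date) -> date:
    """Calendar date halfway between start and end (rounded down to a whole day)."""
    return start + timedelta(days=(end - start).days // 2)


def infection_date_month1(county_first_dx: date, test_date: date) -> date:
    """Seropositive at Month 1: midpoint between 5 days before the county's
    first reported diagnosis and the date of the serologic test."""
    return midpoint(county_first_dx - OFFSET, test_date)


def infection_date_month3(
    negative_test: date,
    positive_test: date,
    symptom_onset: Optional[date] = None,
) -> date:
    """Seronegative at Month 1 and seropositive at Month 3: 5 days before
    documented symptom onset if available, else the midpoint between tests."""
    if symptom_onset is not None:
        return symptom_onset - OFFSET
    return midpoint(negative_test, positive_test)


# Hypothetical dates, for illustration only.
m1 = infection_date_month1(date(2020, 3, 6), date(2020, 4, 28))
m3 = infection_date_month3(date(2020, 4, 28), date(2020, 6, 27))
m3_sym = infection_date_month3(date(2020, 4, 28), date(2020, 6, 27), date(2020, 6, 10))
```

Rounding the midpoint down to a whole day is a choice made for this sketch; the text does not specify how odd-length intervals are handled.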
Asymptomatic SARS/COV2 infection. Asymptomatic infection will be defined as a positive SARS/COV2 serologic test, with no documented report of COVID-like symptoms from prior C3 interviews or from the COVID-19 symptom tracker app. The proportion with asymptomatic infection among persons with a positive SARS/COV2 serologic test will then be calculated as the number with asymptomatic infection divided by the total number with a positive serologic test. We will stratify our estimates of the proportion with asymptomatic infection by whether the seroconversion occurred prior to month 1 or whether seroconversion was observed in the cohort between Months 1 and 3 among those with a documented negative serologic test at Month 1.
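As a minimal sketch of the stratified calculation described above, the proportion asymptomatic can be computed separately for each seroconversion-timing stratum. The record fields (`seropositive`, `reported_symptoms`, `seroconversion_timing`) and the example records are hypothetical and are not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def asymptomatic_proportion_by_stratum(records: Iterable[Mapping]) -> Dict[str, float]:
    """Among seropositive persons, return the share with no documented
    COVID-like symptoms, separately for each seroconversion-timing stratum."""
    positives = defaultdict(int)
    asymptomatic = defaultdict(int)
    for r in records:
        if not r["seropositive"]:
            continue  # the denominator includes seropositive persons only
        stratum = r["seroconversion_timing"]
        positives[stratum] += 1
        if not r["reported_symptoms"]:
            asymptomatic[stratum] += 1
    return {s: asymptomatic[s] / n for s, n in positives.items()}


# Hypothetical records, for illustration only.
example = [
    {"seropositive": True, "reported_symptoms": False, "seroconversion_timing": "before_month1"},
    {"seropositive": True, "reported_symptoms": True, "seroconversion_timing": "before_month1"},
    {"seropositive": True, "reported_symptoms": False, "seroconversion_timing": "month1_to_month3"},
    {"seropositive": False, "reported_symptoms": True, "seroconversion_timing": "none"},
]
result = asymptomatic_proportion_by_stratum(example)
```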
Confirmed and possible COVID-19 disease at or prior to baseline. For SARS/COV2 seropositive persons, we will define confirmed prior COVID-19 as a self-report of symptoms consistent with COVID-19 on our symptom screener in the two weeks prior to baseline, with a date of disease onset estimated as one week prior to baseline. Those reporting COVID-19 symptoms without a serologic test will be considered possible cases of prior COVID-19.
Confirmed incident COVID-19 disease after serologic testing. For analyses to assess subsequent disease after Month 1, incident COVID-19 disease will be defined as development of new COVID-like symptoms ≥7 days after the first (positive or negative) SARS/COV2 serologic test result.13 We will count new COVID-19 disease, including a self-report of COVID-like symptoms, COVID-19 diagnosis or hospitalization on the C3 questionnaire or via the COVID-19 symptom tracker app. Severe COVID-19 disease will be defined as having been hospitalized for COVID-like symptoms.
Other outcomes. We will examine secondary outcomes such as anxiety symptoms (Generalized Anxiety Disorder 7-item scale [GAD-7]14) and depressive symptoms (Patient Health Questionnaire-2 [PHQ-2]15). We will also measure and describe the prevalence of food insecurity9, substance use16,17, unhealthy drinking (Alcohol Use Disorders Identification Test [AUDIT-C]18), and intimate partner violence (IPV).
Exposures
NPI uptake in the C3 cohort. We will examine specific NPI uptake among C3 cohort participants over time, and also calculate NPI uptake by creating an index. The index is a summative score of responses to survey questions about engagement in NPI actions in the two weeks prior to the survey. Each affirmative or neutral action is weighted 1 and each negative response 0, so that a higher index value indicates greater NPI uptake/engagement.
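A summative index of this kind could be computed as in the sketch below. The item names and response labels are hypothetical placeholders, since the actual questionnaire items are not reproduced here; the scoring follows the weighting described above.

```python
from typing import Mapping

# Weighting as described in the text: affirmative or neutral responses
# score 1, negative responses score 0. The labels are assumptions.
SCORES = {"yes": 1, "neutral": 1, "no": 0}


def npi_index(responses: Mapping[str, str]) -> int:
    """Sum the item scores; a higher value indicates greater NPI uptake."""
    return sum(SCORES[answer] for answer in responses.values())


# Hypothetical participant, for illustration only.
participant = {
    "avoided_large_groups": "yes",
    "wore_mask": "no",
    "worked_from_home": "neutral",
}
index_value = npi_index(participant)
```

An unrecognized response label raises a `KeyError` in this sketch, which makes unexpected coding visible instead of silently scoring it as zero.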
State-level NPI implementation. We and others have compiled living databases of state-level NPI implementation and relaxation, which document the type and date of NPI implementation (e.g., school closings, restaurant closings, stay-at-home orders, cloth masks), who is covered (e.g., older persons, non-essential workers), and when specific NPIs are relaxed. We will characterize NPI implementation according to indices of stringency developed by others.19
County-level physical distancing. Stay-at-home orders and other measures have greatly reduced population mobility in many areas.20 We will operationalize physical distancing using a proxy—changes in mobility at the county level relative to the timing of first cases and deaths in each county—to classify counties as having achieved physical distancing early or late and to assess the influence of relaxing physical distancing measures (relative to nadir). We will use county-level data on mobility, updated daily, from Descartes Labs (posted on GitHub)21. These data include mobility calculated using GPS data from ‘a collection of mobile devices reporting consistently throughout’ each day.22 The maximum distance moved in kilometers is calculated daily for each person, and aggregated up to the county level median (or other aggregate metric). We will use the following metrics: 1. M50 - the median of the max-distance mobility for all samples in each county; and 2. M50I (m50 index) - the percent of normal M50 in the region, with normal M50 defined during Feb 17, 2020 to March 7, 2020. M50, and to a lesser degree M50I, will be highly influenced by the proximity/density of resources (e.g., supermarkets).
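To make the two mobility metrics concrete, the sketch below computes M50 from per-device maximum distances and expresses it as a percent of a baseline value. It is an illustration only: the function names are hypothetical, and summarizing the baseline window (Feb 17 to March 7, 2020) by the median of its daily M50 values is an assumption, because the text does not state how "normal" M50 is aggregated.

```python
from statistics import median
from typing import Sequence


def m50(max_distances_km: Sequence[float]) -> float:
    """Median of the daily max-distance mobility across devices in a county."""
    return median(max_distances_km)


def m50_index(m50_today: float, baseline_daily_m50: Sequence[float]) -> float:
    """Percent of normal M50, where 'normal' is taken here (an assumption)
    as the median of daily M50 values over the baseline window."""
    normal = median(baseline_daily_m50)
    return 100.0 * m50_today / normal


# Hypothetical values, for illustration only.
county_m50 = m50([1.0, 4.0, 2.5])
county_index = m50_index(2.0, [8.0, 10.0, 12.0])
```

In this example, an index of 20 would mean that median mobility is one fifth of its baseline level.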
Planned analyses. We will integrate time-updated publicly available county-level data on mobility22, SARS/COV2 diagnoses, and COVID-19 deaths23 with our longitudinal C3 cohort data to determine the impact of NPI implementation on SARS/COV2 outcomes. Among SARS/COV2 seropositive individuals, we will estimate the proportion with asymptomatic and mild disease using their previously reported data on symptoms consistent with COVID-19, weighted to the US adult population using age, sex, and race/ethnicity stratification. To assess whether seropositivity for SARS/COV2 is protective against new disease, we will compare the incidence of COVID-like disease (any and severe) among persons previously identified as seropositive to that of persons previously identified as seronegative, geographically matched within areas where SARS/COV2 transmission remains active.
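The weighting step can be illustrated with a simple post-stratification sketch, in which each stratum's observed proportion is weighted by that stratum's share of the target population. The strata, counts, and population shares below are hypothetical; the planned analysis would use strata defined by age, sex, and race/ethnicity, and may use a different weighting procedure.

```python
from typing import Dict, Mapping, Tuple


def poststratified_proportion(
    sample: Mapping[str, Tuple[int, int]],
    population_shares: Mapping[str, float],
) -> float:
    """Weight each stratum's observed proportion (events / total) by the
    stratum's share of the target population, renormalized over the
    strata actually observed in the sample."""
    total_share = sum(population_shares[s] for s in sample)
    weighted = sum(
        population_shares[s] * events / total
        for s, (events, total) in sample.items()
    )
    return weighted / total_share


# Hypothetical counts (events, total) and population shares, for illustration only.
sample_counts: Dict[str, Tuple[int, int]] = {"18-49": (30, 100), "50+": (10, 100)}
shares = {"18-49": 0.6, "50+": 0.4}
estimate = poststratified_proportion(sample_counts, shares)
```

Here the crude sample proportion is 40/200 = 0.20, while the weighted estimate is 0.22 because the stratum with the higher proportion is under-represented in the sample relative to the assumed population.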
Data management
All data were imported and cleaned in R and SAS (V9.4). Data were geocoded based on a self-reported ZIP code. Maps were created in ArcGIS 10.7.
Ethical Approval
The C3 study protocol was approved by the Institutional Review Board at the City University of New York (CUNY) Graduate School of Public Health and Health Policy.
RESULTS
Cohort Eligibility and Recruitment
Among the N=8,711 participants who were eligible, 82% (N = 7,125) completed the survey, and 81% (7,070) left an email address for future study-follow-up (Figure 2).
Baseline Characteristics of C3 Stratified by Age Categories
The final cohort of 7,070 was geographically diverse (Figure 3), with participants from all 50 states, the District of Columbia, Puerto Rico and Guam. The median age of participants was 42 years (interquartile range: 31, 58); 61% were aged 18-49 (including 5% <21 years (N=345)), 16% were aged 50-59, and 23% were aged 60 years or older (Table 1). Just over half (51%) were male, 14% were Hispanic, 11% black non-Hispanic, 5% Asian or Pacific Islander, and 67% white non-Hispanic. A majority were currently employed (53%), 18% were retired, 16% were out of work, and 9% were students.
More than half (54%) were at increased risk for severe COVID-19 illness should they become infected with SARS/COV2 on the basis of age (60+), reporting an underlying health condition (chronic lung disease, asthma (current), type 2 diabetes, serious heart condition, kidney disease, or an immunocompromised status), or daily smoking (Table 2). The proportion of persons with an underlying health condition increased with age category (28% among 18-49 year olds and 44% among 60+), and the proportion of daily smokers decreased with increasing age category (20% and 11%, respectively).
Among 5,403 participants that completed the updated version of the baseline assessment that included questions on essential employment (participants could select more than one employment category), 24% reported being a frontline worker (healthcare or other essential workers) (Table 2, Figure 1). By employment category, 9% were healthcare workers, 5% were healthcare workers who screened or cared for COVID-19 patients, 11% were in delivery services (e.g., food) and 4% were in transportation (e.g., taxis). The proportion of persons employed in frontline work decreased with increasing category of age.
NPI / Physical Distancing Behaviors Stratified by Age Categories
A high proportion of participants reported avoiding large groups with >20 people in the past two weeks and avoiding handshakes or hugs (93% and 92%, respectively) (Table 3). Nearly half (49%) reported working from home. A majority reported wearing gloves (56%) and masks (58%), and these proportions increased significantly with age (54% of 18-49 year olds wore gloves versus 61% of 60+, and 54% of 18-49 year olds wore masks versus 68% of 60+; chi-square p<0.001 for each comparison). Almost one in six (15%) participants reported stockpiling personal protective equipment (PPE) and 41% reported stockpiling food. The proportion of participants who reported stockpiling decreased significantly with increasing age category (17% of 18-49 year olds stockpiled PPE versus 13% of 60+, and 47% of 18-49 year olds stockpiled food versus 31% of 60+; p<0.001 for each comparison).
COVID-19 symptoms and care outcomes
One in five (20%, N = 1,436) reported any COVID-like symptoms (cough, fever or shortness of breath) prior to C3 enrollment, and this proportion decreased significantly with age (23% among 18-49 year olds versus 13% among 60+ year olds; p<0.001) (Table 3). The most common symptom reported was new cough (12%), followed by shortness of breath (9%) and fever (7%).
Among the 20% of participants reporting COVID-like symptoms (N = 1,436), 39% (N = 555) said they called or saw a physician/healthcare professional and 12% (N = 166) were hospitalized (Table 4). Compared to participants at lower risk for severe COVID-19 illness, participants at higher risk were more likely to report seeing a physician (46% versus 28%) or being hospitalized (18% versus 2%) (p<0.001 for each comparison). Among all participants, 5% (N = 368) reported being tested for COVID-19 and 3% (N = 191) reported receiving a COVID-19 diagnosis. Participants at higher risk for severe COVID-19 illness were significantly more likely to report testing or receiving a diagnosis than participants at lower risk (testing: 7% versus 3%; diagnosis: 4% versus 1%; p<0.001 for each comparison).
DISCUSSION
The C3 study of 7,070 persons from all 50 US states, the District of Columbia, Puerto Rico and Guam was rapidly established in the middle of the SARS/COV2 upswing in the US. The C3 cohort is geographically and socio-demographically diverse, and includes participants from many active hotspots during the recruitment period (March 28-April 20, 2020), as well as frontline health care workers and other essential employees, and individuals who are vulnerable to severe outcomes associated with SARS/COV2 infection.
At the baseline assessment, one in five participants reported having had recent COVID-like symptoms. Among those reporting COVID-like symptoms, 39% reported calling or seeing a health care provider and 12% reported being hospitalized. A small proportion of C3 participants reported being tested for or diagnosed with SARS/COV2 (5% and 3%, respectively), and participants with elevated risk for severe COVID-19 illness were more likely to report seeking care, hospitalization, and testing than participants without elevated risk. Limitations of serological assays notwithstanding, recent cross-sectional serosurveys done prior to the relaxing of physical distancing have reported seroprevalence estimates ranging from 3% in CA to 21% in NYC.24–26 Many in the C3 cohort are likely to have serologic evidence of prior SARS/COV2 infection as of the Month 1 follow-up assessment. Those who are seronegative at Month 1 may be at high risk for seroconversion between Months 1 and 3, given ongoing transmission in many areas and the expectation that physical distancing measures will be relaxed in the coming months. When this occurs, many areas will also be implementing enhanced testing, contact tracing, and quarantine. The C3 Study has the potential to monitor and assess the uptake and impact of these key strategies that are part of the public health response to control and mitigate the SARS/COV2 pandemic in the U.S.
Strengths of the C3 study include its prospective (vs. cross-sectional) design, allowing direct observation of seroconversions and incident COVID disease among those who were unexposed and/or disease free. The longitudinal design also allows prospective estimation of the incidence of COVID disease among those with antibodies to SARS/COV2, allowing a rapid assessment, in the midst of a pandemic, of the extent to which SARS/COV2 antibodies offer short-term protection against subsequent disease. Prospective studies, which by definition follow the same individuals forward in time, are complementary to and offer some strengths over cross-sectional studies, especially in the context of rapidly evolving emergencies and the associated public health response. While repeat cross-sectional surveys are valuable in a pandemic, including for their ability to assess trends in many important outcomes, they cannot assess what factors may influence change over time in an individual. Cross-sectional studies also by definition will exclude persons who are in the hospital or who have died.
We are using assessment strategies designed to minimize assessment effects, as well as objective biological indicators. Studies requiring human contact can cause participants to under-report sensitive health behaviors and to adopt behaviors that make them less representative of the populations from which they were drawn. Studies involving high levels of contact may induce behavior change by repeatedly engaging participants outside of their natural context, artificially biasing results27, 28 and reducing generalizability.29
The C3 study has limitations worth noting, as they inform what can and cannot be assessed. First, C3 will be unable to provide representative estimates of prevalence and incidence. Second, we will underestimate hospitalizations and are unable to capture deaths due to COVID or other causes. Most research studies, including ours, deployed in the middle of a pandemic will, by definition, produce some biased estimates since they will not include information on persons who died from COVID, were hospitalized with COVID prior to or during recruitment, or were too sick to participate in a research study at the time of recruitment. From published studies, we will assess bias in our estimates due to these factors and adjust them accordingly. Third, we will be unable to conduct state or county specific analyses, except for a few localities with high participation (e.g., New York and California). Finally, we do not yet know the retention rate or the acceptance rate for specimen collection. However, participants may be more motivated to participate in follow-up, given the active threat and novelty of the COVID-19 pandemic and its ongoing impact on individuals and communities.
We considered the strengths and weaknesses of several different study designs and methodologic features when designing and launching the C3. Ultimately, we chose a design that prioritized our ability to rapidly answer key epidemiologic questions and enroll a geographically and socio-demographically diverse sample of individuals. We considered whether a probability sample of households with a telephone interview should be leveraged, given the potential to use a known sampling frame, which would facilitate estimates that may be more population representative. However, given the need for rapid information and knowledge generation, we chose to recruit participants from online settings, and enrolled >7,000 people in 3 weeks. Although our sample is not representative of the entire US population, it is geographically representative and socio-demographically diverse. Our study will complement other efforts to address similar research questions, such as the NIH’s planned serosurvey.30 Indeed, it will be important to assess whether online and conventional recruitment methods reach similar conclusions.
Our approach employs protocols for overcoming common pitfalls of fully online studies (e.g., repeat/duplicate participation). Our online, volunteer recruitment approach allows us to sample individuals who may not be reached by traditional telephone recruitment approaches, which can have very low response rates. As part of our enrollment procedures, we record IP addresses, email addresses, and participant contact information, and we require participants to have valid US mailing addresses (required to receive an at-home SARS/COV2 specimen collection kit). Participants will be “known” to the research team (name, email, address), thus averting some of the traditional shortcomings of online-only studies (particularly anonymous, cross-sectional online studies).
Data sharing
We plan to rapidly produce manuscripts, which will be simultaneously submitted to MedRxiv[75] and leading scientific journals for peer review. To increase the impact of our work, we will also post a deidentified, HIPAA compliant, public use version of our baseline and follow-up data on GitHub.[76] Data will be presented as flat text files (CSV) formatted for compatibility with the New York Times county-level longitudinal case load dataset[3], including date, county, state, and fips code. A GitHub Actions script will perform weekly updates of the repository and its associated GitHub Pages site, automatically incorporating all new submissions from the previous week. Finally, we will provide direct feedback to our cohort and other stakeholders who have signed up for updates via our C3 newsletter.31
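As a rough sketch of the file layout described above, a public-use file could carry the date, county, state, and fips key columns alongside study measures. The `n_participants` column, the function name, and the example row are hypothetical placeholders, not actual C3 data or the final file specification.

```python
import csv
import io
from typing import Iterable, Mapping

# Key columns named in the text, plus one hypothetical measure column.
FIELDS = ["date", "county", "state", "fips", "n_participants"]


def to_public_use_csv(rows: Iterable[Mapping]) -> str:
    """Serialize rows to CSV text, keeping FIPS codes as zero-padded
    five-character strings so that leading zeros are not lost."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "fips": str(row["fips"]).zfill(5)})
    return buffer.getvalue()


# Hypothetical row, for illustration only.
csv_text = to_public_use_csv(
    [{"date": "2020-04-20", "county": "Autauga", "state": "Alabama",
      "fips": 1001, "n_participants": 3}]
)
```

Storing FIPS codes as text matters for joins against other county-level files, since codes for states such as Alabama begin with a zero.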
Conclusion
A geographically and socio-demographically diverse group of participants was rapidly enrolled in the C3 during the upswing of the SARS/COV2 pandemic. Strengths of the C3 include the potential for direct observation of, and risk factors for, seroconversions and incident COVID disease (among those with or without antibodies to SARS/COV2) in areas of active transmission. The C3 Study has the potential to monitor and assess the uptake and impact of the public health response to control and mitigate the SARS/COV2 pandemic in the US.
FUNDING
Funding for this project is provided by the CUNY Institute for Implementation Science in Population Health (cunyisph.org) and the COVID-19 Grant Program of the CUNY Graduate School of Public Health and Health Policy.
Data Availability
The data that support the findings of this study are available on request from the corresponding author, [DN]. The data are not yet publicly available, but we are preparing to post a deidentified, HIPAA compliant, public use version of our baseline and follow-up data on GitHub.
Footnotes
CONFLICT OF INTEREST: None declared