A national prospective cohort study of SARS/COV2 pandemic outcomes in the U.S.: The CHASING COVID Cohort

Introduction: The Chasing COVID Cohort (C3) study is a US-based, geographically and socio-demographically diverse sample of adults (18 and older) enrolled into a prospective cohort study during the upswing of the U.S. COVID-19 pandemic. Methods: We used internet-based strategies to enroll C3 participants beginning March 28th, 2020. Following baseline questionnaire completion, study participants will be contacted monthly (for 6 months) to complete assessments of engagement in non-pharmaceutical interventions (e.g., use of cloth masks, avoiding large gatherings); COVID-19 symptoms; SARS/COV2 testing and diagnosis; hospitalizations; healthcare access; and uptake of health messaging. Dried blood spot (DBS) specimens will be collected at the first follow-up assessment (last week of April 2020) and at month 3 (last week of June 2020) and stored until a validated serologic test is available. Results: As of April 20, 2020, the number of people that completed the baseline survey and provided contact information for follow-up was 7,070. Participants resided in all 50 US states, the District of Columbia, Puerto Rico, and Guam. At least 24% of participants were frontline workers (healthcare and other essential workers). Twenty-three percent (23%) were 60+ years, 24% were Black or Hispanic, 52% were men, and 52% were currently employed. Nearly 20% reported recent COVID-like symptoms (cough, fever or shortness of breath) and a high proportion reported engaging in non-pharmaceutical interventions that reduce SARS/COV2 spread (93% avoided groups >20, 58% wore masks; 73% quarantined). More than half (54%) had higher risk for severe COVID-19 illness should they become infected with SARS/COV2 based on age, underlying health conditions (e.g., chronic lung disease), or daily smoking. Discussion: A geographically and socio-demographically diverse group of participants was rapidly enrolled in the C3 during the upswing of the SARS/COV2 pandemic. 
Strengths of the C3 include the potential for direct observation of, and risk factors for, seroconversion and incident COVID disease (among those with or without antibodies to SARS/COV2) in areas of active transmission.


INTRODUCTION
The Coronavirus Disease 2019 (COVID-19) pandemic has dramatically transformed life across the entire United States, resulting in medical and economic challenges and threats for many households and communities. The earliest research efforts have focused on understanding the clinical course of COVID-19 and the most effective ways of treating people with severe symptoms or illness. As the pandemic progresses, however, we must also investigate COVID-19's evolving epidemiology and the impact of non-pharmaceutical interventions (NPIs), such as physical distancing, health messaging, and testing. Researchers and public health practitioners have called for cohort studies to describe the community attack rate, as well as how attack rates are influenced by different approaches to NPI implementation. 1 Internet-based strategies, which facilitate rapid recruitment of large and diverse samples, can be leveraged to understand and inform this swiftly changing and protracted public health crisis. 2,3 In response to the COVID-19 pandemic, the CUNY Institute for Implementation Science in Population Health (ISPH) launched the Communities, Households and SARS/COV-2 Epidemiology (CHASING) COVID Cohort (C3) study on March 28, 2020. We sought to recruit an online prospective cohort of 7,500 adults (18 years or older) in the United States (US) and US territories in order to rapidly contribute to our understanding of the spread and impact of the SARS/COV2 pandemic within households and communities. In this prospective cohort study, we will assess the impact of implementing, and relaxing, NPIs on SARS/COV2 clinical outcomes and psychosocial outcomes such as mental health, social support, and interpersonal violence.

METHODS
We aimed to rapidly enroll a geographically and socio-demographically diverse sample of adult participants residing in the US and US territories. We applied internet-based strategies that have been demonstrated to be effective for recruiting and following large and geographically diverse online cohorts. [2][3][4]

medRxiv preprint doi: https://doi.org/10.1101/2020.04.28.20080630; this version posted May 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Cohort Eligibility and Recruitment
Persons aged 18 years and above who resided in the US or US territories were eligible to join the study. Study participants were recruited via social media platforms (e.g., Facebook, Instagram, and Scruff) or via referral to the study. Anyone with knowledge of the study was allowed to invite others to participate. By tapping into participants' personal networks, we aimed to improve recruitment of persons >59 years of age, who may be less active on social media than younger persons and whose recruitment may therefore rely more on snowball sampling. The study was promoted as a way for participants to contribute to understanding the COVID-19 pandemic.
Facebook and Instagram advertisements were developed in English and Spanish and were geographically targeted to people currently residing in the US and US territories who were 18 or older.
The C3 had a targeted sample size of 7,500 participants. Study staff actively monitored cohort demographics and adjusted advertisement strategies as needed to recruit a more geographically and socio-demographically diverse sample. The advertisement strategies were adaptive based on the profile of participants enrolled as of a given date. For example, strategies could shift to recruit older persons if that demographic was not well represented.
Potential participants were directed to an enrollment survey (hosted by Qualtrics) in their web browser on a computer or mobile device. 5 The consent form described the study, monthly follow-up assessments, and future study opportunities, including the possibility to receive a SARS/COV2 serologic test as part of the study. The consent form also described the incentive schedule: a drawing for $100 for the baseline survey (with 20 winners) and gift cards ranging from $5-30 for all participants for completion of subsequent surveys and antibody testing.

Enrollment and baseline assessment
Enrollment for C3 began on March 28, 2020, at which point there were 122,000 documented COVID-19 cases and 2,200 COVID-19 deaths reported in the US. 6 Measures included on the baseline questionnaire were derived from previously published research (e.g., Together 5000 2 , BRFSS, and H1N1 influenza studies 7,8 ) and from other researchers who had developed surveys for understanding COVID-19 (e.g., Canadian Institutes of Health Research 7 and Food Access and Food Security during COVID-19 9 ). Measures were also developed de novo in response to the novel pandemic. We deployed a second version of the baseline questionnaire on April 9, 2020, which added questions to capture healthcare and other essential worker status. The surveys are available on the C3 webpage ( https://cunyisph.org/chasing-covid/ ).

Follow-up assessments
Questionnaires. Following the baseline assessment, C3 participants will be surveyed monthly for 6 months (until September 2020). The follow-up assessments will gather data on symptoms, testing, hospitalizations and other time-varying factors (e.g., NPI uptake and relaxation) (see Table 1 for survey realms).
Specimen collection. At the first follow-up assessment (end of April 2020) and the third follow-up assessment (end of June 2020), participants will be asked to self-collect a specimen for serologic testing. Participants will be mailed dried blood spot (DBS) self-sampling kits. Using the provided lancet, they will prick the side of their finger and provide a sample of blood on the provided card. [...] 11 , where they will be banked at -80ºC for future serologic testing for IgM and IgG antibodies to SARS/COV2 via a suitable validated assay. We will monitor the emerging pipeline of serology test systems.
Participants who do not submit a DBS card at the end of Month 1 will be sent a reminder and will be allowed to submit it at any time during follow-up. Participants will receive $20 upon returning a valid specimen to the study laboratory.
Daily symptom tracking. Monthly assessments are supplemented by voluntary daily symptom tracking via an innovative COVID-19 symptom tracker 12 that we have deployed in our cohort. The Coronavirus Pandemic Epidemiology (COPE) consortium has developed the COVID Symptom Tracker app, downloadable for free in the Apple and Android App Stores, which enables individuals to self-report information on COVID-19 exposure and infections. On first use, the app queries location, age, and core risk factors and comorbidities. Daily prompts query for updates on interim symptoms, health care visits, and COVID-19 testing results. The C3 Study joined the COPE consortium on April 6th, allowing our cohort members who use the app to consent to share their data with us so that it can be linked to the larger C3 cohort data. Consent to merge data from the symptom tracker app with other C3 data is being solicited at the Month 1 follow-up assessment.

Incidence of SARS/COV2 infection. [...] between the two tests (if no symptoms) or 5 days before a documented report of onset of COVID-like symptoms.
Asymptomatic SARS/COV2 infection. Asymptomatic infection will be defined as a positive SARS/COV2 serologic test, with no documented report of COVID-like symptoms from prior C3 interviews or from the COVID-19 symptom tracker app. The proportion with asymptomatic infection among persons with a positive SARS/COV2 serologic test will then be calculated as the number with asymptomatic infection divided by the total number with a positive serologic test. We will stratify our estimates of the proportion with asymptomatic infection by whether the seroconversion occurred prior to Month 1 or whether seroconversion was observed in the cohort between Months 1 and 3 among those with a documented negative serologic test at Month 1.
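The calculation described above can be illustrated with a minimal Python sketch. The field names (`seropositive`, `any_covid_like_symptoms`, `seroconversion_window`) and the example records are hypothetical and are not taken from the study's data dictionary; the sketch only shows the numerator, denominator, and stratification as defined in the text.

```python
from collections import defaultdict


def asymptomatic_proportion(participants):
    """Return {stratum: (n_asymptomatic, n_seropositive, proportion)}.

    Each participant is a dict with hypothetical keys:
      seropositive            -- True if the serologic test was positive
      any_covid_like_symptoms -- True if any symptom report exists
                                 (interviews or symptom tracker app)
      seroconversion_window   -- "before_month_1" or "months_1_to_3"
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [asymptomatic, seropositive]
    for p in participants:
        if not p["seropositive"]:
            continue  # the denominator is restricted to positive serologic tests
        stratum = p["seroconversion_window"]
        counts[stratum][1] += 1
        if not p["any_covid_like_symptoms"]:
            counts[stratum][0] += 1
    return {s: (a, n, a / n) for s, (a, n) in counts.items()}


# Hypothetical records for illustration only (not study data).
example = [
    {"seropositive": True, "any_covid_like_symptoms": False,
     "seroconversion_window": "before_month_1"},
    {"seropositive": True, "any_covid_like_symptoms": True,
     "seroconversion_window": "before_month_1"},
    {"seropositive": True, "any_covid_like_symptoms": False,
     "seroconversion_window": "months_1_to_3"},
    {"seropositive": False, "any_covid_like_symptoms": True,
     "seroconversion_window": None},
]
result = asymptomatic_proportion(example)
print(result["before_month_1"])  # (1, 2, 0.5)
```

Seronegative participants never enter a stratum, so a stratum appears in the output only if it contains at least one positive test.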

Confirmed and possible COVID-19 disease at or prior to baseline. For SARS/COV2 seropositive persons, we will define confirmed prior COVID-19 as a self-report of symptoms consistent with COVID-19 on our symptom screener in the two weeks prior to baseline, with a date of disease onset estimated as one week prior to baseline. Those reporting COVID-19 symptoms without a serologic test will be considered possible cases of prior COVID-19.

Confirmed incident COVID-19 disease after serologic testing. For analyses to assess subsequent disease after Month 1, incident COVID-19 disease will be defined as development of new COVID-like symptoms >7 days after the first (positive or negative) SARS/COV2 serologic test result. 13 We will count new COVID-19 disease, including a self-report of COVID-like symptoms, COVID-19 diagnosis or hospitalization on the C3 questionnaire or via the COVID-19 symptom tracker app. Severe COVID-19 disease will be defined as having been hospitalized for COVID-like symptoms.
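A minimal Python sketch of this case definition follows. The function name, its arguments, and the three return labels are hypothetical; the sketch assumes that symptom onset dates are available per participant and that the hospitalization flag refers to the new symptom episode.

```python
from datetime import date, timedelta

# New symptoms must begin more than 7 days after the first serologic test.
WINDOW = timedelta(days=7)


def classify_incident_disease(first_test_date, symptom_onset_dates,
                              hospitalized_for_symptoms=False):
    """Return "none", "incident", or "severe" for one participant.

    first_test_date           -- date of the first serologic test result
    symptom_onset_dates       -- list of reported COVID-like symptom onset dates
    hospitalized_for_symptoms -- assumed to describe the new episode
    """
    new_onsets = [d for d in symptom_onset_dates
                  if d - first_test_date > WINDOW]
    if not new_onsets:
        return "none"
    return "severe" if hospitalized_for_symptoms else "incident"


first_test = date(2020, 5, 1)
# Onset exactly 7 days after the test does not satisfy ">7 days".
print(classify_incident_disease(first_test, [date(2020, 5, 8)]))  # none
print(classify_incident_disease(first_test, [date(2020, 5, 9)]))  # incident
```

The strict inequality mirrors the ">7 days" wording; whether day 7 itself should count is a choice the text leaves to the analyst.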
Other outcomes. We will examine secondary outcomes such as anxiety symptoms (Generalized Anxiety Disorder 7-item scale [GAD-7] 14 ) and depressive symptoms (Patient [...] 15 ). We will also measure and describe the prevalence of food insecurity 9 , substance use 16,17 , unhealthy drinking (Alcohol Use Disorders Identification Test [AUDIT-C] 18 ), and intimate partner violence (IPV).

Exposures
NPI uptake in the C3 cohort. We will examine specific NPI uptake among C3 cohort participants over time, and also calculate NPI uptake by creating an index. The index is a summative score of responses to survey questions about engagement in NPI actions in the two weeks prior to the survey. Each affirmative or neutral response is weighted 1 and each negative response 0, so that a higher index value indicates greater NPI uptake/engagement.
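The scoring rule can be sketched in a few lines of Python. The item names and the response coding ("yes", "neutral", "no") are hypothetical placeholders rather than the actual survey items, and treating an unanswered item as 0 is an assumption of this sketch.

```python
# Hypothetical NPI items; the real questionnaire items may differ.
NPI_ITEMS = ["avoided_large_groups", "wore_mask", "worked_from_home"]


def npi_index(responses):
    """Sum 1 for each affirmative or neutral response, 0 otherwise.

    responses -- dict mapping item name to "yes", "neutral", or "no".
    Missing items contribute 0 (an assumption, not stated in the text).
    """
    score = 0
    for item in NPI_ITEMS:
        if responses.get(item) in ("yes", "neutral"):
            score += 1
    return score


participant = {
    "avoided_large_groups": "yes",
    "wore_mask": "no",
    "worked_from_home": "neutral",
}
print(npi_index(participant))  # 2
```

Because the index is a simple sum, its maximum equals the number of items, so scores are comparable across survey waves only when the item set is unchanged.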
State level NPI implementation. We and others have compiled living databases of state-level implementation and relaxation of NPIs, which document the type and date of NPI implementation (e.g., school closings, restaurant closings, stay-at-home orders, cloth masks), who is covered (older persons, non-essential workers), as well as when specific NPIs are relaxed. We will characterize NPI implementation according to indices of stringency developed by others. 19
County-level physical distancing. Stay-at-home orders and other measures have greatly reduced population mobility in many areas. 20 We will operationalize physical distancing using a proxy (changes in mobility at the county level relative to the timing of first cases and deaths in each county) to classify counties as having achieved physical distancing early or late and to assess the influence of relaxing physical distancing measures (relative to nadir). We will use county-level data on mobility, updated daily, from Descartes Labs (posted on GitHub) 21 . These data include mobility calculated using GPS data from 'a collection of mobile devices reporting consistently throughout' each day. 22 The maximum distance moved in kilometers is calculated daily for each person, and aggregated up to the county level median (or other aggregate metric). We will use the following metrics: 1. M50: the median of the max-distance mobility for all samples in each county; and 2. M50I (M50 index): the percent of normal M50 in the region, with normal M50 defined during Feb 17, 2020 to March 7, 2020. M50, and to a lesser degree M50I, will be highly influenced by the proximity/density of resources (e.g., supermarkets).
[...] seropositive individuals, we will estimate the proportion with asymptomatic and mild disease using their previously reported data on symptoms consistent with COVID-19, weighted to the US adult population using age, sex, and race/ethnicity stratification. To assess whether seropositivity for SARS/COV2 is protective against new disease, we will compare the incidence of COVID-like disease (any and severe) among persons previously identified as seropositive to that of persons previously identified as seronegative, geographically matched within areas where SARS/COV2 transmission remains active.
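One common way to weight a sample proportion to a reference population is post-stratification, in which each stratum's weight is its population share divided by its sample share. The Python sketch below illustrates that approach under stated assumptions: the text does not specify the weighting procedure, and the strata and population shares shown are hypothetical rather than census figures.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight per stratum = population share / sample share."""
    n = len(sample_strata)
    sample_counts = Counter(sample_strata)
    return {s: population_shares[s] / (count / n)
            for s, count in sample_counts.items()}


def weighted_proportion(records, population_shares):
    """records -- list of (stratum, has_outcome) pairs for seropositive persons."""
    weights = poststratification_weights([s for s, _ in records],
                                         population_shares)
    total = sum(weights[s] for s, _ in records)
    hits = sum(weights[s] for s, has_outcome in records if has_outcome)
    return hits / total


# Hypothetical shares and records, for illustration only.
shares = {"18-59": 0.75, "60+": 0.25}
records = [("18-59", True), ("18-59", False), ("60+", True), ("60+", True)]
print(weighted_proportion(records, shares))  # 0.625 (unweighted: 0.75)
```

In practice the strata would be cells of age by sex by race/ethnicity, and sparse cells may need to be collapsed before weights are computed.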

Data management
All data were imported and cleaned in R and SAS (V9.4). Data were geocoded based on a self-reported ZIP code. Maps were created in ArcGIS 10.7.

Ethical Approval
The C3 study protocol was approved by the Institutional Review Board at the City University of New York (CUNY) Graduate School for Public Health and Health Policy.

RESULTS
Cohort Eligibility and Recruitment
Among the N = 8,711 participants who were eligible, 82% (N = 7,125) completed the survey, and 81% (N = 7,070) left an email address for future study follow-up (Figure 2).
More than half (54%) were at increased risk for severe COVID-19 illness should they become infected with SARS/COV2 on the basis of age (60+), reporting an underlying health condition (chronic lung disease, asthma (current), type 2 diabetes, serious heart condition, kidney disease, or an immunocompromised status), or daily smoking (Table 2). The proportion of [...] 4% were in transportation (e.g., taxis). The proportion of persons employed in frontline work decreased with increasing age category.

NPI / Physical Distancing Behaviors Stratified by Age Categories
A high proportion of participants reported avoiding large groups with >20 people in the past two weeks and avoiding handshakes or hugs (93% and 92%, respectively) (Table 3).
Nearly half (49%) reported working from home. A majority reported wearing gloves (56%) and masks (58%), and these proportions significantly increased with age (54% of 18- [...]
Among the 20% of participants reporting COVID-like symptoms (N = 1,436), 39% (N = 555) said they called or saw a physician/healthcare professional and 12% (N = 166) were hospitalized (Table 4). Compared to participants at lower risk for severe COVID-19 illness, participants at higher risk were more likely to report seeing a physician or being hospitalized (46% versus 28% and 18% versus 2%, respectively; p<0.001 for each comparison). Among all participants, 5% (N = 368) reported being tested for COVID-19 and 3% (N = 191) reported receiving a COVID-19 diagnosis. Participants at higher risk for severe COVID-19 illness were significantly more likely to report testing or receiving a diagnosis than participants at lower risk (testing: 7% versus 3% and diagnosis: 4% versus 1%, respectively; p<0.001 for each comparison).

DISCUSSION
The C3 study of 7,070 persons from all 50 US states, the District of Columbia, Puerto Rico and Guam was rapidly established in the middle of the SARS/COV2 upswing in the US.
The C3 cohort is geographically and socio-demographically diverse, and includes participants from many active hotspots during the recruitment period (March 28-April 20, 2020), as well as frontline health care workers and other essential employees, and individuals who are vulnerable to severe outcomes associated with SARS/COV2 infection.
At the baseline assessment, nearly one in five reported having had recent COVID-like symptoms. Among those reporting COVID-like symptoms, 39% reported seeing a health care provider and 12% reported being hospitalized. A small proportion of C3 participants reported being tested for or diagnosed with SARS/COV2 (5% and 3%, respectively), and participants with elevated risk for COVID-19 illness were more likely to report seeking care, hospitalization, and testing than participants without elevated risk. Limitations of serological assays notwithstanding, recent cross-sectional serosurveys done prior to the relaxing of physical distancing have reported seroprevalence estimates ranging from 3% in CA to 21% in NYC. [24][25][26] Many in the C3 cohort are likely to have serologic evidence of prior SARS/COV2 infection as of the Month 1 follow-up assessment. Those who are seronegative at Month 1 may be at high risk for seroconversion between Months 1 and 3, given ongoing transmission in many areas, and the expectation of physical distancing measures being relaxed in the coming months. When this occurs, many areas will also be implementing enhanced testing, contact tracing, and quarantine. The C3 Study has the potential to monitor and assess the uptake and impact of these key strategies that are part of the public health response to control and mitigate the SARS/COV2 pandemic in the U.S.
Strengths of the C3 study include its prospective (vs. cross-sectional) design, allowing direct observation of seroconversions and incident COVID disease among those who were unexposed and/or disease free. The longitudinal design also allows prospective estimation of the incidence of COVID disease among those with antibodies to SARS/COV2, allowing a rapid assessment, in the midst of a pandemic, of the extent to which SARS/COV2 antibodies offer short-term protection against subsequent disease. Prospective studies, which by definition follow the same individuals forward in time, are complementary to and offer some strengths over cross-sectional studies, especially in the context of rapidly evolving emergencies and the associated public health response. While repeat cross-sectional surveys are valuable in a pandemic, including for their ability to assess trends in many important outcomes, they cannot assess what factors may influence change over time in an individual. Cross-sectional studies also by definition will exclude persons who are in the hospital or who have died.
We are using assessment strategies designed to minimize assessment effects, as well as objective biological indicators. Studies requiring human contact can cause participants to under-report sensitive health behaviors and to adopt behaviors that make them less representative of the populations from which they were drawn. Studies involving high levels of contact may induce behavior change by repeatedly engaging participants outside of their natural context, artificially biasing results 27,28 and reducing generalizability. 29
The C3 study has limitations worth noting, as they inform what can and cannot be assessed. First, C3 will be unable to provide representative estimates of prevalence and incidence. Second, we will underestimate hospitalizations and are unable to capture deaths due to COVID or other causes. Most research studies, including ours, deployed in the middle of a pandemic will, by definition, produce some biased estimates since they will not include information on persons who died from COVID, were hospitalized with COVID prior to or during recruitment, or were too sick to participate in a research study at the time of recruitment. Using published studies, we will assess bias in our estimates due to these factors and adjust them accordingly. Third, we will be unable to conduct state or county specific analyses, except for a few localities with high participation (e.g., New York and California). Finally, we do not yet know the retention rate or the acceptance rate for specimen collection. However, participants may be more motivated to participate in follow-up, given the active threat and novelty of the COVID-19 pandemic and its ongoing impact on individuals and communities.
We considered the strengths and weaknesses of several different study designs and methodologic features when designing and launching the C3. Ultimately, we chose a design that prioritized our ability to rapidly answer key epidemiologic questions and enroll a geographically and socio-demographically diverse sample of individuals. We considered whether a probability sample of households with a telephone interview should be leveraged, given the potential to use a known sampling frame, which would facilitate estimates that may be more population representative. However, given the need for rapid information and knowledge generation, we chose to recruit participants from online settings, and enrolled >7,000 people in 3 weeks.
Although our sample is not representative of the entire US population, it is geographically representative and socio-demographically diverse. Our study will complement other efforts to address similar research questions, such as the NIH's planned serosurvey. 30 Indeed, it will be important to assess if online vs. conventional recruitment methods reach similar conclusions.
Our approach employs protocols for overcoming common pitfalls of fully online studies (e.g., repeat/duplicate participation). Our online, volunteer recruitment approach allows us to sample individuals who may not be reached by traditional telephone recruitment approaches, which can have very low response rates. As part of our enrollment procedures, we record IP address, email addresses, participant contact information, and require participants to have valid US mailing addresses (required to receive an at-home SARS/COV2 specimen collection kit).
Participants will be "known" to the research team (name, email, address), thus averting some of the traditional shortcomings of online-only studies (particularly anonymous, cross-sectional online studies).
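As a rough illustration of how recorded identifiers could support screening for repeat participation, the Python sketch below groups enrollment records that share a normalized email address or an IP address. It is a hypothetical example, not the study's protocol: the record fields are assumed, and because household members may legitimately share an IP address, matches are flagged for manual review rather than excluded.

```python
from collections import defaultdict


def normalize_email(email):
    """Lower-case and trim so trivially different spellings match."""
    return email.strip().lower()


def flag_possible_duplicates(enrollments):
    """Return sorted groups of record ids sharing an email or IP address.

    enrollments -- list of dicts with hypothetical keys: id, email, ip.
    """
    by_key = defaultdict(set)
    for e in enrollments:
        by_key[("email", normalize_email(e["email"]))].add(e["id"])
        by_key[("ip", e["ip"])].add(e["id"])
    groups = {tuple(sorted(ids)) for ids in by_key.values() if len(ids) > 1}
    return sorted(groups)


# Hypothetical enrollment records, for illustration only.
enrollments = [
    {"id": 1, "email": "A.Person@example.org", "ip": "192.0.2.10"},
    {"id": 2, "email": "a.person@example.org ", "ip": "192.0.2.20"},
    {"id": 3, "email": "someone@example.org", "ip": "192.0.2.20"},
]
print(flag_possible_duplicates(enrollments))  # [(1, 2), (2, 3)]
```

Records 1 and 2 match on email after normalization, while records 2 and 3 match on IP address only, which is the kind of case a reviewer would check against the mailing address.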

Data sharing
We plan to rapidly produce manuscripts, which will be simultaneously submitted to medRxiv [75] and leading scientific journals for peer review. To increase the impact of our work, we will also post a deidentified, HIPAA compliant, public use version of our baseline and follow-up data on GitHub. [76] Data will be presented as flat text files (CSV) formatted for compatibility with the New York Times county-level longitudinal case load dataset [3], including date, county, state, and FIPS code. A GitHub Actions script will perform weekly updates of the repository and its associated GitHub Pages site, automatically incorporating all new submissions from the previous week. Finally, we will provide direct feedback to our cohort and other stakeholders who have signed up for updates via our C3 newsletter. 31
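A minimal Python sketch of such a flat file is shown below. It aggregates records to county-level counts and writes the four key columns named above (date, county, state, fips) so the output can be joined to the New York Times county file on those keys. The `participants` count column, the input record layout, and the example rows are hypothetical; FIPS codes are zero-padded to five digits on the assumption that they may arrive as integers.

```python
import csv
import io
from collections import Counter


def county_counts_csv(records):
    """Return CSV text with one row per (date, county, state, fips).

    records -- iterable of dicts with keys date (ISO string), county,
               state, and fips (int or str); a hypothetical layout.
    """
    counts = Counter(
        (r["date"], r["county"], r["state"], str(r["fips"]).zfill(5))
        for r in records
    )
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "county", "state", "fips", "participants"])
    for key in sorted(counts):
        writer.writerow([*key, counts[key]])
    return buf.getvalue()


# Hypothetical rows, for illustration only.
rows = [
    {"date": "2020-04-20", "county": "Kings", "state": "New York", "fips": 36047},
    {"date": "2020-04-20", "county": "Kings", "state": "New York", "fips": 36047},
    {"date": "2020-04-20", "county": "Los Angeles", "state": "California", "fips": 6037},
]
print(county_counts_csv(rows))
```

Keeping FIPS as a zero-padded string avoids losing the leading zero for states such as California when the file is read back by spreadsheet software or a typed CSV reader.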

Conclusion
A geographically and socio-demographically diverse group of participants was rapidly enrolled in the C3 during the upswing of the SARS/COV2 pandemic. Strengths of the C3 include the potential for direct observation of, and risk factors for, seroconversions and incident COVID disease (among those with or without antibodies to SARS/COV2) in areas of active transmission. The C3 Study has the potential to monitor and assess the uptake and impact of the public health response to control and mitigate the SARS/COV2 pandemic in the US.