Increased risk of psychiatric sequelae of COVID-19 is highest early in the clinical course

Summary
Background: COVID-19 has been shown to increase the risk of adverse mental health consequences. A recent electronic health record (EHR)-based observational study showed an almost two-fold increased risk of new-onset mental illness in the first 90 days following a diagnosis of acute COVID-19.
Methods: We used the National COVID Cohort Collaborative, a harmonized EHR repository with 2,965,506 COVID-19 positive patients, and compared cohorts of COVID-19 patients with comparable controls. Patients were propensity score-matched to control for confounding factors. We estimated the hazard ratio (COVID-19:control) for new-onset mental illness for the first year following diagnosis. We additionally estimated the change in risk for new-onset mental illness between the periods of 21–120 and 121–365 days following infection.
Findings: We find a significant increase in the incidence of new-onset mental disorders in the period of 21–120 days following COVID-19 (3.8%, 3.6–4.0) compared to patients with respiratory tract infections (3%, 2.8–3.2). We further show that the risk for new-onset mental illness decreases over the first year following COVID-19 diagnosis compared to other respiratory tract infections and demonstrate a reduced (non-significant) hazard ratio over the period of 121–365 days following diagnosis. Similar findings are seen for new-onset anxiety disorders but not for mood disorders.
Interpretation: Patients who have recovered from COVID-19 are at an increased risk for developing new-onset mental illness, especially anxiety disorders. This risk is most prominent in the first 120 days following infection.
Funding: National Center for Advancing Translational Sciences (NCATS).

Predictors from regression with and without post-COVID visit frequency. The major predictors are shown together with their 95% confidence intervals. Cox regression was performed in the same way in both experiments except that visit frequency was included in only one experiment. Visit frequency was defined as the number of visits starting at day 21 following the initial event (diagnosis of COVID-19 or RTI). The number of visits was counted up to the final event in the survival curve (either diagnosis of a mental illness or the last day of observation), and was divided by the total number of days between day 21 and the final event.
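The visit-frequency definition in this caption can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `visit_frequency`, its parameters, and the choice to count a visit on the final event day are assumptions made for illustration.

```python
from datetime import date, timedelta


def visit_frequency(index_date, final_event_date, visit_dates, offset_days=21):
    """Visits per day between day `offset_days` after the index event and the final event.

    Assumption: visits on the window boundaries (day 21 and the final event day)
    are counted. Returns None when there is no follow-up after day 21.
    """
    window_start = index_date + timedelta(days=offset_days)
    window_days = (final_event_date - window_start).days
    if window_days <= 0:
        return None
    n_visits = sum(window_start <= d <= final_event_date for d in visit_dates)
    return n_visits / window_days


# Hypothetical patient: visits on days 10, 21, 50 and 121 after the index event,
# with the final event on day 121. The day-10 visit falls outside the window.
index = date(2021, 1, 1)
visits = [index + timedelta(days=n) for n in (10, 21, 50, 121)]
freq = visit_frequency(index, index + timedelta(days=121), visits)
print(freq)  # 3 visits over 100 days
```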

S1.1 Data Preparation
The first step of our analysis consisted of defining the cohorts and preparing the data needed for the statistical analysis.

S1.2 Inputs
The inputs for our analysis included OMOP tables as well as some tables provided by the Palantir platform with processed data.

S1.2.1 OMOP Table
The following OMOP tables were used for our analysis (Table S4). We refer the reader to the original OMOP documentation for more details [4].
Table | Summary
condition_occurrence | Dates when a condition is considered to have started and (if applicable) ended.
observation_period | Defines the time period for which a patient's demographics, conditions, procedures and drugs are recorded in the source system with the expectation of a reasonable sensitivity and specificity.

S1.3 Tables offered by the Enclave
The following Enclave-specific tables were used for our analysis.

S1.3.1 concept set members
This table defines the relations between the codesets and concept sets (Table S5).
In the code workbooks, we refer to this
This table is prepared as described [5], and contains derived information about some of the covariates used in our analysis (

S1.4 Data Preparation
The following subsubsections show the code used for preparing the data for analysis. Each subsubsection corresponds to a single node (transform) in the Enclave.

S1.4.2 Severity Table for COVID and Controls
We first get concept_IDs from predefined concept sets for our control groups. We then select data on covariates from the complete patient table (Section S1.3.2). Finally, we add a column called inciting_condition that contains either "COVID" or the name of a control condition.
Listing 2: The SQL command performs a left join on visit_occurrence_id, which is used in the CPTDS table to denote the visit in which acute COVID-19 or a control condition first occurs.
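The original listing is not reproduced here. The following Python sketch, using the standard-library `sqlite3` module, only illustrates the general pattern of a left join on visit_occurrence_id followed by labelling each patient with an inciting_condition. The tables `index_visits` and `control_conditions`, the `is_covid` flag, and the labelling rule are hypothetical stand-ins and do not correspond to the actual Enclave tables.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE index_visits (
            person_id INTEGER, visit_occurrence_id INTEGER, is_covid INTEGER);
        CREATE TABLE control_conditions (
            visit_occurrence_id INTEGER, condition_name TEXT);
        INSERT INTO index_visits VALUES (1, 101, 1), (2, 102, 0), (3, 103, 0);
        INSERT INTO control_conditions VALUES (102, 'RTI'), (103, 'Fracture');
        """
    )
    return conn


def label_inciting_condition(conn):
    """Left join on visit_occurrence_id and derive the inciting_condition column."""
    query = """
        SELECT v.person_id,
               CASE WHEN v.is_covid = 1 THEN 'COVID' ELSE c.condition_name END
                   AS inciting_condition
        FROM index_visits AS v
        LEFT JOIN control_conditions AS c
               ON c.visit_occurrence_id = v.visit_occurrence_id
        ORDER BY v.person_id
    """
    return conn.execute(query).fetchall()


labels = label_inciting_condition(build_demo_db())
print(labels)
```

The left join keeps every index visit, so COVID-19 patients remain in the result even though they have no row in the control table.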

S1.4.5 covariate list
Here we create a pivot table with one column for each of the covariates (which appear as rows in the input table). Note that the suffix -bdc was used to keep track of codesets for this project (the Enclave stores codesets from all projects in the same place, so suffixes are generally used for project-specific codesets). We did not include some covariates that are related to psychiatric outcomes, such as vascular dementia and Alzheimer's disease, so they are removed here.
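As a rough illustration of this pivot step, the following Python sketch turns long-format (patient, covariate) rows into one 0/1 column per covariate while dropping excluded covariates. The function name, the covariate identifiers, and the decision to keep patients whose only covariates are excluded (as all-zero rows) are assumptions, not details taken from the original transform.

```python
# Hypothetical identifiers for the excluded covariates named in the text.
EXCLUDED = {"vascular_dementia", "alzheimers_disease"}


def pivot_covariates(rows, excluded=EXCLUDED):
    """Pivot (person_id, covariate) rows into one 0/1 indicator column per covariate.

    Returns the sorted column names and a dict person_id -> {covariate: 0 or 1}.
    """
    columns = sorted({cov for _, cov in rows if cov not in excluded})
    table = {}
    for person_id, cov in rows:
        # Every patient gets a row, initialised to 0 for all kept covariates.
        row = table.setdefault(person_id, dict.fromkeys(columns, 0))
        if cov not in excluded:
            row[cov] = 1
    return columns, table


rows = [
    (1, "diabetes"),
    (1, "obesity"),
    (2, "vascular_dementia"),
    (3, "obesity"),
]
columns, table = pivot_covariates(rows)
print(columns)
print(table)
```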

S1.4.6 eligible pt table
Input:
• Observation_period_oct21 (Section S1.2.1)
• Severity_Table_for_COVID_and_Controls (Section S1.4.2)
We first extract the period of observation (needed for the time-to-event analysis) in lines 1-4. We then remove patients with less than one year of history prior to the first medical encounter with COVID-19 or a control event (lines 6-14). This is done because we want to perform an analysis on new-onset psychiatric disease following COVID-19. Although one year of previous history does not guarantee that there was no previous encounter, we define the filter in this way because of the limited amount of history available in the Enclave. Note that there were rare end dates higher than 50,000 days that we interpreted as data problems; affected data points were removed. The next stanza selects patients who developed a psychiatric condition 21 days or later after the initial encounter with COVID-19 or a control condition (lines 15-23).
Finally, the resulting cohort of patients is joined to the covariate list in line 34.
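The eligibility rules described in this section can be sketched in Python as follows. This is an illustrative approximation, not the original SQL: the function names are hypothetical, and the 50,000-day check is assumed to apply to the time between the index event and the end of the observation period, which the text does not specify.

```python
from datetime import date

MIN_HISTORY_DAYS = 365       # at least one year of history before the index event
MAX_PLAUSIBLE_DAYS = 50_000  # longer spans are treated as data problems (assumed meaning)
MIN_ONSET_DAY = 21           # psychiatric diagnoses count from day 21 onward


def is_eligible(obs_start, obs_end, index_date):
    """Apply the history filter and the implausible-end-date filter."""
    history_days = (index_date - obs_start).days
    followup_days = (obs_end - index_date).days
    if followup_days > MAX_PLAUSIBLE_DAYS:
        return False
    return history_days >= MIN_HISTORY_DAYS


def first_late_diagnosis_day(index_date, psych_dates):
    """Days from the index event to the first diagnosis on or after day 21, else None."""
    days = sorted((d - index_date).days for d in psych_dates)
    late = [n for n in days if n >= MIN_ONSET_DAY]
    return late[0] if late else None


index = date(2020, 6, 1)
print(is_eligible(date(2019, 1, 1), date(2021, 12, 31), index))
print(first_late_diagnosis_day(index, [date(2020, 6, 10), date(2020, 7, 1)]))
```

The sketch ignores how diagnoses recorded before day 21 are handled, since that is not described in this section.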

S1.4.7 mental disorder info
Input:
• eligible_pt_table (Section S1.4.6)
• Psych_Conditions (Section S1.4.3)
This transform identifies patients who were diagnosed with a new-onset psychiatric condition 21 or more days after the start of the medical encounter for COVID-19 or the control condition.
Listing 7: Gather patients with a diagnosis of psychiatric illness that appears at least 21 days after COVID-19 or the control condition.

S1.5 Create Table 1
Input:
• Eligible_pt_table (Section S1.4.6)
The following code was used to gather the information used in Table 1 of the manuscript.
The following transform is written as an R function. In the Enclave, each R or Python transform is written as a function that is executed by the system. The print statement on line 14 causes the table to be printed in another dialog for downstream use.

S1.6 Statistical analysis
The statistical analysis is performed using different transforms for different control groups. Here we show a typical example.

S1.7 Create case propensity-matched control cohorts
In the following code, we create case-control datasets for analysis. In the example we show here, we use "RTI" (non-COVID-19 respiratory tract infection); an analogous analysis was performed for the other cohorts described in the main manuscript.

S1.7.1 RTI COVID Matched
Input:
• Eligible_pt_table (Section S1.4.6)
For each of the case-control groups, a function called run_matchit is called that uses the R package MatchIt to perform propensity score matching [6]. For instance, for the RTI group, the following transform is run. We do not show further code related to Fracture or Urolithiasis because it is completely analogous to the code for RTI.

S1.7.2 run matchit
We run propensity matching by considering the cohort assignment (COVID-19 vs. RTI) as the "treatment" (i.e., has_condition) in preparation for the investigation of whether there is a significant difference between the two groups with respect to time to event of a psychiatric diagnosis.
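To make the idea of matching on the propensity score concrete, here is a minimal Python sketch of greedy 1:1 nearest-neighbour matching without replacement. It assumes the propensity scores have already been estimated, and it is not a reimplementation of run_matchit or of the MatchIt package; the matching configuration actually used is not stated in this section, and the function name and processing order are illustrative choices.

```python
def greedy_match(treated, controls):
    """Greedy 1:1 nearest-neighbour matching on precomputed propensity scores.

    treated, controls: dict mapping an id to a propensity score.
    Treated units are processed from highest to lowest score (an assumption);
    each control can be used at most once. Returns dict treated_id -> control_id.
    """
    available = dict(controls)
    pairs = {}
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break  # no controls left to match
        # Closest remaining control; ties are broken by id for determinism.
        c_id = min(available, key=lambda c: (abs(available[c] - t_score), c))
        pairs[t_id] = c_id
        del available[c_id]
    return pairs


# Hypothetical scores: "treatment" is COVID-19, controls are RTI patients.
covid = {"t1": 0.80, "t2": 0.30}
rti = {"c1": 0.75, "c2": 0.35, "c3": 0.10}
pairs = greedy_match(covid, rti)
print(pairs)
```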

S1.8 coxRegression
Input:
• RTI_matched_w_psych_info (Section S1.7.3)
Here, we perform Cox regression to compare the time to event (i.e., to a psychiatric diagnosis) of COVID-19 patients vs. a control group. We use short helper functions to avoid code duplication. The following was used for the comparison between COVID-19 and RTI, and analogous drivers were used for the other comparisons. The code first defines the outcome variable by dividing patients into those who received a diagnosis of psych_disorder and those who did not receive any diagnosis (for instance, if we are testing "Mood disorder", patients with "Anxiety" are excluded from the control group). Then the number of days following the initial medical encounter (COVID-19 or control) is calculated. Cox regression over the entire time course is performed first.
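The outcome definition described above can be sketched in Python as follows. The record layout (a `diagnoses` dict mapping disorder name to day of diagnosis, and a `last_day` of observation) and the function name are hypothetical; the original code is written in R and is not reproduced here.

```python
def define_outcome(patients, target):
    """Build (person_id, days, event) rows for one target disorder.

    - Patients diagnosed with `target` are events at the day of that diagnosis.
    - Patients with no psychiatric diagnosis are censored at their last day.
    - Patients with only other psychiatric diagnoses are excluded.
    """
    rows = []
    for p in patients:
        dx = p["diagnoses"]
        if target in dx:
            rows.append((p["person_id"], dx[target], 1))
        elif not dx:
            rows.append((p["person_id"], p["last_day"], 0))
        # else: a different psychiatric diagnosis, so the patient is left out
    return rows


patients = [
    {"person_id": 1, "diagnoses": {"Mood disorder": 45}, "last_day": 300},
    {"person_id": 2, "diagnoses": {"Anxiety": 60}, "last_day": 250},
    {"person_id": 3, "diagnoses": {}, "last_day": 365},
]
mood_rows = define_outcome(patients, "Mood disorder")
print(mood_rows)
```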
We use the survSplit function, which takes a survival data set and a set of specified cut times and splits each record into multiple subrecords at each cut time. This allows us to perform Cox regression separately on the first and the second half of the year following the initial medical encounter. Note that we disregard results after the second cut because of a substantially smaller amount of data. Finally, the code displays a Kaplan-Meier plot (this is the code that was used to generate the plots shown in the manuscript and supplement).

S1.9 make table 2 RTI
Input:
• cox_regression (Section S1.8)
• Analogous inputs for the other comparisons.
The following code combines all of our input datasets and extracts information that is used to generate Table 2.

    make_table_2_RTI <- function(cox_RTI_all_psych, cox_RTI_Mood, cox_RTI_Anxiety,
                                 cox_RTI_Fatigue, cox_RTI_Dyspnea) {
        sequelae_data <- list(cox_RTI_all_psych, cox_RTI_Mood, cox_RTI_Anxiety,
                              cox_RTI_Fatigue, cox_RTI_Dyspnea)
        return(make_table_2(sequelae_data, "RTI", cutpoint = 120))
    }

Listing 16: Make Table 2 help function.
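The cutpoint of 120 days used here corresponds to the record splitting described in Section S1.8. As a hedged illustration of what splitting a survival record at a cut time means, the following Python sketch divides one (time, event) record into per-period subrecords. It mimics the idea behind survSplit but is not the R implementation; the function name and the tuple layout are assumptions.

```python
def split_at_cuts(time, event, cuts):
    """Split one survival record into (tstart, tstop, event, episode) subrecords.

    Intervals are treated as (tstart, tstop]: a record ending exactly at a cut
    stays in the earlier episode. Only the last subrecord can carry the event;
    earlier subrecords are censored at the cut time.
    """
    rows = []
    tstart = 0
    episode = 1
    for cut in sorted(cuts):
        if time <= cut:
            break
        rows.append((tstart, cut, 0, episode))
        tstart = cut
        episode += 1
    rows.append((tstart, time, event, episode))
    return rows


# A diagnosis on day 200 contributes a censored record to the first period
# and an event to the second period.
late_event = split_at_cuts(200, 1, cuts=[120])
early_event = split_at_cuts(90, 1, cuts=[120])
print(late_event)
print(early_event)
```

Fitting a separate model to the rows of each episode then gives period-specific hazard ratios.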

The make_table_2 function is as follows.

    hr_pval <- "Hazard Ratio P Value"
    sequelae_names <- c("All psychiatric illness (t1)", "Mood disorder (t1)",
                        "Anxiety (t1)", "Fatigue (t1)", "Dyspnea (t1)",
                        "All psychiatric illness (t2)", "Mood disorder (t2)",
                        "Anxiety (t2)", "Fatigue (t2)", "Dyspnea (t2)")