Abstract
Objectives To address the lack of individual-level socioeconomic information in electronic health care records, we linked the 2011 census of England and Wales to patient records from a large mental healthcare provider. This paper describes the linkage process and methods for mitigating bias due to non-matching.
Setting South London and Maudsley NHS Foundation Trust (SLaM), a mental health care provider in southeast London.
Design Clinical records from SLaM were supplied to the Office for National Statistics (ONS) for linkage to the census through a deterministic matching algorithm. We examined clinical (ICD-10 diagnosis, history of hospitalisation, frequency of service contact) and sociodemographic (age, gender, ethnicity, deprivation) information recorded in the Clinical Record Interactive Search (CRIS) platform as predictors of linkage success with the 2011 Census. To assess and adjust for potential biases caused by non-matching, we evaluated inverse probability weighting for mortality associations.
Participants Individuals of all ages in contact with SLaM up until December 2019 (N=459,374).
Outcome measures Likelihood of mental health records’ linkage to census.
Results 220,864 (50.4%) records from CRIS linked to the 2011 census. Young adults (prevalence ratio (PR) 0.80, 95% CI 0.80-0.81), individuals living in more deprived areas (PR 0.78, 0.78-0.79), and minority ethnic groups (e.g., Black African, PR 0.67, 0.66-0.68) were less likely to match to the census. After implementing inverse probability weighting, we observed little change in the strength of association between clinical/demographic characteristics and mortality (e.g., presence of any psychiatric disorder: unweighted PR 2.66, 95% CI 2.52-2.80; weighted PR 2.70, 95% CI 2.56-2.84).
Conclusions Lower response rates to the 2011 census amongst people with psychiatric disorders may have contributed to lower match rates, a potential concern as the census informs service planning and allocation of resources. Due to its size and unique characteristics, the linked dataset will enable novel investigations into the relationship between socioeconomic factors and psychiatric disorders.
Strengths and limitations of this study
This is the first time mental healthcare electronic records have been linked to ONS census at the individual-level in England. Due to its scale, ethnic diversity and demographic characteristics, and abundance of detailed information on a variety of socioeconomic and demographic indicators acquired through the linkage to census records, this dataset will enable novel investigations into the causes, trajectories and outcomes of psychiatric disorders.
A significant strength of the study is that we could assess and adjust for potential biases caused by non-matching related to age, gender and deprivation.
Whilst we observed differences between individuals that matched to census, and those that did not, our weighted analyses were able to show that these differences did not substantially alter associations with mortality outcomes.
Due to the nature of the deterministic linkage algorithm, we could not determine the causes of non-linkage.
Introduction
The growing size and depth of routinely collected administrative data available for research is transforming the study of mental disorders. Traditional epidemiological methods, such as prospective cohort or case-control studies, can present considerable methodological, logistical, and financial challenges due to a high degree of attrition (1), the inherent difficulties in selecting controls (2), and the costs associated with data collection. Electronic health records (EHRs) and other administrative data from public services are therefore increasingly being utilised in epidemiological investigations because they partially address the issue of data loss by collecting information from all individuals who interact with services (3). They also provide a convenient mechanism for sampling controls and eliminate the need for data collection. However, despite their strengths, EHRs typically contain limited information on socioeconomic characteristics at the individual level. Data on occupational classification, long-term unemployment, ethnicity, housing tenure, education, migration, and other relevant socioeconomic measures are often either missing, inaccurate, or collected infrequently, hindering efforts to better understand relationships between mental health and socioeconomic and sociodemographic factors. In prior EHR research, the influence of social determinants has largely been assessed through area-level measures of deprivation, which may not accurately correspond to an individual’s socioeconomic circumstances, potentially biasing observed associations and obfuscating inferences that can be made.
To address these issues, we linked clinical records from the South London and Maudsley (SLaM) Mental Health Trust accessed through its Clinical Record Interactive Search (CRIS) platform, to administrative records from the 2011 population census for England and Wales. The modern census of England and Wales, organised and conducted by the Office for National Statistics (ONS) (4), is a rich source of information on a multitude of socioeconomic indicators such as ethnicity, religion, education, employment, housing, migration, and citizenship and also includes self-rated measures of health and functioning. Because of the size and the considerable ethnic diversity of the mental health services’ catchment area from which CRIS records are derived, we anticipated that this linkage would facilitate the assessment of several pressing questions on the social determinants of onset, course, and outcomes of severe mental health conditions that have thus far only been examined in case-control and prospective cohort studies limited by small sample sizes and significant attrition.
The purpose of this paper is to describe the creation of this data resource and to outline the methodology employed in linking individual records from the two sources. We also sought to describe the cohort’s characteristics and to assess how these were associated with successful matches to census records. Finally, to evaluate the potential influence of non-matching records on study outcomes, we compared unweighted and inverse probability weighted mortality estimates.
Methods
Data sources used for creating the cohort
CRIS
SLaM provides mental health care to approximately 1.3 million residents in an urban, ethnically diverse, and relatively deprived catchment area comprising four south London boroughs: Croydon, Lambeth, Lewisham, and Southwark. It is one of Europe’s largest mental health care providers and covers all mental health services provided by the National Health Service (NHS), including the Improving Access to Psychological Therapies (IAPT) service, child and adolescent mental health services (CAMHS) and adult mental health, as well as general hospital liaison and various embedded specialist services (e.g., the eating disorders outpatient service). Since 2007, clinical records for all SLaM services have been electronic-only, provided by its electronic Patient Journey System (ePJS) in the form of tick boxes, drop-down lists, free text, and document attachments (3, 5). The CRIS application was developed to enable these records to be used for research within a robust data security and governance framework requiring a combination of data processing pipelines, including de-identification, supplemented by natural language processing (NLP) techniques to provide text-derived metadata (3). Thus, CRIS provides the entirety of a patient’s mental health record, including information from structured data fields (e.g., age, sex, diagnosis), but also de-identified free-text information, such as clinical correspondence letters, documents outlining care plans and detentions under the Mental Health Act, and routine clinical notes. Diagnostic data are captured through codes from the 10th edition of the International Classification of Diseases (ICD-10), which may appear in both structured and unstructured data fields.
2011 census data
We utilised the results from the 2011 census of England and Wales as they were the most recent at the time that we initiated this data linkage project. The 2011 census was sent out to every household in England and Wales, and additional measures were taken to ensure the representation of individuals living in communal establishments, such as care homes, prisons, and student halls, and of individuals without a fixed address, such as travellers or rough sleepers (6). The person response rate for the 2011 census was 94%, making it the most comprehensive and representative source of socioeconomic and demographic data in England and Wales (7). Census variables are categorised as ‘standard’ or ‘derived’, depending on whether the information they pertain to was explicitly referred to in census questions or derived from respondents’ responses to other questions (8). For example, ‘standard’ variables relate to information such as accommodation type, employment status, long-term health problems and disability, caring responsibilities, and religious affiliation, whilst ‘derived’ variables relate to occupational social class, household deprivation, tenure of household (i.e., rented or owned), degree of educational qualifications, economic activity (i.e., employed, retired, job seeking, etc.), family composition, and many others. For more information about the census, please see https://www.ons.gov.uk/census/2011census.
Linked dataset creation
We sought access to identifiable information for all individuals who had interacted with SLaM mental health services, including IAPT, up until 31 December 2018. This was done through the Health Research Authority (HRA) by obtaining approval from the Confidentiality Advisory Group (CAG) to identify patients under Section 251 (9). The reason for seeking access was to enable the linkage of records from CRIS and the 2011 census, which do not have a common identifier (e.g., NHS number) and therefore must be linked through the use of identifiable information, such as name, date of birth, and address. Records from CRIS were then supplied to the ONS, which acted as the trusted linkage function on behalf of the Administrative Data Research Centre for England (ADRCE) and conducted the linkage to the 2011 census. Once records had been matched, identifiable information was removed, and each record was given an identifier. The de-identified matched file was then hosted in the ONS secure environment and made accessible only to accredited researchers with project-specific approvals to access the data.
For the present analyses, we report associations between the clinical dataset (CRIS) and the census match ‘flag’ generated following linkage. We removed observations if they contained erroneous birthdates (e.g., year of birth was 1900), or if individuals had died before the census (23 March 2011) or were born afterwards (Figure 1). Research Ethics Committee (REC) approvals for the establishment of the linked research database were also obtained, in addition to the existing REC approvals for CRIS (see Ethical Approvals section below).
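The exclusion rules described above can be sketched as a simple eligibility check. This is an illustration only: the function name, the set of placeholder birth years, and the handling of missing death dates are assumptions, and the cut-off is the census date as stated in the text.

```python
from datetime import date

# Census date as given in the text above; treated here as a fixed cut-off.
CENSUS_DATE = date(2011, 3, 23)
# Assumption: a birth year of 1900 is a placeholder marking an erroneous birthdate.
PLACEHOLDER_BIRTH_YEARS = {1900}


def is_eligible(date_of_birth, date_of_death=None):
    """Return True if a record passes the three exclusion checks."""
    if date_of_birth.year in PLACEHOLDER_BIRTH_YEARS:
        return False  # erroneous or placeholder birthdate
    if date_of_birth > CENSUS_DATE:
        return False  # born after the census
    if date_of_death is not None and date_of_death < CENSUS_DATE:
        return False  # died before the census
    return True
```

A record with no recorded death date is treated as alive at the census, which is itself an assumption of this sketch.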
Linkage methodology
Records were linked deterministically through a series of matchkeys, each combining information common to both datasets to create unique identifiers. Because a single matchkey might be unable to resolve inconsistencies between data sources, multiple matchkeys were employed. Table 1 summarises each matchkey, the degree to which it uniquely identified records in each dataset, the proportion of CRIS to census matches, and the specific discrepancy it was intended to address. For instance, matchkey 2 did not include postcode, thereby allowing records to match on name and date of birth, even if the individual’s residence had changed. Matchkeys were ordered by the proportion of unique observations that they identified and required exact matches on all the selected variables. To reduce the risk of false positives, records only linked on a matchkey if it was unique in both datasets. That is, when a record in one dataset matched multiple records in the other dataset, no matches were made, and a new match was instead attempted with the next matchkey in the hierarchy. Once records matched, they were removed from the pool of records eligible to be selected for matching; another match with these records could therefore no longer be attempted. This means that there was no way to review and unlink matches made earlier in the hierarchy on the basis that the true match was identified at later stages of the matching procedure. Matchkeys 1-11 constitute a set of standard matchkeys that are routinely employed when data owned by the ONS are linked to another dataset (10). We also investigated whether the number of linked records could be increased by attempting further linkage with a set of experimental matchkeys on a randomly selected sample of CRIS data. This additional analysis resulted in matchkey 12.
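The hierarchical procedure described above can be illustrated with a minimal sketch. The two matchkeys, the field names, and the function names below are hypothetical and are not the ONS matchkeys summarised in Table 1; the sketch also assumes that uniqueness is assessed within the pool of records still unmatched at each step.

```python
from collections import defaultdict

# Hypothetical matchkeys: each is a tuple of fields that must agree exactly.
# Illustrative only; these are not the ONS matchkeys in Table 1.
MATCHKEYS = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
]


def index_by_key(records, fields):
    """Group record ids by their values on the given fields."""
    groups = defaultdict(list)
    for rec_id, rec in records.items():
        groups[tuple(rec[f] for f in fields)].append(rec_id)
    return groups


def link(clinical, census, matchkeys=MATCHKEYS):
    """Return {clinical_id: (census_id, matchkey_number)} for unique matches."""
    links = {}
    clinical_pool = dict(clinical)
    census_pool = dict(census)
    for number, fields in enumerate(matchkeys, start=1):
        left = index_by_key(clinical_pool, fields)
        right = index_by_key(census_pool, fields)
        for key, left_ids in left.items():
            right_ids = right.get(key, [])
            # Link only when the key is unique in both datasets.
            if len(left_ids) == 1 and len(right_ids) == 1:
                links[left_ids[0]] = (right_ids[0], number)
                # Matched records leave the pool and are never revisited.
                del clinical_pool[left_ids[0]]
                del census_pool[right_ids[0]]
    return links
```

In this sketch, a person whose postcode differs between the two sources fails the first key but can still link on the second, whereas a clinical record with two candidate census records on the same key is left unlinked.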
Measures
We examined an array of routinely recorded sociodemographic and clinical variables in the health record as predictors for successful matching (successful matching denoted through a ‘match flag’ as described above), including age, sex, ethnicity, marital status, referral date, history of admission to psychiatric hospital, clinical diagnosis by ICD-10 chapter, and frequency of service contact. This information was primarily sourced from structured data fields in the health record (e.g., a drop-down list). Diagnostic information was supplemented by metadata derived from a bespoke validated NLP algorithm applied to text fields (e.g., clinical correspondence) (3, 11). We classified psychiatric disorder diagnoses according to ICD-10 F chapter headings, with an additional “other diagnoses” category (e.g., “Unspecified mental disorder”). We categorised ethnicity following the ‘18+1’ ONS standard (12), although we merged some categories due to low cell counts, including an aggregation of all mixed ethnicity groups. Similarly, we placed individuals who were married or in a civil union in the same category. Age was calculated by subtracting patients’ birthdates from the date of their first recorded contact with services and arranged into six age bands (less than 25 years old, 25-34, 35-44, 45-54, 55-64, 65 years or older). We also extracted information on incident inpatient admission. Clinical records in CRIS also store information on death, which is obtained on a monthly basis from the NHS “Service User Death Report” (13). We used this information to examine mortality as a secondary outcome in order to assess and adjust for potential biases introduced by non-matching. We also explored whether outcomes varied by deprivation using the Index of Multiple Deprivation (IMD), an area-level composite measure of deprivation based on income, employment, crime, barriers to housing, health and disability, living environment, and skills and training (14).
IMD scores are provided for small geographical areas that correspond to approximately 1,500 individuals, known as Lower-layer Super Output Areas (LSOAs). Scores were assigned according to the patient postcode on record closest to the Census date and placed in quartiles, with a higher score indicating higher levels of deprivation.
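As an illustration of the age derivation described above, the following sketch computes age in completed years at first contact and assigns it to one of the bands listed in the text. The function names and band labels are assumptions made for this example.

```python
from datetime import date

# Lower bounds and labels for the age bands listed in the text,
# ordered from oldest to youngest so the first match wins.
AGE_BANDS = [(65, "65+"), (55, "55-64"), (45, "45-54"),
             (35, "35-44"), (25, "25-34"), (0, "<25")]


def age_at(first_contact, date_of_birth):
    """Age in completed years at the date of first recorded contact."""
    years = first_contact.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in that year.
    if (first_contact.month, first_contact.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def age_band(age):
    """Return the label of the band containing a non-negative age."""
    for lower, label in AGE_BANDS:
        if age >= lower:
            return label
    raise ValueError("age must be non-negative")
```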
Statistical methods
Using the census match flag, we compared linked and unlinked records to better understand which factors were associated with successful linkage between CRIS and Census records. Because odds ratios fail to approximate relative risks when outcomes are common, we estimated prevalence ratios directly through a modified Poisson model with a robust variance estimator following methods outlined by Zou (15). We opted for this method over a log-binomial modelling approach as it addresses the potential issue of model non-convergence (15). We estimated crude prevalence ratios (PR) indicating the association between demographic (e.g., gender, age, ethnicity, neighbourhood deprivation) and clinical characteristics (e.g., psychiatric diagnosis, history of admission) recorded in CRIS and the probability of matching to census records.
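To make the distinction between prevalence ratios and odds ratios concrete, the sketch below computes a crude prevalence ratio for a single binary predictor directly from counts, with a large-sample confidence interval on the log scale. It is not the regression model used in the analysis: for one binary covariate the modified Poisson approach is understood to give the same point estimate, but adjusted estimates would need a regression package. All counts and function names are hypothetical.

```python
import math


def prevalence_ratio(matched_exposed, n_exposed, matched_reference, n_reference, z=1.96):
    """Crude prevalence ratio with a large-sample confidence interval."""
    p1 = matched_exposed / n_exposed
    p0 = matched_reference / n_reference
    pr = p1 / p0
    # Standard error of log(PR) for two independent proportions.
    se_log = math.sqrt((1 - p1) / matched_exposed + (1 - p0) / matched_reference)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


def odds_ratio(matched_exposed, n_exposed, matched_reference, n_reference):
    """Crude odds ratio from the same counts, for comparison."""
    p1 = matched_exposed / n_exposed
    p0 = matched_reference / n_reference
    return (p1 / (1 - p1)) / (p0 / (1 - p0))
```

With hypothetical match proportions of 40% and 50%, the prevalence ratio is 0.80 whereas the odds ratio is about 0.67, which shows how the odds ratio exaggerates the association when the outcome is common.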
Weighted analyses
A potential issue with linking datasets is that not all records will match, and that this might introduce bias if some parameters (e.g., gender) are related to both matching status and outcomes of interest (16). One way of mitigating the influence of biases due to non-matching is through inverse probability weighting (IPW). IPW weights each observation inversely to its probability of being matched so that those which are less likely to be matched receive higher weight (17). Because we had near complete data in CRIS on gender, age, and area-level deprivation, irrespective of matching status, we could assess and adjust for non-matching related to these characteristics by weighting the matched sample. We calculated the probability of matching through a logistic regression by entering match status as the outcome variable (i.e., 1 = matched; 0 = did not match), with age group, gender and deprivation quartile as covariates. These probabilities were then converted into weights as the inverse of the probability of matching, w_j = 1/P_j, where P_j is the probability of matching for the jth observation. We then estimated weighted and unweighted prevalence ratios to measure the association between demographic (e.g., marital status, ethnicity) and clinical variables (i.e., diagnosis of a mental disorder, history of admission, frequency of contact with services, etc.) and all-cause mortality. The weighted and unweighted estimates were adjusted for age, gender, and deprivation quartile.
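The weighting logic can be sketched as follows. As a simplification, this example estimates the probability of matching as the observed proportion matched within each covariate stratum rather than through the logistic regression described above; the stratum labels and function names are hypothetical.

```python
from collections import Counter


def match_probabilities(records):
    """Estimate P(match) in each stratum as the observed proportion matched.

    `records` is a list of (stratum, is_matched) pairs with is_matched in {0, 1}.
    Simplification: stratum proportions stand in for a fitted logistic model.
    """
    totals, matched = Counter(), Counter()
    for stratum, is_matched in records:
        totals[stratum] += 1
        matched[stratum] += is_matched
    return {s: matched[s] / totals[s] for s in totals}


def inverse_probability_weights(records):
    """Return (stratum, weight) for matched records only, with weight = 1 / P(match)."""
    probs = match_probabilities(records)
    return [(s, 1.0 / probs[s]) for s, is_matched in records if is_matched]
```

Under this simplification, the weights of the matched records in a stratum sum to the size of that stratum in the full cohort, so groups that matched less often are weighted up to their original share.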
Results
Cohort characteristics
We identified 459,374 records in CRIS, of which 231,387 (50.4%) matched the 2011 census through matchkeys 1-12 (Table 1). We then applied further exclusion criteria, reducing our matched cohort to 220,864 cases (Figure 1), which is the denominator for all proportions reported below. Just over half of total cohort members were women (54.6%) and the largest ethnic group was White British (52.9%), followed by Black Caribbean (13.8%) and Black African (4.8%). Nearly two-thirds (65.7%) of cohort members were single and/or separated. The average age of the cohort was 37 (standard deviation: 20).
Predictors of non-linkage
We observed differences within all demographic and clinical categories that we examined as predictors for matching success (Table 2). Men were less likely to match than women (PR 0.92, 95% CI 0.91-0.92). Relative to the youngest age group, those aged between 25 and 44 matched less frequently; conversely, individuals aged 45 years or older were more likely to match, with the oldest age group (65+) having the highest probability of matching (PR 1.31, 1.29-1.34). Widowed (PR 1.27, 1.25-1.28) and married (PR 1.24, 1.23-1.25) individuals matched more often than those who were unmarried. The probability of matching was lower for all minority ethnic groups compared with the White British group, with individuals identifying as White Other or Black African ethnicity the least likely to match. We observed a monotonic relationship between deprivation and matching success, with matching probability decreasing as deprivation increased. Matching success also appeared to vary by referral year, with the highest proportion (59.1%) seen in individuals referred in 2011 (the year of the census), and the next highest in the year after (2012; 57.9%) and the year before (2010; 55.9%) (Figure 2). Matching success varied by ICD-10 diagnosis (Table 2), with relatively lower rates in individuals diagnosed with mental and behavioural disorders due to psychoactive substance use (F10-F19) or schizophrenia, schizotypal and delusional disorders (F20-F29) (PRs 0.86, 0.85-0.87, and 0.91, 0.89-0.92, respectively), and higher rates in those with organic mental disorders (F00-F09) (PR 1.38, 1.36-1.40). Similarly, frequent contact with services was associated with a higher probability of matching (1-10 contacts: PR 1.04, 1.04-1.05) compared with individuals without repeated contacts.
Weighted vs. unweighted mortality estimates
Weighted prevalence ratios estimating risk of death tended to be higher for most categories examined compared with unweighted estimates (Table 3); however, the differences were generally very small.
After adjusting for age, gender and deprivation quartile, individuals who were widowed were at the highest risk of death (Table 3). Relative to minority ethnic groups, the White British ethnic category was associated with the highest risk of death, as indicated by the lower prevalence ratios in all other ethnic groups. However, weighted estimates for the association between ethnicity and all-cause mortality did not differ greatly from unweighted estimates. As can be seen in Table 3, all psychiatric disorders were associated with an increased risk of death, except for behavioural and emotional disorders with onset usually occurring in childhood and adolescence.
Discussion
Summary of results
To our knowledge, this is the first time that large-scale routine electronic health records from a major secondary mental healthcare provider have been successfully linked to individual-level sociodemographic data from the census in England. The resultant dataset draws from an urban and ethnically diverse catchment area from which 220,864 secondary mental healthcare records were linked deterministically to detailed sociodemographic data from the 2011 census of England and Wales. Overall, half (50.4%) of records in the secondary mental healthcare dataset linked to the 2011 census, and our analyses revealed differences between matched and non-matched records with respect to several sociodemographic and clinical characteristics. We observed the lowest match rates among young adults, individuals living in more deprived areas, and members of ethnic minority groups. We applied weights to assess how non-matching influenced mortality estimates and observed negligible differences between unweighted and weighted estimates, suggesting that non-linkage to the census did not substantially bias associations.
Analysis of records not matching
There are multiple reasons why non-linkage might occur. Firstly, the match rate in our study will have been inherently constrained by the proportion of cases in the CRIS cohort that responded to the 2011 census in the first place. The average response rate within the four London boroughs that comprise the SLaM catchment was lower (88%) compared with the national average (94%) (7). Among younger individuals (25-34 year-olds), who constituted a large proportion of our sample, the response rate was even lower in this region (84%). More mobile populations, which may include migrant and other groups temporarily moving into an area for work alongside people with severe mental illnesses (18), may have been less likely to have taken part in the census. Individuals who moved into the SLaM catchment area and accessed services after 2011 would by default be unable to match. In addition, a growing body of evidence shows that racially minoritised groups, migrants, and other socioeconomically marginalised groups are more likely to face discrimination in their interaction with governmental institutions in the UK, such as the police and the criminal justice system (19, 20), and the NHS (21). It is conceivable that such experiences might coalesce into a general sense of institutional distrust among some members of these communities that is manifested in lower rates of participation. Whatever the cause may be, it would nevertheless seem improbable that our match rate would exceed the average census response rate specific to the SLaM region or the various demographic groups that were prevalent in our sample. It is also well established that unit non-response can be considerable among individuals with a history of mental health disorders, who because of their illnesses might find it challenging to participate (22) or may be more mobile (18). 
Individuals with mental disorders are also more likely to experience objective social isolation (e.g., have fewer measurable contacts with other individuals) (23) and might consequently be less likely to be captured through proxy responses (i.e., family members responding in their stead). Indeed, surveys conducted annually since 2004 by the Care Quality Commission (CQC), the independent regulator of health and social care in England, have never observed response rates above 41% in community mental health samples (24).
Another factor that merits consideration is the underlying methodology employed in the matching itself. In our study, records were matched deterministically through matchkeys comprising administrative information collected in both datasets. Inaccuracies or differences (e.g., wrong postcode, incorrect date of birth, name changes due to marriage, or alternative or erroneous spelling of names) in how these data were recorded might therefore have prevented some records from successfully matching. For example, previous linkage of health records to the census in Scotland highlighted a higher chance of clerical error with respect to the spelling of names for minority ethnic groups, leading to lower match rates (25). As individuals from these groups were preponderant in our cohort, it is possible that clerical error accounted for a degree of non-matching in our study. Moreover, because most matchkeys required postcode information to match and because the match rate peaked among individuals who were referred the year the census was taken, it is possible that the deterministic matching methodology that we employed also missed some individuals who had a different address at the time they interacted with SLaM services and responded to the census. This is supported by higher observed levels of matching (60%) for those with an address recorded in the mental health records at the time of census, in 2011, and is consistent with the interpretation that a high proportion of the sample in this study were potentially more mobile. Comparisons to previous efforts of linking the 2011 Census to other administrative data could help disentangle the relative effects of sample-specific non-participation (e.g., cohort member mobility or non-participation due to mental illness) and issues related to the methodology itself (e.g., sensitivity of matchkeys).
However, data linkage methods and the measurement of linkage quality are continuously evolving within the ONS following the adoption of new working environments and data sharing agreements, which precludes a fair comparison to other data linkage efforts involving the 2011 Census. Our weighted analyses nevertheless indicated that missingness had a negligible influence on relevant study outcomes, such as associations of clinical/sociodemographic characteristics with all-cause mortality.
Finally, together with existing evidence from cohort studies of substantial attrition among participants diagnosed with mental illnesses, and of non-participation in community surveys, our findings point to non-response being a significant contributor to the low match-rate that we observed. Since the Census informs the planning, funding, and commissioning of local services, such as schools and health services, the potential underrepresentation of individuals with mental illnesses is concerning and merits further investigation.
Strengths and weaknesses
We believe that this is the first study to link census data in England to clinical records from a population in contact with secondary mental health care services. Because of the cohort’s size, unique sociodemographic composition, and abundant individual-level data on a multitude of important sociodemographic indicators provided by the linkage, we expect this dataset to facilitate novel investigations into health inequalities among people living with mental disorders. The overall size of the cohort is several orders of magnitude larger than previous UK-based mental health cohorts (26), particularly with respect to ethnic minority groups and specific clinical sub-populations (e.g., individuals with severe mental illnesses). The degree of non-linkage that we observed is a potential source of bias. However, we had comprehensive data on many relevant characteristics for the fully enumerated cohort, irrespective of matching status, and could therefore determine through non-response weighting the relative influence that missingness related to these characteristics had on all-cause mortality estimates. We intend to incorporate these weights in all future analyses to minimise sources of bias. Although the area is ethnically diverse with a good overall representation of Black Caribbean and Black African people, other prevalent ethnic minority groups in England, such as Indian, Pakistani and Bangladeshi populations, are less well represented. Although findings from the highly urban south London catchment area may be generalisable to other urbanised locations in England, inferences relating to more rural areas may not be possible. There is some evidence that matching of administrative records can be improved through the use of probabilistic techniques (27), but these were not utilised by the ONS for this linkage. It is possible that we could have obtained a higher match rate had record matching been supplemented with probabilistic methods.
One of the challenges with the linkage methods employed here is that we could not conclusively determine the exact causes of non-linkage. For instance, we could not quantify the relative degree to which non-linkage was caused by unit non-response or clerical errors in how data was recorded.
Ethical approvals
CRIS has Research Ethics Committee approval as a source of anonymised data for secondary analysis (Oxford REC C, reference 18/SC/0372). The current CRIS-Census linkage was supported through a separate REC approval (reference 18/SC/0003). Additional approvals from the Confidentiality Advisory Group to access patient information without consent, for the purposes of linkage, were obtained (CAG S251 reference: 17/CAG/0204). Approvals were also sought and obtained from the National Statistician’s Data Ethics Advisory Committee (NSDEC) to use linked CRIS-census data for specified projects.
Patient and public involvement
Patient involvement was supported through consultation with the SLaM Clinical Data Linkage Service (CDLS) Data Linkage Service User and Carer Advisory Group, an advisory group of carers and individuals with lived experience of mental illnesses and mental health service use (28), who were consulted at key points during the project. In addition, a CRIS oversight committee, which is chaired by a service user, approves all projects proposing to use CRIS-linked data.
Data Availability
Data from SLaM are owned by a third party, the SLaM BRC CRIS tool, which provides access to anonymised data derived from SLaM electronic medical records. These data can only be accessed by permitted individuals from within a secure firewall (i.e., remote access is not possible and the data cannot be sent elsewhere) in the same manner as the authors. Our team is interested in supporting collaboration with interested researchers, subject to appropriate approvals and accreditation status. Requests to access data can be directed to jayati.das-munshi@kcl.ac.uk
Funding
This paper represents independent research part-funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. LC and MW are supported by a grant from the ESRC (ES/S002715/1). RH is currently funded by a doctoral studentship granted by the UKRI ESRC LISS-DTP managed by King’s College London. JD and CM are part supported by the ESRC Centre for Society and Mental Health at King’s College London (ESRC Reference: ES/S012567/1) and by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London and the National Institute for Health Research (NIHR) Applied Research Collaboration South London (NIHR ARC South London) at King’s College Hospital NHS Foundation Trust. MH is a NIHR Senior Investigator. RS is part-funded by: i) the National Institute for Health Research (NIHR) Maudsley Biomedical Research Centre at the South London and Maudsley NHS Foundation Trust and King’s College London; ii) the NIHR Applied Research Collaboration South London (NIHR ARC South London) at King’s College Hospital NHS Foundation Trust; iii) UKRI – Medical Research Council through the DATAMIND HDR UK Mental Health Data Hub (MRC reference: MR/W014386); iv) the UK Prevention Research Partnership (Violence, Health and Society; MR-VO49879/1), an initiative funded by UK Research and Innovation Councils, the Department of Health and Social Care (England) and the UK devolved administrations, and leading health research charities. The views expressed are those of the authors and not necessarily those of the ESRC, NHS, the NIHR or the Department of Health and Social Care or King’s College London.
Author Contributions
JD conceived the study, designed the work and led acquisition of the linked dataset and interpretation of findings. LC led design, analysis and interpretation of findings. JD and LC led drafting of the manuscript. AJ supported the design and acquisition of the linked dataset, with MP. NC conducted the initial analysis of findings. MED advised on statistical analyses and interpretation. CM, MH, RHu, RHi, MW and RS contributed to the interpretation of findings. All authors were involved in drafting the work or revising it critically prior to submission and all authors approved the final version to be published and agree to be accountable for all aspects of the work.
Conflicts of interest
MH is principal investigator of RADAR-CNS, a pre-competitive public-private collaboration on mobile health funded by the Innovative Medicines Initiative with cash and in-kind contributions paid to the university from Janssen, Lundbeck, UCB, MSD and Biogen. RS declares research support in the last 3 years from Janssen, GSK and Takeda. All other authors have no conflicts of interest to declare.
Acknowledgements
We are grateful to Hitesh Shetty (SLAM-BRC CRIS team) for his support with data management.