TY - JOUR
T1 - Protocol for an observational study evaluating new approaches to modelling diagnostic information from large administrative hospital datasets
JF - medRxiv
DO - 10.1101/19011338
SP - 19011338
AU - Thomas E Cowling
AU - David A Cromwell
AU - Linda D Sharples
AU - Jan van der Meulen
Y1 - 2019/01/01
UR - http://medrxiv.org/content/early/2019/11/08/19011338.abstract
N2 - Introduction: The Charlson and Elixhauser indices define sets of conditions used to adjust for patients’ comorbidities in administrative hospital data. A strength of these indices is the parsimony that results from including only 19 and 30 conditions, respectively, but the conditions included may not be the ones most relevant to a specific outcome and population. Our objectives are to: (1) test an approach to developing parsimonious indices for the specific outcome and population being studied, comparing their performance with that of the Charlson and Elixhauser indices; and (2) evaluate several approaches that use models with more diagnosis-related terms and aim to improve prediction performance by capturing more of the information in large datasets. Methods and analysis: This is a modelling study using a linked national dataset of administrative hospital records and death records. The study populations are patients admitted to hospital in England between 1 January 2015 and 31 December 2017 for acute myocardial infarction, hip fracture, or major surgery for colorectal cancer. The outcome is death within 365 days of the date of admission (acute myocardial infarction and hip fracture) or of the procedure (colorectal surgery). In the ‘First analysis’, prognostic indices will be developed based on the presence or absence of individual ICD-10 codes in patients’ medical histories. Logistic regression will be used to estimate associations with a full set of sociodemographic and diagnostic predictors, from which reduced models (with fewer diagnostic predictors) will be produced using a step-down approach.
In the ‘Second analysis’, models will also account for when each ICD-10 code was last recorded and will allow for non-linear relationships and for interactions between conditions and the timing of records. Validation will include an overall measure of performance (scaled Brier score), a measure of discrimination (c-statistic), and measures of calibration (such as the Integrated Calibration Index), estimated in bootstrap or cross-validation samples. Sensitivity analyses will include varying the length of medical history analysed, using a comparator that combines the Charlson and Elixhauser sets of conditions, and aggregating ICD-10 codes into clinical groups. Competing interest statement: The authors have declared no competing interest. Funding statement: This work was supported by a Medical Research Council fellowship awarded to TEC (grant number: MR/S020470/1). Author declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. No additional data are available.
ER -
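The overall performance and discrimination measures named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It assumes a binary outcome coded 1 for death within 365 days and 0 otherwise, plus predicted risks from some already-fitted model; the function names and toy data are hypothetical and invented for illustration. The scaled Brier score is assumed here to be 1 minus the Brier score divided by that of a non-informative model predicting the overall event rate for everyone, a common definition that may differ in detail from the protocol's. The Integrated Calibration Index and the bootstrap or cross-validation resampling are not covered.

```python
"""Sketch (assumed definitions): scaled Brier score and c-statistic for a
binary outcome, computed from observed outcomes y (0/1) and predicted risks p."""

from typing import Sequence


def _check(y: Sequence[int], p: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if not y or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    _check(y, p)
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - Brier / Brier_max, where Brier_max = prevalence * (1 - prevalence)
    is the Brier score of a model predicting the event rate for everyone."""
    brier = brier_score(y, p)  # also validates the inputs
    prevalence = sum(y) / len(y)
    brier_max = prevalence * (1 - prevalence)
    if brier_max == 0:
        raise ValueError("undefined when all outcomes are identical")
    return 1 - brier / brier_max


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half-concordant."""
    _check(y, p)
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    # Hypothetical toy data: 4 patients, 2 of whom died within 365 days.
    y = [1, 0, 1, 0]
    p = [0.8, 0.3, 0.35, 0.4]
    print(round(brier_score(y, p), 6))         # 0.178125
    print(round(scaled_brier_score(y, p), 4))  # 0.2875
    print(c_statistic(y, p))                   # 0.75
```

The c-statistic here uses an explicit pairwise loop for clarity; for national datasets a rank-based (Mann-Whitney) computation would be far faster and gives the same value. A scaled Brier score of 0 corresponds to no improvement over predicting the event rate for every patient, and 1 to perfect prediction.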