PT - JOURNAL ARTICLE
AU - Jonathan Kennedy
AU - Natasha Kennedy
AU - Roxanne Cooksey
AU - Ernest Choy
AU - Stefan Siebert
AU - Muhammad Rahman
AU - Sinead Brophy
TI - Predicting a diagnosis of ankylosing spondylitis using primary care health records – a machine learning approach
AID - 10.1101/2021.04.22.21255659
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.04.22.21255659
4099 - http://medrxiv.org/content/early/2021/04/25/2021.04.22.21255659.short
4100 - http://medrxiv.org/content/early/2021/04/25/2021.04.22.21255659.full
AB - Ankylosing spondylitis (AS) is the second most common cause of inflammatory arthritis. However, confirming a diagnosis (via x-rays) can take a decade from symptom onset. The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in future. The Secure Anonymised Information Linkage databank was used. Patients with ankylosing spondylitis were identified using their routine data and matched with controls who had no record of a diagnosis of ankylosing spondylitis or axial spondyloarthritis. Data were analysed separately for men and women. The model was developed using feature/variable selection and principal component analysis to develop decision trees. The decision tree with the highest average F value was selected and validated with a test dataset. The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 are associated with AS development. The model for women showed an older age of symptom presentation than in men, with back pain and multiple pain relief medications.
The models showed good prediction (positive predictive value 70%-80%) in test data, but in the general population, where prevalence is very low (0.09% of the population in this dataset), the positive predictive value would be very low (0.33%-0.25%). Machine learning can be used to help profile and understand the characteristics of people who will develop AS, and it performs well in test datasets with artificially high prevalence. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of ankylosing spondylitis.
Competing Interest Statement: This work was supported by UCB Pharma, Health Data Research UK, and the infrastructure support of the National Centre for Population Health and Wellbeing and the SAIL Databank. The funders had no input into the study design, analysis, interpretation, or writing up of the work.
Funding Statement: This study was funded by UCB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Data held in the SAIL databank are anonymised; consequently, no ethical approval is required. All data contained in SAIL have permission from the relevant Caldicott Guardian or Data Protection Officer. SAIL-related projects are required to obtain Information Governance Review Panel (IGRP) approval, and this study had governance approval.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The SAIL databank was used and all data used in this study are available through the SAIL application process, https://www.saildatabank.com/application-process.
ALF: Anonymised Linking Field; AS: ankylosing spondylitis; ASAS: Assessment of SpondyloArthritis Society; CRP: C-reactive protein; MRI: magnetic resonance imaging; NSAID: Non-steroidal anti-inflammatory drugs; SAIL: Secure Anonymised Information Linkage; SpA: spondyloarthritis
HLA-B27, GP, MRI
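
The abstract's point that a model with good positive predictive value (PPV) in a matched test set can have a very low PPV in the general population follows from Bayes' rule. The sketch below illustrates that dependence on prevalence. The sensitivity, specificity, and test-set prevalence are hypothetical values chosen for illustration (the abstract does not report them, nor the matching ratio); only the 0.09% population prevalence comes from the abstract. It is not the authors' calculation and is not intended to reproduce their reported figures.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return P(disease | positive prediction) using Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical model characteristics: assumptions, NOT values from the study.
ASSUMED_SENSITIVITY = 0.75
ASSUMED_SPECIFICITY = 0.75

# Assumed prevalence in a matched test set (1:1 matching is an assumption).
ASSUMED_TEST_PREVALENCE = 0.5

# Prevalence in the general population, as reported in the abstract (0.09%).
POPULATION_PREVALENCE = 0.0009

ppv_test = positive_predictive_value(
    ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, ASSUMED_TEST_PREVALENCE
)
ppv_population = positive_predictive_value(
    ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, POPULATION_PREVALENCE
)

print(f"PPV at assumed test-set prevalence: {ppv_test:.1%}")
print(f"PPV at population prevalence:       {ppv_population:.2%}")
```

With the same hypothetical sensitivity and specificity, the PPV falls by more than two orders of magnitude when prevalence drops from the assumed matched-sample level to 0.09%, which is why the abstract suggests applying multiple models to narrow down the population over time.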