medRxiv

Predicting a diagnosis of ankylosing spondylitis using primary care health records – a machine learning approach

Jonathan Kennedy, Natasha Kennedy, Roxanne Cooksey, Ernest Choy, Stefan Siebert, Muhammad Rahman, Sinead Brophy
doi: https://doi.org/10.1101/2021.04.22.21255659
Jonathan Kennedy
1Data Science Building, Swansea University. SA2 8PP. Wales, UK
EngD
  • For correspondence: j.i.kennedy@swansea.ac.uk
Natasha Kennedy
1Data Science Building, Swansea University. SA2 8PP. Wales, UK
PhD
Roxanne Cooksey
1Data Science Building, Swansea University. SA2 8PP. Wales, UK
PhD
Ernest Choy
2CREATE Centre, Section of Rheumatology, Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, CF10 3AT
Stefan Siebert
3Institute of Infection Immunity & Inflammation, University of Glasgow
Muhammad Rahman
1Data Science Building, Swansea University. SA2 8PP. Wales, UK
PhD
Sinead Brophy
1Data Science Building, Swansea University. SA2 8PP. Wales, UK
PhD

Abstract

Ankylosing spondylitis (AS) is the second most common cause of inflammatory arthritis. However, confirming a diagnosis (via x-rays) can take a decade from symptom onset. The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in future.

The Secure Anonymised Information Linkage (SAIL) databank was used. Patients with ankylosing spondylitis were identified from their routine data and matched with controls who had no record of a diagnosis of ankylosing spondylitis or axial spondyloarthritis. Data were analysed separately for men and women. Models were developed using feature/variable selection and principal component analysis to build decision trees. The decision tree with the highest average F value was selected and validated with a test dataset.
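
As a rough illustration of the selection step, the sketch below scores each candidate tree on several validation folds and keeps the one with the highest average. It assumes that "F value" means the F1 score (the harmonic mean of precision and recall), which the abstract does not state. The candidate names and the confusion-matrix counts are invented for illustration and are not results from the study.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (true positive, false positive, false negative) counts,
# one tuple per validation fold, for two made-up candidate trees.
fold_counts = {
    "tree_depth_3": [(70, 30, 30), (68, 28, 32), (72, 35, 28)],
    "tree_depth_5": [(78, 25, 22), (75, 27, 25), (80, 24, 20)],
}

# Average the per-fold F1 for each candidate, then keep the best one.
average_f = {
    name: mean(f1_score(*counts) for counts in folds)
    for name, folds in fold_counts.items()
}
best_tree = max(average_f, key=average_f.get)

print(best_tree, round(average_f[best_tree], 3))
```

Averaging over folds, rather than taking the best single fold, favours a tree that scores consistently over one that scores well on only one split.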

The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 are associated with AS development. The model for women showed an older age of symptom presentation than in men, with back pain and multiple pain relief medications. The models showed good prediction (positive predictive value 70%-80%) in test data, but in the general population, where prevalence is very low (0.09% of the population in this dataset), the positive predictive value would be very low (0.25%-0.33%).
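
The dependence of positive predictive value (PPV) on prevalence follows from Bayes' theorem. The sketch below is a minimal illustration: the 0.09% prevalence is taken from the abstract, but the sensitivity and specificity of 0.8 are assumed placeholder values, since the abstract does not report them, and the 50% "test set" prevalence is likewise an assumption about a balanced case-control sample.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = P(disease | positive test), from Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed classifier performance (hypothetical, not reported in the abstract).
SENSITIVITY = 0.8
SPECIFICITY = 0.8

# Assumed balanced test set versus the population prevalence from the abstract.
ppv_test_set = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.5)
ppv_population = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.0009)

print(f"PPV at 50% prevalence:   {ppv_test_set:.2%}")
print(f"PPV at 0.09% prevalence: {ppv_population:.2%}")
```

With the same classifier, almost every positive result in the low-prevalence setting is a false positive, because people without the condition vastly outnumber those with it.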

Machine learning can be used to help profile and understand the characteristics of people who will develop AS, and it performs well in test datasets with artificially high prevalence. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of ankylosing spondylitis.
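
One way to read the suggestion of multiple models is as sequential screening, where the people flagged by one model form a higher-prevalence population for the next. The sketch below illustrates that idea only. It assumes the stages are conditionally independent given disease status, and the per-stage sensitivity and specificity values are hypothetical. Only the 0.09% starting prevalence comes from the abstract.

```python
def posterior_probability(prior: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a positive result, given the prior probability."""
    true_positives = sensitivity * prior
    false_positives = (1.0 - specificity) * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical (sensitivity, specificity) for three successive models.
stages = [(0.8, 0.8), (0.8, 0.9), (0.8, 0.95)]

probability = 0.0009  # starting prevalence reported in the abstract (0.09%)
history = [probability]
for stage_number, (sensitivity, specificity) in enumerate(stages, start=1):
    # The PPV of one stage becomes the prevalence seen by the next stage.
    probability = posterior_probability(probability, sensitivity, specificity)
    history.append(probability)
    print(f"after stage {stage_number}: {probability:.2%}")
```

Each stage also misses some true cases (sensitivity below 1), so narrowing the population this way trades a higher predictive value for fewer cases reaching the final stage.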

Competing Interest Statement

This work was supported by UCB Pharma, Health Data Research UK, and the infrastructure support of the National Centre for Population Health and Wellbeing and the SAIL Databank. The funders had no input into the study design, analysis, interpretation, or writing up of the work.

Funding Statement

This study was funded by UCB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Data held in the SAIL databank are anonymised; consequently, no ethical approval is required. All data contained in SAIL have permission from the relevant Caldicott Guardian or Data Protection Officer. SAIL-related projects are required to obtain Information Governance Review Panel (IGRP) approval, and this study had governance approval.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • j.i.kennedy@swansea.ac.uk
  • n.l.kennedy@swansea.ac.uk
  • r.cooksey@swansea.ac.uk
  • choyeh@cardiff.ac.uk
  • Stefan.Siebert@glasgow.ac.uk
  • m.a.rahman@swansea.ac.uk
  • s.brophy@swansea.ac.uk


Data Availability

The SAIL databank was used and all data used in this study are available through the SAIL application process, https://www.saildatabank.com/application-process.


  • Abbreviations

    ALF: Anonymised Linking Field
    AS: ankylosing spondylitis
    ASAS: Assessment of SpondyloArthritis international Society
    CRP: C-reactive protein
    MRI: magnetic resonance imaging
    NSAID: non-steroidal anti-inflammatory drug
    SAIL: Secure Anonymised Information Linkage
    SpA: spondyloarthritis
    HLA-B27, GP, MRI
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted April 25, 2021.

    Subject Area

    • Rheumatology