Abstract
High-throughput proteomics has emerged as a potentially rich data source for improving our capacity to forecast disease. This study explores the utility of plasma proteomics for identifying novel predictors of multiple myeloma (MM), combining machine learning with statistical approaches. Using data from the UK Biobank, including proteomic profiles of over 50,000 participants, we applied an extreme gradient boosting (XGBoost) algorithm with SHapley Additive exPlanations (SHAP) feature-importance measures to identify key proteomic biomarkers predictive of MM onset. At least seven of the top 10 identified proteins are related to immune function and the activation of lymphoid cells; two are validated MM targets with approved therapies. The top 10 proteins, along with key clinical predictors, were further analysed using Cox proportional hazards models to assess their contribution to incident MM risk. The 10 proteomic biomarkers ranked highest by SHAP value substantially outperformed traditional clinical predictors, and this superior performance was maintained over the 12-year follow-up period, demonstrating the predictive ability of these biomarkers for the early detection of MM. The demonstration of dysregulated protein expression in plasma from healthy individuals, if confirmed in prospective cohorts and independent datasets, could lead to novel approaches to screening for MM and its precursor conditions.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Computation for this study used the Oxford Biomedical Research Computing (BMRC) facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. DAC was supported by the Pandemic Sciences Institute at the University of Oxford; the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC); an NIHR Research Professorship; a Royal Academy of Engineering Research Chair; the Wellcome Trust funded VITAL project (grant 204904/Z/16/Z); the EPSRC (grant EP/W031744/1); and the InnoHK Hong Kong Centre for Cerebro-cardiovascular Engineering (COCHE). The ADH group at the Nuffield Department of Primary Care Health Sciences is supported by the National Institute for Health and Care Research (NIHR) Applied Research Collaboration Oxford and Thames Valley at Oxford Health NHS Foundation Trust. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This research has been conducted using the UK Biobank Resource under Application Number 83801. The UK Biobank has ethical approval from the National Health Service Northwest Multicentre Research Ethics Committee (06/MRE08/65), and all participants provided written informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The data reported in this paper are available via application directly to the UK Biobank, https://www.ukbiobank.ac.uk.