Abstract
Background The impact of artificial intelligence combined with advanced techniques is steadily increasing in the biomedical field and appears promising in, among other areas, chronic kidney disease (CKD) diagnosis. However, existing models are often specific to a single aetiology. A pipeline is proposed here for developing single models that can distinguish and spatially visualize multiple CKD aetiologies.
Methods The urinary peptide data of 1850 healthy control (HC) and CKD (diabetic kidney disease, DKD; IgA nephropathy, IgAN; vasculitis) participants were acquired from the Human Urinary Proteome Database. The uniform manifold approximation and projection (UMAP) method was coupled to a support vector machine (SVM) algorithm. Binary (DKD, HC) and multiclass (DKD, HC, IgAN, vasculitis) classifications were performed, both with and without the UMAP step. Finally, the pipeline was compared to the current state-of-the-art single-aetiology CKD urinary models.
Findings In an independent test set, the developed models that included the UMAP step achieved overall predictive accuracies of 90.35% and 70.13% for the binary and the multiclass classifications, respectively (96.14% and 85.06% when the UMAP step was skipped). Overall, the HC class was distinguished with the highest accuracy. The different classes tended to form distinct clusters in 3D space according to their disease state.
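To make the reported metrics concrete, the following is a minimal sketch of how an overall predictive accuracy and a per-class accuracy could be computed from test-set predictions. It is not the authors' code (the study used R packages), and the abstract does not state how per-class accuracy was defined; here it is assumed to be the fraction of each true class that was predicted correctly. The function names and the toy labels are hypothetical and are not study data.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Per true class, the fraction of its samples predicted correctly.

    Assumption: this is one common reading of 'class accuracy'; the
    source does not specify the definition it used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy labels for illustration only (not taken from the study).
y_true = ["HC", "HC", "HC", "DKD", "DKD", "IgAN", "IgAN", "vasculitis"]
y_pred = ["HC", "HC", "HC", "DKD", "IgAN", "IgAN", "DKD", "vasculitis"]

overall = overall_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)
print(f"overall accuracy: {overall:.2%}")
for label, acc in by_class.items():
    print(f"{label}: {acc:.2%}")
```

In a multiclass setting the overall figure can hide large differences between classes, which is why a per-class breakdown is useful alongside it.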
Interpretation Urinary peptide data appear to be an effective basis for CKD aetiology differentiation. The UMAP step may provide a unique visualization advantage by capturing the relevant molecular (patho)physiology. Further studies are warranted to validate the pipeline’s clinical potential in the presented CKD aetiologies as well as in other CKD aetiologies or even other diseases.
Competing Interest Statement
Harald Mischak is the founder and co-owner of Mosaiques Diagnostics (Hannover, Germany). Emmanouil Mavrogeorgis, Agnieszka Latosinska and Justyna Siwy are employed by Mosaiques Diagnostics; Tianlin He was employed by Mosaiques Diagnostics.
Funding Statement
This work was supported in part by the European Union Horizon 2020 research and innovation programs (860329 Marie-Curie ITN STRATEGY-CKD as well as 764474 Marie-Curie ITN CaReSyAn). This work was also supported in part by the German Research Foundation (SFB/TRR219 Consortium Project ID: 322900939) and by the BMBF-funded project UPTAKE (01EK2105A-E). AV, JS, HM and JPS are members of the COST (European Cooperation in Science and Technology) action PERMEDIK CA21165.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the ethics committee of the Friedrich-Alexander Universität Erlangen-Nürnberg, Germany (ethics approval code 264_20 B for the nephrological biobank and ethics approval code 221_20 B for the urinary proteomics analysis).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data sharing Data will be made available upon request directed to the corresponding author. Proposals will be reviewed and approved by the investigators and collaborators based on scientific merit. After approval of a proposal, data will be shared through a secure online platform after signing the data access and confidentiality agreement.
Code Availability Code was generated based on the functions in the respective R packages as described in the methods and will be made available upon request directed to the corresponding author.