Abstract
Background Cutting-edge AI/ML techniques have proven effective at uncovering disease-causing biomarkers and the biological underpinnings of many human diseases. However, the high-dimensional nature of multi-omics data makes it difficult to present, annotate, and interpret effectively. Traditional 2D visualizations often fail to capture the intricate relationships between multi-omics features, which hinders the identification of meaningful correlations.
Methods In this study, we address these challenges by developing a solution to better visualize the results produced by AI/ML approaches on integrated clinical and multi-omics data for novel biomarker discovery and predictive analysis. We present 3D IntelliGenes, an advanced version of our previously published software that provides intuitive, interactive, multi-dimensional visualizations of multi-omics data. It offers deeper insights by capturing greater variability in patient data through both linear and non-linear structures, by evaluating AI/ML model performance, and by delineating the joint impact of biomarkers on the corresponding disease states.
Results The functionality of 3D IntelliGenes is divided into two modules: data clustering and feature plotting. The data clustering module creates configurable 3D scatter plots to visualize the structure-preserving distribution of disease states, AI/ML classifier bias in the form of type I/II errors, and patient similarity through a robust density-driven clustering algorithm. The feature plotting module supports the joint analysis of pairs of multi-omics features to assess the interdependence and discriminative power of co-expressed biomarkers.
Conclusion We evaluated the performance of 3D IntelliGenes using diverse cohorts of patients with cardiovascular and other diseases.
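To make the idea behind the data clustering module more concrete, the following is a minimal sketch, not the 3D IntelliGenes implementation. It assumes a patients-by-features matrix, projects it to three dimensions with PCA (one of the techniques listed under Abbreviations), and labels each patient by classifier outcome so that type I (false positive) and type II (false negative) errors can be distinguished in a 3D scatter plot. The data, the threshold classifier, and the function names `pca_3d` and `error_category` are hypothetical and used only for illustration.

```python
import numpy as np


def pca_3d(X):
    """Project the rows of X onto the top three principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered matrix; rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:3].T


def error_category(y_true, y_pred):
    """Label each sample as TP, TN, FP (type I error) or FN (type II error)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    labels = np.empty(y_true.shape, dtype="<U2")
    labels[y_true & y_pred] = "TP"
    labels[~y_true & ~y_pred] = "TN"
    labels[~y_true & y_pred] = "FP"   # type I error
    labels[y_true & ~y_pred] = "FN"   # type II error
    return labels


# Synthetic stand-in for a patients-by-features matrix (hypothetical data).
rng = np.random.default_rng(0)
y_true = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 10)) + 2.0 * y_true[:, None]

# Hypothetical classifier output: threshold on the mean feature value.
y_pred = (X.mean(axis=1) > 1.0).astype(int)

coords = pca_3d(X)                           # shape (40, 3): one 3D point per patient
categories = error_category(y_true, y_pred)  # one outcome label per patient
```

The resulting `coords` array could be passed to any 3D scatter plotting routine, with `categories` used as the colour key. A non-linear method such as UMAP or PaCMAP could be substituted for the PCA step under the same interface.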
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work has been supported by the Department of Medicine, Robert Wood Johnson Medical School, and Rutgers Institute for Health, Health Care Policy, and Aging Research at Rutgers, The State University of New Jersey.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All human samples were used in accordance with relevant guidelines and regulations, and all experimental protocols were approved by the Institutional Review Board of Rutgers.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The source code of 3D IntelliGenes is publicly available on GitHub.
Abbreviations
- ANOVA: Analysis of Variance
- AI: Artificial Intelligence
- AR: Augmented Reality
- CVD: Cardiovascular Disease
- χ²: Chi-Squared
- CIGT: Clinically Integrated Genomics and Transcriptomics
- FAIR4RS: Findable, Accessible, Interoperable, Reusable for Research Software
- GUI: Graphical User Interface
- I-Gene: Intelligent Gene
- KNN: K-Nearest Neighbors
- KDE: Kernel Density Estimation
- ML: Machine Learning
- MLP: Multi-Layer Perceptron
- PaCMAP: Pairwise-Controlled Manifold Approximation and Projection
- PCA: Principal Component Analysis
- RF: Random Forest
- ROC: Receiver Operating Characteristic
- RFE: Recursive Feature Elimination
- ROC-AUC: Area Under the ROC Curve
- SHAP: SHapley Additive exPlanations
- SVM: Support Vector Machine
- 3D: Three-Dimensional
- 2D: Two-Dimensional
- UMAP: Uniform Manifold Approximation and Projection
- VR: Virtual Reality
- WGS: Whole Genome Sequencing
- XGBoost: eXtreme Gradient Boosting