TY - JOUR
T1 - Comparison of Machine Learning Approaches in Prediction of Bone Mineral Density in Elderly Men
JF - medRxiv
DO - 10.1101/2020.01.20.20018143
SP - 2020.01.20.20018143
AU - Qing Wu
AU - Fatma Nasoz
AU - Jongyun Jung
AU - Bibek Bhattarai
Y1 - 2020/01/01
UR - http://medrxiv.org/content/early/2020/01/24/2020.01.20.20018143.abstract
N2 - Bone mineral density (BMD) is a highly heritable trait, with heritability estimates ranging from 50% to 80%. Numerous BMD-associated single nucleotide polymorphisms (SNPs) have been discovered by genome-wide association studies (GWAS) and GWAS meta-analyses. However, several studies found that these highly significant SNPs, even when combined, explain only a small percentage of BMD variance. This discrepancy may reflect limitations of the linear regression approaches employed, which lack the flexibility to model complex gene interactions and regulation. We therefore developed several machine learning models of genomic data and ran experiments to identify the best model for BMD prediction at three skeletal sites. We analyzed genomic data from the Osteoporotic Fractures in Men Study (MrOS) cohort (N = 5,133). Genotype imputation was conducted on the Sanger Imputation Server. A total of 1,103 BMD-associated SNPs were identified, and corresponding weighted genetic risk scores were calculated. Genetic variants, together with age and other traditional BMD predictors, were included in the models. Data were normalized and split into a training set (80%) and a test set (20%). BMD prediction models were built separately with random forest, gradient boosting, and neural network algorithms, with linear regression as the reference model. For pairwise model comparisons, we applied the non-parametric Wilcoxon signed-rank test to the mean squared error (MSE) of each model.
Gradient boosting showed the lowest MSE at every BMD site, and the machine learning models achieved improved prediction performance when a large number of SNPs were included. With phenotype covariates plus 1,103 SNPs as predictors, all pairwise model differences were statistically significant except neural network vs. random forest for femoral neck BMD and gradient boosting vs. random forest for total hip BMD.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: The research and analysis described in the current publication were supported by a COBRE grant from the National Institute of General Medical Sciences (5 U54 GM 104944), the Genome Acquisition to Analytics (GAA) Research Core of the Personalized Medicine Center of Biomedical Research Excellence in the Nevada Institute of Personalized Medicine, and the National Supercomputing Institute at the University of Nevada, Las Vegas. The funding sponsor was not involved in the analysis design, genotype imputation, data analysis, or interpretation of the analysis results, or in the preparation, review, and/or approval of this manuscript.
Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained, and details of the IRB/oversight body are included in the manuscript. Yes.
All necessary patient/participant consent has been obtained, and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
The Osteoporotic Fractures in Men Study (MrOS) was the data source for this study. MrOS is a federally funded prospective cohort study designed to investigate anthropometric, lifestyle, and medical factors associated with bone health in older, community‐dwelling men. Details of the MrOS study design, recruitment, and baseline cohort characteristics have been reported elsewhere [12]. With approval from the institutional review board of the University of Nevada, Las Vegas and from the National Institutes of Health (NIH), the MrOS genotype and phenotype data were obtained from dbGaP (accession: phs000373.v1.p1). The data and analyses presented in the current publication are based on study data downloaded from the dbGaP website under phs000373.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000373.v1.p1).
Abbreviations: MrOS, Osteoporotic Fractures in Men Study; BMD, bone mineral density; ML, machine learning; GRS, genetic risk score; LR, linear regression; RF, random forest; GB, gradient boosting; NN, neural network; FNBMD, femoral neck BMD; TSBMD, total spine BMD; THBMD, total hip BMD; SNPs, single nucleotide polymorphisms; GWAS, genome-wide association study.
ER -
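The abstract names two computations without spelling them out. The first is a weighted genetic risk score. The second is a pairwise Wilcoxon signed-rank comparison of model errors. The Python sketch that follows illustrates both under stated assumptions. It is not the authors' code.

The sketch assumes that the weighted GRS is the sum of effect-allele dosages (0 to 2) multiplied by per-SNP GWAS effect sizes. It also assumes that the Wilcoxon test is applied to paired per-sample squared errors of two models scored on the same test subjects. The abstract does not say which pairing was used. The function names `weighted_grs`, `squared_errors`, and `wilcoxon_signed_rank` are hypothetical.

```python
# Illustrative sketch only. Assumptions (not stated in the record):
#   - weighted GRS = sum(effect-allele dosage * per-SNP GWAS effect size)
#   - the Wilcoxon test pairs per-sample squared errors of two models
#     evaluated on the same test subjects
import math
from typing import List, Sequence, Tuple


def weighted_grs(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Weighted genetic risk score for one person: the sum over SNPs of
    effect-allele dosage (0 to 2) times that SNP's effect size."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(d * b for d, b in zip(dosages, betas))


def squared_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Per-sample squared errors; their mean is the model's MSE."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


def wilcoxon_signed_rank(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Two-sided Wilcoxon signed-rank test on paired samples x and y.

    Zero differences are dropped, and tied |differences| receive average ranks.
    The p-value uses the plain normal approximation, with no continuity or
    tie-variance correction. Returns (W_plus, W_minus, z, p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, z, p
```

To compare two fitted models, compute `squared_errors(y_test, pred_a)` and `squared_errors(y_test, pred_b)` on the same test subjects. Then pass both lists to `wilcoxon_signed_rank`. A positive z means model A has the larger errors more often. The normal approximation is rough for small samples. `scipy.stats.wilcoxon` offers exact and corrected variants of the same test.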