PT - JOURNAL ARTICLE
AU - Qing Wu
AU - Fatma Nasoz
AU - Jongyun Jung
AU - Bibek Bhattarai
AU - Mira V Han
TI - Machine Learning Approaches for the Prediction Bone Mineral Density by using genomic and phenotypic data of 5,130 older Men
AID - 10.1101/2020.01.20.20018143
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.01.20.20018143
4099 - http://medrxiv.org/content/early/2020/05/19/2020.01.20.20018143.short
4100 - http://medrxiv.org/content/early/2020/05/19/2020.01.20.20018143.full
AB - Background: The study aimed to use machine learning (ML) approaches and genomic data to develop a prediction model for bone mineral density (BMD), and to identify the best modeling approach for BMD prediction.
Method: Genomic and phenotypic data from the Osteoporotic Fractures in Men Study (n=5,130) were analyzed. A genetic risk score (GRS) was calculated for each participant from 1,103 BMD-associated SNPs after comprehensive genotype imputation. Data were normalized and divided into a training set (80%) and a validation set (20%) for analysis. Random forest, gradient boosting, neural network, and linear regression were each used to develop a prediction model for BMD. Ten-fold cross-validation was used for hyperparameter optimization. Mean squared error and mean absolute error were used to assess model performance.
Results: When the GRS and phenotypic covariates were used as predictors, all ML models and linear regression performed similarly in BMD prediction. However, when the GRS was replaced with the 1,103 individual SNPs, the ML models performed significantly better than linear regression, and the gradient boosting model performed best.
Conclusion: Our study suggests that ML models, especially gradient boosting, can improve BMD prediction from genomic data.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: The research and analysis described in the current publication were supported by a grant from the National Institute of General Medical Sciences (P20GM121325), and a grant from the National Institute on Minority Health and Health Disparities of the National Institutes of Health (R15MD010475). The funding sponsors were not involved in the analysis design, genotype imputation, data analysis, interpretation of the analysis results, or the preparation, review, or approval of this manuscript.
Author Declarations:
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The Osteoporotic Fractures in Men Study (MrOS) was used as the data source for this study. MrOS is a federally funded prospective cohort study designed to investigate anthropometric, lifestyle, and medical factors associated with bone health in older, community-dwelling men.
Details of the MrOS study design, recruitment, and baseline cohort characteristics have been reported12 elsewhere. With the approval of the institutional review board at the University of Nevada, Las Vegas and the National Institutes of Health (NIH), the genotype and phenotype data of MrOS were acquired from dbGaP (Accession: phs000373.v1.p1). The data and analyses presented in the current publication are based on study data downloaded from the dbGaP web site, under phs000373.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000373.v1.p1).
Abbreviations
MrOS: Osteoporotic Fractures in Men Study
BMD: Bone Mineral Density
ML: Machine Learning
GRS: Genetic Risk Score
LR: Linear Regression
RF: Random Forest
GB: Gradient Boosting
NN: Neural Network
SNPs: Single Nucleotide Polymorphisms
GWAS: Genome-Wide Association Study
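
The Method paragraph of the abstract outlines a pipeline: compute a GRS per participant, normalize the data, split 80/20 into training and validation sets, fit a model, and score it with mean squared error and mean absolute error. The Python sketch below illustrates those steps on synthetic data only. Several things in it are assumptions and not taken from the record: the GRS is computed as a weighted sum of risk-allele dosages (the abstract does not say whether the score was weighted), the effect sizes, genotypes, and BMD values are randomly generated, and all function names are hypothetical. A one-predictor linear regression stands in for the models; random forest, gradient boosting, neural networks, and the 10-fold cross-validation are omitted.

```python
import random


def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 per SNP). Assumed form."""
    return sum(d * w for d, w in zip(dosages, weights))


def zscore(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]


def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows, then split into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy data: 200 participants, 10 SNPs (the study used 5,130 and 1,103).
rng = random.Random(42)
n_people, n_snps = 200, 10
weights = [rng.uniform(0.01, 0.05) for _ in range(n_snps)]  # made-up effect sizes
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_people)]

grs = zscore([genetic_risk_score(g, weights) for g in genotypes])
bmd = [0.9 + 0.05 * s + rng.gauss(0, 0.02) for s in grs]  # synthetic outcome

train, valid = train_validation_split(list(zip(grs, bmd)))
intercept, slope = fit_simple_linear_regression(
    [x for x, _ in train], [y for _, y in train]
)

predicted = [intercept + slope * x for x, _ in valid]
actual = [y for _, y in valid]
mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"validation MSE={mse:.5f}, MAE={mae:.5f}")
```

Reporting both metrics on the held-out 20% mirrors the evaluation described in the abstract; MSE penalizes large errors more heavily, while MAE stays in the units of the outcome.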