PT - JOURNAL ARTICLE
AU - Rooney, James
AU - Böse-O’Reilly, Stephan
AU - Rakete, Stefan
TI - Total correlation explanation of toxic metal concentrations and physiological biomarkers amongst NHANES participants
AID - 10.1101/2021.09.30.21264332
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.09.30.21264332
4099 - http://medrxiv.org/content/early/2021/10/03/2021.09.30.21264332.short
4100 - http://medrxiv.org/content/early/2021/10/03/2021.09.30.21264332.full
AB - Introduction: Unravelling the health effects of multiple pollutants presents scientific and computational challenges. CorEx is an unsupervised learning algorithm that can efficiently discover multiple latent factors in highly multivariate datasets. Here, we used the CorEx algorithm to perform a hypothesis-free analysis of demographic, biochemical, and toxic metal biomarker data.
Methods: Our data included 77 variables from 2,750 adult participants of the National Health and Nutrition Examination Survey (NHANES 2015-2016). We used an implementation of the CorEx algorithm designed to handle the features of bioinformatic datasets, including mixed data types. Models were fit for a range of possible numbers of latent variables, and the best-fit model was selected as the one that yielded the largest Total Correlation (TC) after adjustment for the number of parameters. Successive layers of CorEx were run to discover hierarchical data structure.
Results: The CorEx algorithm identified 20 variable clusters at the first layer. For the majority of clusters, the associations between variables were consistent with known associations: e.g. gender and the hormones estradiol and testosterone were included in the first cluster; blood organic mercury and blood total mercury were grouped in cluster 4; and cluster 6 included the liver function enzymes ALT, AST and GGT. At the second layer, three branches were identified, reflecting hierarchical structure. The first branch included numerous physiological biomarkers and several exogenous biomarkers.
The second branch included a number of endogenous and exogenous variables previously associated with hypertension, while the third branch included mercury biomarkers and some related endogenous biomarkers.
Discussion: We have demonstrated that the CorEx algorithm is a useful tool for hypothesis-free exploration of a biomedical dataset. This work extends previous implementations of CorEx by allowing mixed data types to be modelled, and the results showed that CorEx detected meaningful hierarchical structure. CorEx may facilitate the exploration of novel datasets in future.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: JR was supported by the European Union Horizon 2020 programme under the Marie Sklodowska-Curie grant agreement No 846794. The funders had no role in the research or interpretation of results.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The US National Health and Nutrition Examination Survey (NHANES) has been conducted as a continuous, longitudinal survey since 1999. It received ethical approval from the NCHS Research Ethics Review Board (ERB) (https://www.cdc.gov/nchs/nhanes/irba98.htm).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
NHANES data are available from the NHANES website (https://wwwn.cdc.gov/nchs/nhanes/Default.aspx). The analysis code for this manuscript is available at https://github.com/jpkrooney/NHANESmetals_corex_Analysis
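The Total Correlation (TC) criterion used for model selection in the Methods is, for variables X_1, ..., X_k, TC = Σ_i H(X_i) − H(X_1, ..., X_k): the sum of the marginal entropies minus the joint entropy, which is zero exactly when the variables are mutually independent. CorEx searches for latent factors that explain as much of this quantity as possible. The snippet below is not the authors' code and not the CorEx estimator; it is a minimal sketch, assuming discrete variables and a simple plug-in (frequency-count) entropy estimate, of how TC itself is computed. The function names `entropy` and `total_correlation` are hypothetical.

```python
from collections import Counter
import math


def entropy(samples):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(columns):
    """Estimate TC = sum_i H(X_i) - H(X_1, ..., X_k) from equal-length discrete columns.

    Hypothetical helper for illustration only: it uses plug-in entropies and
    does not fit any latent factors, unlike CorEx.
    """
    joint = list(zip(*columns))  # each row becomes one joint outcome (a tuple)
    return sum(entropy(col) for col in columns) - entropy(joint)


if __name__ == "__main__":
    x = [0, 0, 1, 1]
    y = [0, 1, 0, 1]                       # independent of x in this sample
    z = [a ^ b for a, b in zip(x, y)]      # z = x XOR y
    print(total_correlation([x, y]))       # independent pair
    print(total_correlation([x, y, z]))    # pairwise independent, jointly dependent
```

Here `total_correlation([x, y])` is 0.0, while adding `z = x XOR y` gives 1.0 bit: the three variables are pairwise independent, yet TC still registers their joint dependence, which is the kind of multivariate structure CorEx is designed to pick up. Plug-in estimates like these can overstate TC in small samples, so they are meant only to show the definition, not to replace the estimator used in the study.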