Abstract
Excess accumulation of liver fat – termed hepatic steatosis when fat accounts for > 5.5% of liver content – is a leading risk factor for end-stage liver disease and is strongly associated with important cardiometabolic disorders. Using a truth dataset of 4,511 UK Biobank participants with liver fat previously quantified via abdominal MRI, we developed a machine learning algorithm to quantify liver fat with correlation coefficients of 0.97 and 0.99 in hold-out testing datasets and applied this algorithm to raw imaging data from an additional 32,192 participants. Among all 36,703 individuals with abdominal MRI, median liver fat was 2.2%, with 6,250 (17%) meeting criteria for hepatic steatosis. Although individuals with hepatic steatosis were more likely to have been diagnosed with conditions such as obesity or diabetes, a prediction model based on clinical data alone without imaging could not reliably estimate liver fat content. To identify genetic drivers of variation in liver fat, we first conducted a common variant association study of 9.8 million variants, confirming three known associations for variants in the TM6SF2, APOE, and PNPLA3 genes and identifying five new variants associated with increased hepatic fat in or near the MARC1, ADH1B, TRIB1, GPAM, and MAST3 genes. A polygenic score that integrated information from each of these eight variants was strongly associated with future clinical diagnosis of liver diseases. Next, we performed a rare variant association study in a subset of 11,021 participants with gene sequencing data available, identifying an association between inactivating variants in the APOB gene and substantially lower LDL cholesterol, but more than 10-fold increased risk of steatosis. Taken together, these results provide proof of principle for the use of machine learning algorithms on raw imaging data to enable epidemiologic studies and genetic discovery.
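As an illustration only, the two simplest computations named in the abstract (flagging steatosis at the > 5.5% threshold and combining variant information into a polygenic score) can be sketched as follows. This is a minimal sketch and not the authors' pipeline: the function names, the additive weighted-sum form of the score, and the uniform placeholder weights are all assumptions made for the example. The abstract does not state how the eight variants were weighted.

```python
from typing import Mapping, Sequence

# Cutoff stated in the abstract: steatosis when liver fat exceeds 5.5%.
STEATOSIS_THRESHOLD_PCT = 5.5


def has_steatosis(liver_fat_pct: float) -> bool:
    """Return True when liver fat (in percent) exceeds the 5.5% cutoff."""
    return liver_fat_pct > STEATOSIS_THRESHOLD_PCT


def prevalence(liver_fat_pcts: Sequence[float]) -> float:
    """Fraction of individuals whose liver fat exceeds the cutoff."""
    if not liver_fat_pcts:
        raise ValueError("need at least one measurement")
    return sum(has_steatosis(x) for x in liver_fat_pcts) / len(liver_fat_pcts)


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 per variant).

    Assumption: a simple additive score. The weights are supplied by the
    caller; none are taken from the study.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing dosages for: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Gene labels are from the abstract; the weights are hypothetical
# placeholders (uniform 1.0), NOT effect sizes reported by the study.
GENES = ["TM6SF2", "APOE", "PNPLA3", "MARC1", "ADH1B", "TRIB1", "GPAM", "MAST3"]
placeholder_weights = {g: 1.0 for g in GENES}

# Hypothetical individual: one risk allele at PNPLA3, two at TM6SF2.
example_dosages = {g: 0.0 for g in GENES}
example_dosages["PNPLA3"] = 1.0
example_dosages["TM6SF2"] = 2.0

example_score = polygenic_score(example_dosages, placeholder_weights)
```

With uniform weights the score reduces to a count of risk alleles. The same threshold logic reproduces the reported proportion: 6,250 of 36,703 participants is approximately 17%.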
Competing Interest Statement
J.P.P. has served as a consultant for Maze Therapeutics. R.L. serves as a consultant or advisory board member for Arrowhead Pharmaceuticals, AstraZeneca, Boehringer-Ingelheim, Bristol-Myers Squibb, Celgene, Cirius, CohBar, Galmed, Gemphire, Gilead, Glympse Bio, Intercept, Ionis, Inipharma, Merck, Metacrine, Inc., NGM Biopharmaceuticals, Novo Nordisk, Pfizer, and Viking Therapeutics. In addition, his institution has received grant support from Allergan, Boehringer-Ingelheim, Bristol-Myers Squibb, Eli Lilly and Company, Galmed Pharmaceuticals, Genfit, Gilead, Intercept, Janssen, Madrigal Pharmaceuticals, NGM Biopharmaceuticals, Novartis, Pfizer, pH Pharma, and Siemens. He is also co-founder of Liponexus, Inc. J.R.H. and A.Y.Z. are employees of Color Genomics. K.E.C. serves on the advisory boards of Novo Nordisk and BMS, has consulted for Gilead, and has received grant funding from BMS, Boehringer-Ingelheim, and Novartis. T.G.S. has served as a consultant for Aetion. A.P. is employed as a Venture Partner at GV, a venture capital group within Alphabet; he is also supported by a grant from Bayer AG to the Broad Institute focused on machine learning for clinical trial design. S.N.F. and P.B. are supported by grants from Bayer AG and IBM applying machine learning in cardiovascular disease. P.B. has served as a consultant to Novartis. P.T.E. is supported by a grant from Bayer AG to the Broad Institute focused on the genetics and therapeutics of cardiovascular diseases. P.T.E. has also served on advisory boards or consulted for Bayer AG, Quest Diagnostics, MyoKardia, and Novartis. A.V.K. has served as a consultant to Sanofi, Medicines Company, Maze Pharmaceuticals, Navitor Pharmaceuticals, Verve Therapeutics, Amgen, and Color; received speaking fees from Illumina, MedGenome, and the Novartis Institute for Biomedical Research; received a sponsored research agreement from the Novartis Institute for Biomedical Research; and reports a pending patent related to a genetic risk predictor (20190017119).
Funding Statement
This research has been conducted using the UK Biobank resource, application 7089. Funding support was provided by NIH grants 1K08HG010155 (to A.V.K.) from the National Human Genome Research Institute, 1R01HL092577, R01HL128914, K24HL105780 (to P.T.E.), R01HL071739 (to M.B.) from the National Heart, Lung and Blood Institute, 5P42ES010337 (to R.L.) from the National Institute of Environmental Health Sciences, 5UL1TR001442 (to R.L.) from the National Center for Advancing Translational Sciences, R01DK106419, P30DK120515 (to R.L.), K23 DK122104 (to T.G.S.) from the National Institute of Diabetes and Digestive and Kidney Diseases, CA170674P2 (to R.L.) from the Department of Defense's Peer Reviewed Cancer Research Program, a Hassenfeld Scholar Award from Massachusetts General Hospital (to A.V.K.), a Merkin Institute Fellowship from the Broad Institute of MIT and Harvard (to A.V.K.), a John S LaDue Memorial Fellowship (to J.P.P.), a sponsored research agreement from IBM Research (to A.P., A.V.K.), and American Association for the Study of Liver Diseases Foundation Clinical and Translational Research Awards (to V.A. and T.G.S.). MESA and the MESA SHARe projects are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, and UL1-TR-001420. Support is also provided in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center.
The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesa-nhlbi.org.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The UK Biobank study was approved by the Research Ethics Committee (reference 16/NW/0274) and informed consent was obtained from all participants. Analysis of UK Biobank data was conducted under application 7089 and was approved by the Mass General Brigham institutional review board (protocol 2013P001840). Framingham Heart Study and MESA genotype and phenotype data were retrieved for analysis from NCBI dbGaP under procedures approved by the Mass General Brigham institutional review board (protocol 2016P002395). Mass General Brigham Biobank participants each provided written informed consent, and analysis was approved by the Mass General Brigham institutional review board.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Summary statistics for the liver fat CVAS, as well as the machine learning model architectures and learned weights, will be available at the Cardiovascular Disease Knowledge Portal (http://broadcvdi.org/home/portalHome), and the ML4CVD modeling framework will be available via a GitHub repository at the time of publication.