## Summary

**Background** Artificial intelligence (AI) analytics have not been applied to global burden of disease (GBD) risk factor data to study population health. The comparative risk assessment (CRA) systematic literature review-based methodology for population attributable fractions (PAFs in percent’s) calculations has not been utilised for quantifying dietary and other risk factors for body mass index kg/M^{2} (BMI**)**.

**Methods** Institute of Health Metrics and Evaluation (IHME) staff and volunteer collaborators analysed over 12,000 GBD risk factor surveys of people from 195 countries and synthesized the data into representative mean cohort BMI and risk factor values. We formatted IHME GBD data relevant to BMI and associated risk factors. We empirically explored the univariate and multiple regression correlations of BMI risk factors with worldwide BMI to derive a BMI multiple regression formula (BMI formula). Main outcome measures included the performances of the BMI formula when tested with all nine Bradford Hill causality criteria each scored on a 0-5 scale: 0=negative to 5=very strong support.

**Findings** The BMI formula derived, with all foods in kilocalories/day (kcal/day), BMI formula risk factor coefficients were adjusted to equate with their PAFs. BMI increasing foods had “+” signs and BMI decreasing foods “-” signs. Total BMI formula PAF=80.96%. BMI formula=(0.37%*processed meat + 4.23%*red meat + 0.02%*fish + 2.24%*milk + 5.67%*poultry + 1.77%*eggs + 0.34%*alcohol + 0.99%*sugary beverages + 0.04%*corn + 0.72%*potatoes + 8.48%*saturated fatty acids + 3.89%*polyunsaturated fatty acids + 0.27%*trans fatty acids - 2.99%*fruit - 4.07%*vegetables - 0.37%*nuts and seeds - 0.45%*whole grains - 1.49%*legumes - 8.62%*rice - 0.10%*sweet potatoes - 7.45% physical activity (METs/week) - 20.38%*child underweight + 6.02%*sex (male=1, female=2))*0.05012 + 21.77. BMI formula versus BMI: r=0.907, 95% CI: 0.903 to 0.911, *p*<0.0001. Bradford Hill causality criteria test scores (0-5): (1) strength=5, (2) experimentation=5, (3) consistency=5, (4) dose-response=5, (5) temporality=5, (6) analogy=4, (7), plausibility=5, (8) specificity=5, and (9) coherence=5. Total score=44/45.

**Interpretation** Nine Bradford Hill causality criteria strongly supported a causal relationship between the BMI formula derived and mean BMIs of worldwide cohorts. The artificial intelligence methodology introduced could inform individual, clinical, and public health strategies regarding overweight/obesity prevention/treatment and other health outcomes.

**Funding** None

**Evidence before this study** Comparative risk assessment (CRA) systematic literature review-based methodology has been used in worldwide global burden of disease (GBD) analysis to determine population attributable fraction(s) (PAF(s)) for one or more risk factors for various health outcomes. So far, CRA has not been applied to derive PAFs for dietary and other risk factors for worldwide BMI. Artificial intelligence (AI) analytics has not yet been applied to worldwide GBD data as an alternative to the CRA methodology for determining risk factor PAFs for health outcomes.

**Added value of this study□** A multiple regression derived BMI formula (BMI formula) including PAFs of 20 dietary risk factors, physical activity, childhood severe underweight, and sex satisfied all nine Bradford Hill causality criteria. The BMI formula also plausibly predicted the long-term BMI outcomes related to various dietary and physical activity scenarios. All the BMI formula’s 24 risk factor PAFs were consistent in sign (+ or -) with the preponderance of previously published studies on those risk factors related to BMI.

**Implications of all the available evidence** The AI analytics methodology of GBD data modeling of BMI and associated risk factors infers causality of the BMI formula estimates with BMI worldwide and BMIs of subsets. This methodology may enable multiple regression formulas for risk factors of health outcomes for a range of non-communicable diseases—testable by Bradford Hill causality criteria.

## Introduction

Comparable risk assessment (CRA) methodology by the World Health Organization and the Institute of Health Metrics and Evaluation (IHME) is defined as, “the systematic evaluation of the changes in population health which result from modifying the population distribution of exposure to a risk factor or a group of risk factors.”^{1, 2} While CRA methodologies have used body mass index kg/M^{2} (BMI) in risk factor—health outcome pairs for numerous health outcomes (e.g., BMI-ischemic heart disease, BMI-colon cancer, BMI-type 2 diabetes, etc.), CRA has not been attempted to quantify the largely nutritional risk factors related to BMI itself.

Influential Stanford University meta researcher, Dr. John Ioannidis provided a likely reason for this in an editorial in the *BMJ* titled, “Implausible results in human nutrition research.” He said, “definitive solutions won’t come from another million observational papers or small randomized trials.”^{3} Dr. Ioannidis called for radical reform of nutritional epidemiology methodologies used to influence food/agricultural policies and to produce dietary guidelines for clinicians and the public.^{4}

According to IBM Cloud Education, “At its simplest form, artificial intelligence (AI) is a field, which combines computer science and robust datasets, to enable problem-solving.”^{5} AI will be used instead of the CRA methodology in this paper.

Satisfying Bradford Hill causality criteria is considered validating in epidemiological research.^{6} We used Bradford Hill causality criteria to test the BMI formula. We hypothesized that AI analytics with GBD big data would be able to quantify the determinants of worldwide cohort BMIs and thereby to better our understanding the dietary and other risk factors driving the obesity epidemic. We hope this paper answers Dr. Ioannidis’ call for innovation in epidemiology, particularly nutritional epidemiology.

## Methods

As volunteer collaborators with the IHME we received raw GBD ecological data (≈1.4 Gigabytes) on mean BMIs of male and female cohorts 15-49 years old and 50-69 years old from each year 1990-2017 from 195 countries and 365 subnational locations (n=1120 cohorts). We also utilised GBD data on exposures to 32 risk factors and covariates potentially related to BMI.

Food risk factors came from surveys of individuals as g/day. IHME dietary covariate data originally came from Food and Agriculture Organisation data on animal and plant food commodities available percapita in countries worldwide—as opposed to consumption data from participant interviews.^{7} Supplementary Table 1 lists the relevant GBD risk factors, covariates, and other available variables with definitions of those risk factor exposures.^{8}

GBD worldwide citations of over 12,000 surveys constituting ecological data inputs for this analysis are available from IHME.^{7} The main characteristics of IHME GBD data sources for BMI, the protocol for the GBD study, and all risk factor values have been published by IHME GBD data researchers and discussed elsewhere.^{9-12} These included detailed descriptions of categories of input data, potentially important biases, and methodologies of analysis. We did not clean or pre-process any of the GBD data. GBD cohort risk factor and BMI data from the IHME had no missing records. The updated 2019 raw data with variables we used for this analysis may be obtained by volunteer researchers collaborating with IHME.^{13}

To maximally utilise the available data, we averaged the values for ages 15-49 years old together with 50-69 years old for BMI and for each risk factor exposure for each male and female cohort for each year. Finally, for each male and female cohort, data from all 28 years (1990-2017) on mean BMI and on each of the risk factor exposures were averaged using the computer software program R.

To weigh the country and subnational data according to population, internet searches (mostly Wikipedia) yielded the most recent population estimates for countries and subnational states, provinces, and regions. World population data from the World Bank and the Organisation for Economic Co-operation and Development could not be used because they did not include all 195 countries or any subnational data.

Using the above-described formatted dataset of risk factors, covariates, and BMIs, a software program in R generated a population-weighted analysis dataset. Each male or female cohort in the population-weighted analysis dataset represented approximately 1 million people (range: < 100,000 to 1.5 million). The analysis dataset had n=7846 cohorts (rows of data), half male and half female, representative of over seven billion people.

Supplementary Table 2 details how we converted omega-3 fatty acid g/day to fish g/day using data on the omega-3 fatty acid content of frequently eaten fish from the National Institutes of Health Office of Dietary Supplements (USA).^{14} As shown in Supplementary Table 3, we converted all of the animal and plant food data, including alcohol and sugary beverage consumption, from g/day to kcal/day. For the g to kcal conversions, we used the Nutritionix track app,^{15} which tracks types and quantities of foods consumed. Saturated fatty acids (SFA) risk factor (0-1 portion of the entire diet) was not available with GBD data from 2017, so GBD SFA risk factor data from 2016 was used. Polyunsaturated fatty acid (PUFA) and trans fatty acid (TFA) GBD risk factor data from 2017 (0-1 portion of the entire diet) were also utilised. These fatty acid data were converted to kcal/day by multiplying the fatty acid 0-1 portion of the entire diet by the kcal/day available for each cohort.

### Statistical methods

To determine the strengths of the risk factor correlations with mean BMIs of population weighted worldwide cohorts (7846 cohorts) or subgroups of cohorts (e.g., continents, socio-demographic quartiles, etc.), we utilised Pearson correlation coefficients: r, 95% confidence intervals (95% CIs), and *p* values.

In this first attempt ever to use AI analytics with GBD data, we determined our methodology for deriving a BMI formula as we proceeded by experimenting with strategies to optimise the functioning of candidate BMI formulas. We

included as many as possible of the available dietary variables,

combined dietary variables if appropriate, and

included physical activity and other plausibly informative variables.

Appendix 1 explains the use of Bradford Hill causality criteria to assess whether the risk factors in the BMI formula derived accurately modeled the worldwide mean BMI. Briefly, we tested the BMI formula output with the Bradford Hill causality criteria (1) strength, (2) experimentation, (3) consistency, (4) dose response, (5) temporality, (6 analogy, (7) plausibility, (8) specificity, and (9) coherence. For each criterion, we used a 0-5 scale to assess the magnitude of support of the BMI formula output being causally related to the BMIs worldwide (0=no support of causality to 5=very strong support of causality, total possible points=45.

In determining the variables to include and exclude in worldwide BMI formula, we set the statistical threshold for a variable to enter and to remain in the formula at *p*<0.25. We used SAS and SAS Studio statistical software 9.4 (SAS Institute, Cary, NC) for the data analysis.

## Results

Table 1 shows the basic statistics and univariate correlations of mean cohort BMI worldwide with mean dietary and other risk factor values. Whole grains, legumes, rice, and sweet potatoes negatively correlated with BMI. We designated them, “BMI decreasing foods.” The six animal foods (processed meat, red meat, fish, milk, poultry, and eggs), alcohol, sugary beverages, corn availability, potato availability, added SFA, added PUFA, and added TFA all positively correlated with BMI. We designated them, “BMI increasing foods.”

In Supplementary Table 4, which shows BMI risk factor to risk factor correlations, corn availability (kcal/day percapita, a covariate) correlated moderately strongly with sugary beverages (r=0.419, 95% CI 0.400 to 0.437, *p*<0.0001), suggesting that high fructose corn syrup may have accounted for the unexpected positive correlation of corn with BMI.

The strong positive correlations of fruits, vegetables, and nuts and seeds with BMI (Table 1) were unexpected. These findings suggested likely multicollinearity of fruits, vegetables, and nuts and seeds with BMI increasing foods (when an independent variable is highly correlated with one or more known or unknown other independent variables). Supplementary Table 4 shows that fruits, vegetables and nuts and seeds strongly positively correlated with 10 of the 13 dietary variables that positively correlated with BMI (six animal-based foods, alcohol, SFA, PUFA, and TFA). Fruits, vegetables and nuts and seeds negatively correlated with the four other plant-based foods (whole grains, legumes, rice, and sweet potatoes), except fruits was not significantly correlated with whole grains. Together, these findings strongly suggested that multicollinearities accounted for the positive correlations of fruits, vegetables, and nuts and seeds with BMI.

Supplementary Table 5 shows the Excel spreadsheet calculations, which were coordinated with SAS multiple regressions involved in the derivation of the worldwide BMI formula. Appendix 2 detailed the steps in the multiple regression formula derivation of BMI (dependent variable) versus BMI risk factors (independent variables). As seen in Supplementary Table 5 and described in Appendix 2, the BMI formula versus BMI variance (R^{2}=0.8096) * 100 determined the total PAF (80.96%) accounted for by the BMI formula. The 23 risk factor coefficients could then be equated to the PAFs of those individual risk factors. The resulting BMI formula is shown below (all foods in kcal/day):

### BMI formula output analysed by Bradford Hill causality criteria

As mentioned in the methods and detailed in Appendix 1, the nine Bradford Hill causality criteria tested the functionality of the BMI formula:

Strength score=5 for r>0.500,

*p*<0.0001—The BMI formula’s output regressed with mean BMI of worldwide cohorts gave r= 0.907 (95% CI: 0.903 to 0.911)*p*<0.0001), R^{2}=0.8096, total percent weight=80.96%.Experiment score=5 for all 20 bootstrap trial BMI formulas with r>0.500,

*p*<0.0001)— Table 2 shows bootstrapping the BMI formula related to mean worldwide BMI with 20 trials (repeated sampling from the worldwide analysis dataset with replacements^{16}). Each trial had 100 randomly selected cohorts to generate a unique BMI formula. As Table 2 shows, the mean values for BMI and the risk factor PAFs were all quite close to the mean values for BMI and BMI risk factors in the worldwide BMI formula (last column).View this table:Consistency score=5 for the mean of absolute differences of BMI formula outputs and mean BMI < 0.300 kg/M

^{2}—In Table 3, we used 37 subsets of worldwide BMI and risk factor data to test the consistency of BMI formula outputs, utilising the 20 bootstrap trials (#2 Experiment) to generate 80% confidence intervals for each subset. Table 3 shows the average absolute difference between the 37 subgroups mean BMIs and BMI formula outputs averaged 0.252 kg/M^{2}.View this table:Dose-response (Biological gradient) score=5 for the mean of absolute differences of BMI formula outputs and mean BMI for the dose-response quartiles< 0.300 kg/M

^{2}—Table 3 also shows that the mean of the BMI absolute differences between the BMI formula estimates and mean BMIs in the four dose-response quartiles averaged 0.220 kg/M^{2}.Temporality score=5 for the BMI trend formula versus BMI trend r>0.500,

*p*<0.0001—Table 4 shows the mean trends the r values and the R^{2}values for each of the 22 risk factors. As is shown in Supplementary Table 6 and detailed in Appendix 1, a multiple regression analysis with the worldwide BMI trend 1990-2007 (dependent variable) versus 22 of the 23 risk factor mean trends and R^{2}values from Table 4 (independent variables excluding sex) generated a BMI trend formula (All foods are in Kcal/day.):View this table:As with the BMI formula, the coefficients of the BMI risk factor trends formula were equated to their PAFs. BMI trend formula r=0.592 (H24 in Supplementary Table 6), 95% CI: 0.577 to 0.606,

*p*<0.0001, and total PAF=35.02%.Analogy score=4—The three metabolic risk factors correlated with BMI to test analogy were systolic blood pressure (SBP), low density lipoprotein cholesterol (LDL-C), and fasting plasma glucose (FPG).

BMI correlated with SBP: r=0.102, 95% CI: 0.080 to 0.124,

*p*<0.0001.BMI correlated with FPG: r=0.558, 95% CI: 0.542 to 0.573,

*p*<0.0001.BMI correlated with LDL-C: r=0.756, 95% CI: 0.746 to 0.765,

*p*<0.0001. Since BMI versus SBP r<0.500, Bradford Hill causality score=4.

Plausibility: Score=5—Based on systematic medical literature reviews, physical activity inversely correlated with BMI,

^{17}and BMI directly correlated with intakes of sugary beverages,^{18}alcohol,^{19}and animal foods.^{20}The relationship of adult BMI with early childhood severe underweight has not been reported worldwide. Since people in poor countries with high infant/childhood malnutrition have fewer resources to obtain animal foods, fatty acids, and alcohol and more need for physical exercise than in developed countries, it is highly plausible that childhood severe underweight negatively correlated with lower BMI in adulthood.Specificity: Score=5 for the BMI formula being unique—the BMI formula was specific to worldwide BMI and would have been different from risk factor formulas modeling SBP, FPG, LDL-C, or any health outcome.

Coherence: Score=5—As evidenced by the near perfect score 39/40 on the first eight criteria, the BMI formula accurately modeled worldwide mean BMI—total causality criteria score: 44/45.

Table 5 shows BMI formula estimates for various patterns of diet and/or other BMI formula risk factors. For instance, increasing physical activity by adding a run for 1 hour/day at six miles/hour on average along with decreasing BMI increasing foods by isocalorically shifting 25% of BMI increasing foods (Kcal/day) to BMI decreasing foods was projected to reduce the mean cohort BMI from 26.66 kg/M^{2} to mean BMI=22.26 kg/M^{2}.

## Discussion

This paper’s analysis dataset formatted from GBD worldwide data used with the BMI formula enabled quantifying 23 risk factor PAFs for worldwide mean BMI for cohorts of 15-69-year-old people. This quantitatively rigorous methodology satisfied all nine Bradford Hill criteria for showing the BMI formula derived accurately modeled BMI.

The BMI formula PAFs are quite consistent with a 20-year-long observational study with 120,877 US male and female adults reported by Mozaffarian, et. al. They measured weight change per four-year period per 1.00 increased serving per day of various foods. It found the following foods associated with increased weights over four years: potato chips (0.77 kg), potatoes (0.58 kg), sugar-sweetened beverages (0.45 kg), unprocessed red meats (0.43 kg), and processed meats (0.42 kg). With the same methodology, the following foods associated with decreased weights: vegetables (−0.10 kg), whole grains (−0.17 kg), fruits (−0.22 kg), nuts (−0.26 kg), and yogurt (−0.37 kg) (*p*≤0.005 for each comparison).^{21}

According to the International Potato Center, potato availability (a covariate), which positively correlated with BMI, included ≥50% highly processed potato products worldwide.^{22} This likely partially accounted for the positive correlation of potatoes with BMI. Higher intakes of ultra-processed foods overall have been associated with overweight/obesity.^{23} Ultra-processed food intakes have increased dramatically in developed countries in recent decades and are now increasing in middle-income countries.^{24}

With the results of this paper consistent with and adding to previous findings, we should consider including the quantification of BMI risk factors demonstrated here into US Department of Health and Human Services (USDHHS)/ US Department of Agriculture (USDA) health policy discussions and clinical nutrition guideline deliberations. For instance, the Supplemental Nutrition Assistance Program (SNAP—formerly Food Stamps) spent an estimated 22.6% of its $73 billion/year budget in 2019^{25} on payments to low-income Americans for “sweetened beverages, prepared desserts, salty snacks, candy, and sugar.”^{26} Additionally, the USDA has subsidised crops that go primarily for animal feed or that are processed into sugars while not subsidising fruits and vegetables.^{27} While the USDA recognises the relatively low intake of fruits and vegetables in the USA and sponsors a publicity campaign to increase fruits and vegetables consumption,^{28} USDA expenditures should promote reduced prices of BMI decreasing foods and increased prices of BMI increasing foods.

A review of the literature on food costs relative to nutrient quality found that the median costs of fruits and vegetables (€0.82/100 kcal) were quite high relative meat/eggs/fish (€0.64/100 kcal), fresh dairy (€0.32/100 kcal), nuts (€0.25/100 kcal), and starches (€0.14/100 kcal).^{29} The economics of food, including the high cost of fruits and vegetables relative to animal foods, starchy potatoes, and sugary beverages, may contribute to overweight and obesity in the USA and in other affluent countries.

Following a low-carbohydrate, high-fat diet has been demonstrated to cause modest short term weight loss in obese people,^{30} However, the BMI formula projects that the long-term effect of a low-carb, high fat diet would be a mean cohort BMI of 29.05 kg/M^{2} 80% CI: 27.92 to 29.54 kg/M^{2} (Table 5).

### Limitations

The GBD data on animal foods, plant foods, alcohol, and fatty acids were not comprehensive and comprised only 1191.4 Kcal/day on average worldwide. Subnational data were available on only four countries. Because the data formatting and statistical methodology were new, this was necessarily a post hoc analysis and no pre-analysis protocol or Bradford Hill causality criteria scoring system was possible. As detailed in the Foresight Report on obesity,^{31} obesity is affected by a complex system of interacting factors besides diet, physical activity, and childhood feeding patterns, and sex. So genes,^{32} gut microbiome,^{33} ultra-processing of food,^{34, 35} and other influences on BMI were outside of the purview of this analysis.

### Generalisability

Over 70% of the cohorts in analysis dataset had low to moderate BMI levels (BMI < 23 kg/M^{2}), so the results could be further refined for relatively high SDI countries. For results more applicable to high BMI countries, BMI formulas might also be derived from (1) the four countries that have subnational data (UK, USA, Mexico, and Japan n=730 cohorts, mean BMI=24.85 kg/M^{2}) (2) the cohorts with mean BMI ≥ 23 kg/M^{2} (n=2256 cohorts, mean BMI=24.84 kg/M^{2}), or (3) high SDI countries (n=1926 cohorts, mean BMI=24.49 kg/M^{2}).

## Conclusions

Nine Bradford Hill causality criteria strongly supported that the BMI formula derived with 23 risk factors in proportion to their PAFs accurately modeled worldwide cohort mean BMI. While this study dealt only with dietary and other available risk factors for BMI (overweight/obesity), the AI methodology introduced could easily apply to estimating PAFs of multiple dietary and other risk factors that pertain to dozens of non-communicable disease health outcomes, for which the IHME have GBD data.

## Data Availability

The raw, unformatted data used in this analysis is now out of date. The 2019 GBD data on all the variables in this analysis may be obtained from the IHME by volunteer collaborating researchers. The formatted database, SAS codes and Excel spreadsheets on which this analysis is based are posted on the Mendeley data repository.

https://data.mendeley.com/v1/datasets/publish-confirmation/g6b39zxck4/6

## Article Information

### Corresponding Author

David K. Cundiff, Independent researcher, davidkcundiff{at}gmail.com

### contributions

DKC acts as guarantor; conceived and designed the study, acquired and analysed the data, interpreted the study findings, drafted the manuscript, critically reviewed and edited the manuscript and tables, and approved the final version for publication.

CW designed software programs in R to format and population weight the data, aided with the SAS statistical analysis, critically reviewed the manuscript, and approved the final version for publication.

### Conflict of Interest Disclosures

None reported. Both authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

### Funding/Support

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. The Bill and Melinda Gates Foundation funded the acquisition of the data for this analysis by the IHME. The data were provided to the authors as volunteer collaborators with IHME.

### Role of the Funder/Sponsor

While IHME GBD faculty and staff by virtue of the Bill and Melinda Gates Foundation grants provided the raw data for this analysis, they did not vet the analysis or sponsor the manuscript.

### Additional Contributors

Martin Sebera, from the Department of Kinesiology, Faculty of Sports Studies Masaryk University, Czech Republic, critiqued statistical aspects of the manuscript and provided useful input. Pavel Grasgruber, from Masaryk University, Czech Republic, provided suggestions after reviewing the manuscript. We thank Scott Glenn and Brent Bell from IHME who supplied us with the GBD risk factor exposure data for the risk factors and for BMI data.

### Data sharing statement

The raw, unformatted data used in this analysis is now out of date. The 2019 GBD data on all the variables in this analysis may be obtained from the IHME by volunteer collaborating researchers. The formatted database, SAS codes and Excel spreadsheets on which this analysis is based are posted on the Mendeley data repository: https://data.mendeley.com/datasets/g6b39zxck4/6

### STROBE checklist

This report follows the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines for reporting global health estimates.^{36}

## Footnotes

Title changed, manuscript revised and edited along with the tables.