## Abstract

**Objectives** (1) To derive a worldwide body mass index (BMI) multiple regression formula (BMI formula), (2) to quantify the percent weights attributable to dietary and other risk factors in this BMI formula, and (3) to test relevant dietary guidelines and diets using BMI estimates generated by the BMI formula.

**Design** BMI and risk factor cohort data from 1990-2017 from the Institute for Health Metrics and Evaluation (IHME) were formatted and population-weighted. We empirically explored the univariate and multiple regression correlations of risk factors with BMI to maximize BMI formula functionality.

**Setting** Worldwide

**Ecological cohort data** Over 12,000 Global Burden of Disease (GBD) risk factor surveys of people 15-69 years old from 195 countries analysed and synthesised into representative cohort risk factor values by staff and volunteer collaborators with the IHME

**Main outcome measures** Performance of a worldwide BMI formula when tested with eight Bradford Hill causality criteria, each scored on a 0-5 scale (0=negative to 5=very strong support; maximum possible score 40)

**Results** In the BMI formula derived, all foods were expressed in kilocalories/day (kcal/day), and all risk factor coefficients were adjusted to equate with their percent weight impacts on BMI. BMI formula = 0.29 * processed meat + 11.87 * red meat + 0.02 * fish + 3.74 * milk + 11.09 * poultry + 5.46 * eggs + 1.95 * alcohol + 6.29 * sugary beverages + 0.29 * corn + 3.53 * potatoes + 2.71 * saturated fatty acids + 0.64 * polyunsaturated fatty acids + 0.06 * trans fatty acids - 0.37 * fruit - 0.52 * vegetables - 0.03 * nuts and seeds - 0.12 * whole grains - 0.47 * legumes - 8.26 * rice - 0.18 * sweet potatoes - 9.93 * physical activity (METs/week) + 5.78 * total kcal/day available - 4.13 * child underweight + 0.92 * discontinued breast feeding. BMI increasing foods have + signs in the BMI formula and BMI decreasing foods have - signs. Bradford Hill causality criteria test scores (0-5): (1) strength=5, (2) consistency=5, (3) dose-response=5, (4) temporality=5, (5) analogy=4, (6) plausibility=5, (7) experimentation=5, and (8) coherence=5. Total=39/40. Based on the USA BMI trend data, the mean adult BMI was 25.45 in 1990 and is projected to be 28.13 in 2020 and 30.81 in 2050. BMI formula based mean BMI estimates included the following: Dietary Guidelines for Americans=26.34; USA with 50% reduction of BMI increasing food intake=23.34; USA with 25% reduction of BMI increasing food intake and increasing physical activity by running at 6 mph for 1 hour/day on average=23.67; vegetarian diet=22.54; low carbohydrate, high fat diet=31.76.

**Conclusions** Eight relevant Bradford Hill causality criteria strongly supported that the BMI formula derived was causally related to mean BMIs of worldwide cohorts, subgroups, and individual risk factor patterns. The statistical analysis methodology introduced could inform individual, clinical, and public health strategies regarding overweight/obesity prevention/treatment and other health outcomes.

## Introduction

The scientific validity of the Dietary Guidelines for Americans for 2015-2020 was challenged by the journalist Nina Teicholz in the *BMJ*.^{1} The Center for Science in the Public Interest called for the *BMJ* to retract the article. The peer reviewers selected to adjudicate the far-reaching dispute concluded that, “Teicholz’s criticisms of the methods used by Dietary Guidelines for Americans Committee are within the realm of scientific debate.”^{2} Regarding the scientific study of treatments for overweight/obesity, *JAMA* published a representative randomised trial comparing a low fat diet with a low carbohydrate diet. It reported no difference in the very modest weight loss achieved at one year.^{3} This suggests that current short-term dietary intervention trials (< 5 years) are meaningless for determining the long-term relationship (> 20 years) of diet with overweight/obesity.

The influential Stanford University meta-researcher Dr. John Ioannidis called for radical reform of all nutritional epidemiology methodologies used to influence food/agricultural policies and to produce dietary guidelines for clinicians and the public.^{4} Currently, no methodology for relating long-term body mass index (BMI) or BMI change/year to food intake and physical activity has been generally accepted as rigorous, replicable, and scientifically valid. This study, which analyses worldwide data on dietary and other risk factors for BMI, attempts to answer Dr. Ioannidis’ call for a more rigorous and reliable nutritional epidemiology methodology on which to base public health policies and individual dietary guidance.

The first objective of this analysis was to derive a multiple regression risk factor formula using worldwide ecological data on BMIs of male and female cohorts (dependent variable) and dietary and other risk factors (independent variables). Second, we used Bradford Hill causality criteria to test the BMI formula containing worldwide diet and other risk factors. Satisfying Bradford Hill causality criteria is considered validating in epidemiological research.^{5} Third, we wanted to apply the BMI formula to assorted diet and other risk factor patterns to derive the mean BMI estimates associated with following those patterns over the long term (> 20 years). We hypothesised only that the objectives were achievable.

## Methods

As volunteer collaborators with the Institute for Health Metrics and Evaluation (IHME), we received raw Global Burden of Disease (GBD) ecological data (≈1.4 gigabytes) on mean BMIs of male and female cohorts 15-49 years old and 50-69 years old for each year 1990-2017 from 195 countries and 365 subnational locations (n=1120 cohorts). We also utilised GBD data on exposures to 32 risk factors and covariates potentially related to BMI. IHME dietary covariate data originally came from Food and Agriculture Organization surveys of animal and plant food commodities available per capita (as opposed to consumed per capita) in countries worldwide.^{6} Food risk factors came as g/day consumed per capita. Other variables were utilised in deriving the BMI formula and in testing it against the Bradford Hill causality criteria: physical activity (METs/week), kilocalories per day available per capita (kcal/day, a covariate), severe child underweight (2 SD below the mean weight for age), and discontinued breast feeding before 6 months. Other available variables tested included socio-demographic index, LDL-cholesterol, fasting plasma glucose, and systolic blood pressure. Supplementary Table 1 lists the relevant GBD risk factors, covariates, and other available variables with definitions of those risk factor exposures.^{7}

GBD worldwide citations of over 12,000 surveys constituting ecological data inputs for this analysis are in Appendix 1.^{6,8} The main characteristics of IHME GBD data sources for BMI and all risk factor values have been published by IHME GBD data researchers and discussed elsewhere.^{9-11} These included detailed descriptions of categories of input data, potentially important biases, and methodologies of analysis. We did not clean or pre-process any of the GBD data. GBD cohort risk factor and BMI data from the IHME had no missing records.

Because this was a post hoc analysis and the GBD data for the study came from IHME, no ethics committee approval or institutional review board review was needed for this statistical analysis. The raw data for this analysis may be obtained by volunteer researchers collaborating with IHME.^{12}

IHME made available GBD BMI, risk factor, and covariate exposure data for each year from 1990-2017 for male and female cohorts from ages 15-49 years old and from 50-69 years old. To maximally utilise the available data, we averaged the values for ages 15-49 years old together with 50-69 years old for BMI and for each risk factor exposure for each male and female cohort for each year. Finally, for each male and female cohort, data from all 28 years (1990-2017) on mean BMI and on each of the risk factor exposures were averaged using the computer software program R.
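The two averaging steps described above can be sketched as follows. The authors did this step in R; the Python below is only an illustrative sketch. The record layout, the function name `average_cohorts`, and the BMI values are hypothetical, and the unweighted mean across the two age groups is an assumption about how the averaging was done.

```python
from collections import defaultdict
from statistics import mean

def average_cohorts(records):
    """Collapse (location, sex, year, age_group, value) records
    to a single value per (location, sex) cohort.

    Step 1 averages the age groups within each year.
    Step 2 averages those yearly values across all available years.
    """
    by_year = defaultdict(list)
    for location, sex, year, _age_group, value in records:
        by_year[(location, sex, year)].append(value)

    by_cohort = defaultdict(list)
    for (location, sex, _year), values in by_year.items():
        by_cohort[(location, sex)].append(mean(values))

    return {cohort: mean(yearly) for cohort, yearly in by_cohort.items()}

# Hypothetical BMI values for one male cohort over two years.
example = [
    ("A", "male", 1990, "15-49", 22.0),
    ("A", "male", 1990, "50-69", 24.0),
    ("A", "male", 1991, "15-49", 23.0),
    ("A", "male", 1991, "50-69", 25.0),
]
averaged = average_cohorts(example)
```

The same function would be applied to each risk factor exposure in turn, giving one row of averaged values per male or female cohort.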

To weight the country and subnational data according to population, internet searches (mostly Wikipedia) yielded the most recent population estimates for countries and subnational states, provinces, and regions. World population data from the World Bank and the Organisation for Economic Co-operation and Development could not be used because they did not include all 195 countries or any subnational data.

Using the above described formatted dataset of risk factors, covariates, and BMIs, a software program in R generated a population-weighted analysis dataset. Each male or female cohort in the population-weighted analysis dataset represented approximately 1 million people (range: < 100,000 to 1.5 million). The analysis dataset had n=7886 cohorts (rows of data), half male and half female, representative of over seven billion people. For example, India with about 1.234 billion people had 617 rows of the same data for males and 617 rows for females. Maldives, with about 445,000 people, had a single row of data for males and another for females. Without population-weighting the data, cohorts in India and Maldives each would have had one row in the analysis dataset, invalidating the analysis results.
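The row replication behind the population weighting can be illustrated with a minimal sketch. It assumes one row per roughly 1 million people of each sex, an equal male/female split, rounding to the nearest whole row, and a minimum of one row; the exact rule in the authors' R program is not stated, so `rows_per_sex` and `population_weight` are hypothetical names for an assumed rule that reproduces the India and Maldives examples.

```python
def rows_per_sex(population, people_per_row=1_000_000):
    """Number of identical rows for each sex of a location (at least one)."""
    per_sex = population / 2  # assumption: equal male/female split
    return max(1, round(per_sex / people_per_row))

def population_weight(cohorts, populations):
    """Repeat each (location, sex) cohort row in proportion to population."""
    weighted = []
    for (location, sex), values in cohorts.items():
        n_rows = rows_per_sex(populations[location])
        weighted.extend([(location, sex, values)] * n_rows)
    return weighted

# Populations as quoted in the text; the risk factor values are left empty.
populations = {"India": 1_234_000_000, "Maldives": 445_000}
cohorts = {
    ("India", "male"): {},
    ("India", "female"): {},
    ("Maldives", "male"): {},
    ("Maldives", "female"): {},
}
dataset = population_weight(cohorts, populations)
```

Under these assumptions India contributes 617 rows per sex and Maldives one row per sex, matching the counts given above.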

This report follows the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines for reporting global health estimates.^{13}

Supplementary Table 2 details how omega-3 fatty acid g/day was converted to fish g/day using data on the omega-3 fatty acid content of frequently eaten fish from the National Institutes of Health Office of Dietary Supplements (USA).^{14} As shown in Supplementary Table 3, we converted all of the animal and plant food data, including alcohol and sugary beverage consumption, from g/day to kcal/day. For the g to kcal conversions, we used the Nutritionix track app,^{15} which tracks types and quantities of foods consumed. The saturated fatty acids risk factor (expressed as a 0-1 proportion of the entire diet) was not available in the 2017 GBD data, so the 2016 GBD saturated fatty acids data were used. Polyunsaturated fatty acid and trans fatty acid GBD risk factor data from 2017 (also 0-1 proportions of the entire diet) were utilised as well. These fatty acid data were converted to kcal/day by multiplying by the kcal/day available for each cohort.
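The two unit conversions described above amount to simple multiplications, sketched below. The function names are hypothetical, and the energy density passed to `grams_to_kcal` in the example is a made-up illustrative value, not one of the Nutritionix figures used in Supplementary Table 3.

```python
def grams_to_kcal(grams_per_day, kcal_per_gram):
    """Convert a food intake in g/day to kcal/day using its energy density."""
    return grams_per_day * kcal_per_gram

def fraction_to_kcal(diet_fraction, kcal_available):
    """Convert a fatty acid exposure given as a 0-1 proportion of the
    diet into kcal/day using the cohort's kcal/day available."""
    if not 0.0 <= diet_fraction <= 1.0:
        raise ValueError("diet_fraction must be between 0 and 1")
    return diet_fraction * kcal_available

# Illustrative values only.
food_kcal = grams_to_kcal(100, 2.0)           # 100 g/day at 2 kcal/g
fatty_acid_kcal = fraction_to_kcal(0.25, 2000)  # 25% of 2000 kcal/day
```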

### Statistical methods

To determine the strengths of the risk factor correlations with mean BMIs of population weighted worldwide cohorts (7886 cohorts) or subgroups of cohorts (e.g., continents, socio-demographic quartiles, etc.), we utilised Pearson correlation coefficients: r, 95% confidence intervals (95% CIs), and *P* values.
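For readers who want to reproduce statistics of this kind, the sketch below computes a Pearson r and a 95% CI. It assumes the interval comes from the Fisher z transformation, a common choice; the text does not say which method the SAS analysis used, and the function names are introduced here for illustration.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z transformation (assumed method)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# r and n as reported for corn availability versus sugary beverages.
low, high = fisher_ci(0.419, 7886)
```

With r=0.419 and n=7886 rows, this gives an interval of roughly 0.40 to 0.44, close to the one reported in the Results for corn availability and sugary beverages.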

In this first-of-its-kind, post hoc, exploratory analysis, the methodology was determined as we proceeded by experimenting with strategies to optimise the functioning of the BMI formula derived. We sought to derive a BMI multiple regression formula from worldwide data with the following characteristics:

- including as many as possible of the available animal and plant food variables,
- including physical activity and other plausibly informative available ecological variables, and
- combining dietary variables if appropriate to utilise more variables and improve BMI formula strength (i.e., R^{2} with BMI) and consistency at predicting subgroup mean BMIs.

Appendix 2 further details the statistical methodology and explains the use of Bradford Hill causality criteria to assess whether the worldwide BMI was causally related to the risk factors in the BMI formula. Briefly, we tested the BMI formula output with the Bradford Hill causality criteria (1) strength, (2) consistency, (3) dose response, (4) temporality, (5) analogy, (6) plausibility, (7) experimentation, and (8) coherence. For each criterion, we used a 0-5 scale to assess the magnitude of support of the BMI formula output being causally related to the BMIs worldwide (0=no support of causality to 5=very strong support of causality).

In determining the variables to include in and exclude from the worldwide BMI formula, we set the statistical threshold for a variable to enter and to remain in the formula at *P* < 0.25. We used SAS and SAS Studio statistical software 9.4 (SAS Institute, Cary, NC) for the data analysis.

## Results

Table 1 shows the basic statistics and univariate correlations of mean BMI with dietary and other risk factors and with the BMI formula derived from worldwide cohorts. Table 2 shows the univariate correlations of each of the 24 BMI formula risk factors with each other. The six animal foods (processed meat, red meat, fish, milk, poultry, and eggs), alcohol, sugary beverages, corn availability, potato availability, saturated fatty acids, polyunsaturated fatty acids, and trans fatty acids all positively correlated with BMI. Table 2 shows that corn availability (kcal/day per capita, a covariate) correlated moderately strongly with sugary beverages (r=0.419, 95% CI 0.400 to 0.437, *P*<0.0001), suggesting that high fructose corn syrup may account for the positive correlation of corn with BMI. Potato availability (kcal/day per capita, a covariate), which positively correlated with BMI, included ≥50% highly processed potato products worldwide according to the International Potato Center.^{16} Four starchy plant foods (whole grains, legumes, rice, and sweet potatoes) all negatively correlated with BMI.

The strong positive correlations of fruits, vegetables, and nuts and seeds with BMI were unexpected since nutritional literature broadly supports these foods as non-obesogenic. It suggested possible multicollinearity (when an independent variable is highly correlated with one or more known or unknown other independent variables). Supplementary Table 4 demonstrates the multicollinearity of fruits, vegetables, and nuts and seeds with BMI increasing foods:

- BMI increasing foods (animal foods + alcohol + sugary beverages + corn + potatoes) as a group were very strongly correlated with BMI (r=0.597, 95% CI 0.583 to 0.611, *P*<0.0001).
- Fruits, vegetables, and nuts and seeds grouped together were strongly positively correlated with BMI (r=0.655, 95% CI 0.642 to 0.677, *P*<0.0001) but also with BMI increasing foods (r=0.323, 95% CI 0.304 to 0.343, *P*<0.0001).

Supplementary Table 4 also shows that the mean BMI of the top sociodemographic index (SDI) cohort quartile was about 3.7 BMI units higher than that of the three lowest SDI quartiles (24.54 versus 20.88), meaning that the more developed countries had much higher mean BMIs. Intake of BMI increasing foods in the top SDI quartile exceeded intake in the three lowest quartiles by 58% (905 kcal/day / 571 kcal/day=1.58). Intake of fruits, vegetables, and nuts and seeds in the top SDI quartile exceeded intake in the three lowest quartiles by 88% (199 kcal/day / 106 kcal/day=1.88). Because of the multicollinearities of fruits, vegetables, and nuts and seeds with the 10 BMI increasing foods, we grouped fruits, vegetables, and nuts and seeds with the BMI decreasing foods (whole grains, legumes, rice, and sweet potatoes) in a combination variable used to derive the BMI formula. The resulting formula, with the coefficients of the risk factors equated to their percent weights (shown in Supplementary Table 5), was:

BMI formula = 0.29 * processed meat + 11.87 * red meat + 0.02 * fish + 3.74 * milk + 11.09 * poultry + 5.46 * eggs + 1.95 * alcohol + 6.29 * sugary beverages + 0.29 * corn + 3.53 * potatoes + 2.71 * saturated fatty acids + 0.64 * polyunsaturated fatty acids + 0.06 * trans fatty acids - 0.37 * fruit - 0.52 * vegetables - 0.03 * nuts and seeds - 0.12 * whole grains - 0.47 * legumes - 8.26 * rice - 0.18 * sweet potatoes - 9.93 * physical activity (METs/week) + 5.78 * total kcal/day available - 4.13 * child underweight + 0.92 * discontinued breast feeding.

Percent weights of risk factors were equated to risk factor coefficients. Total percent weights=78.64 (BMI formula R^{2}=0.7864, r=0.887, 95% CI 0.882 to 0.891, *P*<0.0001; + signs mean BMI increasing and - signs mean BMI decreasing).

As detailed in the methods and Appendix 2, the standardised BMI formula coefficients of the dietary and other risk factors were all adjusted to equate to the percent weights of risk factors in the BMI formula.
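As a quick arithmetic check on the equating of coefficients with percent weights, the absolute values of the 24 published coefficients can be summed and compared with the stated total percent weight (100 × R^{2}). The dictionary keys below are shorthand labels introduced here; the values are the rounded coefficients from the BMI formula above, so a small rounding difference from 78.64 is expected.

```python
# Rounded coefficients as published in the BMI formula (sign = direction).
COEFFICIENTS = {
    "processed_meat": 0.29, "red_meat": 11.87, "fish": 0.02, "milk": 3.74,
    "poultry": 11.09, "eggs": 5.46, "alcohol": 1.95, "sugary_beverages": 6.29,
    "corn": 0.29, "potatoes": 3.53, "saturated_fatty_acids": 2.71,
    "polyunsaturated_fatty_acids": 0.64, "trans_fatty_acids": 0.06,
    "fruit": -0.37, "vegetables": -0.52, "nuts_and_seeds": -0.03,
    "whole_grains": -0.12, "legumes": -0.47, "rice": -8.26,
    "sweet_potatoes": -0.18, "physical_activity": -9.93,
    "kcal_available": 5.78, "child_underweight": -4.13,
    "discontinued_breast_feeding": 0.92,
}

# Percent weight of each risk factor is the magnitude of its coefficient.
total_percent_weight = sum(abs(c) for c in COEFFICIENTS.values())
bmi_increasing = [name for name, c in COEFFICIENTS.items() if c > 0]
bmi_decreasing = [name for name, c in COEFFICIENTS.items() if c < 0]
```

The sum of the rounded magnitudes agrees with the reported total of 78.64 to within rounding error.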

As shown in Supplementary Table 5, the 13 BMI increasing foods and seven BMI decreasing foods underwent three adjustments: (1) multiplication by their mean kcal/day values (Column C), (2) multiplication by their R^{2} values in univariate correlation with BMI (Column E), and (3) multiplication by the ratio of the mean kcal/day of the second, third, and fourth sociodemographic index (SDI) quartiles to the mean kcal/day of the first SDI quartile (Column G). Table 3 demonstrates that these adjustments improved the consistency of the BMI formula as measured by the closeness of fit of the multiple regression derived BMI formula outputs with the mean BMIs of 37 subgroups of the worldwide data.

### BMI formula output analysed by Bradford Hill causality criteria

Eight Bradford Hill causality criteria tested the functionality of the BMI formula. With the methodology detailed in Appendix 2, we assessed the relationship of the worldwide BMI formula output versus the mean BMI as follows:

1. Strength score=5: The correlation coefficient of the BMI formula regressed with BMI was r=0.887 (95% CI 0.882 to 0.891), *P*<0.0001.
2. Consistency score=5: Table 3 shows the average absolute differences between subgroup mean BMIs and BMI formula estimates for the BMI formula compared with two other variations of the BMI formula. The BMI formula with fruits, vegetables, and nuts and seeds grouped with BMI decreasing foods as negatively correlated with BMI had a slightly closer fit than a comparison BMI formula with fruits, vegetables, and nuts and seeds grouped with BMI increasing foods as positively correlated with BMI (average absolute difference between mean BMI and BMI formula estimate: 0.249 versus 0.255 BMI units). Without the three adjustments to each of the food variables, the average absolute difference between mean BMI and BMI formula estimate was 0.368 BMI units.
3. Dose-response (biological gradient) score=5: Table 3 shows that the mean of the absolute differences between the BMI formula estimates and the mean BMIs in the four dose-response quartiles was 0.302 BMI units.
4. Temporality score=5: A multiple regression with the worldwide BMI trend 1990-2007 (dependent variable) versus the risk factor trends (independent variables) generated a BMI trend formula (BMI trend versus risk factor trends: r=0.615, 95% CI 0.601 to 0.628, *P*<0.0001). Fifteen out of 24 risk factor trends contributed to the BMI trend formula. The standardised BMI trend versus risk factor trends formula, with sign concordant coefficients adjusted to equal the percent weights of the significant risk factors shown in Supplementary Table 7, was as follows: BMI trend formula = 4.84 * red meat + 0.63 * milk + 4.61 * poultry + 2.62 * eggs + 3.91 * alcohol + 2.69 * sugary beverages + 0.40 * corn + 3.60 * potatoes + 3.90 * PUFA - 0.65 * legumes - 3.08 * rice - 0.80 * sweet potatoes + 6.06 * kcal available (total percent weight 37.79, R^{2}=0.3779).
5. Analogy score=4: The systolic blood pressure (SBP) formula had 15/24 risk factors concordant in sign with the BMI formula. SBP correlated weakly with the BMI formula (r=0.024, 95% CI 0.002 to 0.046, *P*=0.0361). The low density lipoprotein cholesterol (LDL-C) and the fasting plasma glucose (FPG) formulas had 23/24 and 21/24 risk factors concordant in sign with the BMI formula, respectively. LDL-C and FPG were both strongly correlated with BMI (r=0.757, 95% CI 0.747 to 0.766, *P*<0.0001 and r=0.565, 95% CI 0.550 to 0.580, *P*<0.0001, respectively). They were also strongly correlated with the BMI formula (r=0.757, 95% CI 0.747 to 0.766, *P*<0.0001 and r=0.581, 95% CI 0.566 to 0.595, *P*<0.0001, respectively).
6. Plausibility score=5: Based on systematic medical literature reviews, physical activity inversely correlated with BMI^{17} and BMI directly correlated with intakes of sugar,^{18} alcohol,^{19} and animal foods.^{20} The relationship of adult BMI with early childhood severe underweight has not been reported worldwide. Since people in poor countries have less animal food and alcohol and more physical exercise than people in developed countries, it is plausible that childhood severe underweight positively correlates with lower BMI in adulthood.
7. Experimentation score=5: Table 4 demonstrates cross validation of the BMI formula with a very high degree of reproducibility in the percent weight distributions from 20 randomly selected subgroups of the dataset (n=100 for each, which are shown in Supplementary Table 7).
8. Coherence score=5: As evidenced by the near perfect score of 34/35 on the first seven criteria, the Bradford Hill causality criteria overall were strongly supportive that the worldwide BMI formula accurately modeled the 24 risk factors that led to increased and decreased BMI. Total causality criteria score=39/40.
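The score tallies quoted for the eight criteria can be checked with a few lines of arithmetic. The dictionary below simply restates the scores reported above; the variable names are introduced here for illustration.

```python
# Bradford Hill criterion scores as reported (0-5 scale).
SCORES = {
    "strength": 5,
    "consistency": 5,
    "dose_response": 5,
    "temporality": 5,
    "analogy": 4,
    "plausibility": 5,
    "experimentation": 5,
    "coherence": 5,
}
MAX_PER_CRITERION = 5

# Coherence is scored from the other seven criteria, so tally them first.
first_seven = sum(v for k, v in SCORES.items() if k != "coherence")
total = sum(SCORES.values())
max_total = MAX_PER_CRITERION * len(SCORES)
```

This reproduces the 34/35 subtotal for the first seven criteria and the overall 39/40.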

Relative to other countries, the USA had one of the world’s highest levels of BMI increasing foods (exclusive of fatty acids), 1015 kcal/day, and below average consumption of BMI decreasing foods, 326 kcal/day, corresponding to an average adult BMI of 26.66. Based on the USA BMI trend data, the mean adult BMI was 25.45 in 1990 and is projected to be 28.13 in 2020 and 30.81 in 2050. Table 5 shows BMI formula estimates for various relevant patterns of diet and/or other BMI formula risk factors. Following the Dietary Guidelines for Americans 2015-2020 would result in a mean adult BMI of 26.34, which is close to the mean USA adult BMI from 1990-2017 (26.66). A major increase in regular exercise alone would decrease the BMI formula estimate of mean long-term BMI. In addition to the current mean level of physical activity (3853 METs/week), running two hours/day on average at 6 miles per hour (8400 METs/week)^{21} would reduce the estimated mean BMI by 2.68 BMI units to 23.98.
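The three USA trend figures quoted above (1990, 2020, and 2050) are consistent with a straight-line extrapolation, as the sketch below shows. That the authors' projection was linear is an assumption made here for illustration, and `extrapolate_linear` is a hypothetical helper.

```python
def extrapolate_linear(year0, bmi0, year1, bmi1, target_year):
    """Extend the straight line through two (year, BMI) points to target_year."""
    slope = (bmi1 - bmi0) / (year1 - year0)  # BMI units per year
    return bmi1 + slope * (target_year - year1)

# Figures quoted in the text for mean adult BMI in the USA.
projected_2050 = extrapolate_linear(1990, 25.45, 2020, 28.13, 2050)
```

The rise of 2.68 BMI units over 1990-2020, repeated over 2020-2050, gives the quoted 30.81.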

## Discussion

The eight Bradford Hill criteria used in this analysis of long term worldwide data all support that the BMI formula is causally related to adult BMI.

Short of extreme and, for most people, unrealistic increases in exercise to control weight, dietary change from current USA dietary patterns, or even from the diet recommended by the USA dietary guidelines, would be required to reach the normal range for BMI (BMI ≥ 18.5 and BMI < 25). For example, shifting 50% of kcal/day from BMI increasing foods to BMI decreasing foods equates to a BMI formula estimate of 23.26 BMI units over the long term. While following a low carbohydrate, high fat diet has been demonstrated to cause modest short-term weight loss in obese people,^{22} the BMI formula projects that the long-term effect (> 20 years) would be obesity (projected mean BMI=31.76).

The perhaps counterintuitive finding that fruits, vegetables, and nuts and seeds were positively correlated with BMI (r=0.655, 95% CI 0.642 to 0.677, *P* < 0.0001) may be largely explained by the multicollinearity between fruits, vegetables, and nuts and seeds and BMI increasing foods (r=0.323, 95% CI 0.304 to 0.343, *P* < 0.0001, Table 2). The high cost of fruits, vegetables, and nuts and seeds may account for much of this multicollinearity both worldwide and within wealthy countries that have high levels of economic inequality, like the USA. A systematic review of the literature on food cost relative to nutrient quality found that the median cost of starches (€0.14/100 kcal) was quite low relative to fruits and vegetables (€0.82/100 kcal), meat/eggs/fish (€0.64/100 kcal), fresh dairy (€0.32/100 kcal), and nuts (€0.25/100 kcal).^{23}

The findings in this study, particularly those in Table 5, should influence government food policy decisions. The Supplemental Nutrition Assistance Program (SNAP, formerly Food Stamps) spends an estimated 22.6% of its $73 billion/year budget^{26} on payments to low-income Americans for “sweetened beverages, prepared desserts, salty snacks, candy, and sugar.”^{27} Additionally, the US Department of Agriculture (USDA) subsidises crops that go primarily for animal feed or that are processed into sugars while not subsidising fruits and vegetables.^{24} While the USDA recognises the relatively low intake of fruits and vegetables in the USA and sponsors a publicity campaign to increase intake of fruits and vegetables,^{25} USDA expenditures should promote reduced prices of BMI decreasing foods and increased prices of BMI increasing foods.

The strong correlation of discontinuation of breastfeeding with BMI (r=0.802, 95% CI 0.794 to 0.810, *P*<0.0001) was in accord with a meta-analysis of breast feeding related to subsequent childhood and adult BMI by Jeanne Stolzer.^{26} However, the estimated benefit in reducing adult BMI of breastfeeding in this study was very modest (0.19 BMI unit, see Table 5). Breastfeeding for at least six months is recommended by the American Academy of Pediatrics^{27,28} and the National Institute of Clinical Excellence.^{29}

### Limitations

The GBD data on animal and plant foods were not comprehensive and comprised only 1199 kcal/day on average. Subnational data were available for only four countries. Because the data formatting and statistical methodology were new, this was necessarily a post hoc analysis and no pre-analysis protocol was possible. As detailed in the Foresight Report on obesity,^{30} obesity is affected by a complex system of interacting factors besides diet, physical activity, and breast feeding. Genes,^{31} the gut microbiome,^{32} ultra processing of food,^{33,34} and other influences on BMI were therefore outside the purview of this analysis.

## Conclusion

Eight Bradford Hill causality criteria strongly supported that the worldwide obesity epidemic is causally related to the 24 risk factors in proportion to their coefficients in the BMI formula. The findings in this study should be considered by health policymakers drafting dietary guidelines for healthy weight management. While this study deals only with dietary and other risk factors for BMI (overweight/obesity), the methodology introduced could readily be applied to estimating the percent weights of multiple dietary and other risk factors for dozens of non-communicable diseases for which the IHME has GBD data.

## Data Availability

The raw, unformatted data used in this analysis (2017 Global Burden of Disease data) are now out of date. The 2019 GBD data on all the variables in this analysis may be obtained from the IHME by volunteer collaborating researchers. Our data formatting software code in R and SAS and our formatted database are available on request to researchers who are collaborators with IHME.

## Authors’ contributions

DKC acts as guarantor; conceived and designed the study, acquired and analysed the data, interpreted the study findings, drafted the manuscript, critically reviewed and edited the manuscript and tables, and approved the final version for publication.

CW designed software programs in R to format and population weight the data, aided with the SAS statistical analysis, critically reviewed the manuscript, and approved the final version for publication.

The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

## Copyright

The authors retain the copyright to the paper.

## Patient and Public Involvement

When and how were patients/public first involved in the research?

IHME acquired, catalogued and extracted information from over 12,000 surveys from government and non government agencies in order to compile the GBD database.

How were the research question(s) developed and informed by their priorities, experience, and preferences?

The diverse surveys contributing to the GBD database had many different priorities and intentions unrelated to this post hoc analysis. The current analysis research questions were developed to give researchers, policymakers, and the public the methodological tool to quantify the BMI impacts of dietary and other risk factor patterns.

How were patients/public involved in the design and conduct of the study?

The way patients/public were involved in the collection of the surveys has not been systematically studied and reported on. Patients/public are the intended beneficiaries of this analysis of the GBD data.

How were patients/public involved in the choice of outcome measures?

The outcome measures chosen all related to human health and to adult BMI.

How were patients/public involved in recruitment to the study?

The recruitment methods varied by the survey.

How were (or will) patients/ public be involved in choosing the methods and agreeing plans for dissemination of the study results to participants and linked communities?

IHME and all the health surveyors that contributed will make these decisions. This analysis of the data will be available to all by open access.

## Competing interests statement

Both authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

## Contributors

Martin Sebera, from the Department of Kinesiology, Faculty of Sports Studies Masaryk University, Czech Republic, critiqued statistical aspects of the manuscript and provided useful input. Pavel Grasgruber, from Masaryk University, Czech Republic, provided suggestions after reviewing the manuscript.

## Transparency declaration

The manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

## Ethics

Studies based solely on data from IHME GBD database do not need approval from any bioethics committee.

## Funding

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. The Bill and Melinda Gates Foundation funded the acquisition of the data for this analysis by the IHME. The data were provided to the authors as volunteer collaborators with IHME.

## Details of the role of the study sponsors

While IHME GBD faculty and staff by virtue of Gates Foundation grants provided the raw data for this analysis, they did not vet the analysis or sponsor the manuscript.

## Statement of independence of researchers from funders

The researchers have received no funding. Gates Foundation funded IHME to collect and analyse the GBD data.

## Dissemination declaration

Dissemination of this manuscript to the participants of the more than 12,000 surveys individually is not possible, but the information will be in the public domain.

## Disclosures

We thank Scott Glenn and Brent Bell from IHME who supplied us with the GBD risk factor exposure data for the risk factors and for BMI data.

## Participant informed consents

Not applicable.

## Author access to data

As volunteer collaborators with the Institute for Health Metrics and Evaluation, we received about 1.4 gigabytes of raw data on BMI and 32 relevant risk factors for BMI and other health outcomes.

## Data sharing statement

The raw, unformatted data used in this analysis is now out of date. The 2019 GBD data on all the variables in this analysis may be obtained from the IHME by volunteer collaborating researchers. Our data formatting software code in R and SAS and our formatted database are available on request to researchers.

## Protocol, submitted as a supplementary file

Not applicable.

## STROBE checklist

Submitted.

## Patient consent

Not applicable.

## Clinical trial registration

Not applicable.

## Acknowledgments

## Appendix 1. Worldwide surveys contributing to the IHME GBD risk factor data.

Online only.

## Appendix 2. Bradford Hill causality criteria based assessment methodology detailed

A literature search revealed no published methodological precedents for statistically modeling the correlation between mean BMIs of worldwide countries and subnational regions/provinces/states and their corresponding dietary and other risk factors. Since the Bradford Hill causality criteria^{1} (enumerated by the English occupational physician and epidemiologist Sir Austin Bradford Hill) are the gold standard assessment tools to test causality of risk factors related to health outcomes, we explored the data looking for mean BMI to risk factor correlations that could be tested with the Bradford Hill criteria.

The relevant causality criteria included #1 strength, #2 consistency, #3 dose-response (biological gradient), #4 temporality, #5 analogy, #6 plausibility, #7 experimentation, and #8 coherence. The non-applicable criterion is specificity: since diet, physical activity, and other risk factors may affect many non-communicable disease health outcomes, specificity is not relevant. As Dr. Bradford Hill said, “In short, if specificity exists we may be able to draw conclusions without hesitation; if it is not apparent, we are not thereby necessarily left sitting irresolutely on the fence.”^{1}

In considering many candidate methodologies involving univariate and multiple regression analysis, we sought a technique that produced the worldwide BMI versus risk factors multiple regression formula which most accurately predicted the mean BMI of subgroups (e.g., continents, socio-economic index quartiles, etc.). Such a BMI formula with quantifiable high accuracy in predicting the mean BMI of subgroups would necessarily score high with the Bradford Hill criteria strength, consistency, and dose-response (biological gradient). Of the candidate statistical modeling strategies, the methodology that created the worldwide BMI formula that most accurately estimated subgroup mean BMIs had the following steps:

1. Transform all the food group risk factors (i.e., alcohol, total sugar, plant foods, animal foods, and fatty acids) from g/day to kilocalories/day (kcal/day).
2. Multiply each food group risk factor by its mean kcal/day consumption.
3. Multiply each risk factor by the R^{2} (coefficient of determination) of its univariate correlation with BMI.
4. Adjust for multicollinearity among the animal or plant risk factors in the combination variables (multicollinearity occurs when an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation).
5. After performing #2, #3, and #4 above, form combination dietary variables, with individual foods each expressed in kcal/day, to be added together as the data suggest is appropriate. For example: processed meat * processed meat kcal/day * processed meat R^{2} * processed meat kcal/day in bottom three SDI quartiles/top quartile + red meat * red meat R^{2} * red meat kcal/day * red meat kcal/day in bottom three SDI quartiles/kcal/day in the top quartile + … + eggs * eggs kcal/day, etc.
6. Explore methods to adjust for multicollinearities based in part on maximizing the closeness of fit of the multiple regression derived BMI formula outputs with the mean BMIs of 37 subgroups of the worldwide data.
7. Add physical activity before performing multiple regression.
8. Perform multiple regression analysis with the food combination variables in #5 and physical activity to generate a BMI formula (e.g., BMI formula = combination variable_{a} * parameter estimate_{a} + combination variable_{b} * parameter estimate_{b} … + physical activity * parameter estimate of physical activity).
9. Explore integrating the variables kcal available, child underweight, and discontinued breast feeding into the BMI formula.
10. Discard any of these variables whose sign in regression with BMI switches between the univariate and the multivariate analysis (e.g., + in the univariate and - in the multivariate formula).
11. To simplify the calculations, use an Excel spreadsheet, transferring data back and forth to SAS Studio.
12. To test the BMI formula with subgroups, match the BMI formula mean and standard deviation (SD) with those of the worldwide mean BMI:
   a. Adjust the SD of the BMI formula output to equal the SD of the worldwide BMI by multiplying the BMI formula output by the BMI SD / BMI formula SD.
   b. Adjust the BMI formula constant to equal the worldwide population’s mean BMI by adding the difference between the mean BMI and the BMI formula constant.
13. Explore the BMI formula functionality against the different Bradford Hill criteria, either with BMI and the BMI risk factors standardised or with all variables non-standardised.
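The harmonisation of the BMI formula output with the mean and SD of worldwide BMI (the two "Adjust" steps above) can be illustrated with a minimal sketch. It is not the authors' SAS or Excel code: the function name `harmonize`, the unweighted sample SD, and the reading of "formula constant" as the mean of the rescaled output are all assumptions made for illustration, and the numbers are invented.

```python
import numpy as np


def harmonize(formula_output, bmi):
    """Rescale formula output so its mean and SD match those of observed BMI.

    Assumptions (not from the source): unweighted data, sample SD (ddof=1).
    """
    formula_output = np.asarray(formula_output, dtype=float)
    bmi = np.asarray(bmi, dtype=float)
    # Match the spread: multiply by (BMI SD / formula SD).
    scaled = formula_output * (bmi.std(ddof=1) / formula_output.std(ddof=1))
    # Match the level: add the difference between the two means.
    return scaled + (bmi.mean() - scaled.mean())


# Hypothetical values for illustration only.
raw = np.array([10.0, 12.0, 15.0, 19.0])
observed = np.array([21.0, 23.5, 26.0, 29.5])
adjusted = harmonize(raw, observed)
```

After this adjustment the formula output and BMI share the same mean and SD, so subgroup means can be compared directly in BMI units.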

We estimated percent weights attributable to each risk factor with the standardised BMI formula from the above methodology with the following additions:

1. Equate the sum of the individual food and other risk factor parameter estimates to the total R^{2} of the BMI formula:
   a. Total the absolute values of the parameter estimates for each food and any non-food risk factors.
   b. Divide the worldwide BMI formula R^{2} by the sum of the risk factor parameter estimates from a.
   c. Multiply the result of b above by each of the individual risk factor parameter estimates.
2. Multiply the results in #1 above by 100 to generate percent weights for each risk factor. The absolute values of the risk factor percent weights will sum to the BMI formula R^{2} times 100.
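As a worked illustration of the percent weight calculation, the sketch below rescales a set of standardised parameter estimates so that their absolute values sum to R^{2} × 100. The function name `percent_weights`, the three risk factors, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def percent_weights(param_estimates, r_squared):
    """Rescale parameter estimates so their absolute values sum to R^2 * 100."""
    # Step a: total the absolute values of the parameter estimates.
    total_abs = sum(abs(b) for b in param_estimates.values())
    # Step b: divide the formula R^2 by that total.
    scale = r_squared / total_abs
    # Step c and final step: rescale each estimate and express it as a percent.
    return {name: b * scale * 100 for name, b in param_estimates.items()}


# Hypothetical standardised estimates and R^2, for illustration only.
estimates = {"red meat": 0.30, "rice": -0.20, "physical activity": -0.10}
weights = percent_weights(estimates, r_squared=0.60)
```

The signs are preserved, so BMI decreasing risk factors keep negative percent weights while the absolute values still total R^{2} × 100.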

To cross validate the generated worldwide BMI formula experimentally, create 20 random subgroups of 100 cohorts with the macro function in SAS. Determine each of the 20 subgroup BMI formulas, and, from those 20 BMI formulas, create a table with: mean, SD, minimum, and maximum for each of the risk factors.
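The cross validation step might look roughly like the following sketch. It uses NumPy's ordinary least squares routine rather than the SAS macro used in the study, draws each subgroup as an independent random sample without replacement (the source does not say how the subgroups were drawn), and runs on synthetic data, so the function name, the data, and the coefficients are all illustrative assumptions.

```python
import numpy as np


def subgroup_coefficient_summary(X, y, n_groups=20, group_size=100, seed=0):
    """Fit an OLS formula in random subgroups and summarise the coefficients."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_groups):
        # Assumption: each subgroup is an independent draw without replacement.
        idx = rng.choice(len(y), size=group_size, replace=False)
        design = np.column_stack([np.ones(group_size), X[idx]])
        beta, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        coefs.append(beta[1:])  # keep risk factor coefficients, drop the intercept
    coefs = np.array(coefs)
    return {
        "mean": coefs.mean(axis=0),
        "sd": coefs.std(axis=0, ddof=1),
        "min": coefs.min(axis=0),
        "max": coefs.max(axis=0),
    }


# Synthetic data for illustration only: three hypothetical standardised risk factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = 25.0 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(scale=0.1, size=2000)
summary = subgroup_coefficient_summary(X, y)
```

Each entry of `summary` holds one value per risk factor, which corresponds to the columns of the table described above.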

### Bradford Hill causal criteria testing methodology applied to current study

The relevant eight of the nine original Bradford Hill criteria were each scored as:

“5” very strongly supporting causality,

“4” strongly supporting causality,

“3” moderately strongly supporting causality,

“2” supporting causality,

“1” weakly supporting causality, and

“0” not supporting causality

The scoring for each Bradford Hill causal criterion was as follows:

Strength: The correlation coefficient, r, of the worldwide multiple regression derived BMI formula with BMI (dependent variable) and BMI risk factors (independent variables) assessed strength.

Scoring of strength:

5=BMI formula regressed with BMI r≥0.50 and *P*<0.0001

4=BMI formula regressed with BMI 0.50>r≥0.40 and *P*<0.0001

3=BMI formula regressed with BMI 0.40>r≥0.30 and *P*<0.0001

2=BMI formula regressed with BMI 0.30>r≥0.20 and *P*<0.0001

1=BMI formula regressed with BMI 0.20>r≥0.10 and *P*<0.0001

0=BMI formula regressed with BMI r<0.10 or *P*≥0.0001

Consistency: For the purposes of this study, consistency between BMI and BMI formula output was determined by comparing the mean BMI and the mean BMI formula output in each of the following 37 subgroups:

A variable socio-demographic index (SDI)—see Supplementary Table 1 for definition of SDI—divided the world’s population by quartiles of SDI.

A variable “continents” allowed for analyses of countries from each of the six inhabited continents.

The four countries (UK, USA, Mexico, and Japan) with subnational data on BMI and the risk factors were grouped and assessed to compare the BMI formula output with the overall mean BMI in those countries.

Based on the total kcal/day of all foods that increased BMI, a combination variable was constructed and the world’s population divided into quartiles from the highest to lowest total kcal/day.

Similarly, based on the total kcal/day of all foods that decreased BMI, we divided the world’s population into quartiles from the highest to lowest by total kcal/day.

Based on physical activity (METs/week), the world’s population was divided into quartiles.

We evaluated dose response by dividing the BMI formula output into quartiles from the highest to lowest.

The four countries with subnational data were individually evaluated.

The first four of the 20 random number generated database subgroups were included in the consistency analysis.

Male and female were separately assessed.

Dose-response was assessed in quartiles of the BMI formula output after the BMI formula had been harmonized with the mean and SD of worldwide mean BMI. See the dose-response criterion (#3).

For each of the 37 subgroups, the absolute differences between the mean BMI and the mean BMI formula output were totaled (e.g., continent Africa BMI formula output - mean BMI for Africa, etc.).

Scoring of consistency for BMI compared with BMI formula output for each of the 37 subgroups:

5=The mean of the absolute differences between mean BMI and BMI formula output was ≤ 0.40 units.

4=The mean of the absolute differences between mean BMI and BMI formula output was ≤ 0.50 units.

3=The mean of the absolute differences between mean BMI and BMI formula output was ≤ 0.60 units.

2=The mean of the absolute differences between mean BMI and BMI formula output was ≤ 0.70 units.

1=The mean of the absolute differences between mean BMI and BMI formula output was ≤ 1.0 units.

0=The mean of the absolute differences between mean BMI and BMI formula output was > 1.0 units.
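The consistency rubric above reduces to a threshold lookup on the mean absolute difference between subgroup mean BMI and the corresponding BMI formula output. The sketch below is one possible encoding; the function name and the example subgroup values are hypothetical.

```python
def score_from_mean_abs_diff(mean_bmi, formula_output):
    """Map the mean absolute difference across subgroups to a 0-5 score."""
    diffs = [abs(b - f) for b, f in zip(mean_bmi, formula_output)]
    mean_abs_diff = sum(diffs) / len(diffs)
    # Thresholds in BMI units, checked from the strictest to the most lenient.
    for score, limit in ((5, 0.40), (4, 0.50), (3, 0.60), (2, 0.70), (1, 1.0)):
        if mean_abs_diff <= limit:
            return score
    return 0


# Hypothetical subgroup means for illustration only.
example = score_from_mean_abs_diff([25.0, 27.0], [25.2, 26.7])
```

In the study this calculation would run over all 37 subgroups rather than the two invented ones shown here.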

Dose-response (biological gradient): Dr. Bradford Hill thought that a clear dose-response effect on the incidence of disease with exposure to a single risk factor was the clearest evidence of a causal relationship. More recently, it has been realized that cause and effect relationships are often more complex. In this analysis of dose-response, instead of using single risk factor levels related to BMI, levels of a multivariable regression derived BMI formula outputs in quartiles were related to mean BMIs in those quartiles. Dose-response of BMI formula estimates versus mean BMI were included in the testing of consistency above.

Scoring of dose-response (biologic gradient) was based on this mean absolute difference when the BMI formula output was divided into quartiles:

5= The BMI formula output versus mean BMI absolute differences from each of the four quartiles averages ≤ 0.40 BMI units.

4= The BMI formula output versus mean BMI absolute differences from each of the four quartiles averages ≤ 0.50 units.

3= The BMI formula output versus mean BMI absolute differences from each of the four quartiles averages ≤ 0.60 BMI units.

2= The BMI formula output versus mean BMI absolute differences from each of the four quartiles averages ≤ 0.70 BMI units.

1= The BMI formula output versus mean BMI absolute differences from each of the four quartiles averages ≤ 1.0 BMI units.

0= The BMI formula output versus mean BMI absolute differences from each of the four quartiles averages > 1.0 BMI units

Temporality: The dictionary defines the noun temporality as: “The state of existing within or having some relationship with time.” Bradford Hill said, “Temporality refers to the necessity that the cause to precede the effect in time. This criterion is unarguable, insofar as any claimed observation of causation must involve the putative cause C preceding the putative effect D. It does not, however, follow that a reverse time order is evidence against the hypothesis that C can cause D. Rather, observations in which C followed D merely shows that C could not have caused D in these instances; they provide no evidence for or against the hypothesis that C can cause D in those instances in which it precedes D.”

Dr. Bradford Hill worked as an occupational physician before nutritional epidemiology had access to data on 28-year trends in 18 components of worldwide diets along with the global BMI trend. Consequently, it is now fair to test temporality by deriving a standardised multiple regression formula with the BMI trend, measured by the slope of the least squares regression line (LSRL) over 1990-2017, as the dependent variable. The independent variables consist of the LSRL trends over 1990-2017 of the same dietary components, physical activity, and other variables as in the original BMI formula. We considered the strength (r) of the BMI trend formula versus the BMI trend an appropriate measure of temporality. Any risk factor was excluded from the trend BMI formula if its sign (+ or -) did not match the sign in the original BMI formula.

With a standardised multiple regression methodology similar to that of deriving risk factor percent weights above, a standardised BMI trend (dependent variable) versus standardised risk factor trends (independent variables) formula was derived with the risk factor parameter estimates adjusted to equate to trend percent weights.
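The trend inputs described above are slopes of least squares regression lines fitted to annual values over 1990-2017. A minimal sketch of that slope calculation for one cohort follows; it uses NumPy's `polyfit` rather than the study's SAS code, and the BMI series is invented.

```python
import numpy as np


def lsrl_slope(years, values):
    """Slope of the least squares regression line of values on years."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centering the years leaves the slope unchanged and keeps the fit well conditioned.
    slope, _intercept = np.polyfit(years - years.mean(), values, deg=1)
    return slope


years = np.arange(1990, 2018)               # 28 annual values, 1990-2017
bmi_series = 25.0 + 0.09 * (years - 1990)   # hypothetical cohort, +0.09 BMI units/year
bmi_trend = lsrl_slope(years, bmi_series)
```

Applying the same function to each risk factor series would give the independent trend variables for the BMI trend regression.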

Scoring of temporality: The Pearson correlation coefficient, r, of the resulting multiple regression derived BMI trend versus risk factors trends formula:

5=r≥0.50 and *P*<0.0001.

4=0.50>r≥0.40 and *P*<0.0001.

3=0.40>r≥0.30 and *P*<0.0001.

2=0.30>r≥0.20 and *P*<0.0001.

1=0.20>r≥0.10 and *P*<0.0001.

0=r<0.10 or *P*≥0.0001.

Analogy: High BMI is among the four metabolic risk factors that are strongly associated with cardiovascular diseases, cancers, and other non-communicable diseases. The other major metabolic risk factors for non-communicable diseases are high systolic blood pressure (SBP), high low density lipoprotein cholesterol (LDL-C), and high fasting plasma glucose (FPG).

We tested analogy by counting the diet, physical activity, and other variables in the percent weight BMI formula that also appeared in the percent weight risk factor formulas of the other metabolic risk factors, including only risk factors whose sign (+ or -) was concordant with the sign of their percent weight coefficient in the BMI formula.

Scoring of analogy:

5=At least three-quarters of BMI formula risk factors are also in the risk factor formulas of all three of the other metabolic factors (SBP, LDL-C, and FPG) and have concordant signs (+ or -) for each risk factor.

4= At least three-quarters of BMI formula risk factors are also in the risk factor formulas of two of the three other metabolic factors (SBP, LDL-C, and FPG) and have concordant signs (+ or -) for each risk factor.

3= At least three-quarters of BMI formula risk factors are also in the risk factor formulas of one of the three other metabolic factors (SBP, LDL-C, and FPG) and have concordant signs (+ or -) for each risk factor.

2= At least two-thirds of BMI formula risk factors are also in the risk factor formulas of one of the three other metabolic factors (SBP, LDL-C, and FPG) and have concordant signs (+ or -) for each risk factor.

1=At least half of BMI formula risk factors are also in the risk factor formulas of one of the three other metabolic factors (SBP, LDL-C, and FPG) and have concordant signs (+ or -) for each risk factor.

0=None of the above.
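One way to read the analogy rubric is as a count of how many of the three other metabolic risk factor formulas share at least a given fraction of the BMI formula risk factors with concordant signs. The sketch below encodes that reading; the function names, the encoding of signs as +1 and -1, and the example formulas are hypothetical and are not the study's results.

```python
def concordant_share(bmi_signs, other_signs):
    """Share of BMI formula risk factors found in another formula with the same sign."""
    matches = sum(1 for name, sign in bmi_signs.items()
                  if other_signs.get(name) == sign)
    return matches / len(bmi_signs)


def analogy_score(bmi_signs, other_formulas):
    """Apply the 0-5 analogy rubric to the three other metabolic risk factor formulas."""
    shares = [concordant_share(bmi_signs, f) for f in other_formulas.values()]
    reaching_three_quarters = sum(s >= 0.75 for s in shares)
    if reaching_three_quarters >= 3:
        return 5
    if reaching_three_quarters == 2:
        return 4
    if reaching_three_quarters == 1:
        return 3
    if max(shares) >= 2 / 3:
        return 2
    if max(shares) >= 0.5:
        return 1
    return 0


# Hypothetical signs (+1 increases, -1 decreases the outcome); not study results.
bmi = {"red meat": 1, "sugary beverages": 1, "vegetables": -1, "physical activity": -1}
others = {
    "SBP": {"red meat": 1, "sugary beverages": 1, "vegetables": -1, "physical activity": -1},
    "LDL-C": {"red meat": 1, "sugary beverages": 1, "vegetables": -1},
    "FPG": {"red meat": 1, "sugary beverages": -1, "vegetables": -1},
}
example_score = analogy_score(bmi, others)
```

In this invented example two of the three formulas reach the three-quarters threshold, so the rubric assigns a score of 4.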

Plausibility: To test plausibility, we checked whether any of our findings were at variance with the preponderance of published studies. We searched the medical literature, particularly for systematic reviews of the relationships of foods and other variables with BMI.

Scoring of plausibility:

5=None of the current findings were at variance with the preponderance of the medical literature.

4= One of the current findings was at variance with the preponderance of the medical literature.

3= Two of the current findings were at variance with the preponderance of the medical literature.

2=Three of the current findings were at variance with the preponderance of the medical literature.

1=Four of the current findings were at variance with the preponderance of the medical literature.

0= Five or more of the current findings were at variance with the preponderance of the medical literature.

Experiment: Dr. Bradford Hill thought that evidence drawn from experimentation, including in epidemiologic studies, may lead to the strongest support for causal inference.^{1} We used a cross validation method to assess Bradford Hill’s “experiment” criterion. Random number generation of 20 subgroups each with 100 cohorts derived 20 standardised BMI formulas to compare with the standardised worldwide BMI formula.

Scoring of experiment:

5=At least 15 subgroup BMI formulas included all of the same risk factors with the same signs of coefficients as the worldwide BMI formula.

4=At least 10 subgroup BMI formulas included all of the same risk factors with the same signs of coefficients as the worldwide BMI formula.

3=All 20 subgroup BMI formulas included at least three-quarters of the same risk factors with the same signs of coefficients as the worldwide BMI formula.

2=At least 10 subgroup BMI formulas included at least three-quarters of the same risk factors with the same signs of coefficients as the worldwide BMI formula.

1=At least 10 subgroup BMI formulas included at least half of the same risk factors with the same signs of coefficients as the worldwide BMI formula.

0=None of the above.

Coherence: According to Dr. Bradford Hill, “…cause and effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease.”^{1} In this analysis of BMI associated with BMI formula estimates, coherence was the numerical total score of the above seven relevant causality criteria, each on a 0-5 scale. The maximum score over all eight criteria was 40.

Scoring of coherence:

5=Score on the first seven Bradford Hill causation criteria=35-40.

4=Score on the first seven Bradford Hill causation criteria=30-34.

3=Score on the first seven Bradford Hill causation criteria=25-29.

2=Score on the first seven Bradford Hill causation criteria=20-24.

1=Score on the first seven Bradford Hill causation criteria=15-19.

0=Score on the first seven Bradford Hill causation criteria<15.
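The coherence rubric can be expressed as a lookup on the total of the first seven criterion scores. In the sketch below each band is read through its lower bound only, which is an interpretive assumption; the function name and example scores are hypothetical.

```python
def coherence_score(first_seven_scores):
    """Map the total of the first seven criterion scores (each 0-5) to a 0-5 score."""
    total = sum(first_seven_scores)
    # Lower bound of each band, checked from the highest band down.
    for score, lower_bound in ((5, 35), (4, 30), (3, 25), (2, 20), (1, 15)):
        if total >= lower_bound:
            return score
    return 0


# Hypothetical criterion scores for illustration only.
example_total_28 = coherence_score([4, 4, 4, 4, 4, 4, 4])
```

Seven criteria scored 0-5 can total at most 35, so under this reading a coherence score of 5 requires the maximum on every one of them.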

## References

- 1.