## Abstract

**Background** Goal 3.2 from the Sustainable Development Goals (SDG) calls for reductions in national averages of Under-5 Mortality. However, it is well known that within countries these reductions can coexist with left behind populations that have mortality rates higher than national averages. To measure inequality in under-5 mortality and to identify left behind populations, mortality rates are often disaggregated by socioeconomic status within countries. While socioeconomic disparities are important, this approach does not quantify within group variability since births from the same socioeconomic group may have different mortality risks. This is the case because mortality risk depends on several risk factors and their interactions and births from the same socioeconomic group may have different risk factor combinations. Therefore mortality risk can be highly variable within socioeconomic groups. We develop a comprehensive approach using information from multiple risk factors simultaneously to measure inequality in mortality and to identify left behind populations.

**Methods** We use Demographic and Health Surveys (DHS) data on 1,691,039 births from 182 different surveys from 67 low and middle income countries, 51 of which had at least two surveys. We estimate mortality risk for each child in the data using a Bayesian hierarchical logistic regression model. We include commonly used risk factors for monitoring inequality in early life mortality for the SDG as well as their interactions. We quantify variability in mortality risk within and between socioeconomic groups and describe the highest risk sub-populations.

**Findings** For all countries there is more variability in mortality within socioeconomic groups than between them. Within countries, socioeconomic membership usually explains less than 20% of the total variation in mortality risk. In contrast, country of birth explains 19% of the total variance in mortality risk. Targeting the 20% highest risk children based on our model better identifies under-5 deaths than targeting the 20% poorest. For all surveys, we report efficiency gains from 26% in Mali to 578% in Guyana. High risk births tend to be births from mothers who are in the lowest socioeconomic group, live in rural areas and/or have already experienced a prior death of a child.

**Interpretation** While important, differences in under-5 mortality across socioeconomic groups do not explain most of overall inequality in mortality risk because births from the same socioeconomic groups have different mortality risks. Similarly, policy makers can reach the highest risk children by targeting births based on several risk factors (socioeconomic status, residing in rural areas, having a previous death of a child and more) instead of using a single risk factor such as socioeconomic status. We suggest that researchers and policy makers monitor inequality in under-5 mortality using multiple risk factors simultaneously, quantifying inequality as a function of several risk factors to identify left behind populations in need of policy interventions and to help monitor progress toward the SDG.

## 1 Introduction

Goal 3.2 from the Sustainable Development Goals (SDG) requires reductions in under-5 mortality (http://www.un.org/sustainabledevelopment/health/). However, these reductions can co-exist with socioeconomic inequalities within countries where some groups have much higher mortality risk than others.^{1} Studies have suggested that some of the Millennium Development Goals, which preceded the SDG, have not been achieved within many countries because of high levels of inequality.^{2} Monitoring and reducing inequities in under-5 mortality requires the identification of births that are at highest risk of death such that policy interventions can target them.^{3} The United Nations (UN) General Assembly Resolution 68/261, which highlights the Sustainable Development Indicators as a central framework for making progress on reducing early-life mortality, recommends that health indicators should be disaggregated, where relevant, by income, sex, age, and other characteristics.^{4,5} Disaggregation of inequality by several demographic groups has a clear policy implication: leave no one behind.

The literature that monitors progress towards SDG often quantifies gaps in either key health outcomes, such as neonatal or under-5 mortality, or in the coverage of health services, such as prenatal care or sanitation. Researchers and policy makers monitor progress toward SDG by evaluating mortality rates broken down by stratifiers, including wealth quintiles, rural/urban residence, maternal education, maternal age, gender of the child and geographic location (see https://www.equidade.org/indicators).^{5} Even outside SDG monitoring, equity based strategies to reduce under-5 mortality usually measure gaps in average mortality rates between large groups of births, such as births from different socioeconomic groups within the same country.^{6-10} Studies have also documented significant under-5 mortality inequities across other demographic categories such as race, ethnicity, and geographic location.^{11-13}

Public health policies seeking to reduce inequality in early-life mortality often target births from an easily defined group with a high average mortality rate, usually the poorest.^{9,14-19} A recent meta-analysis shows that most targeted interventions aiming to improve maternal and child health address economic disparities through incentive schemes such as conditional cash transfers and vouchers.^{20} For example, Cash Transfer Programs (CTP), currently implemented in many low and middle income countries (LMIC), often improve infant and child health.^{21,22} In Burkina Faso, families enrolled in conditional cash transfer schemes were required to obtain quarterly child growth monitoring at local health clinics for all children under 60 months of age.^{23} In India, the randomized controlled trial (RCT) *Lentils for Vaccines* targeted the poor, as do most RCTs that aim to increase vaccine uptake, good nutrition, or child health more generally.^{24}

One important assumption underlying these approaches to measure inequality and target populations is that most of the variability in mortality risk exists between groups of births, not within them. If that is the case, (a) comparing average mortality rates between groups provides us with a complete picture of the inequality in mortality risk faced by children in the population and (b) targeting the group with the highest average mortality risk will reach most high risk births in the population and reduce overall inequalities. However, if the grouping factors used to monitor inequality have high levels of within-group variation in mortality risk, then monitoring inequality based solely on between group comparisons will miss most of the variability in mortality risk and monitors will not be able to identify important left behind populations that require intervention.^{7} Using data from India a recent study shows that most of the variation in mortality risk exists within groups, not between groups, and that program targeting based on poverty alone can be inefficient.^{25} This makes sense as it is well known that multiple risk factors are associated with under-5 mortality risk.

In this paper we develop a novel framework to monitor disparities in mortality risk and to identify high risk subpopulations that cannot be identified otherwise. Our approach uses data from several demographic variables and a Bayesian hierarchical model to estimate mortality risk for each birth in our data set. We use these estimates to investigate within and between group variability across several demographic stratifiers that are commonly used to monitor progress toward the SDGs and to make international comparisons of inequality in under-5 mortality. We identify children with the highest mortality risk in the population and show how to construct a targetable group that contains more deaths than other targetable groups of the same size that are based on only one risk factor, such as poverty. We identify the groups at highest risk in each country to gain insight into their needs. Our methodology supports UN recommendations to disaggregate health indicators by demographic stratifiers to guide inequality monitoring so that countries can meet SDG targets with equity. We offer a more comprehensive approach that considers the effects of multiple risk factors and their interactions on mortality risk.

## 2 Methods

Births are the units of our analysis. We first estimate mortality risk for each child in our data and then we use these estimates as inputs in our subsequent equity analysis.

### 2.1 Data Sources

The data used in this study comes from multiple Demographic and Health Surveys (DHS) (https://dhsprogram.com/). These are nationally representative surveys that have been conducted in more than 100 low and middle income countries since 1984.^{26,27} We analyze under-5 mortality and we exclude births that did not occur at least five years prior to the survey. We exclude all births that happen 10 years or more before the date of the survey to minimize measurement error and censoring issues. The final data set includes information on 1,691,039 births from a total of 182 different surveys from 67 countries, 51 of which had at least two surveys.

### 2.2 Estimating Mortality Risk

Mortality risk is a latent variable that must be estimated from data. Given our goal to improve inequality monitoring for the SDG, we base our estimation on predictors that are commonly used in studies that quantify progress toward the SDG (https://www.equidade.org/indicators): maternal age, wealth, gender, year of birth, place of residence (urban/rural), and maternal education in years.

The probability density functions (pdfs) of the original wealth index scores do not have a common range across countries. To make them more comparable across surveys, we transform the scores using their cumulative distribution functions (cdfs). This approach gives wealth scores from different countries and surveys a common range, the unit interval (0,1), and makes the results interpretable in terms of relative wealth, a proxy for socioeconomic status within countries. Details of the transformation are given in the appendix.
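As an illustration only, the following is a minimal sketch of one way such a transformation could be computed, assuming a rank-based empirical cdf applied separately within each survey. The function name `wealth_to_cdf` and the mid-rank convention for ties are our assumptions for this sketch; the transformation actually used is the one described in the appendix and may differ.

```python
from bisect import bisect_left, bisect_right


def wealth_to_cdf(scores):
    """Map raw wealth index scores from ONE survey into the interval (0, 1).

    Assumption: a mid-rank empirical cdf, so tied scores share one value
    and no score is mapped to exactly 0 or 1.
    """
    ordered = sorted(scores)
    n = len(ordered)
    transformed = []
    for s in scores:
        below = bisect_left(ordered, s)       # scores strictly below s
        at_or_below = bisect_right(ordered, s)  # scores less than or equal to s
        transformed.append((below + at_or_below) / (2 * n))
    return transformed


# Hypothetical raw scores for four households in one survey.
print(wealth_to_cdf([-1.2, 0.3, 0.3, 2.5]))
```

Because the transformation depends only on ranks within a survey, the result can be read as relative wealth in that survey regardless of the scale of the original index.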

We also include three other variables that are available in DHS surveys and could aid inequality monitoring and targeting. Geographical locations are well known risk factors for mortality, as mortality risk tends to be geographically clustered. Using sampling clusters from DHS in our model allows us to capture unmeasured variables at the local level that were not otherwise recorded in the data. Further, geographic locations can potentially be targeted by policy makers. Similarly, we construct a 0/1 indicator variable for whether a child was born to a mother who had already experienced the death of a previous child. Prior death summarizes a number of risk factors at the maternal level that are not measured by existing variables. It is a forward looking variable because it only uses information on prior births to inform risk for the current birth. In particular, information on future siblings' deaths is not used to predict past deaths, and the indicator is coded zero for a mother’s first birth. It is also an actionable risk factor because policy makers can potentially target births from those mothers, as they are identifiable. Finally, we include birth order, coded as a continuous variable.
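The coding rule for the prior death indicator can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the record keys (`birth_id`, `mother_id`, `birth_month`, `death_age_months`) are hypothetical names, and we assume that birth timing and age at death are available in months.

```python
from collections import defaultdict


def prior_death_indicator(births):
    """Return {birth_id: 0 or 1}, flagging births whose mother had already
    lost a child before that birth took place.

    `births` is a list of dicts with hypothetical keys: birth_id, mother_id,
    birth_month (integer month count) and death_age_months (None if the
    child was alive at interview).
    """
    by_mother = defaultdict(list)
    for b in births:
        by_mother[b["mother_id"]].append(b)

    flags = {}
    for siblings in by_mother.values():
        siblings.sort(key=lambda b: b["birth_month"])
        earlier_deaths = []  # calendar months of deaths among older siblings
        for b in siblings:
            # Only deaths that occurred before this birth are counted, so
            # later siblings never inform earlier births and first births get 0.
            flags[b["birth_id"]] = int(
                any(d < b["birth_month"] for d in earlier_deaths)
            )
            if b["death_age_months"] is not None:
                earlier_deaths.append(b["birth_month"] + b["death_age_months"])
    return flags
```

In this sketch a sibling who dies after the index child is born does not switch the indicator on, which matches the forward looking definition given above.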

We estimate child mortality for each birth in our data as a function of these predictors and their interactions in a Bayesian hierarchical logistic regression model. We fit one model to the data from each survey. To avoid model misspecification and allow for all important interactions among the risk factors, we include all two-way, three-way, and four-way interaction terms for all covariates in the model. We include piecewise linear splines to capture non-linear trends in mortality as a function of the continuous variables. To aid in the estimation and avoid overfitting, we place increasingly restrictive priors on the variance parameters of the random effects for the higher order interaction terms, which shrink effects toward zero. We incorporate a location random effect to model differences in risk between births from different locations.

### 2.3 Equity Analysis

We use box plots to display the within and between group variability in fitted mortality risk stratified by the DHS-assigned wealth quintile. We formally quantify how much of the variability in mortality risk is explained by the wealth quintiles using a Bayesian ANOVA, which allows us to get point and interval estimates of the R^{2}. Details of the ANOVA methods are given in the appendix.
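To make the idea of a point and interval estimate for R^{2} concrete, the sketch below applies a simple between-group over total sum of squares decomposition to each posterior draw of the mortality risks and then summarizes the resulting values. This is an assumption-laden illustration: the function names are ours, the quantile rule is deliberately crude, and the Bayesian ANOVA described in the appendix may be specified differently.

```python
from statistics import mean


def r_squared(risks, groups):
    """Share of the variance in `risks` explained by `groups`
    (between-group sum of squares divided by total sum of squares)."""
    grand = mean(risks)
    total = sum((r - grand) ** 2 for r in risks)
    by_group = {}
    for r, g in zip(risks, groups):
        by_group.setdefault(g, []).append(r)
    between = sum(len(v) * (mean(v) - grand) ** 2 for v in by_group.values())
    return between / total if total > 0 else 0.0


def posterior_r_squared(draws, groups, level=0.95):
    """`draws` holds one list of per-birth risks for each MCMC draw.
    Returns (posterior mean, lower bound, upper bound) of R^2, with the
    bounds taken as empirical quantiles of the per-draw values."""
    vals = sorted(r_squared(d, groups) for d in draws)
    k = len(vals)
    lower = vals[int((1 - level) / 2 * (k - 1))]
    upper = vals[int((1 + level) / 2 * (k - 1))]
    return mean(vals), lower, upper
```

Here `groups` would be the wealth quintile of each birth; passing country of birth instead gives the analogous between-country share.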

Finally, we investigate whether using multiple risk factors simultaneously can help to identify high risk births that should be targeted by policy interventions. Using the last survey from each country, we compare how many actual deaths occur among the 20% highest risk births from our model versus the 20% poorest births based on the wealth CDF variable. Under the assumption that intervention has the same cost for each birth, we calculate the efficiency gain in targeting the highest risk births versus the poorest births by dividing the difference in mortality rates between highest risk births and poorest births by the mortality rate among the poorest and multiplying by 100. We thus define the efficiency gain as

$$\text{Efficiency gain} = 100 \times \frac{\text{HRDeaths} - \text{PoorDeaths}}{\text{PoorDeaths}},$$

where “HRDeaths” is mortality among the 20% highest risk births and “PoorDeaths” is mortality among the poorest 20% of births. For each survey, we compare births in the high risk group to births not in the high risk group based on the following covariates: wealth, maternal education, maternal age, place of residence (urban/rural), and whether the birth was to a mother who has experienced a prior death of another child. We compare lower and higher mortality risk groups using odds ratios for categorical risk factors and mean differences for continuous risk factors.
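The efficiency gain described above (the difference in mortality between the two targeted groups, relative to mortality among the poorest, times 100) can be sketched as follows. The function names and the toy inputs are illustrative assumptions, and the sketch ignores uncertainty in the estimated risks.

```python
def mortality_in_top(scores, died, frac=0.20):
    """Observed mortality rate among the `frac` of births with the
    highest `scores`."""
    k = max(1, round(frac * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(died[i] for i in ranked[:k]) / k


def efficiency_gain(risk, wealth_cdf, died, frac=0.20):
    """Percentage gain from targeting the highest risk births rather than
    the poorest births. Undefined if no deaths occur among the poorest."""
    hr_deaths = mortality_in_top(risk, died, frac)
    # Negating wealth ranks the poorest births first.
    poor_deaths = mortality_in_top([-w for w in wealth_cdf], died, frac)
    return 100 * (hr_deaths - poor_deaths) / poor_deaths


# Toy data: ten births, two of which died.
risk = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
wealth = [0.05, 0.5, 0.1, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95]
died = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(efficiency_gain(risk, wealth, died))
```

In the toy data the two highest risk births both died, while only one of the two poorest did, so targeting by risk doubles the mortality rate captured and the gain is 100%.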

#### 2.3.1 Incorporating Uncertainty in the Equity Analysis

We use estimates of the posterior distribution of mortality risk for each child in our data to feed our equity analysis. We use 1000 Markov Chain Monte Carlo (MCMC) samples from our model to do so. For the box plots we use these samples to calculate the expected mortality risk for each child and then we plot these quantities. For the ANOVA and other tabulations, we calculate each quantity for every MCMC sample so that we have a distribution of these quantities that can be used to calculate posterior means and intervals. These distributions also allow us to implement significance tests.

### 2.4 Role of the funding source

We acknowledge financial support from the Eunice Kennedy Shriver National Institute Of Child Health & Human Development of the National Institutes of Health under Award Number K99HD088727 and CCPR’s Population Research Infrastructure Grant P2C from NICHD: P2C-HD041022. The sponsor of the study had no role in study design, data analysis, data collection, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study; all authors had final responsibility for the decision to submit for publication.

## 3 Results

### 3.1 Mortality by Wealth Quintile in the Raw Data

All results use individual births as the unit of the analysis. Summaries of the Demographic and Health Surveys (DHS) are presented in Table 1. Each row presents data for one survey. From left to right, the columns in Table 1 are the number of births in each survey (N); the under-5 mortality rate (U5MR), defined as the fraction of births that die before age five, both overall and for each wealth quintile; and the proportion of deaths that occurred among the top 80% in wealth, which we call the non-poor deaths (NPD) fraction. If there is perfect equity in mortality across socioeconomic groups, then the NPD would be exactly 80%. If the poorest 20% contain more than their share of deaths, then the NPD would be lower than 80%. Under-5 mortality rates are generally higher for the poorest wealth quintiles, reflecting a socioeconomic gradient in mortality. Some countries, such as Egypt, exhibit a consistent decrease in mortality with increasing wealth quintile. In a few countries, such as Burkina Faso (2003), mortality increases from the poorest to the second poorest quintile. The NPD is typically between 50% and 75%. These results show that there are high risk children in all socioeconomic groups.

### 3.2 Quantifying Within and Between Group Variability

Figure 1 presents box plots showing the distribution of mortality risk for the last survey of each country. Countries are ordered from the highest median mortality risk (Sierra Leone) to the lowest median mortality risk (Ukraine). As the median mortality risk gets smaller, variance decreases as well. There is considerable overlap in mortality risk across countries. This suggests that country of birth explains only a small fraction of mortality risk and that all countries have some children with very high mortality risk.

Figure 2 presents the distribution of mortality risk across countries stratified by wealth quintile. Only the most recent survey is shown, and countries are ranked from highest to lowest median mortality risk, from top left to bottom right. Outliers are not shown and all graphs are presented on the same scale. For all countries and surveys in our sample, there is considerable overlap in mortality risk across socioeconomic groups within countries and this is true irrespective of a country’s average mortality level. Among higher mortality countries, Sierra Leone and the Central African Republic have clear socioeconomic gradients in mortality risk. Among lower mortality countries, Bolivia, Brazil, Nigeria, and Cameroon have the largest socioeconomic gradients in mortality risk. High mortality countries like Niger and Lesotho exhibit no socioeconomic gradients in mortality, and this is also true for some lower mortality countries, such as Ukraine, Armenia and Jordan. Conclusions from Figure 2 are thus consistent with those from Table 1.

Table 2 presents results from our analysis. The first column gives the country and year in which the survey was taken, with the first row presenting the results across all surveys combined. Columns two through five show the mean, median, and standard deviation of the mortality risk distribution from our analysis, and the *R*^{2} of our ANOVA, which quantifies how much of the variance in mortality risk is explained by wealth quintile.

Globally, wealth quintile explains only about 3% of the variability in mortality risk. However, there is substantial country to country heterogeneity. The countries with the highest *R*^{2} values are India (23%), Nigeria (17%), Indonesia (14%), and Cameroon (14%). In contrast, Eswatini, Lesotho, Tanzania, Moldova, Sao Tome and Principe, Kyrgyzstan, Uzbekistan, Kenya, Ukraine, and Comoros all have *R*^{2} point estimates that are less than 1%. Further, there is no clear relationship between *R*^{2} and mean or median mortality risk. Using country of birth in the ANOVA gives a posterior mean *R*^{2} of 19%. Thus the ANOVA results confirm the findings from the box plots in Figures 1 and 2, which show that while there is substantial country to country heterogeneity, within a given country wealth does not explain much of the variability in mortality risk.

Mortality risk distributions have a long right tail and in Table 2 the mean mortality risk is always higher than the median. In every country, there are individuals that face much higher mortality risk than the national average.

### 3.3 Comparing Mortality among Highest Risk and Poorest Children

Poverty status alone is often used to decide which families will be targeted by health interventions. However, high within group variability for socioeconomic groups suggests that targeting based on a single demographic variable is inefficient because there are high risk births in all socioeconomic groups. We formally test this hypothesis for the last survey of each country by calculating the efficiency gain of targeting the 20% highest risk births relative to targeting the 20% poorest. Results are presented in Table 3. For all surveys and all countries, our approach is much more efficient in identifying high risk births than targeting the poor. Efficiency gains range from 26% in Mali (1996) to more than 550% in Guyana (2009). Efficiency gains are not strongly related to a country’s average mortality rates.

### 3.4 Who are the Highest Risk Children?

We define the high risk (low risk) births for a particular country and survey as those in the top 20% (bottom 80%) of all births in terms of mortality risk as estimated by our model. For each continuous variable, we calculate its mean among high risk and low risk births and the difference between the two; for each categorical variable, we calculate the odds ratio. Results are presented in Tables 1-7 in the appendix for the last survey in each country. Higher risk births have younger mothers on average compared to lower risk births, but the differences are not substantively important: mothers in the low risk group are usually less than a year older than mothers in the high risk group. High risk and low risk groups are also comparable in the gender of the child. For maternal education, there is often a statistically significant difference between high risk and low risk births, but the difference is not substantively important: mothers in the low risk group have on average less than a year of additional education. There is also often a statistically significant, but not substantive, difference in birth order.
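The two comparison measures can be sketched as below. This is a minimal illustration with hypothetical function names; it computes a single point value from one classification of births into high and low risk, whereas posterior intervals would require repeating the calculation for each MCMC sample.

```python
from statistics import mean


def odds_ratio(exposed, high_risk):
    """Odds of a 0/1 characteristic among high risk births divided by the
    odds among low risk births. Undefined if a cell of the 2x2 table
    in the denominator is empty."""
    pairs = list(zip(exposed, high_risk))
    a = sum(1 for e, h in pairs if e and h)          # exposed, high risk
    b = sum(1 for e, h in pairs if not e and h)      # unexposed, high risk
    c = sum(1 for e, h in pairs if e and not h)      # exposed, low risk
    d = sum(1 for e, h in pairs if not e and not h)  # unexposed, low risk
    return (a * d) / (b * c)


def mean_difference(values, high_risk):
    """Mean of a continuous variable among high risk births minus its
    mean among low risk births."""
    high = [v for v, h in zip(values, high_risk) if h]
    low = [v for v, h in zip(values, high_risk) if not h]
    return mean(high) - mean(low)
```

For example, with `exposed` coding whether the mother has experienced a prior death, an odds ratio well above 1 indicates that such births are concentrated in the high risk group.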

The most substantial differences between the higher and lower risk groups are for residency (urban/rural), wealth, and previous death of a sibling. High risk births are substantially poorer than the remaining 80% of the population. In Cambodia, high risk births average around the 32^{nd} percentile of wealth while low risk births average around the 53^{rd} percentile. We find similar results for other countries: Bolivia: 32% against 52%; Brazil: 31% against 53%; Peru: 30% against 53%; Nigeria: 32% against 53%.

High risk births are disproportionately born to mothers that have already experienced a prior death of another child. The odds ratio is 18.8 (13.1, 26.7) in Benin; 16.3 (10.9, 24.1) in Mali; and 15.4 (11.9, 19.9) in Nigeria. Even for relatively wealthier countries, the odds ratio for another death is high for mothers that have experienced a prior death. The only countries in which a prior death is not a significant risk factor for a subsequent birth are Moldova and Vietnam. Ukraine seems an exception, but the fractions of the births with a prior death are small, and this makes the odds ratio for Ukraine not very meaningful.

## 4 Discussion

In this study we have investigated inequality in under-5 mortality within and between socioeconomic groups for a large pool of LMIC. We have made three related contributions to the existing research. First, we show that for all 67 countries in our sample, most of the variability in mortality risk exists within socioeconomic groups, not between groups. Second, we show that within countries the average mortality risk, which is closely related to national averages of child mortality, is far from the typical (modal) mortality risk experienced by most births. Third, we show that poverty status alone, while important, is a poor proxy for being at higher risk of an early death than the general population. All these findings have important policy implications. In addition, we have developed new methods to analyse inequality in mortality risk which have broad applicability.

While quantifying inequality in under-5 mortality between socioeconomic groups is important, it misses a larger within-group inequality. In particular, we have shown that for most countries socioeconomic group explains less than 5% of the total variability in mortality. Even in countries where socioeconomic inequality matters the most, socioeconomic group explains little of the variation in U5MR. For example, socioeconomic status explains 11% of U5MR in Bolivia and 22% in India. This means that there is a large overlap in mortality risk among births from different socioeconomic groups and, as a consequence, there is a large number of high risk individuals outside the poorest group. In addition, being born in a particular country does not predict mortality risk very well, which means that between country comparisons also miss most of the variability in mortality risk.

In addition to being incomplete, between country comparisons are often done in terms of the average level of child mortality. However, we show that countries’ distributions of mortality risk are right skewed because some births experience substantially higher mortality risk than the national averages. These are left behind populations who go largely unnoticed when we only look at average mortality in socioeconomic groups. The typical (modal) mortality rate in each country is very different from the national average of child mortality. Thus between-country comparisons using national averages are not comparing typical mortality levels between countries.

Finally, most equity based policy strategies that target births are based on a single risk factor, usually poverty status. However, efficiency gains from targeting the 20% highest risk births versus the 20% poorest are substantively important for all countries for which we have data, ranging from 26% in Mali (1996) to more than 550% in Guyana (2009). The size of the gain in Guyana is likely due to the fact that it is one of the few countries with an apparent decrease in mortality risk with increasing wealth. Although the 20% highest risk births are usually the poorest and from rural areas, as might be expected, including other risk factors and their interactions considerably improves the identification of left behind individuals.

One previously overlooked characteristic is the importance of having experienced a prior death of a child.^{28,29} This is likely the case because this variable represents several unmeasured risk factors at the maternal level. However, it is an observable variable that can, and should, be used for policy targeting. We find that this is a particularly important characteristic for the Sub-Saharan African countries in our sample. For these countries, just targeting mothers who have already experienced the death of a child could be an effective way to reach high risk populations.

Taken together these results support the view that measuring national averages of under-5 mortality is insufficient to identify left behind groups.^{5,30-34} The concerns raised by United Nations General Assembly Resolution 68/261 are real and important, and we have shown that policy makers and international agencies should routinely implement disaggregation of inequality measures by several demographic variables simultaneously.^{4} However, our findings suggest that monitoring inequality between socioeconomic groups of births may not enable policy makers to accurately identify many left behind children. We recommend using nationally representative surveys or administrative data to estimate mortality risk at the individual level to identify left behind populations that can be the target of interventions. We also recommend our methods to properly quantify and monitor high risk populations.

Our findings should not be interpreted as recommending against targeting the poor. Poverty *alone* is not the best guide for equity based policies because other risk factors are also important. Poverty status needs to be combined with other available information to identify high risk births. This is important for both low and high mortality countries, because children in need are spread out across socioeconomic groups. Further, since high risk children tend to be poor and from rural areas, most interventions that work for the poorest children will probably work for the highest risk children. Thus we are not suggesting major changes in interventions targeting high risk populations. Instead, we are proposing a new methodology that combines information from multiple well known risk factors simultaneously to identify high risk births. Our approach considers interactions among risk factors that are readily available for LMIC via nationally representative health surveys, and frees researchers and policy makers from having to decide which risk factors capture most of the inequality in each country-year.

The methods developed in this paper have broader applicability and are flexible enough to be applied to a number of different scenarios. For example, some countries with good vital registration systems could use their administrative data instead of surveys. When people wish to implement an intervention in a particular country, our methodology points the way to a more targeted and impactful intervention. Implementers will need to choose variables, and they may choose different predictors than we have chosen, depending on the data available and on political and medical considerations. This is acceptable and something we consider a necessary part of implementing our methods in practice.

Our recommendations are also related to a large body of literature in medicine and public health that develops risk scores for individuals to identify those at risk of some event. These scores have been applied to a variety of outcomes and our results suggest the possible usefulness of such scores for identification of high risk children.^{35} Our approach requires representative surveys of the population, such as DHS or Multiple Indicator Cluster Surveys (MICS) so that we can rank children by mortality risk based on demographics. Policy makers could use mobile apps, which are now widely used for data collection, to collect and combine information on the children, calculate their risk, and then check whether their score is above or below a pre-determined threshold. We would not suggest a single risk score for the entire world. Rather, we would develop a score for each country, and we would update the score as new data became available.

The calculation of the efficiency gains assumes that interventions have the same cost for each birth. In reality, costs need to be adjusted according to local conditions. However, our approach provides a baseline to which any other allocation algorithm should be compared. Every competing allocation scheme also needs to accommodate costs, not just ours. For example, targeting the poor is likely easier in urban settings than in rural settings, and this would be a differential cost for the simple “intervene with the poor” intervention. It is possible to incorporate costs; one would multiply estimated probability of mortality times cost, then follow our same procedure to identify a combination of cheapest and most at risk births to intervene with, until the budget had been spent. Instead of identifying the 20% most at risk, one would tabulate costs until the allocated funds had been spent. Regardless of differential costs, combining information from multiple observable risk factors better identifies high risk populations. Having identified higher risk populations, public health officials can then work to bring down costs and best target at-risk births.
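As a hedged illustration of a budget-constrained allocation, the sketch below ranks births by estimated risk per unit of cost and selects them until the budget is exhausted. Note that this ranking rule is a common greedy heuristic that we substitute for illustration; it is not necessarily the exact weighting of risk and cost described above, and the function name, inputs, and cost figures are hypothetical.

```python
def target_within_budget(risk, cost, budget):
    """Greedy allocation sketch: visit births in decreasing order of
    estimated risk per unit cost and select each one that still fits in
    the remaining budget. Returns the selected indices and the amount spent."""
    order = sorted(range(len(risk)), key=lambda i: risk[i] / cost[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if spent + cost[i] > budget:
            continue  # this birth no longer fits; a cheaper one later might
        chosen.append(i)
        spent += cost[i]
    return chosen, spent


# Hypothetical risks and per-birth intervention costs for four births.
risk = [0.30, 0.20, 0.10, 0.05]
cost = [2.0, 1.0, 1.0, 2.0]
print(target_within_budget(risk, cost, budget=4.0))
```

With equal costs for every birth, this procedure reduces to selecting the highest risk births first, which is the equal-cost case used for the efficiency gains reported here.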

Our methodology has not explicitly included the complex sampling design of the DHS. We did this to create a more parsimonious set of methodological innovations. We treated DHS samples as simple random samples. However, we have included all variables used to stratify the surveys, which implicitly incorporates some of the sample design in our analysis. Future research should explicitly incorporate survey design.

In conclusion, our results show that despite progress toward reducing national averages of under-5 mortality, substantial inequality remains within groups of births defined by the stratifiers commonly used to measure progress toward the SDGs. Our results suggest that researchers and policy makers should quantify inequality in mortality risk within groups of births in addition to making between-group comparisons. Quantifying both between-group and within-group inequality gives a more accurate picture of inequality in under-5 mortality and helps to identify left behind populations that otherwise cannot be easily identified.

## Data Availability

Only secondary data sources were used. Data are available at https://dhsprogram.com/. Computer code is available upon request.

## Appendix

### A1: Transforming the Original Wealth Index

We use the original wealth information from DHS files to construct our own wealth scores. We made the original scores more comparable across surveys, while preserving the richness of their numerical variation.

The original wealth indices were constructed using Principal Components Analysis (PCA) on household asset information: ownership of radios, TVs, and other domestic equipment; whether the household has electricity and clean water; the type of materials used in the walls, floor and roof; and the type of toilet in the household.^{36} Scores are calculated at the household level, survey by survey. There are two original versions: a numeric version and a categorical wealth quintile version, which is based on the numeric version. Neither version is standardized across surveys, and the range of the numeric version varies from survey to survey.

Previous studies using these scores used the wealth quintile. Being in a particular quintile in one survey is not comparable with being in the same quintile in another survey, even within the same country. Quintiles can nevertheless be interpreted as the relative wealth or socioeconomic rank of the household within each survey. Thus being in the poorest quintile always means being among the 20% poorest in that survey, although poverty levels are not the same across surveys. These scores can be, and are, used comparatively within each survey as a socioeconomic gradient.

We constructed a numerical variable that has the same interpretation as the quintile, while preserving within-quintile variability in wealth. This is particularly useful to aid the estimation of mortality risk. Our solution to make the numerical scores comparable across surveys is to convert the original numerical scores to their cumulative distribution function (CDF) values, which lie in the unit interval (0,1). Although our new score is numerical, it has the same interpretation as quintiles in terms of relative socioeconomic rank within surveys. However, it no longer ignores within-quintile variability and thus provides a richer source of information.
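A minimal sketch of this transformation, assuming an unweighted empirical CDF computed separately within each survey (the paper does not state whether sampling weights or a smoothed CDF were used). The function names and the example scores are hypothetical.

```python
from bisect import bisect_right


def wealth_ranks(scores):
    """Number of observations in the same survey with a score <= each score."""
    ordered = sorted(scores)
    return [bisect_right(ordered, s) for s in scores]


def wealth_cdf(scores):
    """Empirical CDF value of each raw wealth score within ONE survey."""
    n = len(scores)
    return [rank / n for rank in wealth_ranks(scores)]


def wealth_quintile(scores):
    """Quintile (1 = poorest, 5 = richest) implied by the same ranks."""
    n = len(scores)
    # Integer ceiling of 5 * rank / n, avoiding floating-point boundaries
    return [-(-5 * rank // n) for rank in wealth_ranks(scores)]


# Hypothetical raw scores from a single survey
raw_scores = [-1.2, 0.3, 0.3, 2.5, 0.9]
cdf_values = wealth_cdf(raw_scores)
quintiles = wealth_quintile(raw_scores)
```

Because the ranks are computed within a survey, a CDF value of 0.2 marks the top of the poorest fifth in every survey regardless of the range of the raw scores, while two observations in the same quintile can still have different values.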

The original scores are calculated at the household level, not the birth level. However, we assign scores to births, as our analysis is at the birth level. Since mothers from the poorest households generally have higher fertility than mothers from richer households, quintiles of births and of households do not match perfectly. In particular, the lowest household quintile will always have more than 20% of all births and the richest household quintile will always have less than 20% of all births.

The unit of our analysis is the birth. We use our wealth quintiles, defined at the birth level rather than the household level, both in the estimation stage (the statistical model) and in our inequality analysis (tabulation, box plots and ANOVA).

### A 2: Model Notation and Formulation

Let $k = 1,\dots,182$ index surveys, $i = 1,\dots,N_k$ index births in survey $k$, and $j = 1,\dots,J$ index covariates, and let $m(i) \in \{1,\dots,M_k\}$ be the $i$th child's geographic location (sampling cluster) among the $M_k$ clusters in survey $k$. Let $y_{ik}$ be a binary indicator that the $i$th birth in survey $k$ results in death prior to five years of age: $y_{ik} = 1$ if so, and $y_{ik} = 0$ otherwise.

Let $\mathbf{X}_k$ be an $N_k \times L$ design matrix with rows containing the sex of the infant, residence (urban/rural), whether or not the mother already experienced the death of a previous child, the maternal age at birth, wealth CDF, birth order, birth year, mother’s education in years, and functions of these variables. The continuous covariates were included in the model using piecewise transformations. For maternal age, we use a piecewise linear spline with knots at 18, 23, and 35. For wealth CDF, we use a piecewise linear spline with knots at 0.25, 0.50, and 0.75. For maternal education, we include three terms: a binary indicator for maternal education greater than 13 years and two terms corresponding to a piecewise linear spline with a knot at 5 years for maternal education less than 13 years. For birth order, we include a binary indicator that birth order equals one, a binary indicator that birth order is six or more, and a linear term for birth order between two and six. Additionally, all two-, three-, and four-way interactions were included in the model, using untransformed values for the continuous variables instead of the splines.
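As an illustration of the piecewise linear splines, the sketch below builds spline terms for maternal age and wealth CDF using the knots given above. The truncated-line (hinge) basis is an assumption: the text does not say which parametrization of the spline was used. The function names and coefficients are hypothetical.

```python
AGE_KNOTS = [18.0, 23.0, 35.0]     # knots for maternal age, from the text
WEALTH_KNOTS = [0.25, 0.50, 0.75]  # knots for wealth CDF, from the text


def linear_spline_basis(x, knots):
    """Return [x, (x - k1)+, (x - k2)+, ...], one hinge term per knot.

    A linear combination of these terms is continuous and piecewise linear,
    with the slope allowed to change at each knot.
    """
    return [float(x)] + [max(0.0, x - k) for k in knots]


def spline_value(x, knots, coefs):
    """Evaluate the piecewise linear function for given (hypothetical) coefficients."""
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(x, knots)))


age_terms = linear_spline_basis(25.0, AGE_KNOTS)
wealth_terms = linear_spline_basis(0.10, WEALTH_KNOTS)
```

Each covariate with three knots contributes four columns to the design matrix under this parametrization; other bases for the same spline space would give the same fitted curve.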

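We model $y_{ik}$ using a random effects logistic regression,

$$\operatorname{logit} \Pr(y_{ik} = 1) = \alpha_{0k} + \mathbf{x}_{ik}^{\top} \boldsymbol{\alpha}_k + \gamma_{m(i)k},$$

where $\alpha_{0k}$ is an intercept, $\boldsymbol{\alpha}_k$ is a vector of regression coefficients, $\mathbf{x}_{ik}^{\top}$ is the $i$th row of $\mathbf{X}_k$, and $\gamma_{m(i)k}$ denotes the random effect for location $m(i)$ in the $k$th survey.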

#### Prior Specification

For all $k$ surveys, the variance parameters are given Inverse-Gamma(3, 1.5) priors, and the elements $\alpha_{jk}$ of $\boldsymbol{\alpha}_k$ are given normal priors that depend on $c_j$, the *order* of the interaction: $c_j = 1$ for the intercept and main effects, $c_j = 2$ for two-way interactions, $c_j = 3$ for three-way interactions, and $c_j = 4$ for four-way interactions. These priors shrink higher-order interaction terms closer to zero to avoid overfitting.

### A 3: Bayesian ANOVA

Let $\Pi$ be the distribution of mortality risk in a country and $\operatorname{Var}(\Pi)$ the variance of mortality risk. $\operatorname{Var}(\Pi)$ can be expressed as the between-group variance plus the average of the variances within each group. Let $\mathbf{X}$ be a categorical or continuous covariate. Using the law of total variance we have the decomposition

$$\operatorname{Var}(\Pi) = \operatorname{E}(\operatorname{Var}[\Pi \mid \mathbf{X}]) + \operatorname{Var}(\operatorname{E}[\Pi \mid \mathbf{X}]),$$

where for categorical variables $\operatorname{E}(\operatorname{Var}[\Pi \mid \mathbf{X}])$ is the average within-group variance and $\operatorname{Var}(\operatorname{E}[\Pi \mid \mathbf{X}])$ is the between-group variance of the group means. We fit linear regression models using OLS methods where mortality risk is the outcome and group membership is the predictor. We use $R^2$ to measure how much of the total variance in $\Pi$ can be explained by membership in a particular socioeconomic group.
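The decomposition can be checked numerically. The sketch below is a minimal illustration for a categorical grouping variable, using made-up risks and group labels. It relies on the fact that, for an OLS regression on group indicators, the fitted values are the group means, so the resulting R-squared equals the between-group share of the total variance. The function name is hypothetical.

```python
from collections import defaultdict


def variance_decomposition(risk, group):
    """Return (total, within, between, r_squared) for risks split by group."""
    n = len(risk)
    grand_mean = sum(risk) / n
    total = sum((r - grand_mean) ** 2 for r in risk) / n

    by_group = defaultdict(list)
    for r, g in zip(risk, group):
        by_group[g].append(r)

    within = between = 0.0
    for values in by_group.values():
        weight = len(values) / n  # share of births in this group
        mean = sum(values) / len(values)
        within += weight * sum((v - mean) ** 2 for v in values) / len(values)
        between += weight * (mean - grand_mean) ** 2
    return total, within, between, between / total


# Made-up mortality risks for four births in two groups
example_risk = [0.02, 0.04, 0.10, 0.12]
example_group = ["rich", "rich", "poor", "poor"]
total, within, between, r_squared = variance_decomposition(example_risk, example_group)
```

In this toy example almost all of the variance is between groups; the paper's finding is the opposite pattern, with most variance lying within socioeconomic groups.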

To propagate uncertainty from the estimation stage to the analysis of inequality stage we calculate an ANOVA for each MCMC sample giving a distribution of *R*^{2}. We use 1000 MCMC samples. We can also use this approach to make probabilistic statements, such as what is the probability that inequality is greater in one year than in another year.
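The propagation step can be sketched as follows, assuming each MCMC sample supplies one estimated risk per birth and that draws from two surveys can be compared index by index. The function names, the two-draw example, and the survey labels are hypothetical; the paper uses 1000 draws per survey.

```python
from collections import defaultdict


def r_squared(risk, group):
    """Between-group share of the variance of risk (equals OLS R-squared on group dummies)."""
    n = len(risk)
    grand_mean = sum(risk) / n
    total = sum((r - grand_mean) ** 2 for r in risk)
    by_group = defaultdict(list)
    for r, g in zip(risk, group):
        by_group[g].append(r)
    between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values()
    )
    return between / total


def r_squared_draws(risk_draws, group):
    """One R-squared per MCMC sample: a posterior distribution of R-squared."""
    return [r_squared(draw, group) for draw in risk_draws]


def prob_greater(draws_a, draws_b):
    """Share of paired draws in which the first R-squared exceeds the second."""
    return sum(a > b for a, b in zip(draws_a, draws_b)) / len(draws_a)


groups = ["poor", "poor", "rich", "rich"]
# Two made-up posterior draws of birth-level risk for each of two survey years
earlier = r_squared_draws([[0.10, 0.12, 0.02, 0.04], [0.10, 0.10, 0.04, 0.04]], groups)
later = r_squared_draws([[0.06, 0.10, 0.04, 0.08], [0.05, 0.09, 0.05, 0.09]], groups)
probability = prob_greater(earlier, later)
```

Summaries such as posterior means or credible intervals of R-squared can be taken directly from the list of per-draw values.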

### A 4: Additional Tables

## Footnotes

* Contact Author: tomramos{at}ucla.edu