Quantifying the impact of association of environmental mixture in a type 1 and type 2 error balanced framework

In environmental epidemiology, the analysis of environmental mixtures in association with health effects is gaining popularity. Such models mostly focus on inference about hypotheses or on summarizing the strength of association through regression coefficients and corresponding estimates of precision. Nonetheless, when a decision is made against the alternative hypothesis, it becomes difficult to tease apart whether the decision is influenced by sample size or represents a genuine absence of association, and whether the result warrants further investigation. Similarly, when a decision is made in favour of the alternative hypothesis, a significant association may indicate the influence of a large sample and not a strong effect. Moreover, disparate type 1 and type 2 errors might render these inferences unreliable. Using Cohen's $f^2$ to evaluate the strength of explanatory associations in a more fundamental way, we herein propose a new concept, optimal impact, to quantify the maximum explanatory association solely contributed by an environmental mixture after controlling for confounders and covariates, such that the type 2 error remains at its minimum. Optimal impact is built upon a novel hypothesis testing procedure in which the rejection region is determined so that type 1 and type 2 errors are balanced. Even when an association does not achieve statistical significance, its optimal impact might deem it meaningful and strong enough for further investigation. This idea is naturally extended to estimating sample size in study design by striking a balance between explanatory precision and utility. The properties of this framework are carefully studied and detailed results are established. A straightforward application of this procedure is illustrated using an exposure-mixture analysis of per- and polyfluoroalkyl substances and metals with serum cholesterols, using data from the 2017-2018 US National Health and Nutrition Examination Survey.


Introduction
There has been a welcome surge of interest in estimating the health effects of exposure mixtures in environmental epidemiology (Bobb et al. (2014), Carrico et al. (2015), Colicino et al. (2020), Keil et al. (2020), Wheeler et al. (2021), Ferrari and Dunson (2021)). These developments are promising, but most of these methods use traditional null hypothesis significance testing (NHST) for exposure mixture-outcome associations. NHST has been severely criticized as a contributor to the replication crisis in psychology and the biomedical sciences (Nakagawa and Cuthill (2007), Szucs and Ioannidis (2017)). NHST may contribute to selective reporting and subjectivity since it does not require us to designate what the data under the alternative hypothesis should predict. With large enough sample sizes, it guarantees that even irrelevant and tiny effect sizes are detectable (Ioannidis et al. (2014), Wasserstein and Lazar (2016)). Additionally, the strength of association in these models is determined through regression coefficients and precision estimates such as standard errors and p-values. As a direct consequence, when a decision is made against the alternative hypothesis, it becomes difficult to tease apart whether the decision is influenced by sample size or genuinely represents an absence of association. Similarly, when a decision is made in favour of the alternative hypothesis, a significant association might indicate the influence of a large sample and not a strong effect. In addition, the dependence on sample size and the disparate type 1 and type 2 errors further complicate the reliability of any inference.
To circumvent such issues, researchers are starting to report in-sample, scale-independent partial $R^2$ or $F^2$ type statistics along with regression estimates, to indicate the strength of explanatory association and quantify the effect of an environmental mixture on health outcomes. But simply reporting these statistics does not alleviate the problems of NHST or the imbalance between type 1 and type 2 errors. A long-established index for reporting the strength of explanatory association in a more fundamental way is Cohen's $f^2$ (Cohen (1988)), which evaluates the impact of additional variables in the context of multiple linear regression. Over the past three decades, Cohen's $f^2$ has been used extensively in the behavioral sciences, sociology and the biomedical sciences, owing to its practical utility and ease of interpretation.
In this paper, we propose optimal impact, which uses Cohen's $f^2$ to evaluate the strength of explanatory association in a more fundamental and scale-independent way, by quantifying the maximum explanatory association solely contributed by an environmental mixture on top of confounders and covariates such that the type 2 error remains at its minimum. Optimal impact is built upon a novel hypothesis testing procedure in which the rejection region is determined so that type 1 and type 2 errors are balanced and both diminish exponentially to 0 as $n \to \infty$, under a meaningful deviation from the null (and not just any deviation). This is similar to formulating a medical diagnostic test in which the threshold is adjusted to balance sensitivity and specificity (Zhou et al. (2011)). Utilizing the nuances of optimal impact, we also shed light on sample size estimation for designing time- and cost-effective studies from the perspective of explanatory power.
In subsections 2.1 and 2.2, we discuss Cohen's $f^2$ in linear and generalized linear models. Next, in subsection 2.3, we develop the framework of type 1 and type 2 calibrated hypothesis testing. In subsection 2.4, we discuss theoretical implications of this hypothesis testing framework, and in subsection 2.5 we develop the concept of optimal impact. In section 3 we present a simulated example for illustration. In section 4, we estimate the optimal impact of per- and polyfluoroalkyl substances and metals on serum cholesterols based on data from the 2017-2018 US National Health and Nutrition Examination Survey (NHANES). Finally, we end this paper with a discussion.
2 Methods
Consider the model $y = X_0 b_0 + X_1 b_1 + \varepsilon$. We are interested in the impact of the association of $X_1$ after adjusting for $X_0$, and formulate the hypothesis
$$H_0: \text{Additional impact} = 0 \quad \text{vs.} \quad H_1: \text{Additional impact} = \delta, \tag{1}$$
where $\delta$ is a pre-defined meaningful quantity and $\delta > 0$. For example, Selya et al. (2012) report that, after controlling for gender and smoking quantity, the additional impact of the association between the outcome (nicotine dependence) and the exposure (smoking frequency) is 0.32.
Let $S(y)$ be a test statistic based on observed data $y$, and let $T$ be a type 1 and type 2 error calibrated cutoff which depends on the sample size $n$, the parameter $p_1$ and the effect size $\delta$. Then one can define a testing procedure by its type 1 and type 2 errors as below:
$$\text{type 1 error} = P(S(y) > T \mid \text{Additional impact} = 0), \qquad \text{type 2 error} = P(S(y) < T \mid \text{Additional impact} = \delta). \tag{2}$$

2.1 Cohen's $f^2$ in Linear Regression
Consider the standard multiple linear regression model with error $\epsilon \sim N(0, \sigma^2 I_n)$, where $I_n$ is the identity matrix of dimension $n \times n$. Let $\hat{b}_{0,H_0}$ be the maximum likelihood estimate (MLE) for the model with only the design matrix $X_0$, and let $\hat{b}_{0,H_1}$ and $\hat{b}_{1,H_1}$ be the MLEs for the model with design matrices $X_0$ and $X_1$. The standard test to compare the null and the alternative is through the F statistic,
$$F(y) = \frac{(SSR_0 - SSR_1)/p_1}{SSR_1/(n - p_0 - p_1)},$$
where $SSR_0 = (y - X_0\hat{b}_{0,H_0})^t (y - X_0\hat{b}_{0,H_0})$ is the sum of squared errors under $H_0$ and $SSR_1 = (y - X_0\hat{b}_{0,H_1} - X_1\hat{b}_{1,H_1})^t (y - X_0\hat{b}_{0,H_1} - X_1\hat{b}_{1,H_1})$ is the sum of squared errors under $H_1$. Then $F(y) \sim F_{p_1,\, n-p_0-p_1}(\gamma_n)$, where $p_1$ and $n - p_0 - p_1$ are the degrees of freedom and $\gamma_n$ is the non-centrality parameter. As $n \to \infty$ while $p_0, p_1$ remain fixed, this F distribution can be approximated by a chi-squared distribution, $\lim_{n \to \infty} p_1 F(y) \sim \chi^2_{p_1}(\gamma_n)$. The non-centrality parameter $\gamma_n$ equals 0 when $y$ is generated under $H_0$. When $y$ is generated under the alternative, $\gamma_n$ has the form
$$\gamma_n = \frac{b_1^t X_1^t (I_n - P_0) X_1 b_1}{\sigma^2},$$
where $P_0 = X_0 (X_0^t X_0)^{-1} X_0^t$ is the projection matrix onto the linear space spanned by the column vectors of $X_0$ (Wilks (1938), Brown et al. (1999)) (Section S.1 of the supplementary material). $\gamma_n$ quantifies the additional impact in $y$ due to $X_1$ relative to the error variance $\sigma^2$. For the common regression design in which the predictor vector of each subject is drawn from a common population, $\gamma_n$ grows linearly in $n$.
medRxiv preprint, doi: https://doi.org/10.1101/2022.03.02.22271732; this version posted March 4, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Note that $\gamma_n$ does not depend on $y$ but depends on the design matrix $X$ and the underlying parameters $b_1$ and $\sigma^2$. A long-established index for quantifying additional impact in linear regression is Cohen's $f^2$,
$$f^2 = \frac{R^2_{y,X_0,X_1} - R^2_{y,X_0}}{1 - R^2_{y,X_0,X_1}},$$
where $R^2_{y,X_0,X_1}$ and $R^2_{y,X_0}$ are the squared multiple correlations for $X_0, X_1$ under $H_1$ and for $X_0$ under $H_0$ respectively. The $f^2$ quantifies the proportion of impact in $y$ accounted for by $X_1$ on top of the impact accounted for by $X_0$, a concept that most researchers can relate to intuitively (Selya et al. (2012)). We then establish the following Lemma to connect Cohen's $f^2$ and the non-centrality parameter $\gamma_n$ in linear regression.
The proof is presented in Section S.2 of the supplementary material. We can borrow the common convention for $f^2$ (Cohen (1988)) and regard $f^2 \geq 0.02$, $f^2 \geq 0.15$ and $f^2 \geq 0.35$ as representing small, moderate and large effect sizes respectively. This can serve as guidance in understanding the effect size obtained from the data.
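As a small illustration of these conventions, the sketch below computes Cohen's $f^2$ from two $R^2$ values and attaches the conventional labels. The function names and the "negligible" label for values below 0.02 are assumptions made for this example and do not come from the paper. The last helper relies on the identity $\hat{f}^2 = (SSR_0 - SSR_1)/SSR_1$, which follows from $R^2 = 1 - SSR/SST$, so that $p_1 F(y) = (n - p_0 - p_1)\hat{f}^2$.

```python
def cohens_f2(r2_full, r2_reduced):
    """Cohen's f^2 for the predictors added on top of the reduced model."""
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def effect_size_label(f2):
    """Cohen's (1988) thresholds; 'negligible' is a label assumed here."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "moderate"
    if f2 >= 0.02:
        return "small"
    return "negligible"


def scaled_f_statistic(ssr0, ssr1, n, p0, p1):
    """p1 * F(y), written through the sample f^2 = (SSR0 - SSR1) / SSR1."""
    f2_hat = (ssr0 - ssr1) / ssr1
    return (n - p0 - p1) * f2_hat


f2 = cohens_f2(r2_full=0.30, r2_reduced=0.20)
print(round(f2, 4), effect_size_label(f2))
```

The same sample $f^2$ therefore drives both the descriptive effect size and the test statistic $p_1 F(y)$; only the multiplier $n - p_0 - p_1$ changes with sample size.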

2.2 Cohen's $f^2$ in Generalized Linear Models
µ is related to the canonical parameter $\theta$ through the function $\mu = b'(\theta)$, where $b'$ denotes the first derivative of $b$. The model is then completed by $\eta = g(\mu) = X_0 b_0 + X_1 b_1$. Here the standard test is the likelihood ratio test, with statistic $\Lambda(y) = 2\{\ell(\hat{b}_{1,H_1}, \hat{b}_{0,H_1} \mid X_0, X_1) - \ell(\hat{b}_{0,H_0} \mid X_0)\}$. As the sample size $n \to \infty$, the likelihood ratio statistic $\Lambda(y)$ follows a central chi-squared distribution $\chi^2_{p_1}$ with $p_1$ degrees of freedom when $y$ is generated under the model in $H_0$, and a non-central chi-squared distribution $\chi^2_{p_1}(\gamma_n)$ with $p_1$ degrees of freedom and non-centrality parameter $\gamma_n$ when $y$ is generated under $H_1$. However, there is no simple and explicit form for $\gamma_n$. Self et al. (1992) and Shieh (2000) defined $\gamma_n$ in the likelihood ratio test for generalized linear models as $\gamma_n := E_{y \sim H_1}\{\Lambda(y)\}$. Similar to the non-centrality parameter in linear regression, $\gamma_n$ grows linearly with $n$, as follows from the expansion of $E_{y \sim H_1}\{\Lambda(y)\}$ up to an $o(1)$ term, in which $\theta$ and $\theta^*$ denote the canonical parameter values evaluated at $(b_0, b_1)$ and $(b_0^*, b_1 = 0)$, and $b_0^*$ is the limiting value of $\hat{b}_{0,H_0}$ as described in equation (2.2) of Self and Mauritsen (1988) (see Cordeiro (1983) and Section 2 of Shieh (2000) for a detailed derivation).
Consider the adjusted coefficient of determination for generalized linear models shown below. Note that the definition of squared multiple correlation is generally accepted for linear regression, but it does not apply directly to generalized linear models (for example, logistic regression), and the literature has not reached a general consensus (see Liao and McGee (2003) for more details):
$$R^2_l = 1 - \frac{\ell(y \mid \text{any predictor } X)}{\ell(y \mid \text{intercept-only model})}.$$
Then Cohen's $f^2$ in generalized linear models can be written as
$$f^2 = \frac{R^2_{l,H_1} - R^2_{l,H_0}}{1 - R^2_{l,H_1}},$$
where $R^2_{l,H_0}$ and $R^2_{l,H_1}$ are the adjusted coefficients of determination under the null and the alternative respectively. Similar to Lemma 1, we connect $f^2$ and $\gamma_n$ in generalized linear models in Lemma 2.
For simplicity, and to keep the similarity with Lemma 1, the $O(n)$ term is not expanded further. The proof and more details are presented in Section S.3 of the supplementary material. Lemmas 1 and 2 both connect the non-centrality parameter to Cohen's $f^2$, a connection that is used in the following sections.
Let the test statistic be $S(y) = p_1 F(y)$ for a linear regression and $S(y) = \Lambda(y)$ for other generalized linear models, as in Sections 2.1 and 2.2. For a given cutoff $T$, the type 1 error and type 2 error are given by
$$\alpha(T) = P(\chi^2_{p_1} > T), \qquad \beta(T) = P(\chi^2_{p_1}(\gamma_n) < T),$$
where $\chi^2_{p_1}$ denotes a central chi-squared random variable with $p_1$ degrees of freedom and $\chi^2_{p_1}(\gamma_n)$ denotes a non-central chi-squared random variable with $p_1$ degrees of freedom and non-centrality parameter $\gamma_n$.
Our central idea is to choose $T$ so that the type 1 error $\alpha$ and the type 2 error $\beta$ satisfy the relationship $\alpha(T) = \beta(T)$. Using the chi-square approximation to the test statistic $S(y)$, we can solve for the calibrated cutoff $T$ from the equation
$$P(\chi^2_{p_1} > T) = P(\chi^2_{p_1}(n\delta) < T). \tag{5}$$
When $T$ is fixed, the left side of equation (5) remains constant as $n \to \infty$ while the right side diminishes to 0 rapidly under the non-centrality parameter $n\delta$. Therefore, equation (5) implies $T \to \infty$ as $n \to \infty$. In the Theorem stated below we elaborate more on $T$. The results in Theorem 1 depend on the normal approximation of the non-central chi-square distribution, i.e. for large $n$, equation (5) was rewritten using this approximation.
Theorem 1. Assume data $y$ is generated under the alternative with Cohen's $f^2 = \delta$. Then, following the constraint $\alpha = \beta$ as in (5) and for large $n$, the error calibrated cutoff $T$ admits an explicit expression, and the type 1 error ($\alpha$) or type 2 error ($\beta$) rates can be expressed correspondingly, where $c_1$ is a constant of integration.
The proof is presented in Section S.4 of the supplementary material. Theorem 1 sheds light on the structure of the cutoff $T$ and the rates of the corresponding type 1 and type 2 errors when the sample size $n$ is large. Since both errors go to 0 as $n \to \infty$, this testing procedure is consistent while keeping the error rates equal. Both errors decay at an exponential rate, so the procedure is useful even at moderate sample sizes. To check the accuracy of Theorem 1, we present the type 1 and type 2 error rates as well as the rate of change of $T$ with respect to $n$, using the results from Theorem 1 and the corresponding numerical results from equation (6). As seen from Table 1, irrespective of the Cohen's $f^2$, as $n$ increases, the rate of change of $T$, log(type 1) and log(type 2) converge to the corresponding theoretical rates specified in Theorem 1. Further, the error rates depend only on $f^2$ and not on $p_1$ (the number of exposures on top of the baseline covariates). In addition, through Monte Carlo simulation we show that the calibrated type 1 and type 2 errors remain approximately the same under $T$ in a linear regression framework (Section S.5 and Table S.1 of the supplementary material).
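To make the calibration in equation (5) concrete, the following sketch solves $P(\chi^2_{p_1} > T) = P(\chi^2_{p_1}(n\delta) < T)$ numerically by bisection. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the chi-square distribution functions are computed from series expansions with only the Python standard library, and the upper tail is obtained as one minus the CDF, so the sketch is only meant for moderate $n\delta$ where the balanced error is well above machine precision.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the power series of the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def ncx2_cdf(x, df, nc):
    """Non-central chi-square CDF as a Poisson(nc/2) mixture of central CDFs."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    half = nc / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 30.0)  # truncate the Poisson tail
    total = 0.0
    for j in range(j_max + 1):
        log_weight = j * math.log(half) - half - math.lgamma(j + 1.0)
        total += math.exp(log_weight) * chi2_cdf(x, df + 2 * j)
    return total


def calibrated_cutoff(n, delta, p1, tol=1e-10):
    """Bisection for T such that alpha(T) = beta(T) under f^2 = delta."""
    nc = n * delta
    # alpha - beta equals +1 at T = 0 and is negative at the non-central mean.
    lo, hi = 0.0, p1 + nc
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gap = (1.0 - chi2_cdf(mid, p1)) - ncx2_cdf(mid, p1, nc)
        if gap > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (250, 1000):
    T = calibrated_cutoff(n, delta=0.10, p1=5)
    alpha = 1.0 - chi2_cdf(T, 5)
    beta = ncx2_cdf(T, 5, n * 0.10)
    print(f"n={n}: T={T:.3f}, alpha={alpha:.3e}, beta={beta:.3e}")
```

Running the loop shows the two error rates agreeing at the solved cutoff and the cutoff growing with $n$, in line with the observation that $T \to \infty$ as $n \to \infty$.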

Type 2 error function
Theorem 1 suffices when the data is generated under Cohen's $f^2 = \delta$ and the alternative is set at $H_1: f^2 = \delta$. But what happens when the true $f^2$, under which the data is generated, is not $\delta$? To highlight and investigate these subtleties, let us assume that the true Cohen's $f^2 = \epsilon$, under which the data is generated.

Case 1: when
Corollary 1 characterizes the type 2 error function $\beta(\epsilon)$ under the hypothesis (1) as $n \to \infty$, with $\epsilon^*$ the root of $\beta(\epsilon) = \frac{1}{2}$. First note that $\beta(\epsilon)$ is monotonically decreasing in $\epsilon$ (see Section S.6 of the supplementary material). Therefore the type 2 error is even smaller when the true Cohen's $f^2$ is larger than $\delta$, and asymptotically $\beta(\epsilon) \to 0$ (see supplementary material). Now we investigate what happens to the type 2 error when $y$ is generated with $\epsilon$ lying in $(0, \delta)$.

Neutral effect size, null and alternative neighborhood.
When the true Cohen's $f^2$, $\epsilon$, lies in $[0, \epsilon^*)$, $T$ prefers $H_0$ with probability greater than $\frac{1}{2}$. Therefore, we can think of the interval $[0, \epsilon^*)$ as an expanded null hypothesis, which connects nicely to the interval null hypothesis discussed in the literature (Morey and Rouder (2011), Kruschke (2013), Liao et al. (2020), Midya and Liao (2021)). We name $\epsilon^*$ the "neutral effect size". Note that $\epsilon^*$ decreases as $n$ increases, although very slowly due to a term of order $O\left(1/n^{1 - \frac{1}{2K}}\right)$, and eventually converges to $\delta/(2K-1)$. The interpretation of the neutral effect size is that, as long as the true Cohen's $f^2$, denoted by $\epsilon$, lies below $\epsilon^*$, the probability of rejecting $H_0$ is smaller than $\frac{1}{2}$ even when the true effect size $\epsilon$ is far from zero. If instead the true effect size $\epsilon$ lies above $\epsilon^*$, the probability of rejecting the null is greater than $\frac{1}{2}$. Similarly, an interval of the form $\{x \mid x > \epsilon^*\}$ can be conceived such that, for any $\epsilon > \epsilon^*$, the probability of rejecting the null remains greater than $\frac{1}{2}$. This interval is named the "alternative neighborhood". To demonstrate these concepts, the type 2 error function $\beta(\epsilon)$ is plotted for $n = 250$ and $n = 1000$ at $p_1 = 5$ and $\delta = 10\%$ in Figure 1, with $\epsilon$ from 0 to 10% on the x-axis. $\beta(\epsilon)$ decreases smoothly as $\epsilon$ increases from 0. The neutral effect sizes $\epsilon^*$ are 2.6% and 2.2% for $n = 250$ and $n = 1000$ respectively. The shaded red region denotes the "alternative neighborhood", with type 2 error below $\frac{1}{2}$, whereas the shaded blue region denotes the "null neighborhood", with type 2 error above $\frac{1}{2}$. As long as the true effect size ($\epsilon$) of the underlying data is greater than the neutral effect size, the error calibrated cutoff $T$ will favour the alternative, whereas the null will be favoured only when the true effect size is less than the neutral effect size. Note that the neutral effect size $\epsilon^*$ decreases from 2.6% to 2.2% as the sample size $n$ increases, but the rate of decrease is very slow. This plot therefore reinforces the idea that, no matter how large the sample size is, this error calibrated cutoff will reject the null in favour of the alternative only if the true effect size originates from the neighbourhood $\{f^2 \mid f^2 > \epsilon^*\}$.
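The type 2 error function and the neutral effect size can be explored numerically with a short sketch like the one below. It assumes that the cutoff $T$ is supplied by the reader (for instance from solving equation (5)); the value 13.0 used in the example is purely illustrative, the helper names are hypothetical, and the chi-square distribution functions are re-derived from series expansions so that the block runs with the standard library alone. Its output is not meant to reproduce the values read from Figure 1.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF (series for the regularized incomplete gamma)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def ncx2_cdf(x, df, nc):
    """Non-central chi-square CDF as a Poisson(nc/2) mixture of central CDFs."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    half = nc / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 30.0)
    total = 0.0
    for j in range(j_max + 1):
        log_weight = j * math.log(half) - half - math.lgamma(j + 1.0)
        total += math.exp(log_weight) * chi2_cdf(x, df + 2 * j)
    return total


def type2_error(eps, T, n, p1):
    """beta(eps) = P(chi2_p1(n * eps) < T) when the true f^2 equals eps."""
    return ncx2_cdf(T, p1, n * eps)


def neutral_effect_size(T, n, p1, tol=1e-12):
    """Bisection for eps* with beta(eps*) = 1/2 (beta decreases in eps)."""
    if chi2_cdf(T, p1) <= 0.5:
        return 0.0  # beta is already at or below 1/2 at eps = 0
    # At n * eps = T the non-central median exceeds T, so beta < 1/2 there.
    lo, hi = 0.0, T / n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if type2_error(mid, T, n, p1) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


T_example = 13.0  # illustrative cutoff, not a value taken from the paper
eps_star = neutral_effect_size(T_example, n=250, p1=5)
print(f"neutral effect size: {eps_star:.4f}")
for eps in (0.0, 0.02, 0.05, 0.10):
    print(eps, round(type2_error(eps, T_example, 250, 5), 4))
```

Effect sizes below the printed `eps_star` fall in the null neighborhood (type 2 error above 1/2), and those above it fall in the alternative neighborhood.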
2.5 Putting estimation in the type 1 and type 2 error balanced hypothesis testing framework
2.5.1 Notion of optimal impact
What should be a prudent choice of δ ? The aim should be to choose larger δ with minimum type 2 error.
Since for any data, f 2 ≥ 0, choosing a slightly smaller deviation from zero, minimizes the type 2 error but rapidly increases the type 1 error.
For any given $\delta$, we reject the null if and only if $\Lambda(y) \geq T(\delta)$. Given the simplicity of this decision rule, one can choose the maximum value of $\delta$ such that the alternative $H_1: f^2 = \delta$ will always be preferred over the null; once that maximum value of $f^2$ is crossed, the null can no longer be rejected. Denote this particular choice of Cohen's $f^2$ by $\delta^*$. Since $T(\delta)$ is an increasing function of $\delta$, one can obtain a unique $\delta^*$ by solving
$$T(\delta^*) = \Lambda(y).$$
Note that, while considering $\delta^*$, we exclude those scenarios where $\Lambda(y) < T(\delta)$ for all $\delta$, which imply that the null $H_0: f^2 = 0$ will be accepted no matter what $\delta$ is chosen.
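One way to picture this choice of $\delta^*$ is the numerical sketch below. It assumes the cutoff $T(\delta)$ is defined by the exact chi-square form of equation (5), so that $T(\delta^*) = \Lambda(y)$ is equivalent to $P(\chi^2_{p_1} > \Lambda(y)) = P(\chi^2_{p_1}(n\delta^*) < \Lambda(y))$, and it searches for the matching non-centrality by bisection. The function `delta_star`, its arguments and the example value of $\Lambda(y)$ are hypothetical; the paper's own estimate rests on the asymptotic results of Theorem 1 rather than on this exact solve.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF (series for the regularized incomplete gamma)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def ncx2_cdf(x, df, nc):
    """Non-central chi-square CDF as a Poisson(nc/2) mixture of central CDFs."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    half = nc / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 30.0)
    total = 0.0
    for j in range(j_max + 1):
        log_weight = j * math.log(half) - half - math.lgamma(j + 1.0)
        total += math.exp(log_weight) * chi2_cdf(x, df + 2 * j)
    return total


def delta_star(lam, n, p1, tol=1e-9):
    """Largest delta with T(delta) <= lam, or None when lam < T(delta) for all delta."""
    below = chi2_cdf(lam, p1)
    target = 1.0 - below            # type 1 error if the cutoff were lam
    if below <= target:
        return None                 # lam is below T(0), the chi-square median
    if target < 1e-12:
        raise ValueError("lam is too large for this double-precision sketch")
    lo, hi = 0.0, 1.0
    while ncx2_cdf(lam, p1, hi) > target:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                    # the CDF decreases in the non-centrality
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(lam, p1, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / n


print(delta_star(lam=13.0, n=250, p1=5))   # illustrative statistic, not from the paper
print(delta_star(lam=2.0, n=250, p1=5))    # excluded scenario: returns None
```

Because the balanced cutoff at $\delta = 0$ is the median of $\chi^2_{p_1}$, the excluded scenarios in this sketch are exactly those in which the observed statistic falls below that median.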
Corollary 2. Under the hypothesis in (1), $\delta^*$ is the maximum value of Cohen's $f^2$ such that the asymptotic type 2 error is at its minimum.
To contextualize and interpret $\delta^*$, consider the following hypothesis and the null and alternative neighborhoods it induces:
$$H_0: f^2 = 0 \quad \text{vs.} \quad H_1: f^2 = \delta^*. \tag{9}$$
Note that an asymptotic estimate of the true Cohen's $f^2$ is given by $\Lambda(y)/n$ (Lemmas 1 and 2). The neutral effect size for the hypothesis in (9) is $\epsilon^* = \{\Lambda(y) - c_1 n^{\frac{1}{2K}}\}/n$ (from Corollaries 1 and 2). Therefore asymptotically, the null neighborhood is $[0, \epsilon^*)$ and the alternative neighborhood is $\{f^2 \mid f^2 \geq \epsilon^*\}$. As long as the true Cohen's $f^2 \geq \epsilon^*$, the null will be rejected in support of the alternative. But if one chooses a larger $\delta = \delta^* + h$, for any $h > 0$, in hypothesis (9), the alternative neighborhood will be squeezed to $\{f^2 \mid f^2 \geq \epsilon^* + \frac{h}{2K-1}\}$. Hence, even if the true Cohen's $f^2$ is larger than $\epsilon^*$ and lies within $[\epsilon^*, \epsilon^* + \frac{h}{2K-1})$, the null will no longer be rejected. Further, note that the type 2 error function $\beta(h) = P(\chi^2_{p_1}(\gamma_n) < T(\delta^* + h) \mid \gamma_n = n\delta)$ attains its minimum, up to an $o(1)$ term, when $h = 0$, i.e. when $\delta$ attains its maximum at $\delta^*$. Any $h > 0$ therefore rapidly increases the type 2 error as the sample size increases. In summary, for given data, $\delta^*$ quantifies the maximum "impact" of any exposure mixture in a larger model on top of the smaller baseline model, such that the type 2 error is at its minimum.
Definition 1. Considering the stochasticity in $y$ and a given sample size $n$, we define impact $:= E_y\{\delta^*\}$ and optimal impact $:= \lim_{n \to \infty} E_y\{\delta^*\}$.
Impact depends on the sample size $n$. But it is a function of $\Lambda(y)$ and converges asymptotically by the weak law of large numbers (Lemma 3.1 of Vuong (1989)). Under this framework, therefore, the expected impact converges to an optimal quantity, $E_y\{\frac{\Lambda(y)}{n}(2K - 1)\}$, as $n \to \infty$. The implication of optimal impact is important. Under the hypothesis in (9), it captures the optimal "sample size stabilized" impact on the outcome contributed solely by the larger exposure-mixture model on top of the baseline covariate-only model, such that the asymptotic type 2 error is at its minimum.
2.5.2 Data driven estimation of optimal impact and sufficient sample size.
Optimal impact can be estimated by drawing a bootstrap sample of large size $N$ (say $N = 5000$ or $10000$) with replacement from the original sample of size $n$, with $n < N$. Moreover, because of its convergence, one can find a sample size and a corresponding impact that lies in a "practically close neighbourhood" of the optimal impact.
Consider the equivalence tests for the ratio of two means with prespecified equivalence bounds (Schuirmann (1987) and Phillips (1990)). Let $N$ and $n_s$ be the sample sizes under which we estimate optimal impact and impact respectively. Let $\delta_s$ and $\delta_{opt}$ be the underlying random variables for the impact and optimal impact respectively. We are interested in the distribution of the log-transformed ratio of $\delta_s$ and $\delta_{opt}$, i.e. $\log\{\delta_s/\delta_{opt}\}$. Consider the hypothesis of non-equivalence as below,
$$H_0: \mu(\log\{\delta_s/\delta_{opt}\}) \leq l_R \ \text{ or } \ \mu(\log\{\delta_s/\delta_{opt}\}) \geq l_U \quad \text{vs.} \quad H_1: l_R < \mu(\log\{\delta_s/\delta_{opt}\}) < l_U, \tag{10}$$
where $l_R$ and $l_U$ are the lower and upper equivalence bounds with $l_R < 0$ and $l_U > 0$. The null hypothesis will be rejected in favour of the alternative if a two-sided $100(1 - 2\alpha)\%$ CI is completely included within $l_R$ and $l_U$. We will assume $l_R = \log(0.8)$ and $l_U = \log(1.25)$ following typical practice (Phillips (2009)), but less strict values can be chosen for practical purposes. We approximate $\mu(\delta_s/\delta_{opt})$ and $\sigma(\delta_s/\delta_{opt})$ by using Taylor series expansions (detailed in Section S.2 of the supplementary material). The mean and variance after logarithmic transformation are found by direct application of the delta method to $\delta_s/\delta_{opt}$. Finally, we declare the alternative hypothesis if the $2\alpha$ level CI on $\mu(\log\{\delta_s/\delta_{opt}\})$, built with $t_{1-\alpha, M-1}$, the $100(1 - \alpha)$th percentile of the t-distribution with $M - 1$ degrees of freedom, lies within the equivalence limits. As long as the hypothesis of non-equivalence in (10) is rejected in favour of the alternative, $n_s$ can be regarded as a "sufficient sample size" at equivalence bounds of $[\log(\frac{8}{10}), \log(\frac{10}{8})]$, with a corresponding impact of $\hat{\mu}(\delta_s)$.
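The decision rule can be sketched as a simple interval check. The example below is a hedged illustration, not the authors' procedure: it assumes the input is a list of $M$ replicate values of $\log\{\delta_s/\delta_{opt}\}$, forms the confidence interval for their mean from the sample standard error, and substitutes a standard normal quantile for $t_{1-\alpha, M-1}$ (reasonable only for large $M$) so that it runs with the standard library alone. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def equivalence_check(log_ratios, ratio_bounds=(0.8, 1.25), alpha=0.05):
    """Return (is_equivalent, ci) for the mean of log(delta_s / delta_opt).

    Assumption: a normal quantile stands in for t_{1-alpha, M-1}, and the
    interval uses the standard error across the M replicates.
    """
    lower, upper = (math.log(b) for b in ratio_bounds)
    m = len(log_ratios)
    centre = mean(log_ratios)
    std_err = stdev(log_ratios) / math.sqrt(m)
    q = NormalDist().inv_cdf(1.0 - alpha)
    ci = (centre - q * std_err, centre + q * std_err)
    return (lower < ci[0] and ci[1] < upper), ci


# Illustrative replicate ratios delta_s / delta_opt (made-up numbers).
ratios = (0.95, 1.02, 0.98, 1.05, 1.00, 0.97, 1.03, 0.99)
ok, ci = equivalence_check([math.log(r) for r in ratios])
print(ok, tuple(round(v, 4) for v in ci))
```

Widening `ratio_bounds`, for example to `(0.75, 100 / 75)`, relaxes the criterion, which mirrors the remark that less strict bounds can be chosen for practical purposes.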

Simulated examples
Consider a normally distributed outcome and one single exposure with five baseline covariates based on sample size of 300. Further assume, the R 2 for the baseline covariate only model is 20%, and the true and unknown impact due to the exposure is 5.8%. Therefore, the R 2 for the larger model with a single exposure and five covariates is 20.8% (the mean correlation between the covariates is set at 0.3 and the error variance is assumed to be 5). See Section S.7 of the supplementary material for the data generating process.
Assume a researcher collected this data and intends to find the association between the outcome and the exposure after controlling for the five baseline covariates. As a first step, the optimal impact is estimated by drawing bootstrap samples of size $N = 5000$ from the original sample of $n = 300$. We obtain an estimate of optimal impact of 6.1% (which is very close to the true impact of 5.8%). Similarly, impacts are estimated at bootstrapped samples of sizes $N = 200, 300, 400, 500, 600$ and $2500$ to illustrate the gradual convergence to the optimal impact as the bootstrap size increases (Fig. 2-A).
Further note that, as precision increases with sample size, the absolute value of the regression coefficient remains stable (Fig. 2-B) while the p-values keep getting smaller (Fig. 2-D). For the original sample size of $n = 300$, the p-value of the regression estimate of the exposure is not significant. The researcher might therefore want to collect more data and increase the original sample size based on a statistical power calculation, which estimates that a total sample size of around 1000 is required assuming 80% power and a type 1 error fixed at 5%.
These impacts are within a close neighbourhood of the estimated optimal impact of 6%. The optimal $R^2$ values of the larger exposure-mixture models at sample sizes 600 and 300 will be 20.9% and 21.1%, making them equivalent for most practical purposes.

optimal impact and p-value
Assume a population of size 5000. A researcher is interested in finding the association between an exposure and a health outcome after controlling for some baseline covariates. They plan to conduct a preliminary study with a sample size of 300 and then eventually increase the sample to 1000. For the true population, the optimal impact is set at 2.9% with $\beta$ (p-value): 0.06 ($1.41 \times 10^{-5}$). In Figure S.1 of the supplementary material, we show that as the sample size increases, the p-value corresponding to the regression coefficient of the exposure decreases and eventually crosses the canonical cutoff of 0.05, whereas the mean of the optimal impacts converges to the true value. Even at the original sample size of 300, the mean of the optimal impacts remains very close to the true value. Thus, irrespective of the size of the p-values, scale-independent optimal impacts remain practically unaltered as the sample size changes.
4 Application in exposure-mixture association of per- and polyfluoroalkyl substances (PFAS) and metals with serum lipids among US adults
Endocrine-disrupting chemicals (EDCs) are a diverse class of environmental pollutants of "emerging concern" that interfere with multiple metabolic and hormonal systems in humans (Futran Fuhrman et al. (2015)). EDCs include pesticides, pharmaceutical agents, toxic metals and plasticizers, which are used in many commercial products (Buhari et al. (2020)). PFAS are exclusively man-made EDCs and environmentally persistent chemicals which are used to manufacture a wide variety of consumer and industrial products: non-stick, stain and water resistant coatings, fire suppression foams, and cleaning products (Liu et al. (2018) and Jain and Ducatman (2018)). Both PFAS and metals have been associated with an increase in cardiovascular disease (CVD) or death in many cross-sectional and longitudinal observational studies and in experimental animal models (Meneguzzi et al. (2021)). Hypercholesterolemia is one of the significant risk factors for CVD and is characterized by the presence of high levels of cholesterol in the blood. High serum low-density lipoprotein (LDL), high total serum cholesterol, and low levels of high-density lipoprotein (HDL) in the blood are among the incriminating factors in the pathogenesis of this disorder (Buhari et al. (2020)).
Using the theory discussed in the sections above, we aim to quantify and contrast the optimal impacts of the PFAS and metal mixtures on serum lipoprotein-cholesterols after adjusting for baseline covariates. The goal therefore is to estimate optimal impacts and corresponding sufficient sample sizes for both the PFAS and the metal mixture.

Study Population
We used cross-sectional data from the 2017-2018 US NHANES (CDC and NCHS (2018)). The present study has data on 683 adults. Data on baseline covariates (age (in years), gender, ethnicity, body mass index (BMI) (in kg/m$^2$), smoking status, ratio of family income to poverty) were downloaded and matched by the IDs of the NHANES participants. See Table 2 for details on the characteristics of the study population. To adjust for the oversampling of non-Hispanic Black, non-Hispanic Asian, and Hispanic participants in NHANES 2017-2018, a weight variable was added to the regression models. The list of individual PFAS and metals and their lower limits of detection can be found in Section S.8 of the supplementary material.

Methods for exposure-mixture analysis
We use weighted quantile sum (WQS) regression as our explanatory model, but other exposure-mixture models such as Bayesian weighted quantile sum regression and Bayesian kernel machine regression can also be used, so long as the likelihood ratio test statistic can be estimated. All the PFAS and metals were converted to deciles. Further, we intentionally used no validation set, to keep the problem of estimating optimal impact separate from that of finding the best partition of the dataset. As an additional analysis, both serum cholesterols were also dichotomized at their 90th percentile, to demonstrate the utility of optimal impact for binary outcomes. The optimal impacts were estimated using bootstrapped samples of size 5000 from the original sample of size 683, and Monte Carlo simulations were repeated 100 times.

Results
4.1.1 PFAS and Metal mixture have higher optimal impacts on LDL-C than HDL-C
For metals and PFAS, the optimal impacts on continuous HDL-C were 9.6% [95% CI: (9.1%, 10.0%)] and 10.7% [95% CI: (10.2%, 11.1%)] respectively, whereas for continuous LDL-C they were 14.7% [95% CI: (14.2%, 15.2%)] and 16.2% [95% CI: (15.6%, 16.7%)] respectively. Both EDC mixtures have a relatively higher impact on LDL-C than on HDL-C. Further, for both the cholesterols, metal-mixture has slightly higher impact than the PFAS-mixture (Fig. 3-A and B). After dichotomizing both cholesterols at their 90th percentile, the optimal impacts for the metal mixture remained similar to the continuous cholesterols.
As a post-hoc analysis and simple demonstration, we also calculated the sufficient sample sizes for this dataset at the equivalence bounds of [log(75/100), log(100/75)]. For both the metal-mixture and the PFAS-mixture, the means of the log ratio estimates and their corresponding 95% CIs at the original sample size of 683 lie well within the equivalence bounds. Further, even at a decreased sample size of 483, the means of the log ratio estimates and their 95% CIs still remain within the equivalence bounds. Therefore, N = 483 is a sufficient sample size at the equivalence bounds [log(75/100), log(100/75)] for both the metal-mixture and the PFAS-mixture (Fig. 4). A further decrease in the sample size, however, would not be sufficient at these pre-fixed equivalence bounds.
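The sufficiency check described here amounts to asking whether the whole 95% CI of the log ratio estimate lies inside the equivalence bounds. A minimal sketch, assuming the CI endpoints have already been computed; the function name and the numeric intervals are hypothetical illustrations and are not results from this study.

```python
import math

# Equivalence bounds from the text: [log(75/100), log(100/75)]
LOWER = math.log(75 / 100)
UPPER = math.log(100 / 75)

def within_equivalence_bounds(ci_low, ci_high, lower=LOWER, upper=UPPER):
    """True when the whole interval [ci_low, ci_high] lies inside [lower, upper]."""
    return lower <= ci_low and ci_high <= upper

# Hypothetical 95% CIs for a log ratio estimate (illustration only)
print(within_equivalence_bounds(-0.10, 0.12))  # True: interval sits inside the bounds
print(within_equivalence_bounds(-0.35, 0.05))  # False: lower end falls below log(75/100)
```

The two bounds are symmetric around zero on the log scale (approximately ±0.288), so a ratio of 75/100 and a ratio of 100/75 are treated as equally large departures.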

Discussion
In this paper, we presented the idea of the optimal impact of an exposure mixture in association with health outcomes, within a type 1 and type 2 error balanced hypothesis testing framework, to evaluate the strength of explanatory associations. The utility of an explanatory model is evaluated through its strength of association (Shmueli (2010)), and therefore this framework provides a systematic way for theory building in environmental epidemiology. Optimal impact is not perturbed by study sample size as long as the studies are well representative of the target population. An exposure mixture with a large optimal impact but a statistically non-significant regression coefficient might reflect a small sample size rather than a genuine absence of association. Further, this idea was naturally extended to estimate sample size in designing studies by striking a balance between explanatory precision and utility of association estimates. This framework has its limitations: the bootstrapped estimation of optimal impact assumes that the original sample is well representative of the true target population. Any estimation of optimal impact therefore carries this implicit assumption. But such an assumption is at the core of many statistical analyses, and a well-designed study can ideally alleviate such issues, or the sample could be corrected to be well representative. In addition, the current theory is based on the likelihood ratio test of nested models, but future work can extend this framework to strictly non-nested or overlapping models. Progress can also be made in estimating optimal impact in high-dimensional settings where the number of parameters p is strictly less than the sample size but p → ∞.
Optimal impact might be of practical importance when one finds null associations at the time of data analysis with prefixed sample sizes. Further, sample size determination based on preliminary data might use optimal impact to design more cost-efficient human studies. Because of its connection to Cohen's f², optimal impact remains a standardized effect size, which is unitless and allows direct comparisons between outcomes measured on different scales, between separate studies, or in meta-analysis. Optimal impact can therefore potentially be used to compare and choose between multiple outcomes with varying units and scales. Additionally, by connecting the error balanced hypothesis testing framework to Cohen's f², we circumvented the issues of NHST. Finally, quantifying the impact of an exposure-mixture on several health outcomes might have direct implications for policy decisions and, when used together with regression estimates, might prove to be very informative.
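For reference, the local form of Cohen's f² for a block of predictors added to a baseline model is commonly written as f² = (R²_full − R²_reduced) / (1 − R²_full). The sketch below computes it and, for Gaussian linear models only, uses the standard identity that links it to the likelihood ratio statistic of the nested models, LR = n·log(RSS_reduced / RSS_full) = n·log(1 + f²). The function names and the R² values are illustrative assumptions, and this is not the estimator of optimal impact itself.

```python
import math

def cohens_f2(r2_full, r2_reduced):
    """Local Cohen's f^2 for predictors added on top of a reduced model."""
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1 - r2_full)

def gaussian_lrt_from_f2(f2, n):
    """Likelihood ratio statistic for nested Gaussian linear models.

    Uses n * log(RSS_reduced / RSS_full) = n * log(1 + f2).
    """
    return n * math.log1p(f2)

# Hypothetical R^2 values: covariates only vs. covariates plus a mixture index
f2 = cohens_f2(r2_full=0.25, r2_reduced=0.15)
lrt = gaussian_lrt_from_f2(f2, n=683)
```

Because R² is unchanged when the outcome is rescaled, f² is unitless, which is the property that permits comparison across outcomes measured on different scales.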

Software
The code used in this article is available on GitHub (vishalmidya/Quantification-of-variation-in-environmentalmixtures).

Supplementary Material
Supplementary material is provided in a separate file.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted March 4, 2022.
