Random-effect based test for multinomial logistic regression: choice of the reference level and its impact on the testing

Qianchuan He; Yang Liu; Meiling Liu; Michael C. Wu; Li Hsu

doi:10.1101/2021.04.13.21255272

ABSTRACT

The random-effect score test has become an important tool for studying the association between a set of genetic variants and a disease outcome. While a number of random-effect score tests have been proposed in the literature, similar approaches for multinomial logistic regression have received less attention. In a recent effort to develop a random-effect score test for multinomial logistic regression, we observed that such a test is not invariant to the choice of the reference level. This is intriguing because binary logistic regression is well known to possess this invariance property. Here, we investigate why the test under multinomial logistic regression is not invariant to the reference level, and derive analytic forms to study how the choice of the reference level influences the power. We then consider several procedures that are invariant to the reference level and compare their performance through numerical studies. Our work provides insights into the properties of the random-effect score test for multinomial logistic regression, and adds a useful tool for studying the genetic heterogeneity of complex diseases.

The random-effect score test has been widely used to investigate the association between a set of genetic variants and a health outcome or trait (Wu et al., 2011; Maity et al., 2012; Sun et al., 2013). While various outcomes and traits have been considered for this test, multinomial outcomes received little attention until recently. Multinomial outcome analysis has important practical applications, such as subtype analysis, which concerns the association between genetic variants and multiple subtypes of a disease (Eckel-Passow et al., 2019). In multinomial analysis, one level is specified as the reference level, and the other levels are compared with it to examine the association between the outcome and the genotypes. A statistical test would generally be expected to be invariant to the choice of the reference level. However, in a recent study we observed that the random-effect score test is, in general, not invariant to this choice (Liu et al., 2021). This is intriguing because the logistic regression model, often regarded as a special case of the multinomial logistic regression model, has long been known to possess the invariance property. Moreover, the lack of invariance is highly undesirable in practice, because practitioners who choose different reference levels may reach contradictory conclusions. Here, we elaborate on this issue and investigate its fundamental cause. We first explain why the test is not invariant to the choice of the reference level, and then derive the analytical form of the power function when a given level is used as the reference. We next use simulations to compare several ways to deal with the non-invariance, and conclude the letter with practical guidelines.

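Consider a multinomial logistic regression model with J levels and n subjects. For j = 1, …, J and i = 1, …, n, let Y_ji = 1 if the ith subject belongs to the jth level, and Y_ji = 0 otherwise. Let X_i be the vector of adjusting covariates, with the first element being the intercept, and let G_i be the genotypes of p variants. Taking the Jth level as the reference level, the model can be written as

$$\log\frac{\mu_{ji}}{\mu_{Ji}} = \alpha_j^{T}X_i + \beta_j^{T}G_i, \qquad j = 1,\dots,J-1,$$

where µ_ji = P(Y_ji = 1) for j = 1, …, J, and α_j and β_j are the regression parameters. Note that

$$\mu_{ji} = \frac{\exp(\alpha_j^{T}X_i + \beta_j^{T}G_i)}{1 + \sum_{k=1}^{J-1}\exp(\alpha_k^{T}X_i + \beta_k^{T}G_i)}\;(j = 1,\dots,J-1), \qquad \mu_{Ji} = \frac{1}{1 + \sum_{k=1}^{J-1}\exp(\alpha_k^{T}X_i + \beta_k^{T}G_i)}.$$

Under H₀ : β₁ = … = β_(J−1) = 0, the log-likelihood can be written as

$$\ell(\alpha_1,\dots,\alpha_{J-1}) = \sum_{i=1}^{n}\sum_{j=1}^{J} Y_{ji}\log\mu^{0}_{ji}, \tag{1}$$

where µ⁰_ji denotes µ_ji evaluated at β₁ = … = β_(J−1) = 0. Let (α̂₁, …, α̂_(J−1)) be the maximum likelihood estimator of (α₁, …, α_(J−1)) under H₀. The estimated µ_ji's are then µ̂_ji = exp(α̂_j^T X_i)/{1 + Σ_{k=1}^{J−1} exp(α̂_k^T X_i)} for j = 1, …, J − 1, and µ̂_Ji = 1 − Σ_{j=1}^{J−1} µ̂_ji. Let G = (G₁, …, G_n)^T, Y_j = (Y_j1, …, Y_jn)^T and µ̂_j = (µ̂_j1, …, µ̂_jn)^T. The half score of the random effects for β_j (j = 1, …, J − 1) can then be derived as

$$S_j = G^{T}(Y_j - \hat\mu_j).$$

Let I_(J−1) be the (J−1)×(J−1) identity matrix, X = (X₁, …, X_n)^T, 𝕏 = I_(J−1) ⊗ X, 𝔾 = I_(J−1) ⊗ G, and 𝕎 = (W_jk) for j, k = 1, …, J − 1, where W_jj = diag{µ̂_j(1 − µ̂_j)} and W_jk = diag(−µ̂_j µ̂_k) for j ≠ k, with products taken elementwise. Writing S = (S₁^T, …, S_(J−1)^T)^T and

$$V = \mathbb{G}^{T}\{\mathbb{W} - \mathbb{W}\mathbb{X}(\mathbb{X}^{T}\mathbb{W}\mathbb{X})^{-1}\mathbb{X}^{T}\mathbb{W}\}\mathbb{G},$$

the score statistic

$$Q_J = \sum_{j=1}^{J-1} S_j^{T}S_j = S^{T}S$$

asymptotically follows Σ_r λ_r χ²_{1,r} under H₀, where the λ_r's are the eigenvalues of V and the χ²_{1,r}'s are independent χ²₁ random variables. Let the p-value of this score statistic be P_J.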
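The construction of Q_J can be sketched numerically. The block below is a minimal sketch assuming NumPy and SciPy: `fit_null`, `score_and_cov` and `mixture_chisq_sf` are hypothetical helper names, the null model is fitted by BFGS with an analytic gradient, and V is taken as 𝔾^T{𝕎 − 𝕎𝕏(𝕏^T𝕎𝕏)⁻¹𝕏^T𝕎}𝔾, where 𝕎 has diagonal blocks diag{µ̂_j(1 − µ̂_j)} and off-diagonal blocks diag(−µ̂_j µ̂_k) (the usual efficient-score covariance when nuisance parameters are estimated). The text does not say how the tail probability of Σ_r λ_r χ²₁ is evaluated; Davies' method is a common choice, but to stay within SciPy the sketch substitutes a two-moment Satterthwaite approximation, which can be inaccurate far in the tail.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import chi2


def fit_null(Y, X):
    """MLE of (alpha_1, ..., alpha_{J-1}) under H0, level J as reference.

    Y: n x J one-hot outcomes; X: n x q covariates with a leading column of ones.
    Returns the n x J matrix of fitted probabilities mu_hat.
    """
    n, J = Y.shape
    q = X.shape[1]

    def log_mu(theta):
        eta = np.column_stack([X @ theta.reshape(J - 1, q).T, np.zeros(n)])
        return eta - logsumexp(eta, axis=1, keepdims=True)

    def nll_and_grad(theta):
        lm = log_mu(theta)
        resid = Y - np.exp(lm)
        grad = -(resid[:, :J - 1].T @ X)   # -sum_i (Y_ji - mu_ji) X_i for each j < J
        return -np.sum(Y * lm), grad.ravel()

    res = minimize(nll_and_grad, np.zeros((J - 1) * q), jac=True, method="BFGS")
    return np.exp(log_mu(res.x))


def score_and_cov(Y, X, G, mu):
    """Stacked half scores S = (S_1, ..., S_{J-1}) and their null covariance V."""
    n, J = Y.shape
    K = J - 1
    S = np.concatenate([G.T @ (Y[:, j] - mu[:, j]) for j in range(K)])
    # Block covariance of stacked (Y_1, ..., Y_{J-1}) evaluated at mu_hat
    W = np.block([[np.diag(mu[:, j] * (float(j == k) - mu[:, k])) for k in range(K)]
                  for j in range(K)])
    XX = np.kron(np.eye(K), X)   # I_{J-1} (x) X
    GG = np.kron(np.eye(K), G)   # I_{J-1} (x) G
    WX = W @ XX
    P = W - WX @ np.linalg.solve(XX.T @ WX, WX.T)   # adjusts for estimating alpha
    return S, GG.T @ P @ GG


def mixture_chisq_sf(q, lam):
    """Approximate Pr(sum_r lam_r chi2_1 >= q); Satterthwaite substitute, not Davies."""
    lam = lam[lam > 1e-10 * lam.max()]
    scale = np.sum(lam ** 2) / np.sum(lam)
    df = np.sum(lam) ** 2 / np.sum(lam ** 2)
    return float(chi2.sf(q / scale, df))


# Usage on simulated null data: J = 3 levels, p = 4 variants, one covariate
rng = np.random.default_rng(1)
n, J, p = 300, 3, 4
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
G = rng.binomial(1, 0.3, size=(n, p)).astype(float)
Y = np.eye(J)[rng.integers(0, J, size=n)]
mu_hat = fit_null(Y, X)
S, V = score_and_cov(Y, X, G, mu_hat)
Q_J = S @ S
P_J = mixture_chisq_sf(Q_J, np.linalg.eigvalsh(V))
```

The dense (J − 1)n × (J − 1)n matrix keeps the sketch close to the formula; for large n one would exploit the diagonal structure of each block instead.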

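Suppose that we now wish to use a different level as the reference. Without loss of generality, consider the first level as the reference level. The model can then be written as

$$\log\frac{\mu_{ji}}{\mu_{1i}} = \gamma_{j'}^{T}X_i + \xi_{j'}^{T}G_i,$$

where level j = J, 2, …, J − 1 corresponds to j′ = 1, 2, …, J − 1, respectively, and γ_j′ and ξ_j′ are the regression parameters. Under the null hypothesis ξ₁ = … = ξ_(J−1) = 0, the log-likelihood has the same form as equation (1):

$$\ell^{*}(\gamma_1,\dots,\gamma_{J-1}) = \sum_{i=1}^{n}\sum_{j=1}^{J} Y_{ji}\log\mu^{*0}_{ji}, \tag{2}$$

where µ*⁰_ji denotes µ_ji under the first-level-reference model evaluated at ξ₁ = … = ξ_(J−1) = 0. The two parameterizations are related by γ₁ = −α₁ and γ_j = α_j − α₁ for j = 2, …, J − 1, so the likelihood in (2) equals that in (1). One can therefore show that the parameter estimators satisfy γ̂₁ = −α̂₁ and γ̂_j = α̂_j − α̂₁, so the estimated µ̂_ji based on equation (2) are the same as those based on equation (1). The half score of the random effects for ξ_j′ (j′ = 1, …, J − 1) can then be written as

$$R_1 = G^{T}(Y_J - \hat\mu_J), \qquad R_{j'} = G^{T}(Y_{j'} - \hat\mu_{j'}), \quad j' = 2,\dots,J-1.$$

Because Σ_{j=1}^{J} Y_ji = Σ_{j=1}^{J} µ̂_ji = 1 for every subject, Y_J − µ̂_J = −Σ_{j=1}^{J−1}(Y_j − µ̂_j), and it follows that

$$R_1 = -\sum_{j=1}^{J-1} S_j, \qquad R_{j'} = S_{j'}, \quad j' = 2,\dots,J-1.$$

The score statistic Q₁ = Σ_{j′=1}^{J−1} R_j′^T R_j′ asymptotically follows Σ_r ψ_r χ²_{1,r} under H₀, where the ψ_r's are the eigenvalues of V*, the counterpart of V. Let the p-value of this score statistic be P₁. In a similar manner, one can obtain Q_j and P_j when the jth level is chosen as the reference level.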

Recall that S₁, …, S_(J−1) are the scores when the Jth level is the reference, while R₁, …, R_(J−1) are the scores when the first level is the reference. The above derivation shows that the two sets of scores are closely related. Let S = (S₁^T, …, S_(J−1)^T)^T and R = (R₁^T, …, R_(J−1)^T)^T, and define the (J−1)p × (J−1)p matrix

$$A = \begin{pmatrix} -1 & -1 & \cdots & -1 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \otimes I_p .$$

Then R = AS, and the covariance matrix of R is Cov(R) = A Cov(S) A^T. We therefore obtain the key results R^T R = S^T A^T A S and V* = A V A^T.
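To make R = AS concrete, the sketch below computes the level-specific scores directly under both reference choices and checks them against A. It assumes NumPy; `level_scores` and `transform_matrix` are hypothetical helper names, and the fitted probabilities are random stand-ins rather than an actual null fit. That suffices here because the identity only uses the fact that each row of Y and of µ̂ sums to one, and the null fit is the same under both parameterizations.

```python
import numpy as np


def level_scores(Y, mu, G):
    """S_k = G^T (Y_k - mu_k) for every level k = 1, ..., J (columns of Y)."""
    return [G.T @ (Y[:, k] - mu[:, k]) for k in range(Y.shape[1])]


def transform_matrix(J, p):
    """A = A0 (x) I_p, mapping S (reference = level J) to R (reference = level 1)."""
    A0 = np.eye(J - 1)
    A0[0, :] = -1.0          # R_1 = -(S_1 + ... + S_{J-1}); R_j = S_j for j >= 2
    return np.kron(A0, np.eye(p))


rng = np.random.default_rng(0)
n, J, p = 50, 4, 3
G = rng.binomial(1, 0.2, size=(n, p)).astype(float)
Y = np.eye(J)[rng.integers(0, J, size=n)]
mu = rng.dirichlet(np.ones(J), size=n)   # stand-in fitted probabilities, rows sum to 1

Sk = level_scores(Y, mu, G)
S = np.concatenate(Sk[:J - 1])                  # reference = level J
R = np.concatenate([Sk[J - 1]] + Sk[1:J - 1])   # reference = level 1; levels J, 2, ..., J-1
A = transform_matrix(J, p)
Q_J, Q_1 = S @ S, R @ R
```

Q_J and Q_1 generally differ, while R^T R always equals S^T A^T A S.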

These results indicate that, when a different level is chosen as the reference, the random-effect score statistics (Q_j and Q_j′) generally take different values, because A^T A ≠ I for J ≥ 3, and the covariance matrices of the scores also differ. Thus P_j is in general not equal to P_j′; in other words, the p-values of these statistics are not invariant to the choice of the reference level. An interesting question then arises: why does the logistic regression model, a special case of the multinomial logistic regression model, have the invariance property? It turns out that for J = 2, A = −I_p and R = AS = −S. It follows that R^T R = S^T S and V* = V. Thus, when J = 2, i.e., in the case of logistic regression, the p-value remains the same regardless of which level is chosen as the reference.
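As a worked check, the J = 3 and J = 2 cases can be written out explicitly. Here A₀ is a symbol introduced only for this illustration, denoting the (J−1)×(J−1) factor with A = A₀ ⊗ I_p:

```latex
\begin{aligned}
J = 3:&\quad A_0 = \begin{pmatrix} -1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad
A_0^{T}A_0 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \neq I_2,\\
&\quad R^{T}R = S^{T}\,(A_0^{T}A_0 \otimes I_p)\,S
 = S_1^{T}S_1 + 2\,S_1^{T}S_2 + 2\,S_2^{T}S_2,\\
J = 2:&\quad A_0 = (-1), \qquad A_0^{T}A_0 = 1
 \;\Longrightarrow\; R^{T}R = S^{T}S, \quad V^{*} = (-I_p)\,V\,(-I_p) = V.
\end{aligned}
```

For J = 3, R^T R − S^T S = 2S₁^T S₂ + S₂^T S₂, which is nonzero in general.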

Since the p-value varies with the choice of the reference level, we investigate how this choice influences the statistical power. For ease of presentation, consider J = 3, i.e., three levels for the outcome. Using the relationship between R and S, we have R₁ = −(S₁ + S₂) and R₂ = S₂. To facilitate presentation, define S₃ = G^T(Y₃ − µ̂₃); then S₃ = −(S₁ + S₂) = R₁. Subsequently, we have that

the score statistic using Y₃ as the reference is Q₃ = S₁^T S₁ + S₂^T S₂;
the score statistic using Y₁ as the reference is Q₁ = S₂^T S₂ + S₃^T S₃;
the score statistic using Y₂ as the reference is Q₂ = S₁^T S₁ + S₃^T S₃.

Equivalently, Q_j = Σ_{k=1}^{3} S_k^T S_k − S_j^T S_j.
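For J = 3 the three statistics differ only in which squared norm they leave out. The sketch below (NumPy; the function name `q_by_reference` and the toy score vectors are illustrative assumptions) computes Q₁, Q₂ and Q₃ from S₁ and S₂ and reports the level with the smallest S_j^T S_j, i.e., the reference that yields the largest statistic.

```python
import numpy as np


def q_by_reference(S1, S2):
    """Q_j for J = 3 under each reference level j, from the level-3-reference scores."""
    S3 = -(S1 + S2)                                   # S_3 = G^T(Y_3 - mu_3)
    sq = np.array([S1 @ S1, S2 @ S2, S3 @ S3])        # S_k^T S_k, k = 1, 2, 3
    # Q_j = sum_k S_k^T S_k - S_j^T S_j: the reference level's own term is left out
    Q = {j + 1: sq.sum() - sq[j] for j in range(3)}
    return Q, sq


Q, sq = q_by_reference(np.array([3.0, -1.0]), np.array([-1.0, 0.5]))
best_ref = int(np.argmin(sq)) + 1   # level with the smallest S_j^T S_j
# Q == {1: 5.5, 2: 14.25, 3: 11.25}; best_ref == 2
```

Ranking references by Q_j is only a proxy for power under the conditions discussed in the text, where the null distributions of the Q_j's are approximately equal.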

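To study the asymptotic distribution of the test statistic Q_j under the alternative hypothesis, consider the special case X = 1_n ≡ (1, …, 1)^T. Then it can be shown that µ̂_j = Ȳ_j 1_n, where Ȳ_j = n⁻¹ Σ_i Y_ji, and S_j = G^T H Y_j, where H = I_n − n⁻¹ 1_n 1_n^T. Thus Q₃ = Y₁^T H G G^T H Y₁ + Y₂^T H G G^T H Y₂. It is known that, asymptotically, G^T H Y₁ ∼ N(G^T H µ₁, Δ₁ = G^T H Σ₁ H G), where µ₁ = (µ₁₁, …, µ₁n)^T is the vector of true probabilities under H₁ and Σ₁ = diag{µ₁(1 − µ₁)}. It follows that S₁^T S₁ asymptotically follows a mixture of noncentral chi-squared distributions:

$$S_1^{T}S_1 \sim \sum_{r=1}^{p}\lambda_{1r}\,\chi^2_1(\delta_{1r}), \qquad \delta_{1r} = \frac{(u_{1r}^{T}G^{T}H\mu_1)^2}{\lambda_{1r}},$$

where the λ_1r's are the eigenvalues of Δ₁, δ_1r is the noncentrality parameter, and the u_1r's are the corresponding eigenvectors of Δ₁. Similarly, let Σ₂ = diag{µ₂(1 − µ₂)} and Δ₂ = G^T H Σ₂ H G; then, asymptotically, S₂^T S₂ ∼ Σ_r λ_2r χ²₁(δ_2r) with δ_2r = (u_2r^T G^T H µ₂)²/λ_2r, where the λ_2r's are the eigenvalues of Δ₂ and the u_2r's are the corresponding eigenvectors of Δ₂. To find the distribution of Q₃, let µ₁₂ = ((G^T H µ₁)^T, (G^T H µ₂)^T)^T, Σ₁₂ = Cov(Y₁, Y₂) = diag(−µ₁µ₂), and

$$\Delta_{12} = \begin{pmatrix} \Delta_1 & G^{T}H\Sigma_{12}HG \\ G^{T}H\Sigma_{12}HG & \Delta_2 \end{pmatrix}.$$

Then one can derive that (S₁^T, S₂^T)^T ∼ N(µ₁₂, Δ₁₂) asymptotically. Therefore, asymptotically,

$$Q_3 \sim \sum_{r=1}^{2p}\lambda_r\,\chi^2_1(\delta_r), \qquad \delta_r = \frac{(u_r^{T}\mu_{12})^2}{\lambda_r},$$

where the λ_r's are the eigenvalues of Δ₁₂ and the u_r's are the corresponding eigenvectors of Δ₁₂. In a similar manner, we can derive the asymptotic distributions of Q₁ and Q₂.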
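The weights and noncentrality parameters can be computed numerically from G and the true probability vectors. Below is a minimal NumPy sketch for the intercept-only case (X = 1_n). It assumes the true µ₁ and µ₂ are known, which they are not in practice; the effect sizes are made up for illustration, and `mixture_params` is a hypothetical helper. Passing one probability vector gives the law of S₁^T S₁; passing two gives the law of Q₃.

```python
import numpy as np


def mixture_params(G, mus):
    """Weights lam_r and noncentralities delta_r for the squared norm of (G^T H Y_k)_k.

    G: n x p genotypes; mus: list of true probability vectors mu_k (length n),
    one per level whose score enters the statistic (intercept-only model).
    """
    n = G.shape[0]
    K = len(mus)
    H = np.eye(n) - np.ones((n, n)) / n                        # centering matrix
    # Cov of stacked (Y_k): diag(mu_k(1 - mu_k)) on the diagonal, diag(-mu_k mu_l) off it
    Sigma = np.block([[np.diag(mus[k] * (float(k == l) - mus[l])) for l in range(K)]
                      for k in range(K)])
    B = np.kron(np.eye(K), H @ G)                              # block-diagonal H G
    Delta = B.T @ Sigma @ B                                    # Delta_1, or Delta_12 if K = 2
    m = B.T @ np.concatenate(mus)                              # stacked means G^T H mu_k
    lam, U = np.linalg.eigh(Delta)
    keep = lam > 1e-10 * lam.max()                             # drop numerically null directions
    lam, U = lam[keep], U[:, keep]
    delta = (U.T @ m) ** 2 / lam                               # delta_r = (u_r^T m)^2 / lam_r
    return lam, delta, Delta, m


# Illustrative alternative with made-up effects (p = 3 variants, J = 3 levels)
rng = np.random.default_rng(2)
n, p = 100, 3
G = rng.binomial(1, 0.3, size=(n, p)).astype(float)
eta1 = -0.5 + G @ np.array([0.8, 0.0, -0.4])
eta2 = -0.2 + G @ np.array([0.0, 0.5, 0.0])
denom = 1.0 + np.exp(eta1) + np.exp(eta2)                      # level 3 is the reference
mu1, mu2 = np.exp(eta1) / denom, np.exp(eta2) / denom

lam1, delta1, Delta1, m1 = mixture_params(G, [mu1])            # S_1^T S_1
lam12, delta12, Delta12, m12 = mixture_params(G, [mu1, mu2])   # Q_3
```

A quick consistency check is that the mixture mean Σ_r λ_r(1 + δ_r) equals tr(Δ) + ‖m‖², the mean of ‖Z‖² for Z ∼ N(m, Δ). Under H₀ the probability vectors are constant, so Hµ_k = 0 and every δ_r vanishes.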

Recall that the power is Pr(Q_j ≥ c_j | H₁) = 1 − Ψ_j(c_j), where Ψ_j is the cumulative distribution function of Q_j under H₁ and c_j is the critical value determined by the distribution of Q_j under H₀. It is tempting to compare the power functions of the Q_j's directly using the asymptotic distributions derived above, but this is challenging: when the reference level is changed, the asymptotic distributions of Q_j under both the null and the alternative hypotheses change, making it extremely difficult to compare the power across different reference levels. On the other hand, when the J subtypes have similar proportions among the n subjects and there is no adjusting covariate, one can show that the asymptotic null distributions of the Q_j's are approximately equal to each other. It then follows that the larger Q_j is, the more likely H₀ is rejected. Recall that Q_j = Σ_{k=1}^{3} S_k^T S_k − S_j^T S_j. This suggests that, to maximize the power, the level with the smallest S_j^T S_j should be chosen as the reference. Our simulation studies confirmed this derivation (see Supplementary Material). The size of S_j^T S_j has a practical interpretation. Recall that S_j = G^T(Y_j − µ̂_j) collects the inner products between Y_j − µ̂_j and the genotypes of each variant. Thus, S_j^T S_j can roughly be seen as a measure of the correlation between Y_j and the genotypes. This suggests that the level with the weakest correlation with the genotypes should be chosen as the reference level, which matches intuition.
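Given null and alternative weights and noncentrality parameters, the power 1 − Ψ_j(c_j) can be approximated by Monte Carlo. The sketch below (NumPy; `sample_mixture` and `mc_power` are hypothetical names) draws from Σ_r λ_r χ²₁(δ_r) using (Z_r + √δ_r)² ∼ χ²₁(δ_r), takes c_j as the empirical (1 − α) quantile of the null mixture, and returns the fraction of alternative draws at or above c_j. It relies on the asymptotic distributions only, and the null and alternative eigenvalues are supplied separately because both change with the reference level.

```python
import numpy as np


def sample_mixture(lam, delta, size, rng):
    """Draws of sum_r lam_r * chi2_1(delta_r), using (Z_r + sqrt(delta_r))^2 ~ chi2_1(delta_r)."""
    Z = rng.standard_normal((size, len(lam)))
    return ((Z + np.sqrt(delta)) ** 2) @ lam


def mc_power(lam0, lam1, delta1, alpha=0.05, n_mc=200_000, seed=0):
    """Monte Carlo estimate of Pr(Q_j >= c_j | H1) = 1 - Psi_j(c_j)."""
    rng = np.random.default_rng(seed)
    null_draws = sample_mixture(lam0, np.zeros(len(lam0)), n_mc, rng)
    c = np.quantile(null_draws, 1 - alpha)          # critical value c_j under H0
    alt_draws = sample_mixture(lam1, delta1, n_mc, rng)
    return float(np.mean(alt_draws >= c))


lam = np.array([1.0, 0.5])
size_check = mc_power(lam, lam, np.zeros(2))          # no signal: close to alpha
strong = mc_power(lam, lam, np.array([50.0, 50.0]))   # strong signal: close to 1
```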

The above analysis provides theoretical insight into the power of the random-effect score test. In practice, however, it is generally unknown which level has the smallest S_j^T S_j. Taking S₁ as an example, we have

$$E(S_1 \mid G, X) = G^{T}\{P(Y_1 = 1 \mid G, X) - E(\hat\mu_1 \mid G, X)\},$$

where P(Y₁ = 1 | G, X) is an n-vector whose ith element is

$$\frac{\exp(\alpha_1^{T}X_i + \beta_1^{T}G_i)}{1 + \sum_{k=1}^{2}\exp(\alpha_k^{T}X_i + \beta_k^{T}G_i)}.$$

Clearly, E(S₁ | G, X) depends on G, X, α₁, α₂, β₁ and β₂. Since α₁, α₂, β₁ and β₂ are unknown parameters, it is difficult to judge the size of S_j^T S_j in advance. Hence, practical data analysis calls for statistical tests that are invariant to the choice of the reference level. In the following, we consider three procedures that tackle this issue and compare their performance through simulation studies.

I. A Bonferroni procedure

We use each of the J levels as the reference level and, based on the Q_j's, obtain the corresponding p-values P₁, …, P_J. We then use min{1, J · min(P₁, …, P_J)} as the final p-value. Multiplying by J is a Bonferroni correction that ensures the type I error is controlled.
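The Bonferroni combination reduces to a single line. This sketch assumes the J reference-specific p-values P₁, …, P_J are already available; the function name is hypothetical. Capping at 1 keeps the result a valid probability.

```python
import numpy as np


def bonferroni_min_p(pvals):
    """min(1, J * min_j P_j) over the J reference-level p-values."""
    pvals = np.asarray(pvals, dtype=float)
    return float(min(1.0, len(pvals) * pvals.min()))


p_final = bonferroni_min_p([0.01, 0.20, 0.03])   # 3 * 0.01, i.e. about 0.03
```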

II. A Cauchy procedure

We adapt the Cauchy combination procedure (Liu and Xie, 2020) to combine the J p-values P₁, …, P_J. Specifically, let T₀ = Σ_{j=1}^{J} c_j tan{(0.5 − P_j)π}, where c_j ≥ 0, Σ_{j=1}^{J} c_j = 1, and c_j is a prespecified weight that accommodates prior knowledge about the jth level. When there is no prior knowledge about the J levels, all c_j = 1/J. Under the null hypothesis, the p-value of T₀ can then be approximated by 1/2 − (arctan T₀)/π, based on the standard Cauchy distribution.
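A minimal sketch of the Cauchy combination as described above, assuming NumPy; `cauchy_combination` is a hypothetical name, and the weights c_j default to 1/J as in the text. Combining identical p-values returns that same p-value, which is a convenient sanity check.

```python
import numpy as np


def cauchy_combination(pvals, weights=None):
    """Combine the J reference-level p-values with the Cauchy statistic T0."""
    pvals = np.asarray(pvals, dtype=float)
    J = len(pvals)
    c = np.full(J, 1.0 / J) if weights is None else np.asarray(weights, dtype=float)
    if np.any(c < 0) or not np.isclose(c.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    T0 = np.sum(c * np.tan((0.5 - pvals) * np.pi))
    return float(0.5 - np.arctan(T0) / np.pi)   # standard Cauchy upper-tail probability


p_equal = cauchy_combination([0.02, 0.02, 0.02])   # identical inputs return the same p
p_mixed = cauchy_combination([1e-4, 0.5, 0.9])     # about 3.0e-4
```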

III. An integrative procedure

Consider the statistic L = (WDS)^T(WDS), where W = diag(w₁, …, w_J) ⊗ I_p, D = (I_(J−1), −1_(J−1))^T ⊗ I_p, and w_j is a prespecified weight for the jth level. When the w_j's are all equal, this statistic reduces to a statistic in Liu et al. (2021). Using the relationship S_J = G^T(Y_J − µ̂_J) = −Σ_{j=1}^{J−1} S_j, so that DS = (S₁^T, …, S_J^T)^T, we can show that

$$L = \sum_{j=1}^{J} w_j^2\, S_j^{T}S_j,$$

which is invariant to the choice of the reference level. Alternatively, L can be written as L = (J − 1)⁻¹ Σ_{j=1}^{J} Q̃_j, where Q̃_j = Σ_{k≠j} w_k² S_k^T S_k is a weighted version of Q_j. Thus, L can be seen as an integrative statistic that combines all the Q_j's. Under H₀, L asymptotically follows Σ_r λ_r χ²_{1,r}, where the λ_r's are the eigenvalues of WDVD^T W^T and the χ²_{1,r}'s are independent χ²₁ random variables.
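A sketch of L and of the eigenvalues of WDVD^T W^T, assuming NumPy; `integrative_statistic` is a hypothetical helper. To check the invariance claim numerically, the example recomputes L with level 1 as the reference (scores reordered as levels J, 2, …, J − 1, covariance A V A^T, weights permuted to follow their levels). The covariance V is an arbitrary positive semidefinite stand-in rather than the null covariance of a fitted model, which is enough to check the algebra but not to compute p-values; µ̂ is the intercept-only null fit, i.e., the sample proportions.

```python
import numpy as np


def integrative_statistic(S_list, V, w):
    """L = (W D S)^T (W D S) and the eigenvalues of W D V D^T W^T.

    S_list: J-1 score vectors (length p each) for the non-reference levels,
    with the reference level last in D; V: Cov of the stacked scores; w: J weights.
    """
    J = len(S_list) + 1
    p = S_list[0].shape[0]
    S = np.concatenate(S_list)
    D = np.kron(np.vstack([np.eye(J - 1), -np.ones((1, J - 1))]), np.eye(p))
    W = np.kron(np.diag(w), np.eye(p))
    WDS = W @ D @ S
    lam = np.linalg.eigvalsh(W @ D @ V @ D.T @ W.T)
    return WDS @ WDS, lam


rng = np.random.default_rng(3)
n, J, p = 80, 3, 4
G = rng.binomial(1, 0.25, size=(n, p)).astype(float)
Y = np.eye(J)[rng.integers(0, J, size=n)]
mu = np.tile(Y.mean(axis=0), (n, 1))          # intercept-only null fit: sample proportions
Sk = [G.T @ (Y[:, k] - mu[:, k]) for k in range(J)]
B = rng.standard_normal(((J - 1) * p, (J - 1) * p))
V = B @ B.T                                    # stand-in covariance, illustration only
w = np.array([1.0, 2.0, 0.5])

L_J, lam_J = integrative_statistic(Sk[:J - 1], V, w)   # reference = level J
A = np.kron(np.vstack([-np.ones((1, J - 1)), np.eye(J - 1)[1:]]), np.eye(p))
order = [J - 1] + list(range(1, J - 1))                 # levels J, 2, ..., J-1
L_1, lam_1 = integrative_statistic([Sk[k] for k in order], A @ V @ A.T, w[order + [0]])
```

Both L and the eigenvalues agree across the two reference choices, because changing the reference only permutes the blocks of DS and of the weights.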

We conducted simulation studies to examine the type I error of these procedures. We considered three levels for the response variable and generated one adjusting covariate from N(0, 1). The regression coefficients for the intercept and the adjusting covariate were set as γ₁ = (0.3, 1.2)^T and γ₂ = (0.3, 0.9)^T. We then simulated a p-vector of mutations, with each element generated from Bernoulli(0.05). To examine the type I error, we set ξ_j = 0 for j = 1, 2 and considered n ∈ {300, 500, 1000} for p = 10 and 15. We evaluated the type I error at significance level α = 10⁻³, generating 10⁶ simulated datasets for each setting. As shown in Table 1, all considered procedures control the type I error.

Table 1: Empirical type I error (×10⁻³)

Next, we examined the power of these procedures under two scenarios:

Scenario I: 60% of the elements of the ξ_j's were generated from Uniform(0.3, 1.5), and 40% from Uniform(−1.5, −0.3).
Scenario II: 60% of the elements of the ξ_j's were generated from N(0, 1.4²), and 40% were set to 0.

The ξ_j's were fixed over all replicates, and each scenario was replicated 10⁴ times. The power for scenarios I and II is summarized in Tables 2 and 3. The Bonferroni procedure has the lowest power owing to its conservativeness in controlling the type I error. The integrative procedure properly accounts for the correlations among the J statistics and tends to perform best among the considered procedures. We therefore recommend the integrative procedure for practical data analysis.
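The simulation design can be sketched as a data generator. This is a minimal sketch under stated assumptions, using NumPy: level 1 is the baseline, following the γ/ξ parameterization; which of levels 2 and 3 receives γ₁ is assumed; the 60%/40% split is applied over all 2p entries of (ξ₁, ξ₂) with random placement, which the text does not specify; categories are drawn by inverse-CDF sampling. `draw_xi` and `simulate_dataset` are hypothetical names.

```python
import numpy as np

# Intercept and covariate coefficients of the two non-reference levels (level 1 is
# the baseline); which of levels 2 and 3 receives gamma_1 is an assumption here.
GAMMA = np.array([[0.3, 1.2],
                  [0.3, 0.9]])


def draw_xi(p, scenario, rng):
    """2 x p genotype effects; 60%/40% split over all 2p entries, placed at random."""
    m = 2 * p
    n_a = int(round(0.6 * m))
    if scenario == "I":
        xi = np.concatenate([rng.uniform(0.3, 1.5, n_a), rng.uniform(-1.5, -0.3, m - n_a)])
    elif scenario == "II":
        xi = np.concatenate([rng.normal(0.0, 1.4, n_a), np.zeros(m - n_a)])
    else:
        raise ValueError("scenario must be 'I' or 'II'")
    return rng.permutation(xi).reshape(2, p)


def simulate_dataset(n, p, xi, rng):
    """One dataset: Y (n x 3 one-hot), X = (1, covariate), G ~ Bernoulli(0.05)."""
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    G = rng.binomial(1, 0.05, size=(n, p)).astype(float)
    eta = np.column_stack([np.zeros(n), X @ GAMMA.T + G @ xi.T])   # baseline logit 0
    prob = np.exp(eta - eta.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    u = rng.random(n)[:, None]
    labels = (u > np.cumsum(prob, axis=1)[:, :-1]).sum(axis=1)      # inverse-CDF draw
    return np.eye(3)[labels], X, G


rng = np.random.default_rng(2021)
xi_I = draw_xi(10, "I", rng)                     # drawn once, held fixed across replicates
Y, X, G = simulate_dataset(500, 10, xi_I, rng)   # one power replicate, scenario I
Y0, X0, G0 = simulate_dataset(500, 10, np.zeros((2, 10)), rng)   # null (type I error) setting
```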

Table 2: Power for scenario I

Table 3: Power for scenario II

In summary, we have shown that the random-effect score test for multinomial logistic regression is not invariant to the choice of the reference level. Our results provide an analytical explanation of this issue, and simulation studies confirmed that the choice of the reference level influences the statistical power. We considered several procedures that yield p-values (or statistics) that do not depend on the reference level, and the integrative procedure appears to perform most favorably. Overall, our study provides new insights into the random-effect score test for multinomial logistic regression and will aid the ongoing study of genetic heterogeneity in complex diseases.

Data Availability

Data used in this manuscript are simulated.

REFERENCES

Eckel-Passow, J.E., Decker, P.A., Kosel, M.L., Kollmeyer, T.M., Molinaro, A.M., Rice, T., Caron, A.A., Drucker, K.L., Praska, C.E., Pekmezci, M. and Hansen, H.M. (2019). Using germline variants to estimate glioma and subtype risks. Neuro-Oncology, 21(4), 451–461.

Liu, M., Liu, Y., Wu, M.C., Hsu, L. and He, Q. (2021). A method for subtype analysis with somatic mutations. Bioinformatics, doi: 10.1093/bioinformatics/btaa1090. Online ahead of print.

Liu, Y. and Xie, J. (2020). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115(529), 393–402.

Maity, A., Sullivan, P.F. and Tzeng, J.I. (2012). Multivariate phenotype association analysis by marker-set kernel machine regression. Genetic Epidemiology, 36(7), 686–695.

Sun, J., Zheng, Y. and Hsu, L. (2013). A unified mixed-effects model for rare-variant association in sequencing studies. Genetic Epidemiology, 37(4), 334–344.

Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics, 89(1), 82–93.