Introduction to Statistical Modeling with SAS/STAT Software

Test of Hypotheses

Consider a general linear hypothesis of the form $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{d}$, where $\mathbf{L}$ is a $(k \times p)$ matrix. It is assumed that $\mathbf{d}$ is such that this hypothesis is linearly consistent; that is, there exists some $\boldsymbol{\beta}$ for which $\mathbf{L}\boldsymbol{\beta} = \mathbf{d}$. This is always the case if $\mathbf{d}$ is in the column space of $\mathbf{L}$, if $\mathbf{L}$ has full row rank, or if $\mathbf{d} = \mathbf{0}$; the latter is the most common case. Since many linear models have a rank-deficient $\mathbf{X}$ matrix, the question arises whether the hypothesis is testable. The idea of testability of a hypothesis is, not surprisingly, connected to the concept of estimability as introduced previously. The hypothesis $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ is testable if it consists of estimable functions.

There are two important approaches to testing hypotheses in statistical applications: the reduction principle and the linear inference approach. The reduction principle states that the validity of the hypothesis can be inferred by comparing a suitably chosen summary statistic between the model at hand and a reduced model in which the constraint $\mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ is imposed. The linear inference approach relies on the fact that $\widehat{\boldsymbol{\beta}}$ is an estimator of $\boldsymbol{\beta}$ whose stochastic properties are known, at least approximately. A test statistic can then be formed using $\widehat{\boldsymbol{\beta}}$, and its behavior under the restriction $\mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ can be ascertained.

The two principles lead to identical results in certain situations, for example, least squares estimation in the classical linear model. In more complex situations the two approaches lead to similar but not identical results. This is the case, for example, when weights or unequal variances are involved, or when $\widehat{\boldsymbol{\beta}}$ is a nonlinear estimator.

Reduction Tests

The two main reduction principles are the sum of squares reduction test and the likelihood ratio test. The test statistic in the former is proportional to the difference of the residual sums of squares between the reduced model and the full model. The test statistic in the likelihood ratio test is proportional to the difference of the log likelihoods between the full and reduced models. To fix these ideas, suppose that you are fitting the model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. Suppose that $\mathrm{SSR}$ denotes the residual sum of squares in this model and that $\mathrm{SSR}_H$ is the residual sum of squares in the model for which $\mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ holds. Then under the hypothesis the ratio

$$(\mathrm{SSR}_H - \mathrm{SSR}) / \sigma^2$$

follows a chi-square distribution with degrees of freedom equal to the rank of $\mathbf{L}$. Perhaps surprisingly, the residual sum of squares in the full model is distributed independently of this quantity, so that under the hypothesis,

$$F = \frac{(\mathrm{SSR}_H - \mathrm{SSR})/\operatorname{rank}(\mathbf{L})}{\mathrm{SSR}/(n - \operatorname{rank}(\mathbf{X}))}$$

follows an $F$ distribution with $\operatorname{rank}(\mathbf{L})$ numerator and $n - \operatorname{rank}(\mathbf{X})$ denominator degrees of freedom. Note that the quantity in the denominator of the $F$ statistic is a particular estimator of $\sigma^2$, namely the unbiased moment-based estimator that is customarily associated with least squares estimation. It is also the restricted maximum likelihood estimator of $\sigma^2$ if $\mathbf{Y}$ is normally distributed.
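The F ratio above can be illustrated with a short numerical sketch. This is not SAS code; it is a Python fragment with a made-up design matrix and response, fitting the full model and a reduced model (slope constrained to zero) by ordinary least squares:

```python
import numpy as np

# Illustrative sketch: sum of squares reduction F test for H: L*beta = 0
# in a small linear model y = X*beta + eps. The data are hypothetical.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n, p = X.shape

# Full model: OLS fit and residual sum of squares.
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
ssr_full = float(np.sum((y - X @ beta_hat) ** 2))

# Reduced model imposing L*beta = 0 with L = [0, 1] (zero slope):
# only the intercept column remains.
X_red = X[:, [0]]
b_red, _, _, _ = np.linalg.lstsq(X_red, y, rcond=None)
ssr_red = float(np.sum((y - X_red @ b_red) ** 2))

rank_L = 1                            # rank of the single-row L matrix
rank_X = np.linalg.matrix_rank(X)
F = ((ssr_red - ssr_full) / rank_L) / (ssr_full / (n - rank_X))
print(F)   # compare with an F(rank(L), n - rank(X)) reference distribution
```

Imposing the constraint can only increase the residual sum of squares, so the numerator of $F$ is nonnegative by construction.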

In the case of the likelihood ratio test, suppose that $l(\widehat{\boldsymbol{\beta}}, \widehat{\sigma}^2; \mathbf{y})$ denotes the log likelihood evaluated at the ML estimators. Also suppose that $l(\widehat{\boldsymbol{\beta}}_H, \widehat{\sigma}^2_H; \mathbf{y})$ denotes the log likelihood in the model for which $\mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ holds. Then under the hypothesis the statistic

$$\lambda = 2\left(l(\widehat{\boldsymbol{\beta}}, \widehat{\sigma}^2; \mathbf{y}) - l(\widehat{\boldsymbol{\beta}}_H, \widehat{\sigma}^2_H; \mathbf{y})\right)$$

follows approximately a chi-square distribution with degrees of freedom equal to the rank of $\mathbf{L}$. In the case of a normally distributed response, the log-likelihood function can be profiled with respect to $\boldsymbol{\beta}$. The resulting profile log likelihood is

$$l(\widehat{\sigma}^2; \mathbf{y}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\left(\log \widehat{\sigma}^2 + 1\right)$$

and the likelihood ratio test statistic becomes

$$\lambda = n\left(\log \widehat{\sigma}^2_H - \log \widehat{\sigma}^2\right) = n\left(\log \mathrm{SSR}_H - \log \mathrm{SSR}\right) = n \log\left(\mathrm{SSR}_H / \mathrm{SSR}\right)$$
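Because the ML estimator of $\sigma^2$ is $\mathrm{SSR}/n$, the constant terms of the profile log likelihoods cancel in the difference. A short Python check (again with hypothetical toy data, not SAS code) computes $\lambda$ both from the profile log likelihoods and from the residual sums of squares directly:

```python
import numpy as np

# Sketch with assumed toy data: the likelihood ratio statistic for a
# normal linear model reduces to n * log(SSR_H / SSR).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(y)

def ssr(design):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ b) ** 2))

ssr_full, ssr_h = ssr(X), ssr(X[:, [0]])   # reduced model: intercept only

def profile_loglik(s2):
    # profile log likelihood, sigma^2 evaluated at its ML value s2 = SSR/n
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n * (np.log(s2) + 1.0)

lam_direct = 2.0 * (profile_loglik(ssr_full / n) - profile_loglik(ssr_h / n))
lam_short = n * np.log(ssr_h / ssr_full)
print(lam_direct, lam_short)   # the two computations agree
```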

The preceding expressions show that, in the case of normally distributed data, both reduction principles lead to simple functions of the residual sums of squares in two models. As Pawitan (2001, p. 151) puts it, there is, however, an important difference not in the computations but in the statistical content. The least squares principle, where sum of squares reduction tests are widely used, does not require a distributional specification. Assumptions about the distribution of the data are added to provide a framework for confirmatory inferences, such as the testing of hypotheses. This framework stems directly from the assumption about the data’s distribution, or from the sampling distribution of the least squares estimators. The likelihood principle, on the other hand, requires a distributional specification at the outset. Inference about the parameters is implicit in the model; it is the result of further computations following the estimation of the parameters. In the least squares framework, inference about the parameters is the result of further assumptions.

Linear Inference

The principle of linear inference is to formulate a test statistic for $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ that builds on the linearity of the hypothesis in $\boldsymbol{\beta}$. For many models that have linear components, the estimator $\mathbf{L}\widehat{\boldsymbol{\beta}}$ is also linear in $\mathbf{Y}$. It is then simple to establish the distributional properties of $\mathbf{L}\widehat{\boldsymbol{\beta}}$ based on the distributional assumptions about $\mathbf{Y}$ or based on large-sample arguments. For example, $\widehat{\boldsymbol{\beta}}$ might be a nonlinear estimator, but it is known to follow a normal distribution asymptotically; this is the case in many nonlinear and generalized linear models.

If the sampling distribution or the asymptotic distribution of $\widehat{\boldsymbol{\beta}}$ is normal, then one can easily derive quadratic forms with known distributional properties. For example, if the random vector $\mathbf{U}$ is distributed as $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{U}'\mathbf{A}\mathbf{U}$ follows a chi-square distribution with $\operatorname{rank}(\mathbf{A})$ degrees of freedom and noncentrality parameter $\frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$, provided that $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\Sigma} = \mathbf{A}\boldsymbol{\Sigma}$.
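A minimal numeric illustration of this condition: with $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$ and $\mathbf{A} = \mathbf{P}/\sigma^2$ for an orthogonal projection matrix $\mathbf{P}$, the condition $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\Sigma} = \mathbf{A}\boldsymbol{\Sigma}$ holds and the degrees of freedom equal $\operatorname{rank}(\mathbf{A})$. The design matrix and variance below are assumed purely for illustration:

```python
import numpy as np

# Sketch: verify A*Sigma*A*Sigma = A*Sigma for Sigma = sigma^2 * I and
# A = P / sigma^2, where P projects onto the column space of a toy X.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
P = X @ np.linalg.pinv(X.T @ X) @ X.T   # orthogonal projection onto col(X)
sigma2 = 2.5                             # assumed variance, for illustration
Sigma = sigma2 * np.eye(3)
A = P / sigma2

print(np.allclose(A @ Sigma @ A @ Sigma, A @ Sigma))  # condition holds
print(np.linalg.matrix_rank(A))                       # chi-square df = rank(A)
```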

In the classical linear model, suppose that $\mathbf{X}$ is deficient in rank and that $\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^-\mathbf{X}'\mathbf{Y}$ is a solution to the normal equations. Then, if the errors are normally distributed,

$$\widehat{\boldsymbol{\beta}} \sim N\left((\mathbf{X}'\mathbf{X})^-\mathbf{X}'\mathbf{X}\boldsymbol{\beta},\; \sigma^2 (\mathbf{X}'\mathbf{X})^-\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^-\right)$$

Because $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ is testable, $\mathbf{L}\boldsymbol{\beta}$ is estimable, and thus $\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{X}'\mathbf{X} = \mathbf{L}$, as established in the previous section. Hence,

$$\mathbf{L}\widehat{\boldsymbol{\beta}} \sim N\left(\mathbf{L}\boldsymbol{\beta},\; \sigma^2 \mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)$$

The conditions for a chi-square distribution of the quadratic form

$$(\mathbf{L}\widehat{\boldsymbol{\beta}} - \mathbf{d})'\left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^-(\mathbf{L}\widehat{\boldsymbol{\beta}} - \mathbf{d})$$

are thus met, provided that

$$\left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^- \mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}' \left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^- \mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}' = \left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^- \mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'$$

This condition is obviously met if $\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'$ is of full rank. The condition is also met if $\left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^-$ is a reflexive inverse (a $g_2$-inverse) of $\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'$.

The test statistic to test the linear hypothesis $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{d}$ is thus

$$F = \frac{(\mathbf{L}\widehat{\boldsymbol{\beta}} - \mathbf{d})'\left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^-(\mathbf{L}\widehat{\boldsymbol{\beta}} - \mathbf{d}) / \operatorname{rank}(\mathbf{L})}{\mathrm{SSR}/(n - \operatorname{rank}(\mathbf{X}))}$$

and it follows an $F$ distribution with $\operatorname{rank}(\mathbf{L})$ numerator and $n - \operatorname{rank}(\mathbf{X})$ denominator degrees of freedom under the hypothesis.

This test statistic looks very similar to the $F$ statistic for the sum of squares reduction test. This is no accident. If the model is linear and the parameters are estimated by ordinary least squares, then you can show that the quadratic form $(\mathbf{L}\widehat{\boldsymbol{\beta}} - \mathbf{d})'\left(\mathbf{L}(\mathbf{X}'\mathbf{X})^-\mathbf{L}'\right)^-(\mathbf{L}\widehat{\boldsymbol{\beta}} - \mathbf{d})$ equals the difference in the residual sums of squares, $\mathrm{SSR}_H - \mathrm{SSR}$, where $\mathrm{SSR}_H$ is obtained as the residual sum of squares from OLS estimation in a model that satisfies $\mathbf{L}\boldsymbol{\beta} = \mathbf{d}$. However, this correspondence between the two test formulations does not apply when a different estimation principle is used. For example, assume that $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{V})$ and that $\boldsymbol{\beta}$ is estimated by generalized least squares:
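The claimed equality between the quadratic form and the residual sum of squares difference under OLS can be checked numerically. This sketch uses a hypothetical small data set and a single-row $\mathbf{L}$ that tests the slope:

```python
import numpy as np

# Sketch with toy data: under OLS, the quadratic form in (L beta_hat - d)
# equals SSR_H - SSR, the sum of squares reduction.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
L = np.array([[0.0, 1.0]])                  # H: slope = 0, so d = 0
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y

ssr = float(np.sum((y - X @ beta_hat) ** 2))
X_h = X[:, [0]]                             # a model satisfying L*beta = 0
b_h = np.linalg.pinv(X_h.T @ X_h) @ X_h.T @ y
ssr_h = float(np.sum((y - X_h @ b_h) ** 2))

mid = np.linalg.pinv(L @ np.linalg.pinv(X.T @ X) @ L.T)
quad = float((L @ beta_hat) @ mid @ (L @ beta_hat))
print(quad, ssr_h - ssr)                    # the two quantities coincide
```

Here `pinv` supplies one particular generalized inverse; for an estimable hypothesis the value of the quadratic form does not depend on that choice.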

$$\widehat{\boldsymbol{\beta}}_g = \left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^-\mathbf{X}'\mathbf{V}^{-1}\mathbf{Y}$$

The construction of $\mathbf{L}$ matrices associated with hypotheses in SAS/STAT software is frequently based on the properties of the $\mathbf{X}$ matrix, not of $\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}$. In other words, the construction of the $\mathbf{L}$ matrix is governed only by the design. A sum of squares reduction test for $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ that uses the generalized residual sum of squares $(\mathbf{Y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_g)'\mathbf{V}^{-1}(\mathbf{Y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_g)$ is not identical to a linear hypothesis test with the statistic

$$F^* = \frac{\widehat{\boldsymbol{\beta}}_g'\mathbf{L}'\left(\mathbf{L}\left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^-\mathbf{L}'\right)^-\mathbf{L}\widehat{\boldsymbol{\beta}}_g}{\operatorname{rank}(\mathbf{L})}$$

Furthermore, $\mathbf{V}$ is usually unknown and must be estimated as well. The estimate for $\mathbf{V}$ depends on the model, and imposing a constraint on the model would change the estimate. The asymptotic distribution of the statistic $F^*$ is a chi-square distribution. However, in practical applications the $F$ distribution with $\operatorname{rank}(\mathbf{L})$ numerator and $\nu$ denominator degrees of freedom is often used because it provides a better approximation to the sampling distribution of $F^*$ in finite samples. The computation of the denominator degrees of freedom $\nu$, however, is a matter of considerable discussion. A number of methods have been proposed and are implemented in various forms in SAS/STAT (see, for example, the degrees-of-freedom methods in the MIXED and GLIMMIX procedures).

The general form of an $F$ statistic for testing a linear hypothesis $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ is

$$F = \widehat{\boldsymbol{\beta}}'\mathbf{L}'\left(\mathbf{L}\,\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})\,\mathbf{L}'\right)^-\mathbf{L}\widehat{\boldsymbol{\beta}} \,/\, \mu$$

where $\mu = \operatorname{rank}\left(\mathbf{L}\,\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})\,\mathbf{L}'\right)$. The preceding development assumes that the estimated variance of $\widehat{\boldsymbol{\beta}}$, $\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})$, is of the form $(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^-$ for some estimate $\mathbf{V}$ of the variance of $\mathbf{Y}$. In this case, estimability ensures that the $F$ statistic has a unique value no matter which kind of generalized inverse is used to compute it. However, when a residual-based sandwich estimator is used to estimate $\mathbf{V}$, estimability does not ensure uniqueness. In this case, the $F$ value is invariant to the choice of the generalized inverse if and only if $\mathbf{L}\boldsymbol{\beta}$ is estimable and $\mathbf{L}'\left[\mathbf{L}\,\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})\,\mathbf{L}'\right]^-\left[\mathbf{L}\,\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})\,\mathbf{L}'\right] = \mathbf{L}'$.

Although it is extremely rare, it is possible in practice that the preceding uniqueness condition is not satisfied. For example, if the number of subjects (clusters) is less than the number of nonsingular parameters in the model and a residual-based sandwich variance estimator $\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})$ is used to estimate $\mathbf{V}(\widehat{\boldsymbol{\beta}})$, then the matrix of coefficients for testing the overall null hypothesis does not satisfy the uniqueness condition. If this condition is not satisfied, then the $F$ statistic for testing $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ is not invariant to the choice of the $g_2$-inverse of $\mathbf{L}\,\widehat{\mathbf{V}}(\widehat{\boldsymbol{\beta}})\,\mathbf{L}'$. In practical applications, $F$ is compared with an $F$ distribution with $\mu$ numerator and $\nu$ denominator degrees of freedom, where $\nu$ depends on how the variance of $\widehat{\boldsymbol{\beta}}$ is estimated. But the $F$ value, and therefore the inference, might differ depending on which $g_2$-inverse is used. This $F$ test is not recommended when the uniqueness condition is not satisfied. An alternative is to increase the number of subjects (clusters) or to find a more parsimonious model so that the number of parameters is less than the number of subjects (clusters).
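The uniqueness condition itself is straightforward to check numerically. In this sketch (hypothetical numbers, not SAS output), the covariance estimate is taken to be of the model-based form $\widehat{\sigma}^2(\mathbf{X}'\mathbf{X})^-$, for which the condition is satisfied when the hypothesis is estimable:

```python
import numpy as np

# Sketch: check the invariance condition
#   L' [L Vhat L']^- [L Vhat L'] = L'
# for a model-based covariance estimate and an estimable single-row L.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
L = np.array([[0.0, 1.0]])
s2 = 0.04                               # assumed value of the sigma^2 estimate
V_beta = s2 * np.linalg.pinv(X.T @ X)   # model-based Vhat(beta_hat)

G = L @ V_beta @ L.T
lhs = L.T @ np.linalg.pinv(G) @ G
print(np.allclose(lhs, L.T))            # condition satisfied here
```

With a rank-deficient sandwich estimator in place of `V_beta`, the same check can fail, which is exactly the situation the paragraph above warns about.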

Last updated: December 09, 2022