Introduction to Statistical Modeling with SAS/STAT Software

Finding the Least Squares Estimators

Finding the least squares estimator of $\boldsymbol{\beta}$ can be motivated as a calculus problem or by considering the geometry of least squares. The former approach simply states that the OLS estimator is the vector $\widehat{\boldsymbol{\beta}}$ that minimizes the objective function

$$\mathrm{SSE} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$$

Applying the differentiation rules from the section Matrix Differentiation leads to

$$
\begin{aligned}
\frac{\partial}{\partial\boldsymbol{\beta}}\,\mathrm{SSE} &= \frac{\partial}{\partial\boldsymbol{\beta}}\left(\mathbf{Y}'\mathbf{Y} - 2\mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right) \\
&= \mathbf{0} - 2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
\frac{\partial^2}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}'}\,\mathrm{SSE} &= 2\mathbf{X}'\mathbf{X}
\end{aligned}
$$
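For reference, the first-derivative step relies on two standard matrix differentiation identities. They are restated here as a reminder, assuming a constant vector $\mathbf{a}$ and a constant symmetric matrix $\mathbf{A}$ (both symbols are introduced only for this illustration):

```latex
\frac{\partial}{\partial\boldsymbol{\beta}}\,\mathbf{a}'\boldsymbol{\beta} = \mathbf{a},
\qquad
\frac{\partial}{\partial\boldsymbol{\beta}}\,\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta} = 2\mathbf{A}\boldsymbol{\beta}
```

Setting $\mathbf{a} = \mathbf{X}'\mathbf{Y}$ and $\mathbf{A} = \mathbf{X}'\mathbf{X}$ gives the terms $-2\mathbf{X}'\mathbf{Y}$ and $2\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$.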

Consequently, the solution to the normal equations, $\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}$, solves $\frac{\partial}{\partial\boldsymbol{\beta}}\mathrm{SSE} = \mathbf{0}$, and the fact that the second derivative is nonnegative definite guarantees that this solution minimizes $\mathrm{SSE}$.

The geometric argument to motivate ordinary least squares estimation is as follows. Assume that $\mathbf{X}$ is of rank $k$. For any value of $\boldsymbol{\beta}$, such as $\tilde{\boldsymbol{\beta}}$, the following identity holds:

$$\mathbf{Y} = \mathbf{X}\tilde{\boldsymbol{\beta}} + (\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})$$

The vector $\mathbf{X}\tilde{\boldsymbol{\beta}}$ is a point in a $k$-dimensional subspace of $R^n$, and the residual $(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})$ is a point in an $(n-k)$-dimensional subspace. The OLS estimator is the value $\widehat{\boldsymbol{\beta}}$ that minimizes the distance of $\mathbf{X}\tilde{\boldsymbol{\beta}}$ from $\mathbf{Y}$, implying that $\mathbf{X}\widehat{\boldsymbol{\beta}}$ and $(\mathbf{Y} - \mathbf{X}\widehat{\boldsymbol{\beta}})$ are orthogonal to each other; that is,

$$(\mathbf{Y} - \mathbf{X}\widehat{\boldsymbol{\beta}})'\,\mathbf{X}\widehat{\boldsymbol{\beta}} = 0$$

This in turn implies that $\widehat{\boldsymbol{\beta}}$ satisfies the normal equations, since

$$\widehat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = \widehat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\widehat{\boldsymbol{\beta}} \;\Leftrightarrow\; \mathbf{X}'\mathbf{X}\widehat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$$
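Both the calculus argument and the geometric argument can be checked numerically. The following is a minimal NumPy sketch and is not part of the original text. The simulated data, the seed, and the names `beta_hat`, `sse`, and `perturbed` are assumptions made for illustration. It solves the normal equations and then checks that the gradient of SSE vanishes, that the residual is orthogonal to the fitted vector, and that a perturbed coefficient vector yields a larger SSE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 50 observations, k = 3 columns (intercept + 2 regressors).
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])          # assumed for the simulation only
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X b = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)


def sse(b):
    """Sum of squared errors (Y - Xb)'(Y - Xb)."""
    r = Y - X @ b
    return r @ r


# Calculus view: the gradient -2 X'Y + 2 X'X b should vanish at beta_hat.
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat

# Geometric view: the residual should be orthogonal to the fitted vector.
residual = Y - X @ beta_hat
inner = residual @ (X @ beta_hat)

# Moving away from beta_hat should increase SSE.
perturbed = beta_hat + rng.normal(scale=0.1, size=k)

print("max |gradient|:", np.abs(gradient).max())
print("residual' fitted:", inner)
print("SSE at beta_hat vs. perturbed:", sse(beta_hat), sse(perturbed))
```

The printed gradient and inner product are expected to be zero only up to floating-point rounding.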

Full-Rank Case

If $\mathbf{X}$ is of full column rank, the OLS estimator is unique and given by

$$\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
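As an illustration of the closed form, the sketch below evaluates it in three ways with NumPy: with an explicit inverse, by solving the normal equations, and with the library least squares routine. The data, the seed, and the variable names are hypothetical. This is only a sketch of the formula, not a description of how SAS/STAT procedures compute the estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simple regression: intercept plus one regressor.
n = 40
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
Y = 3.0 + 0.5 * x + rng.normal(scale=0.2, size=n)

# The closed form requires X to be of full column rank.
assert np.linalg.matrix_rank(X) == X.shape[1]

# 1. Literal transcription of (X'X)^{-1} X'Y.
beta_inverse = np.linalg.inv(X.T @ X) @ X.T @ Y

# 2. Solve X'X b = X'Y without forming the inverse explicitly.
beta_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# 3. Library least squares routine applied directly to X and Y.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_inverse, beta_solve, beta_lstsq)
```

All three agree here up to rounding. In practice, solving the system (or using a least squares routine) is generally preferred to forming the inverse explicitly.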

The OLS estimator is an unbiased estimator of $\boldsymbol{\beta}$; that is,

$$
\begin{aligned}
\mathrm{E}[\widehat{\boldsymbol{\beta}}] &= \mathrm{E}\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\right] \\
&= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{E}[\mathbf{Y}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}
\end{aligned}
$$

Note that this result holds if $\mathrm{E}[\mathbf{Y}] = \mathbf{X}\boldsymbol{\beta}$; in other words, the condition that the model errors have mean zero is sufficient for the OLS estimator to be unbiased. If the errors are also homoscedastic and uncorrelated, the OLS estimator is the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$; that is, no other unbiased estimator that is a linear function of $\mathbf{Y}$ has a smaller variance. Because the mean squared error of an unbiased estimator equals its variance, no other linear unbiased estimator has a smaller mean squared error either. If, furthermore, the model errors are normally distributed, then the OLS estimator has minimum variance among all unbiased estimators of $\boldsymbol{\beta}$, whether they are linear or not. Such an estimator is called a uniformly minimum variance unbiased estimator, or UMVUE.
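Unbiasedness can also be illustrated by simulation. The sketch below is a hedged example, not part of the original text. It holds a hypothetical design matrix fixed, draws many response vectors with mean-zero errors, and averages the resulting OLS estimates. The design, the "true" `beta`, the error scale, and the replicate count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical fixed design: n = 30 rows, intercept plus one regressor.
n, reps, sigma = 30, 5000, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([2.0, -1.0])          # assumed "true" parameter vector

# Each column of Y is one simulated response vector with E[Y] = X beta and
# independent, homoscedastic, mean-zero errors.
errors = rng.normal(scale=sigma, size=(n, reps))
Y = (X @ beta)[:, None] + errors

# Solve the normal equations for all replicates at once (one column each).
estimates = np.linalg.solve(X.T @ X, X.T @ Y)   # shape (2, reps)

# The average over replicates approximates E[beta_hat].
mean_estimate = estimates.mean(axis=1)
print("average of beta_hat over replicates:", mean_estimate)
```

The average should be close to the assumed `beta`, with a Monte Carlo error that shrinks as `reps` grows.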

Rank-Deficient Case

In the case of a rank-deficient $\mathbf{X}$ matrix, a generalized inverse is used to solve the normal equations:

$$\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{Y}$$

Although a $g_1$-inverse is sufficient to solve a linear system, computational expedience and interpretation of the results often dictate the use of a generalized inverse with reflexive properties (that is, a $g_2$-inverse; see the section Generalized Inverse Matrices for details). Suppose, for example, that the $\mathbf{X}$ matrix is partitioned as $\mathbf{X} = [\mathbf{X}_1 \;\; \mathbf{X}_2]$, where $\mathbf{X}_1$ is of full column rank and each column in $\mathbf{X}_2$ is a linear combination of the columns of $\mathbf{X}_1$. The matrix

$$
\mathbf{G}_1 = \begin{bmatrix}
(\mathbf{X}_1'\mathbf{X}_1)^{-1} & (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2 \\
-\mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1} & \mathbf{0}
\end{bmatrix}
$$

is a $g_1$-inverse of $\mathbf{X}'\mathbf{X}$ and

$$
\mathbf{G}_2 = \begin{bmatrix}
(\mathbf{X}_1'\mathbf{X}_1)^{-1} & \mathbf{0} \\
\mathbf{0} & \mathbf{0}
\end{bmatrix}
$$

is a $g_2$-inverse. If the least squares solution is computed with the $g_1$-inverse, then computing the variance of the estimator requires additional matrix operations and storage. On the other hand, the variance of the solution that uses a $g_2$-inverse is proportional to $\mathbf{G}_2$:

$$
\begin{aligned}
\mathrm{Var}[\mathbf{G}_1\mathbf{X}'\mathbf{Y}] &= \sigma^2\,\mathbf{G}_1\mathbf{X}'\mathbf{X}\mathbf{G}_1' \\
\mathrm{Var}[\mathbf{G}_2\mathbf{X}'\mathbf{Y}] &= \sigma^2\,\mathbf{G}_2\mathbf{X}'\mathbf{X}\mathbf{G}_2 = \sigma^2\,\mathbf{G}_2
\end{aligned}
$$
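The defining properties of the two generalized inverses can be verified numerically. The sketch below is an illustration under stated assumptions: the design `X1`, the coefficient matrix `K` that generates `X2`, and all variable names are hypothetical. It builds both matrices from their partitioned forms and checks the condition $\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}$ (satisfied by any generalized inverse) and the reflexive condition $\mathbf{G}\mathbf{A}\mathbf{G} = \mathbf{G}$, with $\mathbf{A} = \mathbf{X}'\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical partitioned design: X1 has full column rank and X2 is an
# exact linear combination of the columns of X1 (X2 = X1 K).
n = 20
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
K = np.array([[1.0], [2.0]])          # assumed combination coefficients
X2 = X1 @ K
X = np.hstack([X1, X2])
M = X.T @ X                           # rank-deficient cross-product matrix

A_inv = np.linalg.inv(X1.T @ X1)
p1, p2 = X1.shape[1], X2.shape[1]

# Partitioned forms of the g1- and g2-inverses.
G1 = np.block([
    [A_inv, A_inv @ X1.T @ X2],
    [-X2.T @ X1 @ A_inv, np.zeros((p2, p2))],
])
G2 = np.block([
    [A_inv, np.zeros((p1, p2))],
    [np.zeros((p2, p1)), np.zeros((p2, p2))],
])

# Condition for any generalized inverse: M G M = M.
g1_is_inverse = np.allclose(M @ G1 @ M, M)
g2_is_inverse = np.allclose(M @ G2 @ M, M)

# Reflexive condition: G M G = G.
g1_is_reflexive = np.allclose(G1 @ M @ G1, G1)
g2_is_reflexive = np.allclose(G2 @ M @ G2, G2)

print(g1_is_inverse, g2_is_inverse, g1_is_reflexive, g2_is_reflexive)
```

In this construction both matrices satisfy the first condition, but only `G2` satisfies the reflexive one. Because `G2` is also symmetric, the product $\mathbf{G}_2\mathbf{X}'\mathbf{X}\mathbf{G}_2$ collapses to $\mathbf{G}_2$, so no extra matrix needs to be formed or stored for the variance.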

If a generalized inverse $\mathbf{G}$ of $\mathbf{X}'\mathbf{X}$ is used to solve the normal equations, then the resulting solution is a biased estimator of $\boldsymbol{\beta}$ (unless $\mathbf{X}'\mathbf{X}$ is of full rank, in which case the generalized inverse is "the" inverse), since $\mathrm{E}[\widehat{\boldsymbol{\beta}}] = \mathbf{G}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$, which is not in general equal to $\boldsymbol{\beta}$.

If you think of estimation as "estimation without bias," then $\widehat{\boldsymbol{\beta}}$ is the estimator of something, namely $\mathbf{G}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$. Since this is not a quantity of interest and since it is not unique (it depends on your choice of $\mathbf{G}$), Searle (1971, p. 169) cautions that in the less-than-full-rank case, $\widehat{\boldsymbol{\beta}}$ is a solution to the normal equations and "nothing more."
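The dependence on the choice of generalized inverse can be made concrete with a small example. In the sketch below, the three-column design, the vector `beta`, and the helper `g2_inverse` are hypothetical and introduced only for illustration. Two different $g_2$-inverses are built by inverting the cross-product block of two different full-rank column subsets. Both resulting vectors solve the normal equations, yet they estimate different quantities $\mathbf{G}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical rank-deficient design: the third column is an exact linear
# combination of the first two, so rank(X) = 2.
n = 25
c0 = np.ones(n)
c1 = rng.normal(size=n)
c2 = c0 + 2.0 * c1
X = np.column_stack([c0, c1, c2])
M = X.T @ X


def g2_inverse(M, cols):
    """Invert the block of M for a full-rank column subset; zeros elsewhere."""
    G = np.zeros_like(M)
    idx = np.ix_(cols, cols)
    G[idx] = np.linalg.inv(M[idx])
    return G


G_a = g2_inverse(M, [0, 1])   # built from columns 0 and 1
G_b = g2_inverse(M, [0, 2])   # built from columns 0 and 2

beta = np.array([1.0, 2.0, 3.0])                  # assumed parameter vector
Y = X @ beta + rng.normal(scale=0.5, size=n)

# Two different solutions of the normal equations X'X b = X'Y.
b_a = G_a @ X.T @ Y
b_b = G_b @ X.T @ Y
solves_a = np.allclose(M @ b_a, X.T @ Y)
solves_b = np.allclose(M @ b_b, X.T @ Y)

# What each solution estimates without bias: G X'X beta.
expect_a = G_a @ M @ beta     # (4, 8, 0) up to rounding
expect_b = G_b @ M @ beta     # (0, 0, 4) up to rounding

print(solves_a, solves_b)
print(expect_a, expect_b)
```

Neither expectation equals the assumed `beta` of (1, 2, 3), and the two differ from each other, which is the non-uniqueness the passage describes.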

Last updated: December 09, 2022