Introduction to Statistical Modeling with SAS/STAT Software

Least Squares

The idea of the ordinary least squares (OLS) principle is to choose parameter estimates that minimize the squared distance between the data and the model. In terms of the general, additive model,

Y_i = f(x_{i1}, \ldots, x_{ik}; \beta_1, \ldots, \beta_p) + \epsilon_i

the OLS principle minimizes

\mathrm{SSE} = \sum_{i=1}^{n} \bigl( y_i - f(x_{i1}, \ldots, x_{ik}; \beta_1, \ldots, \beta_p) \bigr)^2

The least squares principle is sometimes called "nonparametric" in the sense that it does not require the distributional specification of the response or the error term, but it might be better termed "distributionally agnostic." In an additive-error model it is only required that the model errors have zero mean. For example, the specification

Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \mathrm{E}[\epsilon_i] = 0

is sufficient to derive ordinary least squares estimators of $\beta_0$ and $\beta_1$ and to study a number of their properties. It is easy to show that the OLS estimators in this simple linear regression model are

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}

Based on the assumption of a zero mean of the model errors, you can show that these estimators are unbiased: $\mathrm{E}[\hat{\beta}_1] = \beta_1$ and $\mathrm{E}[\hat{\beta}_0] = \beta_0$. However, without further assumptions about the distribution of the $\epsilon_i$, you cannot derive the variability of the least squares estimators or perform statistical inferences such as hypothesis tests or confidence intervals. In addition, depending on the distribution of the $\epsilon_i$, other forms of least squares estimation can be more efficient than OLS estimation.
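
As a quick illustration, the following SAS/IML statements compute $\hat{\beta}_1$ and $\hat{\beta}_0$ directly from these formulas. The data values and variable names are hypothetical and serve only to demonstrate the calculation.

   proc iml;
      /* hypothetical data for a simple linear regression */
      x = {1, 2, 3, 4, 5, 6};
      y = {2.1, 3.9, 6.2, 8.1, 9.8, 12.2};

      xbar = mean(x);                                 /* sample means */
      ybar = mean(y);

      /* OLS estimates of the slope and intercept */
      b1 = sum( (y - ybar) # (x - xbar) ) / sum( (x - xbar)##2 );
      b0 = ybar - b1 * xbar;

      print b0 b1;
   quit;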

The conditions under which ordinary least squares estimation is efficient are zero-mean, homoscedastic, and uncorrelated model errors. Mathematically,

\mathrm{E}[\epsilon_i] = 0
\mathrm{Var}[\epsilon_i] = \sigma^2
\mathrm{Cov}[\epsilon_i, \epsilon_j] = 0 \quad \text{if } i \ne j

The second and third assumptions are met if the errors have an iid distribution, that is, if they are independent and identically distributed. Note, however, that the notion of stochastic independence is stronger than the absence of correlation. Only if the data are normally distributed does the latter imply the former.

The various other forms of the least squares principle are motivated by different extensions of these assumptions in order to find more efficient estimators.

Weighted Least Squares

The objective function in weighted least squares (WLS) estimation is

\mathrm{SSE}_w = \sum_{i=1}^{n} w_i \bigl( Y_i - f(x_{i1}, \ldots, x_{ik}; \beta_1, \ldots, \beta_p) \bigr)^2

where $w_i$ is a weight associated with the $i$th observation. WLS estimation is appropriate, for example, when the errors are uncorrelated but not homoscedastic. If the weights for the observations are proportional to the reciprocals of the error variances, $\mathrm{Var}[\epsilon_i] = \sigma^2 / w_i$, then the weighted least squares estimates are best linear unbiased estimators (BLUE). Suppose that the weights $w_i$ are collected in the diagonal matrix $\mathbf{W}$ and that the mean function has the form of a linear model. The weighted sum of squares criterion then can be written as

\mathrm{SSE}_w = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})' \mathbf{W} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})

which gives rise to the weighted normal equations

(\mathbf{X}'\mathbf{W}\mathbf{X})\,\boldsymbol{\beta} = \mathbf{X}'\mathbf{W}\mathbf{Y}

The resulting WLS estimator of $\boldsymbol{\beta}$ is

\hat{\boldsymbol{\beta}}_w = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-}\,\mathbf{X}'\mathbf{W}\mathbf{Y}
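
The superscript "$-$" denotes a generalized inverse, which reduces to the ordinary inverse when $\mathbf{X}'\mathbf{W}\mathbf{X}$ is nonsingular. As a sketch of the computation, the following SAS/IML statements set up and solve the weighted normal equations for a small, hypothetical data set in which the weights are taken to be proportional to the reciprocals of the error variances.

   proc iml;
      /* hypothetical heteroscedastic data */
      x = {1, 2, 3, 4, 5};
      y = {1.8, 4.3, 5.9, 8.4, 9.6};
      w = {4, 2, 1, 0.5, 0.25};                       /* assumed proportional to 1/Var[eps_i] */

      xmat = j(nrow(x), 1, 1) || x;                   /* design matrix with intercept column */
      Wmat = diag(w);                                 /* diagonal weight matrix */

      /* solve the weighted normal equations (X`WX) beta = X`WY */
      beta_w = solve( xmat` * Wmat * xmat, xmat` * Wmat * y );
      print beta_w;
   quit;
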
Iteratively Reweighted Least Squares

If the weights in a least squares problem depend on the parameters, then a change in the parameters also changes the weight structure of the model. Iteratively reweighted least squares (IRLS) estimation is an iterative technique that solves a series of weighted least squares problems, where the weights are recomputed between iterations. IRLS estimation can be used, for example, to derive maximum likelihood estimates in generalized linear models.
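
For example, in a Poisson regression with a log link the IRLS weights equal the current fitted means, and the working response adds a scaled residual to the linear predictor. The following SAS/IML sketch, with hypothetical data and simple starting values, illustrates one way to code the iteration and shows how the weights are recomputed at each step.

   proc iml;
      /* hypothetical count data for a Poisson regression with log link */
      x    = {1, 2, 3, 4, 5, 6, 7, 8};
      y    = {1, 2, 2, 4, 5, 8, 11, 14};
      xmat = j(nrow(x), 1, 1) || x;

      beta = {0, 0};                                  /* starting values */
      do iter = 1 to 50;
         eta  = xmat * beta;                          /* linear predictor */
         mu   = exp(eta);                             /* fitted means (inverse log link) */
         Wmat = diag(mu);                             /* IRLS weights, recomputed each iteration */
         z    = eta + (y - mu) / mu;                  /* working response */
         betaNew   = solve( xmat` * Wmat * xmat, xmat` * Wmat * z );
         converged = ( max(abs(betaNew - beta)) < 1e-8 );
         beta = betaNew;
         if converged then leave;
      end;
      print beta;
   quit;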

Generalized Least Squares

The previously discussed least squares methods have in common that the observations are assumed to be uncorrelated; that is, $\mathrm{Cov}[\epsilon_i, \epsilon_j] = 0$ whenever $i \ne j$. The weighted least squares estimation problem is a special case of a more general least squares problem, where the model errors have a general covariance matrix, $\mathrm{Var}[\boldsymbol{\epsilon}] = \boldsymbol{\Sigma}$. Suppose again that the mean function is linear, so that the model becomes

\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim (\mathbf{0}, \boldsymbol{\Sigma})

The generalized least squares (GLS) principle is to minimize the generalized error sum of squares

\mathrm{SSE}_g = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})' \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})

This leads to the generalized normal equations

(\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})\,\boldsymbol{\beta} = \mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{Y}

and the GLS estimator

\hat{\boldsymbol{\beta}}_g = (\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-}\,\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{Y}
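
In SAS/IML notation, the GLS estimator can be computed by direct translation of the formula above. In the following sketch the data and the error covariance matrix (an AR(1)-type correlation pattern) are hypothetical; in practice $\boldsymbol{\Sigma}$ is typically unknown and must itself be modeled or estimated.

   proc iml;
      /* hypothetical data with correlated errors */
      x    = {1, 2, 3, 4};
      y    = {2.2, 3.1, 4.8, 6.3};
      xmat = j(nrow(x), 1, 1) || x;

      /* assumed error covariance matrix: AR(1)-type pattern with rho = 0.5 */
      SigmaMat = { 1.000 0.500 0.250 0.125,
                   0.500 1.000 0.500 0.250,
                   0.250 0.500 1.000 0.500,
                   0.125 0.250 0.500 1.000 };

      SigInv = inv(SigmaMat);
      /* solve the generalized normal equations (X`Sigma^{-1}X) beta = X`Sigma^{-1}Y */
      beta_g = solve( xmat` * SigInv * xmat, xmat` * SigInv * y );
      print beta_g;
   quit;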

Obviously, WLS estimation is a special case of GLS estimation, where $\boldsymbol{\Sigma} = \sigma^2 \mathbf{W}^{-1}$; that is, the model is

\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim (\mathbf{0}, \sigma^2 \mathbf{W}^{-1})
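
When $\mathbf{X}'\mathbf{W}\mathbf{X}$ is nonsingular, substituting $\boldsymbol{\Sigma} = \sigma^2 \mathbf{W}^{-1}$ into the GLS estimator makes the equivalence explicit, because the scalar $\sigma^2$ cancels:

\hat{\boldsymbol{\beta}}_g = \bigl(\mathbf{X}'(\sigma^2\mathbf{W}^{-1})^{-1}\mathbf{X}\bigr)^{-1}\mathbf{X}'(\sigma^2\mathbf{W}^{-1})^{-1}\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} = \hat{\boldsymbol{\beta}}_w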