Introduction to Statistical Modeling with SAS/STAT Software

Mean Squared Error

The mean squared error is arguably the most important criterion used to evaluate the performance of a predictor or an estimator. (The subtle distinction between predictors and estimators is that random variables are predicted and constants are estimated.) The mean squared error is also useful for conveying the concepts of bias, precision, and accuracy in statistical estimation. To examine a mean squared error, you need a target of estimation or prediction and a predictor or estimator that is a function of the data. Suppose that the target, whether a constant or a random variable, is denoted $U$. The mean squared error of the estimator or predictor $T(\mathbf{Y})$ for $U$ is

$$\mathrm{MSE}[T(\mathbf{Y});U] = \mathrm{E}\left[\left(T(\mathbf{Y}) - U\right)^2\right]$$

The reason for using a squared difference to measure the "loss" between $T(\mathbf{Y})$ and $U$ is mostly convenience; properties of squared differences involving random variables are more easily examined than, say, absolute differences. The reason for taking an expectation is to remove the randomness of the squared difference by averaging over the distribution of the data.

Consider first the case where the target $U$ is a constant, say the parameter $\beta$, and denote the mean of the estimator $T(\mathbf{Y})$ as $\mu_T$. The mean squared error can then be decomposed as

$$\begin{aligned}
\mathrm{MSE}[T(\mathbf{Y});\beta] &= \mathrm{E}\left[\left(T(\mathbf{Y}) - \beta\right)^2\right] \\
&= \mathrm{E}\left[\left(T(\mathbf{Y}) - \mu_T\right)^2\right] + \left(\beta - \mu_T\right)^2 \\
&= \mathrm{Var}\left[T(\mathbf{Y})\right] + \left(\beta - \mu_T\right)^2
\end{aligned}$$

The mean squared error thus comprises the variance of the estimator and the squared bias. The two components can be associated with an estimator’s precision (small variance) and its accuracy (small bias).
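This decomposition can be verified numerically. The following Python sketch (not part of SAS/STAT; the shrunken-mean estimator and all constants are illustrative choices) estimates the MSE of a deliberately biased estimator by Monte Carlo and compares it with the variance plus the squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 5.0                    # true constant target (hypothetical value)
n, reps = 20, 200_000

# T(Y): a deliberately biased estimator, 0.9 times the sample mean
samples = rng.normal(loc=beta, scale=2.0, size=(reps, n))
T = 0.9 * samples.mean(axis=1)

mse = np.mean((T - beta) ** 2)       # direct Monte Carlo estimate of the MSE
var = T.var()                        # precision component: Var[T(Y)]
bias_sq = (beta - T.mean()) ** 2     # accuracy component: squared bias

print(mse, var + bias_sq)            # the two agree (exactly, up to rounding)
```

Note that with these sample-based quantities the identity holds exactly, not just approximately, because the cross term vanishes when the bias is measured from the empirical mean of the replicates.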

If $T(\mathbf{Y})$ is an unbiased estimator of $\beta$, that is, if $\mathrm{E}[T(\mathbf{Y})] = \beta$, then the mean squared error is simply the variance of the estimator. By choosing an estimator that has minimum variance, you also choose an estimator that has minimum mean squared error among all unbiased estimators. However, as you can see from the previous expression, bias is also an "average" property; it is defined as an expectation. It is quite possible to find estimators in some statistical modeling problems that have smaller mean squared error than a minimum variance unbiased estimator; these are estimators that permit a certain amount of bias but improve on the variance. For example, in models where regressors are highly collinear, the ordinary least squares estimator continues to be unbiased. However, the presence of collinearity can induce poor precision and lead to an erratic estimator. Ridge regression stabilizes the regression estimates in this situation, and the coefficient estimates are somewhat biased, but the bias is more than offset by the gains in precision.
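As a sketch of the ridge example, the following Python simulation (hypothetical data; the penalty value and the collinearity level are arbitrary illustrative choices, and this is plain ridge regression rather than any SAS/STAT procedure) compares the coefficient MSE of ordinary least squares with that of a ridge estimator when the two regressors are nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 1.0])              # true coefficients (hypothetical)
n, reps, lam = 30, 5_000, 5.0            # lam: ridge penalty, illustrative value

mse_ols = mse_ridge = 0.0
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)  # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = X @ beta + rng.normal(size=n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    mse_ols += np.sum((b_ols - beta) ** 2) / reps
    mse_ridge += np.sum((b_ridge - beta) ** 2) / reps

print(mse_ols, mse_ridge)                # ridge MSE is far smaller here
```

OLS remains unbiased but its variance explodes under near-collinearity; the ridge estimates are biased toward zero, yet their total MSE is much smaller, which is the bias-variance trade-off described above.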

When the target $U$ is a random variable, you need to carefully define what an unbiased prediction means. If the statistic and the target have the same expectation, $\mathrm{E}[U] = \mathrm{E}[T(\mathbf{Y})]$, then

$$\mathrm{MSE}[T(\mathbf{Y});U] = \mathrm{Var}[T(\mathbf{Y})] + \mathrm{Var}[U] - 2\,\mathrm{Cov}[T(\mathbf{Y}),U]$$

In many instances the target $U$ is a new observation that was not part of the analysis. If the data are uncorrelated, then it is reasonable to assume in that instance that the new observation is also not correlated with the data. The mean squared error then reduces to the sum of the two variances. For example, in a linear regression model where $U$ is a new observation $Y_0$ and $T(\mathbf{Y})$ is the regression estimator

$$\hat{Y}_0 = \mathbf{x}_0'\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}$$

with variance $\mathrm{Var}[\hat{Y}_0] = \sigma^2\,\mathbf{x}_0'\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{x}_0$, the mean squared prediction error for $Y_0$ is

$$\mathrm{MSE}[\hat{Y}_0;Y_0] = \sigma^2\left(\mathbf{x}_0'\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{x}_0 + 1\right)$$

and the mean squared prediction error for predicting the mean $\mathrm{E}[Y_0]$ is

$$\mathrm{MSE}[\hat{Y}_0;\mathrm{E}[Y_0]] = \sigma^2\,\mathbf{x}_0'\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{x}_0$$
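Both closed-form expressions can be checked by simulation. The Python sketch below (the design matrix, coefficients, and new point $\mathbf{x}_0$ are all hypothetical choices) replicates the training data many times, forms $\hat{Y}_0$ for each replicate, and compares the Monte Carlo mean squared prediction errors against $\sigma^2\,\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0$ and $\sigma^2(\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0 + 1)$:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n, reps = 1.0, 15, 100_000                 # illustrative values
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])  # fixed design
beta = np.array([2.0, 3.0])                       # true coefficients (hypothetical)
x0 = np.array([1.0, 0.5])                         # regressors of the new point

h = x0 @ np.linalg.solve(X.T @ X, x0)             # x0'(X'X)^{-1} x0
mse_mean_formula = sigma**2 * h                   # MSE for predicting E[Y0]
mse_obs_formula = sigma**2 * (h + 1.0)            # MSE for predicting Y0 itself

# Yhat0 = x0'(X'X)^{-1} X'Y is a fixed linear combination w'Y of the data
w = X @ np.linalg.solve(X.T @ X, x0)
Y = X @ beta + sigma * rng.normal(size=(reps, n))  # replicated training data
yhat0 = Y @ w
Ey0 = x0 @ beta
y0_new = Ey0 + sigma * rng.normal(size=reps)       # independent new observations

mse_mean_mc = np.mean((yhat0 - Ey0) ** 2)
mse_obs_mc = np.mean((yhat0 - y0_new) ** 2)
print(mse_mean_mc, mse_mean_formula)               # both near sigma^2 * h
print(mse_obs_mc, mse_obs_formula)                 # both near sigma^2 * (h + 1)
```

The extra $\sigma^2$ in the second formula is the variance of the new observation itself, which is visible here as the gap between the two Monte Carlo estimates.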
Last updated: December 09, 2022