Introduction to Regression Procedures

Linear Regression Models

In matrix notation, a linear model is written as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

where $\mathbf{X}$ is the $(n \times k)$ design matrix (rows are observations and columns are the regressors), $\boldsymbol{\beta}$ is the $(k \times 1)$ vector of unknown parameters, and $\boldsymbol{\epsilon}$ is the $(n \times 1)$ vector of unobservable model errors. The first column of $\mathbf{X}$ is usually a vector of ones and is used to estimate the intercept term.
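
As a minimal illustration of this notation (a generic sketch with hypothetical data, not one of the procedures this document describes), the following Python code builds a design matrix whose first column is a vector of ones and computes the ordinary least squares estimate of $\boldsymbol{\beta}$:

```python
import numpy as np

# Hypothetical data: n = 5 observations, one regressor plus an intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X: the first column of ones estimates the intercept term.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares estimate of beta: minimizes ||Y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat  # estimates of the unobservable errors epsilon
print("beta_hat:", beta_hat)
```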

The statistical theory of linear models is based on strict classical assumptions. Ideally, you measure the response after controlling all factors in an experimentally determined environment. If you cannot control the factors experimentally, some tests must be interpreted as being conditional on the observed values of the regressors.

Other assumptions are as follows:

  • The form of the model is correct (all important explanatory variables have been included). This assumption is reflected mathematically in the assumption of a zero mean of the model errors, $\mathrm{E}[\boldsymbol{\epsilon}] = \mathbf{0}$.

  • Regressor variables are measured without error.

  • The expected value of the errors is 0.

  • The variance of the error (and thus the dependent variable) for the $i$th observation is $\sigma^2 / w_i$, where $w_i$ is a known weight factor. Usually, $w_i = 1$ for all $i$, and thus $\sigma^2$ is the common, constant variance (for unequal weights, see the weighted-fit sketch after this list).

  • The errors are uncorrelated across observations.
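
When the known weights $w_i$ are not all 1, the error variances differ across observations and a weighted fit is appropriate. A minimal sketch of weighted least squares, assuming hypothetical data and weights: scaling each row of $\mathbf{X}$ and $\mathbf{Y}$ by $\sqrt{w_i}$ transforms the model back to one with constant error variance $\sigma^2$.

```python
import numpy as np

# Hypothetical data and known weights w_i; var(error_i) = sigma^2 / w_i,
# so observations with larger w_i are measured more precisely.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
w = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

X = np.column_stack([np.ones_like(x), x])

# Weighted least squares: scale each row by sqrt(w_i), then solve ordinary
# least squares on the transformed system, whose errors have equal variance.
sw = np.sqrt(w)
beta_hat, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print("weighted beta_hat:", beta_hat)
```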

When hypotheses are tested, or when confidence and prediction intervals are computed, an additional assumption is made that the errors are normally distributed.
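
Under this added normality assumption, the usual $t$-based confidence intervals for the coefficients follow, since $\hat{\sigma}^2 = \boldsymbol{\epsilon}'\boldsymbol{\epsilon}/(n-k)$ has $n-k$ degrees of freedom. A minimal sketch, assuming the unweighted model and the hypothetical data from the earlier example:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

n, k = X.shape
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Unbiased estimate of sigma^2 with n - k degrees of freedom.
s2 = resid @ resid / (n - k)

# Standard errors from the diagonal of s^2 * (X'X)^{-1}.
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

# 95% confidence intervals, valid when the errors are normally distributed.
t_crit = stats.t.ppf(0.975, df=n - k)
for b, s in zip(beta_hat, se):
    print(f"{b:.3f} +/- {t_crit * s:.3f}")
```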