Introduction to Statistical Modeling with SAS/STAT Software

Expectations of Random Variables and Vectors

If $Y$ is a discrete random variable with mass function $p(y)$ and support (possible values) $y_1, y_2, \ldots$, then the expectation (expected value) of $Y$ is defined as

$$\mathrm{E}[Y] = \sum_{j=1}^{\infty} y_j \, p(y_j)$$

provided that $\sum |y_j| \, p(y_j) < \infty$; otherwise, the sum in the definition is not well-defined. The expected value of a function $h(y)$ is similarly defined: provided that $\sum |h(y_j)| \, p(y_j) < \infty$,

$$\mathrm{E}[h(Y)] = \sum_{j=1}^{\infty} h(y_j) \, p(y_j)$$
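
The two sums can be evaluated directly when the support is finite. The following is a minimal Python sketch, not part of SAS/STAT software; the fair six-sided die, the `expectation` helper, and the variable names are illustrative assumptions. Exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so p(y_j) = 1/6 on the support 1..6.
support = [1, 2, 3, 4, 5, 6]
pmf = {y: Fraction(1, 6) for y in support}


def expectation(h, support, pmf):
    """Return E[h(Y)] = sum_j h(y_j) p(y_j) for a finite support."""
    return sum(h(y) * pmf[y] for y in support)


# E[Y] uses the identity function; E[Y^2] uses h(y) = y^2.
mean = expectation(lambda y: y, support, pmf)
second_moment = expectation(lambda y: y * y, support, pmf)

print(mean, second_moment)
```

Because the support is finite, the absolute-convergence condition holds automatically; it matters only for countably infinite supports.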

For continuous random variables, similar definitions apply, but summation is replaced by integration over the support of the random variable. If $X$ is a continuous random variable with density function $f(x)$, and $\int |x| f(x) \, dx < \infty$, then the expectation of $X$ is defined as

$$\mathrm{E}[X] = \int_{-\infty}^{\infty} x f(x) \, dx$$

The expected value of a random variable is also called its mean or its first moment. A particularly important function of a random variable is $h(Y) = (Y - \mathrm{E}[Y])^2$. The expectation of $h(Y)$ is called the variance of $Y$ or the second central moment of $Y$. When you study the properties of multiple random variables, you might be interested in aspects of their joint distribution. The covariance between random variables $Y$ and $X$ is defined as the expected value of the function $(Y - \mathrm{E}[Y])(X - \mathrm{E}[X])$, where the expectation is taken under the bivariate joint distribution of $Y$ and $X$:

$$\mathrm{Cov}[Y, X] = \mathrm{E}[(Y - \mathrm{E}[Y])(X - \mathrm{E}[X])] = \mathrm{E}[YX] - \mathrm{E}[Y]\mathrm{E}[X] = \int\!\!\int x y f(x, y) \, dx \, dy - \mathrm{E}[Y]\mathrm{E}[X]$$

The covariance between a random variable and itself is the variance, $\mathrm{Cov}[Y, Y] = \mathrm{Var}[Y]$.
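
The covariance formulas can be checked numerically. The sketch below is an illustration only: it assumes a discrete joint mass function on four points (so the double integral is replaced by a double sum), and the probabilities and helper names are hypothetical. It computes the covariance both from the definition and from the shortcut $\mathrm{E}[YX] - \mathrm{E}[Y]\mathrm{E}[X]$, and confirms that $\mathrm{Cov}[Y, Y]$ equals the variance.

```python
from fractions import Fraction

# Assumed joint mass function p(y, x) on {0, 1} x {0, 1}.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}


def expect(h):
    """Return E[h(Y, X)] under the joint mass function."""
    return sum(h(y, x) * p for (y, x), p in joint.items())


mean_y = expect(lambda y, x: y)
mean_x = expect(lambda y, x: x)

# Definition: E[(Y - E[Y])(X - E[X])].
cov_definition = expect(lambda y, x: (y - mean_y) * (x - mean_x))
# Shortcut: E[YX] - E[Y]E[X].
cov_shortcut = expect(lambda y, x: y * x) - mean_y * mean_x

# Variance as the second central moment, and as Cov[Y, Y].
var_y = expect(lambda y, x: (y - mean_y) ** 2)
cov_yy = expect(lambda y, x: y * y) - mean_y * mean_y

print(cov_definition, cov_shortcut, var_y, cov_yy)
```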

In statistical applications and formulas, random variables are often collected into vectors. For example, a random sample of size $n$ from the distribution of $Y$ generates a random vector of order $(n \times 1)$,

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

The expected value of the $(n \times 1)$ random vector $\mathbf{Y}$ is the vector of the means of the elements of $\mathbf{Y}$:

$$\mathrm{E}[\mathbf{Y}] = [\mathrm{E}[Y_i]] = \begin{bmatrix} \mathrm{E}[Y_1] \\ \mathrm{E}[Y_2] \\ \vdots \\ \mathrm{E}[Y_n] \end{bmatrix}$$

It is often useful to directly apply rules about working with means, variances, and covariances of random vectors. To develop these rules, suppose that $\mathbf{Y}$ and $\mathbf{U}$ denote two random vectors with typical elements $Y_1, \ldots, Y_n$ and $U_1, \ldots, U_k$. Further suppose that $\mathbf{A}$ and $\mathbf{B}$ are constant (nonstochastic) matrices, that $\mathbf{a}$ is a constant vector, and that the $c_i$ are scalar constants.

The following rules enable you to derive the mean of a linear function of a random vector:

$$\begin{aligned}
\mathrm{E}[\mathbf{A}] &= \mathbf{A} \\
\mathrm{E}[\mathbf{A}\mathbf{Y} + \mathbf{a}] &= \mathbf{A}\,\mathrm{E}[\mathbf{Y}] + \mathbf{a} \\
\mathrm{E}[\mathbf{Y} + \mathbf{U}] &= \mathrm{E}[\mathbf{Y}] + \mathrm{E}[\mathbf{U}]
\end{aligned}$$
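
As an illustration of the second rule, the following Python sketch enumerates a small discrete distribution for a two-element random vector and compares $\mathrm{E}[\mathbf{A}\mathbf{Y} + \mathbf{a}]$, computed outcome by outcome, with $\mathbf{A}\,\mathrm{E}[\mathbf{Y}] + \mathbf{a}$. The distribution, the matrix `A`, the vector `a`, and the helper functions are all assumptions chosen for the example.

```python
from fractions import Fraction

# Assumed distribution of Y = (Y1, Y2)': each outcome maps to its probability.
outcomes = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}

A = [[1, 2], [3, -1], [0, 4]]  # constant (3 x 2) matrix
a = [1, 0, -2]                 # constant (3 x 1) vector


def affine(A, y, a):
    """Return the vector A y + a."""
    return [sum(A[i][j] * y[j] for j in range(len(y))) + a[i]
            for i in range(len(A))]


def expect_vector(g, outcomes, dim):
    """Return E[g(Y)] elementwise, where g returns a list of length dim."""
    total = [Fraction(0)] * dim
    for y, p in outcomes.items():
        total = [t + p * v for t, v in zip(total, g(y))]
    return total


mean_y = expect_vector(lambda y: list(y), outcomes, 2)
lhs = expect_vector(lambda y: affine(A, y, a), outcomes, 3)  # E[AY + a]
rhs = affine(A, mean_y, a)                                   # A E[Y] + a

print(lhs, rhs)
```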

The covariance matrix of $\mathbf{Y}$ and $\mathbf{U}$ is the $(n \times k)$ matrix whose typical element in row $i$, column $j$ is the covariance between $Y_i$ and $U_j$. The covariance matrix between two random vectors is frequently denoted with the $\mathrm{Cov}$ "operator."

$$\begin{aligned}
\mathrm{Cov}[\mathbf{Y}, \mathbf{U}] &= [\mathrm{Cov}[Y_i, U_j]] \\
&= \mathrm{E}[(\mathbf{Y} - \mathrm{E}[\mathbf{Y}])(\mathbf{U} - \mathrm{E}[\mathbf{U}])'] = \mathrm{E}[\mathbf{Y}\mathbf{U}'] - \mathrm{E}[\mathbf{Y}]\,\mathrm{E}[\mathbf{U}]' \\
&= \begin{bmatrix}
\mathrm{Cov}[Y_1, U_1] & \mathrm{Cov}[Y_1, U_2] & \mathrm{Cov}[Y_1, U_3] & \cdots & \mathrm{Cov}[Y_1, U_k] \\
\mathrm{Cov}[Y_2, U_1] & \mathrm{Cov}[Y_2, U_2] & \mathrm{Cov}[Y_2, U_3] & \cdots & \mathrm{Cov}[Y_2, U_k] \\
\mathrm{Cov}[Y_3, U_1] & \mathrm{Cov}[Y_3, U_2] & \mathrm{Cov}[Y_3, U_3] & \cdots & \mathrm{Cov}[Y_3, U_k] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}[Y_n, U_1] & \mathrm{Cov}[Y_n, U_2] & \mathrm{Cov}[Y_n, U_3] & \cdots & \mathrm{Cov}[Y_n, U_k]
\end{bmatrix}
\end{aligned}$$

The variance matrix of a random vector $\mathbf{Y}$ is the covariance matrix between $\mathbf{Y}$ and itself. The variance matrix is frequently denoted with the $\mathrm{Var}$ "operator."

$$\begin{aligned}
\mathrm{Var}[\mathbf{Y}] &= \mathrm{Cov}[\mathbf{Y}, \mathbf{Y}] = [\mathrm{Cov}[Y_i, Y_j]] \\
&= \mathrm{E}[(\mathbf{Y} - \mathrm{E}[\mathbf{Y}])(\mathbf{Y} - \mathrm{E}[\mathbf{Y}])'] = \mathrm{E}[\mathbf{Y}\mathbf{Y}'] - \mathrm{E}[\mathbf{Y}]\,\mathrm{E}[\mathbf{Y}]' \\
&= \begin{bmatrix}
\mathrm{Cov}[Y_1, Y_1] & \mathrm{Cov}[Y_1, Y_2] & \mathrm{Cov}[Y_1, Y_3] & \cdots & \mathrm{Cov}[Y_1, Y_n] \\
\mathrm{Cov}[Y_2, Y_1] & \mathrm{Cov}[Y_2, Y_2] & \mathrm{Cov}[Y_2, Y_3] & \cdots & \mathrm{Cov}[Y_2, Y_n] \\
\mathrm{Cov}[Y_3, Y_1] & \mathrm{Cov}[Y_3, Y_2] & \mathrm{Cov}[Y_3, Y_3] & \cdots & \mathrm{Cov}[Y_3, Y_n] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}[Y_n, Y_1] & \mathrm{Cov}[Y_n, Y_2] & \mathrm{Cov}[Y_n, Y_3] & \cdots & \mathrm{Cov}[Y_n, Y_n]
\end{bmatrix} \\
&= \begin{bmatrix}
\mathrm{Var}[Y_1] & \mathrm{Cov}[Y_1, Y_2] & \mathrm{Cov}[Y_1, Y_3] & \cdots & \mathrm{Cov}[Y_1, Y_n] \\
\mathrm{Cov}[Y_2, Y_1] & \mathrm{Var}[Y_2] & \mathrm{Cov}[Y_2, Y_3] & \cdots & \mathrm{Cov}[Y_2, Y_n] \\
\mathrm{Cov}[Y_3, Y_1] & \mathrm{Cov}[Y_3, Y_2] & \mathrm{Var}[Y_3] & \cdots & \mathrm{Cov}[Y_3, Y_n] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}[Y_n, Y_1] & \mathrm{Cov}[Y_n, Y_2] & \mathrm{Cov}[Y_n, Y_3] & \cdots & \mathrm{Var}[Y_n]
\end{bmatrix}
\end{aligned}$$

Because the variance matrix contains variances on the diagonal and covariances in the off-diagonal positions, it is also referred to as the variance-covariance matrix of the random vector $\mathbf{Y}$.

If the elements of the covariance matrix $\mathrm{Cov}[\mathbf{Y}, \mathbf{U}]$ are zero, the random vectors are uncorrelated. If $\mathbf{Y}$ and $\mathbf{U}$ are jointly normally distributed, then a zero covariance matrix implies that the vectors are stochastically independent. If the off-diagonal elements of the variance matrix $\mathrm{Var}[\mathbf{Y}]$ are zero, the elements of the random vector $\mathbf{Y}$ are uncorrelated. If $\mathbf{Y}$ is normally distributed, then a diagonal variance matrix implies that its elements are stochastically independent.
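
The normality condition matters: without it, zero covariance does not imply independence. The following Python sketch shows a standard counterexample, with all names and values chosen for illustration. $Y$ is assumed to be uniform on $\{-1, 0, 1\}$ and $X = Y^2$, so $X$ is completely determined by $Y$, yet the covariance is zero.

```python
from fractions import Fraction

# Assumed example: Y uniform on {-1, 0, 1} and X = Y^2.
pmf_y = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
joint = {(y, y * y): p for y, p in pmf_y.items()}


def expect(h):
    """Return E[h(Y, X)] under the joint mass function."""
    return sum(h(y, x) * p for (y, x), p in joint.items())


# Cov[Y, X] = E[YX] - E[Y]E[X]
covariance = (expect(lambda y, x: y * x)
              - expect(lambda y, x: y) * expect(lambda y, x: x))

# Independence would require P(Y=0, X=0) = P(Y=0) P(X=0).
p_y0 = sum(p for (y, x), p in joint.items() if y == 0)
p_x0 = sum(p for (y, x), p in joint.items() if x == 0)
p_both = joint.get((0, 0), Fraction(0))
independent_at_zero = (p_both == p_y0 * p_x0)

print(covariance, p_both, p_y0 * p_x0, independent_at_zero)
```

Here the joint probability at $(0, 0)$ differs from the product of the marginal probabilities, so $Y$ and $X$ are uncorrelated but not independent.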

Suppose that $\mathbf{A}$ and $\mathbf{B}$ are constant (nonstochastic) matrices and that $c_i$ denotes a scalar constant. The following results are useful in manipulating covariance matrices:

$$\begin{aligned}
\mathrm{Cov}[\mathbf{A}\mathbf{Y}, \mathbf{U}] &= \mathbf{A}\,\mathrm{Cov}[\mathbf{Y}, \mathbf{U}] \\
\mathrm{Cov}[\mathbf{Y}, \mathbf{B}\mathbf{U}] &= \mathrm{Cov}[\mathbf{Y}, \mathbf{U}]\,\mathbf{B}' \\
\mathrm{Cov}[\mathbf{A}\mathbf{Y}, \mathbf{B}\mathbf{U}] &= \mathbf{A}\,\mathrm{Cov}[\mathbf{Y}, \mathbf{U}]\,\mathbf{B}' \\
\mathrm{Cov}[c_1\mathbf{Y}_1 + c_2\mathbf{U}_1,\; c_3\mathbf{Y}_2 + c_4\mathbf{U}_2] &= c_1 c_3\,\mathrm{Cov}[\mathbf{Y}_1, \mathbf{Y}_2] + c_1 c_4\,\mathrm{Cov}[\mathbf{Y}_1, \mathbf{U}_2] \\
&\quad + c_2 c_3\,\mathrm{Cov}[\mathbf{U}_1, \mathbf{Y}_2] + c_2 c_4\,\mathrm{Cov}[\mathbf{U}_1, \mathbf{U}_2]
\end{aligned}$$

Since $\mathrm{Cov}[\mathbf{Y}, \mathbf{Y}] = \mathrm{Var}[\mathbf{Y}]$, these results can be applied to produce the following results, useful in manipulating variances of random vectors:

$$\begin{aligned}
\mathrm{Var}[\mathbf{A}] &= \mathbf{0} \\
\mathrm{Var}[\mathbf{A}\mathbf{Y}] &= \mathbf{A}\,\mathrm{Var}[\mathbf{Y}]\,\mathbf{A}' \\
\mathrm{Var}[\mathbf{Y} + \mathbf{x}] &= \mathrm{Var}[\mathbf{Y}] \\
\mathrm{Var}[\mathbf{x}'\mathbf{Y}] &= \mathbf{x}'\,\mathrm{Var}[\mathbf{Y}]\,\mathbf{x} \\
\mathrm{Var}[c_1\mathbf{Y}] &= c_1^2\,\mathrm{Var}[\mathbf{Y}] \\
\mathrm{Var}[c_1\mathbf{Y} + c_2\mathbf{U}] &= c_1^2\,\mathrm{Var}[\mathbf{Y}] + c_2^2\,\mathrm{Var}[\mathbf{U}] + 2 c_1 c_2\,\mathrm{Cov}[\mathbf{Y}, \mathbf{U}]
\end{aligned}$$
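
The rule $\mathrm{Var}[\mathbf{A}\mathbf{Y}] = \mathbf{A}\,\mathrm{Var}[\mathbf{Y}]\,\mathbf{A}'$ can be verified by enumeration. The Python sketch below is illustrative only: the discrete distribution, the matrix `A`, and the helpers `matmul`, `transpose`, and `variance_matrix` are assumptions made for this example. It computes the variance matrix of the transformed vector directly and compares it with the matrix product.

```python
from fractions import Fraction

# Assumed distribution of Y = (Y1, Y2)': each outcome maps to its probability.
outcomes = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}
A = [[1, 2], [3, -1]]  # constant matrix


def matmul(P, Q):
    """Return the matrix product P Q for lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def variance_matrix(dist):
    """Return Var[Y] = E[(Y - E[Y])(Y - E[Y])'] by direct enumeration."""
    dim = len(next(iter(dist)))
    mean = [sum(p * y[i] for y, p in dist.items()) for i in range(dim)]
    return [[sum(p * (y[i] - mean[i]) * (y[j] - mean[j])
                 for y, p in dist.items())
             for j in range(dim)] for i in range(dim)]


# Distribution of AY: transform each outcome and accumulate its probability.
transformed = {}
for y, p in outcomes.items():
    z = tuple(sum(A[i][j] * y[j] for j in range(len(y)))
              for i in range(len(A)))
    transformed[z] = transformed.get(z, 0) + p

var_y = variance_matrix(outcomes)
lhs = variance_matrix(transformed)               # Var[AY]
rhs = matmul(matmul(A, var_y), transpose(A))     # A Var[Y] A'

print(lhs, rhs)
```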

Another area where expectation rules are helpful is quadratic forms in random variables. These forms arise particularly in the study of linear statistical models and in linear statistical inference. Linear inference is statistical inference about linear functions of random variables, even if those random variables are defined through nonlinear models. For example, the parameter estimator $\hat{\boldsymbol{\theta}}$ might be derived in a nonlinear model, but this does not prevent statistical questions from being raised that can be expressed through linear functions of $\boldsymbol{\theta}$; for example,

$$H_0\colon \left\{ \begin{array}{l} \theta_1 - 2\theta_2 = 0 \\ \theta_2 - \theta_3 = 0 \end{array} \right.$$

If $\mathbf{A}$ is a matrix of constants and $\mathbf{Y}$ is a random vector, then

$$\mathrm{E}[\mathbf{Y}'\mathbf{A}\mathbf{Y}] = \mathrm{trace}(\mathbf{A}\,\mathrm{Var}[\mathbf{Y}]) + \mathrm{E}[\mathbf{Y}]'\,\mathbf{A}\,\mathrm{E}[\mathbf{Y}]$$
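
The expectation of a quadratic form can be checked in the same way. The following Python sketch assumes a small discrete distribution and a square matrix `A`, both hypothetical and chosen only for illustration. It compares the directly enumerated value of $\mathrm{E}[\mathbf{Y}'\mathbf{A}\mathbf{Y}]$ with the trace expression.

```python
from fractions import Fraction

# Assumed distribution of Y = (Y1, Y2)': each outcome maps to its probability.
outcomes = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}
A = [[2, 1], [0, 3]]  # constant square matrix; it need not be symmetric
dim = 2


def bilinear(u, A, v):
    """Return the scalar u' A v."""
    return sum(u[i] * A[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


mean = [sum(p * y[i] for y, p in outcomes.items()) for i in range(dim)]
var = [[sum(p * (y[i] - mean[i]) * (y[j] - mean[j])
            for y, p in outcomes.items())
        for j in range(dim)] for i in range(dim)]

# Left side: E[Y'AY] by enumerating the outcomes.
lhs = sum(p * bilinear(y, A, y) for y, p in outcomes.items())

# Right side: trace(A Var[Y]) + E[Y]' A E[Y].
trace_term = sum(A[i][k] * var[k][i] for i in range(dim) for k in range(dim))
rhs = trace_term + bilinear(mean, A, mean)

print(lhs, rhs)
```
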
Last updated: December 09, 2022