V. Multivariate Linear Regression

A. The Basic Principle

We consider the multivariate extension of multiple linear regression: modeling the relationship between $m$ responses $Y_1, \ldots, Y_m$ and a single set of $r$ predictor variables $z_1, \ldots, z_r$. Each of the $m$ responses is assumed to follow its own regression model, i.e.,

$$Y_1 = \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2 + \cdots + \beta_{r1}z_r$$
$$Y_2 = \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2 + \cdots + \beta_{r2}z_r$$
$$\vdots$$
$$Y_m = \beta_{0m} + \beta_{1m}z_1 + \beta_{2m}z_2 + \cdots + \beta_{rm}z_r$$

where the error vector $\boldsymbol{\varepsilon} = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m]'$ satisfies

$$\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0} \quad \text{and} \quad \mathrm{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Sigma}$$

Conceptually, we can let

$$\mathbf{z}_j' = [z_{j0}, z_{j1}, \ldots, z_{jr}]$$

denote the values of the predictor variables for the $j$th trial and

$$\mathbf{Y}_j = \begin{bmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jm} \end{bmatrix}, \quad \boldsymbol{\varepsilon}_j = \begin{bmatrix} \varepsilon_{j1} \\ \varepsilon_{j2} \\ \vdots \\ \varepsilon_{jm} \end{bmatrix}$$

be the responses and errors for the $j$th trial. Thus we have an $n \times (r + 1)$ design matrix

$$\mathbf{Z} = \begin{bmatrix} z_{10} & z_{11} & \cdots & z_{1r} \\ z_{20} & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ z_{n0} & z_{n1} & \cdots & z_{nr} \end{bmatrix}$$

and an $n \times m$ matrix of errors, which can be viewed column-wise or row-wise:

$$\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\ \vdots & \vdots & & \vdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm} \end{bmatrix} = \left[\boldsymbol{\varepsilon}_{(1)} \mid \boldsymbol{\varepsilon}_{(2)} \mid \cdots \mid \boldsymbol{\varepsilon}_{(m)}\right] = \begin{bmatrix} \boldsymbol{\varepsilon}_1' \\ \boldsymbol{\varepsilon}_2' \\ \vdots \\ \boldsymbol{\varepsilon}_n' \end{bmatrix}$$

If we now set

$$\mathbf{Y} = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1m} \\ Y_{21} & Y_{22} & \cdots & Y_{2m} \\ \vdots & \vdots & & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{nm} \end{bmatrix} = \left[\mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)}\right]$$

and

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \vdots & \vdots & & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rm} \end{bmatrix} = \left[\boldsymbol{\beta}_{(1)} \mid \boldsymbol{\beta}_{(2)} \mid \cdots \mid \boldsymbol{\beta}_{(m)}\right]$$

the multivariate linear regression model is

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

with $\mathrm{E}(\boldsymbol{\varepsilon}_{(i)}) = \mathbf{0}$ and

$$\mathrm{Cov}(\boldsymbol{\varepsilon}_{(i)}, \boldsymbol{\varepsilon}_{(k)}) = \sigma_{ik}\mathbf{I}, \quad i, k = 1, \ldots, m$$

Note also that the $m$ observed responses on the $j$th trial therefore have covariance matrix

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\ \vdots & \vdots & & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & \sigma_{mm} \end{bmatrix}$$

The ordinary least squares estimates are found in a manner analogous to the univariate case: we begin by taking, for each response,

$$\hat{\boldsymbol{\beta}}_{(i)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}_{(i)}$$

Collecting the univariate least squares estimates yields

$$\hat{\boldsymbol{\beta}} = \left[\hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)}\right] = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\left[\mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)}\right] = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$
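The equivalence of the column-by-column solves and the single matrix solve can be checked directly. Here is a minimal SAS/IML sketch with made-up toy numbers (not the example data used later in these notes):

proc iml;
/* hypothetical toy data: n = 4 trials, one predictor, m = 2 responses */
Z = {1 2, 1 4, 1 6, 1 8};          /* n x (r+1) design matrix */
Y = {3 10, 5 14, 6 19, 9 22};      /* n x m response matrix */
b1   = inv(t(Z)*Z) * t(Z) * Y[,1]; /* univariate OLS, response 1 */
b2   = inv(t(Z)*Z) * t(Z) * Y[,2]; /* univariate OLS, response 2 */
bAll = inv(t(Z)*Z) * t(Z) * Y;     /* multivariate OLS, both responses at once */
print b1 b2 bAll;                  /* the columns of bAll equal b1 and b2 */
quit;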

Now for any choice of parameters

$$\mathbf{B} = \left[\mathbf{b}_{(1)} \mid \mathbf{b}_{(2)} \mid \cdots \mid \mathbf{b}_{(m)}\right]$$

the resulting matrix of errors is $\mathbf{Y} - \mathbf{Z}\mathbf{B}$, and the resulting Error Sums of Squares and Crossproducts matrix is

$$(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B}) = \begin{bmatrix} (\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)})'(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}) & \cdots & (\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)})'(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}) \\ \vdots & \ddots & \vdots \\ (\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)})'(\mathbf{Y}_{(1)} - \mathbf{Z}\mathbf{b}_{(1)}) & \cdots & (\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)})'(\mathbf{Y}_{(m)} - \mathbf{Z}\mathbf{b}_{(m)}) \end{bmatrix}$$

We can show that the selection $\mathbf{b}_{(i)} = \hat{\boldsymbol{\beta}}_{(i)}$ minimizes the $i$th diagonal sum of squares $(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)})'(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)})$, so that both the trace $\mathrm{tr}\left[(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})\right]$ and the generalized variance $\left|(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})\right|$ are minimized.

Thus we have the matrix of predicted values

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$

and the resulting matrix of residuals

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y}$$

Note that the orthogonality conditions among the residuals, predicted values, and columns of the design matrix which hold in the univariate case also hold in the multivariate case, because

$$\mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right] = \mathbf{Z}' - \mathbf{Z}' = \mathbf{0}$$

This means the residuals are perpendicular to the columns of the design matrix, $\mathbf{Z}'\hat{\boldsymbol{\varepsilon}} = \mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y} = \mathbf{0}$, and to the predicted values,

$$\hat{\mathbf{Y}}'\hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{\beta}}'\mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y} = \mathbf{0}$$

Furthermore, because $\mathbf{Y} = \hat{\mathbf{Y}} + \hat{\boldsymbol{\varepsilon}}$, we have

$$\underbrace{\mathbf{Y}'\mathbf{Y}}_{\substack{\text{total sums of squares} \\ \text{and crossproducts}}} = \underbrace{\hat{\mathbf{Y}}'\hat{\mathbf{Y}}}_{\substack{\text{predicted sums of squares} \\ \text{and crossproducts}}} + \underbrace{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}_{\substack{\text{residual (error) sums of squares} \\ \text{and crossproducts}}}$$

Example: suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (overall quality and purchase intent):

Palatability    Texture    Overall Quality    Purchase Intent
      65           71             63                 67
      72           77             70                 70
      77           73             72                 70
      68           78             75                 72
      81           76             89                 88
      73           87             76                 77

Use these data to estimate the multivariate linear regression model in which palatability and texture are the independent variables while overall quality and purchase intent are the dependent variables.

We wish to estimate

$$Y_1 = \beta_{01} + \beta_{11}z_1 + \beta_{21}z_2 \quad \text{and} \quad Y_2 = \beta_{02} + \beta_{12}z_1 + \beta_{22}z_2$$

jointly, where $z_1$ = palatability, $z_2$ = texture, $Y_1$ = overall quality, and $Y_2$ = purchase intent.

The design matrix is

$$\mathbf{Z} = \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix}$$

so

$$\mathbf{Z}'\mathbf{Z} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix} = \begin{bmatrix} 6 & 436 & 462 \\ 436 & 31852 & 33591 \\ 462 & 33591 & 35728 \end{bmatrix}$$

and

$$(\mathbf{Z}'\mathbf{Z})^{-1} = \begin{bmatrix} 6 & 436 & 462 \\ 436 & 31852 & 33591 \\ 462 & 33591 & 35728 \end{bmatrix}^{-1} = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix}$$

Then

$$\mathbf{Z}'\mathbf{y}_{(1)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 63 \\ 70 \\ 72 \\ 75 \\ 89 \\ 76 \end{bmatrix} = \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}$$

so

$$\hat{\boldsymbol{\beta}}_{(1)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}_{(1)} = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix} \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix} = \begin{bmatrix} -37.501205460 \\ 1.134583728 \\ 0.379499410 \end{bmatrix}$$

and, similarly,

$$\mathbf{Z}'\mathbf{y}_{(2)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 67 \\ 70 \\ 70 \\ 72 \\ 88 \\ 77 \end{bmatrix} = \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}$$

so

$$\hat{\boldsymbol{\beta}}_{(2)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}_{(2)} = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix} \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix} = \begin{bmatrix} -21.432293350 \\ 0.940880634 \\ 0.351449792 \end{bmatrix}$$

Thus

$$\hat{\boldsymbol{\beta}} = \left[\hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)}\right] = \begin{bmatrix} -37.501205460 & -21.432293350 \\ 1.134583728 & 0.940880634 \\ 0.379499410 & 0.351449792 \end{bmatrix}$$

This gives us the matrix of estimated values

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix} \begin{bmatrix} -37.501205460 & -21.432293350 \\ 1.134583728 & 0.940880634 \\ 0.379499410 & 0.351449792 \end{bmatrix} = \begin{bmatrix} 63.19119 & 64.67788 \\ 73.41028 & 73.37275 \\ 77.56520 & 76.67135 \\ 69.25144 & 69.96067 \\ 83.24203 & 81.48922 \\ 78.33986 & 77.82812 \end{bmatrix}$$

and the matrix of residuals

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \begin{bmatrix} 63 & 67 \\ 70 & 70 \\ 72 & 70 \\ 75 & 72 \\ 89 & 88 \\ 76 & 77 \end{bmatrix} - \begin{bmatrix} 63.19119 & 64.67788 \\ 73.41028 & 73.37275 \\ 77.56520 & 76.67135 \\ 69.25144 & 69.96067 \\ 83.24203 & 81.48922 \\ 78.33986 & 77.82812 \end{bmatrix} = \begin{bmatrix} -0.191194960 & 2.322116943 \\ -3.410277515 & -3.372746244 \\ -5.565198512 & -6.671350244 \\ 5.748557985 & 2.039326498 \\ 5.757968347 & 6.510777845 \\ -2.339855345 & -0.828124797 \end{bmatrix}$$

Note that each column sums to zero!
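The entire fit can be reproduced in a few matrix statements. Here is a minimal SAS/IML sketch of the formulas above applied to the example data (the PROC GLM code at the end of these notes is the production-quality way to do this):

proc iml;
Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87}; /* design matrix */
Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};             /* y1 = quality, y2 = intent */
betaHat = inv(t(Z)*Z) * t(Z) * Y;       /* least squares estimates */
Yhat    = Z * betaHat;                  /* fitted values */
resid   = Y - Yhat;                     /* residuals */
colSums = t(resid) * j(nrow(Y), 1, 1);  /* both column sums are zero (up to rounding) */
orth    = t(Z) * resid;                 /* Z'e = 0: residuals orthogonal to columns of Z */
print betaHat, Yhat, resid, colSums, orth;
quit;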

B. Inference in Multivariate Regression

The least squares estimators $\hat{\boldsymbol{\beta}} = \left[\hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)}\right]$ of the multivariate regression model have the following properties if the model is of full rank, i.e., $\mathrm{rank}(\mathbf{Z}) = r + 1 < n$:

- $\mathrm{E}(\hat{\boldsymbol{\beta}}_{(i)}) = \boldsymbol{\beta}_{(i)}$, i.e., $\mathrm{E}(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$

- $\mathrm{Cov}(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}) = \sigma_{ik}(\mathbf{Z}'\mathbf{Z})^{-1}, \quad i, k = 1, \ldots, m$

- $\mathrm{E}(\hat{\boldsymbol{\varepsilon}}) = \mathbf{0}$ and $\mathrm{E}\left(\dfrac{1}{n - r - 1}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}\right) = \boldsymbol{\Sigma}$

Note that $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\varepsilon}}$ are also uncorrelated.

This means that, for any observation $\mathbf{z}_0$,

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}} = \mathbf{z}_0'\left[\hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)}\right] = \left[\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(1)} \mid \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(m)}\right]$$

is an unbiased estimator, i.e., $\mathrm{E}(\mathbf{z}_0'\hat{\boldsymbol{\beta}}) = \mathbf{z}_0'\boldsymbol{\beta}$.

We can also determine from these properties that the estimation errors $\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} - \mathbf{z}_0'\boldsymbol{\beta}_{(i)}$ have covariances

$$\mathrm{E}\left[\mathbf{z}_0'(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)})(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)})'\mathbf{z}_0\right] = \mathbf{z}_0'\,\mathrm{E}\left[(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)})(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)})'\right]\mathbf{z}_0 = \sigma_{ik}\,\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0$$

Furthermore, we can easily ascertain that $\hat{\mathbf{Y}}_0' = \mathbf{z}_0'\hat{\boldsymbol{\beta}}$, i.e., the forecasted vector $\mathbf{Y}_0$ associated with the values of the predictor variables $\mathbf{z}_0$ is estimated unbiasedly. The forecast errors have covariance

$$\mathrm{E}\left[(Y_{0i} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)})(Y_{0k} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(k)})\right] = \sigma_{ik}\left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)$$

Thus, for the multivariate regression model with full rank, i.e., $\mathrm{rank}(\mathbf{Z}) = r + 1$, $n \geq r + 1 + m$, and normally distributed errors $\boldsymbol{\varepsilon}$,

$$\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$

is the maximum likelihood estimator of $\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}}$ is normally distributed with mean $\boldsymbol{\beta}$ and covariances

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}) = \sigma_{ik}(\mathbf{Z}'\mathbf{Z})^{-1}, \quad i, k = 1, \ldots, m$$

Also, $\hat{\boldsymbol{\beta}}$ is independent of the maximum likelihood estimator of the positive definite matrix $\boldsymbol{\Sigma}$, which is given by

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}} = \frac{1}{n}(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})$$

with

$$n\hat{\boldsymbol{\Sigma}} \sim W_{m,\,n-r-1}(\boldsymbol{\Sigma})$$

i.e., a Wishart distribution with $n - r - 1$ degrees of freedom. All of this provides additional support for using the least squares estimates: when the errors are normally distributed, $\hat{\boldsymbol{\beta}}$ and $n^{-1}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$ are the maximum likelihood estimators of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$, respectively.

These results can be used to develop likelihood ratio tests for the multivariate regression parameters.

The hypothesis that the responses do not depend on the predictor variables $z_{q+1}, z_{q+2}, \ldots, z_r$ is

$$H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0} \quad \text{where} \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix}$$

with $\boldsymbol{\beta}_{(1)}$ of dimension $(q + 1) \times m$ and $\boldsymbol{\beta}_{(2)}$ of dimension $(r - q) \times m$. If we partition $\mathbf{Z}$ in a similar manner,

$$\mathbf{Z} = \left[\mathbf{Z}_1 \mid \mathbf{Z}_2\right]$$

where $\mathbf{Z}_1$ is $n \times (q + 1)$ and $\mathbf{Z}_2$ is $n \times (r - q)$,

we can write the general model as

$$\mathrm{E}(\mathbf{Y}) = \mathbf{Z}\boldsymbol{\beta} = \left[\mathbf{Z}_1 \mid \mathbf{Z}_2\right]\begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix} = \mathbf{Z}_1\boldsymbol{\beta}_{(1)} + \mathbf{Z}_2\boldsymbol{\beta}_{(2)}$$

The extra sum of squares and crossproducts associated with $\boldsymbol{\beta}_{(2)}$ is

$$(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)})'(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}) - (\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}) = n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}})$$

where

$$\hat{\boldsymbol{\beta}}_{(1)} = (\mathbf{Z}_1'\mathbf{Z}_1)^{-1}\mathbf{Z}_1'\mathbf{Y} \quad \text{and} \quad \hat{\boldsymbol{\Sigma}}_1 = \frac{1}{n}(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)})'(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)})$$

The likelihood ratio for the test of the hypothesis $H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}$ is given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}} L(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma})}{\max_{\boldsymbol{\beta}, \boldsymbol{\Sigma}} L(\boldsymbol{\beta}, \boldsymbol{\Sigma})} = \frac{L(\hat{\boldsymbol{\beta}}_{(1)}, \hat{\boldsymbol{\Sigma}}_1)}{L(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})} = \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}$$

~

which is often converted to Wilks’ Lambda statistic

~

ˆ

ˆ

2 n

1

ΣΛ =

Σ

Finally, for the multivariate regression model with full rank, i.e., $\mathrm{rank}(\mathbf{Z}) = r + 1$, $n \geq r + 1 + m$, normally distributed errors $\boldsymbol{\varepsilon}$, and a true null hypothesis (so that $n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}) \sim W_{m,\,r-q}(\boldsymbol{\Sigma})$),

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right) \sim \chi^2_{m(r-q)}$$

approximately, when $n - r$ and $n - m$ are both large.

If we again refer to the Error Sum of Squares and Crossproducts as $\mathbf{E} = n\hat{\boldsymbol{\Sigma}}$ and the Hypothesis Sum of Squares and Crossproducts as $\mathbf{H} = n(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}})$, then we can define Wilks' lambda as

$$\Lambda^{2/n} = \frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \prod_{i=1}^{s}\frac{1}{1 + \eta_i}$$

where $\eta_1 \geq \eta_2 \geq \cdots \geq \eta_s$ are the ordered eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$ and $s = \min(m, r - q)$.

There are other similar tests (as we have seen in our discussion of MANOVA):

- Pillai's Trace: $\displaystyle\sum_{i=1}^{s}\frac{\eta_i}{1 + \eta_i} = \mathrm{tr}\left[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\right]$

- Hotelling-Lawley Trace: $\displaystyle\sum_{i=1}^{s}\eta_i = \mathrm{tr}\left[\mathbf{H}\mathbf{E}^{-1}\right]$

- Roy's Greatest Root: $\dfrac{\eta_1}{1 + \eta_1}$

Each of these statistics is an alternative to Wilks' lambda, and they perform in a very similar manner (particularly for large sample sizes).
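All four criteria are simple functions of E and H. The following minimal SAS/IML sketch computes them via determinants and traces rather than an explicit eigendecomposition, using the E and H matrices derived for the palatability test in the example that follows:

proc iml;
/* E and H for the palatability hypothesis, taken from the example below */
E = {114.31302415 99.335143683, 99.335143683 108.5094298};
H = {214.96186763 178.26225891, 178.26225891 147.82823253};
wilks  = det(E) / det(E + H);    /* prod 1/(1+eta_i)    = 0.34533534 */
pillai = trace(H * inv(E + H));  /* sum eta_i/(1+eta_i) = 0.65466466 */
hl     = trace(H * inv(E));      /* sum eta_i           = 1.89573606 */
/* s = min(m, r-q) = 1 here, so the single nonzero eigenvalue is eta1 = hl */
roy    = hl / (1 + hl);          /* eta1/(1+eta1); note PROC GLM prints eta1 itself */
print wilks pillai hl roy;
quit;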

Example: for our previous data (six sample observations on two independent variables, palatability and texture, and two dependent variables, overall quality and purchase intent):

Palatability    Texture    Overall Quality    Purchase Intent
      65           71             63                 67
      72           77             70                 70
      77           73             72                 70
      68           78             75                 72
      81           76             89                 88
      73           87             76                 77

use these data to test the hypotheses that i) palatability has no joint relationship with purchase intent and overall quality, and ii) texture has no joint relationship with purchase intent and overall quality.

We first test the hypothesis that palatability has no joint relationship with purchase intent and overall quality, i.e., $H_0: \boldsymbol{\beta}_{(1)} = \mathbf{0}$, where $\boldsymbol{\beta}_{(1)} = [\beta_{11} \;\; \beta_{12}]$ is the row of coefficients on palatability. The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(2)}, \boldsymbol{\Sigma}} L(\boldsymbol{\beta}_{(2)}, \boldsymbol{\Sigma})}{\max_{\boldsymbol{\beta}, \boldsymbol{\Sigma}} L(\boldsymbol{\beta}, \boldsymbol{\Sigma})} = \frac{L(\hat{\boldsymbol{\beta}}_{(2)}, \hat{\boldsymbol{\Sigma}}_2)}{L(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})} = \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_2|}\right)^{n/2}$$

where $\hat{\boldsymbol{\Sigma}}_2$ is the maximum likelihood estimator of $\boldsymbol{\Sigma}$ in the reduced model that omits palatability. For ease of computation, we'll use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_2|} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$

The error sum of squares and crossproducts matrix is

$$\mathbf{E} = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 214.96186763 & 178.26225891 \\ 178.26225891 & 147.82823253 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 329.27489178 & 277.59740259 \\ 277.59740259 & 256.33766233 \end{vmatrix}} = \frac{2536.570299}{7345.238098} = 0.34533534$$

The transformation to a Chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_2|}\right) = -\left[6 - 2 - 1 - \frac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.34533534) = 2.12648$$

At $\alpha = 0.01$ and $m(r - q) = 2$ degrees of freedom, the critical value is 9.210351, so we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.345335; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).
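A minimal SAS/IML sketch of this test (natural logs throughout; PROBCHI is the built-in chi-square CDF):

proc iml;
E = {114.31302415 99.335143683, 99.335143683 108.5094298};
H = {214.96186763 178.26225891, 178.26225891 147.82823253};
n = 6; r = 2; m = 2; q = 1;
lambda = det(E) / det(E + H);                               /* 0.34533534 */
chisq  = -(n - r - 1 - 0.5*(m - r + q + 1)) * log(lambda);  /* 2.1265 */
pvalue = 1 - probchi(chisq, m*(r - q));                     /* 0.3453, which happens to
                                                               match the exact F-test
                                                               p-value in the SAS output
                                                               below */
print lambda chisq pvalue;
quit;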

We next test the hypothesis that texture has no joint relationship with purchase intent and overall quality, i.e., $H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}$, where $\boldsymbol{\beta}_{(2)} = [\beta_{21} \;\; \beta_{22}]$ is the row of coefficients on texture. The likelihood ratio for the test of this hypothesis is again given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}} L(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma})}{\max_{\boldsymbol{\beta}, \boldsymbol{\Sigma}} L(\boldsymbol{\beta}, \boldsymbol{\Sigma})} = \frac{L(\hat{\boldsymbol{\beta}}_{(1)}, \hat{\boldsymbol{\Sigma}}_1)}{L(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}})} = \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}$$

where $\hat{\boldsymbol{\Sigma}}_1$ is the maximum likelihood estimator of $\boldsymbol{\Sigma}$ in the reduced model that omits texture. For ease of computation, we'll use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$

The error sum of squares and crossproducts matrix is

$$\mathbf{E} = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 21.872015222 & 20.255407498 \\ 20.255407498 & 18.758286731 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 136.18503937 & 119.59055118 \\ 119.59055118 & 127.26771653 \end{vmatrix}} = \frac{2536.570299}{3030.059055} = 0.837135598$$

The transformation to a Chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right) = -\left[6 - 2 - 1 - \frac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.837135598) = 0.35554$$

At $\alpha = 0.01$ and $m(r - q) = 2$ degrees of freedom, the critical value is 9.210351, so again we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.837136; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).
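The same sketch applies here, swapping in the texture hypothesis matrix:

proc iml;
E = {114.31302415 99.335143683, 99.335143683 108.5094298};
H = {21.872015222 20.255407498, 20.255407498 18.758286731}; /* SSCP for texture */
lambda = det(E) / det(E + H);   /* 0.837135598 */
chisq  = -2 * log(lambda);      /* multiplier n-r-1-(m-r+q+1)/2 = 2, as before */
pvalue = 1 - probchi(chisq, 2); /* m(r-q) = 2 df: chisq = 0.3555, p = 0.8371 */
print lambda chisq pvalue;
quit;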

SAS code for a Multivariate Linear Regression Analysis:

OPTIONS LINESIZE = 72 NODATE PAGENO = 1;
DATA stuff;
INPUT z1 z2 y1 y2;
LABEL z1='Palatability Rating'
      z2='Texture Rating'
      y1='Overall Quality Rating'
      y2='Purchase Intent';
CARDS;
65 71 63 67
72 77 70 70
77 73 72 70
68 78 75 72
81 76 89 88
73 87 76 77
;
PROC GLM DATA=stuff;
MODEL y1 y2 = z1 z2/;
MANOVA H=z1 z2/PRINTE PRINTH;
TITLE4 'Using PROC GLM for Multivariate Linear Regression';
RUN;

SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y1 Overall Quality Rating

                                       Sum of
Source                     DF         Squares     Mean Square    F Value    Pr > F
Model                       2     256.5203092     128.2601546       3.37    0.1711
Error                       3     114.3130241      38.1043414
Corrected Total             5     370.8333333

R-Square     Coeff Var      Root MSE       y1 Mean
0.691740      8.322973      6.172871      74.16667

Source                     DF       Type I SS     Mean Square    F Value    Pr > F
z1                          1     234.6482940     234.6482940       6.16    0.0891
z2                          1      21.8720152      21.8720152       0.57    0.5037

Source                     DF     Type III SS     Mean Square    F Value    Pr > F
z1                          1     214.9618676     214.9618676       5.64    0.0980
z2                          1      21.8720152      21.8720152       0.57    0.5037

Dependent Variable: y1 Overall Quality Rating

                                 Standard
Parameter         Estimate          Error    t Value    Pr > |t|
Intercept     -37.50120546    48.82448511      -0.77      0.4984
z1              1.13458373     0.47768661       2.38      0.0980
z2              0.37949941     0.50090335       0.76      0.5037

SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y2 Purchase Intent

                                       Sum of
Source                     DF         Squares     Mean Square    F Value    Pr > F
Model                       2     181.4905702      90.7452851       2.51    0.2289
Error                       3     108.5094298      36.1698099
Corrected Total             5     290.0000000

R-Square     Coeff Var      Root MSE       y2 Mean
0.625830      8.127208      6.014134      74.00000

Source                     DF       Type I SS     Mean Square    F Value    Pr > F
z1                          1     162.7322835     162.7322835       4.50    0.1241
z2                          1      18.7582867      18.7582867       0.52    0.5235

Source                     DF     Type III SS     Mean Square    F Value    Pr > F
z1                          1     147.8282325     147.8282325       4.09    0.1364
z2                          1      18.7582867      18.7582867       0.52    0.5235

Dependent Variable: y2 Purchase Intent

                                 Standard
Parameter         Estimate          Error    t Value    Pr > |t|
Intercept     -21.43229335    47.56894895      -0.45      0.6829
z1              0.94088063     0.46540276       2.02      0.1364
z2              0.35144979     0.48802247       0.72      0.5235

SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

E = Error SSCP Matrix
                y1              y2
y1    114.31302415    99.335143683
y2    99.335143683     108.5094298

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3          y1          y2
y1        1.000000    0.891911
                        0.1081
y2        0.891911    1.000000
            0.1081

SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z1
                y1              y2
y1    214.96186763    178.26225891
y2    178.26225891    147.82823253

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z1   E = Error SSCP Matrix

Characteristic               Characteristic Vector  V'EV=1
          Root    Percent             y1             y2
    1.89573606     100.00     0.10970859    -0.01905206
    0.00000000       0.00    -0.17533407     0.21143084

MANOVA Test Criteria and Exact F Statistics for the
Hypothesis of No Overall z1 Effect
H = Type III SSCP Matrix for z1   E = Error SSCP Matrix

                          S=1    M=0    N=0
Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.34533534       1.90         2         2    0.3453
Pillai's Trace              0.65466466       1.90         2         2    0.3453
Hotelling-Lawley Trace      1.89573606       1.90         2         2    0.3453
Roy's Greatest Root         1.89573606       1.90         2         2    0.3453
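As a cross-check (a worked equation added here, not part of the SAS output): with $s = 1$ there is a single nonzero characteristic root, $\eta_1 = 1.89573606$, so Wilks' lambda is $1/(1 + \eta_1) = 1/2.89573606 = 0.34533534$, exactly the determinant ratio computed by hand above. Note that PROC GLM reports Roy's greatest root as $\eta_1$ itself rather than as $\eta_1/(1 + \eta_1)$.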

SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z2
                y1              y2
y1    21.872015222    20.255407498
y2    20.255407498    18.758286731

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z2   E = Error SSCP Matrix

Characteristic               Characteristic Vector  V'EV=1
          Root    Percent             y1             y2
    0.19454961     100.00     0.06903935     0.02729059
    0.00000000       0.00    -0.19496558     0.21052601

MANOVA Test Criteria and Exact F Statistics for the
Hypothesis of No Overall z2 Effect
H = Type III SSCP Matrix for z2   E = Error SSCP Matrix

                          S=1    M=0    N=0
Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.83713560       0.19         2         2    0.8371
Pillai's Trace              0.16286440       0.19         2         2    0.8371
Hotelling-Lawley Trace      0.19454961       0.19         2         2    0.8371
Roy's Greatest Root         0.19454961       0.19         2         2    0.8371
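The same cross-check applies here: $1/(1 + 0.19454961) = 0.83713560$, matching the hand computation of Wilks' lambda for the texture hypothesis.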

We can also build confidence intervals for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. If the model

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

has normal errors, then

$$\hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\left(\boldsymbol{\beta}'\mathbf{z}_0, \;\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\,\boldsymbol{\Sigma}\right) \quad \text{and} \quad n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}(\boldsymbol{\Sigma}), \;\text{independent}$$

so

$$T^2 = \frac{(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0)'\left(\dfrac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1}(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0)}{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}$$

satisfies

$$P\left[(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0)'\left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1}(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0) \leq \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\left[\frac{m(n-r-1)}{n-r-m}\right]F_{m,\,n-r-m}(\alpha)\right] = 1 - \alpha$$

Thus the $100(1 - \alpha)\%$ confidence region for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$, i.e., for $\boldsymbol{\beta}'\mathbf{z}_0$, is given by the inequality above, and the $100(1 - \alpha)\%$ simultaneous confidence intervals for the mean values of the individual responses $Y_i$ associated with $\mathbf{z}_0$, i.e., for $\mathbf{z}_0'\boldsymbol{\beta}_{(i)}$, are

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \pm \sqrt{\frac{m(n-r-1)}{n-r-m}F_{m,\,n-r-m}(\alpha)}\,\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\left(\frac{n}{n-r-1}\right)\hat{\sigma}_{ii}}, \quad i = 1, \ldots, m$$
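As an illustration, here is a SAS/IML sketch of these simultaneous intervals at the hypothetical new point z0' = [1, 73, 77] (palatability 73, texture 77; this point is an assumption for illustration, not part of the original example):

proc iml;
Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87};
Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};
n = 6; r = 2; m = 2;
betaHat  = inv(t(Z)*Z) * t(Z) * Y;
SigmaHat = t(Y - Z*betaHat) * (Y - Z*betaHat) / n;     /* MLE of Sigma */
z0 = {1, 73, 77};                                      /* hypothetical new point */
center = t(z0) * betaHat;                              /* estimated mean responses */
c  = sqrt( m*(n-r-1)/(n-r-m) * finv(0.95, m, n-r-m) ); /* upper 5% point of F(2,2) */
h  = t(z0) * inv(t(Z)*Z) * z0;                         /* z0'(Z'Z)^{-1} z0 */
hw = c * sqrt( h * (n/(n-r-1)) * vecdiag(SigmaHat) );  /* half-widths, one per response */
lower = center - t(hw);
upper = center + t(hw);
print center, lower, upper;
quit;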

Finally, we can build prediction intervals for the predicted value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. Here the prediction error is $\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0$, and if the model $\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ has normal errors, then

$$\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\left(\mathbf{0}, \;\left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)\boldsymbol{\Sigma}\right) \quad \text{and} \quad n\hat{\boldsymbol{\Sigma}} \sim W_{n-r-1}(\boldsymbol{\Sigma}), \;\text{independent}$$

so, by the same $T^2$ argument as above,

$$P\left[(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0)'\left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1}(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0) \leq \left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)\left[\frac{m(n-r-1)}{n-r-m}\right]F_{m,\,n-r-m}(\alpha)\right] = 1 - \alpha$$

Thus the $100(1 - \alpha)\%$ prediction region for $\mathbf{Y}_0$ associated with $\mathbf{z}_0$ is given by the inequality above, and the $100(1 - \alpha)\%$ simultaneous prediction intervals for the individual responses associated with $\mathbf{z}_0$ are

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \pm \sqrt{\frac{m(n-r-1)}{n-r-m}F_{m,\,n-r-m}(\alpha)}\,\sqrt{\left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)\left(\frac{n}{n-r-1}\right)\hat{\sigma}_{ii}}, \quad i = 1, \ldots, m$$
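The corresponding prediction intervals at the same hypothetical z0 differ from the confidence intervals only in the extra "1 +" term, which accounts for the new trial's own error:

proc iml;
/* continuing the confidence-interval sketch above at the same hypothetical z0 */
Z = {1 65 71, 1 72 77, 1 77 73, 1 68 78, 1 81 76, 1 73 87};
Y = {63 67, 70 70, 72 70, 75 72, 89 88, 76 77};
n = 6; r = 2; m = 2;
betaHat  = inv(t(Z)*Z) * t(Z) * Y;
SigmaHat = t(Y - Z*betaHat) * (Y - Z*betaHat) / n;
z0 = {1, 73, 77};
c  = sqrt( m*(n-r-1)/(n-r-m) * finv(0.95, m, n-r-m) );
h  = t(z0) * inv(t(Z)*Z) * z0;
hw = c * sqrt( (1 + h) * (n/(n-r-1)) * vecdiag(SigmaHat) ); /* wider than the CI */
lower = t(z0)*betaHat - t(hw);
upper = t(z0)*betaHat + t(hw);
print lower, upper;
quit;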