V. Multivariate Linear Regression

A. The Basic Principle
We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y1,…,Ym and a single set of r predictor variables z1,…,zr. Each of the m responses is assumed to follow its own regression model, i.e.,
Y1 = β01 + β11z1 + β21z2 + … + βr1zr + ε1
Y2 = β02 + β12z1 + β22z2 + … + βr2zr + ε2
⋮
Ym = β0m + β1mz1 + β2mz2 + … + βrmzr + εm

where the error vector ε = [ε1, ε2, …, εm]' satisfies

E[ε] = 0,  Var[ε] = Σ
Conceptually, we can let

zj' = [zj0, zj1, …, zjr]

denote the values of the predictor variables for the jth trial and

Yj = [Yj1, Yj2, …, Yjm]'  and  εj = [εj1, εj2, …, εjm]'

be the responses and errors for the jth trial. Thus we have an n x (r + 1) design matrix

Z =  [ z10  z11  …  z1r ]
     [ z20  z21  …  z2r ]
     [  ⋮    ⋮        ⋮  ]
     [ zn0  zn1  …  znr ]
If we now set

Y =  [ Y11  Y12  …  Y1m ]  =  [ Y(1) | Y(2) | … | Y(m) ]      (an n x m matrix)
     [ Y21  Y22  …  Y2m ]
     [  ⋮    ⋮        ⋮  ]
     [ Yn1  Yn2  …  Ynm ]

β =  [ β01  β02  …  β0m ]  =  [ β(1) | β(2) | … | β(m) ]      (an (r + 1) x m matrix)
     [ β11  β12  …  β1m ]
     [  ⋮    ⋮        ⋮  ]
     [ βr1  βr2  …  βrm ]

and

ε =  [ ε11  ε12  …  ε1m ]  =  [ ε(1) | ε(2) | … | ε(m) ]      (an n x m matrix)
     [ ε21  ε22  …  ε2m ]
     [  ⋮    ⋮        ⋮  ]
     [ εn1  εn2  …  εnm ]

the multivariate linear regression model is

Y = Zβ + ε

with

E[ε(i)] = 0  and  Cov[ε(i), ε(k)] = σik I,  i, k = 1, …, m

Note also that the m observed responses on the jth trial have covariance matrix

Σ =  [ σ11  σ12  …  σ1m ]
     [ σ21  σ22  …  σ2m ]
     [  ⋮    ⋮        ⋮  ]
     [ σm1  σm2  …  σmm ]
The ordinary least squares estimates are found in a manner analogous to the univariate case – we begin by taking

β̂(i) = (Z'Z)⁻¹Z'Y(i)

Collecting the univariate least squares estimates yields

β̂ = [ β̂(1) | β̂(2) | … | β̂(m) ] = (Z'Z)⁻¹Z'[ Y(1) | Y(2) | … | Y(m) ] = (Z'Z)⁻¹Z'Y
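As a quick numerical check, the stacked estimator (Z'Z)⁻¹Z'Y can be compared against m separate univariate fits. Below is a minimal numpy sketch on synthetic data (the dimensions and variable names are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n = 50 trials, r = 3 predictors, m = 2 responses.
n, r, m = 50, 3, 2
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, r))])  # n x (r + 1) design matrix
beta_true = rng.normal(size=(r + 1, m))
Y = Z @ beta_true + 0.1 * rng.normal(size=(n, m))

# Multivariate OLS: beta_hat = (Z'Z)^-1 Z'Y -- each column is the
# univariate least squares fit for the corresponding response.
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)

# Fitting each response separately gives the same coefficients.
beta_cols = np.column_stack(
    [np.linalg.lstsq(Z, Y[:, i], rcond=None)[0] for i in range(m)]
)
```

Solving m univariate problems and stacking the columns is exactly the multivariate estimator, which is why the fit can be computed one response at a time.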
Now for any choice of parameters

B = [ b(1) | b(2) | … | b(m) ]

the resulting matrix of errors is

Y − ZB

The resulting Error Sums of Squares and Crossproducts matrix is

(Y − ZB)'(Y − ZB) =  [ (Y(1) − Zb(1))'(Y(1) − Zb(1))  …  (Y(1) − Zb(1))'(Y(m) − Zb(m)) ]
                     [              ⋮                                   ⋮               ]
                     [ (Y(m) − Zb(m))'(Y(1) − Zb(1))  …  (Y(m) − Zb(m))'(Y(m) − Zb(m)) ]

We can show that the selection b(i) = β̂(i) minimizes the ith diagonal sum of squares

(Y(i) − Zb(i))'(Y(i) − Zb(i))

i.e.,

tr[ (Y − ZB)'(Y − ZB) ]  and  | (Y − ZB)'(Y − ZB) |

(the generalized variance) are both minimized.
So we have the matrix of predicted values

Ŷ = Zβ̂ = Z(Z'Z)⁻¹Z'Y

and the resulting matrix of residuals

ε̂ = Y − Ŷ = [ I − Z(Z'Z)⁻¹Z' ]Y

Note that the orthogonality conditions among residuals, predicted values, and columns of the design matrix which hold in the univariate case are also true in the multivariate case, because

Z'[ I − Z(Z'Z)⁻¹Z' ] = Z' − Z' = 0

which means the residuals are perpendicular to the columns of the design matrix,

Z'ε̂ = Z'[ I − Z(Z'Z)⁻¹Z' ]Y = 0

and to the predicted values,

Ŷ'ε̂ = β̂'Z'[ I − Z(Z'Z)⁻¹Z' ]Y = 0

Furthermore, because

Y = Ŷ + ε̂

we have

Y'Y = Ŷ'Ŷ + ε̂'ε̂

i.e., the total sums of squares and crossproducts decompose into the predicted sums of squares and crossproducts plus the residual (error) sums of squares and crossproducts.
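These orthogonality conditions and the resulting SSCP decomposition are easy to verify numerically. A small numpy sketch with synthetic data (all dimensions here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 30, 2, 2
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, r))])
Y = Z @ rng.normal(size=(r + 1, m)) + rng.normal(size=(n, m))

Hat = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # hat matrix Z(Z'Z)^-1 Z'
Y_hat = Hat @ Y                            # predicted values
resid = Y - Y_hat                          # residuals (I - Hat)Y

# Residuals are orthogonal to the design matrix and to the predicted values...
assert np.allclose(Z.T @ resid, 0)
assert np.allclose(Y_hat.T @ resid, 0)
# ...so the total SSCP splits into predicted SSCP plus residual SSCP.
assert np.allclose(Y.T @ Y, Y_hat.T @ Y_hat + resid.T @ resid)
```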
Example – suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (purchase intent and overall quality):
Palatability   Texture   Overall Quality   Purchase Intent
     65           71            63                67
     72           77            70                70
     77           73            72                70
     68           78            75                72
     81           76            89                88
     73           87            76                77
Use these data to estimate the multivariate linear regression model for which palatability and texture are the independent variables and purchase intent and overall quality are the dependent variables. We wish to estimate

Y1 = β01 + β11z1 + β21z2

and

Y2 = β02 + β12z1 + β22z2

jointly.
The design matrix is

Z =  [ 1  65  71 ]
     [ 1  72  77 ]
     [ 1  77  73 ]
     [ 1  68  78 ]
     [ 1  81  76 ]
     [ 1  73  87 ]

so

Z'Z =  [   6    436    462 ]
       [ 436  31852  33591 ]
       [ 462  33591  35728 ]

and

(Z'Z)⁻¹ =  [ 62.560597030  -0.378268027  -0.453330568 ]
           [ -0.378268027   0.005988412  -0.000738830 ]
           [ -0.453330568  -0.000738830   0.006584661 ]
and

Z'y(1) =  [   445 ]        Z'y(2) =  [   444 ]
          [ 32536 ]                  [ 32430 ]
          [ 34345 ]                  [ 34260 ]

so

β̂(1) = (Z'Z)⁻¹Z'y(1) =  [ -37.501205460 ]
                        [   1.134583728 ]
                        [   0.379499410 ]

and

β̂(2) = (Z'Z)⁻¹Z'y(2) =  [ -21.432293350 ]
                        [   0.940880634 ]
                        [   0.351449792 ]

so

β̂ = [ β̂(1) | β̂(2) ] =  [ -37.501205460  -21.432293350 ]
                       [   1.134583728    0.940880634 ]
                       [   0.379499410    0.351449792 ]
This gives us the matrix of estimated values

Ŷ = Zβ̂ =  [ 63.19119  64.67788 ]
          [ 73.41028  73.37275 ]
          [ 77.56520  76.67135 ]
          [ 69.25144  69.96067 ]
          [ 83.24203  81.48922 ]
          [ 78.33986  77.82812 ]

and the matrix of residuals

ε̂ = Y − Ŷ =  [ 63  67 ]   [ 63.19119  64.67788 ]    [ -0.191194960   2.322116943 ]
             [ 70  70 ]   [ 73.41028  73.37275 ]    [ -3.410277515  -3.372746244 ]
             [ 72  70 ] − [ 77.56520  76.67135 ] =  [ -5.565198512  -6.671350244 ]
             [ 75  72 ]   [ 69.25144  69.96067 ]    [  5.748557985   2.039326498 ]
             [ 89  88 ]   [ 83.24203  81.48922 ]    [  5.757968347   6.510777845 ]
             [ 76  77 ]   [ 78.33986  77.82812 ]    [ -2.339855345  -0.828124797 ]

Note that each column of residuals sums to zero!
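The hand computation above can be reproduced in a few lines of numpy; the coefficient values in the assertions are the ones derived in the notes:

```python
import numpy as np

# Example data: palatability (z1), texture (z2), overall quality (y1), purchase intent (y2).
z1 = [65, 72, 77, 68, 81, 73]
z2 = [71, 77, 73, 78, 76, 87]
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

Z = np.column_stack([np.ones(6), z1, z2])        # design matrix with intercept
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)     # (Z'Z)^-1 Z'Y
resid = Y - Z @ beta_hat

# Coefficients match the hand computation in the notes.
assert np.allclose(beta_hat[:, 0], [-37.501205460, 1.134583728, 0.379499410])
assert np.allclose(beta_hat[:, 1], [-21.432293350, 0.940880634, 0.351449792])
# Each residual column sums to zero because the model includes an intercept.
assert np.allclose(resid.sum(axis=0), 0)
```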
B. Inference in Multivariate Regression

The least squares estimators

β̂ = [ β̂(1) | β̂(2) | … | β̂(m) ]

of the multivariate regression model have the following properties:

- E[β̂(i)] = β(i), i.e., E[β̂] = β
- Cov[β̂(i), β̂(k)] = σik(Z'Z)⁻¹, i, k = 1, …, m
- E[ε̂] = 0 and E[ ε̂'ε̂ / (n − r − 1) ] = Σ

if the model is of full rank, i.e., rank(Z) = r + 1 < n. Note that β̂ and ε̂ are also uncorrelated.
This means that, for any observation z0,

z0'β̂ = z0'[ β̂(1) | β̂(2) | … | β̂(m) ] = [ z0'β̂(1) | z0'β̂(2) | … | z0'β̂(m) ]

is an unbiased estimator, i.e.,

E[z0'β̂] = z0'β

We can also determine from these properties that the estimation errors

z0'β̂(i) − z0'β(i)

have covariances

E[ z0'(β̂(i) − β(i))(β̂(k) − β(k))'z0 ] = z0' E[ (β̂(i) − β(i))(β̂(k) − β(k))' ] z0 = σik z0'(Z'Z)⁻¹z0
Furthermore, we can easily ascertain that

Ŷ0 = z0'β̂

i.e., the forecasted vector Ŷ0 associated with the values of the predictor variables z0 is an unbiased estimator of Y0. The forecast errors have covariance

E[ (Y0i − z0'β̂(i))(Y0k − z0'β̂(k)) ] = σik( 1 + z0'(Z'Z)⁻¹z0 )
Thus, for the multivariate regression model with full rank(Z) = r + 1, n ≥ r + 1 + m, and normally distributed errors ε,

β̂ = (Z'Z)⁻¹Z'Y

is the maximum likelihood estimator of β, and β̂ is normally distributed with

E[β̂] = β  and  Cov[β̂(i), β̂(k)] = σik(Z'Z)⁻¹,  i, k = 1, …, m

Also, the maximum likelihood estimator of β is independent of the maximum likelihood estimator of the positive definite matrix Σ, which is given by

Σ̂ = (1/n)ε̂'ε̂ = (1/n)(Y − Zβ̂)'(Y − Zβ̂)

with

nΣ̂ ~ Wp,n−r−1(Σ)

(a Wishart distribution with n − r − 1 degrees of freedom). All of this provides additional support for using the least squares estimates – when the errors are normally distributed, β̂ and n⁻¹ε̂'ε̂ are the maximum likelihood estimators of β and Σ.
These results can be used to develop likelihood ratio tests for the multivariate regression parameters.
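For the running example, the two natural estimators of Σ (the MLE with divisor n, and the unbiased version with divisor n − r − 1) can be sketched as follows; the error SSCP entries used in the assertions appear in the SAS output later in the notes:

```python
import numpy as np

# Same example data as above (y1 = overall quality, y2 = purchase intent).
Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],    # palatability
                     [71, 77, 73, 78, 76, 87]])   # texture
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

n, r = 6, 2
resid = Y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)

Sigma_mle = resid.T @ resid / n               # MLE (biased)
Sigma_unb = resid.T @ resid / (n - r - 1)     # unbiased: divisor n - r - 1

# n * Sigma_mle is the error SSCP matrix E reported in the SAS output.
E = n * Sigma_mle
assert np.allclose(E[0, 0], 114.31302415)
assert np.allclose(E[0, 1], 99.335143683)
assert np.allclose(E[1, 1], 108.5094298)
```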
The hypothesis that the responses do not depend on the predictor variables zq+1, zq+2, …, zr is

H0: β(2) = 0  where  β =  [ β(1) ]     (q + 1) x m
                          [ β(2) ]     (r − q) x m

If we partition Z in a similar manner,

Z = [ Z1 | Z2 ]

with Z1 of dimension n x (q + 1) and Z2 of dimension n x (r − q), we can write the general model as

E[Y] = Zβ = [ Z1 | Z2 ] [ β(1) ] = Z1β(1) + Z2β(2)
                        [ β(2) ]

The extra sums of squares and crossproducts associated with β(2) are

(Y − Z1β̂(1))'(Y − Z1β̂(1)) − (Y − Zβ̂)'(Y − Zβ̂) = n(Σ̂1 − Σ̂)

where

β̂(1) = (Z1'Z1)⁻¹Z1'Y

and

Σ̂1 = n⁻¹(Y − Z1β̂(1))'(Y − Z1β̂(1))
The likelihood ratio for the test of the hypothesis

H0: β(2) = 0

is given by the ratio of generalized variances

Λ = [ max over β(1),Σ of L(β(1), Σ) ] / [ max over β,Σ of L(β, Σ) ] = ( |Σ̂| / |Σ̂1| )^(n/2)

which is often converted to Wilks' lambda statistic

Λ^(2/n) = |Σ̂| / |Σ̂1|

Finally, for the multivariate regression model with full rank(Z) = r + 1, n ≥ r + 1 + m, and normally distributed errors ε, when the null hypothesis is true (so that n(Σ̂1 − Σ̂) ~ Wp,r−q(Σ)),

−[ n − r − 1 − (1/2)(m − r + q + 1) ] ln( |Σ̂| / |Σ̂1| ) ~ χ² with m(r − q) degrees of freedom

when n − r and n − m are both large.
If we again refer to the Error Sum of Squares and Crossproducts as

E = nΣ̂

and the Hypothesis Sum of Squares and Crossproducts as

H = n(Σ̂1 − Σ̂)

then we can define Wilks' lambda as

Λ^(2/n) = |Σ̂| / |Σ̂1| = |E| / |E + H| = ∏ (i = 1 to s) 1 / (1 + ηi)

where η1 ≥ η2 ≥ … ≥ ηs are the ordered eigenvalues of HE⁻¹ and s = min(p, r − q).

There are other similar tests (as we have seen in our discussion of MANOVA):

Pillai's Trace = tr[ H(H + E)⁻¹ ] = ∑ (i = 1 to s) ηi / (1 + ηi)

Hotelling-Lawley Trace = tr[ HE⁻¹ ] = ∑ (i = 1 to s) ηi

Roy's Greatest Root = η1 / (1 + η1)

Each of these statistics is an alternative to Wilks' lambda, and they perform in a very similar manner (particularly for large sample sizes).
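All four criteria can be computed from the eigenvalues of HE⁻¹. Below is a numpy sketch using the E and H matrices from the palatability test in the example that follows (note one convention difference: SAS reports η1 itself, 1.89573606, as Roy's greatest root, while the formula above reports η1/(1 + η1)):

```python
import numpy as np

# E and H for the test of no palatability (z1) effect, from the example below.
E = np.array([[114.31302415, 99.335143683],
              [99.335143683, 108.5094298]])
H = np.array([[214.96186763, 178.26225891],
              [178.26225891, 147.82823253]])

# Ordered eigenvalues eta_1 >= ... >= eta_s of E^-1 H
# (same eigenvalues as H E^-1).
eta = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

wilks = np.prod(1.0 / (1.0 + eta))          # Wilks' lambda
pillai = np.sum(eta / (1.0 + eta))          # Pillai's trace
hotelling_lawley = np.sum(eta)              # Hotelling-Lawley trace
roys_root = eta[0] / (1.0 + eta[0])         # Roy's greatest root (as defined above)

# These agree with the SAS MANOVA output reported later in the notes.
assert np.isclose(wilks, 0.34533534)
assert np.isclose(pillai, 0.65466466)
assert np.isclose(hotelling_lawley, 1.89573606)
```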
Example – for our previous data (six sample observations on two independent variables, palatability and texture, and two dependent variables, overall quality and purchase intent):

Palatability   Texture   Overall Quality   Purchase Intent
     65           71            63                67
     72           77            70                70
     77           73            72                70
     68           78            75                72
     81           76            89                88
     73           87            76                77

use these data to test the hypotheses that (i) palatability has no joint relationship with purchase intent and overall quality, and (ii) texture has no joint relationship with purchase intent and overall quality.
We first test the hypothesis that palatability has no joint relationship with purchase intent and overall quality, i.e.,

H0: β(1) = 0

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

Λ = [ max over β(2),Σ of L(β(2), Σ) ] / [ max over β,Σ of L(β, Σ) ] = ( |Σ̂| / |Σ̂2| )^(n/2)

where Σ̂2 comes from the reduced model that omits palatability. For ease of computation, we'll use the Wilks' lambda statistic

Λ^(2/n) = |Σ̂| / |Σ̂2| = |E| / |E + H|
The error sum of squares and crossproducts matrix is

E =  [ 114.31302415  99.335143683 ]
     [ 99.335143683  108.5094298  ]

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

H =  [ 214.96186763  178.26225891 ]
     [ 178.26225891  147.82823253 ]

so the calculated value of the Wilks' lambda statistic is

Λ^(2/n) = |E| / |E + H| = 2536.570299 / 7345.238098 = 0.34533534
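The determinant ratio is straightforward to verify; here is a numpy sketch using the E matrix above together with the H matrices for both tests in this example:

```python
import numpy as np

E = np.array([[114.31302415, 99.335143683],
              [99.335143683, 108.5094298]])

# H for the palatability test and for the texture test (from the notes).
H_z1 = np.array([[214.96186763, 178.26225891],
                 [178.26225891, 147.82823253]])
H_z2 = np.array([[21.872015222, 20.255407498],
                 [20.255407498, 18.758286731]])

def wilks_lambda(E, H):
    """Wilks' lambda statistic |E| / |E + H|."""
    return np.linalg.det(E) / np.linalg.det(E + H)

assert np.isclose(wilks_lambda(E, H_z1), 0.34533534)    # palatability test
assert np.isclose(wilks_lambda(E, H_z2), 0.837135598)   # texture test
```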
The transformation to a Chi-square distributed statistic (which is actually valid only when n – r and n – m are both large) is

−[ n − r − 1 − (1/2)(m − r + q + 1) ] ln( |Σ̂| / |Σ̂2| )
= −[ 6 − 2 − 1 − (1/2)(2 − 2 + 1 + 1) ] ln(0.34533534)
= 2.126478

At α = 0.01 and m(r − q) = 2(1) = 2 degrees of freedom, the critical value is 9.210340, so we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.345335 – note that this is an extremely gross approximation (since n – r = 4 and n – m = 4).
We next test the hypothesis that texture has no joint relationship with purchase intent and overall quality, i.e.,

H0: β(2) = 0

The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances

Λ = [ max over β(1),Σ of L(β(1), Σ) ] / [ max over β,Σ of L(β, Σ) ] = ( |Σ̂| / |Σ̂1| )^(n/2)

where Σ̂1 comes from the reduced model that omits texture. For ease of computation, we'll use the Wilks' lambda statistic

Λ^(2/n) = |Σ̂| / |Σ̂1| = |E| / |E + H|
The error sum of squares and crossproducts matrix is

E =  [ 114.31302415  99.335143683 ]
     [ 99.335143683  108.5094298  ]

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

H =  [ 21.872015222  20.255407498 ]
     [ 20.255407498  18.758286731 ]

so the calculated value of the Wilks' lambda statistic is

Λ^(2/n) = |E| / |E + H| = 2536.570299 / 3030.059055 = 0.837135598
The transformation to a Chi-square distributed statistic (which is actually valid only when n – r and n – m are both large) is

−[ n − r − 1 − (1/2)(m − r + q + 1) ] ln( |Σ̂| / |Σ̂1| )
= −[ 6 − 2 − 1 − (1/2)(2 − 2 + 1 + 1) ] ln(0.837135598)
= 0.355539

At α = 0.01 and m(r − q) = 2(1) = 2 degrees of freedom, the critical value is 9.210340, so we have a strong non-rejection. Also, the approximate p-value of this chi-square test is 0.837136 – note that this is an extremely gross approximation (since n – r = 4 and n – m = 4).
OPTIONS LINESIZE = 72 NODATE PAGENO = 1;
DATA stuff;
INPUT z1 z2 y1 y2;
LABEL z1='Palatability Rating' z2='Texture Rating'
      y1='Overall Quality Rating' y2='Purchase Intent';
CARDS;
65 71 63 67
72 77 70 70
77 73 72 70
68 78 75 72
81 76 89 88
73 87 76 77
;
PROC GLM DATA=stuff;
MODEL y1 y2 = z1 z2/;
MANOVA H=z1 z2/PRINTE PRINTH;
TITLE4 'Using PROC GLM for Multivariate Linear Regression';
RUN;
SAS code for a Multivariate Linear Regression Analysis:
Dependent Variable: y1   Overall Quality Rating

                               Sum of
Source            DF          Squares     Mean Square    F Value    Pr > F
Model              2      256.5203092     128.2601546       3.37    0.1711
Error              3      114.3130241      38.1043414
Corrected Total    5      370.8333333

R-Square    Coeff Var    Root MSE     y1 Mean
0.691740     8.322973    6.172871    74.16667

Source    DF      Type I SS    Mean Square    F Value    Pr > F
z1         1    234.6482940    234.6482940       6.16    0.0891
z2         1     21.8720152     21.8720152       0.57    0.5037

Source    DF    Type III SS    Mean Square    F Value    Pr > F
z1         1    214.9618676    214.9618676       5.64    0.0980
z2         1     21.8720152     21.8720152       0.57    0.5037

Dependent Variable: y1   Overall Quality Rating

                               Standard
Parameter        Estimate         Error    t Value    Pr > |t|
Intercept    -37.50120546   48.82448511      -0.77      0.4984
z1             1.13458373    0.47768661       2.38      0.0980
z2             0.37949941    0.50090335       0.76      0.5037
SAS output for a Multivariate Linear Regression Analysis:
Dependent Variable: y2   Purchase Intent

                               Sum of
Source            DF          Squares     Mean Square    F Value    Pr > F
Model              2      181.4905702      90.7452851       2.51    0.2289
Error              3      108.5094298      36.1698099
Corrected Total    5      290.0000000

R-Square    Coeff Var    Root MSE     y2 Mean
0.625830     8.127208    6.014134    74.00000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
z1         1    162.7322835    162.7322835       4.50    0.1241
z2         1     18.7582867     18.7582867       0.52    0.5235

Source    DF    Type III SS    Mean Square    F Value    Pr > F
z1         1    147.8282325    147.8282325       4.09    0.1364
z2         1     18.7582867     18.7582867       0.52    0.5235

Dependent Variable: y2   Purchase Intent

                               Standard
Parameter        Estimate         Error    t Value    Pr > |t|
Intercept    -21.43229335   47.56894895      -0.45      0.6829
z1             0.94088063    0.46540276       2.02      0.1364
z2             0.35144979    0.48802247       0.72      0.5235
The GLM Procedure
Multivariate Analysis of Variance

E = Error SSCP Matrix
              y1              y2
y1    114.31302415    99.335143683
y2    99.335143683     108.5094298

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3          y1          y2
y1        1.000000    0.891911
                        0.1081
y2        0.891911    1.000000
            0.1081
The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z1
              y1              y2
y1    214.96186763    178.26225891
y2    178.26225891    147.82823253

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

Characteristic              Characteristic Vector  V'EV=1
          Root    Percent            y1             y2
    1.89573606     100.00    0.10970859    -0.01905206
    0.00000000       0.00   -0.17533407     0.21143084

MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of No Overall z1 Effect
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

S=1  M=0  N=0
Statistic                     Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda            0.34533534       1.90         2         2    0.3453
Pillai's Trace           0.65466466       1.90         2         2    0.3453
Hotelling-Lawley Trace   1.89573606       1.90         2         2    0.3453
Roy's Greatest Root      1.89573606       1.90         2         2    0.3453
The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for z2
              y1              y2
y1    21.872015222    20.255407498
y2    20.255407498    18.758286731

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

Characteristic              Characteristic Vector  V'EV=1
          Root    Percent            y1             y2
    0.19454961     100.00    0.06903935     0.02729059
    0.00000000       0.00   -0.19496558     0.21052601

MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of No Overall z2 Effect
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

S=1  M=0  N=0
Statistic                     Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda            0.83713560       0.19         2         2    0.8371
Pillai's Trace           0.16286440       0.19         2         2    0.8371
Hotelling-Lawley Trace   0.19454961       0.19         2         2    0.8371
Roy's Greatest Root      0.19454961       0.19         2         2    0.8371
We can also build confidence intervals for the predicted mean value of Y0 associated with z0. If the model

Y = Zβ + ε

has normal errors, then

z0'β̂ ~ Nm( z0'β, z0'(Z'Z)⁻¹z0 Σ )

and

nΣ̂ ~ Wn−r−1(Σ)

are independent, so

T² = [ (z0'β̂ − z0'β) / √(z0'(Z'Z)⁻¹z0) ]' [ (n/(n − r − 1)) Σ̂ ]⁻¹ [ (z0'β̂ − z0'β) / √(z0'(Z'Z)⁻¹z0) ]

has Hotelling's T² distribution. Thus the 100(1 − α)% confidence region for the predicted mean value of Y0 associated with z0 (i.e., for β'z0) is given by

(z0'β̂ − z0'β)' [ (n/(n − r − 1)) Σ̂ ]⁻¹ (z0'β̂ − z0'β) ≤ z0'(Z'Z)⁻¹z0 [ m(n − r − 1)/(n − r − m) ] Fm,n−r−m(α)

and the 100(1 − α)% simultaneous confidence intervals for the mean values of the Yi associated with z0 (the z0'β(i)) are

z0'β̂(i) ± √( [ m(n − r − 1)/(n − r − m) ] Fm,n−r−m(α) ) √( z0'(Z'Z)⁻¹z0 (n/(n − r − 1)) σ̂ii ),   i = 1, …, m
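These simultaneous confidence intervals can be sketched in numpy/scipy for the running example. The evaluation point z0 = [1, 70, 75] is a hypothetical (palatability, texture) profile chosen for illustration, not one taken from the notes:

```python
import numpy as np
from scipy.stats import f

# Example data: design matrix with intercept, palatability, texture.
Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],
                     [71, 77, 73, 78, 76, 87]])
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)
n, r, m = 6, 2, 2
alpha = 0.05

ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_hat = ZtZ_inv @ Z.T @ Y
resid = Y - Z @ beta_hat
Sigma_hat = resid.T @ resid / n              # MLE of Sigma

z0 = np.array([1.0, 70.0, 75.0])             # hypothetical new profile
center = z0 @ beta_hat                       # z0' beta_hat, one entry per response

# Half-width: sqrt( m(n-r-1)/(n-r-m) * F ) * sqrt( z0'(Z'Z)^-1 z0 * n/(n-r-1) * sigma_ii )
f_crit = f.ppf(1 - alpha, m, n - r - m)
scale = m * (n - r - 1) / (n - r - m) * f_crit
half = np.sqrt(scale * (z0 @ ZtZ_inv @ z0) * (n / (n - r - 1)) * np.diag(Sigma_hat))

for i, name in enumerate(["overall quality", "purchase intent"]):
    print(f"{name}: {center[i]:.2f} +/- {half[i]:.2f}")
```

With n = 6 and m = 2 the F critical value F(2, 2) is large, so these intervals are wide; the machinery matters more than the particular numbers here.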
Finally, we can build prediction intervals for the predicted value of Y0 associated with z0. Here the prediction error is Y0 − z0'β̂. Again, if the model

Y = Zβ + ε

has normal errors, then z0'β̂ ~ Nm( z0'β, z0'(Z'Z)⁻¹z0 Σ ) and nΣ̂ ~ Wn−r−1(Σ) are independent, so the 100(1 − α)% prediction region for Y0 associated with z0 is given by

(Y0 − z0'β̂)' [ (n/(n − r − 1)) Σ̂ ]⁻¹ (Y0 − z0'β̂) ≤ ( 1 + z0'(Z'Z)⁻¹z0 ) [ m(n − r − 1)/(n − r − m) ] Fm,n−r−m(α)

and the 100(1 − α)% simultaneous prediction intervals for the Yi associated with z0 are

z0'β̂(i) ± √( [ m(n − r − 1)/(n − r − m) ] Fm,n−r−m(α) ) √( ( 1 + z0'(Z'Z)⁻¹z0 ) (n/(n − r − 1)) σ̂ii ),   i = 1, …, m
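Computationally, the only change from the confidence-interval sketch is the extra "1 +" inside the variance factor, which is why prediction intervals are always wider. A short sketch (again using the hypothetical profile z0 = [1, 70, 75]):

```python
import numpy as np
from scipy.stats import f

Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],
                     [71, 77, 73, 78, 76, 87]])
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)
n, r, m, alpha = 6, 2, 2, 0.05

ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_hat = ZtZ_inv @ Z.T @ Y
Sigma_hat = (Y - Z @ beta_hat).T @ (Y - Z @ beta_hat) / n

z0 = np.array([1.0, 70.0, 75.0])             # hypothetical new profile
scale = m * (n - r - 1) / (n - r - m) * f.ppf(1 - alpha, m, n - r - m)

# Prediction half-width uses (1 + z0'(Z'Z)^-1 z0); the confidence
# half-width uses z0'(Z'Z)^-1 z0 alone.
half_pred = np.sqrt(scale * (1 + z0 @ ZtZ_inv @ z0) * (n / (n - r - 1)) * np.diag(Sigma_hat))
half_conf = np.sqrt(scale * (z0 @ ZtZ_inv @ z0) * (n / (n - r - 1)) * np.diag(Sigma_hat))

assert np.all(half_pred > half_conf)         # prediction intervals are wider
```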