High-dimensional covariance matrix estimation with applications to microarray studies and portfolio optimization
Esa Ollila
Department of Signal Processing and Acoustics, Aalto University, Finland
June 7th, 2018, Gipsa-Lab
Covariance estimation problem
x : p-variate (centered) random vector (p large)
x1, . . . , xn : i.i.d. realizations of x.
Problem: find an estimate Σ̂ of the positive definite covariance matrix
Σ = E[(x − E[x])(x − E[x])^⊤] ∈ S^{p×p}_{++}.
The sample covariance matrix (SCM),
S = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)^⊤,
is the most commonly used estimator of Σ.
Challenges in high dimensions (HD):
1. Insufficient sample support (ISS), p > n ⇒ an estimate of Σ⁻¹ cannot be computed from S!
2. Low sample support (LSS), i.e., p of the same magnitude as n ⇒ Σ is estimated with a lot of error.
3. Outliers or heavy-tailed non-Gaussian data.
Why covariance estimation?
The most important applications:
- portfolio selection
- clustering / discriminant analysis
- graphical models
- PCA
- radar detection
Graphical models
Gaussian graphical model
n-dimensional Gaussian vector
x = (x1, . . . , xn) ∼ N(0, Σ).
x_i, x_j are conditionally independent (given the rest of x) if
(Σ⁻¹)_{ij} = 0.
Modeled as an undirected graph with n nodes; the edge {i, j} is absent if (Σ⁻¹)_{ij} = 0.
Example with 5 nodes (1–5):
Σ⁻¹ =
⎡ • • 0 • • ⎤
⎢ • • • 0 • ⎥
⎢ 0 • • • 0 ⎥
⎢ • 0 • • 0 ⎥
⎣ • • 0 0 • ⎦
(• marks a nonzero entry; node pairs with (Σ⁻¹)_{ij} = 0 have no edge.)
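As a small illustration (not from the slides), the graph can be read directly off the sparsity pattern of Σ⁻¹; a minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def graph_from_precision(Theta, tol=1e-12):
    """Adjacency matrix of a Gaussian graphical model:
    edge {i, j} is present iff (Sigma^{-1})_{ij} != 0 (up to a tolerance)."""
    A = np.abs(Theta) > tol        # nonzero pattern of the precision matrix
    np.fill_diagonal(A, False)     # no self-loops
    return A
```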
Bias-variance trade-off
Frobenius norm: ‖A‖²_F = tr(A^⊤A) (= tr(A²) for symmetric A).
Any estimator Σ̂ ∈ S^{p×p}_{++} of Σ satisfies
MSE(Σ̂) = var(Σ̂) + bias(Σ̂),
where MSE(Σ̂) ≜ E[‖Σ̂ − Σ‖²_F] and bias(Σ̂) denotes the squared bias ‖E[Σ̂] − Σ‖²_F.
Suppose Σ̂ = S. Then, since S is unbiased, i.e., bias(S) = ‖E[S] − Σ‖²_F = 0, one has that
MSE(S) = var(S) (= tr{cov(vec(S))}).
✗ However, S is not applicable for high-dimensional problems:
  n ≈ p ⇒ var(S) is very large;
  p > n or p ≫ n ⇒ S is singular (non-invertible).
✓ Use an estimator Σ̂ = S_β that shrinks S towards a structure (e.g., a scaled identity matrix) using a tuning (shrinkage) parameter β:
  MSE(Σ̂) can be reduced by introducing some bias!
  Positive definiteness of Σ̂ can be ensured!
Bias-variance trade-off (cont'd)
Regularized SCM (RSCM) à la Ledoit and Wolf:
S_β = βS + (1 − β)[tr(S)/p] I,
where β > 0 denotes the shrinkage (regularization) parameter.
[Figure: total error, variance, and bias² as functions of the tuning parameter.]
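A minimal NumPy sketch of the RSCM formula above (the function name rscm is illustrative; β is supplied by the user here, while the point of this talk is how to choose it optimally):

```python
import numpy as np

def rscm(X, beta):
    """Regularized SCM: S_beta = beta*S + (1 - beta)*(tr(S)/p)*I.

    X : (n, p) data matrix with observations as rows; beta in [0, 1].
    """
    p = X.shape[1]
    S = np.cov(X, rowvar=False)      # SCM with the 1/(n-1) normalization
    eta = np.trace(S) / p            # target scale: mean of the eigenvalues of S
    return beta * S + (1.0 - beta) * eta * np.eye(p)
```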
This talk: how to choose β optimally under random sampling from an elliptical distribution, plus applications.
Esa Ollila. Optimal high-dimensional shrinkage covariance estimation for elliptical distributions. In EUSIPCO'17, Kos, Greece, 2017, pp. 1689–1693.
Esa Ollila and Elias Raninen. Optimal shrinkage covariance matrix estimator under random sampling from elliptical distributions. On arXiv soon.
M. N. Tabassum and E. Ollila. Compressive regularized discriminant analysis of high-dimensional data with applications to microarray studies. In ICASSP'18, Calgary, Canada, 2018, pp. 4204–4208.
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Modern portfolio theory (MPT)
A mathematical framework for portfolio allocation that balances the return-risk trade-off (Markowitz, 1952).
Important inputs by several Nobel prize winners.*
A portfolio consists of p assets, e.g.:
- equity securities (stocks), market indexes
- fixed-income securities (e.g., government or corporate bonds), commodities (e.g., oil, gold), ETFs
- currencies (exchange rates), derivatives, . . .
To use MPT one needs to estimate the mean vector µ and the covariance matrix Σ of asset returns.
✗ Often p, the number of assets, is larger than (or of similar magnitude as) n, the number of historical returns
⇒ standard plug-in estimates are not applicable in MPT.
*Nobel prize winners: James Tobin (1981), Harry Markowitz (1990), William F. Sharpe (1990), and Eugene F. Fama (2013).
Basic definitions
Portfolio weights at (discrete) time index t:
w_t = (w_{t,1}, . . . , w_{t,p})^⊤ s.t. ∑_{i=1}^p w_{t,i} = 1^⊤w_t = 1.
Let C_{i,t} > 0 be the price of the i-th asset at time t.
The net return of the i-th asset over one interval is
r_{i,t} = (C_{i,t} − C_{i,t−1})/C_{i,t−1} = C_{i,t}/C_{i,t−1} − 1 ∈ [−1, ∞).
Single-period net returns of the p assets form a p-variate vector
r_t = (r_{1,t}, . . . , r_{p,t})^⊤.
The portfolio net return at period t + 1 is
R_{t+1} = w_t^⊤ r_{t+1} = ∑_{i=1}^p w_{t,i} r_{i,t+1}.
Basic definitions (cont’d)
Given historical returns {r_t}_{t=1}^n, assume
µ = E[r_t] and Σ ≡ cov(r_t) = E[(r_t − µ)(r_t − µ)^⊤]
hold for all t (so the index t can be dropped).
Let r denote the (random vector of) returns; then the expected return of the portfolio is
µ_w = E[R] = w^⊤µ
and the variance (or risk) of the portfolio is
σ²_w = var(R) = w^⊤Σw.
The standard deviation σ_w = √var(R) is called the volatility.
Global minimum variance portfolio (GMVP) optimization problem:
minimize_{w ∈ R^p} w^⊤Σw subject to 1^⊤w = 1.
The solution is w_o = Σ⁻¹1 / (1^⊤Σ⁻¹1).
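A sketch of the GMVP solution in code, assuming Σ is replaced by an invertible estimate such as S_β (the function name is illustrative):

```python
import numpy as np

def gmvp_weights(Sigma_hat):
    """GMVP weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma_hat.shape[0])
    w = np.linalg.solve(Sigma_hat, ones)   # Sigma^{-1} 1 without forming the inverse
    return w / w.sum()                     # enforce 1'w = 1
```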
S&P 500 and Nasdaq-100 indexes for year 2017
[Figure: S&P 500 and NASDAQ-100 index prices and daily net returns over 2017 (Jan 3 – Dec 29).]
Are historical returns Gaussian?
QQ-plots and estimated pdfs
[Figure: QQ-plots of quantiles of the input sample against standard normal quantiles, and estimated pdfs, for S&P 500 returns, NASDAQ-100 returns, and an N(0,1) reference sample.]
Are historical returns Gaussian (cont’d)?
Scatter plot and estimated 99%, 95%, and 50% tolerance ellipses
[Figure: scatter plot of daily returns with Gaussian tolerance ellipses.]
Inside the 50% ellipse: 65.6% of returns; inside the 95% and 99% ellipses: 93.2% and 95.6% of returns, respectively.
Stocks are unpredictable (and non-Gaussian)!
[Figure: portfolio performance with shorting allowed vs. long-only, 2017.]
TECH stocks (Facebook, Apple, Amazon, Microsoft, Google) dropped drastically (in seconds) due to a "fat finger" trade or an automated trade.
Real data analysis
We apply the GMVP to real stock data consisting of daily net returns computed from dividend-adjusted daily closing prices.
Data sets:
1. p = 45 stocks in the Hang Seng Index (HSI), Jan. 4, 2010 – Dec. 24, 2011
2. p = 50 stocks in HSI, Jan. 1, 2016 – Dec. 27, 2017
3. p = 396 stocks in the S&P 500 index, Jan. 4, 2016 – Apr. 27, 2018
Sliding window method (sketched in code below):
1. At day t, we use the previous n days to estimate Σ and w.
2. Portfolio returns are then computed for the following 20 days.
3. The window is shifted 20 trading days forward, and new weights and portfolio returns for another 20 days are computed.
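A minimal sketch of this sliding-window protocol, assuming a returns matrix R with one row per trading day; the default covariance estimator here is the plain SCM, whereas the talk plugs in a regularized estimate so that the solve stays well-posed when p is close to n (names are illustrative):

```python
import numpy as np

def sliding_window_backtest(R, n, hold=20, cov_estimator=None):
    """Out-of-sample GMVP returns under the sliding-window protocol.

    R : (T, p) daily net returns; n : estimation window length;
    hold : holding period in trading days (20 in the slides).
    """
    if cov_estimator is None:
        cov_estimator = lambda W: np.cov(W, rowvar=False)  # plain SCM; use an RSCM when p ~ n
    p, out, t = R.shape[1], [], n
    while t + hold <= R.shape[0]:
        Sigma_hat = cov_estimator(R[t - n:t])     # estimate from the previous n days
        w = np.linalg.solve(Sigma_hat, np.ones(p))
        w /= w.sum()                              # GMVP weights
        out.extend(R[t:t + hold] @ w)             # realized returns for the next `hold` days
        t += hold                                 # shift the window forward
    return np.array(out)
```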
Results for HSI
Jan 4, 2010 – Dec 24, 2011 (left) and Jan 1, 2016 – Dec 27, 2017 (right)
[Figure: annualized realized risk and shrinkage parameter β as functions of the window length n, for Ell1-RSCM, Ell2-RSCM, and LW-RSCM.]
Results for S&P 500
p = 396 stocks in the S&P 500 index, Jan. 4, 2016 – Apr. 27, 2018
[Figure: annualized realized risk and shrinkage parameter β vs. window length n, for Ell1-RSCM, Ell2-RSCM, and LW-RSCM.]
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Regularized SCM and MMSE estimator
Minimum mean squared error (MMSE) estimator S_{α_o,β_o}:
(α_o, β_o) = argmin_{α,β>0} E[‖βS + αI − Σ‖²_F].
✗ (α_o, β_o) will depend on the true unknown Σ ⇒ we need to estimate (α_o, β_o).
Ledoit and Wolf (2004) developed popular consistent estimators (α̂^LW_o, β̂^LW_o) under the random matrix theory (RMT) regime.
Chen et al. (2010) assumed Gaussianity to improve estimation accuracy.
We develop estimators of (α_o, β_o) under the RMT regime when sampling from an unspecified elliptically symmetric distribution.
Important statistics
Scale measure:
η = tr(Σ)/p = mean of the eigenvalues.
Sphericity measure:
γ = p tr(Σ²)/tr(Σ)² = (mean of the squared eigenvalues)/(mean of the eigenvalues)².
γ ∈ [1, p], and
γ = 1 iff Σ ∝ I;
γ = p iff rank(Σ) = 1.
Optimal shrinkage parameters
Result 1.
Assume finite 4th-order moments. The optimal shrinkage parameters are
β_o = ‖Σ − ηI‖²_F / (‖Σ − ηI‖²_F + MSE(S)) = p(γ − 1)η² / (E[tr(S²)] − pη²),
α_o = (1 − β_o)η,
and note that β_o ∈ [0, 1).
The MSE at the optimum is
E[‖β_oS + α_oI − Σ‖²_F] = ‖Σ − ηI‖²_F (1 − β_o).
Elliptically symmetric distributions
x ∼ E_p(µ, Σ, g) (assume finite 4th-order moments):
f(x) = C · |Σ|^{−1/2} g((x − µ)^⊤Σ⁻¹(x − µ)),
where g : [0, ∞) → [0, ∞) is the density generator, C is a normalizing constant, µ = E[x], and Σ = cov(x) ≻ 0.
Examples: x ∼ N_p(µ, Σ): g(t) = exp(−t/2); the t-distribution with ν > 4 degrees of freedom, x ∼ t_{p,ν}(0, Σ); . . .
Elliptical kurtosis parameter (Muirhead, 1982):
κ = E[‖Σ^{−1/2}(x − µ)‖⁴]/(p(p + 2)) − 1 = (1/3) · {kurtosis of x_i},
where kurtosis = E[(x_i − µ_i)⁴]/var(x_i)² − 3.
Covariance matrix of the SCM S
K_p := commutation matrix (i.e., K_p vec(A) = vec(A^⊤) ∀ A ∈ R^{p×p}).
Result. When sampling from E_p(µ, Σ, g):
cov(vec(S)) = (1/(n−1) + κ/n)(I + K_p)(Σ ⊗ Σ) + (κ/n) vec(Σ)vec(Σ)^⊤.
Special cases:
1. Case p = 1 (so S = s² (sample variance), Σ = σ², κ = (1/3)kurt(x)):
var(s²) = σ⁴{2/(n−1) + kurt(x)/n}.
2. Gaussian case (κ = 0):
cov(vec(S)) = (1/(n−1))(I + K_p)(Σ ⊗ Σ),
which is the expected result since S ∼ W_p(n − 1, (1/(n−1))Σ).
MMSE shrinkage parameters: elliptical case (cont'd)
Result. Optimal shrinkage parameters when F = E_p(µ, Σ, g):
α^Ell_o = (1 − β^Ell_o)η,
β^Ell_o = (γ − 1) / (γ − 1 + κ(2γ + p)/n + (γ + p)/(n − 1)),
where η = tr(Σ)/p and γ = p tr(Σ²)/tr(Σ)² are the scale and sphericity measures.
Note: β^Ell_o = β^Ell_o(γ, κ) depends on the unknown γ and κ.
Proof: simply use the earlier result and the fact that
MSE(S) = tr{cov(vec(S))}
(plus some calculus related to Kronecker products, traces, and the commutation matrix).
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Estimation of oracle parameters
Plug-in estimator: β̂^Ell_o = β^Ell_o(γ̂, κ̂), where γ̂ and κ̂ are preferably consistent estimators of γ and κ.
Ell-RSCM estimator:
β̂^Ell_o = min(1, max(0, (γ̂ − 1) / (γ̂ − 1 + κ̂(2γ̂ + p)/n + (γ̂ + p)/(n − 1)))),
α̂^Ell_o = (1 − β̂^Ell_o) tr(S)/p.
Note: the "min/max" is due to the fact that β_o ∈ [0, 1). A sketch in code follows.
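A sketch of the plug-in computation, assuming estimates γ̂ and κ̂ are already available (the following slides show how they are obtained):

```python
import numpy as np

def ell_rscm(S, gamma_hat, kappa_hat, n):
    """Ell-RSCM estimate from the SCM S and plug-in (gamma_hat, kappa_hat)."""
    p = S.shape[0]
    beta = (gamma_hat - 1.0) / (gamma_hat - 1.0
                                + kappa_hat * (2.0 * gamma_hat + p) / n
                                + (gamma_hat + p) / (n - 1.0))
    beta = min(1.0, max(0.0, beta))        # clip since beta_o lies in [0, 1)
    alpha = (1.0 - beta) * np.trace(S) / p
    return beta * S + alpha * np.eye(p)
```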
Estimate of elliptical kurtosis
A natural estimate of κ = (1/3)kurt(x_i):
κ̂ = max(−2/(p + 2), (1/(3p)) ∑_{j=1}^p K̂_j),
where K̂_j is the (cumulant-based) estimate of the kurtosis of x_j (Joanes and Gill, 1998):
K̂_j = (n − 1)/((n − 2)(n − 3)) · {(n + 1)k̂_j + 6},
and k̂_j is the sample-moment-based estimate of kurtosis.
Note: κ > −2/(p + 2) (Bentler and Berkane, 1986).
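A sketch of this kurtosis estimate in NumPy (the function name is illustrative; k̂_j is the usual moment-based excess kurtosis):

```python
import numpy as np

def elliptical_kurtosis(X):
    """Estimate kappa = (1/3) * average kurtosis with the corrections above.

    X : (n, p) data matrix with observations as rows.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    m2 = (Xc**2).mean(axis=0)
    m4 = (Xc**4).mean(axis=0)
    k = m4 / m2**2 - 3.0                                     # moment-based excess kurtosis k_j
    K = (n - 1.0) / ((n - 2.0) * (n - 3.0)) * ((n + 1.0) * k + 6.0)
    return max(-2.0 / (p + 2.0), K.mean() / 3.0)             # Bentler-Berkane lower bound
```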
Ell1-estimator of γ
The sample sign covariance matrix (Visuri et al., 2000) is defined as
Ŝ_sgn = (1/n) ∑_{i=1}^n (x_i − µ̂)(x_i − µ̂)^⊤ / ‖x_i − µ̂‖²,
where µ̂ = argmin_µ ∑_{i=1}^n ‖x_i − µ‖ (the spatial median).
Zhang and Wiesel (2016) proposed a sphericity statistic
γ̂_Ell1 = p tr(Ŝ²_sgn) − p/n
and showed that γ̂_Ell1 → γ under the assumptions:
n, p → ∞ and c = p/n → c₀, and the eigenvalues of Σ converge to a fixed spectrum [random matrix theory (RMT) regime];
as p → ∞, η_i = tr(Σ^i)/p → η_{0i}, for i = 1, . . . , 4.
Ell1-estimator of γ (cont'd)
We include a bias correction:
γ̂_Ell1 = (n/(n − 1)) (p tr(Ŝ²_sgn) − p/n)
        = (p/(n(n − 1))) ∑_{i≠j} (v_i^⊤v_j)²
        = p · ave_{i≠j}{cos²(∠(x_i, x_j))}    (v_i = x_i/‖x_i‖)
Note: E[γ̂_Ell1] ∈ [1, p]. A sketch in code follows.
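A sketch of the bias-corrected Ell1 statistic, assuming the data matrix has already been centered by the spatial median µ̂:

```python
import numpy as np

def gamma_ell1(Xc):
    """Bias-corrected Ell1 sphericity estimate.

    Xc : (n, p) data matrix, already centered by the spatial median.
    """
    n, p = Xc.shape
    V = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)   # sign vectors v_i
    G = V @ V.T                                           # G_ij = cos(angle(x_i, x_j))
    return p * (np.sum(G**2) - n) / (n * (n - 1.0))       # p * ave_{i!=j} cos^2(angle)
```

(The diagonal of G equals 1, so subtracting n removes the i = j terms from the sum.)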
Ell2-estimator of γ
Define
η̂₂ = b_n ( tr(S²)/p − a_n (p/n) [tr(S)/p]² ),
where
a_n ≡ a_n(κ) = (n/(n + κ)) (n/(n − 1) + κ),
b_n ≡ b_n(κ) = (κ + n)(n − 1)² / ((n − 2)(3κ(n − 1) + n(n + 1))).
Note: for large n, we have
η̂₂ ≈ tr(S²)/p − (1 + κ)(p/n)[tr(S)/p]².
Result: E[η̂₂] = tr(Σ²)/p.
⇒ tr(S²)/p does not converge to tr(Σ²)/p unless c = p/n → 0 as p, n → ∞.
Ell2-estimator of γ (cont'd)
Defining â_n = a_n(κ̂) and b̂_n = b_n(κ̂), we get our second estimator of the sphericity γ:
γ̂_Ell2 = η̂₂/η̂² = b̂_n ( p tr(S²)/tr(S)² − â_n p/n ),
where η̂ = tr(S)/p.
The Ell2-RSCM estimator uses β̂^Ell2_o = β_o(κ̂, γ̂_Ell2).
The Ell1-RSCM estimator uses β̂^Ell1_o = β_o(κ̂, γ̂_Ell1).
A sketch assembling the full Ell2-RSCM estimate follows.
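Putting the pieces together, a sketch of the Ell2-RSCM estimator, reusing elliptical_kurtosis and ell_rscm from the earlier sketches:

```python
import numpy as np

def ell2_rscm(X):
    """Ell2-RSCM estimate: plug gamma_ell2 and kappa_hat into the Ell-RSCM formula."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    kappa = elliptical_kurtosis(X)                       # from the earlier sketch
    a = (n / (n + kappa)) * (n / (n - 1.0) + kappa)      # a_n(kappa_hat)
    b = ((kappa + n) * (n - 1.0)**2
         / ((n - 2.0) * (3.0 * kappa * (n - 1.0) + n * (n + 1.0))))  # b_n(kappa_hat)
    gamma = b * (p * np.trace(S @ S) / np.trace(S)**2 - a * p / n)   # gamma_ell2
    return ell_rscm(S, gamma, kappa, n)                  # plug-in Ell-RSCM from earlier
```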
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
AR(1) covariance matrix: [Σ]_{ij} = ϱ^{|i−j|}, ϱ ∈ (0, 1). Observations x_i from a Gaussian distribution, p = 100.
Normalized MSE (NMSE) = E[‖Σ̂ − Σ‖²_F]/‖Σ‖²_F.
[Figure: NMSE vs. n for Theory, Ell1, Ell2, Ell3, and LW; panels (c) ϱ = 0.1 and (d) ϱ = 0.4.]
AR(1) covariance matrix: [Σ]_{ij} = ϱ^{|i−j|}, ϱ ∈ (0, 1). Observations x_i from a t₁₂ distribution, p = 100.
[Figure: NMSE vs. n for Theory, Ell1, Ell2, Ell3, and LW; panels (e) ϱ = 0.1 and (f) ϱ = 0.4, with ν = 12.]
Largely varying spectrum, case I: Gaussian samples
n = p = 50.
Σ = diag(1, 1, . . . , 1, 0.01, . . . , 0.01), where m eigenvalues are 1 and the remaining 50 − m are 0.01.
[Figure: NMSE and β vs. m for Theory, Ell1, Ell2, Ell3, and LW.]
Largely varying spectrum, case I: t₁₀-distributed samples
n = p = 50.
Σ = diag(1, 1, . . . , 1, 0.01, . . . , 0.01), where m eigenvalues are 1 and the remaining 50 − m are 0.01.
[Figure: NMSE and β vs. m for Theory, Ell1, Ell2, Ell3, and LW.]
Largely varying spectrum, case II: Gaussian samples
Σ has 30 eigenvalues equal to 100, 40 eigenvalues equal to 1, and 30 eigenvalues equal to 0.01.
p = 100 and n varies.
[Figure: NMSE and β vs. n for Theory, Ell1, Ell2, Ell3, and LW.]
Largely varying spectrum, case II: t₁₀-distributed samples
Σ has 30 eigenvalues equal to 100, 40 eigenvalues equal to 1, and 30 eigenvalues equal to 0.01.
p = 100 and n varies.
[Figure: NMSE and β vs. n for Theory, Ell1, Ell2, Ell3, and LW.]
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Microarray analysis
Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics.
Microarrays monitor genome-wide expression levels of genes in a given organism.
Microarray data analysis provides a challenging framework for the data analyst to work with:
p = number of genes (> 10,000)
n_g = number of patients/organisms (∼ 10), e.g., of cancer type g, g = 1, . . . , G
G = number of classes (around 2–10)
Dataset                    N     p        G    Disease
Ramaswamy et al. (2001)    190   16,063   14   Cancer
Yeoh et al. (2002)         248   12,625    6   Leukemia
Sun et al. (2006)          180   54,613    4   Glioma
Nakayama et al. (2007)     105   22,283   10   Sarcoma
Compressive regularized discriminant analysis (CRDA)
Goals:
1. Assign x ∈ R^p to a correct class (out of G distinct classes).
2. Reduce the number of variables (features) without sacrificing classification accuracy.
Challenge: small n, large p, i.e., # of features ≫ # of observations in the training dataset.
Benchmark methods:
- nearest shrunken centroid (NSC) (Tibshirani et al., 2002)
- shrunken centroids regularized discriminant analysis (SCRDA) (Guo et al., 2007)
✓ CRDA can be used as a fast and accurate gene selection method and classification tool in microarray studies.
✓ CRDA gives fewer misclassification errors than its competitors while at the same time achieving accurate feature elimination.
Linear Discriminant Analysis
The LDA discriminant function expressed in vector form:
d(x) = (d₁(x), . . . , d_g(x), . . . , d_G(x)) = x^⊤B − (1/2) diag(M^⊤B),
where
M = (µ₁ · · · µ_G),
B = Σ⁻¹M.
Here Σ is the common covariance matrix of the classes.
LDA rule: classify x ∈ R^p to the class ĝ ∈ {1, . . . , G} given by
ĝ = argmax_g d_g(x).
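In code, the rule amounts to one matrix product and an argmax; a minimal sketch, assuming M holds the class means as columns:

```python
import numpy as np

def lda_classify(x, M, Sigma_inv):
    """LDA rule g = argmax_g d_g(x), with d(x) = x'B - (1/2) diag(M'B), B = Sigma^{-1} M."""
    B = Sigma_inv @ M                       # p x G coefficient matrix
    d = x @ B - 0.5 * np.diag(M.T @ B)      # vector of discriminant scores d_g(x)
    return int(np.argmax(d))                # class index in 0, ..., G-1
```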
Compressive Regularized Discriminant Analysis (CRDA)
CRDA uses the compressed estimated rule
d̂(x) = (d̂₁(x), . . . , d̂_G(x)) = x^⊤B̂ − (1/2) diag(M̂^⊤B̂),
where µ̂_g = x̄_g, and
M̂ = (µ̂₁ · · · µ̂_G),
B̂ = H_K(Σ̂⁻¹M̂, q).
Regularization: Σ̂ = Ell2-RSCM estimator (based on pooled data).
Feature elimination through the hard-thresholding operator H_K(·, q): H_K(B, q) retains the elements of the K rows of B that possess the largest ℓ_q norms and sets the elements of the other rows to zero.
The regularization parameter is K (for a fixed ℓ_q norm, q ∈ {1, 2, ∞}).
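A sketch of the hard-thresholding operator H_K(·, q) as described above:

```python
import numpy as np

def hard_threshold_rows(B, K, q):
    """H_K(B, q): keep the K rows of B with the largest l_q norm, zero out the rest.

    q : 1, 2, or np.inf (the fixed l_q norm).
    """
    row_norms = np.linalg.norm(B, ord=q, axis=1)   # l_q norm of each row
    keep = np.argsort(row_norms)[-K:]              # indices of the K largest rows
    out = np.zeros_like(B)
    out[keep] = B[keep]
    return out
```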
Compressive Regularized Discriminant Analysis (CRDA)
Method    | Ramaswamy et al. | Yeoh et al.  | Sun et al.    | Nakayama et al.
          | TE     NFS       | TE     NFS   | TE     NFS    | TE     NFS
CRDA-ℓ1   | 9.9    4899      | 7.5    4697  | 12.9   27416  | 7.9    6952
CRDA-ℓ2   | 10.3   3968      | 6.0    4659  | 13.3   23484  | 7.6    7755
CRDA-ℓ∞   | 10.3   4530      | 9.3    4697  | 12.4   20207  | 7.6    2340
PLDA      | 18.8   5023      | NA     NA    | 15.2   21635  | 4.4    10479
SCRDA     | 24.0   14874     | NA     NA    | 15.7   54183  | 2.8    22283
NSC       | 16.3   2337      | NA     NA    | 15.0   30005  | 4.2    5908
Table: Classification results: test error (TE, %) and number of features selected (NFS) for each dataset.
We compute the results for each dataset over 10 training-test set splits, each with a random choice of training and test sets containing 75% and 25% of the total N observations, respectively.
Conclusions
We proposed optimal (MSE-minimizing) regularized sample covariance matrix estimators, called the Ell1- and Ell2-RSCM estimators, which are suitable for high-dimensional problems.
We exploit the assumption that the samples come from an unspecified elliptically symmetric distribution.
Simulation studies illustrated the significant advantages of the proposed Ell1- and Ell2-RSCM estimators over the Ledoit-Wolf (LW-)RSCM estimator.
Good performance of the method in regularized QDA was illustrated with real data.
We proposed a new compressive regularized discriminant analysis (CRDA) for simultaneous classification and feature selection.
The method was shown to outperform its popular competitors in the analysis of real microarray data sets.
References

Peter M. Bentler and Maia Berkane. Greatest lower bound to the elliptical theory kurtosis parameter. Biometrika, 73(1):240–241, 1986.

Yilun Chen, Ami Wiesel, Yonina C. Eldar, and Alfred O. Hero. Shrinkage algorithms for MMSE covariance estimation. IEEE Transactions on Signal Processing, 58(10):5016–5029, 2010.

Yaqian Guo, Trevor Hastie, and Robert Tibshirani. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 8(1):86–100, 2007.

D. N. Joanes and C. A. Gill. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1):183–189, 1998.

Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004.

Burton G. Malkiel and Eugene F. Fama. Efficient capital markets: a review of theory and empirical work. The Journal of Finance, 25(2):383–417, 1970.

Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.

Harry Markowitz. Portfolio Selection: Efficient Diversification of Investments. J. Wiley, 1959.

R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, New York, 1982.

Robert Nakayama, Takeshi Nemoto, Hiro Takahashi, Tsutomu Ohta, Akira Kawai, Kunihiko Seki, Teruhiko Yoshida, Yoshiaki Toyama, Hitoshi Ichikawa, and Tadashi Hasegawa. Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma. Modern Pathology, 20(7):749–759, 2007.

Sridhar Ramaswamy, Pablo Tamayo, Ryan Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P. Mesirov, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences, 98(26):15149–15154, 2001.

William F. Sharpe. Capital asset prices: a theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442, 1964.

Lixin Sun, Ai-Min Hui, Qin Su, Alexander Vortmeyer, Yuri Kotliarov, Sandra Pastorino, Antonino Passaniti, Jayant Menon, Jennifer Walling, Rolando Bailey, et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell, 9(4):287–300, 2006.

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10):6567–6572, 2002.

James Tobin. Liquidity preference as behavior towards risk. The Review of Economic Studies, 25(2):65–86, 1958.

S. Visuri, V. Koivunen, and H. Oja. Sign and rank covariance matrices. Journal of Statistical Planning and Inference, 91:557–575, 2000.

Eng-Juh Yeoh, Mary E. Ross, Sheila A. Shurtleff, W. Kent Williams, Divyen Patel, Rami Mahfouz, Fred G. Behm, Susana C. Raimondi, Mary V. Relling, Anami Patel, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2):133–143, 2002.

Teng Zhang and Ami Wiesel. Automatic diagonal loading for Tyler's robust covariance estimator. In IEEE Statistical Signal Processing Workshop (SSP'16), pages 1–5, 2016.