High-dimensional covariance matrix estimation with applications to microarray studies and portfolio optimization
Esa Ollila
Department of Signal Processing and Acoustics, Aalto University, Finland
June 7th, 2018, Gipsa-Lab
Covariance estimation problem
x : p-variate (centered) random vector (p large)
x1, . . . , xn : i.i.d. realizations of x.
Problem: find an estimate Σ̂ of the positive definite covariance matrix
Σ = E[(x − E[x])(x − E[x])^⊤] ∈ S^{p×p}_{++}.
The sample covariance matrix (SCM),
S = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)^⊤,
is the most commonly used estimator of Σ.
Challenges in high dimensions (HD):
1. Insufficient sample support (ISS), p > n ⇒ an estimate of Σ⁻¹ cannot be computed from S!
2. Low sample support (LSS), i.e., p of the same magnitude as n ⇒ Σ is estimated with a lot of error.
3. Outliers or heavy-tailed non-Gaussian data.
Why covariance estimation?
The most important applications:
- portfolio selection
- clustering / discriminant analysis
- graphical models
- PCA
- radar detection
Graphical models
Gaussian graphical model
n-dimensional Gaussian vector
x = (x1, . . . , xn) ∼ N(0, Σ).
x_i, x_j are conditionally independent (given the rest of x) if
(Σ⁻¹)_{ij} = 0.
Modeled as an undirected graph with n nodes; the edge {i, j} is absent if (Σ⁻¹)_{ij} = 0.
Example with 5 nodes (1–5):
Σ⁻¹ =
⎡ • • 0 • • ⎤
⎢ • • • 0 • ⎥
⎢ 0 • • • 0 ⎥
⎢ • 0 • • 0 ⎥
⎣ • • 0 0 • ⎦
(• marks a nonzero entry; node pairs with (Σ⁻¹)_{ij} = 0 have no edge.)
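As a small illustration (not from the slides), the graph can be read directly off the sparsity pattern of Σ⁻¹; a minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def graph_from_precision(Theta, tol=1e-12):
    """Adjacency matrix of a Gaussian graphical model:
    edge {i, j} is present iff (Sigma^{-1})_{ij} != 0 (up to a tolerance)."""
    A = np.abs(Theta) > tol        # nonzero pattern of the precision matrix
    np.fill_diagonal(A, False)     # no self-loops
    return A
```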
Bias-variance trade-off
Frobenius norm: ‖A‖²_F = tr(A^⊤A) (= tr(A²) for symmetric A).
Any estimator Σ̂ ∈ S^{p×p}_{++} of Σ satisfies
MSE(Σ̂) = var(Σ̂) + bias(Σ̂),
where MSE(Σ̂) ≜ E[‖Σ̂ − Σ‖²_F] and bias(Σ̂) denotes the squared bias ‖E[Σ̂] − Σ‖²_F.
Suppose Σ̂ = S. Then, since S is unbiased, i.e., bias(S) = ‖E[S] − Σ‖²_F = 0, one has that
MSE(S) = var(S) (= tr{cov(vec(S))}).
✗ However, S is not applicable for high-dimensional problems:
  n ≈ p ⇒ var(S) is very large;
  p > n or p ≫ n ⇒ S is singular (non-invertible).
✓ Use an estimator Σ̂ = S_β that shrinks S towards a structure (e.g., a scaled identity matrix) using a tuning (shrinkage) parameter β:
  MSE(Σ̂) can be reduced by introducing some bias!
  Positive definiteness of Σ̂ can be ensured!
Bias-variance trade-off (cont'd)
Regularized SCM (RSCM) à la Ledoit and Wolf:
S_β = βS + (1 − β)[tr(S)/p] I,
where β > 0 denotes the shrinkage (regularization) parameter.
[Figure: total error, variance, and bias² as functions of the tuning parameter.]
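A minimal NumPy sketch of the RSCM formula above (the function name rscm is illustrative; β is supplied by the user here, while the point of this talk is how to choose it optimally):

```python
import numpy as np

def rscm(X, beta):
    """Regularized SCM: S_beta = beta*S + (1 - beta)*(tr(S)/p)*I.

    X : (n, p) data matrix with observations as rows; beta in [0, 1].
    """
    p = X.shape[1]
    S = np.cov(X, rowvar=False)      # SCM with the 1/(n-1) normalization
    eta = np.trace(S) / p            # target scale: mean of the eigenvalues of S
    return beta * S + (1.0 - beta) * eta * np.eye(p)
```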
This talk: how to choose β optimally under random sampling from an elliptical distribution, plus applications.
Esa Ollila. Optimal high-dimensional shrinkage covariance estimation for elliptical distributions. In EUSIPCO'17, Kos, Greece, 2017, pp. 1689–1693.
Esa Ollila and Elias Raninen. Optimal shrinkage covariance matrix estimator under random sampling from elliptical distributions. On arXiv soon.
M. N. Tabassum and E. Ollila. Compressive regularized discriminant analysis of high-dimensional data with applications to microarray studies. In ICASSP'18, Calgary, Canada, 2018, pp. 4204–4208.
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Modern portfolio theory (MPT)
A mathematical framework for portfolio allocation that balances the return-risk trade-off (Markowitz, 1952).
Important inputs by several Nobel prize winners.*
A portfolio consists of p assets, e.g.:
- equity securities (stocks), market indexes
- fixed-income securities (e.g., government or corporate bonds), commodities (e.g., oil, gold), ETFs
- currencies (exchange rates), derivatives, . . .
To use MPT one needs to estimate the mean vector µ and the covariance matrix Σ of asset returns.
✗ Often p, the number of assets, is larger than (or of similar magnitude as) n, the number of historical returns
⇒ standard plug-in estimates are not applicable in MPT.
*Nobel prize winners: James Tobin (1981), Harry Markowitz (1990), William F. Sharpe (1990), and Eugene F. Fama (2013).
Basic definitions
Portfolio weights at (discrete) time index t:
w_t = (w_{t,1}, . . . , w_{t,p})^⊤ s.t. ∑_{i=1}^p w_{t,i} = 1^⊤w_t = 1.
Let C_{i,t} > 0 be the price of the i-th asset at time t.
The net return of the i-th asset over one interval is
r_{i,t} = (C_{i,t} − C_{i,t−1})/C_{i,t−1} = C_{i,t}/C_{i,t−1} − 1 ∈ [−1, ∞).
Single-period net returns of the p assets form a p-variate vector
r_t = (r_{1,t}, . . . , r_{p,t})^⊤.
The portfolio net return at period t + 1 is
R_{t+1} = w_t^⊤ r_{t+1} = ∑_{i=1}^p w_{t,i} r_{i,t+1}.
Basic definitions (cont’d)
Given historical returns {r_t}_{t=1}^n, assume
µ = E[r_t] and Σ ≡ cov(r_t) = E[(r_t − µ)(r_t − µ)^⊤]
hold for all t (so the index t can be dropped).
Let r denote the (random vector of) returns; then the expected return of the portfolio is
µ_w = E[R] = w^⊤µ
and the variance (or risk) of the portfolio is
σ²_w = var(R) = w^⊤Σw.
The standard deviation σ_w = √var(R) is called the volatility.
Global minimum variance portfolio (GMVP) optimization problem:
minimize_{w ∈ R^p} w^⊤Σw subject to 1^⊤w = 1.
The solution is w_o = Σ⁻¹1 / (1^⊤Σ⁻¹1).
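A sketch of the GMVP solution in code, assuming Σ is replaced by an invertible estimate such as S_β (the function name is illustrative):

```python
import numpy as np

def gmvp_weights(Sigma_hat):
    """GMVP weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma_hat.shape[0])
    w = np.linalg.solve(Sigma_hat, ones)   # Sigma^{-1} 1 without forming the inverse
    return w / w.sum()                     # enforce 1'w = 1
```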
S&P 500 and Nasdaq-100 indexes for year 2017
[Figure: S&P 500 and NASDAQ-100 index prices and daily net returns over 2017 (Jan 3 – Dec 29).]
Are historical returns Gaussian?
QQ-plots and estimated pdfs
[Figure: QQ-plots of quantiles of the input sample against standard normal quantiles, and estimated pdfs, for S&P 500 returns, NASDAQ-100 returns, and an N(0,1) reference sample.]
Are historical returns Gaussian (cont’d)?
Scatter plot and estimated 99%, 95%, and 50% tolerance ellipses
[Figure: scatter plot of daily returns with Gaussian tolerance ellipses.]
Inside the 50% ellipse: 65.6% of returns; inside the 95% and 99% ellipses: 93.2% and 95.6% of returns, respectively.
Stocks are unpredictable (and non-Gaussian)!
[Figure: portfolio performance with shorting allowed vs. long-only, 2017.]
TECH stocks (Facebook, Apple, Amazon, Microsoft, Google) dropped drastically (in seconds) due to a "fat finger" trade or an automated trade.
Real data analysis
We apply the GMVP to real stock data consisting of daily net returns computed from dividend-adjusted daily closing prices.
Data sets:
1. p = 45 stocks in the Hang Seng Index (HSI), Jan. 4, 2010 – Dec. 24, 2011
2. p = 50 stocks in HSI, Jan. 1, 2016 – Dec. 27, 2017
3. p = 396 stocks in the S&P 500 index, Jan. 4, 2016 – Apr. 27, 2018
Sliding window method (sketched in code below):
1. At day t, we use the previous n days to estimate Σ and w.
2. Portfolio returns are then computed for the following 20 days.
3. The window is shifted 20 trading days forward, and new weights and portfolio returns for another 20 days are computed.
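A minimal sketch of this sliding-window protocol, assuming a returns matrix R with one row per trading day; the default covariance estimator here is the plain SCM, whereas the talk plugs in a regularized estimate so that the solve stays well-posed when p is close to n (names are illustrative):

```python
import numpy as np

def sliding_window_backtest(R, n, hold=20, cov_estimator=None):
    """Out-of-sample GMVP returns under the sliding-window protocol.

    R : (T, p) daily net returns; n : estimation window length;
    hold : holding period in trading days (20 in the slides).
    """
    if cov_estimator is None:
        cov_estimator = lambda W: np.cov(W, rowvar=False)  # plain SCM; use an RSCM when p ~ n
    p, out, t = R.shape[1], [], n
    while t + hold <= R.shape[0]:
        Sigma_hat = cov_estimator(R[t - n:t])     # estimate from the previous n days
        w = np.linalg.solve(Sigma_hat, np.ones(p))
        w /= w.sum()                              # GMVP weights
        out.extend(R[t:t + hold] @ w)             # realized returns for the next `hold` days
        t += hold                                 # shift the window forward
    return np.array(out)
```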
Results for HSI
Jan 4, 2010 – Dec 24, 2011 (left) and Jan 1, 2016 – Dec 27, 2017 (right)
[Figure: annualized realized risk and shrinkage parameter β as functions of the window length n, for Ell1-RSCM, Ell2-RSCM, and LW-RSCM.]
Results for S&P 500
p = 396 stocks in the S&P 500 index, Jan. 4, 2016 – Apr. 27, 2018
[Figure: annualized realized risk and shrinkage parameter β vs. window length n, for Ell1-RSCM, Ell2-RSCM, and LW-RSCM.]
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Regularized SCM and MMSE estimator
Minimum mean squared error (MMSE) estimator S_{α_o,β_o}:
(α_o, β_o) = argmin_{α,β>0} E[‖βS + αI − Σ‖²_F].
✗ (α_o, β_o) will depend on the true unknown Σ ⇒ we need to estimate (α_o, β_o).
Ledoit and Wolf (2004) developed popular consistent estimators (α̂^LW_o, β̂^LW_o) under the random matrix theory (RMT) regime.
Chen et al. (2010) assumed Gaussianity to improve estimation accuracy.
We develop estimators of (α_o, β_o) under the RMT regime when sampling from an unspecified elliptically symmetric distribution.
Important statistics
Scale measure:
η = tr(Σ)/p = mean of the eigenvalues.
Sphericity measure:
γ = p tr(Σ²)/tr(Σ)² = (mean of the squared eigenvalues)/(mean of the eigenvalues)².
γ ∈ [1, p], and
γ = 1 iff Σ ∝ I;
γ = p iff rank(Σ) = 1.
Optimal shrinkage parameters
Result 1.
Assume finite 4th-order moments. The optimal shrinkage parameters are
β_o = ‖Σ − ηI‖²_F / (‖Σ − ηI‖²_F + MSE(S)) = p(γ − 1)η² / (E[tr(S²)] − pη²),
α_o = (1 − β_o)η,
and note that β_o ∈ [0, 1).
The MSE at the optimum is
E[‖β_oS + α_oI − Σ‖²_F] = ‖Σ − ηI‖²_F (1 − β_o).
Elliptically symmetric distributions
x ∼ E_p(µ, Σ, g) (assume finite 4th-order moments):
f(x) = C · |Σ|^{−1/2} g((x − µ)^⊤Σ⁻¹(x − µ)),
where g : [0, ∞) → [0, ∞) is the density generator, C is a normalizing constant, µ = E[x], and Σ = cov(x) ≻ 0.
Examples: x ∼ N_p(µ, Σ): g(t) = exp(−t/2); the t-distribution with ν > 4 degrees of freedom, x ∼ t_{p,ν}(0, Σ); . . .
Elliptical kurtosis parameter (Muirhead, 1982):
κ = E[‖Σ^{−1/2}(x − µ)‖⁴]/(p(p + 2)) − 1 = (1/3) · {kurtosis of x_i},
where kurtosis = E[(x_i − µ_i)⁴]/var(x_i)² − 3.
Covariance matrix of the SCM S
K_p := commutation matrix (i.e., K_p vec(A) = vec(A^⊤) ∀ A ∈ R^{p×p}).
Result. When sampling from E_p(µ, Σ, g):
cov(vec(S)) = (1/(n−1) + κ/n)(I + K_p)(Σ ⊗ Σ) + (κ/n) vec(Σ)vec(Σ)^⊤.
Special cases:
1. Case p = 1 (so S = s² (sample variance), Σ = σ², κ = (1/3)kurt(x)):
var(s²) = σ⁴{2/(n−1) + kurt(x)/n}.
2. Gaussian case (κ = 0):
cov(vec(S)) = (1/(n−1))(I + K_p)(Σ ⊗ Σ),
which is the expected result since S ∼ W_p(n − 1, (1/(n−1))Σ).
MMSE shrinkage parameters: elliptical case (cont'd)
Result. Optimal shrinkage parameters when F = E_p(µ, Σ, g):
α^Ell_o = (1 − β^Ell_o)η,
β^Ell_o = (γ − 1) / (γ − 1 + κ(2γ + p)/n + (γ + p)/(n − 1)),
where η = tr(Σ)/p and γ = p tr(Σ²)/tr(Σ)² are the scale and sphericity measures.
Note: β^Ell_o = β^Ell_o(γ, κ) depends on the unknown γ and κ.
Proof: simply use the earlier result and the fact that
MSE(S) = tr{cov(vec(S))}
(plus some calculus related to Kronecker products, traces, and the commutation matrix).
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Estimation of oracle parameters
Plug-in estimator: β̂^Ell_o = β^Ell_o(γ̂, κ̂), where γ̂ and κ̂ are preferably consistent estimators of γ and κ.
Ell-RSCM estimator:
β̂^Ell_o = min(1, max(0, (γ̂ − 1) / (γ̂ − 1 + κ̂(2γ̂ + p)/n + (γ̂ + p)/(n − 1)))),
α̂^Ell_o = (1 − β̂^Ell_o) tr(S)/p.
Note: the "min/max" is due to the fact that β_o ∈ [0, 1). A sketch in code follows.
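A sketch of the plug-in computation, assuming estimates γ̂ and κ̂ are already available (the following slides show how they are obtained):

```python
import numpy as np

def ell_rscm(S, gamma_hat, kappa_hat, n):
    """Ell-RSCM estimate from the SCM S and plug-in (gamma_hat, kappa_hat)."""
    p = S.shape[0]
    beta = (gamma_hat - 1.0) / (gamma_hat - 1.0
                                + kappa_hat * (2.0 * gamma_hat + p) / n
                                + (gamma_hat + p) / (n - 1.0))
    beta = min(1.0, max(0.0, beta))        # clip since beta_o lies in [0, 1)
    alpha = (1.0 - beta) * np.trace(S) / p
    return beta * S + alpha * np.eye(p)
```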
Estimate of elliptical kurtosis
A natural estimate of κ = (1/3)kurt(x_i):
κ̂ = max(−2/(p + 2), (1/(3p)) ∑_{j=1}^p K̂_j),
where K̂_j is the (cumulant-based) estimate of the kurtosis of x_j (Joanes and Gill, 1998):
K̂_j = (n − 1)/((n − 2)(n − 3)) · {(n + 1)k̂_j + 6},
and k̂_j is the sample-moment-based estimate of kurtosis.
Note: κ > −2/(p + 2) (Bentler and Berkane, 1986).
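A sketch of this kurtosis estimate in NumPy (the function name is illustrative; k̂_j is the usual moment-based excess kurtosis):

```python
import numpy as np

def elliptical_kurtosis(X):
    """Estimate kappa = (1/3) * average kurtosis with the corrections above.

    X : (n, p) data matrix with observations as rows.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    m2 = (Xc**2).mean(axis=0)
    m4 = (Xc**4).mean(axis=0)
    k = m4 / m2**2 - 3.0                                     # moment-based excess kurtosis k_j
    K = (n - 1.0) / ((n - 2.0) * (n - 3.0)) * ((n + 1.0) * k + 6.0)
    return max(-2.0 / (p + 2.0), K.mean() / 3.0)             # Bentler-Berkane lower bound
```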
Ell1-estimator of γ
The sample sign covariance matrix (Visuri et al., 2000) is defined as
Ŝ_sgn = (1/n) ∑_{i=1}^n (x_i − µ̂)(x_i − µ̂)^⊤ / ‖x_i − µ̂‖²,
where µ̂ = argmin_µ ∑_{i=1}^n ‖x_i − µ‖ (the spatial median).
Zhang and Wiesel (2016) proposed a sphericity statistic
γ̂_Ell1 = p tr(Ŝ²_sgn) − p/n
and showed that γ̂_Ell1 → γ under the assumptions:
n, p → ∞ and c = p/n → c₀, and the eigenvalues of Σ converge to a fixed spectrum [random matrix theory (RMT) regime];
as p → ∞, η_i = tr(Σ^i)/p → η_{0i}, for i = 1, . . . , 4.
Ell1-estimator of γ (cont'd)
We include a bias correction:
γ̂_Ell1 = (n/(n − 1)) (p tr(Ŝ²_sgn) − p/n)
        = (p/(n(n − 1))) ∑_{i≠j} (v_i^⊤v_j)²
        = p · ave_{i≠j}{cos²(∠(x_i, x_j))}    (v_i = x_i/‖x_i‖)
Note: E[γ̂_Ell1] ∈ [1, p]. A sketch in code follows.
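A sketch of the bias-corrected Ell1 statistic, assuming the data matrix has already been centered by the spatial median µ̂:

```python
import numpy as np

def gamma_ell1(Xc):
    """Bias-corrected Ell1 sphericity estimate.

    Xc : (n, p) data matrix, already centered by the spatial median.
    """
    n, p = Xc.shape
    V = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)   # sign vectors v_i
    G = V @ V.T                                           # G_ij = cos(angle(x_i, x_j))
    return p * (np.sum(G**2) - n) / (n * (n - 1.0))       # p * ave_{i!=j} cos^2(angle)
```

(The diagonal of G equals 1, so subtracting n removes the i = j terms from the sum.)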
Ell2-estimator of γ
Define
η̂₂ = b_n ( tr(S²)/p − a_n (p/n) [tr(S)/p]² ),
where
a_n ≡ a_n(κ) = (n/(n + κ)) (n/(n − 1) + κ),
b_n ≡ b_n(κ) = (κ + n)(n − 1)² / ((n − 2)(3κ(n − 1) + n(n + 1))).
Note: for large n, we have
η̂₂ ≈ tr(S²)/p − (1 + κ)(p/n)[tr(S)/p]².
Result: E[η̂₂] = tr(Σ²)/p.
⇒ tr(S²)/p does not converge to tr(Σ²)/p unless c = p/n → 0 as p, n → ∞.
Ell2-estimator of γ (cont'd)
Defining â_n = a_n(κ̂) and b̂_n = b_n(κ̂), we get our second estimator of the sphericity γ:
γ̂_Ell2 = η̂₂/η̂² = b̂_n ( p tr(S²)/tr(S)² − â_n p/n ),
where η̂ = tr(S)/p.
The Ell2-RSCM estimator uses β̂^Ell2_o = β_o(κ̂, γ̂_Ell2).
The Ell1-RSCM estimator uses β̂^Ell1_o = β_o(κ̂, γ̂_Ell1).
A sketch assembling the full Ell2-RSCM estimate follows.
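Putting the pieces together, a sketch of the Ell2-RSCM estimator, reusing elliptical_kurtosis and ell_rscm from the earlier sketches:

```python
import numpy as np

def ell2_rscm(X):
    """Ell2-RSCM estimate: plug gamma_ell2 and kappa_hat into the Ell-RSCM formula."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    kappa = elliptical_kurtosis(X)                       # from the earlier sketch
    a = (n / (n + kappa)) * (n / (n - 1.0) + kappa)      # a_n(kappa_hat)
    b = ((kappa + n) * (n - 1.0)**2
         / ((n - 2.0) * (3.0 * kappa * (n - 1.0) + n * (n + 1.0))))  # b_n(kappa_hat)
    gamma = b * (p * np.trace(S @ S) / np.trace(S)**2 - a * p / n)   # gamma_ell2
    return ell_rscm(S, gamma, kappa, n)                  # plug-in Ell-RSCM from earlier
```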
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
AR(1) covariance matrix: [Σ]_{ij} = ϱ^{|i−j|}, ϱ ∈ (0, 1). Observations x_i from a Gaussian distribution, p = 100.
Normalized MSE (NMSE) = E[‖Σ̂ − Σ‖²_F]/‖Σ‖²_F.
[Figure: NMSE vs. n for Theory, Ell1, Ell2, Ell3, and LW; panels (c) ϱ = 0.1 and (d) ϱ = 0.4.]
AR(1) covariance matrix: [Σ]_{ij} = ϱ^{|i−j|}, ϱ ∈ (0, 1). Observations x_i from a t₁₂ distribution, p = 100.
[Figure: NMSE vs. n for Theory, Ell1, Ell2, Ell3, and LW; panels (e) ϱ = 0.1 and (f) ϱ = 0.4, with ν = 12.]
Largely varying spectrum, case I: Gaussian samples
n = p = 50.
Σ = diag(1, 1, . . . , 1, 0.01, . . . , 0.01), where m eigenvalues are 1 and the remaining 50 − m are 0.01.
[Figure: NMSE and β vs. m for Theory, Ell1, Ell2, Ell3, and LW.]
Largely varying spectrum, case I: t₁₀-distributed samples
n = p = 50.
Σ = diag(1, 1, . . . , 1, 0.01, . . . , 0.01), where m eigenvalues are 1 and the remaining 50 − m are 0.01.
[Figure: NMSE and β vs. m for Theory, Ell1, Ell2, Ell3, and LW.]
Largely varying spectrum, case II: Gaussian samples
Σ has 30 eigenvalues equal to 100, 40 eigenvalues equal to 1, and 30 eigenvalues equal to 0.01.
p = 100 and n varies.
[Figure: NMSE and β vs. n for Theory, Ell1, Ell2, Ell3, and LW.]
Largely varying spectrum, case II: t₁₀-distributed samples
Σ has 30 eigenvalues equal to 100, 40 eigenvalues equal to 1, and 30 eigenvalues equal to 0.01.
p = 100 and n varies.
[Figure: NMSE and β vs. n for Theory, Ell1, Ell2, Ell3, and LW.]
1 Portfolio optimization
2 Ell-RSCM estimators
3 Estimates of oracle parameters
4 Simulation studies
5 Compressive Regularized Discriminant Analysis
Microarray analysis
Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics.
Microarrays monitor genome-wide expression levels of genes in a given organism.
Microarray data analysis provides a challenging framework for the data analyst to work with:
p = number of genes (> 10,000)
n_g = number of patients/organisms (∼ 10), e.g., of cancer type g, g = 1, . . . , G
G = number of classes (around 2–10)
Dataset                    N     p        G    Disease
Ramaswamy et al. (2001)    190   16,063   14   Cancer
Yeoh et al. (2002)         248   12,625    6   Leukemia
Sun et al. (2006)          180   54,613    4   Glioma
Nakayama et al. (2007)     105   22,283   10   Sarcoma
Compressive regularized discriminant analysis (CRDA)
Goals:
1. Assign x ∈ R^p to a correct class (out of G distinct classes).
2. Reduce the number of variables (features) without sacrificing classification accuracy.
Challenge: small n, large p, i.e., # of features ≫ # of observations in the training dataset.
Benchmark methods:
- nearest shrunken centroid (NSC) (Tibshirani et al., 2002)
- shrunken centroids regularized discriminant analysis (SCRDA) (Guo et al., 2007)
✓ CRDA can be used as a fast and accurate gene selection method and classification tool in microarray studies.
✓ CRDA gives fewer misclassification errors than its competitors while at the same time achieving accurate feature elimination.
Linear Discriminant Analysis
The LDA discriminant function expressed in vector form:
d(x) = (d₁(x), . . . , d_g(x), . . . , d_G(x)) = x^⊤B − (1/2) diag(M^⊤B),
where
M = (µ₁ · · · µ_G),
B = Σ⁻¹M.
Here Σ is the common covariance matrix of the classes.
LDA rule: classify x ∈ R^p to the class ĝ ∈ {1, . . . , G} given by
ĝ = argmax_g d_g(x).
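In code, the rule amounts to one matrix product and an argmax; a minimal sketch, assuming M holds the class means as columns:

```python
import numpy as np

def lda_classify(x, M, Sigma_inv):
    """LDA rule g = argmax_g d_g(x), with d(x) = x'B - (1/2) diag(M'B), B = Sigma^{-1} M."""
    B = Sigma_inv @ M                       # p x G coefficient matrix
    d = x @ B - 0.5 * np.diag(M.T @ B)      # vector of discriminant scores d_g(x)
    return int(np.argmax(d))                # class index in 0, ..., G-1
```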
Compressive Regularized Discriminant Analysis (CRDA)
CRDA uses the compressed estimated rule
d̂(x) = (d̂₁(x), . . . , d̂_G(x)) = x^⊤B̂ − (1/2) diag(M̂^⊤B̂),
where µ̂_g = x̄_g, and
M̂ = (µ̂₁ · · · µ̂_G),
B̂ = H_K(Σ̂⁻¹M̂, q).
Regularization: Σ̂ = Ell2-RSCM estimator (based on pooled data).
Feature elimination through the hard-thresholding operator H_K(·, q): H_K(B, q) retains the elements of the K rows of B that possess the largest ℓ_q norms and sets the elements of the other rows to zero.
The regularization parameter is K (for a fixed ℓ_q norm, q ∈ {1, 2, ∞}).
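A sketch of the hard-thresholding operator H_K(·, q) as described above:

```python
import numpy as np

def hard_threshold_rows(B, K, q):
    """H_K(B, q): keep the K rows of B with the largest l_q norm, zero out the rest.

    q : 1, 2, or np.inf (the fixed l_q norm).
    """
    row_norms = np.linalg.norm(B, ord=q, axis=1)   # l_q norm of each row
    keep = np.argsort(row_norms)[-K:]              # indices of the K largest rows
    out = np.zeros_like(B)
    out[keep] = B[keep]
    return out
```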
Compressive Regularized Discriminant Analysis (CRDA)
Method    | Ramaswamy et al. | Yeoh et al.  | Sun et al.    | Nakayama et al.
          | TE     NFS       | TE     NFS   | TE     NFS    | TE     NFS
CRDA-ℓ1   | 9.9    4899      | 7.5    4697  | 12.9   27416  | 7.9    6952
CRDA-ℓ2   | 10.3   3968      | 6.0    4659  | 13.3   23484  | 7.6    7755
CRDA-ℓ∞   | 10.3   4530      | 9.3    4697  | 12.4   20207  | 7.6    2340
PLDA      | 18.8   5023      | NA     NA    | 15.2   21635  | 4.4    10479
SCRDA     | 24.0   14874     | NA     NA    | 15.7   54183  | 2.8    22283
NSC       | 16.3   2337      | NA     NA    | 15.0   30005  | 4.2    5908
Table: Classification results: test error (TE, %) and number of features selected (NFS) for each dataset.
We compute the results for each dataset over 10 training-test set splits, each with a random choice of training and test sets containing 75% and 25% of the total N observations, respectively.
Conclusions
We proposed optimal (MSE-minimizing) regularized sample covariance matrix estimators, called the Ell1- and Ell2-RSCM estimators, which are suitable for high-dimensional problems.
We exploit the assumption that the samples come from an unspecified elliptically symmetric distribution.
Simulation studies illustrated the significant advantages of the proposed Ell1- and Ell2-RSCM estimators over the Ledoit-Wolf (LW-)RSCM estimator.
Good performance of the method in regularized QDA was illustrated with real data.
We proposed a new compressive regularized discriminant analysis (CRDA) for simultaneous classification and feature selection.
The method was shown to outperform its popular competitors in the analysis of real microarray data sets.
References

Peter M. Bentler and Maia Berkane. Greatest lower bound to the elliptical theory kurtosis parameter. Biometrika, 73(1):240–241, 1986.

Yilun Chen, Ami Wiesel, Yonina C. Eldar, and Alfred O. Hero. Shrinkage algorithms for MMSE covariance estimation. IEEE Transactions on Signal Processing, 58(10):5016–5029, 2010.

Yaqian Guo, Trevor Hastie, and Robert Tibshirani. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 8(1):86–100, 2007.

D. N. Joanes and C. A. Gill. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1):183–189, 1998.

Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004.

Burton G. Malkiel and Eugene F. Fama. Efficient capital markets: a review of theory and empirical work. The Journal of Finance, 25(2):383–417, 1970.

Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.

Harry Markowitz. Portfolio Selection: Efficient Diversification of Investments. J. Wiley, 1959.

R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, New York, 1982.

Robert Nakayama, Takeshi Nemoto, Hiro Takahashi, Tsutomu Ohta, Akira Kawai, Kunihiko Seki, Teruhiko Yoshida, Yoshiaki Toyama, Hitoshi Ichikawa, and Tadashi Hasegawa. Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma. Modern Pathology, 20(7):749–759, 2007.

Sridhar Ramaswamy, Pablo Tamayo, Ryan Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P. Mesirov, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences, 98(26):15149–15154, 2001.

William F. Sharpe. Capital asset prices: a theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442, 1964.

Lixin Sun, Ai-Min Hui, Qin Su, Alexander Vortmeyer, Yuri Kotliarov, Sandra Pastorino, Antonino Passaniti, Jayant Menon, Jennifer Walling, Rolando Bailey, et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell, 9(4):287–300, 2006.

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10):6567–6572, 2002.

James Tobin. Liquidity preference as behavior towards risk. The Review of Economic Studies, 25(2):65–86, 1958.

S. Visuri, V. Koivunen, and H. Oja. Sign and rank covariance matrices. Journal of Statistical Planning and Inference, 91:557–575, 2000.

Eng-Juh Yeoh, Mary E. Ross, Sheila A. Shurtleff, W. Kent Williams, Divyen Patel, Rami Mahfouz, Fred G. Behm, Susana C. Raimondi, Mary V. Relling, Anami Patel, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2):133–143, 2002.

Teng Zhang and Ami Wiesel. Automatic diagonal loading for Tyler's robust covariance estimator. In IEEE Statistical Signal Processing Workshop (SSP'16), pages 1–5, 2016.