A Bregman Extension of quasi-Newton Updates (matsuzoe.web.nitech.ac.jp/infogeo/ocami2010/kanamori.pdf)



A Bregman Extension of quasi-Newton Updates

Takafumi Kanamori¹, Atsumi Ohara²

¹Nagoya University

²Osaka University

Workshop on Information Geometry and Related Fields (情報幾何関連分野研究会), 2010

– Geometric Approach to Information Technology –

Geometric Approach to Information TEchnology Bregman extension of quasi-Newton updates 1 / 30


Plan of Presentation

1. quasi-Newton method: nonlinear optimization problem; Hessian update formula; variational view of quasi-Newton update

2. Bregman extension of quasi-Newton method: Bregman divergence with V-potential on PD(n); dual structure defined by Bregman divergence

3. Exploiting Sparsity of Hessian matrix: relation to U-boost and em-algorithm

4. Other topics: convergence property; invariance under group action; robustness against numerical errors

5. Concluding Remarks


— quasi-Newton method —

keywords: Hessian update, secant condition


Unconstrained nonlinear optimization problem

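Unconstrained optimization problem, with objective function $f \in C^2(\mathbb{R}^n)$:

$$\min_{x} f(x), \quad x \in \mathbb{R}^n$$

Goal: find a local solution $x^*$ numerically.

[Figure: graph of $f(x)$ with a local minimizer $x^*$]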


Numerical algorithm

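Initial values: $x_0$, an $n \times n$ matrix $B_0$, and $\alpha_0 \ge 0$. Repeat:

$$x_{k+1} = x_k - \alpha_k B_k^{-1}\nabla f(x_k), \quad k = 0, 1, 2, \ldots$$

[Figure: successive iterates $x_{k-1}$, $x_k$, $x_{k+1}$]

Around each $x_k$, $f$ is replaced by its quadratic approximation

$$f(x) \approx f(x_k) + \nabla f(x_k)^\top (x - x_k) + \frac{1}{2}(x - x_k)^\top \nabla^2 f(x_k)(x - x_k)$$

$\longrightarrow$ compute an $x$ that decreases this quadratic model.

Steepest descent: $B_k = I$, $\alpha_k$ by line search.
Newton's method: $B_k = \nabla^2 f(x_k)$, $\alpha_k = 1$.
quasi-Newton method: $B_k \approx \nabla^2 f(x_k)$, $\alpha_k$ by line search.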
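A toy problem makes the three choices of $B_k$ concrete. Below is a minimal NumPy sketch. It assumes a hypothetical 2-D quadratic $f(x) = \frac{1}{2}x^\top A x - b^\top x$ (not from the slides) and an exact line search for steepest descent. The helper names `f`, `grad`, and `step` are illustrative.

```python
import numpy as np

# Hypothetical test problem (not from the slides): f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad(x):
    return A @ x - b


def step(x, B, alpha):
    """One iteration x_{k+1} = x_k - alpha_k B_k^{-1} grad f(x_k)."""
    return x - alpha * np.linalg.solve(B, grad(x))


x0 = np.array([2.0, 2.0])
g0 = grad(x0)

# Steepest descent: B_k = I; for a quadratic, the exact line search gives
# alpha_k = g^T g / (g^T A g).
x_sd = step(x0, np.eye(2), (g0 @ g0) / (g0 @ A @ g0))

# Newton's method: B_k = Hessian (= A here), alpha_k = 1.
x_newton = step(x0, A, 1.0)

x_star = np.linalg.solve(A, b)  # exact minimizer, for comparison
print(f(x0), f(x_sd), f(x_newton), f(x_star))
```

For a quadratic $f$ the quadratic model is exact. The Newton step with $\alpha_k = 1$ therefore reaches the minimizer in one iteration, while a single steepest-descent step only decreases $f$.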


quasi-Newton method

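Constructing $B_{k+1}$, an approximation of the Hessian $\nabla^2 f(x_{k+1})$:

$$x_{k+1} = x_k - \alpha_k B_k^{-1}\nabla f(x_k), \quad \alpha_k \ge 0$$

Standard notation: $s = x_{k+1} - x_k \in \mathbb{R}^n$, $y = \nabla f(x_{k+1}) - \nabla f(x_k) \in \mathbb{R}^n$ (strictly, these carry the subscript $k$: $s_k$, $y_k$).

Goal: construct $B_{k+1}$ from $B_k$, $s$, and $y$.

BFGS update:

$$B_{k+1} = B_{\mathrm{BFGS}}[B_k] := B_k - \frac{B_k s s^\top B_k}{s^\top B_k s} + \frac{yy^\top}{s^\top y}$$

DFP update:

$$B_{k+1} = B_{\mathrm{DFP}}[B_k] := B_k - \frac{B_k s y^\top + y s^\top B_k}{s^\top y} + \frac{s^\top B_k s}{(s^\top y)^2}\,yy^\top + \frac{yy^\top}{s^\top y}$$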
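The two formulas transcribe directly into NumPy. The following sketch uses hypothetical random data and checks that both updates satisfy $B_{k+1}s = y$ and stay symmetric positive definite when $s^\top y > 0$. The function names `bfgs_update` and `dfp_update` are illustrative.

```python
import numpy as np


def bfgs_update(B, s, y):
    """B_BFGS[B] = B - B s s^T B / (s^T B s) + y y^T / (s^T y)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def dfp_update(B, s, y):
    """B_DFP[B] = B - (B s y^T + y s^T B)/(s^T y) + (s^T B s) y y^T/(s^T y)^2 + y y^T/(s^T y)."""
    Bs, sy = B @ s, s @ y
    return (B - (np.outer(Bs, y) + np.outer(y, Bs)) / sy
            + (s @ Bs) * np.outer(y, y) / sy**2
            + np.outer(y, y) / sy)


rng = np.random.default_rng(0)
n = 4
Bk = np.eye(n)
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)   # stand-in Hessian, so y = A s gives s^T y > 0
s = rng.standard_normal(n)
y = A @ s

B_bfgs = bfgs_update(Bk, s, y)
B_dfp = dfp_update(Bk, s, y)
print(np.allclose(B_bfgs @ s, y), np.allclose(B_dfp @ s, y))
```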


Some requirements for update formula

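Secant condition: $B_{k+1}s = y$, since

$$\nabla^2 f(x_{k+1})(x_{k+1} - x_k) \approx \nabla f(x_{k+1}) - \nabla f(x_k), \qquad B_{k+1} \approx \nabla^2 f(x_{k+1})$$

Inheritance of positive definiteness:

$$B_k \in PD(n),\ s^\top y > 0 \implies B_{k+1} \in PD(n)$$

$B_{k+1}$ positive definite $\implies$ $-B_{k+1}^{-1}\nabla f(x_{k+1})$ is a descent direction.

The condition $s^\top y > 0$ holds when the line search that determines $\alpha_k$ is reasonably accurate.

BFGS and DFP satisfy both requirements above.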
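A small sketch shows both requirements at work. It uses hypothetical vectors $s$, $y$ and a stand-in gradient $g$. With $s^\top y > 0$, the BFGS update stays positive definite and $-B_{k+1}^{-1}g$ is a descent direction. Flipping the sign of $y$ breaks positive definiteness, because the secant condition forces $s^\top B_{k+1} s = s^\top y < 0$.

```python
import numpy as np


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def is_pd(B):
    """Positive definiteness test via Cholesky (NumPy raises LinAlgError if not PD)."""
    try:
        np.linalg.cholesky(B)
        return True
    except np.linalg.LinAlgError:
        return False


Bk = np.diag([1.0, 2.0, 3.0])
s = np.array([1.0, 0.5, -0.2])
y_good = np.array([2.0, 1.0, 0.1])   # s^T y = 2.48 > 0
y_bad = -y_good                      # s^T y = -2.48 < 0

B_good = bfgs_update(Bk, s, y_good)
B_bad = bfgs_update(Bk, s, y_bad)

g = np.array([0.3, -1.0, 0.5])       # hypothetical gradient at x_{k+1}
d = -np.linalg.solve(B_good, g)      # quasi-Newton direction -B^{-1} grad f
print(is_pd(B_good), is_pd(B_bad), g @ d < 0)  # True False True
```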


A Variational View of quasi-Newton updates [Fletcher ’91]

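$B_{k+1}$ is the matrix "closest" to $B_k$ among the matrices that satisfy the secant condition.

Matrix nearness: the KL divergence between $N(0, P)$ and $N(0, Q)$ (without the factor $1/2$),

$$KL(P, Q) := \langle P, Q^{-1}\rangle - \log\det(PQ^{-1}) - n, \quad P, Q \in PD(n),$$

where $\langle A, B\rangle = \operatorname{tr}(A^\top B)$.

Note: $KL(P, Q) = KL(Q^{-1}, P^{-1})$.

(BFGS update) $\min_{B \in PD(n)} KL(B, B_k)$ subject to $Bs = y$

(DFP update) $\min_{B \in PD(n)} KL(B_k, B)$ subject to $Bs = y$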
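Fletcher's characterization can be probed numerically. This sketch uses hypothetical data with $B_k = I$. It moves away from the BFGS or DFP solution along a symmetric direction $D$ with $Ds = 0$, which keeps the secant condition. The corresponding KL objective never decreases along that move. The code also checks the identity $KL(P, Q) = KL(Q^{-1}, P^{-1})$.

```python
import numpy as np


def kl(P, Q):
    """KL(P, Q) = <P, Q^{-1}> - log det(P Q^{-1}) - n (Gaussian KL without the 1/2)."""
    n = P.shape[0]
    logdet_ratio = np.linalg.slogdet(P)[1] - np.linalg.slogdet(Q)[1]
    return np.trace(np.linalg.solve(Q, P)) - logdet_ratio - n


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def dfp_update(B, s, y):
    Bs, sy = B @ s, s @ y
    return (B - (np.outer(Bs, y) + np.outer(y, Bs)) / sy
            + (s @ Bs) * np.outer(y, y) / sy**2 + np.outer(y, y) / sy)


rng = np.random.default_rng(1)
n = 4
Bk = np.eye(n)
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)        # stand-in Hessian, so y = A s gives s^T y > 0
s = rng.standard_normal(n)
y = A @ s

# Symmetric direction D with D s = 0: B + t D stays on the secant set {B : B s = y}.
Pi = np.eye(n) - np.outer(s, s) / (s @ s)
E = rng.standard_normal((n, n))
D = Pi @ (E + E.T) @ Pi
D /= np.linalg.norm(D, 2)

B_bfgs, B_dfp = bfgs_update(Bk, s, y), dfp_update(Bk, s, y)
gaps_bfgs = [kl(B_bfgs + t * D, Bk) - kl(B_bfgs, Bk) for t in (-1e-2, 1e-2)]
gaps_dfp = [kl(Bk, B_dfp + t * D) - kl(Bk, B_dfp) for t in (-1e-2, 1e-2)]
print(gaps_bfgs, gaps_dfp)   # all entries positive
```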


— Bregman extension of quasi-Newton method —

keywords: Bregman divergence, projection


quasi-Newton updates with Bregman divergence

Bregman divergence: An extension of KL-divergence

Bregman extension of quasi-Newton update rules:

Bregman divergence on PD(n) $\Longrightarrow$ geometric structure and matrix update rule


Bregman divergence with V-potential [Ohara and Eguchi, ’05]

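V-potential: $\varphi_V(P) = V(\det P)$, $V : \mathbb{R}_+ \to \mathbb{R}$

Bregman divergence: $D_V(P, Q) = \varphi_V(P) - \{\varphi_V(Q) + \langle \nabla\varphi_V(Q), P - Q\rangle\}$

[Figure: $\varphi_V$ and its tangent line at $Q$; $D_V(P, Q)$ is the gap at $P$ between $\varphi_V(P)$ and the tangent line]

$V(z) = -\log z \Longrightarrow$ KL divergence, i.e. the BFGS or DFP update.

$\varphi_V(P)$ is strictly convex $\iff$ $\nu(z) := -zV'(z) > 0$ and $\beta(z) := z\nu'(z)/\nu(z) < 1/n$.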
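Since $\nabla\varphi_V(Q) = V'(\det Q)\det(Q)\,Q^{-1} = -\nu(\det Q)\,Q^{-1}$, the divergence simplifies to $D_V(P, Q) = V(\det P) - V(\det Q) + \nu(\det Q)(\operatorname{tr}(Q^{-1}P) - n)$. Below is a minimal sketch of that formula, assuming NumPy and hypothetical random matrices. It checks the KL special case. It also checks the power potential $V(z) = (1 - z^\gamma)/\gamma$, for which $\nu(z) = z^\gamma$ and $\beta(z) = \gamma$: there the divergence is positive when $\gamma < 1/n$.

```python
import numpy as np


def bregman_v(P, Q, V, nu):
    """D_V(P, Q) = V(det P) - V(det Q) + nu(det Q) (tr(Q^{-1} P) - n)."""
    n = P.shape[0]
    dP, dQ = np.linalg.det(P), np.linalg.det(Q)
    return V(dP) - V(dQ) + nu(dQ) * (np.trace(np.linalg.solve(Q, P)) - n)


def kl(P, Q):
    n = P.shape[0]
    return np.trace(np.linalg.solve(Q, P)) - np.log(np.linalg.det(P) / np.linalg.det(Q)) - n


rng = np.random.default_rng(2)
n = 3
X, Z = rng.standard_normal((n, n)), rng.standard_normal((n, n))
P, Q = X @ X.T + np.eye(n), Z @ Z.T + np.eye(n)

# KL case: V(z) = -log z, nu(z) = 1.
V_log, nu_log = (lambda z: -np.log(z)), (lambda z: 1.0)
# Power potential (example choice): beta(z) = gamma, strictly convex for gamma < 1/n.
gamma = 0.2
V_pow, nu_pow = (lambda z: (1 - z**gamma) / gamma), (lambda z: z**gamma)

print(bregman_v(P, Q, V_log, nu_log), kl(P, Q), bregman_v(P, Q, V_pow, nu_pow))
```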


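Information Geometry on PD(n) [Ohara and Eguchi, ’05] -1/3- Two coordinate systems

Two coordinate systems on PD(n): for $B \in PD(n)$,

$$B \mapsto \eta(B) = B, \qquad B \mapsto \theta_V(B) = \nu(\det B)\,B^{-1} \quad \text{(one-to-one)}$$

Example: $V(z) = -\log z \implies \theta(B) = B^{-1}$

[Figure: PD(n) with its η-coordinates $\eta(B) = B$ and $\theta_V$-coordinates $\theta_V(B) = \nu(\det B)B^{-1}$]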
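Up to sign, the dual coordinate is the gradient of the potential: $\theta_V(B) = -\nabla\varphi_V(B)$. The finite-difference sketch below checks this relation. It assumes the power potential with $\gamma = 0.2$, a hypothetical $3 \times 3$ matrix $B$, and a hypothetical symmetric direction $E$.

```python
import numpy as np

gamma = 0.2                              # power potential (example choice)


def V(z):
    return (1 - z**gamma) / gamma


def nu(z):
    return z**gamma                      # nu(z) = -z V'(z)


def phi(B):
    return V(np.linalg.det(B))           # V-potential phi_V(B) = V(det B)


def theta(B):
    """Dual coordinates theta_V(B) = nu(det B) B^{-1}."""
    return nu(np.linalg.det(B)) * np.linalg.inv(B)


B = np.array([[2.0, 0.3, 0.0], [0.3, 1.5, 0.2], [0.0, 0.2, 1.0]])
E = np.array([[0.1, -0.2, 0.05], [-0.2, 0.3, 0.0], [0.05, 0.0, -0.1]])  # symmetric
h = 1e-6
dphi = (phi(B + h * E) - phi(B - h * E)) / (2 * h)  # directional derivative of phi_V
print(dphi, -np.sum(theta(B) * E))                  # <grad phi_V, E> = -<theta_V(B), E>
```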


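Information Geometry on PD(n) [Ohara and Eguchi, ’05] -2/3- Autoparallel submanifolds

$M \subset PD(n)$: a submanifold

$M$ is an affine subspace in η-coordinates $\Longrightarrow$ η-autoparallel

$M$ is an affine subspace in $\theta_V$-coordinates $\Longrightarrow$ $\theta_V$-autoparallel (θ-autoparallel when $V(z) = -\log z$)

Example: $M = \{B \in PD(n) \mid \langle A, \theta_V(B)\rangle = c\}$ is $\theta_V$-autoparallel.

[Figure: $M$ drawn as a flat subspace in $\theta_V$-coordinates]

Example: the secant condition defines a set that is both η- and θ-autoparallel (doubly autoparallel):

$$M = \{B \in PD(n) \mid \eta(B)s = y\} = \{B \in PD(n) \mid s = \theta(B)y\}, \qquad \eta(B) = B,\ \theta(B) = B^{-1}$$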
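Double autoparallelism of the secant set can be checked on hypothetical data. Take two points of $M$. The straight segment between them in $B$ (an η-geodesic) stays inside $M$, and so does the straight segment in $B^{-1}$ (a θ-geodesic). Here the two points of $M$ come from applying BFGS to two different starting matrices.

```python
import numpy as np


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


s = np.array([1.0, -0.5, 0.25])
y = np.array([2.0, -0.5, 1.0])                 # s^T y = 2.5 > 0
B1 = bfgs_update(np.eye(3), s, y)              # two (hypothetical) points of M
B2 = bfgs_update(np.diag([1.0, 4.0, 9.0]), s, y)

residuals = []
for t in np.linspace(0.0, 1.0, 5):
    B_eta = (1 - t) * B1 + t * B2              # eta-geodesic: straight segment in B
    B_theta = np.linalg.inv((1 - t) * np.linalg.inv(B1) + t * np.linalg.inv(B2))  # in B^{-1}
    residuals += [np.linalg.norm(B_eta @ s - y), np.linalg.norm(B_theta @ s - y)]
print(max(residuals))                           # zero up to rounding
```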


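Information Geometry on PD(n) [Ohara and Eguchi, ’05] -3/3- Projection onto an autoparallel submanifold

$M$: an η-autoparallel submanifold of PD(n)

$\theta_V$-projection of $B_k$ onto $M$: $B_{k+1} = \operatorname{argmin}_{B \in M} D_V(B, B_k)$

The $\theta_V$-geodesic connecting $B_k$ and $B_{k+1}$ is orthogonal to $M$

$$\iff \langle \theta_V(B_k) - \theta_V(B_{k+1}),\ \eta(B) - \eta(B_{k+1})\rangle = 0 \quad \forall B \in M$$

(An analogous relation holds with η and $\theta_V$ interchanged.)

[Figure: $B_k$ joined to $B_{k+1} \in M$ by a $\theta_V$-geodesic, and $B_{k+1}$ joined to another point $B \in M$ by an η-geodesic within $M$]

For all $B \in M$,

$$D_V(B, B_k) = D_V(B, B_{k+1}) + D_V(B_{k+1}, B_k)$$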
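For $V(z) = -\log z$ and the secant set $M$, the θ-projection is the BFGS update. That makes the Pythagorean relation and the orthogonality condition directly testable. This sketch uses hypothetical random data. The DFP result and the stand-in Hessian $A$ (which satisfies $As = y$) serve only as two further test points of $M$.

```python
import numpy as np


def kl(P, Q):
    n = P.shape[0]
    return (np.trace(np.linalg.solve(Q, P))
            - (np.linalg.slogdet(P)[1] - np.linalg.slogdet(Q)[1]) - n)


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def dfp_update(B, s, y):
    Bs, sy = B @ s, s @ y
    return (B - (np.outer(Bs, y) + np.outer(y, Bs)) / sy
            + (s @ Bs) * np.outer(y, y) / sy**2 + np.outer(y, y) / sy)


rng = np.random.default_rng(6)
n = 4
X = rng.standard_normal((n, n))
Bk = X @ X.T + np.eye(n)
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)
s = rng.standard_normal(n)
y = A @ s                                   # so A itself satisfies A s = y

B_next = bfgs_update(Bk, s, y)              # theta-projection of B_k onto M
gaps, inners = [], []
for B in (dfp_update(Bk, s, y), A):         # two other points of M
    gaps.append(kl(B, Bk) - (kl(B, B_next) + kl(B_next, Bk)))
    # <theta(B_k) - theta(B_{k+1}), eta(B) - eta(B_{k+1})> with theta(B) = B^{-1}
    inners.append(np.trace((np.linalg.inv(Bk) - np.linalg.inv(B_next)) @ (B - B_next)))
print(gaps, inners)                         # all zero up to rounding
```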


A Variational View of quasi-Newton updates [Fletcher ’91] (revisited)

$M$: doubly autoparallel

BFGS: θ-projection of $B_k$ onto the η-autoparallel $M$

DFP: η-projection of $B_k$ onto the θ-autoparallel $M$

Secant condition: $M = \{B \in PD(n) \mid Bs = y\}$ (η- and θ-autoparallel)

[Figure: $B_k$ projected onto $M$ along a θ-geodesic to $B_{\mathrm{BFGS}}[B_k]$ and along an η-geodesic to $B_{\mathrm{DFP}}[B_k]$]

In practice, BFGS is generally regarded as numerically better.


V-extension of quasi-Newton update

V-BFGS update: $\min_{B \in PD(n)} D_V(B, B_k)$ subject to $B \in M$ (i.e. $Bs = y$)

V-DFP update: $\min_{B \in PD(n)} D_V(B^{-1}, B_k^{-1})$ subject to $B \in M$

[Figure: $M$ (η- and θ-autoparallel, but not $\theta_V$-autoparallel); $B_k$ joined to $B_{k+1}$ (V-BFGS) by a $\theta_V$-geodesic]

Note: $\min_B D_V(B_k, B)$ subject to $B \in M$ may not be a convex problem.


V-BFGS update formula

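V-BFGS update:

$$B_{k+1} = \frac{\nu(\det B_{k+1})}{\nu(\det B_k)}\,B_{\mathrm{BFGS}}[B_k] + \left(1 - \frac{\nu(\det B_{k+1})}{\nu(\det B_k)}\right)\frac{yy^\top}{s^\top y},$$

where $\nu(z) = -zV'(z) > 0$. Because $\det B_{k+1}$ appears on both sides, the formula defines $B_{k+1}$ implicitly.

KL divergence ($V(z) = -\log z$) $\Longrightarrow$ $\nu(z) = 1$ and $B_{k+1} = B_{\mathrm{BFGS}}[B_k]$.

$B_k \in PD(n)$, $s^\top y > 0$, $\nu > 0$ $\Longrightarrow$ $B_{k+1} \in PD(n)$.

The computational cost is (almost) the same as for the BFGS update.

Self-scaling quasi-Newton:

$$B_{k+1} = \phi_k\,B_{\mathrm{BFGS}}[B_k] + (1 - \phi_k)\frac{yy^\top}{s^\top y}$$

In V-BFGS, $\phi_k$ is determined by the V-potential.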
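A short determinant argument makes the implicit equation solvable. Since $B_{\mathrm{BFGS}}[B_k]s = y$, the matrix determinant lemma gives $\det B_{k+1} = r^{n-1}\det B_{\mathrm{BFGS}}[B_k]$, where $r = \nu(\det B_{k+1})/\nu(\det B_k)$. So $z = \det B_{k+1}$ solves $C\,\nu(z)^{n-1} = z$ with $C = \det B_{\mathrm{BFGS}}[B_k]/\nu(\det B_k)^{n-1}$, which matches step 3 of the V-BFGS algorithm. Below is a minimal sketch under stated assumptions: NumPy, dense determinants, and bisection in $\log z$ as the root finder. Bisection is our choice; it works because the residual is strictly decreasing in $\log z$ when $\beta < 1/n$.

```python
import numpy as np


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def v_bfgs_update(B, s, y, nu, iters=200):
    """V-BFGS: r * B_BFGS[B] + (1 - r) y y^T / (s^T y), r = nu(det B+) / nu(det B).

    Assumes nu > 0 and beta(z) = z nu'(z) / nu(z) < 1/n (strict convexity).
    """
    n = B.shape[0]
    B_bar = bfgs_update(B, s, y)
    nu_k = nu(np.linalg.det(B))
    C = np.linalg.det(B_bar) / nu_k ** (n - 1)
    # g(t) = log C + (n-1) log nu(e^t) - t is strictly decreasing in t = log z.
    g = lambda t: np.log(C) + (n - 1) * np.log(nu(np.exp(t))) - t
    lo, hi = -1.0, 1.0
    while g(lo) <= 0:       # expand the bracket downwards until g(lo) > 0
        lo *= 2
    while g(hi) >= 0:       # expand the bracket upwards until g(hi) < 0
        hi *= 2
    for _ in range(iters):  # bisection on log z
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    z = np.exp(0.5 * (lo + hi))              # z* = det B_{k+1}
    r = nu(z) / nu_k
    return r * B_bar + (1 - r) * np.outer(y, y) / (s @ y)


rng = np.random.default_rng(5)
n = 3
X = rng.standard_normal((n, n))
Bk = X @ X.T + np.eye(n)
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)
s = rng.standard_normal(n)
y = A @ s

B_kl = v_bfgs_update(Bk, s, y, lambda z: 1.0)          # V(z) = -log z
gamma = 0.2
B_pow = v_bfgs_update(Bk, s, y, lambda z: z ** gamma)   # power potential, gamma < 1/n
```

With $\nu \equiv 1$ the root gives $r = 1$, which is the plain BFGS update. With the power potential the result should match the closed form $r = (s^\top y / s^\top B_k s)^{\alpha}$, $\alpha = \gamma/(1 - (n-1)\gamma)$, given later in the slides.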


V-BFGS update:

Initialization: Let $L_0 L_0^\top = B_0$ be the Cholesky decomposition of $B_0$, and let $x_0 \in \mathbb{R}^n$ be an initial point. Set $k = 0$.

Repeat: If the stopping criterion is satisfied, go to Output.

1. Let $x_{k+1} = x_k - \alpha_k B_k^{-1}\nabla f(x_k)$ with an appropriate $\alpha_k \ge 0$. The Cholesky decomposition $B_k = L_k L_k^\top$ is available to compute $B_k^{-1}\nabla f(x_k)$.

2. Update $L_k$ to $L$, the Cholesky factor of $B_{\mathrm{BFGS}}[B_k; s_k, y_k]$. Cholesky updating for rank-one modifications is available.

3. Compute $C = (\det L)^2 / \nu((\det L_k)^2)^{n-1}$ and find the root $z^*$ of the equation $C\,\nu(z)^{n-1} = z$, $z > 0$.

4. Compute the Cholesky factor $L_{k+1}$ such that

$$L_{k+1}L_{k+1}^\top = \frac{\nu(z^*)}{\nu((\det L_k)^2)}\,LL^\top + \left(1 - \frac{\nu(z^*)}{\nu((\det L_k)^2)}\right)\frac{y_k y_k^\top}{s_k^\top y_k}.$$

5. $k \leftarrow k + 1$.

Output: local optimal solution $x_k$.
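The five steps can be assembled into a small driver. This is a sketch, not the authors' implementation. It assumes $B_0 = I$ and an Armijo backtracking line search as the "appropriate $\alpha_k$". It also refactorizes with `np.linalg.cholesky` instead of performing rank-one Cholesky updates. The test function and all helper names are hypothetical.

```python
import numpy as np


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def positive_root(C, nu, n, iters=200):
    """Root z* > 0 of C * nu(z)**(n-1) = z by bisection in t = log z (assumes beta < 1/n)."""
    g = lambda t: np.log(C) + (n - 1) * np.log(nu(np.exp(t))) - t
    lo, hi = -1.0, 1.0
    while g(lo) <= 0:
        lo *= 2
    while g(hi) >= 0:
        hi *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return np.exp(0.5 * (lo + hi))


def v_bfgs_minimize(f, grad, x0, nu, max_iter=200, tol=1e-8):
    n = x0.size
    x = np.asarray(x0, dtype=float)
    L = np.eye(n)                       # assumption: B_0 = I, so L_0 = I
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:     # stopping criterion
            break
        # Step 1: d = -B_k^{-1} grad f(x_k) via solves with L_k and L_k^T.
        d = -np.linalg.solve(L.T, np.linalg.solve(L, g))
        alpha = 1.0                     # assumption: Armijo backtracking
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d
        s, y = x_new - x, grad(x_new) - g
        x = x_new
        if s @ y <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
            continue                    # skip the update if s^T y > 0 fails
        # Step 2: Cholesky factor of B_BFGS[B_k] (dense refactorization here).
        L_bar = np.linalg.cholesky(bfgs_update(L @ L.T, s, y))
        # Step 3: C = (det L_bar)^2 / nu((det L_k)^2)^(n-1), then the root z*.
        nu_k = nu(np.prod(np.diag(L)) ** 2)
        C = np.prod(np.diag(L_bar)) ** 2 / nu_k ** (n - 1)
        r = nu(positive_root(C, nu, n)) / nu_k
        # Step 4: L_{k+1} L_{k+1}^T = r L_bar L_bar^T + (1 - r) y y^T / (s^T y).
        L = np.linalg.cholesky(r * (L_bar @ L_bar.T) + (1 - r) * np.outer(y, y) / (s @ y))
    return x


# Hypothetical strictly convex test function: quadratic plus a quartic term.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
b = np.array([1.0, -2.0, 0.5])
f = lambda x: 0.5 * x @ A @ x - b @ x + 0.25 * np.sum(x**4)
grad = lambda x: A @ x - b + x**3

x_kl = v_bfgs_minimize(f, grad, np.zeros(3), nu=lambda z: 1.0)        # plain BFGS
x_pow = v_bfgs_minimize(f, grad, np.zeros(3), nu=lambda z: z ** 0.2)  # gamma = 0.2 < 1/3
```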


— Exploiting Sparsity of Hessian matrix —

keywords: iterative Bregman projection


Sparsity of Hessian matrix

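For suitable $x$,

$$F \supset \{(i, j) \mid (\nabla^2 f(x))_{ij} \neq 0\}, \qquad S := \{B \in PD(n) \mid B_{ij} = 0,\ (i, j) \in F^c\}$$

[Figure: example 5×5 sparsity pattern, with $*$ marking the entries in $F$]

Setting: the dimension $n$ of $x$ is large, and $F$ has few elements.

Sparse matrices $B_k \in S$ are used to reduce the computational cost. Since $S$ is η-autoparallel, the algorithm can be analyzed information-geometrically.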


Sparse quasi-Newton [Yamashita, ’08]

$M = \{B \in PD(n) \mid Bs = y\}$ (η- and θ-autoparallel)

$S = \{B \in PD(n) \mid B_{ij} = 0,\ (i, j) \in F^c\}$ (η-autoparallel)

Hessian update: $B_k \in S \longrightarrow B_{k+1} \in S$. Ideally $B_{k+1} \in M \cap S$ (η-autoparallel).

Because that is computationally expensive, it is approximated:

1. From $B_k$, compute $\bar B = B_{\mathrm{BFGS}}[B_k]$ or $\bar B = B_{\mathrm{DFP}}[B_k]$. ($B_{\mathrm{BFGS}}[B_k]$ is the θ-projection of $B_k$ onto $M$; $B_{\mathrm{DFP}}[B_k]$ is the η-projection of $B_k$ onto $M$.)

2. θ-project $\bar B$ onto $S$:

$$B_{k+1} = \operatorname{argmin}_{B \in S} KL(B, \bar B)$$


Sparse quasi-Newton [Yamashita, ’08]

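[Figure: $B_k$ is projected onto $M$ (η- or θ-projection) to give $\bar B$, which is then θ-projected onto $S$ to give $B_{k+1}$; $M \cap S$ is marked]

$B_k \to \bar B = B_{\mathrm{DFP}}[B_k] \to B_{k+1}$: em-algorithm

$B_k \to \bar B = B_{\mathrm{BFGS}}[B_k] \to B_{k+1}$: boosting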


Sparse V-quasi-Newton – boosting-type extension –

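1. $\bar B$ is the $\theta_V$-projection of $B_k$ onto $M$ (V-BFGS).

2. $B_{k+1}$ is the $\theta_V$-projection of $\bar B$ onto $S$ (computable).

Repeating the $\theta_V$-projections converges to $B^* = \operatorname{argmin}_{B \in M \cap S} D_V(B, B_k)$:

[Figure: alternating $\theta_V$-projections $B^{(1)} = B_k, B^{(2)}, B^{(3)}, \ldots$ between $M$ and $S$, approaching $B^* \in M \cap S$]

$$D_V(B^*, B^{(1)}) \ge D_V(B^*, B^{(2)}) \ge \cdots \ge D_V(B^*, B^{(T)})$$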
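In the KL case ($V(z) = -\log z$), both projections are explicit. BFGS is the θ-projection onto $M$. For a tridiagonal pattern, the θ-projection onto $S$ has a closed form built from the $2 \times 2$ clique blocks and $1 \times 1$ separator blocks of $H = \bar B^{-1}$ (the standard formula for chordal patterns, assumed here). The sketch alternates the two projections and tracks $KL(A, B^{(t)})$ for a point $A \in M \cap S$. By the Pythagorean relation this sequence is nonincreasing for every point of $M \cap S$, including $B^*$. The tridiagonal $A$ and the data are hypothetical.

```python
import numpy as np


def kl(P, Q):
    n = P.shape[0]
    return (np.trace(np.linalg.solve(Q, P))
            - (np.linalg.slogdet(P)[1] - np.linalg.slogdet(Q)[1]) - n)


def bfgs_update(B, s, y):
    """theta-projection (KL) of B onto M = {B : B s = y}."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def project_tridiagonal(B_bar):
    """theta-projection (KL) of B_bar onto tridiagonal S: argmin_{B in S} KL(B, B_bar).

    Path graph: cliques {i, i+1}, separators {i}; the solution is
    sum of padded inv(H_CC) minus sum of padded inv(H_SS), with H = B_bar^{-1}.
    """
    H = np.linalg.inv(B_bar)
    n = H.shape[0]
    B = np.zeros((n, n))
    for i in range(n - 1):
        C = [i, i + 1]
        B[np.ix_(C, C)] += np.linalg.inv(H[np.ix_(C, C)])
    for i in range(1, n - 1):
        B[i, i] -= 1.0 / H[i, i]
    return B


n = 5
# Hypothetical tridiagonal SPD "Hessian"; with y = A s, A lies in both M and S.
A = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
s = np.random.default_rng(7).standard_normal(n)
y = A @ s

B = np.eye(n)                 # B^(1) = B_k, trivially tridiagonal
dist = [kl(A, B)]             # KL(A, B^(t)) for the point A in the intersection
for _ in range(30):
    B = bfgs_update(B, s, y)          # projection onto M
    dist.append(kl(A, B))
    B = project_tridiagonal(B)        # projection onto S
    dist.append(kl(A, B))
print(dist[0], dist[-1], np.linalg.norm(B @ s - y))
```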


Sparse V-quasi-Newton – em-type extension –

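1. $\bar B$ is the η-projection of $B_k$ onto $M$ (DFP).

2. $B_{k+1}$ is the $\theta_V$-projection of $\bar B$ onto $S$.

Open question: what do repeated η- and $\theta_V$-projections converge to?

[Figure: alternating η-projections onto $M$ and $\theta_V$-projections onto $S$, generating $B^{(1)} = B_k, B^{(2)}, B^{(3)}$ in $S$ and $\bar B^{(1)}, \bar B^{(2)}$ in $M$, near $M \cap S$]

$$D_V(B^{(1)}, \bar B^{(1)}) \ge D_V(B^{(2)}, \bar B^{(2)}) \ge \cdots \ge D_V(B^{(T)}, \bar B^{(T)})$$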


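Computation of Projection onto S

Let $G = (\{1, \ldots, n\}, F)$ be a chordal graph (self-loops excluded), with maximal cliques $C_1, \ldots, C_L$.

[Figure: the 5×5 sparsity pattern corresponds to a chordal graph on vertices 1 to 5 with maximal cliques $C_1, C_2, C_3$]

$\theta_V$-projection of $\bar B = B_{\mathrm{BFGS}}[B_k]$ or $B_{\mathrm{DFP}}[B_k]$ onto $S$:

$$\min_B D_V(B, \bar B), \quad B \in S$$

$\longrightarrow$ the inverse of the solution, $(B_{\mathrm{opt}})^{-1}$, can be computed from the submatrices $H_{C_\ell C_\ell}$ of $H = \bar B^{-1}$.

Yamashita's method: sparse clique factorization [Fukuda et al., ’00] $\longrightarrow$ also applicable to V-quasi-Newton.

Computational cost of $B^{(t)} \to B^{(t+1)}$: $O\left(\sum_{\ell=1}^{L} |C_\ell|^2\right) \approx O(n)$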
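For $V(z) = -\log z$, the optimality condition of $\min_{B \in S} KL(B, \bar B)$ is $(B^{-1})_{ij} = H_{ij}$ on $F$ and the diagonal. So $B_{\mathrm{opt}}^{-1}$ is the maximum-determinant positive definite completion of $H$ restricted to that pattern. For a chordal pattern it has the closed form $B_{\mathrm{opt}} = \sum_\ell [(H_{C_\ell C_\ell})^{-1}]^0 - \sum_{R \in \mathcal{R}} [(H_{RR})^{-1}]^0$. Here $[\cdot]^0$ zero-pads to $n \times n$, and $\mathcal{R}$ is the multiset of separators of a clique tree. The sketch below makes three assumptions: this standard formula, a hypothetical 5-vertex chordal graph (not the one on the slide), and a dense inverse for $H$, which a real implementation would avoid. It is not Yamashita's clique-factorization method.

```python
import numpy as np


def project_chordal(B_bar, cliques, separators):
    """theta-projection (KL case) of B_bar onto S for a chordal sparsity pattern.

    Assumes `cliques` are the maximal cliques in a running-intersection order and
    `separators` are the corresponding clique-tree separators (with multiplicity).
    """
    H = np.linalg.inv(B_bar)          # dense inverse: fine for a sketch, not O(n)
    n = H.shape[0]
    B = np.zeros((n, n))
    for C in cliques:
        B[np.ix_(C, C)] += np.linalg.inv(H[np.ix_(C, C)])
    for R in separators:
        B[np.ix_(R, R)] -= np.linalg.inv(H[np.ix_(R, R)])
    return B


# Hypothetical chordal graph on 5 vertices (0-based), not the one on the slide.
cliques = [[0, 1, 2], [1, 2, 3], [3, 4]]
separators = [[1, 2], [3]]
mask = np.zeros((5, 5), dtype=bool)
for C in cliques:
    mask[np.ix_(C, C)] = True         # F plus the diagonal

rng = np.random.default_rng(8)
X = rng.standard_normal((5, 5))
B_bar = X @ X.T + np.eye(5)           # stands in for B_BFGS[B_k] or B_DFP[B_k]
B_opt = project_chordal(B_bar, cliques, separators)
H = np.linalg.inv(B_bar)
```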


— Other Topics —


Convergence property of V-BFGS

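Standard assumptions:

1. The objective function $f \in C^2(\mathbb{R}^n)$.

2. The level set $\mathcal{L} = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is convex, and there exist positive constants $m$ and $M$ such that

$$m\|z\|^2 \le z^\top \nabla^2 f(x) z \le M\|z\|^2$$

for all $z \in \mathbb{R}^n$ and $x \in \mathcal{L}$.

Theorem. If there exist $L_1, L_2 > 0$ such that $L_1 \le \nu \le L_2$, then $\lim_{k\to\infty} x_k = x^*$ (a local optimum).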


Invariance under Group Action -1/2-

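Change of variables: $x \to \tilde x = Tx$, $\tilde f(\tilde x) := f(T^{-1}\tilde x)$.

Invariance: does $\tilde x_k = T x_k$ hold for $k = 0, 1, 2, \ldots$? It does for Newton's method, and for BFGS or DFP with exact line search.

[Figure: iterates $x_0, \ldots, x_3$ starting with the step $-B_0^{-1}\nabla f(x_0)$, mapped by $T$ (and back by $T^{-1}$) to iterates $\tilde x_0 = Tx_0, \ldots, \tilde x_3$ starting with $-\tilde B_0^{-1}\nabla \tilde f(\tilde x_0)$]

Invariant algorithm: its behavior, such as convergence, is unchanged under the change of variables.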


Invariance under Group Action -2/2-

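Transformation: $x \to \tilde x = Tx$.

$T \in SL(n) = \{T \in GL(n) \mid \det T = 1\}$ $\Longrightarrow$ V-BFGS is invariant for any potential $V$.

V-BFGS is invariant for all $T \in GL(n)$ $\iff$ power potential $V(z) = (1 - z^\gamma)/\gamma$, where $\lim_{\gamma\to 0}(1 - z^\gamma)/\gamma = -\log z$.

Projection on PD(n): invariant under the GL(n) group action.

V-BFGS update with power potential:

$$B_{k+1} = \left(\frac{s^\top y}{s^\top B_k s}\right)^{\alpha} B_{\mathrm{BFGS}}[B_k] + \left(1 - \left(\frac{s^\top y}{s^\top B_k s}\right)^{\alpha}\right)\frac{yy^\top}{s^\top y}, \qquad \alpha = \frac{\gamma}{1 - (n-1)\gamma}$$

($-\frac{1}{n-1} < \alpha < 1$ for strict convexity)

$\alpha = 1$: a popular self-scaling quasi-Newton update.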
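Under $\tilde x = Tx$ the update data transform as $\tilde s = Ts$, $\tilde y = T^{-\top}y$, and $\tilde B_k = T^{-\top}B_kT^{-1}$. The ratio $s^\top y / s^\top B_k s$ is unchanged, so the power-potential update commutes with the change of variables. The sketch below checks this with hypothetical random data and a generic $T$ with $\det T \neq 1$. It also checks that $\gamma \to 0$ recovers BFGS. `power_v_bfgs` is an illustrative name.

```python
import numpy as np


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def power_v_bfgs(B, s, y, gamma):
    """V-BFGS with the power potential V(z) = (1 - z^gamma)/gamma (closed form)."""
    n = B.shape[0]
    alpha = gamma / (1 - (n - 1) * gamma)
    r = ((s @ y) / (s @ B @ s)) ** alpha
    return r * bfgs_update(B, s, y) + (1 - r) * np.outer(y, y) / (s @ y)


rng = np.random.default_rng(9)
n, gamma = 3, 0.2                       # gamma < 1/n for strict convexity
X = rng.standard_normal((n, n))
Bk = X @ X.T + np.eye(n)
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)
s = rng.standard_normal(n)
y = A @ s

# Generic T in GL(n) (det T != 1): s~ = T s, y~ = T^{-T} y, B~_k = T^{-T} B_k T^{-1}.
T = rng.standard_normal((n, n)) + 3 * np.eye(n)
Ti = np.linalg.inv(T)
lhs = power_v_bfgs(Ti.T @ Bk @ Ti, T @ s, Ti.T @ y, gamma)  # update in new variables
rhs = Ti.T @ power_v_bfgs(Bk, s, y, gamma) @ Ti             # transformed original update
```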


Robustness against inexact line search -1/3-

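Errors in the line search affect the quasi-Newton update:

$$x_{k+1} = x_k - \alpha_k B_k^{-1}\nabla f(x_k), \quad s = x_{k+1} - x_k, \quad y = \nabla f(x_{k+1}) - \nabla f(x_k)$$

(Inexact line search) $\min_{B \in PD(n)} D_V(B, B_k)$ subject to $B(1 + \varepsilon)s = y + \varepsilon\bar y$ $\Longrightarrow$ optimal solution $B^{(\varepsilon)}_{k+1}$

[Figure: from $x_k$, the exact step $s$ reaches $x_{k+1}$; the inexact step $s + \varepsilon s$ reaches $x'_{k+1}$]

$$s \to s + \varepsilon s, \qquad y \to y + \varepsilon \nabla^2 f(x_{k+1})s + O(\varepsilon^2),$$

so to first order $\bar y = \nabla^2 f(x_{k+1}) s$.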


Robustness against inexact line search -2/3-

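A measure of sensitivity to numerical error:

Influence function:

$$dB_V(B_k, \bar y) := \lim_{\varepsilon\to 0} \frac{B^{(\varepsilon)}_{k+1} - B^{(0)}_{k+1}}{\varepsilon}$$

Gross error sensitivity: $\max_{B_k, \bar y} \|dB_V(B_k, \bar y)\|$ (over suitable ranges of $B_k$ and $\bar y$)

Small gross error sensitivity $\iff$ errors have little influence; this criterion is used in robust statistics.

Goal: find the potential $V$ that minimizes the gross error sensitivity. For given $s, y \in \mathbb{R}^n$ with $s^\top y > 0$,

$$\min_V \max_{B_k, \bar y} \|dB_V(B_k, \bar y)\| \quad \text{subject to } B_k \in PD(n),\ \bar y \in Y,$$

where $Y \subset \mathbb{R}^n$ is a bounded subset (corresponding to $\|\nabla^2 f\| \le M < \infty$).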


Robustness against inexact line search -3/3-

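V-BFGS and V-DFP can each be used either as a $B \approx \nabla^2 f$ update or as an $H = B^{-1} \approx (\nabla^2 f)^{-1}$ update. The gross error sensitivity was computed for all four combinations:

$$\begin{array}{l|cc} & \text{V-BFGS} & \text{V-DFP} \\ \hline B \text{ update} & \text{bounded only for BFGS} & \infty \\ H = B^{-1} \text{ update} & \infty & \infty \end{array}$$

(for all $s, y$ with $s^\top y > 0$)

BFGS approximating $\nabla^2 f$ is optimal (among the V-extensions).

Sensitivity to line search errors breaks the duality: $(B, s, y) \leftrightarrow (H, y, s)$, so the perturbed data $(B, (1+\varepsilon)s, y + \varepsilon\bar y)$ correspond to $(H, y + \varepsilon\bar y, (1+\varepsilon)s)$, with the two kinds of error in swapped positions.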
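A short calculation (ours, not from the slides) illustrates the bounded entry of the table. Scaling $s$ leaves $B_k ss^\top B_k / s^\top B_k s$ unchanged, so for plain BFGS

$$dB = \frac{\bar y y^\top + y\bar y^\top}{s^\top y} - \left(1 + \frac{s^\top\bar y}{s^\top y}\right)\frac{yy^\top}{s^\top y},$$

which does not depend on $B_k$. In DFP, by contrast, the terms involving $B_k$ do not cancel. The sketch compares this formula with central finite differences at $B_k = I$ and $B_k = 10^3 I$, and shows the DFP influence growing with $B_k$. All data are hypothetical.

```python
import numpy as np


def bfgs_update(B, s, y):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def dfp_update(B, s, y):
    Bs, sy = B @ s, s @ y
    return (B - (np.outer(Bs, y) + np.outer(y, Bs)) / sy
            + (s @ Bs) * np.outer(y, y) / sy**2 + np.outer(y, y) / sy)


def influence_fd(update, B, s, y, ybar, eps=1e-6):
    """Central difference of eps -> update(B, (1 + eps) s, y + eps ybar) at eps = 0."""
    plus = update(B, (1 + eps) * s, y + eps * ybar)
    minus = update(B, (1 - eps) * s, y - eps * ybar)
    return (plus - minus) / (2 * eps)


def influence_bfgs(s, y, ybar):
    """Closed-form influence function of plain BFGS (our derivation); B_k does not appear."""
    sy = s @ y
    return ((np.outer(ybar, y) + np.outer(y, ybar)) / sy
            - (1 + (s @ ybar) / sy) * np.outer(y, y) / sy)


rng = np.random.default_rng(10)
n = 3
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)
s = rng.standard_normal(n)
y = A @ s
ybar = rng.standard_normal(n)          # perturbation direction, roughly Hess f(x_{k+1}) s
B_small, B_large = np.eye(n), 1e3 * np.eye(n)

d_bfgs = [influence_fd(bfgs_update, B, s, y, ybar) for B in (B_small, B_large)]
d_dfp = [influence_fd(dfp_update, B, s, y, ybar) for B in (B_small, B_large)]
print([np.linalg.norm(d) for d in d_bfgs], [np.linalg.norm(d) for d in d_dfp])
```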


Concluding Remarks

Future work

Superlinear convergence of V-quasi-Newton updates

Which V is preferable?

Robustness against numerical error: BFGS for the B-update, i.e. $V(z) = -\log(z)$

Optimally conditioned update formulas [Dennis and Wolkowicz, Sizing and least-change secant methods, ’93]

(Intensive) numerical experiments: rate of convergence, computational cost, robustness

Link between computation and geometry [Ohara and Tsuchiya, An information geometric approach to polynomial-time interior-point algorithms, ’07]


References

Optimization

R. Fletcher. A new result for quasi-Newton formulae. SIAM J. Optim., 1:18–21, 1991.

H. Yamashita. Sparse quasi-Newton updates with positive definite matrix completion. Mathematical Programming, 115(1):1–30, 2008.

I. S. Dhillon and J. A. Tropp. Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl., 29(4):1120–1146, 2007.

O. Guler, F. Gurtuna, and O. Shevchenko. Duality in quasi-Newton methods and new variational characterizations of the DFP and BFGS updates. Optimization Methods and Software, 24(1):45–62, 2009.

J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.

Information Geometry

S. Amari and H. Nagaoka. Methods of Information Geometry, volume 191 of Translations of Mathematical Monographs. Oxford University Press, 2000.

A. Ohara and S. Eguchi. Geometry on positive definite matrices and v-potential function. Technical report, ISM Research Memo, 2005.

N. Murata, T. Takenouchi, T. Kanamori, and S. Eguchi. Information geometry of U-Boost and Bregman divergence. Neural Computation, 16:1437–1481, 2004.

Robust Statistics

F. R. Hampel, P. J. Rousseeuw, E. M. Ronchetti, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley and Sons, 1986.
