homel.vsb.cz/~pos220/research/sna.pdf


Faster Gradient Descent Methods

Ing. Lukas Pospisil, Ing. Martin Mensik

Department of Applied Mathematics, VSB - Technical University of Ostrava

24.1.2012

Ing. Lukas Pospısil, Ing. Martin Mensık — Faster Gradient Descent Methods 1/28

Outline

Motivation

Stochastic methods

The Barzilai-Borwein method

Numerical tests

Motivation

The basic quadratic programming problem

Problem

Find the minimizer of a strictly convex quadratic function, i.e.

x* = arg min_{x ∈ R^n} (1/2) x^T A x − b^T x ,

or, equivalently, solve the system of linear equations

A x = b ,

where A ∈ R^{n,n} is SPD, b ∈ R^n, x ∈ R^n. (Find the roots of the equation g(x) := A x − b = 0.)
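As a quick illustration of this equivalence, the following sketch (with a made-up 2×2 SPD system, not from the slides) checks that plain gradient descent on f(x) = (1/2) x^T A x − b^T x converges to the solution of A x = b:

```python
import numpy as np

# Hypothetical small SPD system chosen for illustration.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite
b = np.array([1.0, 2.0])

x_solve = np.linalg.solve(A, b)     # root of g(x) := A x - b

# Plain gradient descent on f(x) = 1/2 x^T A x - b^T x with a
# fixed step (must be < 2/lambda_max for stability).
x = np.zeros(2)
for _ in range(200):
    x = x - 0.2 * (A @ x - b)

print(np.allclose(x, x_solve))      # the minimizer solves A x = b
```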

Iterative methods

Krylov methods

Methods building an orthogonal basis of the subspace K_m = {g_0, A g_0, ..., A^{m−1} g_0}

Very fast

Very sensitive to error

conjugate gradients (CG), the Lanczos method, ...

Gradient descent methods

Minimization in the direction of the gradient

Very stable

Very slow

the Richardson method, the steepest descent method

Gradient descent methods

Generic descent method

x_{k+1} = x_k − (1/β_k) g_k

Steepest descent method

x_{k+1} = x_k − ((g_k, g_k)/(A g_k, g_k)) g_k

Richardson method with the optimal step length

x_{k+1} = x_k − (2/(λ^A_max + λ^A_min)) g_k
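A minimal NumPy sketch of the steepest descent method above; the small SPD test system and the function name are illustrative, not from the slides:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_it=10_000):
    """Steepest descent for 1/2 x^T A x - b^T x: step along -g_k
    with the exact line-search length (g_k, g_k)/(A g_k, g_k)."""
    x = x0.astype(float)
    g = A @ x - b
    for _ in range(max_it):
        if np.linalg.norm(g) < tol:
            break
        Ag = A @ g
        alpha = (g @ g) / (g @ Ag)   # 1/beta_k, the optimal step
        x = x - alpha * g
        g = g - alpha * Ag           # recurrent gradient update
    return x

# Small illustrative SPD system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(A, b, np.zeros(2))
```

The gradient is updated recursively inside the loop, reusing the product A g_k already needed for the step length, so each iteration costs a single matrix-vector product.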

Recurrent computation of the gradient

Since

x_{k+1} = x_k − (1/β_k) g_k ,

we can simply rearrange

g_{k+1} = A x_{k+1} − b = A(x_k − (1/β_k) g_k) − b = g_k − (1/β_k) A g_k .

Along the way, the spectral moments can be estimated from the gradients:

μ^{(k)}_α = (A^α g_k, g_k) / (g_k, g_k) .
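Both the recurrence and the moment formula can be verified numerically; the sketch below uses a random SPD matrix and step parameter as made-up test data:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)      # random SPD test matrix (assumption)
b = rng.standard_normal(5)

x = np.zeros(5)
g = A @ x - b
beta = 10.0                        # arbitrary positive step parameter

x_next = x - g / beta
g_recurrent = g - (A @ g) / beta   # recurrent formula
g_direct = A @ x_next - b          # direct evaluation
print(np.allclose(g_recurrent, g_direct))

# The first spectral moment mu_1 = (A g, g)/(g, g) is a Rayleigh
# quotient, so it lies inside the spectrum of A.
mu1 = (A @ g) @ g / (g @ g)
lam = np.linalg.eigvalsh(A)
print(lam[0] <= mu1 <= lam[-1])
```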

Stochastic choice of the step length

A probability measure over the gradient

For simplicity, assume (without loss of generality) that A is a diagonal matrix A = diag{λ_1, ..., λ_d}, where 0 < m = λ_1 ≤ ... ≤ λ_d = M < ∞.

z_k = g_k / √(g_k, g_k)

p^{(k)}_i = {z_k}_i^2

p^{(k+1)}_i = ((λ_i − β_k)^2 / (β_k^2 − 2 β_k μ^{(k)}_1 + μ^{(k)}_2)) p^{(k)}_i
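A quick numerical check of the update of p for diagonal A as on the slide; the concrete spectrum and gradient are made-up test data:

```python
import numpy as np

lam = np.array([1.0, 2.0, 3.0, 10.0])   # spectrum: m = 1, M = 10
g = np.array([0.5, -1.0, 2.0, 0.3])     # made-up gradient
p = g**2 / (g @ g)                      # p_i^(k) = (z_k)_i^2

beta = 4.0
g_next = g - lam * g / beta             # gradient recursion, A diagonal
p_direct = g_next**2 / (g_next @ g_next)

mu1, mu2 = lam @ p, (lam**2) @ p        # spectral moments
p_formula = (lam - beta)**2 * p / (beta**2 - 2.0 * beta * mu1 + mu2)

print(np.allclose(p_direct, p_formula))   # update formula matches
print(np.isclose(p_formula.sum(), 1.0))   # still a probability measure
```

The denominator β_k^2 − 2 β_k μ_1 + μ_2 is exactly Σ_j (λ_j − β_k)^2 p_j, which is what makes the updated p sum to one.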

Attractive extremes (edges) of p

Let β_k > 0, β_k ∉ {m, M} for all k, be a sequence with distribution function F(β) supported on ⟨m′, M′⟩, where 0 < m′ ≤ M′ < ∞, and moreover

∫ log(β − λ)^2 dF(β) < max{ ∫ log(M − β)^2 dF(β), ∫ log(m − β)^2 dF(β) }  ∀λ ∈ {λ_2, ..., λ_{d−1}}.

Then there exist constants C > 0, k_0 > 0, 0 ≤ Θ < 1 such that

∑_{i=2}^{d−1} p^{(k)}_i ≤ C Θ^k  ∀k > k_0.

Advantageous n-tuples

For descent we will choose repeating n-tuples {β_0, ..., β_N} symmetric about the center of the spectrum, (m + M)/2.

Estimate of R

R_2^2(β) = ((β − m)(M − β)) / (β(m + M − β))

R_N = ( ∏_{j=0}^{N} (β_j − m)^2 / β_j^2 )^{1/(N+1)}

and so we arrive at...

R_{arcsin,ε} = ( (M − m + 2√(ε(M − m − ε))) / (M + m + 2√((M − ε)(m + ε))) )^2

R_{arcsin,ε} = R_∞ (1 + 4√(ε/(M − m))) + O(ε),  ε → 0

Algorithm

choose a small positive τ, e.g. τ = 10^{−6}, and z_0 = 0

for k = 0, 1 choose β_k = μ^{(k)}_1 and start with the estimates m, M

for k > 1, set ε_k = τ(M − m)

for k = 2j: z_j = z_{j−1} + ϕ and β_{2j} = m_k + ε_k + (cos(π z_j) + 1)(M − m − 2ε_k)

for k = 2j + 1: β_{2j+1} = M + m − β_{2j}

improve the estimates m, M

The algorithm uses the constant ϕ = (√5 − 1)/2.
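A rough sketch of the step-length generator in this algorithm. The transcript is ambiguous about the exact scaling of the cosine term, so the mapping of (cos(πz_j)+1)/2 into [m+ε_k, M−ε_k] and the mod-1 wrap of z_j are my assumptions, and fixed bounds m, M stand in for the running estimates:

```python
import math

PHI = (math.sqrt(5.0) - 1.0) / 2.0        # the constant from the slide

def beta_pairs(m, M, n_pairs, tau=1e-6):
    """Generate symmetric step pairs beta_2j, beta_{2j+1} = M + m - beta_2j.
    Scaling of the cosine term into [m+eps, M-eps] is an assumption."""
    eps = tau * (M - m)
    z, betas = 0.0, []
    for _ in range(n_pairs):
        z = (z + PHI) % 1.0               # golden-ratio quasi-random sequence
        b_even = m + eps + 0.5 * (math.cos(math.pi * z) + 1.0) * (M - m - 2.0 * eps)
        betas += [b_even, M + m - b_even] # symmetric about (m + M)/2
    return betas

betas = beta_pairs(1.0, 10.0, 5)
```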

Problems with inexactness

The gradient g_{k+1} is usually computed recursively:

g_{k+1} = A^{(k+1)} x_{k+1} − b = (A^{(k+1)} x_k − b) − (1/β_k) A^{(k+1)} g_k = g̃_k − (1/β_k) A^{(k+1)} g_k

The problem lies in the inexact application of A:

g_k = A^{(k)} x_k − b ≠ A^{(k+1)} x_k − b = g̃_k

Ignoring this leads to a result ... but a wrong one!

The solution lies in a compromise

We have two ways to obtain g_{k+1}

Recursive

g_{k+1} = g_k − (1/β_k) A^{(k+1)} g_k

PRO: provides a good estimate of the spectrum

CON: leads to a wrong result

Restarted

g_{k+1} = A^{(k+1)} x_{k+1} − b

PRO: leads to the correct result

CON: the estimate of the spectrum is very weak

The Barzilai-Borwein method

Derivation

Secant method (a modification of Newton's method)

Find the roots of the equation g(x) = 0, g: R → R

x_{k+1} = x_k − ((x_k − x_{k−1}) / (g_k − g_{k−1})) g_k

(provided g satisfies certain conditions on ⟨a, b⟩ ∋ x_k)

The secant method is a descent method

x_{k+1} = x_k − (1/β_k) g_k  ⇒  β_k = (g_k − g_{k−1}) / (x_k − x_{k−1})
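The scalar secant iteration above, as a minimal sketch; the test equation x^2 − 2 = 0 is just an example:

```python
def secant(g, x0, x1, tol=1e-12, max_it=100):
    """Secant method: x_{k+1} = x_k - (x_k - x_{k-1})/(g_k - g_{k-1}) * g_k."""
    g0, g1 = g(x0), g(x1)
    for _ in range(max_it):
        if abs(g1) < tol:
            break
        x0, x1 = x1, x1 - (x1 - x0) / (g1 - g0) * g1
        g0, g1 = g1, g(x1)
    return x1

root = secant(lambda x: x * x - 2.0, 1.0, 2.0)   # converges to sqrt(2)
```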

Derivation

The secant method in nD?

Find the roots of the equation g(x) = Ax − b = 0, g: R^n → R^n.

β_k = (g_k − g_{k−1}) / (x_k − x_{k−1}) cannot be substituted (the division is not defined for vectors)

Instead, solve the so-called secant equation

(x_k − x_{k−1}) β_k = g_k − g_{k−1}

by least squares, i.e.

β_k := arg min_{β ∈ R} ‖(x_k − x_{k−1}) β − (g_k − g_{k−1})‖_2^2 .

Derivation

The secant method in nD?

Denote

s_k := x_k − x_{k−1},  y_k := g_k − g_{k−1} ,

then the solution of the minimization problem

β_k := arg min_{β ∈ R} ‖s_k β − y_k‖_2^2

is (from the necessary condition for a minimum)

β_k = (s_k, y_k) / (s_k, s_k) .

Done?
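The closed form β_k = (s_k, y_k)/(s_k, s_k) is exactly the 1-D least-squares solution, which can be cross-checked against a generic solver; the vectors below are random stand-ins for s_k and y_k:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(6)           # stands in for s_k = x_k - x_{k-1}
y = rng.standard_normal(6)           # stands in for y_k = g_k - g_{k-1}

beta_closed = (s @ y) / (s @ s)      # from the optimality condition
beta_lstsq = np.linalg.lstsq(s.reshape(-1, 1), y, rcond=None)[0][0]

print(np.isclose(beta_closed, beta_lstsq))
```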

Surprise

A small rearrangement and a surprise

Since

y_k = g_k − g_{k−1} = (A x_k − b) − (A x_{k−1} − b) = A s_k
s_k = x_k − x_{k−1} = (x_{k−1} − β_{k−1}^{−1} g_{k−1}) − x_{k−1} = −β_{k−1}^{−1} g_{k−1}

substitution gives

β_k = (s_k, A s_k) / (s_k, s_k) = (−β_{k−1}^{−1} g_{k−1}, −β_{k−1}^{−1} A g_{k−1}) / (−β_{k−1}^{−1} g_{k−1}, −β_{k−1}^{−1} g_{k−1}) = (A g_{k−1}, g_{k−1}) / (g_{k−1}, g_{k−1}) .

Surprise

A small rearrangement and a surprise

Thus the Barzilai-Borwein method has the form

x_{k+1} = x_k − ((g_{k−1}, g_{k−1}) / (A g_{k−1}, g_{k−1})) g_k .

≈ the steepest descent method with a delay.

Recurrent computation of the gradient

Moreover

g_{k+1} = A x_{k+1} − b = A(x_k − (1/β_k) g_k) − b = g_k − (1/β_k) A g_k .
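Putting the delayed step and the recurrent gradient together gives a compact BB iteration. This is a sketch: the steepest-descent step for the very first iteration is an assumption (the slides do not fix the initial β), and the test system is illustrative:

```python
import numpy as np

def barzilai_borwein(A, b, x0, tol=1e-10, max_it=10_000):
    """Barzilai-Borwein iteration: steepest-descent step length
    delayed by one iteration, with the recurrent gradient update."""
    x = x0.astype(float)
    g = A @ x - b
    alpha = (g @ g) / (g @ (A @ g))     # first step: plain steepest descent
    for _ in range(max_it):
        if np.linalg.norm(g) < tol:
            break
        Ag = A @ g
        x = x - alpha * g
        g_new = g - alpha * Ag          # recurrent gradient
        alpha = (g @ g) / (g @ Ag)      # becomes the *delayed* step next time
        g = g_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # small SPD example
b = np.array([1.0, 2.0])
x = barzilai_borwein(A, b, np.zeros(2))
```

Note that, like steepest descent, each iteration needs only one matrix-vector product, since A g_k is reused for both the update and the next step length.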

Convergence of descent methods

g_k = γ^k_1 v_1 + ... + γ^k_n v_n

where v_1, ..., v_n is an orthonormal basis of eigenvectors and γ^k_i are the coordinates of the vector g_k in this basis. Since

g_{k+1} = g_k − (1/β_k) A g_k

substitution and rearrangement give

∀i = 1, ..., n:  γ^{k+1}_i = (1 − λ_i/β_k) γ^k_i
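The coordinate recursion can be checked in the eigenbasis of a small SPD matrix; the matrix and gradient below are random test data:

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthonormal eigenvectors
lam = np.array([1.0, 2.0, 5.0, 9.0])               # chosen eigenvalues
A = Q @ np.diag(lam) @ Q.T                         # SPD with spectrum lam

g = rng.standard_normal(4)
beta = 3.0
gamma = Q.T @ g                        # coordinates gamma_i^k of g_k

g_next = g - (A @ g) / beta            # gradient recursion
gamma_next = Q.T @ g_next              # coordinates gamma_i^{k+1}

print(np.allclose(gamma_next, (1.0 - lam / beta) * gamma))
```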

Convergence of descent methods

Furthermore

‖g_k‖_2^2 = (g_k, g_k) = (∑_{i=1}^n γ^k_i v_i, ∑_{i=1}^n γ^k_i v_i) = ∑_{i=1}^n (γ^k_i)^2

lim_{k→∞} ‖g_k‖_2 = 0  ⇔  lim_{k→∞} γ^k_i = 0, ∀i = 1, ..., n

The behaviour of the factors (1 − λ_i/β_k) is therefore crucial for convergence.

Convergence rate of the BB method

Let the minimized quadratic function f be strictly convex, and let {x_k} be the sequence generated by the BB method. Then either

there exists a finite k such that g_k = 0, or

the sequence {‖g_k‖_2} converges to zero R-linearly; specifically

‖g_k‖_2 ≤ (1/2)^k C ‖g_1‖_2 ,

where C ∈ R is a constant depending on λ_max and λ_min.

Numerical tests

Numerical test 1 - a simple matrix

Numerical test 2 - 3D electrostatics (Doc. Lukas)

Thank you for your attention

J. Barzilai, J. M. Borwein: Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8:141-148, 1988.

M. Raydan: Convergence properties of the Barzilai and Borwein gradient method. Rice University, 1991.

Y. H. Dai, L.-Z. Liao: R-linear convergence of the Barzilai and Borwein gradient method. IMA Journal of Numerical Analysis, 26:1-10, 2002.

L. Pronzato, A. Zhigljavsky: Gradient algorithms for quadratic optimization with fast convergence rates. Springer, 2010.
