
Faster Gradient Descent Methods
(Rychlejší gradientní spádové metody)

Ing. Lukáš Pospíšil, Ing. Martin Menšík
Department of Applied Mathematics, VŠB - Technical University of Ostrava
24 January 2012

Source: homel.vsb.cz/~pos220/research/sna.pdf


Outline

Motivation
Stochastic methods
Barzilai-Borwein method
Numerical tests


Motivation


The basic quadratic programming problem

Problem
Find the minimizer of a strictly convex quadratic function, i.e.

  $x = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T A x - b^T x$,

or, equivalently, solve the system of linear equations

  $Ax = b$,

where $A \in \mathbb{R}^{n \times n}$ is SPD, $b \in \mathbb{R}^n$, $x \in \mathbb{R}^n$.
(Find the roots of the equation $g(x) := Ax - b = 0$.)
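To make the equivalence concrete, the following minimal NumPy sketch (an illustration added to the transcript, not part of the original slides) builds a small random SPD system and checks that the minimizer of the quadratic satisfies $Ax = b$; the size n and the construction of A are arbitrary choices.

    import numpy as np

    # Build a small random SPD test problem (illustrative choice).
    rng = np.random.default_rng(0)
    n = 5
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)          # SPD by construction
    b = rng.standard_normal(n)

    # The minimizer of f(x) = 1/2 x^T A x - b^T x solves A x = b.
    x_star = np.linalg.solve(A, b)

    # The gradient g(x) = A x - b vanishes at the minimizer.
    print(np.linalg.norm(A @ x_star - b))    # approx. 0 up to rounding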


Iterative methods

Krylov methods
Methods building an orthogonal basis of the subspace $K_m = \{g_0, Ag_0, \dots, A^{m-1} g_0\}$
Very fast
Very sensitive to errors
Conjugate gradients (CG), the Lanczos method, ...

Gradient descent methods
Minimization along the gradient direction
Very stable
Very slow
Richardson's method, the steepest descent method


Gradient descent methods

General descent method

  $x_{k+1} = x_k - \frac{1}{\beta_k} g_k$

Steepest descent method

  $x_{k+1} = x_k - \frac{(g_k, g_k)}{(A g_k, g_k)}\, g_k$

Richardson's method with the optimal step length

  $x_{k+1} = x_k - \frac{2}{\lambda^A_{\max} + \lambda^A_{\min}}\, g_k$
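A minimal NumPy sketch of the two step-length choices above (my own illustration; the stopping tolerance and iteration cap are arbitrary, and the Richardson variant simply computes the spectrum, which in practice would be replaced by bounds):

    import numpy as np

    def steepest_descent(A, b, x0, tol=1e-8, max_it=10000):
        """Gradient descent with the exact line-search step (g,g)/(Ag,g)."""
        x = x0.copy()
        for _ in range(max_it):
            g = A @ x - b
            if np.linalg.norm(g) < tol:
                break
            Ag = A @ g
            x = x - (g @ g) / (g @ Ag) * g
        return x

    def richardson(A, b, x0, tol=1e-8, max_it=10000):
        """Richardson iteration with the fixed optimal step 2/(lmax + lmin)."""
        lam = np.linalg.eigvalsh(A)            # ascending eigenvalues
        alpha = 2.0 / (lam[-1] + lam[0])
        x = x0.copy()
        for _ in range(max_it):
            g = A @ x - b
            if np.linalg.norm(g) < tol:
                break
            x = x - alpha * g
        return x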


Recurrent computation of the gradient

Since

  $x_{k+1} = x_k - \frac{1}{\beta_k} g_k$,

a simple manipulation gives

  $g_{k+1} = A x_{k+1} - b = A\left(x_k - \frac{1}{\beta_k} g_k\right) - b = g_k - \frac{1}{\beta_k} A g_k$.

The quantities $\mu^{(k)}_\alpha$ used below are defined as

  $\mu^{(k)}_\alpha = \frac{(A^\alpha g_k, g_k)}{(g_k, g_k)}$.
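A quick numerical illustration (added to the transcript; the matrix, the starting point and the step parameter beta are arbitrary) that the recursive gradient update matches the directly computed gradient, and how the first two moments are evaluated:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)
    b = rng.standard_normal(n)
    x = rng.standard_normal(n)

    g = A @ x - b
    beta = 3.0                       # some positive step-length parameter

    # One descent step and the two ways of obtaining the new gradient.
    x_new = x - g / beta
    g_direct = A @ x_new - b
    g_recursive = g - (A @ g) / beta
    print(np.linalg.norm(g_direct - g_recursive))   # approx. 0 up to rounding

    # First two moments mu_alpha = (A^alpha g, g) / (g, g).
    mu1 = (A @ g) @ g / (g @ g)
    mu2 = (A @ (A @ g)) @ g / (g @ g)
    print(mu1, mu2)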


Stochastic choice of the step length


A probability measure over the gradient

For simplicity assume (without loss of generality) that A is a diagonal matrix,
$A = \mathrm{diag}\{\lambda_1, \dots, \lambda_d\}$, where $0 < m = \lambda_1 \le \dots \le \lambda_d = M < \infty$.

  $z_k = \frac{g_k}{\sqrt{(g_k, g_k)}}$

  $p^{(k)}_i = \{z_k\}_i^2$

  $p^{(k+1)}_i = \frac{(\lambda_i - \beta_k)^2}{\beta_k^2 - 2\beta_k \mu^{(k)}_1 + \mu^{(k)}_2}\; p^{(k)}_i$
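A short numerical check (my own illustration; the eigenvalues, the gradient and the step beta are arbitrary) that the update rule for $p^{(k)}_i$ reproduces the normalized squared gradient after one descent step with a diagonal A:

    import numpy as np

    lam = np.array([1.0, 2.5, 4.0, 7.0])       # spectrum, m = 1, M = 7
    A = np.diag(lam)
    g = np.array([0.3, -1.2, 0.8, 2.0])        # some gradient
    beta = 3.0

    p = g**2 / (g @ g)                          # p_i = (z_k)_i^2
    mu1 = lam @ p                               # moments of the measure p
    mu2 = lam**2 @ p

    # Update rule from the slide.
    p_next = (lam - beta)**2 * p / (beta**2 - 2*beta*mu1 + mu2)

    # Direct computation: one step of g_{k+1} = g_k - (1/beta) A g_k.
    g_next = g - (A @ g) / beta
    print(np.linalg.norm(p_next - g_next**2 / (g_next @ g_next)))  # approx. 0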


Attractive extremes (endpoints) of p

Let $\beta_k > 0$, $\beta_k \notin \{m, M\}$ for all k, be a sequence with distribution
function $F(\beta)$ with support $\langle m', M' \rangle$, where $0 < m' \le M' < \infty$,
and assume moreover that

  $\int \log(\beta - \lambda)^2\, dF(\beta) < \max\left\{ \int \log(M - \beta)^2\, dF(\beta),\ \int \log(m - \beta)^2\, dF(\beta) \right\} \quad \forall \lambda \in \{\lambda_2, \dots, \lambda_{d-1}\}$.

Then there exist constants $C > 0$, $k_0 > 0$, $0 \le \Theta < 1$ such that

  $\sum_{i=2}^{d-1} p^{(k)}_i \le C\, \Theta^k \quad \forall k > k_0$.


Advantageous n-tuples

To obtain a decrease we choose repeating n-tuples $\{\beta_0, \dots, \beta_N\}$ symmetric
about the centre of the spectrum, $\frac{m + M}{2}$.

Estimate of R

  $R_2^2(\beta) = \frac{(\beta - m)(M - \beta)}{\beta\,(m + M - \beta)}$

  $R_N = \left( \prod_{j=0}^{N} \frac{(\beta_j - m)^2}{\beta_j^2} \right)^{\frac{1}{N+1}}$


... and we thus arrive at

  $R_{\arcsin,\varepsilon} = \left( \frac{M - m + 2\sqrt{\varepsilon (M - m - \varepsilon)}}{M + m + 2\sqrt{(M - \varepsilon)(m + \varepsilon)}} \right)^2$

  $R_{\arcsin,\varepsilon} = R_\infty \left( 1 + 4\sqrt{\frac{\varepsilon}{M - m}} \right) + O(\varepsilon), \quad \varepsilon \to 0$
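As a sanity check of the reconstructed formulas (my own illustration; m and M are arbitrary spectrum bounds and $R_\infty$ is taken as the $\varepsilon \to 0$ limit of the expression above), the small-$\varepsilon$ expansion can be verified numerically:

    import numpy as np

    m, M = 1.0, 10.0
    R_inf = ((np.sqrt(M) - np.sqrt(m)) / (np.sqrt(M) + np.sqrt(m)))**2

    def R_arcsin(eps):
        num = M - m + 2*np.sqrt(eps*(M - m - eps))
        den = M + m + 2*np.sqrt((M - eps)*(m + eps))
        return (num / den)**2

    for eps in [1e-2, 1e-4, 1e-6]:
        approx = R_inf * (1 + 4*np.sqrt(eps/(M - m)))
        print(eps, R_arcsin(eps), approx)   # difference shrinks like O(eps)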


Algorithm

Choose a small positive $\tau$, e.g. $\tau = 10^{-6}$, and set $z_0 = 0$.

For $k = 0, 1$ choose $\beta_k = \mu^{(k)}_1$ and start with estimates of $m, M$.

For $k > 1$ set $\varepsilon_k = \tau (M - m)$.

For $k = 2j$: $z_j = z_{j-1} + \varphi$ and
  $\beta_{2j} = m_k + \varepsilon_k + (\cos(\pi z_j) + 1)(M - m - 2\varepsilon_k)$.
For $k = 2j + 1$: $\beta_{2j+1} = M + m - \beta_{2j}$.

Improve the estimates of $m, M$.

The algorithm uses the constant $\varphi = \frac{1}{2}(\sqrt{5} - 1)$.
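A possible NumPy sketch of the step-length rule above; this is my own reading of the slide, not the authors' code. I assume the cosine factor is halved so that $\beta_{2j}$ stays inside the current estimate of the spectrum, and I update the estimates of m and M with Rayleigh quotients $(A g, g)/(g, g)$, which the slide only hints at ("improve the estimates of m, M"). The gradient is recomputed from $Ax - b$ in every iteration, which is the safe choice.

    import numpy as np

    def stochastic_gradient_descent(A, b, x0, tau=1e-6, max_it=500, tol=1e-8):
        phi = 0.5 * (np.sqrt(5.0) - 1.0)          # golden-ratio increment
        x = x0.copy()
        g = A @ x - b
        z = 0.0
        m_est = M_est = (g @ (A @ g)) / (g @ g)   # first Rayleigh quotient
        beta_prev = None
        for k in range(max_it):
            if np.linalg.norm(g) < tol:
                break
            rq = (g @ (A @ g)) / (g @ g)          # Rayleigh quotient, lies in [m, M]
            m_est, M_est = min(m_est, rq), max(M_est, rq)
            if k < 2:
                beta = rq                          # beta_k = mu_1^{(k)}
            else:
                eps = tau * (M_est - m_est)
                if k % 2 == 0:                     # k = 2j
                    z += phi
                    # assumed halving so beta lands in [m_est+eps, M_est-eps]
                    beta = m_est + eps + 0.5 * (np.cos(np.pi * z) + 1.0) \
                           * (M_est - m_est - 2.0 * eps)
                else:                              # k = 2j + 1, symmetric partner
                    beta = M_est + m_est - beta_prev
            beta_prev = beta
            x = x - g / beta
            g = A @ x - b                          # restarted gradient (safe choice)
        return x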


Problems with inexactness

The gradient $g_{k+1}$ is usually computed recursively:

  $g_{k+1} = A^{(k+1)} x_{k+1} - b = (A^{(k+1)} x_k - b) - \frac{1}{\beta_k} A^{(k+1)} g_k = g_k - \frac{1}{\beta_k} A^{(k+1)} g_k$

The problem lies in the inexact application of A (here $A^{(k)}$ denotes the inexact
application of A in iteration k):

  $g_k = A^{(k)} x_k - b \;\ne\; A^{(k+1)} x_k - b$

Ignoring this still produces a result ... but a wrong one!


The solution lies in a compromise

There are two ways to obtain $g_{k+1}$:

Recursive

  $g_{k+1} = g_k - \frac{1}{\beta_k} A^{(k+1)} g_k$

PRO: provides a good estimate of the spectrum
CON: leads to a wrong result

Restarted

  $g_{k+1} = A^{(k+1)} x_{k+1} - b$

PRO: leads to the correct result
CON: the spectrum estimate is very weak
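The slide does not spell out the compromise; one natural reading (my assumption, not the authors' algorithm) is to update the gradient recursively most of the time and restart it from $Ax - b$ every few iterations. A minimal sketch:

    import numpy as np

    def descent_with_restarts(A, b, x0, beta_fun, restart_every=10,
                              tol=1e-8, max_it=10000):
        """Gradient descent: recursive gradient updates with periodic restarts.

        beta_fun(k, g, Ag) returns the step-length parameter beta_k.
        """
        x = x0.copy()
        g = A @ x - b
        for k in range(max_it):
            if np.linalg.norm(g) < tol:
                break
            Ag = A @ g
            beta = beta_fun(k, g, Ag)
            x = x - g / beta
            if (k + 1) % restart_every == 0:
                g = A @ x - b            # restarted: keeps the iteration honest
            else:
                g = g - Ag / beta        # recursive: cheap, good spectral info
        return x

    # Example usage with the steepest-descent step length:
    # x = descent_with_restarts(A, b, np.zeros(len(b)),
    #                           lambda k, g, Ag: (g @ Ag) / (g @ g))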


Barzilai-Borwein method


Derivation

The secant method (a modification of Newton's method)

Find the roots of the equation $g(x) = 0$, $g : \mathbb{R} \to \mathbb{R}$:

  $x_{k+1} = x_k - \frac{x_k - x_{k-1}}{g_k - g_{k-1}}\, g_k$

(provided g satisfies suitable conditions on $\langle a, b \rangle \ni x_k$)

The secant method is a descent method

  $x_{k+1} = x_k - \frac{1}{\beta_k} g_k \quad \Rightarrow \quad \beta_k = \frac{g_k - g_{k-1}}{x_k - x_{k-1}}$
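For concreteness, a small 1D illustration (my own example function; the slides do not give one) of the secant iteration written in the descent form above:

    import numpy as np

    def secant_descent(g, x0, x1, tol=1e-12, max_it=100):
        """1D secant method written as x_{k+1} = x_k - g_k / beta_k."""
        x_prev, x = x0, x1
        g_prev, gk = g(x_prev), g(x)
        for _ in range(max_it):
            if abs(gk) < tol:
                break
            beta = (gk - g_prev) / (x - x_prev)   # secant slope estimate
            x_prev, g_prev = x, gk
            x = x - gk / beta
            gk = g(x)
        return x

    # Example: root of g(x) = x^3 - 2, i.e. the cube root of 2.
    print(secant_descent(lambda x: x**3 - 2.0, 1.0, 2.0))   # ~1.259921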


Derivation

The secant method in nD?

Find the roots of the equation $g(x) = Ax - b = 0$, $g : \mathbb{R}^n \to \mathbb{R}^n$.

  $\beta_k = \frac{g_k - g_{k-1}}{x_k - x_{k-1}}$  cannot be formed (division of vectors is undefined).

Instead, solve the so-called secant equation

  $(x_k - x_{k-1})\, \beta_k = g_k - g_{k-1}$

in the least-squares sense, i.e.

  $\beta_k := \arg\min_{\beta \in \mathbb{R}} \| (x_k - x_{k-1})\beta - (g_k - g_{k-1}) \|_2^2$.


Derivation

The secant method in nD?

Denote

  $s_k := x_k - x_{k-1}, \quad y_k := g_k - g_{k-1}$,

then the solution of the minimization problem

  $\beta_k := \arg\min_{\beta \in \mathbb{R}} \| s_k \beta - y_k \|_2^2$

is (from the necessary condition for a minimum)

  $\beta_k = \frac{(s_k, y_k)}{(s_k, s_k)}$.

Done?


A surprise

A small manipulation and a surprise

Since

  $y_k = g_k - g_{k-1} = (A x_k - b) - (A x_{k-1} - b) = A s_k$
  $s_k = x_k - x_{k-1} = (x_{k-1} - \beta_{k-1}^{-1} g_{k-1}) - x_{k-1} = -\beta_{k-1}^{-1} g_{k-1}$,

substitution gives

  $\beta_k = \frac{(s_k, A s_k)}{(s_k, s_k)} = \frac{(-\beta_{k-1}^{-1} g_{k-1},\, -\beta_{k-1}^{-1} A g_{k-1})}{(-\beta_{k-1}^{-1} g_{k-1},\, -\beta_{k-1}^{-1} g_{k-1})} = \frac{(A g_{k-1}, g_{k-1})}{(g_{k-1}, g_{k-1})}$.


A surprise

A small manipulation and a surprise

Thus the Barzilai-Borwein method reads

  $x_{k+1} = x_k - \frac{(g_{k-1}, g_{k-1})}{(A g_{k-1}, g_{k-1})}\, g_k$,

i.e. approximately the steepest descent method with a one-step delay in the step length.

Recurrent computation of the gradient

Moreover

  $g_{k+1} = A x_{k+1} - b = A\left(x_k - \frac{1}{\beta_k} g_k\right) - b = g_k - \frac{1}{\beta_k} A g_k$.
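A minimal NumPy sketch of the BB iteration as written above (delayed steepest-descent step length combined with the recursive gradient update); the very first step, which the slide does not specify, is taken here as an ordinary steepest-descent step, which is an assumption of mine.

    import numpy as np

    def barzilai_borwein(A, b, x0, tol=1e-8, max_it=10000):
        """Barzilai-Borwein gradient method for SPD A (quadratic case)."""
        x = x0.copy()
        g = A @ x - b
        Ag = A @ g
        beta = (g @ Ag) / (g @ g)     # first step: steepest descent (assumption)
        for _ in range(max_it):
            if np.linalg.norm(g) < tol:
                break
            x = x - g / beta
            g_new = g - Ag / beta     # recursive gradient update
            Ag_new = A @ g_new
            beta = (Ag @ g) / (g @ g) # next step built from the soon-to-be previous gradient
            g, Ag = g_new, Ag_new
        return x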


Convergence of descent methods

  $g_k = \gamma^k_1 v_1 + \dots + \gamma^k_n v_n$,

where $v_1, \dots, v_n$ is an orthonormal basis of eigenvectors and $\gamma^k_i$ are the
coordinates of the vector $g_k$ in this basis. Since

  $g_{k+1} = g_k - \frac{1}{\beta_k} A g_k$,

substitution and simplification give

  $\forall i = 1, \dots, n: \quad \gamma^{k+1}_i = \left(1 - \frac{\lambda_i}{\beta_k}\right) \gamma^k_i$.


Convergence of descent methods

Furthermore

  $\|g_k\|_2^2 = (g_k, g_k) = \left( \sum_{i=1}^n \gamma^k_i v_i,\ \sum_{i=1}^n \gamma^k_i v_i \right) = \sum_{i=1}^n (\gamma^k_i)^2$

  $\lim_{k \to \infty} \|g_k\|_2 = 0 \quad \Leftrightarrow \quad \lim_{k \to \infty} \gamma^k_i = 0, \quad \forall i = 1, \dots, n$.

Therefore the behaviour of the factors $\left(1 - \frac{\lambda_i}{\beta_k}\right)$ is the key to convergence.
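A quick numerical confirmation (illustrative only; the matrix, the starting point and beta are arbitrary) of the coordinate recursion from the previous slide: after one descent step, the eigen-coordinates of the gradient shrink exactly by the factors $1 - \lambda_i/\beta_k$.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)
    b = rng.standard_normal(n)
    x = rng.standard_normal(n)

    lam, V = np.linalg.eigh(A)         # orthonormal eigenvectors in columns of V
    g = A @ x - b
    beta = 4.0                         # arbitrary positive step parameter

    gamma = V.T @ g                    # coordinates of g in the eigenbasis
    g_next = g - (A @ g) / beta
    gamma_next = V.T @ g_next

    print(np.linalg.norm(gamma_next - (1 - lam / beta) * gamma))  # approx. 0
    print(np.isclose(g @ g, gamma @ gamma))                       # True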


Convergence rate of the BB method

Let the minimized quadratic function f be strictly convex and let $\{x_k\}$ be the
sequence generated by the BB method. Then either

there exists a finite k such that $g_k = 0$, or

the sequence $\{\|g_k\|_2\}$ converges to zero R-linearly; specifically

  $\|g_k\|_2 \le \left(\frac{1}{2}\right)^k C\, \|g_1\|_2$,

where $C \in \mathbb{R}$ is a constant depending on $\lambda_{\max}$ and $\lambda_{\min}$.


Numerical tests


Numerical test 1 - a simple matrix


Numerical test 2 - 3D electrostatics (Doc. Lukáš)


Thank you for your attention

J. Barzilai, J. M. Borwein: Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8:141-148, 1988.

M. Raydan: Convergence properties of the Barzilai and Borwein gradient method. Rice University, 1991.

Y. H. Dai, L.-Z. Liao: R-linear convergence of the Barzilai and Borwein gradient method. IMA Journal of Numerical Analysis, 26:1-10, 2002.

L. Pronzato, A. Zhigljavsky: Gradient algorithm for quadratic optimization with fast convergence rates. Springer, 2010.
