homel.vsb.cz/~pos220/research/sna.pdf
Faster Gradient Descent Methods
Ing. Lukas Pospısil, Ing. Martin Mensık
Department of Applied Mathematics, VSB - Technical University of Ostrava
24. 1. 2012
Ing. Lukas Pospısil, Ing. Martin Mensık — Faster Gradient Descent Methods 1/28
Outline
Motivation
Stochastic methods
The Barzilai-Borwein method
Numerical tests
Motivation
The basic quadratic programming problem

Problem
Find the minimizer of a strictly convex quadratic function, i.e.
$$\bar{x} = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T A x - b^T x,$$
or, equivalently, solve the system of linear equations
$$A\bar{x} = b,$$
where $A \in \mathbb{R}^{n,n}$ is SPD, $b \in \mathbb{R}^n$, $x \in \mathbb{R}^n$. (Find the roots of $g(x) := Ax - b = 0$.)
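The equivalence of the two formulations can be checked numerically; a minimal Python/NumPy sketch (the matrix and right-hand side below are arbitrary illustrative data):

```python
import numpy as np

# Build a small SPD matrix A and a right-hand side b.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)   # SPD by construction
b = rng.standard_normal(5)

# The minimizer of 1/2 x^T A x - b^T x is the solution of A x = b.
x = np.linalg.solve(A, b)

# The gradient g(x) = A x - b vanishes at the minimizer.
g = A @ x - b
print(np.linalg.norm(g))        # ~ 0 up to rounding
```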
Iterative methods

Krylov methods
Build an orthogonal basis of the subspace $\mathcal{K}_m = \{g_0, Ag_0, \dots, A^{m-1} g_0\}$
Very fast
Very sensitive to error
conjugate gradients (CG), the Lanczos method, ...

Gradient descent methods
Minimize in the direction of the gradient
Very stable
Very slow
the Richardson method, the steepest descent method
Gradient descent methods

General descent method
$$x_{k+1} = x_k - \frac{1}{\beta_k} g_k$$
Steepest descent method
$$x_{k+1} = x_k - \frac{(g_k, g_k)}{(A g_k, g_k)}\, g_k$$
Richardson method with the optimal step length
$$x_{k+1} = x_k - \frac{2}{\lambda^A_{\max} + \lambda^A_{\min}}\, g_k$$
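The steepest descent variant above can be sketched as follows (a minimal Python/NumPy illustration, not the authors' code; the test matrix is an arbitrary diagonal example):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_it=10_000):
    """Minimize 1/2 x^T A x - b^T x for SPD A with exact line search."""
    x = x0.copy()
    g = A @ x - b                   # gradient g_k = A x_k - b
    for _ in range(max_it):
        if np.linalg.norm(g) < tol:
            break
        Ag = A @ g
        step = (g @ g) / (g @ Ag)   # 1/beta_k = (g_k, g_k) / (A g_k, g_k)
        x -= step * g
        g -= step * Ag              # recurrent gradient update
    return x

A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x = steepest_descent(A, b, np.zeros(3))
print(np.linalg.norm(A @ x - b))    # ~ 0
```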
Recurrent computation of the gradient

Since
$$x_{k+1} = x_k - \frac{1}{\beta_k} g_k,$$
a simple manipulation gives
$$g_{k+1} = A x_{k+1} - b = A\left(x_k - \frac{1}{\beta_k} g_k\right) - b = g_k - \frac{1}{\beta_k} A g_k.$$
The spectral moments of the gradient are defined by
$$\mu^{(k)}_\alpha = \frac{(A^\alpha g_k, g_k)}{(g_k, g_k)}.$$
Stochastic choice of the step length
A probability measure over the gradient

For simplicity, assume (without loss of generality) that $A$ is a diagonal matrix $A = \mathrm{diag}\{\lambda_1, \dots, \lambda_d\}$, where $0 < m = \lambda_1 \le \dots \le \lambda_d = M < \infty$.
$$z_k = \frac{g_k}{\sqrt{(g_k, g_k)}}, \qquad p^{(k)}_i = \{z_k\}_i^2$$
$$p^{(k+1)}_i = \frac{(\lambda_i - \beta_k)^2}{\beta_k^2 - 2\beta_k \mu^{(k)}_1 + \mu^{(k)}_2}\, p^{(k)}_i$$
Attractive extremes (endpoints) of $p$

Let $\beta_k > 0$, $\beta_k \notin \{m, M\}$ for all $k$, be a sequence with distribution function $F(\beta)$ with support $\langle m', M' \rangle$, where $0 < m' \le M' < \infty$, and moreover
$$\int \log(\beta - \lambda)^2 \, dF(\beta) < \max\left\{ \int \log(M - \beta)^2 \, dF(\beta), \int \log(m - \beta)^2 \, dF(\beta) \right\} \quad \forall \lambda \in \{\lambda_2, \dots, \lambda_{d-1}\}.$$
Then there exist constants $C > 0$, $k_0 > 0$, $0 \le \Theta < 1$ such that
$$\sum_{i=2}^{d-1} p^{(k)}_i \le C\, \Theta^k \quad \forall k > k_0.$$
Advantageous N-tuples

For the descent we choose repeated N-tuples $\{\beta_0, \dots, \beta_N\}$ symmetric about the center of the spectrum $\frac{m+M}{2}$.

Estimate of the rate $R$
$$R_2^2(\beta) = \frac{(\beta - m)(M - \beta)}{\beta(m + M - \beta)}$$
$$R_N = \left( \prod_{j=0}^{N} \frac{(\beta_j - m)^2}{\beta_j^2} \right)^{\frac{1}{N+1}}$$
and thus we arrive at...
$$R_{\arcsin,\varepsilon} = \left( \frac{M - m + 2\sqrt{\varepsilon(M - m - \varepsilon)}}{M + m + 2\sqrt{(M - \varepsilon)(m + \varepsilon)}} \right)^2$$
$$R_{\arcsin,\varepsilon} = R_\infty \left(1 + 4\sqrt{\frac{\varepsilon}{M - m}}\right) + O(\varepsilon), \qquad \varepsilon \to 0$$
Algorithm

choose a small positive $\tau$, e.g. $\tau = 10^{-6}$, and $z_0 = 0$
for $k = 0, 1$ choose $\beta_k = \mu^{(k)}_1$ and initialize the estimates of $m, M$
for $k > 1$, set $\varepsilon_k = \tau(M - m)$
for $k = 2j$: $z_j = z_{j-1} + \varphi$ and $\beta_{2j} = m + \varepsilon_k + \frac{1}{2}(\cos(\pi z_j) + 1)(M - m - 2\varepsilon_k)$
for $k = 2j + 1$: $\beta_{2j+1} = M + m - \beta_{2j}$
improve the estimates of $m, M$

The algorithm uses the constant $\varphi = \frac{1}{2}(\sqrt{5} - 1)$.
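A rough Python sketch of the step-length generator above. Several details are assumptions here: the running estimates of $m, M$ are replaced by fixed, known spectral bounds, the $z_j$ update is taken modulo 1, and a factor $1/2$ keeps $\beta_{2j}$ inside $[m+\varepsilon, M-\varepsilon]$; the original algorithm may differ.

```python
import numpy as np

phi = (np.sqrt(5.0) - 1.0) / 2.0  # the golden-ratio constant from the slide

def stochastic_descent(A, b, x0, m, M, tau=1e-6, tol=1e-8, max_it=5000):
    """Gradient descent with paired, arcsine-like step lengths.
    m, M are assumed known spectral bounds (the slide updates them on the fly)."""
    x, g = x0.copy(), A @ x0 - b
    eps = tau * (M - m)
    z, beta_even = 0.0, None
    for k in range(max_it):
        if np.linalg.norm(g) < tol:
            break
        if k < 2:
            beta = (g @ (A @ g)) / (g @ g)   # beta_k = mu_1^{(k)}
        elif k % 2 == 0:
            z = (z + phi) % 1.0              # golden-ratio sequence (mod 1 assumed)
            beta_even = m + eps + (np.cos(np.pi * z) + 1.0) * (M - m - 2.0 * eps) / 2.0
            beta = beta_even
        else:
            beta = M + m - beta_even         # symmetric partner of the pair
        x -= g / beta
        g -= (A @ g) / beta                  # recurrent gradient update
    return x

A = np.diag([1.0, 3.0, 7.0, 10.0])
b = np.ones(4)
x = stochastic_descent(A, b, np.zeros(4), m=1.0, M=10.0)
print(np.linalg.norm(A @ x - b))
```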
Problems with inexactness

The gradient $g_{k+1}$ is usually computed recursively:
$$g_{k+1} = A^{(k+1)} x_{k+1} - b = (A^{(k+1)} x_k - b) - \frac{1}{\beta_k} A^{(k+1)} g_k = g_k - \frac{1}{\beta_k} A^{(k+1)} g_k$$
The problem lies in the inexact application of $A$:
$$g_k = A^{(k)} x_k - b \ne A^{(k+1)} x_k - b = \tilde{g}_k$$
Ignoring this still yields a result ... but a wrong one!
The solution lies in a compromise

We have two ways to obtain $g_{k+1}$:

Recursive
$$g_{k+1} = g_k - \frac{1}{\beta_k} A^{(k+1)} g_k$$
PRO: provides a good estimate of the spectrum
CON: leads to a wrong result

Restarted
$$g_{k+1} = A^{(k+1)} x_{k+1} - b$$
PRO: leads to the correct result
CON: the spectrum estimate is very weak
The Barzilai-Borwein method
Derivation

Secant method (a modification of Newton's method)
Find the roots of the equation $g(x) = 0$, $g : \mathbb{R} \to \mathbb{R}$:
$$x_{k+1} = x_k - \frac{x_k - x_{k-1}}{g_k - g_{k-1}}\, g_k$$
(provided $g$ satisfies certain conditions on $\langle a, b \rangle \ni x_k$)

The secant method is a descent method
$$x_{k+1} = x_k - \frac{1}{\beta_k} g_k \;\Rightarrow\; \beta_k = \frac{g_k - g_{k-1}}{x_k - x_{k-1}}$$
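The scalar secant iteration above can be sketched in a few lines of Python (the test function and starting points are illustrative choices, not from the slides):

```python
def secant(g, x0, x1, tol=1e-12, max_it=100):
    """Find a root of g by x_{k+1} = x_k - (x_k - x_{k-1}) / (g_k - g_{k-1}) * g_k."""
    g0, g1 = g(x0), g(x1)
    for _ in range(max_it):
        if abs(g1) < tol:
            break
        # Advance one secant step; keep the previous point and value.
        x0, x1, g0 = x1, x1 - (x1 - x0) / (g1 - g0) * g1, g1
        g1 = g(x1)
    return x1

# Example: root of g(x) = x^2 - 2, i.e. sqrt(2)
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)  # ~ 1.41421356
```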
Derivation

Secant method in nD?
Find the roots of $g(x) = Ax - b = 0$, $g : \mathbb{R}^n \to \mathbb{R}^n$.
$$\beta_k = \frac{g_k - g_{k-1}}{x_k - x_{k-1}}$$
cannot be used (the right-hand side is a ratio of vectors).
Instead, solve the so-called secant equation
$$(x_k - x_{k-1})\, \beta_k = g_k - g_{k-1}$$
in the least-squares sense, i.e.
$$\beta_k := \arg\min_{\beta \in \mathbb{R}} \| (x_k - x_{k-1})\beta - (g_k - g_{k-1}) \|_2^2.$$
Derivation

Secant method in nD?
Denote
$$s_k := x_k - x_{k-1}, \qquad y_k := g_k - g_{k-1};$$
then the solution of the minimization problem
$$\beta_k := \arg\min_{\beta \in \mathbb{R}} \| s_k \beta - y_k \|_2^2$$
is (from the necessary optimality condition)
$$\beta_k = \frac{(s_k, y_k)}{(s_k, s_k)}.$$
Done?
Surprise

A small manipulation and a surprise
Since
$$y_k = g_k - g_{k-1} = (A x_k - b) - (A x_{k-1} - b) = A s_k$$
$$s_k = x_k - x_{k-1} = (x_{k-1} - \beta_{k-1}^{-1} g_{k-1}) - x_{k-1} = -\beta_{k-1}^{-1} g_{k-1}$$
substitution gives
$$\beta_k = \frac{(s_k, A s_k)}{(s_k, s_k)} = \frac{(-\beta_{k-1}^{-1} g_{k-1}, -\beta_{k-1}^{-1} A g_{k-1})}{(-\beta_{k-1}^{-1} g_{k-1}, -\beta_{k-1}^{-1} g_{k-1})} = \frac{(A g_{k-1}, g_{k-1})}{(g_{k-1}, g_{k-1})}.$$
Surprise

A small manipulation and a surprise
So the Barzilai-Borwein method reads
$$x_{k+1} = x_k - \frac{(g_{k-1}, g_{k-1})}{(A g_{k-1}, g_{k-1})}\, g_k,$$
i.e. approximately the steepest descent method with a delayed step length.

Recurrent computation of the gradient
Moreover,
$$g_{k+1} = A x_{k+1} - b = A\left(x_k - \frac{1}{\beta_k} g_k\right) - b = g_k - \frac{1}{\beta_k} A g_k.$$
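Putting the delayed BB step and the recurrent gradient update together gives a very short method; a minimal Python/NumPy sketch (illustrative, not the authors' code; the first step falls back to steepest descent, a common convention):

```python
import numpy as np

def barzilai_borwein(A, b, x0, tol=1e-8, max_it=10_000):
    """Minimize 1/2 x^T A x - b^T x by the Barzilai-Borwein method."""
    x = x0.copy()
    g = A @ x - b
    step = (g @ g) / (g @ (A @ g))  # first step: steepest descent length
    for _ in range(max_it):
        if np.linalg.norm(g) < tol:
            break
        Ag = A @ g
        x -= step * g
        g_new = g - step * Ag                 # recurrent gradient g_{k+1}
        step = (g @ g) / (g @ Ag)             # (g_k,g_k)/(A g_k,g_k), used next iter
        g = g_new
    return x

A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x = barzilai_borwein(A, b, np.zeros(3))
print(np.linalg.norm(A @ x - b))              # ~ 0
```

Note how `step` is computed from the current gradient but applied at the next iterate, which is exactly the "steepest descent with a delay" reading above.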
Convergence of descent methods

$$g_k = \gamma^k_1 v_1 + \dots + \gamma^k_n v_n$$
where $v_1, \dots, v_n$ is an orthonormal basis of eigenvectors of $A$ and $\gamma^k_i$ are the coordinates of $g_k$ in this basis. Since
$$g_{k+1} = g_k - \frac{1}{\beta_k} A g_k,$$
substitution and manipulation give
$$\forall i = 1, \dots, n: \quad \gamma^{k+1}_i = \left(1 - \frac{\lambda_i}{\beta_k}\right) \gamma^k_i$$
Convergence of descent methods

Furthermore,
$$\|g_k\|_2^2 = (g_k, g_k) = \left( \sum_{i=1}^n \gamma^k_i v_i, \sum_{i=1}^n \gamma^k_i v_i \right) = \sum_{i=1}^n (\gamma^k_i)^2$$
$$\lim_{k \to \infty} \|g_k\|_2 = 0 \;\Leftrightarrow\; \lim_{k \to \infty} \gamma^k_i = 0 \quad \forall i = 1, \dots, n$$
Therefore the behavior of the factors $\left(1 - \frac{\lambda_i}{\beta_k}\right)$ is crucial for convergence.
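The coordinate-wise contraction above is easy to verify numerically for a diagonal $A$, where the eigenvectors are the standard basis (the eigenvalues, coordinates, and step parameter below are arbitrary illustrative values):

```python
import numpy as np

lam = np.array([1.0, 4.0, 9.0])          # eigenvalues of A = diag(lam)
A = np.diag(lam)
g = np.array([1.0, -2.0, 0.5])           # coordinates gamma_i^k (basis = e_i)
beta = 3.0                               # some step parameter beta_k

g_next = g - (A @ g) / beta              # g_{k+1} = g_k - (1/beta_k) A g_k
factors = 1.0 - lam / beta               # (1 - lambda_i / beta_k)
print(np.allclose(g_next, factors * g))  # True
```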
Convergence rate of the BB method

Let the minimized quadratic function $f$ be strictly convex and let $\{x_k\}$ be the sequence generated by the BB method. Then either
there exists a finite $k$ such that $g_k = 0$, or
the sequence $\{\|g_k\|_2\}$ converges to zero R-linearly; specifically,
$$\|g_k\|_2 \le \left(\frac{1}{2}\right)^k C\, \|g_1\|_2,$$
where $C \in \mathbb{R}$ is a constant depending on $\lambda_{\max}$ and $\lambda_{\min}$.
Numerical tests
Numerical test 1 - a simple matrix
Numerical test 2 - 3D electrostatics (Doc. Lukas)
Thank you for your attention
J. Barzilai, J. M. Borwein: Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8:141-148, 1988.
M. Raydan: Convergence properties of the Barzilai and Borwein gradient method. Rice University, 1991.
Y. H. Dai, L.-Z. Liao: R-linear convergence of the Barzilai and Borwein gradient method. IMA Journal of Numerical Analysis, 26:1-10, 2002.
L. Pronzato, A. Zhigljavsky: Gradient algorithms for quadratic optimization with fast convergence rates. Springer, 2010.