TRANSCRIPT
-
A robust accelerated optimization algorithm for
strongly convex functions
Saman Cyrus Bin Hu Bryan Van Scoy Laurent Lessard
University of Wisconsin–Madison
June 27, 2018
-
Iterative algorithms
Unconstrained optimization with f strongly convex:

  min_{x ∈ R^n} f(x)

where mI ⪯ ∇²f(x) ⪯ LI, and define κ = L/m (condition number).

• Gradient method:
  x_{k+1} = x_k − α∇f(x_k)
• Nesterov's accelerated gradient method (AGM):
  y_k = x_k + β(x_k − x_{k−1})
  x_{k+1} = y_k − α∇f(y_k)
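As a concrete sketch (not the speakers' code), both updates can be run on a toy quadratic with m = 1, L = 10; the AGM momentum β = (√κ − 1)/(√κ + 1) is the standard strongly-convex tuning, an assumption not stated on this slide.

```python
import numpy as np

# Toy strongly convex quadratic f(x) = 0.5 x^T A x with m = 1, L = 10.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
m, L = 1.0, 10.0
kappa = L / m

def gradient_method(x0, alpha, iters=200):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad(x)        # x_{k+1} = x_k - alpha * grad f(x_k)
    return x

def agm(x0, alpha, beta, iters=200):
    x_prev = x = np.array(x0, dtype=float)
    for _ in range(iters):
        y = x + beta * (x - x_prev)    # momentum/extrapolation step
        x_prev, x = x, y - alpha * grad(y)
    return x

x_gm = gradient_method([1.0, 1.0], alpha=1.0 / L)
x_agm = agm([1.0, 1.0], alpha=1.0 / L,
            beta=(np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1))
```

Both runs drive the iterate to the minimizer x⋆ = 0; AGM does so at the faster accelerated rate.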
-
Iteration complexity
[Figure: iterations to convergence vs. condition number κ, log–log axes. Curves: Gradient Method (α = 1/L), Gradient Method (α = 2/(m+L)), Nesterov's AGM, Triple Momentum, Theoretical Lower Bound; reference slopes κ log(1/ε) and √κ log(1/ε).]
-
Noise robustness
• ∇f is inexact/approximate.
• Many different noise models have been studied.
• We use relative deterministic noise:
  ‖∇f_noisy − ∇f_exact‖ / ‖∇f_exact‖ ≤ δ
• Ex: round-off error.
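A minimal sketch of this noise model (illustrative; the model only bounds the perturbation's magnitude, so drawing its direction at random below is one arbitrary choice, not part of the definition):

```python
import numpy as np

def noisy_grad(grad, x, delta, rng):
    """Return a gradient satisfying ||g_noisy - g|| <= delta * ||g||.

    The relative deterministic noise model constrains only the size of
    the perturbation; here its direction is sampled for illustration.
    """
    g = grad(x)
    d = rng.standard_normal(np.shape(g))
    d = d / np.linalg.norm(d)                  # unit direction
    return g + delta * np.linalg.norm(g) * d

rng = np.random.default_rng(0)
g_exact = np.array([3.0, 4.0])                 # example gradient, ||g|| = 5
g_noisy = noisy_grad(lambda x: g_exact, None, 0.1, rng)
```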
-
Performance in the presence of noise
Gradient method: slow and robust. Nesterov's AGM: fast and fragile.

[Figure: iterations to convergence vs. condition number κ. Left: gradient method for δ ∈ {.01, .02, .05, .1, .2, .5}. Right: Nesterov's AGM (quadratic) for δ ∈ {.05, .1, .2, .3, .4, .5}.]

Simulated by solving a small SDP (Lessard et al., SIAM Journal on Optimization, 2016).
-
Proposed algorithm: Robust momentum method
Iteration update:

  x_{k+1} = x_k + β(x_k − x_{k−1}) − α∇f(y_k)
  y_k = x_k + γ(x_k − x_{k−1})

with parameters:

  α = κ(1−ρ)²(1+ρ)/L,  β = κρ³/(κ−1),  γ = ρ³/((κ−1)(1−ρ)²(1+ρ))

and a single tuning parameter ρ:

  1 − 1/√κ  (fast + fragile)  ≤  ρ  ≤  1 − 1/κ  (slow + robust)
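A direct transcription of the update and parameter formulas above (a sketch, exercised on a toy quadratic with κ = 10 and ρ = 0.8, which lies inside the allowed range 1 − 1/√10 ≈ 0.684 to 1 − 1/10 = 0.9):

```python
import numpy as np

def rmm(grad, x0, L, m, rho, iters=300):
    """Robust momentum method with the parameter formulas from this slide."""
    kappa = L / m
    alpha = kappa * (1 - rho) ** 2 * (1 + rho) / L
    beta = kappa * rho ** 3 / (kappa - 1)
    gamma = rho ** 3 / ((kappa - 1) * (1 - rho) ** 2 * (1 + rho))
    x_prev = x = np.array(x0, dtype=float)
    for _ in range(iters):
        y = x + gamma * (x - x_prev)
        x_prev, x = x, x + beta * (x - x_prev) - alpha * grad(y)
    return x

# Toy quadratic f(x) = 0.5 x^T A x with m = 1, L = 10.
A = np.diag([1.0, 10.0])
x_rmm = rmm(lambda x: A @ x, [1.0, 1.0], L=10.0, m=1.0, rho=0.8)
```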
-
Control interpretation
Nesterov's AGM:

  y_k = x_k + β(x_k − x_{k−1})
  x_{k+1} = y_k − α∇f(y_k)

Feedback interconnection of a linear system G with the gradient ∇f (input u_k = ∇f(y_k)):

  G:  [x_{k+1}; x_k] = [1+β  −β; 1  0] [x_k; x_{k−1}] + [−α; 0] u_k
      y_k = [1+β  −β] [x_k; x_{k−1}]
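The equivalence can be checked numerically: iterating the state-space form with the feedback u_k = ∇f(y_k) reproduces the direct AGM recursion. A sketch with a scalar f(x) = x²/2 and illustrative α, β:

```python
import numpy as np

grad = lambda x: x                     # f(x) = x^2 / 2, so grad f(x) = x
alpha, beta = 0.1, 0.5                 # illustrative values

# Direct AGM recursion.
x_prev = x = 1.0
xs = []
for _ in range(20):
    y = x + beta * (x - x_prev)
    x_prev, x = x, y - alpha * grad(y)
    xs.append(x)

# Feedback interconnection: state xi_k = [x_k, x_{k-1}].
A = np.array([[1 + beta, -beta], [1.0, 0.0]])
B = np.array([-alpha, 0.0])
C = np.array([1 + beta, -beta])
xi = np.array([1.0, 1.0])              # [x_0, x_{-1}]
xs_ss = []
for _ in range(20):
    u = grad(C @ xi)                   # u_k = grad f(y_k), with y_k = C xi_k
    xi = A @ xi + B * u
    xs_ss.append(xi[0])
```

The two trajectories coincide, confirming the realization.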
-
Frequency domain condition
Transfer function for the linear part:

  G(z) = −α((1+γ)z − γ) / ((z−1)(z−β))

Sufficient condition for linear convergence, x_k − x⋆ = O(ρᵏ):

  Re[(ρz⁻¹ − 1)(1 − LG(ρz))/(1 − mG(ρz))] < 0  for all |z| = 1
• combination of Circle Criterion and Zames-Falb multipliers
-
Gradient method
[Figure: frequency-domain locus in the complex plane (real vs. imaginary part) for the gradient method with α = 1/L and α = 2/(m+L), at ρ ∈ {1, 0.98, 0.95, Opt.}.]
-
Idea: add a margin to improve robustness
  Re[(ρz⁻¹ − 1)(1 − LG(ρz))/(1 − mG(ρz))] + ν ≤ 0  for all |z| = 1

• ν > 0 is a robustness margin
• tune α, β, γ to obtain a vertical line at −ν
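The vertical-line claim can be sanity-checked numerically (my own sketch, not from the talk, with the illustrative choices κ = 10 and ρ = 0.8 and the robust momentum parameter formulas): sampling |z| = 1 away from the pole at z = 1, the real part of the frequency-domain quantity comes out constant and negative.

```python
import numpy as np

L, m = 10.0, 1.0
kappa, rho = L / m, 0.8           # rho strictly inside the allowed range
alpha = kappa * (1 - rho) ** 2 * (1 + rho) / L
beta = kappa * rho ** 3 / (kappa - 1)
gamma = rho ** 3 / ((kappa - 1) * (1 - rho) ** 2 * (1 + rho))

# Transfer function of the linear part.
G = lambda z: -alpha * ((1 + gamma) * z - gamma) / ((z - 1) * (z - beta))

theta = np.linspace(0.01, 2 * np.pi - 0.01, 2000)   # avoid the pole at z = 1
z = np.exp(1j * theta)
T = (rho / z - 1) * (1 - L * G(rho * z)) / (1 - m * G(rho * z))
re = T.real                        # ~ constant: the vertical line at -nu
```

The spread of `re` across the circle is at floating-point level, i.e. the locus really is a vertical line, and its position gives the margin ν.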
-
Robust Momentum Method
[Figure: frequency-domain locus in the complex plane for the Robust Momentum Method at ρ ∈ {0.90, 0.85, 0.80, Opt.}; each locus is a vertical line.]
-
Nesterov’s AGM — not a line!
[Figure: frequency-domain locus in the complex plane for Nesterov's AGM at ρ ∈ {0.80, 0.77, 0.76, Opt.}; the loci are not vertical lines.]
-
Trade-off: Performance vs noise
[Figure: convergence rate ρ vs. noise strength δ ∈ [0, 1]. Curves: GM (α = 1/L), GM (α = 2/(L+m)), GM (min over α ∈ [0, 2/L]), Nesterov's AGM, RMM (ν = 0), RMM (ν = 1/3), RMM (ν = 2/3), RMM (min over ν ∈ [0, 1)). Marked rates: 1 − 1/√κ, (κ−1)/(κ+1), 1 − 1/κ, and 1.]
-
Simulation with relative gradient noise
f(x) = x₁² + 10x₂²

[Figure: error vs. iteration (0–100) for δ = 0, δ = 0.25, and δ = 0.5, comparing RMM (ν = 0), RMM (ν = 0.55), and AGM.]
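The experiment can be reproduced qualitatively (a sketch, not the authors' script; the noise direction is drawn at random here, since the relative noise model only bounds its magnitude, and ρ = 1 − 1/κ is the slow-but-robust tuning). For this quadratic, m = 2 and L = 20, so κ = 10; even with δ = 0.25 relative noise, the robust tuning still drives the error toward zero in this run.

```python
import numpy as np

# f(x) = x1^2 + 10 x2^2, so m = 2, L = 20, kappa = 10.
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
L, m = 20.0, 2.0
kappa = L / m

def rmm_noisy(x0, rho, delta, iters=800, seed=0):
    rng = np.random.default_rng(seed)
    alpha = kappa * (1 - rho) ** 2 * (1 + rho) / L
    beta = kappa * rho ** 3 / (kappa - 1)
    gamma = rho ** 3 / ((kappa - 1) * (1 - rho) ** 2 * (1 + rho))
    x_prev = x = np.array(x0, dtype=float)
    for _ in range(iters):
        y = x + gamma * (x - x_prev)
        g = grad(y)
        d = rng.standard_normal(2)
        g = g + delta * np.linalg.norm(g) * d / np.linalg.norm(d)  # relative noise
        x_prev, x = x, x + beta * (x - x_prev) - alpha * g
    return x

x_robust = rmm_noisy([1.0, 1.0], rho=1 - 1 / kappa, delta=0.25)
```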
-
Thank you