18-660: Numerical Methods for Engineering Design and Optimization

TRANSCRIPT

Slide 1

18-660: Numerical Methods for Engineering Design and Optimization

Xin Li
Department of ECE
Carnegie Mellon University
Pittsburgh, PA 15213

Slide 2

Overview

Unconstrained optimization
- Gradient method
- Newton method

Slide 3

Unconstrained Optimization

Unconstrained optimization: minimizing a cost function without any constraint

Example: linear regression with regularization, where we solve A\alpha = B by

\min_{\alpha} \; \| A\alpha - B \|_2^2 + \lambda \cdot \| \alpha \|_2^2

Methods:
- Golden section search (non-derivative method)
- Downhill simplex method (non-derivative method)
- Gradient method (relies on derivatives; this lecture)
- Newton method (relies on derivatives; this lecture)

Slide 4

Gradient Method

If a cost function is smooth, its derivative information can be used to search for the optimal solution

\min_x f(x)

[Figure: a smooth one-dimensional cost function f(x), with iterates moving downhill toward the minimum]

Slide 5

Gradient Method

For illustration purposes, we start from the one-dimensional case

\min_x f(x)

\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = -\lambda^{(k)} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}} \qquad \left( \lambda^{(k)} > 0 \right)

where \lambda^{(k)} is the step size and \left.\frac{df}{dx}\right|_{x^{(k)}} is the derivative. Equivalently,

x^{(k+1)} = x^{(k)} - \lambda^{(k)} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}} \qquad \left( \lambda^{(k)} > 0 \right)

Slide 6

Gradient Method

One-dimensional case (continued)

\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = -\lambda^{(k)} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}}

f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] + \left.\frac{df}{dx}\right|_{x^{(k)}} \cdot \Delta x^{(k)}

f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] + \left.\frac{df}{dx}\right|_{x^{(k)}} \cdot \left( -\lambda^{(k)} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}} \right)

f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] - \lambda^{(k)} \cdot \left( \left.\frac{df}{dx}\right|_{x^{(k)}} \right)^2

Since \lambda^{(k)} > 0, the cost function f(x) keeps decreasing as long as the derivative is non-zero
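The one-dimensional update above can be run as a quick sanity check on a concrete convex cost. A minimal sketch, assuming the illustrative cost f(x) = (x - 3)^2 with derivative 2(x - 3); the fixed step size 0.1 is an arbitrary choice, not from the slides.

```python
# Gradient descent in one dimension: x_{k+1} = x_k - step * f'(x_k).
# Example cost (an assumption): f(x) = (x - 3)^2, so df/dx = 2(x - 3).
def gradient_descent_1d(df, x0, step=0.1, iters=100):
    """Iterate the fixed-step gradient update from x0."""
    x = x0
    for _ in range(iters):
        x = x - step * df(x)
    return x

f = lambda x: (x - 3.0) ** 2
df = lambda x: 2.0 * (x - 3.0)

x_star = gradient_descent_1d(df, x0=0.0)   # approaches the minimizer x = 3
```

Each iteration contracts the error by a constant factor here, which matches the slide's observation that f keeps decreasing while the derivative is non-zero.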

Slide 7

Gradient Method

One-dimensional case (continued)

\Delta x = -\lambda \cdot \frac{df}{dx} = 0

The derivative is zero at a local optimum, so the update step vanishes and the gradient method converges

[Figure: f(x) with zero slope at the local minimum]

Slide 8

Gradient Method

Two-dimensional case

\min_{x_1, x_2} f(x_1, x_2)

\nabla f(x_1, x_2) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix}

\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix} = -\lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \end{bmatrix} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \end{bmatrix} - \lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

Slide 9

Gradient Method

Two-dimensional case (continued)

\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = -\lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix}

f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] - \lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] - \lambda^{(k)} \cdot \left\| \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right] \right\|_2^2

Since \lambda^{(k)} > 0, the cost function f(x_1, x_2) keeps decreasing if the gradient is non-zero

Slide 10

Gradient Method

N-dimensional case

\min_X f(X)

\nabla f(X) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \end{bmatrix}

X^{(k+1)} = X^{(k)} - \lambda^{(k)} \cdot \nabla f\left[ X^{(k)} \right]
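The N-dimensional update can be sketched directly with NumPy. The quadratic cost f(X) = ||X - c||^2 and its gradient 2(X - c) are illustrative assumptions, and `gradient_descent` is a hypothetical helper, not course code.

```python
import numpy as np

# N-dimensional gradient descent: X_{k+1} = X_k - step * grad_f(X_k).
# Example (an assumption): f(X) = ||X - c||^2, so grad_f(X) = 2*(X - c).
def gradient_descent(grad_f, x0, step=0.1, iters=200):
    """Fixed-step gradient descent from the starting vector x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad_f(x)
    return x

c = np.array([1.0, -2.0, 0.5])          # minimizer of the example cost
grad_f = lambda x: 2.0 * (x - c)

x_star = gradient_descent(grad_f, np.zeros(3))   # approaches c
```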

Slide 11

Newton Method

The gradient method relies on first-order derivative information only
- Each iteration is simple, but it converges to the optimum slowly, i.e., it requires a large number of iteration steps
- For faster convergence, the step size \lambda^{(k)} can be optimized by a one-dimensional search:

\min_{\lambda^{(k)}} \; f\left[ X^{(k)} - \lambda^{(k)} \cdot \nabla f\left[ X^{(k)} \right] \right]

Alternative algorithm: Newton method
- Relies on both first-order and second-order derivatives
- Each iteration is more expensive
- But it converges to the optimum more quickly, i.e., it requires a smaller number of iterations to reach convergence
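The one-dimensional search over \lambda^{(k)} can be carried out with the golden section search mentioned on Slide 3. A sketch under stated assumptions: the quadratic cost f(X) = X^T A X, the particular A and starting point, and the bracket [0, 1] for \lambda are all arbitrary example choices.

```python
import numpy as np

# Pick the step size by minimizing phi(t) = f(X - t * grad) over t
# with a golden section search (a non-derivative 1-D method).
def golden_section(phi, a, b, tol=1e-8):
    """Minimize a unimodal function phi on the bracket [a, b]."""
    g = (5 ** 0.5 - 1) / 2                # inverse golden ratio, ~0.618
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

A = np.diag([1.0, 10.0])                  # example quadratic cost f(X) = X^T A X
f = lambda x: float(x @ A @ x)
grad = lambda x: 2.0 * A @ x

x = np.array([5.0, 1.0])
g_vec = grad(x)
lam = golden_section(lambda t: f(x - t * g_vec), 0.0, 1.0)
x_next = x - lam * g_vec                  # one line-searched gradient step
```

This version re-evaluates phi at both interior points each pass for clarity; a production implementation would reuse one of the two evaluations per iteration.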

Slide 12

Newton Method

One-dimensional case

\min_x f(x)

Linearize the first-order derivative around x^{(k)}:

\left.\frac{df}{dx}\right|_{x^{(k+1)}} \approx \left.\frac{df}{dx}\right|_{x^{(k)}} + \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right]

The first-order derivative is zero at a local optimum, so we set the right-hand side to zero:

0 = \left.\frac{df}{dx}\right|_{x^{(k)}} + \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right]

Slide 13

Newton Method

One-dimensional case (continued)

\left.\frac{df}{dx}\right|_{x^{(k)}} + \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right] = 0

\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = -\left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}}

x^{(k+1)} = x^{(k)} - \left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}}
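A minimal sketch of this iteration, assuming the illustrative convex cost f(x) = x^2 + e^x (so f'(x) = 2x + e^x and f''(x) = 2 + e^x > 0); `newton_1d` is a hypothetical helper, not from the slides.

```python
import math

# Newton's method in one dimension:
#   x_{k+1} = x_k - [f''(x_k)]^{-1} * f'(x_k)
def newton_1d(df, d2f, x0, iters=20):
    """Run the 1-D Newton iteration from x0."""
    x = x0
    for _ in range(iters):
        x = x - df(x) / d2f(x)
    return x

# Example cost (an assumption): f(x) = x^2 + exp(x).
df = lambda x: 2.0 * x + math.exp(x)     # first derivative
d2f = lambda x: 2.0 + math.exp(x)        # second derivative, always positive

x_star = newton_1d(df, d2f, x0=0.0)      # first-order derivative ~ 0 here
```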

Slide 14

Newton Method

One-dimensional case (continued)

\Delta x^{(k)} = -\left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}}

f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] + \left.\frac{df}{dx}\right|_{x^{(k)}} \cdot \Delta x^{(k)}

f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] - \left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} \cdot \left( \left.\frac{df}{dx}\right|_{x^{(k)}} \right)^2

The second-order derivative is positive for a convex function, so the cost keeps decreasing while the first-order derivative is non-zero

Slide 15

Newton Method

One-dimensional case (continued)

The Newton method gives an estimate of the optimal step size \lambda^{(k)} using the second-order derivative

Gradient method:

x^{(k+1)} = x^{(k)} - \lambda^{(k)} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}}

Newton method:

x^{(k+1)} = x^{(k)} - \left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}}

i.e., the Newton method implicitly uses the step size

\lambda^{(k)} = \left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} > 0 \qquad \text{(convex function)}

Slide 16

Newton Method

One-dimensional case (continued)

The step size estimate is based on the following linear approximation of the first-order derivative:

\left.\frac{df}{dx}\right|_{x^{(k+1)}} \approx \left.\frac{df}{dx}\right|_{x^{(k)}} + \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right]

The approximation is exact if and only if f(x) is quadratic and, therefore, df/dx is linear:

f(x) = a x^2 + b x + c \;\Rightarrow\; \frac{df}{dx} = 2ax + b

Slide 17

Newton Method

One-dimensional case (continued)

If the actual f(x) is not quadratic, the following step size estimate may be non-optimal:

\lambda^{(k)} = \left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1}

Using this step size may result in poor convergence. In such cases, a damping factor \beta is typically introduced:

x^{(k+1)} = x^{(k)} - \beta \cdot \left[ \left.\frac{d^2 f}{dx^2}\right|_{x^{(k)}} \right]^{-1} \cdot \left.\frac{df}{dx}\right|_{x^{(k)}} \qquad \left( 0 < \beta < 1 \right)
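A sketch of the damped update on the illustrative non-quadratic cost f(x) = sqrt(1 + x^2) (an assumption, not from the slides). For this f, the full Newton step maps x^{(k)} to -(x^{(k)})^3, which diverges whenever |x^{(k)}| > 1, while a damped step still converges to the minimizer x = 0.

```python
import math

# Damped Newton: x_{k+1} = x_k - beta * f'(x_k) / f''(x_k), 0 < beta < 1.
def damped_newton_1d(df, d2f, x0, beta=0.1, iters=300):
    """Run the damped Newton iteration from x0."""
    x = x0
    for _ in range(iters):
        x = x - beta * df(x) / d2f(x)
    return x

# Example cost (an assumption): f(x) = sqrt(1 + x^2).
df = lambda x: x / math.sqrt(1 + x * x)      # f'(x)
d2f = lambda x: (1 + x * x) ** -1.5          # f''(x), so f'/f'' = x*(1 + x^2)

x_star = damped_newton_1d(df, d2f, x0=1.5)   # full Newton diverges from 1.5
```

The damping value beta = 0.1 is an arbitrary choice; in practice it would be tuned or selected by a line search.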

Slide 18

Newton Method

Two-dimensional case

\min_{x_1, x_2} f(x_1, x_2)

\nabla f(x_1, x_2) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix} \qquad \nabla^2 f(x_1, x_2) = \begin{bmatrix} \partial^2 f / \partial x_1^2 & \partial^2 f / \partial x_1 \partial x_2 \\ \partial^2 f / \partial x_2 \partial x_1 & \partial^2 f / \partial x_2^2 \end{bmatrix}

where \nabla^2 f is the Hessian matrix. Linearizing the gradient around the current iterate:

\nabla f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \cdot \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix}

Slide 19

Newton Method

Two-dimensional case (continued)

Set the linearized gradient to zero:

\nabla f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \cdot \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix} = 0

\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix} = -\left[ \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \end{bmatrix} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \end{bmatrix} - \left[ \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

Slide 20

Newton Method

Two-dimensional case (continued)

\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = -\left[ \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix}

f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] - \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \left[ \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

The Hessian is positive definite for a convex function, so the subtracted quadratic form is positive and the cost keeps decreasing while the gradient is non-zero

Slide 21

Newton Method

Two-dimensional case (continued)

The gradient method and the Newton method do not, in general, move along the same direction

Gradient method:

\Delta X^{(k)} = -\lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

Newton method:

\Delta X^{(k)} = -\left[ \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]

[Figure: contour plot of the quadratic f(X) = X^T A X + B^T X + C in the (x_1, x_2) plane, showing the gradient and Newton search directions from the same point]

Slide 22

Newton Method

The Newton method can be extended to high-dimensional cases. The following Hessian matrix is N×N if we have N variables:

\nabla^2 f(X) = \begin{bmatrix} \partial^2 f / \partial x_1^2 & \partial^2 f / \partial x_1 \partial x_2 & \cdots & \partial^2 f / \partial x_1 \partial x_N \\ \partial^2 f / \partial x_2 \partial x_1 & \partial^2 f / \partial x_2^2 & \cdots & \partial^2 f / \partial x_2 \partial x_N \\ \vdots & \vdots & \ddots & \vdots \\ \partial^2 f / \partial x_N \partial x_1 & \partial^2 f / \partial x_N \partial x_2 & \cdots & \partial^2 f / \partial x_N^2 \end{bmatrix}

X^{(k+1)} = X^{(k)} - \left[ \nabla^2 f\left[ X^{(k)} \right] \right]^{-1} \cdot \nabla f\left[ X^{(k)} \right]

Numerically computing the Hessian matrix and applying its inverse (e.g., via Cholesky decomposition) can be quite expensive for large N
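A sketch of one N-dimensional Newton step that, following the slide's remark, applies the Hessian inverse via a Cholesky factorization rather than forming the inverse explicitly. The quadratic cost f(X) = 0.5 X^T A X - B^T X and the particular A and B are illustrative assumptions.

```python
import numpy as np

# One Newton step X_{k+1} = X_k - [H f(X_k)]^{-1} grad f(X_k), solving the
# Newton system H dx = g with a Cholesky factorization H = L L^T.
def newton_step(grad_f, hess_f, x):
    H = hess_f(x)
    g = grad_f(x)
    L = np.linalg.cholesky(H)        # requires H symmetric positive definite
    y = np.linalg.solve(L, g)        # forward substitution: L y = g
    dx = np.linalg.solve(L.T, y)     # back substitution: L^T dx = y
    return x - dx

# Example cost (an assumption): f(X) = 0.5 X^T A X - B^T X,
# so grad f(X) = A X - B and the Hessian is the constant matrix A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
B = np.array([1.0, 2.0])
grad_f = lambda x: A @ x - B
hess_f = lambda x: A

x0 = np.zeros(2)
x1 = newton_step(grad_f, hess_f, x0)     # quadratic cost: one step is exact
```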

Slide 23

Newton Method

A number of modified algorithms have been developed to address this complexity issue, e.g., the quasi-Newton method

The key idea is to approximate the Hessian matrix and its inverse so that:
- The computational cost is significantly reduced
- Fast convergence can still be achieved

More details can be found in: Numerical Recipes: The Art of Scientific Computing, 2007
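In one dimension, the quasi-Newton idea reduces to replacing the exact second derivative with a secant estimate built from successive first derivatives, so the second derivative is never computed; BFGS and related methods generalize this to N dimensions. The toy iteration below is a sketch of that idea only, not the algorithms from Numerical Recipes.

```python
# Secant (quasi-Newton) iteration in one dimension: approximate
#   f''(x_k) ~ [f'(x_k) - f'(x_{k-1})] / (x_k - x_{k-1})
# and use it in place of the exact second derivative in the Newton step.
def secant_newton_1d(df, x0, x1, iters=30):
    """Drive f' to zero using only first-derivative evaluations."""
    for _ in range(iters):
        d0, d1 = df(x0), df(x1)
        if d1 == d0:                 # secant slope undefined; stop
            break
        x0, x1 = x1, x1 - d1 * (x1 - x0) / (d1 - d0)
    return x1

# Example cost (an assumption): f(x) = x^2 + x, so f'(x) = 2x + 1
# and the minimizer is x = -0.5.
df = lambda x: 2.0 * x + 1.0
x_star = secant_newton_1d(df, 0.0, 1.0)
```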

Slide 24

Summary

Unconstrained optimization
- Gradient method
- Newton method