lecture 16: switched linear quadratic regulationjianghai/teaching/ece695/lec_16.pdf · switched...

Lecture 16: Switched Linear QuadraticRegulation

1 / 37

Optimal Control of D-T Nonswitched Systems

A discrete-time controlled dynamical system

xk+1 = f (xk , uk), k = 0, 1, . . .

Problem: Given a time horizon k ∈ N ∪ {∞}, find the optimal inputsequence u = (u0, . . . , uN−1) that minimizes

J(u) =N−1∑k=0

`(xk , uk) + φ(xN)

• Running cost `(xk , uk) ≥ 0 and terminal cost φ(xN) ≥ 0

• State constraint x ∈ X and control constraint u ∈ U

2 / 37

D-T Linear Quadratic Regulation

A discrete-time LTI system with given initial condition x0:

xk+1 = Axk + Buk

yk = Cxk + Duk

Problem: find optimal input sequence u = (u0, . . . , uN−1) that minimizes

J(u) =N−1∑k=0

(xTk Qxk + uTk Ruk

)︸︷︷︸

running cost

+ xTN Qf xN︸︷︷︸terminal cost

• State weight matrix Q = QT � 0

• Control weight matrix R = RT � 0 (no free control)

• Terminal state weight matrix Qf = QTf � 0

3 / 37

Switched Linear Quadratic Regulation

Problem: For the switched linear system

xk+1 = Aσkxk + Bσkuk

yk = Cσkxk + Dσkuk

Find the optimal input sequence (u0, . . . , uN−1) and mode sequence(σ0, . . . , σN−1) that minimize the cost function

N−1∑k=0

(xTk Qσkxk + uTk Rσkuk

)+ xTN Qf xN

• Both dynamics and costs are different in different modes

• Special case: no continuous input (Bi = 0, Di = 0 for all i)

• Challenge: number of mode sequences increases exponentially with N

4 / 37

Example: Energy Efficient Building Control

Zone3

Zone2

Zone11clQ

1htQ

2clQ

2htQ

3clQ

3htQ

• Goal:

1 Maintain zone temperatures near setpoints (loss-of-productivity cost)2 Less energy consumption (energy cost)

• Control: cooling/heating equipments (some on/off only)

5 / 37

Solution Roadmap

• Solve LQR problem using dynamic programming method

• Extend the method to solve SLQR problem

• Complexity reduction techniques

LQR as a constrained optimization problem:

Minimize

[N−1∑k=0

(xTk Qxk + uTk Ruk

)+ xTN Qf xN

]subject to xk+1 = Axk + Buk , k = 0, . . . ,N − 1

x0 fixed

6 / 37

Direct Solution

LQR as a least square problem:

Minimize xT

Q

Q. . .

Qf

︸︷︷︸

Q

x + uT

R

R. . .

R

︸︷︷︸

R

u

such that

x1x2...xN

︸︷︷︸

x

=

B 0 · · ·AB B 0 · · ·

......

. . .

AN−1B AN−2B · · · B

︸︷︷︸

G

u0u1...

uN−1

︸︷︷︸

u

+

AA2

...AN

︸︷︷︸

H

x0

Optimal control sequence is u∗ = −(R + GTQG)−1GTQHx0

• Not numerically viable for large N (e.g., N =∞)

7 / 37

Dynamical Programming Approach

Idea: Solve a sequence of optimal control problems over time horizons{t, . . . ,N}, for t = N,N − 1, . . . , 0

• Value function at time t is the optimal cost over horizon {t, . . . ,N}:

Vt(x) = minut ,ut+1,...,uN−1

N−1∑k=t

(xTk Qxk + uTk Ruk

)+ xTN Qf xN

with the initial condition xt = x

• Value function iteration: Vt−1(·) can be computed based on Vt(·)

• Optimal cost of original problem is V0(x0)

• Optimal input sequence can be recovered from value functions

8 / 37

Motivating Example

• Start from point A

• End at point B

• Each step only move right

• Cost labeled on each edge

Problem: find the path from A to B with the least cost

• For `-by-` grid, total number of legal paths is (2`)!(`!)2

9 / 37

Value Function

Value function V(z) at node z is the least cost to reach B from z

Value function iteration: V (z) satisfies

V (z) = min{wu + V (z ′u),wd + V (z ′d)}

Principle of Optimality: If x∗0 = z → x∗1 → x∗2 → · · · → B is a least-cost pathfrom z to B, then any truncation of it is also a least-cost pathe.g., x∗1 → x∗2 → · · · → B is a least-cost path from x∗1 to B

Reduced computational complexity: for `-by-` grid, only need to compute `2

value functions, each with fixed complexity

10 / 37

Iteration Results

• Stage 1: compute the value functions from right to left

• Stage 2: recover the least-cost path from left to right

11 / 37

Value Functions of LQR Problem

Value function at time t ∈ {0, 1, . . . ,N} and state x ∈ Rn is

Vt(x) = minut ,ut+1,...,uN−1

{N−1∑k=t

(xTk Qxk + uTk Ruk

)+ xTN Qf xN

∣∣∣∣ xt = x

}• Vt(x) is the optimal cost from time t to N starting from xt = x

• V0(x0) is the optimal cost of the original LQR problem

Bellman equation: (value function iteration)

Vt(x) = minut=v

[xTQx + vTRv︸︷︷︸current running cost

+Vt+1(Ax + Bv)︸︷︷︸cost to go

]• Principle of Optimality: cost-to-go from next state xt+1 is also optimal

12 / 37

Value Function Iteration

• Value function at time N is quadratic: VN(x) = xTQf x

• Suppose Vt+1(x) = xTPt+1x is quadratic, then

Vt(x) = minv

[xTQx + vTRv + Vt+1(Ax + Bv)

]= min

v

[xv

]T [Q + ATPt+1A ATPt+1BBTPt+1A R + BTPt+1B

] [xv

]= xTPtx

where Pt is the Schur complement of R + BTPt+1B in the block matrix:

Pt := ρ(Pt+1) = Q + ATPt+1A− ATPt+1B(R + BTPt+1B)−1BTPt+1A︸︷︷︸Riccati mapping

• Minimum achieved by the optimal control

u∗t = − (R + BTPt+1B)−1BTPt+1A︸︷︷︸Kalman gain

x = −Ktx

13 / 37

Useful Properties of Riccati Mapping

Riccati mapping ρ : S+ → S+ maps PSD matrices to PSD matrices:

Pt = Q + ATPt+1A− ATPt+1B(R + BTPt+1B)−1BTPt+1A

• derived from xTPtx = minv

[xv

]T [Q + ATPt+1A ATPt+1B

BTPt+1 R + BTPt+1B

] [xv

]

• ρ is PSD-monotone:

P � P ′ implies ρ(P) � ρ(P ′)

• ρ is PSD-concave, i.e., for P,P ′ ∈ S+ and θ ∈ [0, 1]

ρ(θP + (1− θ)P ′

)� θ · ρ(P) + (1− θ) · ρ(P ′)

“Kalman filtering with intermittent observations,” Sinopoli, B. et al, IEEE Trans.Automatic Control, vol. 49, no. 9, pp. 1453-1464, 2004.

14 / 37

LQR Solution Algorithm

Stage 1: Compute the value functions backward in time:

Set PN = Qf

for t = N − 1, . . . , 1, 0 doPt = Q + ATPt+1A− ATPt+1B(R + BTPt+1B)−1BTPt+1A

end for

Return xT0 P0x0 as the optimal cost

Stage 2: Compute the optimal controls and state solution forward in time:

Set x∗0 = x0for t = 0, 1, . . . ,N − 1 dou∗t = −(R + BTPt+1B)−1BTPt+1Ax

∗t

x∗t+1 = Ax∗t + Bu∗tend for

Return u∗t and x∗t as the optimal control and state sequences

15 / 37

Inverted Pendulum: Model

States x =[z θ z θ

]T:

• θ: angle of pendulum

• θ: angular velocity of pendulum

• z : horizontal position of cart

• z : velocity of cart

Linearized dynamics near xe =[0 0 0 0

]T(sampled at T = 0.5 s):

xk+1 =

1 0.1054 0.06128 0.016880 5.753 −1.859 1.1540 0.5287 −0.1617 0.10370 27.31 −9.328 5.665

︸︷︷︸

A

xk +

0.057630.24420.15261.225

︸︷︷︸

B

uk

16 / 37

Inverted Pendulum: Stabilization

x0 =[−1 0 0 0

]TGoal: Find u0, . . . , uN−1 to minimize

J = α

N∑k=1

‖xk‖2 + β

N−1∑k=0

|uk |2

• LQR formulation: Q = Qf = αI , R = β

• Stabilization with energy consideration

17 / 37

Inverted Pendulum: Solutions

18 / 37

Infinite Horizon Case (N =∞)

• If (A,B) is stabilizable, then Riccati iteration will converge to asolution Pss of the Algebraic Riccati Equation (ARE):

Pss = Q + ATPssA− ATPssB(R + BTPssB)−1BTPssA

• If further (Q,A) is detectable, then Pss is unique, and under thesteady-state optimal control gain

Kss = (R + BTPssB)−1BTPssA,

the closed-loop system Acl = A− BKss is stable

• Hence stabilization can be studied via LQR

“On the discrete time matrix Riccati equation of optimal control,” P.E. Caines andD.Q. Mayne, Int. J. Control, vol. 12, no. 5, pp. 785-794, 1970.

19 / 37

Switched LQR Problem

Find control sequence u0, . . . , uN−1 and mode sequence σ0, . . . , σN−1 to

minimize

[N−1∑k=0

(xTk Qσkxk + uTk Rσkuk

)+ xTN Qf xN

]subject to xk+1 = Aσkxk + Bσkuk , k = 0, . . . ,N − 1, and fixed x0

Value function at each t = 0, 1, . . . ,N and x is the optimal cost overhorizon {t, . . . ,N} assuming xt = x

Vt(x) = minσt ,...,σN−1ut ,...,uN−1

N−1∑k=t

(xTk Qσkx

Tk + uTk Rσkuk

)+ xTN Qf xN

• V0(x0) is the optimal cost of the original SLQR problem

• Vt(x) does not depend on mode (why?)

20 / 37

SLQR Bellman Equation

Value functions can be obtained from the iteration

Vt(x) = minσt=σut=v

[xTQσx + vTRσv + Vt+1(Aσx + Bσv)

]• Optimal control and mode are the ones achieving minimum above

• Optimal state feedback switching policy σ∗t (x)

• Optimal state feedback control policy u∗t (x)

• Value functions are in general not quadratic

“On the value functions of the discrete-time switched LQR problem,” W. Zhang, J. Hu,and A. Abate, TAC 2009.

21 / 37

Value Function at t = N − 1

VN−1(x) = minσ

minv

[xTQσx + vTRσv + VN(Aσx + Bσv)

]︸︷︷︸

Bellman equation of σ-th subsystem

= minσ

[xTρσ(Qf )x

]Here, ρσ is the Riccati mapping of subsystem (Aσ,Bσ) with weights Qσ,Rσ

22 / 37

Optimal Control at t = N − 1

VN−1(x) is pointwise minimum of a number of quadratic functions

VN−1(x) = minP∈PN−1

xTPx

where PN−1 = {ρ1(Qf ), . . . , ρM(Qf )} := ρM(Qf )

• State space partitioned into sectors (cones)

• Each sector has an optimal mode and aKalman gain

• May have multiple sectors with differentKalman gains for a mode

23 / 37

General t Case

Suppose Vt+1(x) = minP∈Pt+1 xTPx for some set Pt+1 ⊂ Sn+

At time t, the value function Vt(x) is given by

Vt(x) = minP∈Pt

xTPx

where Pt is obtained from Pt+1 by switched Riccati mapping:

Pt = ρM(Pt+1) := {ρi (P) |P ∈ Pt+1, i ∈M} ⊂ Sn+

24 / 37

SLQR Solution Algorithm

Stage 1: Compute the value functions backward in time:

Set PN = {Qf }for t = N − 1, . . . , 1, 0 doPt = ρM(Pt+1)

end for

Return minP∈P0

xT0 Px0 as the optimal cost

Stage 2: Compute optimal mode/control/state sequences forward in time:

Set x∗0 = x0for t = 0, 1, . . . ,N − 1 do

(σ∗t , u∗t ) = arg min

σ,v

[(x∗t )TQσx

∗t + vTRσv + Vt+1(Aσx

∗t + Bσv)

]x∗t+1 = Aσ∗t x

∗t + Bσ∗t u

∗t

end for

Return σ∗t , u∗t and x∗t as the optimal mode/control/state sequences

25 / 37

Complexity Reduction

Issue: Number of matrices in Pt grows exponentially

In the set Pt defining the value function Vt(x) = minP∈Pt xTPx

• Matrix P ∈ Pt is called effective if for at least one x 6= 0

xTPx < xTP ′x , ∀P ′ ∈ Pt \ {P}

• Otherwise P is called redundant and can be discarded without loss(due to monotonicity of Riccati mapping)

Sufficient condition for P ∈ Pt to be redundant:

P � a convex combination of matrices in Pt \ {P}

• Checked via an LMI feasibility problem

26 / 37

Decision Tree Pruning

27 / 37

Example

Switched LQR problem specified by

A1 =

[2 00 2

], B1 =

[12

], A2 =

[1.5 10 1.5

], B2 =

[10

]Qσ = I , Rσ = 1, N = 20

0 0.5 1 1.5 2 2.5 3 3.50

5

10

15

20

25

30

16 matrices in P16

28 / 37

Further Reduction by Relaxation

Matrix P in Pt is called ε-redundant if P + εI is redundant

• ε > 0 is a small relaxation parameter

• A sufficient condition is given by

P + εI � a convex combination of matrices in Pt \ {P}

• Oftentimes a small ε could result in significant reduction

“Infinite horizon switched LQR problems in discrete time: A suboptimal algorithm withperformance analysis,” W. Zhang, J. Hu and A. Abate, TAC 2012.

29 / 37

Example

A1 =

[2 10 1

], B1 =

[11

], A2 =

[2 10 0.5

], B2 =

[12

]Q1 = Q2 = I , R1 = R2 = 1, Qf = I , N = 100

k |PN−k |1 2

2 4

3 5

4 5

5 5

6 5

30 / 37

Example (cont.)Optimal switching policy (Gray region: mode 1 optimal; Black region: mode 2 optimal)

31 / 37

Another Example

A1 =

[2 00 2

], B1 =

[12

]A2 =

[1.5 10 1.5

], B2 =

[10

]Qσ = I , Rσ = 1, N = 20

0 5 10 15 2010

0

102

104

106

108

1010

1012

# of steps left

Com

plex

ity

|Hk|

|Hk|

|Hǫk|

36014

1012

• Without any reduction, complexity grows exponentially

• With reduction, complexity saturates at 360 matrices

• With relaxation (ε = 10−3), complexity saturates at 14 matrices

32 / 37

SLQR Problem with Switching Cost

Cost function to be minimized:

N−1∑k=0

xTk Qσkxk + uTk Rσkuk + w(σk ,σk+1)(xk)︸︷︷︸switching cost

+ xTN Qf xN

Value function Vt(σ, x) is the optimal cost-to-go starting from xt = xwith previous mode being σt−1 = σ

Bellman equation:

Vt(σ, x) = minσt=σ′ut=v

[xTQσ′x + vTRσ′v + w(σ,σ′)(x) + Vt+1(σ′,Aσ′x + Bσ′v)

]• Optimal switching policy σ∗

t (σ, x) and optimal control u∗t (σ, x)

• Both depend on previous mode σ and current state x

• Same technique if w(σ,σ′)(·) is quadratic, linear, or constant,

33 / 37

Continuous-Time LQR Problem

For the continuous-time LTI system

x = Ax + Bu, x(0) = x0

Find the control input u(t) over the time horizon [0, tf ] to

minimize

∫ tf

0

(xTQx + uTRu

)dt + x(tf )TQf x(tf )

• Q = QT � 0, R = RT � 0, Qf = QTf � 0

Value function Vt(x) is the optimal cost-to-go at time t from state x :

Vt(x) = minu(s), s∈[t,tf ]

∫ tf

t

[x(s)TQx(s)T + u(s)TRu(s)

]ds + x(tf )TQf x(tf )

• Value functions are still quadratic: Vt(x) = xTP(t)x , t ∈ [0, tf ]

34 / 37

Continuous-Time Bellman Equation

Vt(x) ' minu(·)≡w

δ(xTQx + wTRw)︸︷︷︸cost during [t, t + δ]

+Vt+δ(x + δ(Ax + Bw))︸︷︷︸cost-to-go from time t + δ

As δ → 0, we obtain the Riccati (matrix) differential equation:

−P(t) = Q + P(t)A + ATP(t)− P(t)BR−1BTP(t), P(tf ) = Qf

The optimal control is a linear state feedback controller:

u∗t = −R−1BTP(t)x

35 / 37

Continuous-Time SLQR Problem

For the continuous-time switched linear system

x = Aσx + Bσu, x(0) = x0

Find the mode σ(t) ∈M and input u(t) over the time horizon [0, tf ] to

minimize

∫ tf

0

(xTQσx + uTRσu

)dt + x(tf )TQf x(tf )

• Qσ = QTσ � 0, Rσ = RT

σ � 0, Qf = QTf � 0

Value function Vt(x) is the optimal cost-to-go at time t from state x :

Vt(x)= minu(s),σ(s)

∫ tf

t

[x(s)TQσ(s)x(s)T +u(s)TRσ(s)u(s)

]ds + x(tf )TQf x(tf )

36 / 37

Value Functions of C.-T. SLQR Problem

The value function Vt(x) is still of the form

Vt(x) = infP∈P(t)

xTPx

• |P(t)| =∞ as infinite switchings possible in a short time interval

• P(t) can be computed from the Riccati differential inclusion

−P(t) ∈ {Qλ + P(t)Aλ + ATλP(t)− P(t)BλR

−1λ BT

λ P(t) : λ ∈ S}

where Aλ,Bλ,Qλ,Rλ is any convex combination of Ai ,Bi ,Qi ,Ri for i ∈M

In general, P(t) is very difficult to compute analytically and numerically

• Impose dwell-time or maximum switching times constraints

• Discretize the C.-T. SLS into D.-T. SLS

37 / 37

lecture 16: switched linear quadratic regulationjianghai/teaching/ece695/lec_16.pdf · switched...

Documents