
4/1: Search Methods and Heuristics

Progression: Sapa (TLPlan; FF); Regression: TP4; Partial order: Zeno (IxTeT)

Reading List

(3/27) Papers on Metric Temporal Planning:
-- Paper on the PDDL 2.1 standard (read up to, but not including, Section 6)
-- Paper on SAPA (see Section 3 for a slightly longer description of the progression search used in SAPA)
-- Paper on Temporal TLPlan (progression search for temporal planning)
-- Paper on TP4 (regression search for temporal planning)
-- Paper on Zeno (plan-space search for temporal planning)

State-Space Search: Search is through time-stamped states.

Search states should have information about:
-- what conditions hold at the current time slice (P, M below)
-- what actions we have already committed to put into the plan (Π, Q below)

S = (P, M, Π, Q, t)

P: set of pairs <pi, ti> of predicates pi and the time ti < t of their last achievement.
M: set of function values representing resource levels.
Π: set of protected persistent conditions (could be binary or resource conditions).
Q: event queue (contains resource as well as binary fluent events).
t: time stamp of S.

In the initial state, P and M are non-empty; Q is non-empty only if we have exogenous events.
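To make this time-stamped state concrete, here is a minimal Python sketch, assuming plain dictionaries and lists for the components; the field names and the (time, (sign, fact)) event encoding are illustrative choices, not Sapa's actual data structures.

from dataclasses import dataclass, field

@dataclass
class TemporalState:
    """Time-stamped search state S = (P, M, Pi, Q, t)."""
    P: dict = field(default_factory=dict)   # predicate -> time of last achievement (< t)
    M: dict = field(default_factory=dict)   # resource/function name -> current value
    Pi: list = field(default_factory=list)  # protected persistent conditions: (cond, t_start, t_end)
    Q: list = field(default_factory=list)   # event queue: (time, (sign, fact)), kept sorted by time
    t: float = 0.0                          # time stamp of S

# Initial state for the cellar example: P and M are non-empty;
# Q would be non-empty only if there were exogenous events.
s0 = TemporalState(P={"have_match": 0.0, "have_strikepad": 0.0, "at_steps": 0.0},
                   M={}, Pi=[], Q=[], t=0.0)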

(:durative-action cross_cellar
  :parameters ()
  :duration (= ?duration 10)
  :condition (and (at start have_light)
                  (over all have_light)
                  (at start at_steps))
  :effect (and (at start (not at_steps))
               (at start crossing)
               (at end at_fuse_box)))

Let the current state S be P:{have_light@0; at_steps@0}; Q:{~have_light@15}; t: 0 (presumably after doing the light-match action). Applying cross_cellar to this state gives

S' = P:{have_light@0; crossing@0}; Π:{have_light, <0,10>}; Q:{at_fuse_box@10; ~have_light@15}; t: 0

(:durative-action burn_match
  :parameters ()
  :duration (= ?duration 15)
  :condition (and (at start have_match)
                  (at start have_strikepad))
  :effect (and (at start have_light)
               (at end (not have_light))))

[Figure: timeline showing Light-match and Cross-cellar actions against the time-stamp axis, with marks at 10 and 15]

“Advancing” the clock as a device for concurrency control

To support concurrency, we need to consider advancing the clock. How far should we advance it?

One shortcut is to advance the clock to the time of the earliest event in the event queue, since this is the least advance needed to make any change to the P and M components of S.

At that point, all the events happening at that time point are transferred from Q to P and M (to signify that they have happened).

This strategy will find "a" plan for every problem, but it has the effect of enforcing concurrency by making the concurrent actions "align on the left end."

In the candle/cellar example, we will find plans where the cross-cellar action starts right when the light-match action starts.

If we need slack in the start times, we will have to post-process the plan.

If we want plans with arbitrary slack on start times to appear in the search space, we will have to consider advancing the clock by arbitrary amounts (even if it changes nothing in the state other than the clock time itself).

[Figure: timeline showing Light-match and Cross-cellar, with the ~have-light event at time 15]

In the cellar plan above, the clock, if advanced, will be advanced to 15, where an event (~have-light) occurs. This means cross-cellar can be done either at 0 or at 15 (and the latter makes no sense).

Search Algorithm (cont.)

Goal Satisfaction: S = (P, M, Π, Q, t) ⊨ G if for each <pi, ti> ∈ G, either:
-- <pi, tj> ∈ P, tj < ti, and no event in Q deletes pi, or
-- there exists an event e ∈ Q that adds pi at time te < ti.

Action Application: Action A is applicable in S if:
-- All instantaneous preconditions of A are satisfied by P and M.
-- A's effects do not interfere with Π and Q.
-- No event in Q interferes with the persistent preconditions of A.
-- A does not lead to concurrent resource change.

When A is applied to S:
-- P is updated according to A's instantaneous effects.
-- Persistent preconditions of A are put in Π.
-- Delayed effects of A are put in Q.
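Continuing that sketch, here is a rough rendering of the applicability test and the state update described above. The action attributes (start_conditions, overall_conditions, start_adds, start_deletes, end_adds, end_deletes, duration) are assumed for illustration, and the resource and interference checks are deliberately simplified.

def applicable(action, s):
    """A is applicable if its at-start conditions hold and its effects do not
    clash with the protected conditions in Pi or the queued events in Q (simplified)."""
    if not all(p in s.P for p in action.start_conditions):
        return False
    for cond, t_start, t_end in s.Pi:            # effects must not clobber protected conditions
        if cond in action.start_deletes:
            return False
    for t_ev, (sign, fact) in s.Q:               # queued deletes must not break over-all conditions
        if sign == "del" and fact in action.overall_conditions and t_ev < s.t + action.duration:
            return False
    return True

def apply_action(action, s):
    """Successor state: apply instantaneous effects, protect over-all
    conditions in Pi, and queue the delayed (at-end) effects."""
    s2 = TemporalState(P=dict(s.P), M=dict(s.M), Pi=list(s.Pi), Q=list(s.Q), t=s.t)
    for fact in action.start_adds:
        s2.P[fact] = s.t
    for fact in action.start_deletes:
        s2.P.pop(fact, None)
    for cond in action.overall_conditions:
        s2.Pi.append((cond, s.t, s.t + action.duration))
    for fact in action.end_adds:
        s2.Q.append((s.t + action.duration, ("add", fact)))
    for fact in action.end_deletes:
        s2.Q.append((s.t + action.duration, ("del", fact)))
    s2.Q.sort()
    return s2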

[Figure: a durative Flying action with conditions (in-city ?airplane ?city1) and (fuel ?airplane) > 0, effects involving (in-city ?airplane ?city1) and (in-city ?airplane ?city2), and consumption of (fuel ?airplane); the search state S = (P, M, Π, Q, t) is shown alongside]

Search: Pick a state S from the queue. If S satisfies the goals, end. Else, non-deterministically do one of:
-- Advance the clock (by executing the earliest event in S's event queue Q)
-- Apply one of the applicable actions to S

[TLPlan; Sapa; 2001]
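A minimal best-first rendering of this non-deterministic choice, reusing the hypothetical TemporalState, applicable, and apply_action sketches from above; the heuristic h, the goal test, and the advance_clock helper are assumptions, not Sapa's actual code.

import heapq

def advance_clock(s):
    """Advance the clock to the earliest event in Q and fire every event at that time."""
    if not s.Q:
        return None
    t_next = s.Q[0][0]
    s2 = TemporalState(P=dict(s.P), M=dict(s.M), Pi=list(s.Pi),
                       Q=[e for e in s.Q if e[0] > t_next], t=t_next)
    for t_ev, (sign, fact) in s.Q:
        if t_ev == t_next:
            if sign == "add":
                s2.P[fact] = t_ev
            else:
                s2.P.pop(fact, None)
    s2.Pi = [(c, a, b) for (c, a, b) in s2.Pi if b > t_next]   # drop expired protections
    return s2

def progression_search(s0, actions, satisfies_goals, h):
    """Pick a state; if it satisfies the goals, stop; otherwise branch on
    'advance the clock' or 'apply an applicable action'."""
    counter = 0
    queue = [(h(s0), counter, s0)]
    while queue:
        _, _, s = heapq.heappop(queue)
        if satisfies_goals(s):
            return s
        children = [apply_action(a, s) for a in actions if applicable(a, s)]
        s_adv = advance_clock(s)
        if s_adv is not None:
            children.append(s_adv)
        for c in children:
            counter += 1
            heapq.heappush(queue, (h(c), counter, c))
    return None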

Regression Search is similar…

In the case of regression over durative actions too, the main generalization we need is differentiating the “advancement of clock” and “application of a relevant action”

We can use the same state representation S = (P, M, Π, Q, t), with the semantics that:
-- P and M are the binary and resource subgoals needed at the current time point
-- Q holds the subgoals needed at earlier time points
-- Π holds the subgoals to be protected over specific intervals

We can either add an action to support something in P or Q, or push the clock backward before considering the subgoals.

If we push the clock backward, we push it to the time of the latest subgoal in Q.

TP4 uses a slightly different representation (with State and Action information).

[TP4; 1999]
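A small sketch of the 'push the clock backward' step under the same hypothetical state representation; here Q is assumed to hold (time, subgoal) pairs with times that are negative relative to the goal state, so the 'latest' subgoal is the one with the largest (least negative) time.

def push_clock_backward(s):
    """Move the regression clock back to the latest subgoal time in Q
    and promote the subgoals at that time into P."""
    if not s.Q:
        return None
    t_latest = max(t for t, _ in s.Q)             # e.g. -10 is later than -15
    s2 = TemporalState(P=dict(s.P), M=dict(s.M), Pi=list(s.Pi),
                       Q=[(t, g) for t, g in s.Q if t != t_latest], t=t_latest)
    for t, g in s.Q:
        if t == t_latest:
            s2.P[g] = t
    return s2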

[Figure: regression example with actions A1 (giving Y), A2 (giving X), A3 (giving W) and subgoals R, W, X, Y, Q at various time points]

We can either work on R at t_inf, or on R and Q at t_inf - D(A3).


Let the current state S be P:{at_fuse_box@0}; t: 0

Regressing cross_cellar over this state gives

S' = P:{}; Π:{have_light, <0,-10>}; Q:{have_light@-10; at_steps@-10}; t: 0


[Figure: Cross_cellar with its have_light condition on the regression timeline]

This example changed since the class.

Notice that, in contrast to progression, regression will align the end points of concurrent actions (e.g., when we put in light-match to support have_light).


S' = P:{}; Π:{have_light, <0,-10>}; Q:{have_light@-10; at_steps@-10}; t: 0

If we now decide to support the subgoal in Q using light-match:

S'' = P:{}; Π:{have_light, <0,-10>}; Q:{have_match@-15; at_steps@-10}; t: 0


[Figure: Cross_cellar, its have_light condition, and Light-match aligned at their end points on the regression timeline]

PO (Partial Order) Search

[Zeno; 1994]

Split the interval into multiple overlapping intervals.

Involves posting temporal constraints and durative goals.

Involves LP solving over linear constraints (temporal constraints are linear too); waits for nonlinear constraints to become linear.

More on temporal planning by plan-space planners (Zeno)

The "accommodation" to complexity that Zeno makes by refusing to handle nonlinear constraints (waiting instead until they become linear) is sort of hilarious, given that it doesn't care much about heuristic control otherwise. Basically, Zeno is trying to keep the "per-node" cost of the search down (and if you do a nonlinear constraint consistency check, even that is quite hard). Of course, we know now that there is no obvious reason to believe that reducing the per-node cost will, ipso facto, also lead to a reduction in overall search.

The idea of "goal reduction" by splitting a temporal subgoal into multiple sub-intervals is used only in Zeno, and helps it support a temporal goal over a long duration with multiple actions. Neat idea.

Zeno doesn’t have much of a problem handling arbitrary concurrency—since we are only posting constraints on temporal variables denoting the start points of the various actions. In particular, Zeno does not force either right or left alignment of actions.

In addition to Zeno, IxTeT is another influential metric temporal planner that uses the plan-space planning idea.


Goal: at_fuse_box@G


[Figure: plan-space step adding Cross_cellar between the init step I and the goal step G to support At_fusebox; Cross_cellar requires Have_light over the interval <t1,t2>]

Constraints posted: t2 - t1 = 10; t1 < tG; tI < t1

Open condition: Have_light@t1


Goal: at_fuse_box@G


[Figure: Burn_match (spanning <t3,t4>) added to support Have_light@t1 for Cross_cellar via the causal link <have_light, t3, t1>; Burn_match's ~have_light effect occurs at t4]

Constraints posted: t2 - t1 = 10; t1 < tG; tI < t1; t4 < tG; t4 - t3 = 15; t3 < t1; t4 < t3 V t1 < t4

The ~have_light effect at t4 can violate the <have_light, t3, t1> causal link! Resolve by adding t4 < t3 V t1 < t4.


Goal: at_fuse_box@G


[Figure: Burn_match (spanning <t3,t4>) supporting Have_light over the whole interval <t1,t2> for Cross_cellar; the ~have_light effect occurs at t4]

Constraints posted: t2 - t1 = 10; t1 < tG; tI < t1; t4 < tG; t4 - t3 = 15; t3 < t1; t4 < t3 V t1 < t4; t3 < t2; t4 < t3 V t2 < t4

To work on have_light@<t1,t2>, we can either:
-- support the whole interval directly by adding a causal link <have_light, t3, <t1,t2>>
-- or first split <t1,t2> into two subintervals <t1,t'> and <t',t2>, and work on supporting have_light on both intervals

Notice that Zeno allows arbitrary slack between the two actions.
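Zeno's per-node consistency check over the posted linear constraints can be phrased as a small linear program. Below is a rough feasibility check for the constraints above using scipy; the variable ordering, the epsilon encoding of strict inequalities, and the handling of the disjunction (by simply committing to the t1 < t4 disjunct) are my simplifications, not Zeno's actual machinery.

import numpy as np
from scipy.optimize import linprog

# Variable order: [tI, t1, t2, t3, t4, tG]
# Equalities: t2 - t1 = 10 (cross_cellar duration), t4 - t3 = 15 (burn_match duration)
A_eq = [[0, -1, 1, 0, 0, 0],
        [0, 0, 0, -1, 1, 0]]
b_eq = [10.0, 15.0]

# Strict '<' constraints modeled as '<=' with a small gap eps:
# tI < t1, t1 < tG, t4 < tG, t3 < t1, plus the chosen disjunct t1 < t4
eps = 1e-3
A_ub = [[1, -1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, -1],
        [0, 0, 0, 0, 1, -1],
        [0, -1, 0, 1, 0, 0],
        [0, 1, 0, 0, -1, 0]]
b_ub = [-eps] * 5

res = linprog(c=np.zeros(6), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * 6, method="highs")
print("consistent" if res.success else "inconsistent")
if res.success:
    print(dict(zip(["tI", "t1", "t2", "t3", "t4", "tG"], res.x)))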

4/3: Discussion of the Sapa/TP4/Zeno search algorithms; Heuristics for temporal planning

Q/A on Search Methods for Temporal Planning

Menkes: What is meant by the argument that resources are always easy to handle for progression planners?

The idea is that the partial plans in the search space of a progression planner are "position constrained", so you know exactly when each action starts. Given that, it is a simple matter to check whether a particular resource constraint (however complicated and nonlinear) holds over a time point or interval.

In contrast, partial order planners only have constraints on the start points. So, checking that a resource constraint is valid involves checking that it holds for every possible assignment of times to the temporal variables.

The difference is akin to the difference between model checking and theorem proving [Halpern & Vardi; KR91]: you can check the consistency of more complicated formulas in more complicated logics if you only need to do model checking rather than inference/theorem proving.

Q/A contd.

Dan: Can the "interval goal reduction" used in Zeno be made more goal directed?

Yes. For example, regressing a goal have_light@[1 15] over an action that gives have_light@[1 7] will make it have_light@[7 15].

Making the reduction goal directed may actually be a smarter idea (especially for position-constrained planners); for Zeno, it doesn't make much difference since it splits the interval into two variable-sized intervals.
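The goal-directed reduction described in this answer amounts to a small interval subtraction; here is a sketch, assuming the supporting effect covers a prefix of the goal interval as in the have_light example.

def regress_interval_goal(goal, effect):
    """Regress a temporal goal (fact, start, end) over an effect that supplies
    the same fact on (e_start, e_end); return the remaining goal interval,
    or None if the effect covers the whole goal."""
    fact, g_start, g_end = goal
    e_fact, e_start, e_end = effect
    if e_fact != fact or e_start > g_start:
        return goal                    # effect does not cover the prefix of the goal
    if e_end >= g_end:
        return None                    # fully supported
    return (fact, e_end, g_end)        # e.g. [1,15] minus [1,7] -> [7,15]

print(regress_interval_goal(("have_light", 1, 15), ("have_light", 1, 7)))
# -> ('have_light', 7, 15)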

Q/A contd

Romeo: The TLPlan paper says that their strategy is to keep adding concurrent actions until no more actions can be added at the current point, and only then advance the clock. Is this used in SAPA too?

Rao: I am surprised to hear that TLPlan does that. If this is used as a "strategy" rather than as a "heuristic", then it can lead to loss of completeness. In general, just because an action can be done doesn't mean that it should be done.

For example, consider a problem where you want a goal G. Ultimately, all actions that give G wind up requiring, among other conditions, the condition P*. P* is present in the init state. There is an action A that deletes P* and no action gives P*. It is applicable in the init state and doesn’t interfere with ANY of the other actions. Now, if we put A in the plan, just because it can be done concurrently, then we know we are doomed.

I (Rao) made this mistake in my ECP-97 paper on Graphplan (see Footnote 2 in http://rakaposhi.eas.asu.edu/pub/rao/ewsp-graphplan.ps), and figured out my error later.

Tradeoffs: Progression/Regression/PO Planning for metric/temporal planning

Compared to PO, both progression and regression do a less than fully flexible job of handling concurrency (e.g. slacks may have to be handled through post-processing).

Progression planners have the advantage that the exact amount of a resource is known at any given state. So, complex resource constraints are easier to verify. PO (and to some extent regression), will have to verify this by posting and then verifying resource constraints.

Currently, SAPA (a progression planner) does better than TP4 (a regression planner). Both do oodles better than Zeno/IxTeT. However, TP4 could possibly be improved significantly by giving up the insistence on admissible heuristics. Zeno (and IxTeT) could benefit by adapting ideas from RePOP.

Heuristic Control

Temporal planners have to deal with more branching possibilities, so it is more critical to have good heuristic guidance.

Design of heuristics depends on the objective function

Classical planning: number of actions; parallel execution time; solving time.

Temporal/resource planning: number of actions; makespan; resource consumption; slack; ...

In temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling.

Objectives in Temporal Planning

Number of actions: the total number of actions in the plan.

Makespan: the shortest duration in which we can possibly execute all actions in the solution.

Resource consumption: the total amount of resources consumed by actions in the solution.

Slack: the duration between the time a goal is achieved and its deadline; we can optimize the max, min, or average slack values.

Combinations thereof.

Deriving heuristics for SAPA

We use a phased relaxation approach to derive different heuristics:

Relax the negative logical and resource effects to build the Relaxed Temporal Planning Graph (pruning a bad state while preserving completeness).

Derive admissible heuristics: to minimize the solution's makespan, or to maximize slack-based objective functions.

Find a relaxed solution, which is used as a distance heuristic.

Adjust the heuristic values using the negative interactions (future work).

Adjust the heuristic values using the resource consumption information.

[AltAlt,AIJ2001]

Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG).

Since Sapa is a progression planner, the RTPG is constructed anew for each state.

Relaxed Temporal Planning Graph

Relaxed action:
-- No delete effects (may be okay given progression planning)
-- No resource consumption (will adjust later)

[Figure: RTPG for a small logistics example with a Person and an Airplane and cities A and B; the relaxed actions Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), Unload(P,B) appear at their earliest times between Init (t=0) and the Goal Deadline (tg)]

while (true)
  forall A ≠ advance-time applicable in S
    S = Apply(A, S)
    {involves changing P, Π, Q, t; update Q only with positive effects, and only when there is no other earlier event giving that effect}
  if S ⊨ G then Terminate {solution}
  S' = Apply(advance-time, S)
  if ∃ (pi, ti) ∈ G such that ti < Time(S') and pi ∉ S then Terminate {non-solution}  {deadline goals}
  else S = S'
end while

Details on RTPG Construction

All our heuristics are based on the relaxed temporal planning graph structure (RTPG). This is a Graphplan-style [2] bi-level planning graph generalized to temporal domains. Given a state S = (P, M, Π, Q, t), the RTPG is built from S using the set of relaxed actions, which are generated from the original actions by eliminating all effects which (1) delete some fact (predicate) or (2) reduce the level of some resource. Since delete effects are ignored, the RTPG will not contain any mutex relations, which considerably reduces the cost of constructing the RTPG. The algorithm to build the RTPG structure is summarized in Figure 4.

To build the RTPG, we need three main data structures: a fact level, an action level, and an unexecuted event queue. Each fact f or action A is marked in, and appears in the RTPG's fact/action level at time instant tf/tA, if it can be achieved/executed at tf/tA. In the beginning, only facts which appear in P are marked in at t, the action level is empty, and the event queue holds all the unexecuted events in Q that add new predicates. Action A will be marked in if (1) A is not already marked in and (2) all of A's preconditions are marked in. When action A is in, all of A's unmarked instant add effects will also be marked in at t. Any delayed effect e of A that adds fact f is put into the event queue Q if (1) f is not marked in and (2) there is no event e' in Q that is scheduled to happen before e and which also adds f. Moreover, when an event e is added to Q, we will take out from Q any event e' which is scheduled to occur after e and also adds f.

When there are no more unmarked applicable actions in S, we will stop and return no-solution if either (1) Q is empty or (2) there exists some unmarked goal with a deadline that is smaller than the time of the earliest event in Q. If neither of these situations occurs, then we will apply the advance-time action to S and activate all events at the time point te' of the earliest event e' in Q. The process above is repeated until all the goals are marked in or one of the conditions indicating non-solution occurs.

[From Do & Kambhampati; ECP 01]
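A compact sketch of this expansion loop; the relaxed-action representation (name, preconditions, instant adds, delayed adds with time offsets), the goal encoding as (fact, deadline) pairs, and the omission of the pruning of later duplicate events are assumptions made for illustration.

import heapq

def build_rtpg(P, Q, t0, actions, goals):
    """Return fact -> earliest time it is marked in, or None if some goal
    cannot be marked in before its deadline (the non-solution case)."""
    fact_time = dict(P)                            # facts marked in, with their times
    events = list(Q)                               # unexecuted add-events: (time, fact)
    heapq.heapify(events)
    marked_actions = set()
    t = t0
    while True:
        progress = True
        while progress:                            # apply every newly applicable relaxed action at t
            progress = False
            for a in actions:
                if a.name in marked_actions:
                    continue
                if all(p in fact_time for p in a.preconditions):
                    marked_actions.add(a.name)
                    progress = True
                    for f in a.instant_adds:
                        fact_time.setdefault(f, t)
                    for offset, f in a.delayed_adds:
                        if f not in fact_time:
                            heapq.heappush(events, (t + offset, f))
        if all(g in fact_time for g, _ in goals):
            return fact_time                       # all goals marked in
        if not events:
            return None                            # event queue empty: no solution from here
        t, f = heapq.heappop(events)               # advance time to the earliest event
        if any(deadline < t and g not in fact_time for g, deadline in goals):
            return None                            # an unmarked goal's deadline has passed
        fact_time.setdefault(f, t)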

Heuristics directly from RTPG

For Makespan: Distance from a state S to the goals is equal to the duration between time(S) and the time the last goal appears in the RTPG.

For Min/Max/Sum Slack: Distance from a state to the goals is equal to the minimum, maximum, or summation of the slack estimates for all individual goals using the RTPG. The slack estimate is the difference between the deadline of the goal and the expected time of achievement of that goal.

These heuristics are ADMISSIBLE. Proof: all goals appear in the RTPG at times smaller than or equal to their achievable times.
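Given the earliest-achievement times computed by an RTPG sketch like the one above, the makespan and slack estimates can be read off directly (a hypothetical helper that follows the definitions on this slide).

def rtpg_heuristics(fact_time, goals, t_state):
    """Makespan and slack estimates from the RTPG's earliest-achievement times.
    goals is a list of (fact, deadline) pairs."""
    achieve = {g: fact_time[g] for g, _ in goals}
    slacks = [deadline - achieve[g] for g, deadline in goals]
    return {"makespan": max(achieve.values()) - t_state,
            "min_slack": min(slacks),
            "max_slack": max(slacks),
            "sum_slack": sum(slacks)}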


Heuristics from Relaxed Plan Extracted from RTPG

The RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals.

Sum actions: Distance from a state S to the goals equals the number of actions in the relaxed plan.

Sum durations: Distance from a state S to the goals equals the summation of action durations in the relaxed plan.
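Given a relaxed plan extracted from the RTPG (here just a list of hypothetical action objects with a duration attribute), the two distances above are straightforward to compute.

def sum_actions(relaxed_plan):
    """Distance = number of actions in the relaxed plan."""
    return len(relaxed_plan)

def sum_durations(relaxed_plan):
    """Distance = sum of the durations of the actions in the relaxed plan."""
    return sum(a.duration for a in relaxed_plan)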


Resource-based Adjustments to Heuristics

Resource-related information, ignored originally, can be used to improve the heuristic values.

Adjusted Sum-Action:

h = h + Σ_R ⌈ (Con(R) − (Init(R) + Pro(R))) / ΔR ⌉

Adjusted Sum-Duration:

h = h + Σ_R ⌈ (Con(R) − (Init(R) + Pro(R))) / ΔR ⌉ · Dur(A_R)

Will not preserve admissibility
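A sketch of the sum-action adjustment following the formula above, where Con(R), Init(R), Pro(R), and ΔR (the largest amount of R a single action can produce) are passed in by the caller; this is my reading of the adjustment, applied only when there is a positive shortfall.

import math

def adjusted_sum_action(h, resources):
    """resources: dict R -> (con, init, pro, delta) where con is the consumption
    of R in the relaxed plan, init its current level, pro the producible amount,
    and delta the largest amount produced by a single action."""
    for con, init, pro, delta in resources.values():
        shortfall = con - (init + pro)
        if shortfall > 0 and delta > 0:
            h += math.ceil(shortfall / delta)   # extra actions needed to produce R
    return h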

Aims of Empirical Study

Evaluate the effectiveness of the different heuristics.

Ablation studies: test whether the resource adjustment technique helps the different heuristics.

Compare with other temporal planning systems.

Empirical Results

Prob     Adjusted Sum-Action                    Sum-Duration
         time     #act   nodes      dur         time    #act   nodes      dur
Zeno1    0.317    5      14/48      320         0.35    5      20/67      320
Zeno2    54.37    23     188/1303   950         -       -      -          -
Zeno3    29.73    13     250/1221   430         6.20    13     60/289     450
Zeno9    13.01    13     151/793    590         98.66   13     4331/5971  460
Log1     1.51     16     27/157     10.0        1.81    16     33/192     10.0
Log2     82.01    22     199/1592   18.87       38.43   22     61/505     18.87
Log3     10.25    12     30/215     11.75       -       -      -          -
Log9     116.09   32     91/830     26.25       -       -      -          -

Sum-action finds solutions faster than sum-duration.
Admissible heuristics do not scale up to bigger problems.
Sum-duration finds shorter-duration solutions in most of the cases.
Resource-based adjustment helps sum-action, but not sum-duration.
Very few irrelevant actions; better quality than Temporal TLPlan (so, transitively, better than LPSAT).

Empirical Results (cont.)

Logistics domain with driving restricted to intra-city (traditional logistics domain)

[Plot: percentage of problems solved vs. solving time in seconds, comparing SAPA, TP4, and TGP]

Sapa is the only planner that can solve all 80 problems

Empirical Results (cont.)

The "sum-action" heuristic used as the default in Sapa can be misled by long-duration actions...

Logistics domain with inter-city driving actions

Future work on fixed point time/level propagation

[Plot: percentage of problems solved vs. solving time in seconds for this domain, comparing SAPA, TP4, and TGP]

Multi-objective search

Multi-dimensional nature of plan quality in metric temporal planning:
-- Temporal quality (e.g. makespan, slack)
-- Plan cost (e.g. cumulative action cost, resource consumption)

Necessitates multi-objective optimization:
-- Modeling objective functions
-- Tracking different quality metrics and heuristic estimation

Challenge: there may be inter-dependent relations between the different quality metrics.

Next Class: