16.1 Introduction
As we saw in the preceding chapter, constraints divide the set of all states into two groups - those that are possible and those that are not possible. Goals are similar in that they partition the set of all states into two groups - those that satisfy our goals and those that do not. The difference is that constraints are concerned with the possibility or impossibility of states whereas goals are concerned with the desirability or undesirability of states.
It is worth noting that desirability can come in varying degrees. One state may be more desirable than a second state; that state may be more desirable than a third; and so forth. In our treatment, we ignore this possibility and consider just two cases - desirable and undesirable. The concepts and techniques that we discuss can be extended to handle relative desirability, but limiting ourselves to just two groups makes it easier to discuss the underlying issues.
As with constraints, we characterize goals by writing rules defining a special 0-ary relation. In this case, we call the relation goal. One difference is that our definitions are the opposite of our definitions of constraints. Rather than saying when a state does not satisfy our goals, we write rules that say when a state does satisfy our goals.
As an example, consider the Blocks World goal definition shown below. Here we say that a state satisfies our goal if there is a block X on a block Y and block Y is on a block Z.
| goal :- on(X,Y) & on(Y,Z) |
This goal definition does not mention any specific blocks. Any stack of three blocks will do. Moreover, we do not care how the other blocks in the scene are arranged. The upshot is that there can be multiple states that satisfy this goal.
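To make this concrete, here is a small sketch in JavaScript (the language used for the planner later in this chapter) that tests this goal against a state. The representation of a state as an array of fluent strings such as "on(c,a)" and the name goalp are our own, chosen for illustration.

```javascript
// Test whether a state satisfies the goal: some block X on a block Y
// with that block Y in turn on a block Z.  A state is represented as
// an array of fluent strings such as "on(c,a)".
function goalp (state)
 {for (var i=0; i<state.length; i++)
      {var m = state[i].match(/^on\(\w+,(\w+)\)$/);
       if (m===null) {continue};
       for (var j=0; j<state.length; j++)
           {if (state[j].indexOf("on(" + m[1] + ",")===0) {return true}}};
  return false}
```

For example, a state containing on(c,a) and on(a,b) satisfies the goal, while a state in which every stack is at most two blocks high does not.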
Given a goal, we might want to come up with a dataset / state that satisfies that goal. For example, if an architect were given a set of goals for a building, he might want to formulate a design that satisfies those goals.
Luckily, we already know how to do this. The constraint satisfaction techniques described in the last chapter can be used with goals to find such datasets. We just need to adjust the techniques to deal with positively defined goals rather than negatively defined constraints.
Dynamic worlds present more interesting problems. Given a goal and a set of legal actions, we might want to come up with a plan of action that transforms an initial state into a state that satisfies the goal. In this case, we are not just interested in finding a state that satisfies the goal; we also want a plan for getting there.
Planning to achieve goals using legal actions is more complex than simply finding states that satisfy those goals. In this chapter, we describe a variety of techniques for solving such planning problems. In the next section, we define various types of plans. We then look at two different ways to find plans - forward planning and goal regression. Once we have done that, we show how we can recast planning as a problem in Logic Programming.
16.2 Sequential Plans
A sequential plan is a sequence of actions. A sequential plan is legal if and only if every action in the sequence is legal in the state in which that action is performed. A plan is complete if and only if it leads from the initial state to a goal state. A plan is minimal if and only if none of the intermediate states is a goal state.
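All three properties can be checked mechanically against a simulator. The following sketch is ours, not code from the text; it assumes caller-supplied helpers legalp(action,state), simulate(action,state), and goalp(state), none of which are defined in this chapter.

```javascript
// Check that a plan is legal, complete, and minimal with respect to a
// given initial state.  legalp, simulate, and goalp are supplied by
// the caller.
function planp (plan,state,legalp,simulate,goalp)
 {for (var i=0; i<plan.length; i++)
      {if (!legalp(plan[i],state)) {return false};           // legality
       state = simulate(plan[i],state);
       if (i<plan.length-1 && goalp(state)) {return false}}; // minimality
  return goalp(state)}                                       // completeness
```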
Consider the 8-puzzle game described above. The sequential plans shown below are all legal, complete, and minimal. The actions in the sequences shown are all legal. Each leads from the initial state to a goal state. And none of the intermediate states produced during the execution is a goal state.
| [right,down,right,down] | |
| [right,down,left,right,right,down] | |
| [right,down,left,right,left,right,right,down] | |
Sequential planning is the process of finding sequential plans. A sequential planner is said to be admissible if and only if it returns a legal and complete sequential plan whenever one exists.
16.3 Forward Planning
A depth-first search is conceptually the simplest approach to sequential planning. The procedure takes a player and a state as arguments and returns the corresponding utility and plan as value. The procedure takes the form of a recursive exploration of the game tree. If the state supplied as argument is terminal, the output is the player's reward for that state together with the empty plan. Otherwise, the output is the maximum of the utilities of the states resulting from executing the player's legal actions in the given state, together with a corresponding plan.
| |
function bestplan (role,state)
 {if (terminalp(state)) {return [findreward(role,state,game),[]]};
  var actions = findlegals(role,state,game);
  var result = bestplan(role,findnext(roles,[actions[0]],state,game));
  var score = result[0];
  var plan = result[1];
  plan.unshift(actions[0]);
  for (var i=1; i<actions.length; i++)
      {result = bestplan(role,findnext(roles,[actions[i]],state,game));
       if (result[0]>score)
          {score = result[0];
           plan = result[1];
           plan.unshift(actions[i])}};
  return [score,plan]}
|
Note that this procedure may not produce the shortest plan. However, it is guaranteed to produce an optimal plan as defined above.
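When the shortest plan is wanted, breadth-first search over the state space is a natural alternative to the depth-first procedure. The sketch below is our own, not the book's code; it assumes helpers findlegals(state), simulate(action,state), and goalp(state), and it requires that distinct states serialize to distinct strings so that repeated states can be detected.

```javascript
// Breadth-first search for a shortest complete plan.  Each frontier
// entry pairs a state with the plan that reaches it; visited states
// are recorded (by their string form) to avoid re-exploration.
function shortestplan (state,findlegals,simulate,goalp)
 {var frontier = [[state,[]]];
  var seen = {};
  seen[String(state)] = true;
  while (frontier.length>0)
        {var node = frontier.shift();
         if (goalp(node[0])) {return node[1]};
         var actions = findlegals(node[0]);
         for (var i=0; i<actions.length; i++)
             {var next = simulate(actions[i],node[0]);
              if (!seen[String(next)])
                 {seen[String(next)] = true;
                  frontier.push([next,node[1].concat([actions[i]])])}}};
  return false}
```

Because states are expanded in order of plan length, the first goal state dequeued is reached by a plan no longer than any other complete plan.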
16.4 Planning as Logic Programming
A dynamic system is one that changes state over time. In some cases, the changes occur in response to purely internal events (such as the ticks of a clock). In other cases, the changes are prompted by external inputs. In this section, we look at the use of Logic Programming to model reactive systems, i.e. those that change in response to external inputs.
Once again consider the Blocks World introduced in Chapter 2. One state of the Blocks World is shown below. Here block C is on block A, block A is on block B, and block E is on block D.
Now, let's consider a dynamic variation of this world, one in which we have operations that can modify the state of the world. For example, unstacking C from A results in the state shown below on the left, and unstacking E from D results in the state shown on the right.
In what follows, we look at a common way of modeling the behavior of systems like this. We first introduce fluents (facts that change truth value from one state to another); we then look at actions (inputs that cause such state changes); and we then look at how to write view rules that define the effects of actions on fluents. Given this axiomatization, we show how to simulate the effects of actions and how to plan sequences of actions that lead to desirable states.
In previous chapters, we saw how to describe the state of the Blocks World as a dataset using the unary relation block and the binary relation on, and we saw how to define other relations, such as clear and table, in terms of these base relations.
Unfortunately, this representation is insufficient for describing dynamic behavior. In a dynamic version of the Blocks World, the properties of blocks and their relationships change over time, and we have to take this into account. Fortunately, we can do this by slightly modifying the items in our previous vocabulary and adding in a few additional items.
First of all, we add a symbol s1 to our language to stand for the initial state of the world. We could also add symbols for other states of the world; but, as we shall see in the next section, we can refer to these other states in a more convenient way. Second, we introduce the notion of a fluent and provide a suitable representation. A fluent is a condition that can change truth value from one state to the next. In our formalization, we take the ground atoms of our static representation to be fluents. However, we now treat them as terms instead of factoids. For example, we now treat on(a,b) and clear(a) as terms rather than as factoids. They are no longer conditions that are always true or false; they are now terms that are true in some states and false in others.
In order to talk about the truth of fluents in specific states, we introduce the binary predicate tr and use it to capture the fact that a specified fluent is true in a specified state.
For example, we can represent the initial state of the world shown above with the sentences shown below. Block c is on block a in state s1; block a is on block b; and so forth.
| tr(clear(c),s1) |
| tr(on(c,a),s1) |
| tr(on(a,b),s1) |
| tr(table(b),s1) |
| tr(clear(e),s1) |
| tr(on(e,d),s1) |
| tr(table(d),s1) |
In order to talk about actions that can change the world, we introduce constructors, one for each action. For example, in our Blocks World setting, we would add two new binary constructors u and s. u(X,Y) represents the action of unstacking block X from block Y, and s(X,Y) represents the action of stacking block X onto block Y.
In order to define the effects of our actions, we add a binary constructor do to talk about the result of performing a given action in a given state. For example, we would write do(u(c,a),s1) to refer to the state that results from performing action u(c,a) in state s1.
To capture the physics of the world, we write rules that say how the world changes as a result of performing each of our actions. See below.
| tr(table(X),do(u(X,Y),S)) :- tr(clear(X),S) & tr(on(X,Y),S) |
| tr(clear(Y),do(u(X,Y),S)) :- tr(clear(X),S) & tr(on(X,Y),S) |
| tr(on(X,Y),do(s(X,Y),S)) :- tr(clear(X),S) & tr(table(X),S) & tr(clear(Y),S) |
Note that, in addition to describing changes, we also need to write sentences that record inertia (what stays the same). For example, if we unstack one block from another, all blocks that were clear remain clear; all blocks that were on the table remain on the table; and all blocks that were on top of each other remain on top of each other, except for the blocks involved in the unstacking operation. The following sentences capture the inertial behavior of the Blocks World.
| tr(clear(U),do(u(X,Y),S)) :- tr(clear(U),S) |
| tr(table(U),do(u(X,Y),S)) :- tr(table(U),S) |
| tr(on(U,V),do(u(X,Y),S)) :- tr(on(U,V),S) & distinct(U,X) |
| tr(on(U,V),do(u(X,Y),S)) :- tr(on(U,V),S) & distinct(V,Y) |
| |
| tr(clear(U),do(s(X,Y),S)) :- tr(clear(U),S) & distinct(U,Y) |
| tr(table(U),do(s(X,Y),S)) :- tr(table(U),S) & distinct(U,X) |
| tr(on(U,V),do(s(X,Y),S)) :- tr(on(U,V),S) |
Sentences like these, which express what remains the same, are often called frame axioms. Many people are irked by the need to formalize what remains the same when representing dynamics. Luckily, there are alternative ways of formalizing dynamics that eliminate the need for frame axioms. For example, instead of formalizing what is true in the state that results from performing an action, we can formalize what changes from the current state (in the form of add lists and delete lists). See Chapter 14 for more discussion of this technique.
16.5 Simulation
Simulation is the process of determining the state that results from the execution of a given action or sequence of actions in a given state. Once we have a representation of the physics of our world, simulation is easy.
As an example, consider the initial state described in the preceding section, and consider the following sequence of two actions. We first unstack c from a, and we then stack c onto e. If we were interested in the state of affairs after the execution of these actions, we could write the query shown below.
| goal(P) :- tr(P,do(s(c,e),do(u(c,a),s1))) |
The initial state is shown below, once again.
| tr(clear(c),s1) |
| tr(on(c,a),s1) |
| tr(on(a,b),s1) |
| tr(table(b),s1) |
| tr(clear(e),s1) |
| tr(on(e,d),s1) |
| tr(table(d),s1) |
Using this data and the change rules and frame axioms, we see that executing u(c,a) in this state results in the data shown below.
| tr(clear(c),do(u(c,a),s1)) |
| tr(table(c),do(u(c,a),s1)) |
| tr(clear(a),do(u(c,a),s1)) |
| tr(on(a,b),do(u(c,a),s1)) |
| tr(table(b),do(u(c,a),s1)) |
| tr(clear(e),do(u(c,a),s1)) |
| tr(on(e,d),do(u(c,a),s1)) |
| tr(table(d),do(u(c,a),s1)) |
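This step can also be reproduced procedurally. The sketch below is ours, not part of the text's formalization: it represents a state as an array of fluent strings such as "on(c,a)" and implements the change rules and frame axioms for u(X,Y) directly, checking the preconditions, adding table(X) and clear(Y), and copying every other fluent except on(X,Y).

```javascript
// Apply the unstack action u(x,y) to a state given as an array of
// fluent strings.  Returns false if the preconditions fail; otherwise
// returns the resulting state (change rules plus frame axioms).
function unstack (x,y,state)
 {if (state.indexOf("clear(" + x + ")")<0 ||
      state.indexOf("on(" + x + "," + y + ")")<0) {return false};
  var next = ["table(" + x + ")","clear(" + y + ")"];    // change rules
  for (var i=0; i<state.length; i++)                     // frame axioms
      {if (state[i]!=="on(" + x + "," + y + ")") {next.push(state[i])}};
  return next}
```

Applied to the data for s1, unstack("c","a",…) produces exactly the set of fluents listed above for do(u(c,a),s1).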
Applying the change rules and frame rules once again, we get the following conclusions.
| tr(clear(c),do(s(c,e),do(u(c,a),s1))) |
| tr(on(c,e),do(s(c,e),do(u(c,a),s1))) |
| tr(clear(a),do(s(c,e),do(u(c,a),s1))) |
| tr(on(a,b),do(s(c,e),do(u(c,a),s1))) |
| tr(table(b),do(s(c,e),do(u(c,a),s1))) |
| tr(on(e,d),do(s(c,e),do(u(c,a),s1))) |
| tr(table(d),do(s(c,e),do(u(c,a),s1))) |
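The stacking step can be transcribed the same way. In the sketch below (again ours, with fluents represented as strings such as "on(c,a)"), stack(x,y,state) checks the preconditions of s(x,y), adds on(x,y), and copies every fluent except clear(y) and table(x), mirroring the change rule and frame axioms for stacking given earlier.

```javascript
// Apply the stack action s(x,y): x must be clear and on the table,
// and y must be clear.  on(x,y) is added; clear(y) and table(x) are
// dropped; everything else persists by the frame axioms.
function stack (x,y,state)
 {if (state.indexOf("clear(" + x + ")")<0 ||
      state.indexOf("table(" + x + ")")<0 ||
      state.indexOf("clear(" + y + ")")<0) {return false};
  var next = ["on(" + x + "," + y + ")"];                // change rule
  for (var i=0; i<state.length; i++)                     // frame axioms
      {if (state[i]!=="clear(" + y + ")" && state[i]!=="table(" + x + ")")
          {next.push(state[i])}};
  return next}
```

Applied to the fluents of do(u(c,a),s1) shown earlier, stack("c","e",…) yields exactly the fluents listed above for do(s(c,e),do(u(c,a),s1)).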
Finally, using the rule defining the goal predicate, we end up with the data shown below.
| goal(clear(c)) |
| goal(on(c,e)) |
| goal(clear(a)) |
| goal(on(a,b)) |
| goal(table(b)) |
| goal(on(e,d)) |
| goal(table(d)) |
The partial results shown above make this process look complicated, but in reality the process is fairly simple and not very expensive. Finding a plan to achieve a goal state is not so simple.
16.6 Green's Method
Planning is in some ways the opposite of simulation. In simulation, we start with an initial state and a plan, i.e. a sequence of actions; and we use simulation to determine the result of executing the plan in the initial state. In planning, we start with an initial state and a goal, i.e. a set of desirable states; and we use planning to compute a plan that achieves one of the goal states.
In what follows, we again use the unary predicate goal, except, in this case, we define it to be true of desired states rather than fluents as in the formalization in the last section. For example, the following rule defines goal to be true of a state if and only if the fluents on(a,b) and on(b,c) are true in that state.
| goal(S) :- tr(on(a,b),S) & tr(on(b,c),S) |
Using this definition of goal and the rules in Section 16.4, it is easy to see that the following conclusion is true, i.e. on(a,b) and on(b,c) are both true in state do(s(a,b),do(s(b,c),do(u(a,b),do(u(c,a),s1)))), i.e. the state that results from unstacking c, unstacking a, stacking b onto c, and stacking a onto b.
| goal(do(s(a,b),do(s(b,c),do(u(a,b),do(u(c,a),s1))))) |
In principle, we should be able to derive this conclusion through bottom-up or top-down evaluation. Unfortunately, bottom-up evaluation explores many plans that have nothing to do with the goals. Top-down evaluation stays focussed on the goal. Unfortunately, with the rules shown above, it is likely to go into an infinite loop, exploring longer and longer plans when the simple 4-step plan shown above works.
The solution to this dilemma is to use a hybrid of the two approaches. We define the binary predicate plan as shown below. plan is true of the empty plan and a state if and only if that state satisfies the goal. It is true of a non-empty sequence of actions and a state if and only if the tail of the sequence achieves the goal when executed in the state that results from performing the first action of the sequence in the given state.
| plan(nil,S) :- goal(S) |
| plan(A!L,S) :- plan(L,do(A,S)) |
Given this definition, we can pose the query plan(L,s1), and top-down evaluation will try plans of increasing length, trying short plans first to see if they achieve the goal and moving on to longer plans only if the shorter ones fail.
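Top-down evaluation of these two rules amounts to depth-limited search with an increasing bound, i.e. iterative deepening. The following sketch (ours, not from the text) mimics that behavior procedurally; findlegals, simulate, and goalp are assumed helpers, and the maxlength parameter bounds the search so that it terminates even when no plan exists.

```javascript
// Depth-limited search corresponding to the plan predicate: succeed
// immediately if the goal holds, and otherwise try each legal action
// followed by a plan one step shorter.
function plandepth (state,limit,findlegals,simulate,goalp)
 {if (goalp(state)) {return []};
  if (limit===0) {return false};
  var actions = findlegals(state);
  for (var i=0; i<actions.length; i++)
      {var rest = plandepth(simulate(actions[i],state),limit-1,
                            findlegals,simulate,goalp);
       if (rest!==false) {return [actions[i]].concat(rest)}};
  return false}

// Try bounds 0, 1, 2, ... so that the first plan found is a shortest
// one; maxlength keeps the search finite when no plan exists.
function greenplan (state,maxlength,findlegals,simulate,goalp)
 {for (var n=0; n<=maxlength; n++)
      {var result = plandepth(state,n,findlegals,simulate,goalp);
       if (result!==false) {return result}};
  return false}
```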
This approach is somewhat faster than bottom-up execution and is guaranteed to produce a shortest possible plan. That said, the method is still expensive. A significant amount of research has been done to find ways to produce plans more efficiently. However, it is possible to show that a complete search of the plan space is sometimes needed.
Exercises
Exercise 16.1: Suppose we wanted to augment the Blocks World with an action move(X,Y,Z) that moves block X from block Y to block Z. Write the change axioms for this new action in terms of the vocabulary introduced in this chapter.
Exercise 16.2: Given the change and frame axioms in Section 16.4 and the data shown below, evaluate the query goal(P) :- tr(P,do(s(b,e),do(u(a,b),do(u(c,a),s1)))).
| tr(clear(c),s1) |
| tr(on(c,a),s1) |
| tr(on(a,b),s1) |
| tr(table(b),s1) |
| tr(clear(e),s1) |
| tr(on(e,d),s1) |
| tr(table(d),s1) |
Exercise 16.3: Assuming the change axioms and frame axioms in Section 16.4, the goal definition in Section 16.6, and the data shown below, give one answer to the query result([X,Y]) :- plan([X,Y],s2).
| tr(clear(a),s2) |
| tr(table(a),s2) |
| tr(clear(b),s2) |
| tr(on(b,c),s2) |
| tr(table(c),s2) |
| tr(clear(e),s2) |
| tr(on(e,d),s2) |
| tr(table(d),s2) |