Chapter 4 - Query Evaluation

Logic Programming

What
versus
How

4.1 Introduction

In Chapter 3, we described the semantics of queries in terms of instances of query rules. While the definition is easy to understand and mathematically precise, enumerating instances is not a practical method for computing answers to queries. In this chapter, we present an algorithm that produces the same results but in a more efficient manner.

We begin this chapter with a discussion of evaluation of queries without variables. In Section 4.3, we look at a way of comparing expressions containing variables. In the section after that, we show how to combine that technique with the procedure described here to produce an evaluation procedure for queries with variables. We then discuss the computational complexity of our evaluation algorithm. We close with some techniques for reformulating queries to decrease the cost of query evaluation.

4.2 Evaluating Ground Queries

If a query contains multiple query rules, we check whether the body of each rule is true. If so, we add the head of the rule to the extension of our query. The procedure for determining whether the body is true depends on the type of the body.

If the body of a query rule is a single atom, we check whether that atom is contained in our dataset. If so, the body is true.

If the body is a negated atom, we check whether the atom is contained in our dataset. If so, the body is false. If the atom is not contained in our dataset, then the body is true.

If the body is a conjunction of literals, we first execute this procedure on the first conjunct. If the answer is true, we move on to the next conjunct and so forth until we are done. If the answer to any one of the conjuncts is false, then the value of the body as a whole is false.

Consider the dataset shown below. There are four symbols a, b, c, and d; and there is a single binary predicate p.

p(a,b)

p(b,c)

p(c,d)

Now, imagine that we are asked to evaluate the query shown below. In this case there are three rules. To compute all answers, we execute our procedure on each of these rules.

goal(a) :- p(a,c)

goal(b) :- p(a,b) & p(b,a)

goal(c) :- p(c,d) & ~p(d,c)

In the first rule, the body p(a,c) is an atom, so we just check whether or not it is in the dataset. Since it is not there, nothing is contributed to our result.

The body of the second rules is a conjunction p(a,b) & p(b,a), and so we evaluate the conjuncts in order to see whether they are all true. In this case, the first conjunct is true but the second is false, so the conjunction as a whole is false and again nothing is contributed to our result.

The body of the third rule is also a conjunction p(c,d) & ~p(d,c). Again, we check the conjuncts in turn. p(c,d) is true, so we move on to ~p(d,c). p(d,c) is false so the negation is true. Since both conjuncts of the conjunction are true, the conjunction as a whole is true. Consequently, we add the head of our rule goal(c) to our result.

4.3 Matching

Matching is the process of determining whether a pattern (an expression with or without variables) matches an instance (an expression without variables), i.e. whether the two expressions can be made identical by appropriate substitutions for the variables in the pattern.

A substitution is a finite mapping of variables to ground terms. In what follows,, we write substitutions as sets of replacement rules, like the one shown below. In each rule, the variable to which the arrow is pointing is to be replaced by the term from which the arrow is pointing. In this case, X is associated with a and Y is associated with b.

{X←a, Y←b}

The result of applying a substitution σ to an expression φ is the expression φσ obtained from φ by replacing every occurrence of every variable with a binding in the substitution by the term to which it is bound.

`p(X,b)`{`X`←`a`, `Y`←`b`}	=	`p(a,b)`
`q(X,Y,X)`{`X`←`a`, `Y`←`b`}	=	`q(a,b,a)`

A substitution is a matcher for a pattern and an instance if and only if applying the substitution to the pattern results in the given instance. One good thing about our language is that there is a simple and inexpensive procedure for computing a matcher of a pattern and an expression if it exists.

See below for a version of a matching procedure encoded in Javascript. The procedure assumes a representation of expressions as sequences of subexpressions. For example, the expression p(X,f(b)) can be thought of as a sequence with three elements, viz. the predicate p, the variable X, and the expression f(b).


function match (x,y,theta)
 {if (x===y) {return theta};
  if (variable(x)) {return matchvar(x,y,theta)};
  if (constant(x) || constant(y)) {return false};
  if (x.length!==y.length) {return false};
  for (var i=0; i<x.length; i++)
      {theta = match(x[i],y[i],theta);
       if (theta===false) {return false}};
  return theta}

function matchvar (x,y,theta)
 {if (theta[x]!==undefined) {return unify(theta[x],y,theta)};
  return extend(theta,x,y)}

function extend (theta,x,y)
 {var gamma = {};
  for (var key in theta) {gamma[key] = theta[key]};
  gamma[x] = y;
  return gamma}

We start the procedure with two expressions (the pattern and the instance) and a substitution (initially empty). We then recursively process the two expressions, comparing the subexpressions at each point. Along the way, we extend the substitution with variable assignments as described below.

If the pattern and the instance are the same, then the procedure succeeds, returning the unmodified substitution as result.

If the pattern is a variable with a binding, we compare the binding for the variable with the instance. If the pattern is a variable without a binding, we return the composition of the given substitution with a substitution in which the variable is bound to the expression in the given instance.

If either the pattern or the instance is a constant, then the procedure fails.

If the pattern and the instance have different lengths, we fail.

Otherwise, we iterate across the corresponding subexpressions of the pattern and the instance. If we fail to match a sub-pattern and a sub-instance at any point in this process, the procedure as a whole fails. If we finish this recursive comparison of the pattern and the instance, the procedure as a whole succeeds and the accumulated substitution at that point is the resulting matcher.

As an example of this procedure in operation, consider the process of matching the pattern p(X,Y) and the instance p(a,b) with the initial substitution {}. A trace of the execution of the procedure for this case is shown below. We show the beginning of a comparison with a line labelled Compare together with the expressions being compared and the input substitution. We show the result of each comparison with a line labelled Result. The indentation shows the depth of recursion of the procedure.

Compare: p(X,Y), p(a,b), {}

Compare: p, p, {}

Result: {}

Compare: X, a, {}

Result: {X←a}

Compare: Y, b, {X←a}

Result: {X←a, Y←b}

Result: {x←a, y←b}

As another example, consider the process of matching the pattern p(X,X) and the instance p(a,a). A trace is shown below. By the time we compare the last arguments in the two expressions, X is bound to a, so the match succeeds.

Compare: p(X,X), p(a,a), {}

Compare: p, p, {}

Result: {}

Compare: X, a, {}

Result: {X←a}

Compare: X, a, {X←a}

Result: {X←a}

As a final example, consider the process of trying to match the pattern p(X,X) and the expression p(a,b). The main interest in this example comes in comparing the last argument in the two expressions, viz. X and b. By the time we reach this point, X is bound to a and the corresponding instance is b. Since the pattern is a symbol and the instance is a different symbol, the match attempt fails.

Compare: p(X,X), p(a,b), {}

Compare: p, p, {}

Result: {}

Compare: X, a, {}

Result: {X←a}

Compare: X, b, {X←a}

Result: false

This matching procedure is quite simple. However, it is worth understanding thoroughly, as it is the basis for more complicated matching procedures defined later.

4.4 Evaluating Queries With Variables

Evaluating queries with variables is complicated by the fact that there can be multiple variable bindings that make the query true; and, consequently, there can be multiple possible answers. In some cases, we want just a single answer; in some cases, we want several answers; and, in other cases, we want all answers. In what follows, we talk about a procedure for generating all answers. The procedures for the other cases are analogous.

See below for a top-down query evaluation procedure encoded in Javascript. The procedure takes as arguments an answer pattern, a query expression, and a dataset (a list of factoids). It calls a subroutine compall to produce a list of substitutions that make the query true, substitutes the variable bindings into the answer pattern, and returns a list of the resulting instances.


function compute (x,p,facts)
 {var bindings = compall([p],{},facts);
  var results = [];
  for (var i=0; i<bindings.length; i++)
      {var result = plug(x,bindings[i]);
       if (!find(result,results)) {results.push(result)}};
  return results}

function plug (x,theta)
 {if (variable(x)) {return plugvar(x,theta)};
  if (constant(x)) {return x};
  var exp = [];
  for (var i=0; i<x.length; i++)
      {exp[i] = plug(x[i],theta)};
  return exp}

function plugvar (x,theta)
 {if (theta[x]===undefined) {return x};
  return theta[x]}

The compall procedure does all of the hard work in evaluating queries. It takes as arguments a list of subgoals, a substitution (initially empty), and a dataset. It returns a list of all variable substitutions that make all of the given subgoals true on the basis of the given dataset.

The procedure assumes that the subgoals in the query and the factoids in the dataset are all represented as sequences of expressions. For example, the expression p(X,f(b)) can be thought of as a sequence with three elements, viz. the predicate p, the variable X, and the expression f(b).


function compall (pl,theta,facts)
 {if (pl.length===0) {return [theta]};
  var p = pl[0]; pl = tail(pl);
  if (p[0]==='not')
     {if (compall([p[1]],theta,facts).length>0) {return []};
      return compall(pl,theta,facts)};
  if (p[0]==='and')
     {return compall(tail(p).concat(pl),theta,facts)};
  var results = [];
  for (var i=0; i<facts.length; i++)
      {var gamma = match(p,facts[i],theta);
       if (gamma!==false)
          {var newresults = compall(pl,gamma,facts);
           results = results.concat(newresults)}};
  return results}

The procedure processes the subgoals in order. If at any point there are no subgoals, then the procedure returns a list of the given substitution as answer. Otherwise, it processes the first subgoal to get a list of answers that make it true and then executes the procedure recursively on the remaining subgoals, once for each answer to the first subgoal.

If the first subgoal is a negation, the procedure calls itself on the argument of the negation and the given substitution. If the result is a non-empty set (i.e. there are any substitutions that work), then the negation is false and the procedure returns false as answer. If the result of the recursive execution is the empty set (i.e. there are no substitutions that work), then the negation as a whole is true and the procedure calls itself recursively with the remaining subgoals and the corresponding substitution and returns the result.

If the query is a conjunction, the procedure adds the conjuncts to the subgoal list and calls itself recursively.

Otherwise, the subgoal must be an atom. In this case, the procedure tries matching the atom to the factoids in the given dataset. For each factoid that matches the atom, the procedure calls itself recursively with the remaining subgoals and the corresponding substitution and returns the answers obtained from these recursive calls.

To illustrate the overall procedure, consider the dataset shown below.

p(a,b)

p(a,c)

p(b,c)

p(c,d)

Now, consider the answer pattern goal(Y) and the query p(a,Y). The subgoal p(a,Y) matches two factoids in our dataset, and so there are two workable substitutions, leading to two overall results.

goal(b)

goal(c)

Suppose instead we had the query p(a,Y) & p(Y,d). Once again, the subgoal p(a,Y) matches just two factoids in our dataset. Given {Y←b}, the subgoal p(Y,d) does not match any factoids; given {Y←c}, the pattern p(Y,d) matches just p(c,d). Thus, there is just one answer in this case.

goal(c)

Given the conjunctive query p(a,Y) & ~p(Y,d), we would again find two answers to the first conjunct, viz. {Y←b} and {Y←c}. Given the first of these, the negation ~p(Y,d)is satisfied, and so the conjunction is true. Given the second answer to the first conjunct, the negation fails and so there is no answer in this case. As with the last query, there is just one answer.

goal(b)

Finally, for a query with more than one rule, we would get the union of the answers to the individual rules.

4.5 Computational Analysis

One nice feature of our query evaluation algorithm is that computational analysis is straightforward. In this section, we assume a standard left-to-right implementation of evaluation with no indexing of datasets and no caching of results once they are computed.

Consider the query shown below.

goal(a,c) :- p(a,Y) & p(Y,c)

What is the cost of evaluating this query? In the worst case, there are n² facts in the database, where n is the number of ground terms in our language. So, we need n² steps to evaluate the first conjunct. There are at most n facts that have a as first argument. For each of these, we look at n² possibilities for the second conjunct. Hence, the cost of computing the instance can be expressed as shown below.

n² + n*n² = n² + n³

Now consider the general version of this query shown below.

goal(X,Z) :- p(X,Y) & p(Y,Z)

What is the cost of computing all answers to this query? In the worst case, there are n² facts in the database, where n is the number of objects in the domain. So, we need n² steps to evaluate the first subgoal. There are at most n² possible values for Y. For each of these, we look at n² possibilities for the second subgoal. The resulting cost is shown below.

n² + n²*n² = n² + n⁴

Before leaving our analysis of complexity, it is instructive to compare the cost of computing answers using this algorithm with the cost of computing answers in accordance with the semantics described in the preceding chapter, i.e. by enumerating all instances of rules and, for each such instance, checking whether the body of each such instance is true or false.

In the preceding example, the query has just three variables. Consequently, for a domain with n objects, there are n³ instances. To evaluate each of these instances, we must compare each of our two subgoals to every factoid in our dataset, and in the worst case there are n² of these. The overall cost is shown below.

n³(n² + n²) = n⁵ + n⁵ = 2n⁵

Our algorithm is clearly better in this case, and the relative benefits are greater when we consider sparse datasets, i.e. datasets that do not include all possible factoids. In such cases, the "semantics-based" algorithm must still look at all instances, but our algorithm needs to look at only those instances that are derived form factoids in the dataset.

Note that, in the presence of dataset indexing or caching of results, the details of these analyses are likely to be different, but the style of analysis is the same and the relative merits of the two algorithms are more or less the same.

4.6 Query Optimization

Two queries are semantically equivalent if and only if they produce identical results for every dataset. In such cases, it does not matter which query we write if all we care about is getting the right answers. On the other hand, it is possible that one query is computationally better than another in that our evaluation algorithm gets those answers more quickly.

In the following section, we look at a variety of techniques for optimizing queries, i.e. converting expensive queries into semantically equivalent queries that are computationally less complex. We begin with a discussion of subgoal ordering within query rules. We then look at eliminating useless subgoals. And we finish with a discussion of eliminating rules from queries with multiple rules.

4.7 Subgoal Ordering

One very common source of inefficiency in evaluation stems from non-optimal ordering of subgoals within queries. The good news is that it is often possible to find better orderings just by looking at the form of the queries involved even without looking at the data to which the queries are applied.

As an example of inefficiency due to poor subgoal ordering, consider the query shown below. Our goal is true of X and Y if p is true of X and r is true of X and Y and q is true of X.

goal(X,Y) :- p(X) & r(X,Y) & q(X)

Intuitively, this seems like a bad way to write this query for our usual evaluation procedure. It seems as though the q condition should come before the r condition, as in the following query.

goal(X,Y) :- p(X) & q(X) & r(X,Y)

In fact, there is good reason for this intuition. For our standard evaluation procedure, the worst-case cost of evaluating the first query is n⁴, where n is the size of the domain of objects. By contrast, the worst-case cost of evaluating the second query is just n³.

Let's look at these two cases in more detail. In the worst case, there are n² + 2n facts in the database, where n is the number of objects in the domain.

In evaluating the first query, our algorithm would first examine all n^2 + 2n facts to find those that match p(X). There would be at most n answers. For each of these, the algorithm would again look at n^2 + 2n facts to find those that match r(X,Y). There would be n for each value of X. Finally, for each of these pairs, the algorithm would again examine at n^2 + 2n facts to find those that match q(X). The grand total is shown below.

(n^2 + 2n) + n*((n^2 + 2n) + n*(n^2 + 2n)) = n^4 + 3n^3 + 3n^2 + 2n

In evaluating the second query, the algorithm would again examine all n^2 + 2n facts to find those that match p(X). There would be at most n answers. For each of these, the algorithm would again look at n^2 + 2n facts to find those that match q(X). There would be at least one for each value of X. Finally, for each such X, the algorithm would examine n^2 + 2n facts to find those that match r(X,Y). The grand total is shown below.

(n^2 + 2n) + n*((n^2 + 2n) + 1*(n^2 + 2n)) = 2n^3 + 5n^2 + 2n

Suppose, for example, there were 4 objects in the domain. In this case, there would be at most 4 p facts and 4 q facts and 16 r facts. Evaluating the first query would require 504 matches, while evaluating the second query would require only 208.

In the presence of indexing, the asymptotic complexity is the same for both orderings. However, the lower degree terms for the second ordering suggest that it is still the better ordering. Moreover, it is possible to show that, averaging over all possible databases, the second query is better than the first.

Fortunately, there is a simple method for reordering subgoals in situations like this. The basic idea is to assemble a new body for the query incrementally, picking a subgoal on each step and removing it from the list of remaining subgoals to be considered. In making its choice, the method examines the remaining subgoals in left-to-right order. If it encounters a subgoal all of whose variables are bound by subgoals already chosen, then that subgoal is added to the new query and removed from the list of remaining subgoals. If not, the method removes the first remaining subgoal from the list, adds it to the new query, updates its list of bound variables, and moves on to the next step.

As an example of this method in action, consider the first query shown above. At the start, the list of remaining subgoals consists of all three subgoals in the query. At this point, none of the three subgoals is ground, so the method chooses the first subgoal p(X), adds it to its new query, and puts X on the list of bound variables. On the second step, the method looks at the remaining two subgoals. The subgoal r(X,Y) contains the unbound variable Y and so it is not chosen. By contrast, all of the variables in the subgoal q(X) are bound, so the method outputs this subgoal next. On the third step, the final subgoal is added forming the second query shown above.

4.8 Subgoal Removal

Another common source of inefficiency in evaluation stems from the presence of redundant subgoals within queries. In many cases, it is possible to detect and eliminate such redundancies.

As a simple example of the problem, consider the query shown below. Relation goal is true of X and Y if p is true of X and Y and q is true of Y and q is also true of some Z.

goal(X,Y) :- p(X,Y) & q(Y) & q(Z)

It should be clear that the subgoal q(Z) is redundant here. If there is a value for Y such that q(Y) is true, then that value for Y also works as a value for Z. Consequently, we can drop the q(Z) subgoal (while retaining q(Y)), resulting in the query shown below.

goal(X,Y) :- p(X,Y) & q(Y)

Note that the opposite is not true. If we were to drop q(Y) and retain q(Z), we would lose the constraint on the second argument of p, which is also an argument to r.

Fortunately, it is easy to determine which subgoals can be removed and which need to be retained. The basic idea is to assemble a new body for the query incrementally, picking a subgoal on each step, checking for redundancy, and adding the subgoal once if it is not redundant.

As an illustration of this method in action, consider the example shown above. The method first computes the variables in the head of the query, viz. [X,Y] and initializes the variable newquery to a new query with the same head as the original query. It then iterates over the body of the query, adding subgoals to the new query once they are checked for redundancy.

On the first step of the iteration, the method focusses on p(X,Y). It creates a dataset consisting of instances of the remaining subgoals, viz. q(x1) and q(x2); it then tries to derive p(x0,x1); and, in this case, it fails. So p(X,Y) is added to the new query.

On the second step, the method focusses on q(Y). It creates a dataset consisting of instances of the remaining subgoals, viz. p(x0,x1) and q(x2); it then tries to derive q(x1); and, once again, it fails. So q(Y) is added to the new query.

Finally, the method focusses on q(Z). It creates a dataset consisting of instances of the other subgoals, viz. p(x0,x1) and q(x1); it then tries to derive an instance of q(Z). Note that Z is not bound here since it does not occur as a head variable or a variable in any of the other subgoals. In this case, the test succeeds; and so q(Z) is not added to the new query.

This method is sound in that it removes only redundant subgoals. As a result, any query produced by this method is equivalent to the query it is given as input.

Unfortunately, the method is not complete. There are redundant subgoals that it is does not detect. The problem arises when multiple redundant subgoals share variables that prevent the method from detecting the redundancy.

As an example, consider the query shown below. The relation goal is true of X if p is true of X and Y and q is true of X and Y and p is true of X and Z and q is true of X and Z.

goal(X) :- p(X,Y) & q(X,Y) & p(X,Z) & q(X,Z)

Clearly the last two subgoals are redundant with the first two subgoals. Unfortunately, our method does not detect that either of the subgoals in either pair is redundant because of the variables shared with the other subgoal of the pair. Try it.

Detecting this sort of redundancy can be done mechanically by considering subsets of subgoals and not just individual subgoals. However, this is more expensive than the simple method outlined above.

4.9 Rule Removal

An analogous form of inefficiency in evaluation stems from the presence of redundant rules. As with redundant subgoals within rules, it is often easy to detect and eliminate such redundancies.

As an example of the problem, consider the rules shown below. Relation goal is true of X if p is true of X and Y and q is true of b and r is true of Z. Relation goal is true of X if p is true of X and Y and q is true of Y.

goal(X) :- p(X,b) & q(b) & r(Z)

goal(X) :- p(X,Y) & q(Y)

Any answer produced by the first rule here is also produced by the second rule, so the first rule is redundant and can be eliminated.

The trick to detecting such redundancies is to recognize that the second rule subsumes the first, i.e. all of the answers produced by the second rule are produced by the first rule. If we replace some or all of the variables in the body the second rule that do not occur in the head, then the heads remain the same and all of its subgoals are members of the body of the first rule. In this case, all we need to do is to replace Y by b, and we get the rule goal(X) :- p(X,b) & q(b), which is the same as the first rule except with fewer subgoals. Every output of the first rule is, therefore, an output of the second; and, consequently, the first rule can be dropped.

4.10 Example - Cryptarithmetic

A cryptarithmetic problem is a constraint satisfaction problem characterized by a finite set of letters and a finite set of numbers and an arithmetic constraint written in terms of the letters. A solution to the problem is a 1-1 mapping of letters to numbers such that, when the letters are replaced by their corresponding numbers, the arithmetic constraint is satisfied.

A classic cryptarithmetic is shown below. Here, the letters are {S, E, N, D, M, O, R, Y} and the numbers are the digits from 0 through 9 and we are looking for an assignment of the letters to digits such that the equation holds.

SEND

+MORE

MONEY

We can formulate this problem up as a query in much the same way that we formalized the map coloring problem presented in Chapter 3. First, we have a dataset listing the allowable digits.

digit(0)

digit(1)

digit(2)

digit(3)

digit(4)

digit(5)

digit(6)

digit(7)

digit(8)

digit(9)

Next, we write a query listing the eight letters as goal values, with subgoals to fix the ranges of these variables, disjointness constraints, and additional constraints to capture the arithmetic conditions. See below. In the interest of conciseness here, we have used ordinary arithmetic operators in place of the corresponding builtins, e.g. we have used E!=S for used distinct(E,S), and we have used used 1000*S for times(1000,S).

goal(S,E,N,D,M,O,R,Y) :-

digit(S) & digit(E) & digit(N) & digit(D) &

digit(M) & digit(O) & digit(R) & digit(Y) &

S!=0 & E!=S & N!=S & N!=E & D!=S & D!=E & D!=N &

M!=0 & M!=S & M!=E & M!=N & M!=D &

O!=S & O!=E & O!=N & O!=D & O!=M &

R!=S & R!=E & R!=N & R!=D & R!=M & R!=O &

Y!=S & Y!=E & Y!=N & Y!=D & Y!=M & Y!=O & Y!=R &

evaluate(1000*S+100*E+10*N+D),Send) &

evaluate(1000*M+100*O+10*R+E),More) &

evaluate(10000*M+1000*O+100*N+10*E+Y),Money) &

evaluate(plus(Send,More),Money)

Having expressed our goal in this way, we can use our evaluation procedure to generate the answer to this problem. However, this is not without difficulty. Given the way this query is written, the evaluation process will take a long time. There are 10⁸ = 100,000,000 possible variable bindings. In the worst case (where there is no solution), it would check all of these. On average, it would have to look at a substantial fraction.

The good news is that we can use subgoal ordering to transform this query into one that is easier to evaluate. We simply move the inequalities up to the point where the variables are bound. See below.

goal(S,E,N,D,M,O,R,Y) :-

digit(S) & S!=0 &

digit(E) & E!=S &

digit(N) & N!=S & N!=E &

digit(D) & D!=S & D!=E & D!=N &

digit(M) & M!=0 & M!=S & M!=E & M!=N & M!=D &

digit(O) & O!=S & O!=E & O!=N & O!=D & O!=M &

digit(R) & R!=S & R!=E & R!=N & R!=D & R!=M & R!=O &

digit(Y) & Y!=S & Y!=E & Y!=N & Y!=D & Y!=M & Y!=O & Y!=R &

evaluate(1000*S+100*E+10*N+D),Send) &

evaluate(1000*M+100*O+10*R+E),More) &

evaluate(10000*M+1000*O+100*N+10*E+Y),Money) &

evaluate(plus(Send,More),Money)

Having done this, we eliminate many of the possibilities before they are even generated. The upshot is that this query takes orders of magnitude less time to evaluate, on most computers taking no more than a fraction of a second.

Exercises

Exercise 4.1: For each of the following patterns and instances, say whether the instance matches the pattern and, if so, give the corresponding matcher.

(a)	`p(X,Y) and p(a,a)`
(b)	`p(X,Y)` and `p(a,f(a))`
(c)	`p(X,f(Y))` and `p(a,f(a))`
(d)	`p(X,X)` and `p(2,min(2,4))`
(e)	`p(X,min(2,4))` and `p(2,2)`

Exercise 4.2: Suppose that we want to find all goal(X,Y,Z) such that p(X,Y) & q(Y,Z). Select the formula that captures the worst case complexity of our standard evaluation algorithm for this query (assuming no indexing of the dataset). The symbol n here represents the total number of objects in the domain.

	(a)	2n² + 2n³
	(b)	2n² + 2n⁴
	(c)	2n² + n³ + n⁴

Exercise 4.3: For each of the queries shown below, select the expression that captures the worst case complexity of our standard evaluation algorithm without indexing. The symbol n represents the total number of objects in the domain.

`goal(X,Y) :- p(X,Y) & q(Y) & q(Z)`
	(a)	n⁴ + n³ + n² + n
	(b)	2n⁴ + 2n³ + n² + n
	(c)	2n⁴ + 2n³

`goal(X,Y) :- p(X,Y) & q(Y)`
	(a)	n⁴ + n³ + n² + n
	(b)	2n⁴ + 2n³ + n² + n
	(c)	2n⁴ + 2n³

Exercise 4.4: For each of the following groups of query rules, say which rule is best in terms of worst case evaluation complexity using our standard algorithm without indexing.

(a)

goal(X,Y,Z) :- p(X,Y) & q(X,X) & r(X,Y,Z)

goal(X,Y,Z) :- p(X,Y) & r(X,Y,Z) & q(X,X)

goal(X,Y,Z) :- q(X,X) & p(X,Y) & r(X,Y,Z)

goal(X,Y,Z) :- q(X,X) & r(X,Y,Z) & p(X,Y)

goal(X,Y,Z) :- r(X,Y,Z) & p(X,Y) & q(X,X)

goal(X,Y,Z) :- r(X,Y,Z) & q(X,X) & p(X,Y)

(b)

goal(X,Y,Z) :- p(X,Y) & q(a,b) & r(X,Y,Z)

goal(X,Y,Z) :- p(X,Y) & r(X,Y,Z) & q(a,b)

goal(X,Y,Z) :- q(a,b) & p(X,Y) & r(X,Y,Z)

goal(X,Y,Z) :- q(a,b) & r(X,Y,Z) & p(X,Y)

goal(X,Y,Z) :- r(X,Y,Z) & p(X,Y) & q(a,b)

goal(X,Y,Z) :- r(X,Y,Z) & q(a,b) & p(X,Y)

(c)

goal(X,Y,Z) :- p(X,Y,Z) & q(X) & ~r(X,Y)

goal(X,Y,Z) :- p(X,Y,Z) & ~r(X,Y) & q(X)

goal(X,Y,Z) :- q(X) & p(X,Y,Z) & ~r(X,Y)

goal(X,Y,Z) :- q(X) & ~r(X,Y) & p(X,Y,Z)

goal(X,Y,Z) :- ~r(X,Y) & q(X) & p(X,Y,Z)

goal(X,Y,Z) :- ~r(X,Y) & p(X,Y,Z) & q(X)

Exercise 4.5: For each of the following groups of query rules, select the alternative that is equivalent to the first rule in the group.

	(a)	`goal(X) :- p(X,Y) & q(Y) & q(Z)`
		`goal(X) :- p(X,Y)`
		`goal(X) :- p(X,Y) & q(Y)`
		`goal(X) :- p(X,Y) & q(Z)`

	(b)	`goal(X) :- p(X) & q(X) & q(W)`
		`goal(X) :- p(X)`
		`goal(X) :- p(X) & q(X)`
		`goal(X) :- p(X) & q(W)`

	(c)	`goal(X,Y,Z) :- p(X,Y) & q(Y) & q(Z) & q(W)`
		`goal(X,Y,Z) :- p(X,Y) & q(Y) & q(Z)`
		`goal(X,Y,Z) :- p(X,Y) & q(Y) & q(W)`
		`goal(X,Y,Z) :- p(X,Y) & q(Z) & q(W)`

Exercise 4.6: For each of the following pairs of queries, say whether the first query subsumes the second, i.e. whether the set of answers to the first query contains the answers to the second query.

	(a)	`goal(X) :- p(X,Y)`
		`goal(X) :- p(X,a) & p(X,b)`

	(b)	`goal(X) :- p(X,a)`
		`goal(X) :- p(X,Y)`

	(c)	`goal(X) :- p(X,Y) & p(X,Z)`
		`goal(X) :- p(X,b) & q(b)`