Chapter 5 - Query Evaluation

Introduction to
Logic Programming

What
versus
How

5.1 Introduction

In Chapter 3, we described the semantics of queries in terms of instances of query rules. While the definition is easy to understand and mathematically precise, enumerating instances is not a practical method for computing answers to queries. In this chapter, we present an algorithm that produces the same results but in a more efficient manner.

We begin this chapter with a discussion of evaluation of queries without variables. In Section 5.3, we look at a way of comparing expressions containing variables. In the section after that, we show how to combine that technique with the procedure described here to produce an evaluation procedure for queries with variables. We close with an analysis of the computational complexity of our evaluation algorithm.

5.2 Evaluating Ground Queries

If a query contains multiple query rules, we check whether the body of each rule is true. If so, we add the head of the rule to the extension of our query. The procedure for determining whether the body is true depends on the type of the body.

If the body of a query rule is a single atom, we check whether that atom is contained in our dataset. If so, the body is true.

If the body is a negated atom, we check whether the atom is contained in our dataset. If so, the body is false. If the atom is not contained in our dataset, then the body is true.

If the body is a conjunction of literals, we first execute this procedure on the first conjunct. If the answer is true, we move on to the next conjunct and so forth until we are done. If the answer to any one of the conjuncts is false, then the value of the body as a whole is false.

Consider the dataset shown below. There are four symbols a, b, c, and d; and there is a single binary predicate p.

p(a,b)

p(b,c)

p(c,d)

Now, imagine that we are asked to evaluate the query shown below. In this case there are three rules. To compute all answers, we execute our procedure on each of these rules.

goal(a) :- p(a,c)

goal(b) :- p(a,b) & p(b,a)

goal(c) :- p(c,d) & ~p(d,c)

In the first rule, the body p(a,c) is an atom, so we just check whether or not it is in the dataset. Since it is not there, nothing is contributed to our result.

The body of the second rules is a conjunction p(a,b) & p(b,a), and so we evaluate the conjuncts in order to see whether they are all true. In this case, the first conjunct is true but the second is false, so the conjunction as a whole is false and again nothing is contributed to our result.

The body of the third rule is also a conjunction p(c,d) & ~p(d,c). Again, we check the conjuncts in turn. p(c,d) is true, so we move on to ~p(d,c). p(d,c) is false so the negation is true. Since both conjuncts of the conjunction are true, the conjunction as a whole is true. Consequently, we add the head of our rule goal(c) to our result.

5.3 Matching

Matching is the process of determining whether a pattern (an expression with or without variables) matches an instance (an expression without variables), i.e. whether the two expressions can be made identical by appropriate substitutions for the variables in the pattern.

A substitution is a finite set of bindings of variables to terms. In what follows, we write substitutions as sets of replacement rules, like the one shown below. In each rule, the variable to which the arrow is pointing is to be replaced by the term from which the arrow is pointing. In this case, X is associated with a and Y is associated with b.

{X←a, Y←b}

The result of applying a substitution σ to an expression φ is the expression φσ obtained from φ by replacing every occurrence of every variable with a binding in the substitution by the term to which it is bound.

`p(X,b)`{`X`←`a`, `Y`←`b`}	=	`p(a,b)`
`q(X,Y,X)`{`X`←`a`, `Y`←`b`}	=	`q(a,b,a)`

A substitution is a matcher for a pattern and an instance if and only if applying the substitution to the pattern results in the given instance. One good thing about our language is that there is a simple and inexpensive procedure for computing a matcher of a pattern and an expression if it exists.

The procedure assumes a representation of expressions as sequences of subexpressions. For example, the expression p(X,b) can be thought of as a sequence with three elements, viz. the predicate p, the variable X, and the symbol b.

We start the procedure with two expressions and a substitution (which is initially empty). We then recursively process the two expressions, comparing the subexpressions at each point. Along the way, we expand the substitution with variable assignments as described below.

If the pattern is a symbol and the instance is the same symbol, then the procedure succeeds, returning the unmodified substitution as result.

If the pattern is a symbol and the instance is a different symbol or a compound expression, then the procedure fails.

If the pattern is a variable with a binding, we compare the binding for the variable with the given instance. If they are identical, the procedure succeeds, returning the unmodified substitution as result; otherwise it fails.

If the pattern is a variable without a binding, we include a binding for the variable in the given instance and we return that substitution as a result.

If the pattern is a compound expression and the instance is a compound expression of the same length, we iterate across the pattern and the instance.

If the pattern is a compound expression and the instance is a symbol or a compound expression of a different length, the procedure fails.

If we fail to match a sub-pattern and a sub-instance at any point in this process, the procedure as a whole fails. If we finish this recursive comparison of the pattern and the instance, the procedure as a whole succeeds and the accumulated substitution at that point is the resulting matcher.

As an example of this procedure in operation, consider the process of matching the pattern p(X,Y) and the instance p(a,b) with the initial substitution {}. A trace of the execution of the procedure for this case is shown below. We show the beginning of a comparison with a line labelled Compare together with the expressions being compared and the input substitution. We show the result of each comparison with a line labelled Result. The indentation shows the depth of recursion of the procedure.

Compare: p(X,Y), p(a,b), {}

Compare: p, p, {}

Result: {}

Compare: X, a, {}

Result: {X←a}

Compare: Y, b, {X←a}

Result: {X←a, Y←b}

Result: {x←a, y←b}

As another example, consider the process of matching the pattern p(X,X) and the instance p(a,a). A trace is shown below. By the time we compare the last arguments in the two expressions, X is bound to a, so the match succeeds.

Compare: p(X,X), p(a,a), {}

Compare: p, p, {}

Result: {}

Compare: X, a, {}

Result: {X←a}

Compare: X, a, {X←a}

Result: {X←a}

As a final example, consider the process of trying to match the pattern p(X,X) and the expression p(a,b). The main interest in this example comes in comparing the last argument in the two expressions, viz. X and b. By the time we reach this point, X is bound to a and the corresponding instance is b. Since the pattern is a symbol and the instance is a different symbol, the match attempt fails.

Compare: p(X,X), p(a,b), {}

Compare: p, p, {}

Result: {}

Compare: X, a, {}

Result: {X←a}

Compare: X, b, {X←a}

Result: false

This matching procedure is quite simple. However, it is worth understanding thoroughly, as it is the basis for more complicated matching procedures defined and used in later chapters.

5.4 Evaluating Queries With Variables

Evaluating queries with variables is complicated by the fact that there can be multiple variable bindings that make the body of a rule true; and, consequently, there can be multiple possible answers. In some cases, we want just a single answer; in some cases, we want several answers; and, in other cases, we want all answers. In what follows, we talk about a procedure for generating all answers. The procedures for the other cases are analogous.

In finding the extension for an arbitrary query rule (i.e. one with or without variables), we start with the query rule and an empty substitution. Rather than simply checking whether the body is true or false, as in the ground case, we compute the set of all variable bindings for which the body is true and, for each of these, we include in our extension the result of applying that variable binding to the head of the rule. The procedure for computing these variable bindings depends on the type of the body of the rule.

If the body of the rule is an atom, we try matching the atom to the factoids in our dataset. For each factoid that matches the atom, we add the corresponding substitution to our answer set; and we return the set of all substitutions obtained in this way.

If the query is a negation, we execute our procedure on the argument of the negation and the given substitution. If the result is a non-empty set (i.e. there are substitutions that work), then the negation is false and we return false as answer. If the result of the recursive execution is the empty set (i.e. there are no substitutions that work), then the negation as a whole is true and we return the singleton set containing the given substitution as a result.

If our query is a conjunction, we execute our procedure on the first conjunct and the given substitution. We then iterate over the list of answers, for each substitution executing the procedure on the remaining conjuncts and that substitution and return the resulting substitutions.

To illustrate this procedure, consider the dataset shown below.

p(a,b)

p(a,c)

p(b,c)

p(c,d)

Now, consider the query goal(Y) :- p(a,Y). The pattern p(a,Y) matches two factoids in our dataset, and so there are two results.

goal(b)

goal(c)

Suppose instead we had the query rule goal(Y) :- p(a,Y) & p(Y,d). Once again, the pattern p(a,Y) matches just two factoids in our dataset. Given {Y←b}, the pattern p(Y,d) does not match any factoids; given {Y←c}, the pattern p(Y,d) matches just p(c,d). Thus, there is just one answer in this case.

goal(c)

Given the conjunctive query goal(Y) :- p(a,Y) & ~p(Y,d), we would again find two answers to the first conjunct, viz. {Y←b} and {Y←c}. Given the first of these, the negation ~p(Y,d)is satisfied, and so the conjunction is true. Given the second answer to the first conjunct, the negation fails and so there is no answer in this case. As with the last query, there is just one answer.

goal(b)

Finally, for a query with more than one rule, we would get the union of the answers to the individual rules.

5.5 Computational Analysis

One nice feature of our query evaluation algorithm is that computational analysis is straightforward. In this section, we assume a standard left-to-right implementation of evaluation with no indexing of datasets and no caching of results once they are computed.

Consider the query shown below.

goal(a,c) :- p(a,Y) & p(Y,c)

What is the cost of evaluating this query? In the worst case, there are n^2 facts in the database, where n is the number of ground terms in our language. So, we need n^2 steps to evaluate the first conjunct. There are at most n facts that have a as first argument. For each of these, we look at n^2 possibilities for the second conjunct. Hence, the cost of computing the instance can be expressed as shown below.

n² + n*n² = n² + n³

Now consider the general version of this query shown below.

goal(X,Z) :- p(X,Y) & p(Y,Z)

What is the cost of computing all answers to this query? In the worst case, there are n^2 facts in the database, where n is the number of objects in the domain. So, we need n^2 steps to evaluate the first subgoal. There are at most n^2 possible values for Y. For each of these, we look at n^2 possibilities for the second subgoal. The resulting cost is shown below.

n² + n²*n² = n² + n⁴

Before leaving our analysis of complexity, it is instructive to compare the cost of computing answers using this algorithm with the cost of computing answers in accordance with the semantics described in the preceding chapter, i.e. by enumerating all instances of rules and, for each such instance, checking whether the body of each such instance is true or false.

In the preceding example, the query has just three variables. Consequently, for a domain with n objects, there are n³ instances. To evaluate each of these instances, we must compare each of our two subgoals to every factoid in our dataset, and in the worst case there are n² of these. The overall cost is shown below.

n³(n² + n²) = n⁵ + n⁵ = 2n⁵

Our algorithm is clearly better in this case, and the relative benefits are greater when we consider sparse datasets, i.e. datasets that do not include all possible factoids. In such cases, the "semantics-based" algorithm must still look at all instances, but our algorithm needs to look at only those instances that are derived form factoids in the dataset.

Note that, in the presence of dataset indexing or caching of results, the details of these analyses are likely to be different, but the style of analysis is the same and the relative merits of the two algorithms are more or less the same.

Exercises

Exercise 5.1: For each of the following patterns and instances, say whether the instance matches the pattern and, if so, give the corresponding matcher.

(a)	`p(X,Y) and p(a,a)`
(b)	`p(X,Y)` and `p(a,f(a))`
(c)	`p(X,f(Y))` and `p(a,f(a))`
(d)	`p(X,X)` and `p(2,min(2,4))`
(e)	`p(X,min(2,4))` and `p(2,2)`

Exercise 5.2: Suppose that we want to find all goal(X,Y,Z) such that p(X,Y) & q(Y,Z). Select the formula that captures the worst case complexity of our standard evaluation algorithm for this query (assuming no indexing of the dataset). The symbol n here represents the total number of objects in the domain.

	(a)	2n² + 2n³
	(b)	2n² + 2n⁴
	(c)	2n² + n³ + n⁴

Exercise 5.3: For each of the queries shown below, select the expression that captures the worst case complexity of our standard evaluation algorithm without indexing. The symbol n represents the total number of objects in the domain.

`goal(X,Y) :- p(X,Y) & q(Y) & q(Z)`
	(a)	n⁴ + n³ + n² + n
	(b)	2n⁴ + 2n³ + n² + n
	(c)	2n⁴ + 2n³

`goal(X,Y) :- p(X,Y) & q(Y)`
	(a)	n⁴ + n³ + n² + n
	(b)	2n⁴ + 2n³ + n² + n
	(c)	2n⁴ + 2n³