In addition to streams, data values are often stored in large repositories called databases. A database consists of a data store containing structured data values and an interface for retrieving subsets of the data based on their characteristics. Each value stored in a database is called a record. Records are typically retrieved via a query, which is an expression in a query programming language. By far the most ubiquitous query language in use today is called Structured Query Language or SQL (pronounced "sequel").
SQL is an example of a declarative programming language. Expressions do not describe computations directly, but instead state the form of the result of some computation. It is the role of the query interpreter of the database system to design and perform a computational process to produce such a result.
This interaction differs substantially from the procedural programming paradigm of Python or Scheme. In Python, computational processes are described directly by the programmer. A declarative language specifies the form of the result, but abstracts away procedural details.
In this section, we introduce a declarative query language called logic, designed specifically for this text. It is based upon Prolog and the declarative language in Structure and Interpretation of Computer Programs. Data records are expressed as Scheme lists, and queries are expressed as Scheme values. The logic is a complete implementation that depends upon the Scheme project of the previous chapter.
Databases store records that represent facts in the system. The purpose of the query interpreter is to retrieve collections of facts drawn directly from database records, as well as to deduce new facts from the database using logical inference. A fact statement in the logic language consists of one or more lists following the keyword fact. A simple fact is a single list. A dog breeder with an interest in U.S. Presidents might record the genealogy of her collection of dogs using the logic language as follows:
logic> (fact (parent abraham barack)) logic> (fact (parent abraham clinton)) logic> (fact (parent delano herbert)) logic> (fact (parent fillmore abraham)) logic> (fact (parent fillmore delano)) logic> (fact (parent fillmore grover)) logic> (fact (parent eisenhower fillmore))
Each fact is not a procedure application, as in a Scheme expression, but instead a relation that is declared. "The dog Abraham is the parent of Barack," declares the first fact. Relation types do not need to be defined in advance. Relations are not applied, but instead matched to queries.
A query also consists of one or more lists, but begins with the keyword query. A query may contain variables, which are symbols that begin with a question mark. Variables are matched to facts by the query interpreter:
logic> (query (parent abraham ?child)) Success! child: barack child: clinton
The query interpreter responds with Success! to indicate that the query matches some fact. The following lines show substitutions of the variable ?child that match the query to the facts in the database.
Compound facts. Facts may also contain variables as well as multiple sub-expressions. A multi-expression fact begins with a conclusion, followed by hypotheses. For the conclusion to be true, all of the hypotheses must be satisfied:
(fact <conclusion> <hypothesis0> <hypothesis1> ... <hypothesisN>)
For example, facts about children can be declared based on the facts about parents already in the database:
logic> (fact (child ?c ?p) (parent ?p ?c))
The fact above can be read as: "?c is the child of ?p, provided that ?p is the parent of ?c." A query can now refer to this fact:
logic> (query (child ?child fillmore)) Success! child: abraham child: delano child: grover
The query above requires the query interpreter to combine the fact that defines child with the various parent facts about fillmore. The user of the language does not need to know how this information is combined, but only that the result has a particular form. It is up to the query interpreter to prove that (child abraham fillmore) is true, given the available facts.
A query is not required to include variables; it may simply verify a fact:
logic> (query (child herbert delano)) Success!
A query that does not match any facts will return failure:
logic> (query (child eisenhower ?parent)) Failure.
The logic language also allows recursive facts. That is, the conclusion of a fact may depend upon a hypothesis that contains the same symbols. For instance, the ancestor relation is defined with two facts. Some ?a is an ancestor of ?y if it is a parent of ?y or if it is the parent of an ancestor of ?y:
logic> (fact (ancestor ?a ?y) (parent ?a ?y)) logic> (fact (ancestor ?a ?y) (parent ?a ?z) (ancestor ?z ?y))
A single query can then list all ancestors of herbert:
logic> (query (ancestor ?a herbert)) Success! a: delano a: fillmore a: eisenhower
Compound queries. A query may have multiple subexpressions, in which case all must be satisfied simultaneously by an assignment of symbols to variables. If a variable appears more than once in a query, then it must take the same value in each context. The following query finds ancestors of both herbert and barack:
logic> (query (ancestor ?a barack) (ancestor ?a herbert)) Success! a: fillmore a: eisenhower
Recursive facts may require long chains of inference to match queries to existing facts in a database. For instance, to prove the fact (ancestor fillmore herbert), we must prove each of the following facts in succession:
(parent delano herbert) ; (1), a simple fact (ancestor delano herbert) ; (2), from (1) and the 1st ancestor fact (parent fillmore delano) ; (3), a simple fact (ancestor fillmore herbert) ; (4), from (2), (3), & the 2nd ancestor fact
In this way, a single fact can imply a large number of additional facts, or even infinitely many, as long as the query interpreter is able to discover them.
Hierarchical facts. Thus far, each fact and query expression has been a list of symbols. In addition, fact and query lists can contain lists, providing a way to represent hierarchical data. The color of each dog may be stored along with the name an additional record:
logic> (fact (dog (name abraham) (color white))) logic> (fact (dog (name barack) (color tan))) logic> (fact (dog (name clinton) (color white))) logic> (fact (dog (name delano) (color white))) logic> (fact (dog (name eisenhower) (color tan))) logic> (fact (dog (name fillmore) (color brown))) logic> (fact (dog (name grover) (color tan))) logic> (fact (dog (name herbert) (color brown)))
Queries can articulate the full structure of hierarchical facts, or they can match variables to whole lists:
logic> (query (dog (name clinton) (color ?color))) Success! color: white logic> (query (dog (name clinton) ?info)) Success! info: (color white)
Much of the power of a database lies in the ability of the query interpreter to join together multiple kinds of facts in a single query. The following query finds all pairs of dogs for which one is the ancestor of the other and they share a color:
logic> (query (dog (name ?name) (color ?color)) (ancestor ?ancestor ?name) (dog (name ?ancestor) (color ?color))) Success! name: barack color: tan ancestor: eisenhower name: clinton color: white ancestor: abraham name: grover color: tan ancestor: eisenhower name: herbert color: brown ancestor: fillmore
Variables can refer to lists in hierarchical records, but also using dot notation. A variable following a dot matches the rest of the list of a fact. Dotted lists can appear in either facts or queries. The following example constructs pedigrees of dogs by listing their chain of ancestry. Young barack follows a venerable line of presidential pups:
logic> (fact (pedigree ?name) (dog (name ?name) . ?details)) logic> (fact (pedigree ?child ?parent . ?rest) (parent ?parent ?child) (pedigree ?parent . ?rest)) logic> (query (pedigree barack . ?lineage)) Success! lineage: () lineage: (abraham) lineage: (abraham fillmore) lineage: (abraham fillmore eisenhower)
Declarative or logical programming can express relationships among facts with remarkable efficiency. For example, if we wish to express that two lists can append to form a longer list with the elements of the first, followed by the elements of the second, we state two rules. First, a base case declares that appending an empty list to any list gives that list:
logic> (fact (append-to-form () ?x ?x))
Second, a recursive fact declares that a list with first element ?a and rest ?r appends to a list ?y to form a list with first element ?a and some appended rest ?z. For this relation to hold, it must be the case that ?r and ?y append to form ?z:
logic> (fact (append-to-form (?a . ?r) ?y (?a . ?z)) (append-to-form ?r ?y ?z))
Using these two facts, the query interpreter can compute the result of appending any two lists together:
logic> (query (append-to-form (a b c) (d e) ?result)) Success! result: (a b c d e)
In addition, it can compute all possible pairs of lists ?left and ?right that can append to form the list (a b c d e):
logic> (query (append-to-form ?left ?right (a b c d e))) Success! left: () right: (a b c d e) left: (a) right: (b c d e) left: (a b) right: (c d e) left: (a b c) right: (d e) left: (a b c d) right: (e) left: (a b c d e) right: ()
Although it may appear that our query interpreter is quite intelligent, we will see that it finds these combinations through one simple operation repeated many times: that of matching two lists that contain variables in an environment.
Continue: 4.4 Unification