Tuple relational calculus

The tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational model in order to give a declarative database query language for this data model. It formed the inspiration for the database query languages QUEL and SQL of which the latter, although far less faithful to the original relational model and calculus, is now used in almost all relational database management systems as the ad-hoc query language. Along with the tuple calculus Codd also introduced the domain calculus which is closer to first-order logic and showed that these two calculi (and the relational algebra) are equivalent in expressive power. Subsequently query languages for the relational model were called relationally complete if they could express at least all these queries.

Definition of the Calculus

Relational Database

Since the calculus is a query language for relational databases we first have to define what we consider a relational database. We first assume the existence of a set C of column names, examples of which are "name", "author", "address" et cetera. Then we define a relational database schema as a tuple (D, R, h) where D is the domain of atomic values (see relational model for more on the notions of domain and atomic value), R is a finite set of relation names, and h : R -> 2^C a function that associates a header, i.e., a set of column names, whith each relation name in R. (Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial function t : C -> D that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25). The set of all tuples over D is denoted as T_D. Finally we define a relational database given a schema S = (D, R, h) as a function db : R -> 2^T_S that maps the relation names in R to finite subsets of T_S such that for all relation name r in R and tuple t in db(r) the domain of t (i.e., the column names for which t is defined) equals h(r). The latter requirement simply says that all the tuples in a relation should contain the same column names namely those defined for it in the schema.

The variables in the calculus are tuple variables, which means that they will range over tuples as defined above. In fact, when we query a certain database db over schema (D, R, h) we will assume that the variables range over the tuples over the active domain of db, denoted as adom(db), which is the subset of D that is contained in any tuple in db.

Formulas

The atoms of our language are statements about tuple variables and given a schema S = (D, R, h) defined by the following (abstract) context-free grammar:

A ::= (V.C = V.C) | (V.C = D) | R(V)

where V denotes the set of tuple variables. Examples of atoms are

(t.name = "Codd") -- tuple t has a name attribute and its value is "Codd"
(t.age = s.age) -- t has an age attribute and s has an age attrbute with the same value
Book(t) -- tuple t is present in relation Book.

The formal semantics of such atoms is defined given a database db over S and a tuple variable binding B : V -> T_adom(db) that maps tuple variables to tuples over the active domain of db:

" v.a = w.b " is true iff B(v)(a) = B(w)(b)
" v.a = d " is true iff B(v)(a) = d
" r(v) " is true iff B(v) is in db(r)

The atoms can be combined into formulas, as is usual in first-order logic, with the logicial operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables:

F ::= A | ( F ∧ F ) | ( F ∨ F ) | ¬ F | ∃ V ( F ) | ∀ V ( F )

Examples of formulas:

t.name = "C. J. Date" ∨ t.name = "H. Darwen"
Book(t) ∨ Magazine(t)
∀ t ( ¬ ( Book(t) ∧ t.author = "C. J. Date" ∧ ¬ ( t.subject = "relational model")))

Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.

We will assume that the quantifiers quantify over the universe of all tuples over the active domain of the queried database. This leads to the following formal semantics for formulas given a database db over S and a tuple variable binding B : V -> T_adom(db):

" f₁ ∧ f₂ " is true iff " f₁ " is true and " f₂ " is true,
" f₁ ∨ f₂ " is true iff " f₁ " is true or " f₂ " is true or both are true,
" ¬ f " is true iff " f " is not true,
" ∃ v ( f ) " is true if there is a tuple t over adom(db) such that for B[v:=t] the formula " f " is true where B[v:=t] is defined as the tuple variable binding equal to B except that it maps v to t, and
" ∀ v ( f ) " is true if for all tuples t over adom(db) the formula " f " is true for B[v:=t].

Queries

Finally we define what a query expression looks like:

Q ::= { ( a₁:v₁.b₁, ..., a_k:v_k.b_k ) | f(v₁, ..., v_k) }

where a₁, ..., a_k are k distinct column names, v₁, ..., v_k are (not necessarily distinct) k tuple variables, b₁, ..., b_k are (not necessarily distinct) k column names and f(v₁, ..., v_k) a formula with v₁, ..., v_k as its free variables.

Examples of query expressions are:

{ (name:t.name) | Employee(t) ∧ t.wage = 50.000 }
{ (supplier:s.name, article:p.name) | Supplier(s) ∧ Project(p) ∧ ∃ a ( Supplies(a) ∧ s.s# = a.s# ∧ a.p# = p.p# ) }

The formal semantics of a query expression { ( a₁:v₁.b₁, ..., a_k:v_k.b_k ) | f(v₁, ..., t_k) } for a database db is defined as a set of tuples that contains a tuple ( a₁:d₁, ..., a_k:d_k ) iff there is a binding of tuples variables B over db that satisfies f and (b_i, d_i) in B(v_i) for all 1 <= i <= k.

Well-typed Queries

Sometimes the result of query expression is always empty because it conflicts with the headers of the relations in the database. For example, if the header of the table Employee has the attribute names "empName" and "wage" then the query expression { (name:t.name) | Employee(t) ∧ t.wage = 50.000 } returns always the empty result. To prevent such query expressions often a primitive typing system is introduced where types are simply finite sets of attribute names. If we can assign to each variable such a type such that

if in the query expression t.a is written that the type of t contains a
if in the query expression R(t) is written then the type of t is exactly all attribute names of the relation R in the database schema
for formulas such as ∃ t ( f ) and ∀ t ( f ) we may locally in f assign another type to t

then we say that the query expression is well-typed.

Domain Independent and Safe Queries

The set V of atomic values that was assumed at the beginning, and from which all tuples in the database are built, is usually called the domain of a relational database. (See the relational model for more explanation of the term "domain") Note that not all values in the domain will be used in the database, in fact, since the domain is usually infinite and the database finite most of them are not be used. The subset of the domain that is used is referred to as the active domain of the database. It is possible to write a calculus expression that returns the complete domain:

{ (a:t.a) | t.a = t.a }

Ideally we would like to prevent such queries (with a possibly infinitely big result) and for this purpose the notion of domain independent queries is introduced, which are the queries that only depend upon the contents of the database and not upon the domain from which these contents are constructed. This means that the results of such a query for a given database contents should be the same if we take as its domain any superset of its active domain.

to be continued ...