Thursday, October 4, 2012

Algebraic Type Systems / Combinatorial Species 101

This post is not meant to be a formal introduction into type theory or combinatorial species, but rather attempts to introduce some of the intuition behind the theory. Readers should have a basic understanding of naïve set theory, calculus, and data types in modern programming languages.

Introduction

Let us begin by examining how types manifest themselves in programming. Many C-like programming languages provide basic types called int, float, char, and bool representing integers, floating point numbers, character literals, and booleans, respectively. Many languages also provide more complicated composite types, which are built up from simpler types. Simple examples include pair< x, y >, which consists of a value of type x and a value of type y, or a list< x > containing elements of type x. (In this article we will use uppercase letters \( X, Y, Z, \ldots \) to denote arbitrary types; feel free to substitute these abstract types with concrete types like int or char if it makes it easier to understand.)

In mathematical language, we could equivalently define \( P(X,Y) \) (representing a pair< x, y >) to be the set \( X \times Y \). Algebraic type theory allows us to represent these mathematical definitions as algebraic expressions, which we will call type expressions. In this case, the type expression for a pair is $$ P( X, Y ) = X \times Y = X Y $$ (Such a type is called a product type.) These expressions can be manipulated like normal algebraic expressions -- for example, a pair of two pairs pair< pair< x, x >, pair< y, z > > could be represented as $$ P( P( X, X ), P( Y, Z ) ) = ( X \times X ) \times ( Y \times Z ) = X^2 Y Z $$

A more complicated question is how to represent lists. If a list< x > contains exactly \( n \) elements, then it has the type \( X^n \); however, \( n \) is not fixed, so it is not clear how to represent lists of arbitrary length. We solve this problem by introducing another bit of notation, namely the \( + \) operator, representing alternation. The type expression \( T( A, B ) = A + B \) means that a variable of type \( T( A, B ) \) can take on a value of either type \( A \) or type \( B \). Mathematically, we could write \( T( A, B ) = A \sqcup B \), where \( \sqcup \) denotes the disjoint union operator. These sum types manifest themselves in the rarely-used union structure in C.

Now, equipped with this notation, we can write a type expression for list< x >: $$ L(X) = X^0 + X^1 + X^2 + X^3 + \ldots $$ Since \( X^n \) denotes a structure containing \( n \) values, the above expression indicates that lists can contain any nonnegative number of values. But wait, what does \( X^0 \) mean here? In normal arithmetic, \( X^0 = 1 \); as it turns out, this definition also makes sense in type expressions. We denote the symbol \( 1 \) to be a type which can take on exactly one, constant value. This is often called a unit type. Here \( X^0 = 1 \) denotes an empty list; since all empty lists can be regarded as equal (mathematically, at least), it makes sense that they form a unit type.

A unit type is trivial in the sense that it yields no additional information -- it can only be a constant value. Thus, when we write $$ Pair( 1, X ) = 1 \times X = X $$ the expression indicates that pair consisting of a constant and an \( X \) can be completely described by specifying a value in \( X \).

What else can we do with unit types? For starters, we could consider adding them together. As we all learned in kindergarten, $$ 1 + 1 = 2 $$ But what does this mean in terms of types? Well \( 2 \) seems to a type which can either be one constant, or a different constant; if we call the first constant true and the second constant false, then we see that \( 2 \) is the same as the boolean type. We can similarly represent the set of \( 32 \)-bit integers with the expression \( 1 + 1 + \ldots + 1 = 2^{32} \); we can view this expression as being able to choose one value out of \( 4294967296 \) possible options, or alternatively, fixing \( 32 \) bits, each of which can take on \( 2 \) values.

Just as we can define \( 1 \) to be the multiplicative identity, we can also define \( 0 \) to be the additive identity. I.e., for any \( X \), $$ 0 + X = X $$ \( 0 \) is referred to as the bottom type, and it is a type with no values whatsoever. Thus, the expression \( 0 + X \), meaning "either nothing or \( X \)", is the same as just \( X \).

Recursive Data Types and Generating Functions

Earlier we defined \( L( X ) \) with the following type expression: $$ L( X ) = 1 + X + X^2 + X^3 + \ldots $$ It is also possible to define lists recursively in the following manner: $$ L( X ) = 1 + X \times L( X ) $$ We read the expression thus: a list is either an empty list (the \( 1 \)), or a node which holds one element (the \( X \)) and points to a successor list (the \( L( X ) \)). This is the standard way of defining singly-linked lists in Lisp. Alternatively, in C++, this could be written

class node
{
 int* value;
 node* next;
};
typedef node* list;
list EMPTY = NULL;

What is interesting about this recursive definition is that it is exactly the same as the previous definition. If we repeatedly expand the expression, we get $$ L( X ) = 1 + X \times L( X ) = 1 + X ( 1 + X \times L( X ) ) = 1 + X + X^2 \times L( X ) = 1 + X + X^2 + X^3 \times L( X ) = \ldots $$

Furthermore, the type expression is a special polynomial called a generating function. A generating function is a polynomial, such that the coefficient of \( X^n \) is some special function of \( n \). Whenever we write a type expression for some container \( C( X ) \), the coefficient of \( X^n \) usually denotes the number of 'differently-shaped' containers that hold exactly \( n \) \( X \)'s. In this case, every singly linked list of length \( n \) has the same shape. Therefore, the coefficient of \( X^n \) is 1.

A simple data structure which could have many different shapes for some fixed size is a binary tree. For example, a binary tree of size \( 3 \) could have the following shapes:

If we let \( T( X ) \) denote the type expression for binary trees, then we know that the coefficient of \( X^3 \) in \( T( X ) \) is \( 5 \), since there are exactly \( 5 \) binary trees of size \( 3 \).

As with lists, we can write type expressions for binary trees. One possible way to define a binary tree is either an empty node, or a node containing an element and two children. This gives us the expression $$ T( X ) = 1 + X \times T( X )^2 $$ Notice that this is a standard quadratic equation in terms of \( T( X ) \). Suspending disbelief for a moment, we can actually solve for a solution to \( T( X ) \): $$ T( X ) = \frac{ 1 \pm \sqrt{ 1 - 4 X } }{ 2 X } $$ Here, the square root of \( 1 - 4 X \) obviously makes no sense in the standard numerical interpretation of the square root; rather, we take \( \sqrt{ 1 - 4 X } \) to mean the power series of \( X \), that when squared, is equal to \( 1 - 4 X \). In this case, we can simply take the Taylor series of the expression, which gives us $$ \frac{ 1 \pm \sqrt{ 1 - 4 X } }{ 2 X } = 1 \mp X \mp 2 X^2 \mp 5 X^3 \mp 14 X^4 \mp 42 X^5 \mp \ldots $$ The 'minus' root is the only one that makes sense here, so we have $$ T( X ) = 1 + X + 2 X^2 + 5 X^3 + 14 X^4 + 42 X^5 + \ldots $$

As a sanity check, we can verify that there is exactly \( 1 \) tree of size \( 0 \), \( 1 \) tree of size \( 1 \), \( 2 \) trees of size \( 2 \), and \( 5 \) trees of size \( 3 \). Astute readers may have already recognised that \( 1, 1, 2, 5, 14, 42 \) are the first few Catalan numbers, a sequence which appears often in combinatorics. One of the interesting properties of the Catalan numbers is that the \( n \)-th Catalan number corresponds exactly to the number of binary trees of size \( n \). By expanding a type expression for binary trees, we have come up with a generating function for Catalan numbers!

Differentiating Data Types

Surprisingly enough, we can actually perform a variation of differentiation on data structures. In functional programming, it is often useful to consider a data structure with a 'hole' punched in it; i.e., if we take a single value out of a container and mark the spot with a placeholder. For example, in a list< int >, we can take out an int at some point in the list and keep a pointer to the position where we removed the int. These modified data structures are called one-hole contexts because one you 'fill in' the hole with a concrete value, you get a data structure of the same shape as the original.

We begin by examining the lowly pair \( P( X, X ) \), which has the type \( X^2 \). Suppose we pluck out one of the values in the pair -- how do we represent what remains? If we remove either element, only one element remains; this can be represented with an \( X \). However, we also have to keep track of which element we removed -- a hole in the left element is not the same as a hole in the right element. Thus, the one-hole context of \( P( X, X ) \) is \( X + X = 2 X \). You may alternatively think of \( 2 X \) as a tuple containing a boolean (denoting whether the left or right element was removed) and an \( X \) (representing the remaining element). Interestingly enough, \( 2 X \) happens to be the derivative of \( X^2 \) with respect to \( X \). As it turns out, this is true in general -- it has been proven that one-hole contexts have an exact correspondence with the derivatives of type expressions!

A more complicated example is the list \( L( X ) \). First, let's compute the derivative of \( L( X ) \); we begin by rewriting $$ 1 + X + X^2 + X^3 + \ldots = \frac{ 1 }{ 1 - X } $$ Then, $$ L'( X ) = \frac{ d }{ d X } \frac{ 1 }{ 1 - X } = \left( \frac{ 1 }{ 1 - X } \right)^2 = L( X )^2 $$ Whenever we punch a hole in a list, it leaves a hole, along with the lists before and after it. Thus, the two lists in \( L( X )^2 \) represent, respectively, the prefix and suffixes of the one-hole context. We can thus 'fill in' the context and construct a new list by concatenating the prefix, some value \( X \), and the suffix.

Likewise, we can perform differentiation on binary trees: $$ T'( X ) = T( X )^2 + 2 X \times T( X ) \times T'( X ) $$ This definition is somewhat more cryptic, but it can be deciphered with the same type of analysis that we used earlier. Given the root node of a binary tree with a hole in it, either

  1. The hole lies in the root node, in which case our node has form \( T( X )^2 \).
  2. Or the hole lies in either the left or right subtree; in this case the root node can be described by specifying its content \( X \), a complete subtree \( T( X ) \), and an incomplete subtree \( T'( X ) \). Since there are two choices for where the hole lies, we multiply this term by 2.

Further Reading

  1. Conor McBride's original paper demonstrating that one-hole contexts are equivalent to derivatives.
  2. Wikipedia's articles on algebraic type theory and combinatorial species, an equivalent way of defining types.
  3. A series of two articles (1), (2) on Lab49 about differentiating data structures.