CS452 Foundations of Software
 Writing a parser with parser combinators
 Arithmetic expressions — abstract syntax and proof principles
 Lambda calculus
⚠ Work in progress
Writing a parser with parser combinators
In Scala, you can (ab)use operator overloading to create an embedded DSL (EDSL) for grammars. While a grammar may look as follows in a grammar description language (Bison, Yacc, ANTLR, …):
Expr   ::= Term {'+' Term | '-' Term}
Term   ::= Factor {'*' Factor | '/' Factor}
Factor ::= Number | '(' Expr ')'
In Scala, we can model it as follows:
def expr: Parser[Any] = term ~ rep("+" ~ term | "-" ~ term)
def term: Parser[Any] = factor ~ rep("*" ~ factor | "/" ~ factor)
def factor: Parser[Any] = "(" ~ expr ~ ")" | numericLit
This is perhaps a little less elegant, but allows us to encode it directly into our language, which is often useful for interop.
The ~, |, rep and opt above are parser combinators. These are primitives with which we can construct a full parser for the grammar of our choice.
Boilerplate
First, let's define a class ParseResult[T] as an ad-hoc monad; parsing can either succeed or fail:
sealed trait ParseResult[T]
case class Success[T](result: T, in: Input) extends ParseResult[T]
case class Failure(msg : String, in: Input) extends ParseResult[Nothing]
👉 Nothing is the bottom type in Scala; it has no members, and nothing can extend it
Let’s also define the tokens produced by the lexer (which we won’t define) as case classes extending Token
:
sealed trait Token
case class Keyword(chars: String) extends Token
case class NumericLit(chars: String) extends Token
case class StringLit(chars: String) extends Token
case class Identifier(chars: String) extends Token
Input into the parser is then a lazy stream of tokens (with positions for error diagnostics, which we’ll omit here):
type Input = Reader[Token]
We can then define a standard, sample parser, which at the type level looks as follows:
class StandardTokenParsers {
type Parser = Input => ParseResult
}
The basic idea
For each language (defined by a grammar symbol S), define a function f that, given an input stream i (with tail i'):

- if a prefix of i is in S, returns Success(Pair(x, i')), where x is a result for S
- otherwise, returns Failure(msg, i), where msg is an error message string

The first case is called success, the second failure. We can compose operations on this somewhat conveniently, like we would on a monad (such as Option).
Simple parser primitives
All of the above boilerplate allows us to define a parser, which succeeds if the first token in the input satisfies some given predicate pred
. When it succeeds, it reads the token string, and splits the input there.
def token(kind: String)(pred: Token => Boolean) = new Parser[String] {
  def apply(in: Input) =
    if (pred(in.head)) Success(in.head.chars, in.tail)
    else Failure(kind + " expected", in)
}
We can use this to define a keyword parser:
implicit def keyword(chars: String) = token("'" + chars + "'") {
  case Keyword(chars1) => chars == chars1
  case _ => false
}
Marking it as implicit allows us to write keywords as normal strings, omitting the keyword call (this simplifies the notation in our DSL; we can write "if" instead of keyword("if")).
We can make other parsers for our other case classes quite simply:
def numericLit = token("number")(_.isInstanceOf[NumericLit])
def stringLit = token("string literal")(_.isInstanceOf[StringLit])
def ident = token("identifier")(_.isInstanceOf[Identifier])
Parser combinators
We are going to define the following parser combinators:

- ~ : sequential composition
- <~, ~> : sequential composition, keeping only the left / right result
- | : alternative
- opt(X) : option (like the ? quantifier in a regex)
- rep(X) : repetition (like the * quantifier in a regex)
- repsep(P, Q) : interleaved repetition
- ^^ : result conversion (like a map on an Option)
- ^^^ : constant result (like a map on an Option, but returning a constant value regardless of the result)
But first, we'll write some very basic parser combinators: success and failure, which respectively always succeed and always fail:
def success[T](result: T) = new Parser[T] {
  def apply(in: Input) = Success(result, in)
}

def failure(msg: String) = new Parser[Nothing] {
  def apply(in: Input) = Failure(msg, in)
}
The combinators themselves are methods on a Parser[T] class. Thanks to infix notation in Scala, we can write x.y(z) as x y z, which simplifies our DSL notation; for instance, A ~ B corresponds to A.~(B).
abstract class Parser[T] {
  // An abstract method that defines the parser function
  def apply(in: Input): ParseResult[T]

  def ~[U](rhs: => Parser[U]) = new Parser[T ~ U] {
    def apply(in: Input) = Parser.this(in) match {
      case Success(x, tail) => rhs(tail) match {
        case Success(y, rest) => Success(new ~(x, y), rest)
        case f: Failure => f
      }
      case f: Failure => f
    }
  }

  def |(rhs: => Parser[T]) = new Parser[T] {
    def apply(in: Input) = Parser.this(in) match {
      case s1 @ Success(_, _) => s1
      case _: Failure => rhs(in)
    }
  }

  def ^^[U](f: T => U) = new Parser[U] {
    def apply(in: Input) = Parser.this(in) match {
      case Success(x, tail) => Success(f(x), tail)
      case f: Failure => f
    }
  }

  def ^^^[U](r: U): Parser[U] = ^^(x => r)
}
👉 In Scala, T ~ U is syntactic sugar for ~[T, U], which is the type of the case class we'll define below
For the ~ combinator, when everything works, we return an instance of ~, a case class that is equivalent to Pair, but that prints the way we want to and allows for the concise type-level notation above.
case class ~[T, U](_1: T, _2: U) {
  override def toString = "(" + _1 + " ~ " + _2 + ")"
}
At this point, we thus have two different meanings for ~: a method ~ that produces a Parser, and the ~(a, b) case class pair that this parser returns (all of this is encoded in the signature of the ~ method).
Note that the | combinator takes the right-hand side parser as a call-by-name argument. This is because we don't want to evaluate it unless it is strictly needed, that is, only if the left-hand side fails.
^^ is like a map operation on Option; P ^^ f succeeds iff P succeeds, in which case it applies the transformation f to the result of P. Otherwise, it fails.
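To see the pieces working together, here is a condensed, self-contained sketch of these combinators (the names P, PResult and lit are mine; tokens are simplified to plain strings rather than a Reader[Token], and plain tuples stand in for the ~ case class):

```scala
// Result of a parse: either a value plus the remaining input, or an error
sealed trait PResult[+T]
case class PSuccess[T](result: T, rest: List[String]) extends PResult[T]
case class PFailure(msg: String) extends PResult[Nothing]

abstract class P[+T] { self =>
  def apply(in: List[String]): PResult[T]

  // sequential composition: run this parser, then rhs on the leftover input
  def ~[U](rhs: => P[U]): P[(T, U)] = new P[(T, U)] {
    def apply(in: List[String]) = self(in) match {
      case PSuccess(x, tail) => rhs(tail) match {
        case PSuccess(y, rest) => PSuccess((x, y), rest)
        case f: PFailure       => f
      }
      case f: PFailure => f
    }
  }

  // alternative: if this parser fails, try rhs on the *original* input
  def |[U >: T](rhs: => P[U]): P[U] = new P[U] {
    def apply(in: List[String]) = self(in) match {
      case s @ PSuccess(_, _) => s
      case _: PFailure        => rhs(in)
    }
  }

  // result conversion, like map on Option
  def ^^[U](f: T => U): P[U] = new P[U] {
    def apply(in: List[String]) = self(in) match {
      case PSuccess(x, tail) => PSuccess(f(x), tail)
      case f: PFailure       => f
    }
  }
}

// succeeds iff the first token equals s
def lit(s: String): P[String] = new P[String] {
  def apply(in: List[String]) = in match {
    case head :: tail if head == s => PSuccess(head, tail)
    case _                         => PFailure("'" + s + "' expected")
  }
}
```

For example, (lit("a") ~ lit("b"))(List("a", "b")) succeeds with the pair ("a", "b") and no leftover input, while lit("a") | lit("b") falls back to the right-hand alternative on the original input when the left one fails.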
Shorthands
We can now define shorthands for common combinations of parser combinators:
def opt[T](p: Parser[T]): Parser[Option[T]] =
  p ^^ Some | success(None)

def rep[T](p: Parser[T]): Parser[List[T]] =
  p ~ rep(p) ^^ { case x ~ xs => x :: xs } | success(Nil)

def repsep[T, U](p: Parser[T], q: Parser[U]): Parser[List[T]] =
  p ~ rep(q ~> p) ^^ { case r ~ rs => r :: rs } | success(Nil)
Note that none of the above can fail. They may, however, return a None or Nil wrapped in a Success.
As an exercise, we can implement the rep1(P) parser combinator, which corresponds to the + regex quantifier:
def rep1[T](p: Parser[T]) = p ~ rep(p) ^^ { case x ~ xs => x :: xs }
Example: JSON parser
We did not mention lexical.delimiters and lexical.reserved above, and for the sake of brevity, we omit the implementation of stringLit and numericLit.
object JSON extends StandardTokenParsers {
  lexical.delimiters += ("{", "}", "[", "]", ":", ",")
  lexical.reserved += ("null", "true", "false")

  // Returns a Map
  def obj: Parser[Any] = "{" ~> repsep(member, ",") <~ "}" ^^ (ms => Map() ++ ms)

  // Returns a List
  def arr: Parser[Any] = "[" ~> repsep(value, ",") <~ "]"

  // Returns a name/value pair:
  def member: Parser[Any] = stringLit ~ ":" ~ value ^^ {
    case name ~ ":" ~ value => (name, value)
  }

  // Returns the correct Scala type
  def value: Parser[Any] =
    obj |
    arr |
    stringLit |
    numericLit ^^ (_.toInt) |
    "null" ^^^ null |
    "true" ^^^ true |
    "false" ^^^ false
}
The trouble with left-recursion
Parser combinators work top-down and therefore do not allow for left-recursion. For example, the following would go into an infinite loop, where the parser keeps recursively calling expr without consuming any tokens:
def expr = expr ~ "-" ~ term
Let’s take a look at an arithmetic expression parser:
object Arithmetic extends StandardTokenParsers {
  lexical.delimiters ++= List("(", ")", "+", "-", "*", "/")
  def expr: Parser[Any] = term ~ rep("+" ~ term | "-" ~ term)
  def term: Parser[Any] = factor ~ rep("*" ~ factor | "/" ~ factor)
  def factor: Parser[Any] = "(" ~ expr ~ ")" | numericLit
}
This definition of expr, namely term ~ rep("-" ~ term), produces a right-leaning result. For instance, 1 - 2 - 3 produces 1 ~ List("-" ~ 2, "-" ~ 3).
The solution is to combine calls to rep with a final foldLeft on the list:
object Arithmetic extends StandardTokenParsers {
  lexical.delimiters ++= List("(", ")", "+", "-", "*", "/")

  def expr: Parser[Int] = term ~ rep("+" ~ term | "-" ~ term) ^^ reduceList
  def term: Parser[Int] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ reduceList
  def factor: Parser[Int] = "(" ~> expr <~ ")" | numericLit ^^ (_.toInt)

  private def reduceList(list: Int ~ List[String ~ Int]): Int = list match {
    case x ~ xs => xs.foldLeft(x)(reduce)
  }

  private def reduce(x: Int, r: String ~ Int) = r match {
    case "+" ~ y => x + y
    case "-" ~ y => x - y
    case "*" ~ y => x * y
    case "/" ~ y => x / y
    case _ => throw new MatchError("illegal case: " + r)
  }
}
👉 The standard library used to contain the parser combinators, but they are now a separate module. This module contains a chainl (chain-left) method that does the reduction after a rep for you.
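The effect of the final fold can be seen in isolation. Given the head 1 and the repetition list produced for 1 - 2 - 3, folding from the left groups the operations as (1 - 2) - 3; this standalone sketch mirrors reduceList using plain tuples instead of the ~ case class:

```scala
// Fold the flat result of `term ~ rep(op ~ term)` from the left,
// so that each operator applies to everything parsed so far
def reduceList(head: Int, rest: List[(String, Int)]): Int =
  rest.foldLeft(head) {
    case (acc, ("+", y)) => acc + y
    case (acc, ("-", y)) => acc - y
    case (acc, ("*", y)) => acc * y
    case (acc, ("/", y)) => acc / y
    case (_, (op, _))    => sys.error("illegal operator: " + op)
  }
```

reduceList(1, List(("-", 2), ("-", 3))) computes (1 - 2) - 3 = -4; a right fold would instead compute 1 - (2 - 3) = 2, which is wrong for subtraction and division.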
Arithmetic expressions — abstract syntax and proof principles
This section follows Chapter 3 in TAPL.
Basics of induction
Ordinary induction is simply:
Suppose P is a predicate on natural numbers. Then:
If P(0), and for all i, P(i) implies P(i + 1),
then P(n) holds for all n.
We can also do complete induction:
Suppose P is a predicate on natural numbers. Then:
If, for each natural number n, given P(i) for all i < n we can show P(n),
then P(n) holds for all n.
It proves exactly the same thing as ordinary induction; it is simply a restated version. They're inter-derivable: assuming one, we can prove the other. Which one to use is simply a matter of style or convenience. We'll see some more equivalent styles as we go along.
Mathematical representation
Let’s assume the following grammar:
t ::=
true
false
if t then t else t
0
succ t
pred t
iszero t
What does this really define? A few suggestions:
 A set of character strings
 A set of token lists
 A set of abstract syntax trees
It depends on how you read it; a grammar like the one above contains information about all three.
However, we are mostly interested in the ASTs. The above grammar is therefore called an abstract grammar. Its main purpose is to suggest a mapping from character strings to trees.
We won't be too strict in our use of these. For instance, we'll freely use parentheses to disambiguate which tree we mean to describe, even though they're not strictly supported by the grammar. What matters to us here isn't strict implementation semantics, but rather having a framework to talk about ASTs. For our purposes, we'll consider two terms producing the same AST to be basically the same; still, we'll distinguish terms that merely have the same evaluation result, as they don't necessarily have the same AST.
How can we express our grammar as mathematical expressions? A grammar describes the legal set of terms in a program by offering a recursive definition. While recursive definitions may seem obvious and simple to a programmer, we have to go through a few hoops to make sense of them mathematically.
Mathematical representation 1
We can use a set of terms T; it is the smallest set such that:

- {true, false, 0} ⊆ T,
- if t1 ∈ T, then {succ t1, pred t1, iszero t1} ⊆ T,
- if t1 ∈ T, t2 ∈ T and t3 ∈ T, then we also have if t1 then t2 else t3 ∈ T.
Mathematical representation 2
We can also write this somewhat more graphically, as inference rules:

true ∈ T        false ∈ T        0 ∈ T

  t1 ∈ T
─────────────        (and similarly for pred t1 and iszero t1)
succ t1 ∈ T

t1 ∈ T    t2 ∈ T    t3 ∈ T
──────────────────────────────
if t1 then t2 else t3 ∈ T

This is exactly equivalent to representation 1; we have just introduced a different notation. Note that "the smallest set closed under…" is often not stated explicitly, but implied.
Mathematical representation 3
Alternatively, we can build up our set of terms as an infinite union:

S_0     = ∅
S_{i+1} = {true, false, 0}
        ∪ {succ t1, pred t1, iszero t1 | t1 ∈ S_i}
        ∪ {if t1 then t2 else t3 | t1, t2, t3 ∈ S_i}

We can thus build our final set as follows:

S = ⋃_i S_i

Note that we can "pull out" the definition into a generating function F:

S_{i+1} = F(S_i)

The generating function is thus defined as:

F(U) = {true, false, 0}
     ∪ {succ t1, pred t1, iszero t1 | t1 ∈ U}
     ∪ {if t1 then t2 else t3 | t1, t2, t3 ∈ U}

The function F takes a set of terms U as input and produces "terms justified by U" as output; that is, all terms that have the items of U as immediate subterms (plus the constants).

A set U is said to be closed under F, or F-closed, if F(U) ⊆ U. The set of terms S as defined above is the smallest F-closed set: if O is another F-closed set, then S ⊆ O.
Comparison of the representations
We’ve seen essentially two ways of defining the set (as representation 1 and 2 are equivalent, but with different notation):
 The smallest set that is closed under certain rules. This is compact and easy to read.
 The limit of a series of sets. This gives us an induction principle on which we can prove things on terms by induction.
The first one defines the set "from above", by intersecting F-closed sets.
The second one defines it "from below", by starting with ∅ and getting closer and closer to being F-closed.
These are equivalent (we won’t prove it, but Proposition 3.2.6 in TAPL does so), but can serve different uses in practice.
Induction on terms
First, let's define depth: the depth of a term t is the smallest i such that t ∈ S_i.

The way we defined S_i, the sets get larger as i increases; the depth of a term gives us the step at which it is introduced into the set.

We see that if a term t is in S_i, then all of its immediate subterms must be in S_{i-1}, meaning that they must have smaller depth.
This justifies the principle of induction on terms. Let P be a predicate on terms.

If, for each term s,
given P(r) for all immediate subterms r of s we can show P(s),
then P(t) holds for all t.
All this says is that if we can prove the induction step from subterms to terms (under the induction hypothesis), then we have proven the induction.
Inductive function definitions
An inductive definition is used to define the elements in a set recursively, as we have done above. The recursion theorem states that a wellformed inductive definition indeed defines a function, but we’ll get into that later.
Let's define such a function a little more formally. Constants are the basic values that can't be expanded further; in our example, they are true, false and 0. As such, the set of constants appearing in a term t, written Consts(t), is defined as follows:

Consts(true)  = {true}
Consts(false) = {false}
Consts(0)     = {0}
Consts(succ t1)   = Consts(t1)
Consts(pred t1)   = Consts(t1)
Consts(iszero t1) = Consts(t1)
Consts(if t1 then t2 else t3) = Consts(t1) ∪ Consts(t2) ∪ Consts(t3)
This seems simple, but these semantics aren't perfect. First off, a mathematical definition should simply assign a convenient name to some previously known thing, whereas here we're defining the thing in terms of itself, recursively. The semantics above also allow us to write ill-formed inductive definitions, such as:

BadConsts(true)  = {true}
BadConsts(false) = {false}
BadConsts(0)     = {0}
BadConsts(0)     = ∅
BadConsts(succ t1)   = BadConsts(t1)
BadConsts(iszero t1) = BadConsts(iszero (iszero t1))

The last rule recurses forever (if we implemented it, we'd expect some kind of stack overflow). We're missing the rules for if-expressions, and we have a useless extra rule for 0, producing the empty set.
How do we tell the difference between a wellformed inductive definition, and an illformed one as above? What is wellformedness anyway?
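For contrast, the well-formed definition of Consts translates directly into a structurally recursive function that always terminates, since every recursive call is on a strictly smaller subterm. A sketch in Scala, reusing a Term ADT for the grammar:

```scala
// The term language as an ADT
sealed trait Term
case object True extends Term
case object False extends Term
case object Zero extends Term
case class Succ(t: Term) extends Term
case class Pred(t: Term) extends Term
case class IsZero(t: Term) extends Term
case class If(c: Term, t: Term, e: Term) extends Term

// Consts: one clause per grammar production, recursing only on subterms
def consts(t: Term): Set[Term] = t match {
  case True | False | Zero => Set(t)
  case Succ(t1)            => consts(t1)
  case Pred(t1)            => consts(t1)
  case IsZero(t1)          => consts(t1)
  case If(t1, t2, t3)      => consts(t1) ++ consts(t2) ++ consts(t3)
}
```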
What is a function?
A relation R over A and B is a subset of A × B, where the Cartesian product is defined as:

A × B = {(a, b) | a ∈ A, b ∈ B}

A function f from A (the domain) to B (the codomain) can be viewed as a two-place relation, albeit with two additional properties:

- It is total: for every a ∈ A, there is some b ∈ B such that (a, b) ∈ f
- It is deterministic: if (a, b1) ∈ f and (a, b2) ∈ f, then b1 = b2

Totality ensures that the domain A is covered, while determinism just means that the function always produces the same result for a given input.
Induction theorem
As previously stated, Consts is a relation: it relates terms (A) to the sets of constants that they contain (B). The theorem states that it is also a function. The proof is as follows.

Consts is total and deterministic: for each term t there is exactly one set of terms C such that (t, C) ∈ Consts. Note that this pair notation is equivalent to saying Consts(t) = C. The proof is done by induction on t.
To apply the induction principle for terms, we must show, for an arbitrary term t, under the following induction hypothesis:

For each immediate subterm s of t, there is exactly one set of terms C_s such that (s, C_s) ∈ Consts

that the following induction step holds:

There is exactly one set of terms C such that (t, C) ∈ Consts
We proceed by cases on t (as if we were pattern matching 😉):

- If t is true, false or 0:
We can immediately see from the definition of Consts that there is exactly one set of terms C (namely {t}) such that (t, C) ∈ Consts.
This constitutes our base case.

- If t is succ t1, pred t1 or iszero t1:
The immediate subterm of t is t1, and the induction hypothesis tells us that there is exactly one set of terms C1 such that (t1, C1) ∈ Consts. But then it is clear from the definition that there is exactly one set C = C1 such that (t, C) ∈ Consts.

- If t is if t1 then t2 else t3:
The induction hypothesis tells us:
  - there is exactly one set of terms C1 such that (t1, C1) ∈ Consts
  - there is exactly one set of terms C2 such that (t2, C2) ∈ Consts
  - there is exactly one set of terms C3 such that (t3, C3) ∈ Consts
It is clear from the definition of Consts that there is exactly one set C = C1 ∪ C2 ∪ C3 such that (t, C) ∈ Consts.

This proves that Consts is indeed a function.
But what about BadConsts? It is also a relation, but it isn't a function. For instance, both (0, {0}) and (0, ∅) are in the relation, which violates determinism: there are two sets C such that (0, C) ∈ BadConsts, namely C = {0} and C = ∅.

Note that there are other problems with BadConsts, but this one is sufficient to prove that it isn't a function.
The week 2 slides have another example of a proof by induction, which I won’t go into here.
Operational semantics and reasoning
Evaluation
Suppose we have the following syntax
t ::= // terms
true // constant true
false // constant false
if t then t else t // conditional
The evaluation relation t → t' is the smallest relation closed under the following rules.

The following are computation rules, defining the "real" computation steps:

if true then t2 else t3 → t2     (E-IfTrue)
if false then t2 else t3 → t3    (E-IfFalse)

The following is a congruence rule, defining where the computation rules may be applied next:

            t1 → t1'
─────────────────────────────────────────────  (E-If)
if t1 then t2 else t3 → if t1' then t2 else t3

We want to evaluate the condition before the conditional clauses in order to save on evaluation; we're not sure which branch should be evaluated, so we need to know the condition first.
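These rules translate almost one-to-one into a single-step evaluator; a sketch in Scala (the Term ADT and the name step are my own), where returning None models "no rule applies":

```scala
sealed trait Term
case object True extends Term
case object False extends Term
case class If(c: Term, t: Term, e: Term) extends Term

// One evaluation step; None means t is a normal form
def step(t: Term): Option[Term] = t match {
  case If(True, t2, _)  => Some(t2)                     // E-IfTrue
  case If(False, _, t3) => Some(t3)                     // E-IfFalse
  case If(t1, t2, t3)   => step(t1).map(If(_, t2, t3))  // E-If: reduce the condition
  case _                => None                         // true and false are values
}
```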
Derivations
We can describe the evaluation logically from the above rules using derivation trees. Suppose we want to evaluate the following (with parentheses added for clarity): if (if true then true else false) then false else true.

To make all this fit onto the screen, true and false have been abbreviated T and F in the derivation below, and the then keyword has been replaced with a parenthesis notation for the condition:

──────────────────  (E-IfTrue)
 if T (T) (F) → T
──────────────────────────────────────────  (E-If)
if (if T (T) (F)) (F) (T) → if T (F) (T)
The final statement is a conclusion. We say that the derivation is a witness for its conclusion (or a proof for its conclusion). The derivation records all reasoning steps that lead us to the conclusion.
We can introduce the inversion lemma, which tells us how we got to a term.
Suppose we are given a derivation D witnessing the pair (t, t') in the evaluation relation. Then either:

- the final rule applied in D was E-IfTrue, and we have t = if true then t2 else t3 and t' = t2, for some t2 and t3
- the final rule applied in D was E-IfFalse, and we have t = if false then t2 else t3 and t' = t3, for some t2 and t3
- the final rule applied in D was E-If, and we have t = if t1 then t2 else t3 and t' = if t1' then t2 else t3, for some t1, t1', t2 and t3. Moreover, the immediate subderivation of D witnesses (t1, t1').
This is super boring, but we do need to acknowledge the inversion lemma before we can do induction proofs on derivations. Thanks to the inversion lemma, given an arbitrary derivation D with conclusion t → t', we can proceed with a case-by-case analysis on the final rule used in the derivation tree.
For instance, let's define size(t) to be the number of nodes in the abstract syntax tree of t (so size(true) = 1, etc.). We want to prove that if t → t', then size(t') < size(t).

- If the final rule applied in D was E-IfTrue, then we have t = if true then t2 else t3 and t' = t2, and the result is immediate from the definition of size
- If the final rule applied in D was E-IfFalse, then we have t = if false then t2 else t3 and t' = t3, and the result is immediate from the definition of size
- If the final rule applied in D was E-If, then we have t = if t1 then t2 else t3 and t' = if t1' then t2 else t3. In this case, t1 → t1' is witnessed by a subderivation D1. By the induction hypothesis, size(t1') < size(t1), and the result is then immediate from the definition of size
Structural induction
TODO
Abstract machines
An abstract machine consists of:
 A set of states
 A transition relation → between states, written t → t'

t → t' means that t evaluates to t' in one step. Note that → is a relation, and that t → t' is shorthand for (t, t') ∈ →. Often, this relation is a partial function: it does not necessarily cover the whole domain, and there is at most one possible next state. But in general, there may be many possible next states; determinism isn't a criterion here.
Normal forms
A normal form is a term that cannot be evaluated any further. More formally, a term t is a normal form if there is no t' such that t → t'. A normal form is a state where the abstract machine is halted; we can regard it as the result of a computation.
Values that are normal forms
Previously, we intended for our values (true and false) to be exactly that, the result of a computation. Did we get that right?
Let's prove that a term t is a value if and only if it is in normal form.
- The (⇒) direction is immediate from the definition of the evaluation relation →: no evaluation rule applies to true or false.
- The (⇐) direction is more conveniently proven as its contrapositive: if t is not a value, then it is not a normal form, which we can prove by induction on the term t.

Since t is not a value, it must be of the form if t1 then t2 else t3. If t1 is directly true or false, then E-IfTrue or E-IfFalse applies, and we are done. Otherwise, t1 is an if-then-else and isn't a value; by the induction hypothesis, there is a t1' such that t1 → t1'. Then rule E-If yields if t1 then t2 else t3 → if t1' then t2 else t3, which proves that t is not in normal form.
Normal forms that are not values
Let’s introduce new syntactic forms, with new evaluation rules.
t ::= // terms
0 // constant 0
succ t // successor
pred t // predecessor
iszero t // zero test
v ::= nv // values
nv ::= // numeric values
0 // zero value
succ nv // successor value
The evaluation rules are given as follows:

       t1 → t1'
──────────────────────  (E-Succ)
 succ t1 → succ t1'

pred 0 → 0                        (E-PredZero)
pred (succ nv1) → nv1             (E-PredSucc)

       t1 → t1'
──────────────────────  (E-Pred)
 pred t1 → pred t1'

iszero 0 → true                   (E-IsZeroZero)
iszero (succ nv1) → false         (E-IsZeroSucc)

        t1 → t1'
──────────────────────────  (E-IsZero)
 iszero t1 → iszero t1'
All values are still normal forms. But are all normal forms values? Not in this case. For instance, succ true, iszero true, etc., are normal forms. These are stuck terms: terms in normal form that are not values. In general, stuck terms correspond to some kind of type error, and one of the main purposes of a type system is to rule these kinds of situations out.
Multi-step evaluation
Let's introduce the multi-step evaluation relation →*. It is the reflexive, transitive closure of single-step evaluation, i.e. the smallest relation closed under these rules:

t → t'  implies  t →* t'
t →* t  for every t
t →* t' and t' →* t''  imply  t →* t''
In other words, it corresponds to any number of single consecutive evaluations.
Termination of evaluation
We'll prove that evaluation terminates, i.e. that for every term t there is some normal form t' such that t →* t'.

First, let's recall our proof that t → t' implies size(t') < size(t). Now, for our proof by contradiction, assume that we have an infinite-length sequence t0, t1, t2, … such that:

t0 → t1 → t2 → …

Then size(t0) > size(t1) > size(t2) > … would be an infinite descending chain of natural numbers. But such a chain cannot exist, which is a contradiction.
Most termination proofs have the same basic form. We want to prove that a relation R ⊆ X × X is terminating, that is, there are no infinite sequences x0, x1, x2, … such that (xi, xi+1) ∈ R for each i. We proceed as follows:

- Choose a well-suited set W with a partial order < such that there are no infinite descending chains w0 > w1 > w2 > … in W. Also choose a function f : X → W.
- Show that (x, y) ∈ R implies f(y) < f(x).
- Conclude that there are no infinite sequences x0, x1, x2, … such that (xi, xi+1) ∈ R for each i. If there were, we could construct an infinite descending chain f(x0) > f(x1) > f(x2) > … in W.
As a side note, a partial order ≤ is defined by the following properties (together with reflexivity, x ≤ x):

- Antisymmetry: x ≤ y and y ≤ x imply x = y
- Transitivity: x ≤ y and y ≤ z imply x ≤ z

We can add a further property to obtain a total order, namely totality: for all x and y, either x ≤ y or y ≤ x.
Lambda calculus
Lambda calculus is Turing complete, and it is higher-order (functions are data). In lambda calculus, all computation happens by means of function abstraction and application. Lambda calculus is equivalent in computational power to Turing machines.
Suppose we wanted to write a function plus3 in our previous language, so that plus3 x = succ (succ (succ x)). The way we write this in lambda calculus is:

plus3 = λx. succ (succ (succ x))
λx. t is written x => t in Scala, or fun x -> t in OCaml. Application of our function, say plus3 (succ 0), can be written as:

(λx. succ (succ (succ x))) (succ 0)
Abstraction over functions is possible using higher-order functions, which we call abstractions. An example of such an abstraction is the function g below, which takes an argument f and uses it in function position:

g = λf. f (f (succ 0))

If we apply g to an argument like plus3, we can just use the substitution rule to see how that defines a new function.

Another example: the double function below takes two arguments, as a curried function would. First, it takes the function f to apply twice, then the argument x on which to apply it, and then returns f (f x):

double = λf. λx. f (f x)
Pure lambda calculus
Once we have abstractions, we can actually throw out all other language primitives like booleans and other values; all of these can be expressed as functions, as we'll see below. In pure lambda calculus, everything is a function: variables always denote functions, functions always take other functions as parameters, and the result of an evaluation is always a function.

The syntax of lambda calculus is very simple:
t ::= // terms, also called λ-terms
  x // variable
  λx. t // abstraction, also called λ-abstraction
  t t // application
A few rules and syntactic conventions:
- Application associates to the left, so t u v means (t u) v, not t (u v).
- Bodies of lambda abstractions extend as far to the right as possible, so λx. λy. x y means λx. (λy. x y), not λx. (λy. x) y.
Scope
The lambda expression λx. t binds the variable x, with a scope limited to t. Occurrences of x inside t are said to be bound, while occurrences outside are said to be free.
Operational semantics
As we saw with our previous language, the rules can be distinguished into computation and congruence rules. For lambda calculus, the only computation rule is:

(λx. t) v → [x ↦ v] t     (E-AppAbs)

The notation [x ↦ v] t means "the term that results from substituting free occurrences of x in t with v".

The congruence rules are:

   t1 → t1'                      t2 → t2'
──────────────  (E-App1)     ──────────────  (E-App2)
t1 t2 → t1' t2                v1 t2 → v1 t2'

A lambda expression applied to a value, (λx. t) v, is called a reducible expression, or redex.
Evaluation strategies
There are alternative evaluation strategies. In the above, we have chosen call by value (which is the standard in most mainstream languages), but we could also choose:
- Full beta-reduction: any redex may be reduced at any time. This offers no restrictions, but in practice, we go with a set of restrictions like the ones below (because implementing a fixed strategy is easier than implementing nondeterministic behavior).
- Normal order: the leftmost, outermost redex is always reduced first. This strategy allows us to reduce inside unapplied lambda terms.
- Call-by-name: allows no reductions inside lambda abstractions. Arguments are not reduced before being substituted into the body of lambda terms when applied. Haskell uses an optimized version of this, call-by-need (a.k.a. lazy evaluation).
Classical lambda calculus
Classical lambda calculus allows for full beta-reduction.

Confluence in full beta-reduction
The congruence rules allow us to apply reductions in different places; we can choose between E-App1 and E-App2 every time we reduce an application, and this offers many possible reduction paths.

While the path is nondeterministic, is the result also nondeterministic? This question took a very long time to answer, but after 25 years or so, it was proven that the result is always the same. This is known as the Church-Rosser confluence theorem:

Let t, t1 and t2 be terms such that t →* t1 and t →* t2. Then there exists a term t3 such that t1 →* t3 and t2 →* t3.
Alpha conversion
Substitution is actually trickier than it looks! For instance, in the expression (λx. x) x, the first occurrence of x (inside the abstraction) is bound, while the second is free. This is comparable to scope in most programming languages, where we should understand that these are two different variables, x1 and x2.
The above example had a variable that is both bound and free, which is something that we’ll try to avoid. This is called a hygiene condition.
We can transform an unhygienic expression into a hygienic one by renaming bound variables before performing the substitution. This is known as alpha conversion.

Let FV(t) be the set of free variables in a term t. Alpha conversion is given by the following conversion rule:

λx. t  =  λy. [x ↦ y] t    if y ∉ FV(t)

And these equivalence rules (an equivalence is reflexive, symmetric and transitive):

t = t'  implies  t' = t
t = t' and t' = t''  imply  t = t''
The congruence rules are as usual.
Programming in lambda calculus
Multiple arguments
The way to handle multiple arguments is by currying: a multi-argument function is written as a function that returns a function, taking one argument at a time. A two-argument function over x and y is thus written λx. λy. t, and applied as (λx. λy. t) v w.
Booleans
The fundamental, universal operator on booleans is if-then-else, which is what we'll replicate to model booleans. We want tru to be equivalent to if (true), and fls to if (false):

tru = λt. λf. t
fls = λt. λf. f
We can consider these as booleans. Equivalently, tru can be considered as a function performing (t1, t2) => if (true) t1 else t2. To understand this, let's try to apply tru to two arguments:

tru v w  =  (λt. λf. t) v w  →  (λf. v) w  →  v
This works equivalently for fls.
We can also do inversion, conjunction and disjunction in lambda calculus:

not = λb. b fls tru
and = λb. λc. b c fls
or  = λb. λc. b tru c

- not is a function that is equivalent to not(b) = if (b) false else true
- and is equivalent to and(b, c) = if (b) c else false
- or is equivalent to or(b, c) = if (b) true else c
Pairs
The fundamental operations are construction, pair(a, b), and selection, pair._1 and pair._2:

pair = λf. λs. λb. b f s
fst  = λp. p tru
snd  = λp. p fls

- pair is equivalent to pair(f, s) = (b => b f s)
- When tru is applied to the stored pair (which is what fst does), it selects the first element, by definition of the booleans; that is therefore the definition of fst
- Equivalently, fls applied to the pair selects the second element, giving us snd
Numbers
We've actually been representing numbers as lambda calculus numbers all along! Our succ function hints at what's more formally called Church numerals.

Every number n is represented by a term cn taking two arguments, s and z (for "successor" and "zero"), and applying s to z, n times:

c0 = λs. λz. z
c1 = λs. λz. s z
c2 = λs. λz. s (s z)

Note that c0's implementation is the same as that of fls (just with renamed variables).
We can also implement some functions on numbers:

- Successor scc = λn. λs. λz. s (n s z): we apply the successor function s one more time to n s z (n correctly instantiated with s and z)
- Addition plus = λm. λn. λs. λz. m s (n s z): we pass the instantiated n as the zero of m
- Multiplication times = λm. λn. m (plus n) c0: instead of the successor function, we pass the addition-by-n function
- Zero test iszro = λm. m (λx. fls) tru: zero has the same implementation as false, so we can lean on that to build an iszero function
What about predecessor? This is a little harder, and it'll take a few steps to get there. The main idea is that we find the predecessor by rebuilding the whole succession up until our number. At every step, we keep a pair of a number and its predecessor: zero's pair is zz = pair c0 c0, and each step is ss = λp. pair (snd p) (scc (snd p)). Once we've reconstructed the pair for our number, we can get the predecessor by taking its first element:

prd = λm. fst (m ss zz)
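The numerals and the pair-based predecessor can likewise be transcribed into Scala (the uncurried type CN and the casts are my own scaffolding; toInt decodes a numeral so we can observe results):

```scala
// A Church numeral applies s to z, n times
type CN = (Any => Any, Any) => Any

val zero: CN = (s, z) => z
def scc(n: CN): CN = (s, z) => s(n(s, z))
def plus(m: CN, n: CN): CN = (s, z) => m(s, n(s, z))
def times(m: CN, n: CN): CN = (s, z) => m(x => n(s, x), z)

// Predecessor via pairs: start from (zero, zero), step (p, q) to (q, scc q);
// after n steps, the first component is n's predecessor
def prd(n: CN): CN = {
  val start: Any = (zero, zero)
  val step: Any => Any = p => {
    val (_, q) = p.asInstanceOf[(CN, CN)]
    (q, scc(q))
  }
  n(step, start).asInstanceOf[(CN, CN)]._1
}

// Decode a numeral into an Int
def toInt(n: CN): Int = n(x => x.asInstanceOf[Int] + 1, 0).asInstanceOf[Int]
```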
Side note
The story goes that Church was stumped by predecessors for a long time, finally figured this out at the barber's, and jumped out half-shaved to write it down.
Lists
Now what about lists?
Recursion in lambda calculus
Let's start by taking a step back. We talked about normal forms and terms whose evaluation terminates; does lambda calculus always terminate? It's Turing complete, so it must be able to loop infinitely (otherwise, we'd have solved the halting problem!).
The trick to recursion is self-application:

λx. x x

From a type-level perspective, we would cringe at this. It should not be possible in the typed world, but in the untyped world we can do it. We can construct a simple infinite loop in lambda calculus as follows:

omega = (λx. x x) (λx. x x)

The omega expression evaluates to itself in one step; it never reaches a normal form: it loops infinitely, it diverges. It is not a stuck term though; evaluation is always possible.
In fact, there are no stuck terms in pure lambda calculus. Every closed term is either a value or reduces further.
So it turns out that omega isn't so terribly useful. Let's try to construct something more practical:

fix = λf. (λx. f (x x)) (λx. f (x x))

Now, the divergence is a little more interesting: fix f unfolds to f ((λx. f (x x)) (λx. f (x x))), so the function f itself gets to decide whether to recurse further. This function is known as a Y combinator (a fixed-point combinator). Note that while it works in classical lambda calculus and under call-by-name, it blows up under call-by-value, where a variant (the Z combinator) is used instead.
Equivalence of lambda terms
We've seen how to define Church numerals and successor. How can we prove that scc c2 is "equal" to c3?

The naive approach unfortunately doesn't work: they do not evaluate to the same term.

scc c2  →  λs. λz. s (c2 s z)
c3      =  λs. λz. s (s (s z))

This still seems very close. If we could simplify c2 s z under the abstraction a little further, we would see that they are the same.

The intuition behind the Church numeral representation was that a number n is represented as a term that "does something n times to something else". scc takes a term that "does something n times to something else", and returns a term that "does something n + 1 times to something else".
What we really care about is that scc c2 behaves the same as c3 when applied to two arguments. We want behavioral equivalence. But what does that mean? Roughly, two terms s and t are behaviorally equivalent if there is no "test" that distinguishes s and t.
Let's define this notion of "test" a little more precisely, and specify how we're going to observe the results of a test. We can use the notion of normalizability to define a simple notion of a test:
Two terms and are said to be observationally equivalent if they are either both normalizable (i.e. they reach a normal form after a finite number of evaluation steps), or both diverge.
In other words, we observe a term’s behavior by running it and seeing if it halts. Note that this is not decidable (by the halting problem).
For instance, omega and tru are not observationally equivalent (one diverges, one halts), while tru and fls are (they both halt).
Observational equivalence isn’t strong enough of a test for what we need; we need behavioral equivalence.
Two terms s and t are said to be behaviorally equivalent if, for every finite sequence of values v1, v2, …, vn, the applications s v1 v2 … vn and t v1 v2 … vn are observationally equivalent.

This allows us to assert that tru and fls are indeed different: apply both to the arguments (λx. x) and omega. Under normal-order reduction:

tru (λx. x) omega  →*  λx. x
fls (λx. x) omega  →*  omega  →  omega  →  …

The former reaches a normal form, while the latter diverges.