CS452 Foundations of Software
 Writing a parser with parser combinators
 Arithmetic expressions — abstract syntax and proof principles
 Lambda calculus
 Types
 Simply typed lambda calculus
 Type reconstruction and polymorphism
 Subtyping
 Objects
 Featherweight Java
 Foundations of Scala
Writing a parser with parser combinators
In Scala, you can (ab)use the operator overload to create an embedded DSL (EDSL) for grammars. While a grammar may look as follows in a grammar description language (Bison, Yak, ANTLR, …):
1
2
3
Expr ::= Term {'+' Term  '−' Term}
Term ::= Factor {'∗' Factor  '/' Factor}
Factor ::= Number  '(' Expr ')'
In Scala, we can model it as follows:
1
2
3
def expr: Parser[Any] = term ~ rep("+" ~ term  "−" ~ term)
def term: Parser[Any] = factor ~ rep("∗" ~ factor  "/" ~ factor)
def factor: Parser[Any] = "(" ~ expr ~ ")"  numericLit
This is perhaps a little less elegant, but allows us to encode it directly into our language, which is often useful for interop.
The ~
, 
, rep
and opt
are parser combinators. These are primitives with which we can construct a full parser for the grammar of our choice.
Boilerplate
First, let’s define a class ParseResult[T]
as an adhoc monad; parsing can either succeed or fail:
1
2
3
sealed trait ParseResult[T]
case class Success[T](result: T, in: Input) extends ParseResult[T]
case class Failure(msg : String, in: Input) extends ParseResult[Nothing]
👉
Nothing
is the bottom type in Scala; it contains no members, and nothing can extend it
Let’s also define the tokens produced by the lexer (which we won’t define) as case classes extending Token
:
1
2
3
4
5
sealed trait Token
case class Keyword(chars: String) extends Token
case class NumericLit(chars: String) extends Token
case class StringLit(chars: String) extends Token
case class Identifier(chars: String) extends Token
Input into the parser is then a lazy stream of tokens (with positions for error diagnostics, which we’ll omit here):
1
type Input = Reader[Token]
We can then define a standard, sample parser which looks as follows on the typelevel:
1
2
3
class StandardTokenParsers {
type Parser = Input => ParseResult
}
The basic idea
For each language (defined by a grammar symbol S
), define a function f
that, given an input stream i
(with tail i'
):
 if a prefix of
i
is inS
, returnSuccess(Pair(x, i'))
, wherex
is a result forS
 otherwise, return
Failure(msg, i)
, wheremsg
is an error message string
The first is called success, the second is failure. We can compose operations on this somewhat conveniently, like we would on a monad (like Option
).
Simple parser primitives
All of the above boilerplate allows us to define a parser, which succeeds if the first token in the input satisfies some given predicate pred
. When it succeeds, it reads the token string, and splits the input there.
1
2
3
4
5
def token(kind: String)(pred: Token => boolean) = new Parser[String] {
def apply(in : Input) =
if (pred(in.head)) Success(in.head.chars, in.tail)
else Failure(kind + " expected ", in)
}
We can use this to define a keyword parser:
1
2
3
4
implicit def keyword(chars: String) = token("'" + chars + "'") {
case Keyword(chars1) => chars == chars1
case _ => false
}
Marking it as implicit
allows us to write keywords as normal strings, where we can omit the keyword
call (this helps us simplify the notation in our DSL; we can write "if"
instead of keyword("if")
).
We can make other parsers for our other case classes quite simply:
1
2
3
def numericLit = token("number")(_.isInstanceOf[NumericLit])
def stringLit = token("string literal")(_.isInstanceOf[StringLit])
def ident = token("identifier")(_.isInstanceOf[Identifier])
Parser combinators
We are going to define the following parser combinators:

~
: sequential composition 
<~
,>~
: sequential composition, keeping left / right only 

: alternative 
opt(X)
: option (like a?
quantifier in a regex) 
rep(X)
: repetition (like a*
quantifier in a regex) 
repsep(P, Q)
: interleaved repetition 
^^
: result conversion (like amap
on anOption
) 
^^^
: constant result (like amap
on anOption
, but returning a constant value regardless of result)
But first, we’ll write some very basic parser combinators: success
and failure
, that respectively always succeed and always fail:
1
2
3
4
5
6
7
def success[T](result: T) = new Parser[T] {
def apply(in: Input) = Success(result, in)
}
def failure(msg: String) = new Parser[Nothing] {
def apply(in: Input) = Failure(msg, in)
}
All of the above are methods on a Parser[T]
class. Thanks to infix space notation in Scala, we can denote x.y(z)
as x y z
, which allows us to simplify our DSL notation; for instance A ~ B
corresponds to A.~(B)
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
abstract class Parser[T] {
// An abstract method that defines the parser function
def apply(in : Input): ParseResult
def ~[U](rhs: Parser[U]) = new Parser[T ~ U] {
def apply(in: Input) = Parser.this(in) match {
case Success(x, tail) => rhs(tail) match {
case Success(y, rest) => Success(new ~(x, y), rest)
case failure => failure
}
case failure => failure
}
}
def (rhs: => Parser[T]) = new Parser[T] {
def apply(in : Input) = Parser.this(in) match {
case s1 @ Success(_, _) => s1
case failure => rhs(in)
}
}
def ^^[U](f: T => U) = new Parser[U] {
def apply(in : Input) = Parser.this(in) match {
case Success(x, tail) => Success(f(x), tail)
case x => x
}
}
def ^^^[U](r: U): Parser[U] = ^^(x => r)
}
👉 In Scala,
T ~ U
is syntactic sugar for~[T, U]
, which is the type of the case class we’ll define below
For the ~
combinator, when everything works, we’re using ~
, a case class that is equivalent to Pair
, but prints the way we want to and allows for the concise typelevel notation above.
1
2
3
case class ~[T, U](_1 : T, _2 : U) {
override def toString = "(" + _1 + " ~ " + _2 +")"
}
At this point, we thus have two different meanings for ~
: a function ~
that produces a Parser
, and the ~(a, b)
case class pair that this parser returns (all of this is encoded in the function signature of the ~
function).
Note that the 
combinator takes the righthand side parser as a callbyname argument. This is because we don’t want to evaluate it unless it is strictly needed—that is, if the lefthand side fails.
^^
is like a map
operation on Option
; P ^^ f
succeeds iff P
succeeds, in which case it applies the transformation f
on the result of P. Otherwise, it fails.
Shorthands
We can now define shorthands for common combinations of parser combinators:
1
2
3
4
5
6
7
def opt[T](p : Parser[T]): Parser[Option[T]] = p ^^ Some  success(None)
def rep[T](p : Parser[T]): Parser[List[T]] =
p ~ rep(p) ^^ { case x ~ xs => x :: xs }  success(Nil)
def repsep[T, U](p : Parser[T], q : Parser[U]): Parser[List[T]] =
p ~ rep(q ~> p) ^^ { case r ~ rs => r :: rs }  success(Nil)
Note that none of the above can fail. They may, however, return None
or Nil
wrapped in success
.
As an exercise, we can implement the rep1(P)
parser combinator, which corresponds to the +
regex quantifier:
1
def rep1[T](p: Parser[T]) = p ~ rep(p)
Example: JSON parser
Let’s define a JSON parser. Scala’s parser combinator library has a StandardTokenParsers
that give us a variety of utility methods for lexing, like lexical.delimiters
, lexical.reserved
, stringLit
and numericLit
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
object JSON extends StandardTokenParsers {
lexical.delimiters += ("{", "}", "[", "]", ":")
lexical.reserved += ("null", "true", "false")
// Return Map
def obj: Parser[Any] = "{" ~ repsep(member, ",") ~ "}" ^^ (ms => Map() ++ ms)
// Return List
def arr: Parser[Any] = "[" ~> repsep(value, ",") <~ "]"
// Return name/value pair:
def member: Parser[Any] = stringLit ~ ":" ~ value ^^ {
case name ~ ":" ~ value => (name, value)
}
// Return correct Scala type
def value: Parser[Any] =
obj
 arr
 stringLit
 numericLit ^^ (_.toInt)
 "null" ^^^ null
 "true" ^^^ true
 "false" ^^^ false
}
The trouble with leftrecursion
Parser combinators work topdown and therefore do not allow for leftrecursion. For example, the following would go into an infinite loop, where the parser keeps recursively matching the same token unto expr
:
1
def expr = expr ~ "" ~ term
Let’s take a look at an arithmetic expression parser:
1
2
3
4
5
6
object Arithmetic extends StandardTokenParsers {
lexical.delimiters ++= List("(", ")", "+", "−", "∗", "/")
def expr: Parser[Any] = term ~ rep("+" ~ term  "−" ~ term)
def term: Parser[Any] = factor ~ rep("∗" ~ factor  "/" ~ factor)
def factor: Parser[Any] = "(" ~ expr ~ ")"  numericLit
}
This definition of expr
, namely term ~ rep("" ~ term)
produces a rightleaning tree. For instance, 1  2  3
produces 1 ~ List("" ~ 2, ~ "" ~ 3)
.
The solution is to combine calls to rep
with a final foldLeft on the list:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
object Arithmetic extends StandardTokenParsers {
lexical.delimiters ++= List("(", ")", "+", "−", "∗", "/")
def expr: Parser[Any] = term ~ rep("+" ~ term  "−" ~ term) ^^ reduceList
def term: Parser[Any] = factor ~ rep("∗" ~ factor  "/" ~ factor) ^^ reduceList
def factor: Parser[Any] = "(" ~ expr ~ ")"  numericLit
private def reduceList(list: Expr ~ List[String ~ Expr]): Expr = list match {
case x ~ xs => (x foldLeft ps)(reduce)
}
private def reduce(x: Int, r: String ~ Int) = r match {
case "+" ~ y => x + y
case "−" ~ y => x − y
case "∗" ~ y => x ∗ y
case "/" ~ y => x / y
case => throw new MatchError("illegal case: " + r)
}
}
👉 It used to be that the standard library contained parser combinators, but those are now a separate module. This module contains a
chainl
(chainleft) method that reduces after arep
for you.
Arithmetic expressions — abstract syntax and proof principles
This section follows Chapter 3 in TAPL.
Basics of induction
Ordinary induction is simply:
Suppose P is a predicate on natural numbers.
Then:
If P(0)
and, for all i, P(i) implies P(i + 1)
then P(n) holds for all n
We can also do complete induction:
Suppose P is a predicate on natural numbers.
Then:
If for each natural number n,
given P(i) for all i < n we can show P(n)
then P(n) holds for all n
It proves exactly the same thing as ordinary induction, it is simply a restated version. They’re interderivable; assuming one, we can prove the other. Which one to use is simply a matter of style or convenience. We’ll see some more equivalent styles as we go along.
Mathematical representation of syntax
Let’s assume the following grammar:
1
2
3
4
5
6
7
8
t ::=
true
false
if t then t else t
0
succ t
pred t
iszero t
What does this really define? A few suggestions:
 A set of character strings
 A set of token lists
 A set of abstract syntax trees
It depends on how you read it; a grammar like the one above contains information about all three.
However, we are mostly interested in the ASTs. The above grammar is therefore called an abstract grammar. Its main purpose is to suggest a mapping from character strings to trees.
For our use of these, we won’t be too strict with these. For instance, we’ll freely use parentheses to disambiguate what tree we mean to describe, even though they’re not strictly supported by the grammar. What matters to us here aren’t strict implementation semantics, but rather that we have a framework to talk about ASTs. For our purposes, we’ll consider that two terms producing the same AST are basically the same; still, we’ll distinguish terms that only have the same evaluation result, as they don’t necessarily have the same AST.
How can we express our grammar as mathematical expressions? A grammar describes the legal set of terms in a program by offering a recursive definition. While recursive definitions may seem obvious and simple to a programmer, we have to go through a few hoops to make sense of them mathematically.
Mathematical representation 1
We can use a set of terms. The grammar is then the smallest set such that:
 ,
 If then ,
 If then we also have .
Mathematical representation 2
We can also write this somewhat more graphically:
This is exactly equivalent to representation 1, but we have just introduced a different notation. Note that “the smallest set closed under…” is often not stated explicitly, but implied.
Mathematical representation 3
Alternatively, we can build up our set of terms as an infinite union:
We can thus build our final set as follows:
Note that we can “pull out” the definition into a generating function :
The generating function is thus defined as:
Each function takes a set of terms as input and produces “terms justified by ” as output; that is, all terms that have the items of as subterms.
The set is said to be closed under F or Fclosed if .
The set of terms as defined above is the smallest Fclosed set. If is another Fclosed set, then .
Comparison of the representations
We’ve seen essentially two ways of defining the set (as representation 1 and 2 are equivalent, but with different notation):
 The smallest set that is closed under certain rules. This is compact and easy to read.
 The limit of a series of sets. This gives us an induction principle on which we can prove things on terms by induction.
The first one defines the set “from above”, by intersecting Fclosed sets.
The second one defines it “from below”, by starting with and getting closer and closer to being Fclosed.
These are equivalent (we won’t prove it, but Proposition 3.2.6 in TAPL does so), but can serve different uses in practice.
Induction on terms
First, let’s define depth: the depth of a term is the smallest such that .
The way we defined , it gets larger and larger for increasing ; the depth of a term gives us the step at which is introduced into the set.
We see that if a term is in , then all of its immediate subterms must be in , meaning that they must have smaller depth.
This justifies the principle of induction on terms, or structural induction. Let P be a predicate on a term:
If, for each term s,
given P(r) for all immediate subterms r of s we can show P(s)
then P(t) holds for all t
All this says is that if we can prove the induction step from subterms to terms (under the induction hypothesis), then we have proven the induction.
We can also express this structural induction using generating functions, which we introduced previously.
Suppose T is the smallest Fclosed set.
If, for each set U,
from the assumption "P(u) holds for every u ∈ U",
we can show that "P(v) holds for every v ∈ F(U)"
then
P(t) holds for all t ∈ T
Why can we use this?
 We assumed that was the smallest Fclosed set, which means that for any other Fclosed set .
 Showing the precondition (“for each set , from the assumption…”) amounts to showing that the set of all terms satisfying (call it ) is itself an Fclosed set.
 Since , every element of satisfies .
Inductive function definitions
An inductive definition is used to define the elements in a set recursively, as we have done above. The recursion theorem states that a wellformed inductive definition defines a function. To understand what being wellformed means, let’s take a look at some examples.
Let’s define our grammar function a little more formally. Constants are the basic values that can’t be expanded further; in our example, they are true
, false
, 0
. As such, the set of constants appearing in a term , written , is defined recursively as follows:
This seems simple, but these semantics aren’t perfect. First off, a mathematical definition simply assigns a convenient name to some previously known thing. But here, we’re defining the thing in terms of itself, recursively. And the semantics above also allow us to define illformed inductive definitions:
The last rule produces infinitely large rules (if we implemented it, we’d expect some kind of stack overflow). We’re missing the rules for ifstatements, and we have a useless rule for 0
, producing empty sets.
How do we tell the difference between a wellformed inductive definition, and an illformed one as above? What is wellformedness anyway?
What is a function?
A relation over is a subset of , where the Cartesian product is defined as:
A function from (domain) to (codomain) can be viewed as a twoplace relation, albeit with two additional properties:
 It is total:
 It is deterministic:
Totality ensures that the A domain is covered, while being deterministic just means that the function always produces the same result for a given input.
Induction example 1
As previously stated, is a relation. It maps terms (A) into the set of constants that they contain (B). The induction theorem states that it is also a function. The proof is as follows.
is total and deterministic: for each term there is exactly one set of terms such that ^{1} . The proof is done by induction on .
To be able to apply the induction principle for terms, we must first show that for an arbitrary term , under the following induction hypothesis:
For each immediate subterm of , there is exactly one set of terms such that
Then the following needs to be proven as an induction step:
There is exactly one set of terms such that
We proceed by cases on :

If is , or
We can immediately see from the definition that of that there is exactly one set of terms ) such that .
This constitutes our base case.

If is , or
The immediate subterm of is , and the induction hypothesis tells us that there is exactly one set of terms such that . But then it is clear from the definition that there is exactly one set of terms such that .

If is
The induction hypothesis tells us:
 There is exactly one set of terms such that
 There is exactly one set of terms such that
 There is exactly one set of terms such that
It is clear from the definition of that there is exactly one set such that .
This proves that is indeed a function.
But what about ? It is also a relation, but it isn’t a function. For instance, we have and , which violates determinism. To reformulate this in terms of the above, there are two sets such that , namely and .
Note that there are many other problems with , but this is sufficient to prove that it isn’t a function.
Induction example 2
Let’s introduce another inductive definition:
We’d like to prove that the number of distinct constants in a term is at most the size of the term. In other words, that
The proof is by induction on :

is a constant; , or
The proof is immediate. For constants, the number of constants and the size are both one:

is a function; , or
By the induction hypothesis, .
We can then prove the proposition as follows:

is an ifstatement:
By the induction hypothesis, , and .
We can then prove the proposition as follows:
Operational semantics and reasoning
Evaluation
Suppose we have the following syntax
1
2
3
4
t ::= // terms
true // constant true
false // constant false
if t then t else t // conditional
The evaluation relation is the smallest relation closed under the following rules.
The following are computation rules, defining the “real” computation steps:
The following is a congruence rule, defining where the computation rule is applied next:
We want to evaluate the condition before the conditional clauses in order to save on evaluation; we’re not sure which one should be evaluated, so we need to know the condition first.
Derivations
We can describe the evaluation logically from the above rules using derivation trees. Suppose we want to evaluate the following (with parentheses added for clarity): if (if true then true else false) then false else true
.
In an attempt to make all this fit onto the screen, true
and false
have been abbreviated T
and F
in the derivation below, and the then
keyword has been replaced with a parenthesis notation for the condition.
The final statement is a conclusion. We say that the derivation is a witness for its conclusion (or a proof for its conclusion). The derivation records all reasoning steps that lead us to the conclusion.
Inversion lemma
We can introduce the inversion lemma, which tells us how we got to a term.
Suppose we are given a derivation witnessing the pair in the evaluation relation. Then either:
 If the final rule applied in was , then we have and for some and
 If the final rule applied in was , then we have and for some and
 If the final rule applied in was , then we have and , for some . Moreover, the immediate subderivation of witnesses .
This is super boring, but we do need to acknowledge the inversion lemma before we can do induction proofs on derivations. Thanks to the inversion lemma, given an arbitrary derivation with conclusion , we can proceed with a casebycase analysis on the final rule used in the derivation tree.
Let’s recall our definition of the size function. In particular, we’ll need the rule for ifstatements:
We want to prove that if , then .
 If the final rule applied in was , then we have and , and the result is immediate from the definition of
 If the final rule applied in was , then we have and , and the result is immediate from the definition of
 If the final rule applied in was , then we have and . In this case, is witnessed by a derivation . By the induction hypothesis, , and the result is then immediate from the definition of
Abstract machines
An abstract machine consists of:
 A set of states
 A transition relation of states, written
means that evaluates to in one step. Note that is a relation, and that is shorthand for . Often, this relation is a partial function (not necessarily covering the domain A; there is at most one possible next state). But without loss of generality, there may be many possible next states, determinism isn’t a criterion here.
Normal forms
A normal form is a term that cannot be evaluated any further. More formally, a term is a normal form if there is no such that . A normal form is a state where the abstract machine is halted; we can regard it as the result of a computation.
Values that are normal form
Previously, we intended for our values (true and false) to be exactly that, the result of a computation. Did we get that right?
Let’s prove that a term is a value it is in normal form.
 The direction is immediate from the definition of the evaluation relation .

The direction is more conveniently proven as its contrapositive: if is not a value, then it is not a normal form, which we can prove by induction on the term .
Since is not a value, it must be of the form . If is directly
true
orfalse
, then or apply, and we are done.Otherwise, if where isn’t a value, by the induction hypothesis, there is a such that . Then rule yields , which proves that is not in normal form.
Values that are not normal form
Let’s introduce new syntactic forms, with new evaluation rules.
1
2
3
4
5
6
7
8
9
10
11
t ::= // terms
0 // constant 0
succ t // successor
pred t // predecessor
iszero t // zero test
v ::= nv // values
nv ::= // numeric values
0 // zero value
succ nv // successor value
The evaluation rules are given as follows:
All values are still normal forms. But are all normal forms values? Not in this case. For instance, succ true
, iszero true
, etc, are normal forms. These are stuck terms: they are in normal form, but are not values. In general, these correspond to some kind of type error, and one of the main purposes of a type system is to rule these kinds of situations out.
Multistep evaluation
Let’s introduce the multistep evaluation relation, . It is the reflexive, transitive closure of singlestep evaluation, i.e. the smallest relation closed under these rules:
In other words, it corresponds to any number of single consecutive evaluations.
Termination of evaluation
We’ll prove that evaluation terminates, i.e. that for every term there is some normal form such that .
First, let’s recall our proof that . Now, for our proof by contradiction, assume that we have an infinitelength sequence such that:
But this sequence cannot exist: since is a finite, natural number, we cannot construct this infinite descending chain from it. This is a contradiction.
Most termination proofs have the same basic form. We want to prove that the relation is terminating — that is, there are no infinite sequences such that for each . We proceed as follows:
 Choose a wellsuited set with partial order such that there are no infinite descending chains in . Also choose a function .
 Show
 Conclude that are no infinite sequences such that for each . If there were, we could construct an infinite descending chain in .
As a sidenote, partial order is defined as the following properties:
 Antisymmetry:
 Transitivity:
We can add a third property to achieve total order, namely .
Lambda calculus
Lambda calculus is Turing complete, and is higherorder (functions are data). In lambda calculus, all computation happens by means of function abstraction and application.
Lambda calculus is isomorphic to Turing machines.
Suppose we wanted to write a function plus3
in our previous language:
The way we write this in lambda calculus is:
is written x => t
in Scala, or fun x > t
in OCaml. Application of our function, say plus3(succ 0)
, can be written as:
Abstraction over functions is possible using higherorder functions, which we call abstractions. An example of such an abstraction is the function below, which takes an argument and uses it in the function position.
If we apply to an argument like , we can just use the substitution rule to see how that defines a new function.
Another example: the twice
function below takes two arguments, as a curried function would. First, it takes the function to apply twice, then the argument on which to apply it, and then returns .
Pure lambda calculus
Once we have abstractions, we can actually throw out all other language primitives like booleans and other values; all of these can be expressed as functions, as we’ll see below. In pure lambdacalculus, everything is a function.
Variables will always denote a function, functions always take other functions as parameters, and the result of an evaluation is always a function.
The syntax of lambdacalculus is very simple:
1
2
3
4
t ::= // terms, also called λterms
x // variable
λx. t // abstraction, also called λabstractions
t t // application
A few rules and syntactic conventions:
 Application associates to the left, so means , not .
 Bodies of lambda abstractions extend as far to the right as possible, so means , not
Scope
The lambda expression binds the variable , with a scope limited to . Occurrences of inside of are said to be bound, while occurrences outside are said to be free.
Let be the set of free variables in a term . It’s defined as follows:
Operational semantics
As we saw with our previous language, the rules could be distinguished into computation and congruence rules. For lambda calculus, the only computation rule is:
The notation means “the term that results from substituting free occurrences of in with ”.
The congruence rules are:
A lambdaexpression applied to a value, , is called a reducible expression, or redex.
Evaluation strategies
There are alternative evaluation strategies. In the above, we have chosen call by value (which is the standard in most mainstream languages), but we could also choose:
 Full betareduction: any redex may be reduced at any time. This offers no restrictions, but in practice, we go with a set of restrictions like the ones below (because coding a fixed way is easier than coding probabilistic behavior).
 Normal order: the leftmost, outermost redex is always reduced first. This strategy allows to reduce inside unapplied lambda terms
 Callbyname: allows no reductions inside lambda abstractions. Arguments are not reduced before being substituted in the body of lambda terms when applied. Haskell uses an optimized version of this, callbyneed (aka lazy evaluation).
Classical lambda calculus
Classical lambda calculus allows for full beta reduction.
Confluence in full beta reduction
The congruence rules allow us to apply in different ways; we can choose between and every time we reduce an application, and this offers many possible reduction paths.
While the path is nondeterministic, is the result also nondeterministic? This question took a very long time to answer, but after 25 years or so, it was proven that the result is always the same. This is known the ChurchRosser confluence theorem:
Let be terms such that and . Then there exists a term such that and
Alpha conversion
Substitution is actually trickier than it looks! For instance, in the expression , the first occurrence of is bound (it refers to a parameter), while the second is free (it does not refer to a parameter). This is comparable to scope in most programming languages, where we should understand that these are two different variables in different scopes, and .
The above example had a variable that is both bound and free, which is something that we’ll try to avoid. This is called a hygiene condition.
We can transform a unhygienic expression to a hygienic one by renaming bound variables before performing the substitution. This is known as alpha conversion. Alpha conversion is given by the following conversion rule:
And these equivalence rules (in mathematics, equivalence is defined as symmetry and transitivity):
The congruence rules are as usual.
Programming in lambdacalculus
Multiple arguments
The way to handle multiple arguments is by currying:
Booleans
The fundamental, universal operator on booleans is ifthenelse, which is what we’ll replicate to model booleans. We’ll denote our booleans as and to be able to distinguish these pure lambdacalculus abstractions from the true and false values of our previous toy language.
We want true
to be equivalent to if (true)
, and false
to if (false)
. The terms and represent boolean values, in that we can use them to test the truth of a boolean value:
We can consider these as booleans. Equivalently tru
can be considered as a function performing (t1, t2) => if (true) t1 else t2
. To understand this, let’s try to apply to two arguments:
This works equivalently for fls
.
We can also do inversion, conjunction and disjunction with lambda calculus, which can be read as particular ifelse statements:

not
is a function that is equivalent tonot(b) = if (b) false else true
. 
and
is equivalent toand(b, c) = if (b) c else false

or
is equivalent toor(b, c) = if (b) true else c
Pairs
The fundamental operations are construction pair(a, b)
, and selection pair._1
and pair._2
.

pair
is equivalent topair(f, s) = (b => b f s)
 When
tru
is applied topair
, it selects the first element, by definition of the boolean, and that is therefore the definition offst
 Equivalently for
fls
applied topair
, it selects the second element
Numbers
We’ve actually been representing numbers as lambdacalculus numbers all along! Our succ
function represents what’s more formally called Church numerals.
Note that ’s implementation is the same as that of (just with renamed variables).
Every number is represented by a term taking two arguments, which are and (for “successor” and “zero”), and applies to , times. Fundamentally, a number is equivalent to the following:
With this in mind, let us implement some functions on numbers.
 Successor : we apply the successor function to (which has been correctly instantiated with and )
 Addition : we pass the instantiated as the zero of
 Subtraction : we apply times to
 Multiplication : instead of the successor function, we pass the addition by function.
 Zero test : zero has the same implementation as false, so we can lean on that to build an iszero function. An alternative understanding is that we’re building a number, in which we use true for the zero value . If we have to apply the successor function once or more, we want to get false, so for the successor function we use a function ignoring its input and returning false if applied.
What about predecessor? This is a little harder, and it’ll take a few steps to get there. The main idea is that we find the predecessor by rebuilding the whole succession up until our number. At every step, we must generate the number and its predecessor: zero is , and all other numbers are . Once we’ve reconstructed this pair, we can get the predecessor by taking the first element of the pair.
Sidenote
The story goes that Church was stumped by predecessors for a long time. This solution finally came to him while he was at the barber, and he jumped out half shaven to write it down.
Lists
Now what about lists?
Recursion in lambdacalculus
Let’s start by taking a step back. We talked about normal forms and terms for which we terminate; does lambda calculus always terminate? It’s Turing complete, so it must be able to loop infinitely (otherwise, we’d have solved the halting problem!).
The trick to recursion is selfapplication:
From a typelevel perspective, we would cringe at this. This should not be possible in the typed world, but in the untyped world we can do it. We can construct a simple infinite loop in lambda calculus as follows:
The expression evaluates to itself in one step; it never reaches a normal form, it loops infinitely, diverges. This is not a stuck term though; evaluation is always possible.
In fact, there are no stuck terms in pure lambda calculus. Every term is either a value or reduces further.
So it turns out that isn’t so terribly useful. Let’s try to construct something more practical:
Now, the divergence is a little more interesting:
This function is known as a Y combinator. It still loops infinitely (though note that while it works in classical lambda calculus, it blows up in callbyname), so let’s try to build something more useful.
To delay the infinite recursion, we could build something like a poison pill:
It can be passed around (after all, it’s just a value), but evaluating it will cause our program to loop infinitely. This is the core idea we’ll use for defining the fixedpoint combinator (also known as the callbyvalue Y combinator), which allows us to do recursion. It’s defined as follows:
This looks a little intricate, and we won’t need to fully understand the definition. What’s important is mostly how it is used to define a recursive function. For instance, if we wanted to define a modulo function in our toy language, we’d do it as follows:
1
2
3
def mod(x, y) =
if (y > x) x
else mod(x  y, y)
In lambda calculus, we’d define this as:
We’ve assumed that a greaterthan function was available here.
More generally, we can define a recursive function as:
Equivalence of lambda terms
We’ve seen how to define Church numerals and successor. How can we prove that is equal to ?
The naive approach unfortunately doesn’t work; they do not evaluate to the same value.
This still seems very close. If we could simplify a little further, we do see how they would be the same.
The intuition behind the Church numeral representation was that a number is represented as a term that “does something times to something else”. takes a term that “does something times to something else”, and returns a term that “does something times to something else”.
What we really care about is that behaves the same as when applied to two arguments. We want behavioral equivalence. But what does that mean? Roughly, two terms and are behaviorally equivalent if there is no “test” that distinguishes and .
Let’s define this notion of “test” this a little more precisely, and specify how we’re going to observe the results of a test. We can use the notion of normalizability to define a simple notion of a test:
Two terms and are said to be observationally equivalent if they are either both normalizable (i.e. they reach a normal form after a finite number of evaluation steps), or both diverge.
In other words, we observe a term’s behavior by running it and seeing if it halts. Note that this is not decidable (by the halting problem).
For instance, and are not observationally equivalent (one diverges, one halts), while and are (they both halt).
Observational equivalence isn’t strong enough of a test for what we need; we need behavioral equivalence.
Two terms and are said to be behaviorally equivalent if, for every finite sequence of values the applications and are observationally equivalent.
This allows us to assert that true and false are indeed different:
The former returns a normal form, while the latter diverges.
Types
As previously, to define a language, we start with a set of terms and values, as well as an evaluation relation. But now, we’ll also define a set of types (denoted with a first capital letter) classifying values according to their “shape”. We can define a typing relation . We must check that the typing relation is sound in the sense that:
These rules represent some kind of safety and liveness, but are more commonly referred to as progress and preservation, which we’ll talk about later. The first one states that types are preserved throughout evaluation, while the second says that if we can typecheck, then evaluation of will not get stuck.
In our previous toy language, we can introduce two types, booleans and numbers:
1
2
3
T ::= // types
Bool // type of booleans
Nat // type of numbers
Our typing rules are then given by:
With these typing rules in place, we can construct typing derivations to justify every pair (which we can also denote as a pair) in the typing relation, as we have done previously with evaluation. Proofs of properties about the typing relation often proceed by induction on these typing derivations.
Like other static program analyses, type systems are generally imprecise. They do not always predict exactly what kind of value will be returned, but simply a conservative approximation. For instance, if true then 0 else false
cannot be typed with the above rules, even though it will certainly evaluate to a number. We could of course add a typing rule for if true
statements, but there is still a question of how useful this is, and how much complexity it adds to the type system, and especially for proofs. Indeed, the inversion lemma below becomes much more tedious when we have more rules.
Properties of the Typing Relation
The safety (or soundness) of this type system can be expressed by the following two properties:

Progress: A welltyped term is not stuck.
If then either is a value, or else for some .

Preservation: Types are preserved by onestep evaluation.
If and , then .
We will prove these later, but first we must state a few lemmas.
Inversion lemma
Again, for types we need to state the same (boring) inversion lemma:
 If , then .
 If , then .
 If , then , and
 If then
 If then and
 If then and
 If then and
From the inversion lemma, we can directly derive a typechecking algorithm:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def typeof(t: Expr): T = t match {
case True  False => Bool
case If(t1, t2, t3) =>
val type1 = typeof(t1)
val type2 = typeof(t2)
val type3 = typeof(t3)
if (type1 == Bool && type2 == type3) type2
else throw Error("not typable")
case Zero => Nat
case Succ(t1) =>
if (typeof(t1) == Nat) Nat
else throw Error("not typable")
case Pred(t1) =>
if (typeof(t1) == Nat) Nat
else throw Error("not typable")
case IsZero(t1) =>
if (typeof(t1) == Nat) Bool
else throw Error("not typable")
}
Canonical form
A simple lemma that will be useful for lemma is that of canonical forms. Given a type, it tells us what kind of values we can expect:
 If is a value of type Bool, then is either or
 If is a value of type Nat, then is a numeric value
The proof is somewhat immediate from the syntax of values.
Progress Theorem
Theorem: suppose that is a welltyped term of type . Then either is a value, or else there exists some such that .
Proof: by induction on a derivation of .
 The , and are immediate, since is a value in these cases.

For , we have , with , and . By the induction hypothesis, there is some such that .
If is a value, then rule 1 of the canonical form lemma tells us that must be either or , in which case or applies to .
Otherwise, if , then by ,

For , we have .
is a value, by rule 5 of the inversion lemma and by rule 2 of the canonical form, for some numeric value . Therefore, is a value. If , then .
 The cases for , and are similar.
Preservation Theorem
Theorem: Types are preserved by onestep evaluation. If and , then .
Proof: by induction on the given typing derivation
 For and , the precondition doesn’t hold (no reduction is possible), so it’s trivially true. Indeed, is already a value, either or .
 For , there are three evaluation rules by which can be derived, depending on
 If , then by we have , and from rule 3 of the inversion lemma and the assumption that , we have , that is
 If , then by we have , and from rule 3 of the inversion lemma and the assumption that , we have , that is
 If , then by the induction hypothesis, . Combining this with the assumption that and , we can apply to conclude , that is
Messing with it
Removing a rule
What if we remove ? Then pred 0
type checks, but it is stuck and is not a value; the progress theorem fails.
Changing typechecking rule
What if we change the to the following?
This doesn’t break our type system. It’s still sound, but it rejects ifelse expressions that return other things than numbers (e.g. booleans). But that is an expressiveness problem, not a soundness problem; our type system disallows things that would otherwise be fine by the evaluation rules.
Adding bit
We could add a boolean to natural function bit(t)
. We’d have to add it to the grammar, add some evaluation and typing rules, and prove progress and preservation.
We’ll do something similar this below, so the full proof is omitted.
Simply typed lambda calculus
Simply Typed Lambda Calculus (STLC) is also denoted . The “pure” form of STLC is not very interesting on the typelevel (unlike for the termlevel of pure lambda calculus), so we’ll allow base values that are not functions, like booleans and integers. To talk about STLC, we always begin with some set of “base types”:
1
2
3
T ::= // types
Bool // type of booleans
T > T // type of functions
In the following examples, we’ll work with a mix of our previously defined toy language, and lambda calculus. This will give us a little syntactic sugar.
1
2
3
4
5
6
7
8
9
10
11
12
t ::= // terms
x // variable
λx. t // abstraction
t t // application
true // constant true
false // constant false
if t then t else t // conditional
v ::= // values
λx. t // abstraction value
true // true value
false // false value
Type annotations
We will annotate lambdaabstractions with the expected type of the argument, as follows:
We could also omit it, and let type inference do the job (as in OCaml), but for now, we’ll do the above. This will make it simpler, as we won’t have to discuss inference just yet.
Typing rules
In STLC, we’ve introduced abstraction. To add a typing rule for that, we need to encode the concept of an environment , which is a set of variable assignments. We also introduce the “turnstile” symbol , meaning that the environment can verify the right handside typing, or that must imply the righthand side.
This additional concept must be taken into account in our definition of progress and preservation:
 Progress: If , then either is a value or else for some
 Preservation: If and , then
To prove these, we must take the same steps as above. We’ll introduce the inversion lemma for typing relations, and restate the canonical forms lemma in order to prove the progress theorem.
Inversion lemma
Let’s start with the inversion lemma.
 If then
 If then
 If then and .
 If then
 If then for some with
 If then there is some type such that and .
Canonical form
The canonical forms are given as follows:
 If is a value of type Bool, then it is either or
 If is a value of type then has the form
Progress
Finally, we get to prove the progress by induction on typing derivations.
Theorem: Suppose that is a closed, well typed term (that is, for some type ). Then either is a value, or there is some such that .
 For boolean constants, the proof is immediate as is a value
 For variables, the proof is immediate as is closed, and the precondition therefore doesn’t hold
 For abstraction, the proof is immediate as is a value

Application is the only case we must treat.
Consider , with and .
By the induction hypothesis, is either a value, or it can make a step of evaluation. The same goes for .
If can reduce, then rule applies to . Otherwise, if it is a value, and can take a step, then applies. Otherwise, if they are both values (and we cannot apply reduction), then the canonical forms lemma above tells us that has the form , and so rule applies to .
Preservation
Theorem: If and then .
Proof: by induction on typing derivations. We proceed on a casebycase basis, as we have done so many times before. But one case is hard: application.
For , such that and , and where , we want to show .
To do this, we must use the inversion lemma for evaluation (note that we haven’t written it down for STLC, but the idea is the same). There are three subcases for it, starting with the following:
The lefthand side is , and the righthand side of application is a value . In this case, we know that the result of the evaluation is given by .
And here, we already run into trouble, because we do not know about how types act under substitution. We will therefore need to introduce some lemmas.
Weakening lemma
Weakening tells us that we can add assumptions to the context without losing any true typing statements:
If , and the environment has no information about —that is, —then the initial assumption still holds if we add information about to the environment:
Moreover, the latter derivation has the same depth as the former.
Permutation lemma
Permutation tells us that the order of assumptions in does not matter.
If and is a permutation of , then .
Moreover, the latter derivation has the same depth as the former.
Substitution lemma
Substitution tells us that types are preserved under substitution.
That is, if and , then .
The proof goes by induction on the derivation of , that is, by cases on the final typing rule used in the derivation.

Case : in this case, .
Thanks to typechecking, we know that the environment validates and . In this case, the resulting type of the application is .
By the induction hypothesis, , and .
By , the environment then also verifies the application of these two substitutions as : . We can factorize the substitution to obtain the conclusion, i.e.
 Case : if ( is a simple variable ) where . There are two subcases to consider here, depending on whether is or another variable:
 If , then . The result is then , which is among the assumptions of the lemma
 If , then , and the desired result is immediate

Case : if , with , and .
Based on our hygiene convention, we may assume and .
Using permutation on the first given subderivation in the lemma (