# OUTPUT

The blog of Maxime Kjaer

# CS-452 Foundations of Software

Work in progress

## Writing a parser with parser combinators

In Scala, you can (ab)use the operator overload to create an embedded DSL (EDSL) for grammars. While a grammar may look as follows in a grammar description language (Bison, Yak, ANTLR, …):

In Scala, we can model it as follows:

This is perhaps a little less elegant, but allows us to encode it directly into our language, which is often useful for interop.

The ~, |, rep and opt are parser combinators. These are primitives with which we can construct a full parser for the grammar of our choice.

### Boilerplate

First, let’s define a class ParseResult[T] as an ad-hoc monad; parsing can either succeed or fail:

👉 Nothing is the bottom type in Scala; it contains no members, and nothing can extend it

Let’s also define the tokens produced by the lexer (which we won’t define) as case classes extending Token:

Input into the parser is then a lazy stream of tokens (with positions for error diagnostics, which we’ll omit here):

We can then define a standard, sample parser which looks as follows on the type-level:

### The basic idea

For each language (defined by a grammar symbol S), define a function f that, given an input stream i (with tail i'):

• if a prefix of i is in S, return Success(Pair(x, i')), where x is a result for S
• otherwise, return Failure(msg, i), where msg is an error message string

The first is called success, the second is failure. We can compose operations on this somewhat conveniently, like we would on a monad (like Option).

### Simple parser primitives

All of the above boilerplate allows us to define a parser, which succeeds if the first token in the input satisfies some given predicate pred. When it succeeds, it reads the token string, and splits the input there.

We can use this to define a keyword parser:

Marking it as implicit allows us to write keywords as normal strings, where we can omit the keyword call (this helps us simplify the notation in our DSL; we can write "if" instead of keyword("if")).

We can make other parsers for our other case classes quite simply:

### Parser combinators

We are going to define the following parser combinators:

• ~: sequential composition
• <~, >~: sequential composition, keeping left / right only
• |: alternative
• opt(X): option (like a ? quantifier in a regex)
• rep(X): repetition (like a * quantifier in a regex)
• repsep(P, Q): interleaved repetition
• ^^: result conversion (like a map on an Option)
• ^^^: constant result (like a map on an Option, but returning a constant value regardless of result)

But first, we’ll write some very basic parser combinators: success and failure, that respectively always succeed and always fail:

All of the above are methods on a Parser[T] class. Thanks to infix space notation in Scala, we can denote x.y(z) as x y z, which allows us to simplify our DSL notation; for instance A ~ B corresponds to A.~(B).

👉 In Scala, T ~ U is syntactic sugar for ~[T, U], which is the type of the case class we’ll define below

For the ~ combinator, when everything works, we’re using ~, a case class that is equivalent to Pair, but prints the way we want to and allows for the concise type-level notation above.

At this point, we thus have two different meanings for ~: a function ~ that produces a Parser, and the ~(a, b) case class pair that this parser returns (all of this is encoded in the function signature of the ~ function).

Note that the | combinator takes the right-hand side parser as a call-by-name argument. This is because we don’t want to evaluate it unless it is strictly needed—that is, if the left-hand side fails.

^^ is like a map operation on Option; P ^^ f succeeds iff P succeeds, in which case it applies the transformation f on the result of P. Otherwise, it fails.

### Shorthands

We can now define shorthands for common combinations of parser combinators:

Note that none of the above can fail. They may, however, return None or Nil wrapped in success.

As an exercise, we can implement the rep1(P) parser combinator, which corresponds to the + regex quantifier:

### Example: JSON parser

We did not mention lexical.delimiters and lexical.reserved in the above, and for the sake of brevity, we omit the implementation of stringLit and numericLit.

### The trouble with left-recursion

Parser combinators work top-down and therefore do not allow for left-recursion. For example, the following would go into an infinite loop, where the parser keeps recursively matching the same token unto expr:

Let’s take a look at an arithmetic expression parser:

This definition of expr, namely term ~ rep("-" ~ term) produces a right-leaning tree. For instance, 1 - 2 - 3 produces 1 ~ List("-" ~ 2, ~ "-" ~ 3).

The solution is to combine calls to rep with a final foldLeft on the list:

👉 It used to be that the standard library contained parser combinators, but those are now a separate module. This module contains a chainl (chain-left) method that reduces after a rep for you.

## Arithmetic expressions — abstract syntax and proof principles

This section follows Chapter 3 in TAPL.

### Basics of induction

Ordinary induction is simply:

Suppose P is a predicate on natural numbers.
Then:
If P(0)
and, for all i, P(i) implies P(n) holds for all n
then P(n) holds for all n


We can also do complete induction:

Suppose P is a predicate on natural numbers.
Then:
If for each natural number n,
given P(i) for all i < n we can show P(n)
then P(n) holds for all n


It proves exactly the same thing as ordinary induction, it is simply a restated version. They’re interderivable; assuming one, we can prove the other. Which one to use is simply a matter of style or convenience. We’ll see some more equivalent styles as we go along.

### Mathematical representation

Let’s assume the following grammar:

What does this really define? A few suggestions:

• A set of character strings
• A set of token lists
• A set of abstract syntax trees

It depends on how you read it; a grammar like the one above contains information about all three.

However, we are mostly interested in the ASTs. The above grammar is therefore called an abstract grammar. Its main purpose is to suggest a mapping from character strings to trees.

For our use of these, we won’t be too strict with these. For instance, we’ll freely use parentheses to disambiguate what tree we mean to describe, even though they’re not strictly supported by the grammar. What matters to us here aren’t strict implementation semantics, but rather that we have a framework to talk about ASTs. For our purposes, we’ll consider that two terms producing the same AST are basically the same; still, we’ll distinguish terms that only have the same evaluation result, as they don’t necessarily have the same AST.

How can we express our grammar as mathematical expressions? A grammar describes the legal set of terms in a program by offering a recursive definition. While recursive definitions may seem obvious and simple to a programmer, we have to go through a few hoops to make sense of them mathematically.

#### Mathematical representation 1

We can use a set of terms; it is the smallest set such that:

1. ,
2. If then ,
3. If then we also have .

#### Mathematical representation 2

We can also write this somewhat more graphically:

This is exactly equivalent to representation 1, but we have just introduced a different notation. Note that “the smallest set closed under…” is often not stated explicitly, but implied.

#### Mathematical representation 3

Alternatively, we can build up our set of terms as an infinite union:

We can thus build our final set as follows:

Note that we can “pull out” the definition into a generating function :

The generating function is thus defined as:

Each function takes a set of terms as input and produces “terms justified by ” as output; that is, all terms that have the items of as subterms.

The set is said to be closed under F or F-closed if .

The set of terms as defined above is the smallest F-closed set. If is another F-closed set, then .

#### Comparison of the representations

We’ve seen essentially two ways of defining the set (as representation 1 and 2 are equivalent, but with different notation):

1. The smallest set that is closed under certain rules. This is compact and easy to read.
2. The limit of a series of sets. This gives us an induction principle on which we can prove things on terms by induction.

The first one defines the set “from above”, by intersecting F-closed sets.

The second one defines it “from below”, by starting with and getting closer and closer to being F-closed.

These are equivalent (we won’t prove it, but Proposition 3.2.6 in TAPL does so), but can serve different uses in practice.

### Induction on terms

First, let’s define depth:

The depth of a term is the smallest such that .

The way we defined , it gets larger and larger for increasing ; the depth gives us the step at which it is introduced into the set.

We see that if a term is in , then all of its immediate subterms must be in , meaning that they must have smaller depth.

This justifies the principle of induction on terms. Let P be a predicate on a term:

If, for each term s,
given P(r) for all immediate subterms r of s we can show P(s)
then P(t) holds for all t


All this says is that if we can prove the induction step from subterms to terms (under the induction hypothesis), then we have proven the induction.

### Inductive function definitions

An inductive definition is used to define the elements in a set recursively, as we have done above. The recursion theorem states that a well-formed inductive definition indeed defines a function, but we’ll get into that later.

Let’s define our grammar function a little more formally. Constants are the basic values that can’t be expanded further; in our example, they are true, false, 0. As such, the set of constants appearing in a term , written , is defined as follows:

This seems simple, but these semantics aren’t perfect. First off, a mathematical definition simply assigns a convenient name to some previously known thing. But here, we’re defining the thing in terms of itself, recursively. And the semantics above also allow us to define ill-formed inductive definitions:

The last rule produces infinitely large rules (if we implemented it, we’d expect some kind of stack overflow). We’re missing the rules for if-statements, and we have a useless rule for 0, producing empty sets.

How do we tell the difference between a well-formed inductive definition, and an ill-formed one as above? What is well-formedness anyway?

#### What is a function?

A relation over is a subset of , where the Cartesian product is defined as:

A function from (domain) to (co-domain) can be viewed as a two-place relation, albeit with two additional properties:

• It is total:
• It is deterministic:

Totality ensures that the A domain is covered, while being deterministic just means that the function always produces the same result for a given input.

#### Induction theorem

As previously stated, is a relation. It maps terms (A) into the set of constants that they contain (B). The induction theorem states that it is also a function. The proof is as follows.

is total and deterministic: for each term there is exactly one set of terms such that . Not that this inclusion notation is equivalent to saying . The proof is done by induction on .

To be able to apply the induction principle for terms, we must first show that for an arbitrary term , under the following induction hypothesis:

For each immediate subterm of , there is exactly one set of terms such that

Then the following needs to be proven as an induction step:

There is exactly one set of terms such that

We proceed by cases on (as if we were pattern matching 😉):

• If is , or

We can immediately see from the definition that of that there is exactly one set of terms ) such that .

This constitutes our base case.

• If is , or

The immediate subterm of is , and the induction hypothesis tells us that there is exactly one set of terms such that . But then it is clear from the definition that there is exactly one set of terms such that .

• If is

The induction hypothesis tells us:

• There is exactly one set of terms such that
• There is exactly one set of terms such that
• There is exactly one set of terms such that

It is clear from the definition of that there is exactly one set such that .

This proves that is indeed a function.

But what about ? It is also a relation, but it isn’t a function. For instance, and , which violates determinism. To reformulate this in terms of the above, there are two sets such that , namely and .

Note that there are many other problems with , but this is sufficient to prove that it isn’t a function.

The week 2 slides have another example of a proof by induction, which I won’t go into here.

### Operational semantics and reasoning

#### Evaluation

Suppose we have the following syntax

The evaluation relation is the smallest relation closed under the following rules.

The following are computation rules, defining the “real” computation steps:

The following is a congruence rule, defining where the computation rule is applied next:

We want to evaluate the condition before the conditional clauses in order to save on evaluation; we’re not sure which one should be evaluated, so we need to know the condition first.

#### Derivations

We can describe the evaluation logically from the above rules using derivation trees. Suppose we want to evaluate the following (with parentheses added for clarity): if (if true then true else false) then false else true.

In an attempt to make all this fit onto the screen, true and false have been abbreviated T and F in the derivation below, and the then keyword has been replaced with a parenthesis notation for the condition.

The final statement is a conclusion. We say that the derivation is a witness for its conclusion (or a proof for its conclusion). The derivation records all reasoning steps that lead us to the conclusion.

We can introduce the inversion lemma, which tells us how we got to a term.

Suppose we are given a derivation witnessing the pair in the evaluation relation. Then either:

1. If the final rule applied in was , then we have and for some and
2. If the final rule applied in was , then we have and for some and
3. If the final rule applied in was , then we have and , for some . Moreover, the immediate subderivation of witnesses .

This is super boring, but we do need to acknowledge the inversion lemma before we can do induction proofs on derivations. Thanks to the inversion lemma, given an arbitrary derivation with conclusion , we can proceed with a case-by-case analysis on the final rule used in the derivation tree.

For instance, if we define to be the i at which describing it was created (so , etc.). We want to prove that if , then .

1. If the final rule applied in was , then we have and , and the result is immediate from the definition of
2. If the final rule applied in was , then we have and , and the result is immediate from the definition of
3. If the final rule applied in was , then we have and . In this case, is witnessed by a derivation . By the induction hypothesis, , and the result is then immediate from the definition of

TODO

### Abstract machines

An abstract machine consists of:

• A set of states
• A transition relation of states, written

means that evaluates to in one step. Note that is a relation, and that is shorthand for . Often, this relation is a partial function (not necessarily covering the domain A; there is at most one possible next state). But without loss of generality, there may be many possible next states, determinism isn’t a criterion here.

### Normal forms

A normal form is a term that cannot be evaluated any further. More formally, a term is a normal form if there is no such that . A normal form is a state where the abstract machine is halted; we can regard it as the result of a computation.

#### Values that are normal form

Previously, we intended for our values (true and false) to be exactly that, the result of a computation. Did we get that right?

Let’s prove that a term is a value it is in normal form.

• The direction is immediate from the definition of the evaluation relation .
• The direction is more conveniently proven as its contrapositive: if is not a value, then it is not a normal form, which we can prove by induction on the term .

Since is not a value, it must be of the form . If is directly true or false, then E-IfTrue or E-IfFalse apply, and we are done.

Now, if is an if-then-else, and isn’t a value, by the induction hypothesis, there is a such that . Then rule E-If yields , which proves that is not in normal form.

#### Values that are not normal form

Let’s introduce new syntactic forms, with new evaluation rules.

The evaluation rules are given as follows:

All values are still normal forms. But are all normal forms values? Not in this case. For instance, succ true, iszero true, etc, are normal forms. These are stuck terms: they are in normal form, but are not values. In general, these correspond to some kind of type error, and one of the main purposes of a type system is to rule these kinds of situations out.

### Multi-step evaluation

Let’s introduce the multi-step evaluation relation, . It is the reflexive, transitive closure of single-step evaluation, i.e. the smallest relation closed under these rules:

In other words, it corresponds to any number of single consecutive evaluations.

### Termination of evaluation

We’ll prove that evaluation terminates, i.e. that for every term there is some normal form such that .

First, let’s recall our proof that . Now, for our proof by contradiction, assume that we have an infinite-length sequence such that:

But this sequence cannot exist, which is a contradiction.

Most termination proofs have the same basic form. We want to prove that the relation is terminating — that is, there are no infinite sequences etc. such that for each . We proceed as follows:

1. Choose a well-suited set with partial order such that there are no infinite descending chains in . Also choose a function .
2. Show
3. Conclude that are no infinite sequences such that for each . If there were, we could construct an infinite descending chain in .

As a side-note, partial order is defined as the following properties:

1. Anti-symmetry:
2. Transitivity:

We can add a third property to achieve total order, namely .

## Lambda calculus

Lambda calculus is Turing complete, and is higher-order (functions are data). In lambda calculus, all computation happens by means of function abstraction and application.

Lambda calculus is isomorphic to Turing machines.

Suppose we wanted to write a function plus3 in our previous language:

The way we write this in lambda calculus is:

is written x => t in Scala, or fun x -> t in OCaml. Application of our function, say plus3(succ 0), can be written as:

Abstraction over functions is possible using higher-order functions, which we call -abstractions. An example of such an abstraction is the function below, which takes an argument and uses it in the function position.

If we apply to an argument like , we can just use the substitution rule to see how that defines a new function.

Another example: the double function below takes two arguments, as a curried function would. First, it takes the function to apply twice, then the argument on which to apply it, and then returns .

### Pure lambda calculus

Once we have -abstractions, we can actually throw out all other language primitives like booleans and other values; all of these can be expressed as functions, as we’ll see below. In pure lambda-calculus, everything is a function.

Variables will always denote a function, functions always take other functions as parameters, and the result of an evaluation is always a function.

The syntax of lambda-calculus is very simple:

A few rules and syntactic conventions:

• Application associates to the left, so means , not .
• Bodies of lambda abstractions extend as far to the right as possible, so means , not

#### Scope

The lambda expression binds the variable , with a scope limited to . Occurrences of inside of are said to be bound, while occurrences outside are said to be free.

#### Operational semantics

As we saw with our previous language, the rules could be distinguished into computation and congruence rules. For lambda calculus, the only computation rule is:

The notation means “the term that results from substituting free occurrences of in with ”.

The congruence rules are:

A lambda-expression applied to a value, , is called a reducible expression, or redex.

#### Evaluation strategies

There are alternative evaluation strategies. In the above, we have chosen call by value (which is the standard in most mainstream languages), but we could also choose:

• Full beta-reduction: any redex may be reduced at any time. This offers no restrictions, but in practice, we go with a set of restrictions like the ones below (because coding a fixed way is easier than coding probabilistic behavior).
• Normal order: the leftmost, outermost redex is always reduced first. This strategy allows to reduce inside unapplied lambda terms
• Call-by-name: allows no reductions inside lambda abstractions. Arguments are not reduced before being substituted in the body of lambda terms when applied. Haskell uses an optimized version of this, call-by-need (aka lazy evaluation).

### Classical lambda calculus

Classical lambda calculus allows for full beta reduction.

#### Confluence in full beta reduction

The congruence rules allow us to apply in different ways; we can choose between and every time we reduce an application, and this offers many possible reduction paths.

While the path is non-deterministic, is the result also non-deterministic? This question took a very long time to answer, but after 25 years or so, it was proven that the result is always the same. This is known the Church-Rosser confluence theorem:

Let be terms such that and . Then there exists a term such that and

#### Alpha conversion

Substitution is actually trickier than it looks! For instance, in the expression , the first occurrence of is bound, while the second is free. This is comparable to scope in most programming languages, where we should understand that these are two different variables, and .

The above example had a variable that is both bound and free, which is something that we’ll try to avoid. This is called a hygiene condition.

We can transform a unhygienic expression to a hygienic one by renaming bound variables before performing the substitution. This is known as alpha conversion.

Let be the set of free variables in a term . Alpha conversion is given by the following conversion rule:

And these equivalence rules (in mathematics, equivalence is defined as symmetry and transitivity):

The congruence rules are as usual.

### Programming in lambda-calculus

#### Multiple arguments

The way to handle multiple arguments is by currying:

#### Booleans

The fundamental, universal operator on booleans is if-then-else, which is what we’ll replicate to model booleans. We want true to be equivalent to if (true), and false to if (false).

We can consider these as booleans. Equivalently tru can be considered as a function performing (t1, t2) => if (true) t1 else t2. To understand this, let’s try to apply to two arguments:

This works equivalently for fls.

We can also do inversion, conjunction and disjunction with lambda calculus:

• not is a function that is equivalent to not(b) = if (b) false else true.
• and is equivalent to and(b, c) = if (b) c else false
• or is equivalent to or(b, c) = if (b) true else c

#### Pairs

The fundamental operations are construction pair(a, b), and selection pair._1 and pair._2.

• pair is equivalent to pair(f, s) = (b => b f s)
• When tru is applied to pair, it selects the first element, by definition of the boolean, and that is therefore the definition of fst
• Equivalently for fls applied to pair, it selects the second element

#### Numbers

We’ve actually been representing numbers as lambda-calculus numbers all along! Our succ function represents what’s more formally called Church numerals.

Every number is represented by a term taking two arguments, which are and (for “successor” and “zero”), and applies to , times.

Note that ’s implementation is the same as that of (just with renamed variables).

We can also implement some functions on numbers.

• Successor : we apply the successor function to (which has been correctly instantiated with and )
• Addition : we pass the instantiated as the zero of
• Multiplication : instead of the successor function, we pass the addition by function.
• Zero test : zero has the same implementation as false, so we can lean on that to build an iszero function

What about predecessor? This is a little harder, and it’ll take a few steps to get there. The main idea is that we find the predecessor by rebuilding the whole succession up until our number. At every step, we must generate the number and its predecessor: zero is , and all other numbers are . Once we’ve reconstructed this pair, we can get the predecessor by taking the first element of the pair.

Sidenote

The story goes that Church was stumped by predecessors for a long time, and finally found out about this at the barber, and jumped out half shaved to write it down.

### Recursion in lambda-calculus

Let’s start by taking a step back. We talked about normal forms and terms for which we terminate; does lambda calculus always terminate? It’s Turing complete, so it must be able to loop infinitely (otherwise, we’d have solved the halting problem!).

The trick to recursion is self-application:

From a type-level perspective, we would cringe at this. This should not be possible in the typed world, but in the untyped world we can do it. We can construct a simple infinite loop in lambda calculus as follows:

The expression evaluates to itself in one step; it never reaches a normal form, it loops infinitely, diverges. This is not a stuck term though; evaluation is always possible.

In fact, there are no stuck terms in pure lambda calculus. Every term is either a value or reduces further.

So it turns out that isn’t so terribly useful. Let’s try to construct something more practical:

Now, the divergence is a little more interesting:

This function is known as a Y-combinator. It still delays infinitely (though note that while it works in classical lambda calculus, it blows up in call-by-name).

### Equivalence of lambda terms

We’ve seen how to define Church numerals and successor. How can we prove that is equal to ?

The naive approach unfortunately doesn’t work; they do not evaluate to the same value.

This still seems very close. If we could simplify a little further, we do see how they would be the same.

The intuition behind the Church numeral representation was that a number is represented as a term that “does something times to something else”. takes a term that “does something times to something else”, and returns a term that “does something times to something else”.

What we really care about is that behaves the same as when applied to two arguments. We want behavioral equivalence. But what does that mean? Roughly, two terms and are behaviorally equivalent if there is no “test” that distinguishes and .

Let’s define this notion of “test” this a little more precisely, and specify how we’re going to observe the results of a test. We can use the notion of normalizability to define a simple notion of a test:

Two terms and are said to be observationally equivalent if they are either both normalizable (i.e. they reach a normal form after a finite number of evaluation steps), or both diverge.

In other words, we observe a term’s behavior by running it and seeing if it halts. Note that this is not decidable (by the halting problem).

For instance, and are not observationally equivalent (one diverges, one halts), while and are (they both halt).

Observational equivalence isn’t strong enough of a test for what we need; we need behavioral equivalence.

Two terms and are said to be behaviorally equivalent if, for every finite sequence of values the applications and are observationally equivalent.

This allows us to assert that true and false are indeed different:

The former returns a normal form, while the latter diverges.

« Back