
# CS-250 Algorithms

• Course homepage
• Introduction to Algorithms, Third edition (2009), T. Cormen, C. Leiserson, R. Rivest, C. Stein (the book on which the course is based)
• The Art of Computer Programming, Donald Knuth (a classic)

## Analyzing algorithms

We’ll work under a model that assumes that:

• Instructions are executed one after another
• Basic instructions (arithmetic, data movement, control) take constant O(1) time
• We don’t worry about precision, although it is crucial in certain numerical applications

We usually concentrate on finding the worst-case running time, as it gives a guaranteed upper bound on the running time. Moreover, the worst case occurs fairly often for some algorithms, and the average case is often roughly as bad as the worst case.

## Sorting

The sorting problem’s definition is the following:

• Input: A sequence of $n$ numbers $(a_1, a_2, \dots, a_n)$
• Output: A permutation (reordering) $(a_1', a_2', \dots, a_n')$ of the input sequence in increasing order

### Insertion sort

Works in the same way as sorting a deck of cards.
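A possible Python sketch (0-indexed, whereas the invariant below uses the book's 1-based indexing):

```python
def insertion_sort(A):
    """Sort the list A in place, like sorting a hand of cards."""
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        # Shift elements of the sorted prefix that are greater than key
        # one step to the right, then drop key into the hole.
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A
```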

#### Correctness

Using induction-like logic.

Loop invariant: At the start of each iteration of the “outer” for loop — the loop indexed by j — the subarray A[1, ..., j-1] consists of the elements originally in A[1, ..., j-1] but in sorted order.

We need to verify:

• Initialization: We start at j=2, so the subarray A[1..j-1] consists of a single element, which is trivially sorted.
• Maintenance: We assume that the invariant holds at the beginning of the iteration when j=k. The body of the loop moves A[k-1], A[k-2] and so on one step to the right, until it finds the proper position for A[k], at which point it inserts the value of A[k]. The subarray A[1..k] then consists of the original elements in sorted order. Incrementing j to k+1 preserves the loop invariant.
• Termination: The condition of the for loop to terminate is that j>n; hence, j=n+1 when the loop terminates, and A[1..n] contains the original elements in sorted order.

#### Analysis

• Best case: The array is already sorted, and the algorithm runs in $\Theta(n)$
• Worst case: The array is sorted in reverse order, so $t_j = j$ and the algorithm runs in $\Theta(n^2)$

### Merge Sort

Merge sort is a divide-and-conquer algorithm:

• Divide the problem into a number of subproblems that are smaller instances of the same problem
• Conquer the subproblems by solving them recursively. Base case: If the subproblems are small enough, just solve them by brute force
• Combine the subproblem solutions to give a solution to the original problem

Merge sort can be implemented recursively as such:

In the merge function, instead of checking whether either of the two lists that we’re merging is empty, we can just add a sentinel to the bottom of the list, with value ∞. This works because we’re just picking the minimum value of the two.
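A Python sketch along those lines, using the book's inclusive indices and the ∞ sentinels described above:

```python
def merge_sort(A, p, r):
    """Recursively sort the subarray A[p..r] (inclusive)."""
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)        # conquer the left half
        merge_sort(A, q + 1, r)    # conquer the right half
        merge(A, p, q, r)          # combine the two sorted halves

def merge(A, p, q, r):
    """Merge the sorted subarrays A[p..q] and A[q+1..r]."""
    left = A[p:q + 1] + [float("inf")]      # sentinel: never the minimum
    right = A[q + 1:r + 1] + [float("inf")]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1
```

A full sort is then `merge_sort(A, 0, len(A) - 1)`.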

This algorithm works well in parallel, because we can split the lists on separate computers.

#### Correctness

Assuming merge is correct, we can do a proof by strong induction on $n = r - p$.

• Base case, $n = 0$: In this case $r = p$ so A[p..r] is trivially sorted.
• Inductive case: By the induction hypothesis, merge_sort(A, p, q) and merge_sort(A, q+1, r) successfully sort the two subarrays. Therefore, a correct merge procedure will successfully sort A[p..r], as required.

## Recurrences

Let’s try to provide an analysis of the runtime of merge sort.

• Divide: Takes constant time, i.e., $D(n) = \Theta(1)$
• Conquer: Recursively solve two subproblems, each of size $n/2$, which contributes $2T(n/2)$ to the running time
• Combine: Merge on an n-element subarray takes $\Theta(n)$ time, so $C(n) = \Theta(n)$

Our recurrence therefore looks like this:
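Writing $cn$ for the $\Theta(n)$ cost of dividing and combining:

$$
T(n) = \begin{cases}
c & \text{if } n = 1 \\
2T(n/2) + cn & \text{if } n > 1
\end{cases}
$$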

Trying to substitute $T(n/2)$ multiple times yields $T(n) = 2^k T(n/2^k) + kcn$. By now, a qualified guess would be that $T(n) = \Theta(n \log{n})$. We’ll prove this the following way:

All-in-all, merge sort runs in $\Theta(n \log{n})$, both in worst and best case.

For small instances, insertion sort can still be faster despite its quadratic runtime; but for bigger instances, merge sort is definitely faster.

However, merge sort is not in-place, meaning that it does not operate directly on the list that needs to be sorted, unlike insertion sort.

### Master Theorem

Generally, we can solve recurrences in a black-box manner thanks to the Master Theorem:

Let $a \geq 1$ and $b > 1$ be constants, let $T(n)$ be defined on the nonnegative integers by the recurrence:
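$$
T(n) = aT(n/b) + f(n)
$$

where $n/b$ stands for either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$.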

Then, $T(n)$ has the following asymptotic bounds

1. If $f(n) = \mathcal{O}(n^{\log_b{a}-\epsilon})$ for some constant $\epsilon > 0$, then $T(n)=\Theta(n^{\log_b{a}})$
2. If $f(n) = \Theta(n^{\log_b{a}})$, then $T(n) = \Theta(n^{\log_b{a}} \log{n})$
3. If $f(n) = \Omega(n^{\log_b{a}+\epsilon})$ for some constant $\epsilon > 0$, and if $a \cdot f(n/b) \leq c\cdot f(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$

The 3 cases correspond to the following cases in a recursion tree:

1. Leaves dominate
2. Each level has the same cost
3. The root dominates

## Maximum-Subarray Problem

### Description

You have the prices that a stock traded at over a period of n consecutive days. When should you have bought the stock? When should you have sold the stock?

The naïve approach of buying at the lowest and selling at the highest won’t work, as the lowest price might occur after the highest.

• Input: An array A[1..n] of numbers
• Output: Indices i and j such that A[i..j] has the greatest sum of any nonempty, contiguous subarray of A, along with the sum of the values in A[i..j]

### Divide and Conquer

• Divide the subarray into two subarrays of as equal size as possible: Find the midpoint mid of the subarrays, and consider the subarrays A[low..mid] and A[mid+1..high]
• This can run in $\Theta(1)$
• Conquer by finding maximum subarrays of A[low..mid] and A[mid+1..high]
• Recursively solve two subproblems, each of size $n/2$, so $2T(n/2)$
• Combine: Find a maximum subarray that crosses the midpoint, and use the best solution out of the three (that is, left, midpoint and right).
• The merge is dominated by find_max_crossing_subarray so $\Theta(n)$

The overall recursion is $T(n) = 2T(n/2) + \Theta(n)$, so the algorithm runs in $\Theta(n\log{n})$.

In pseudo-code the algorithm is:
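A Python sketch following the CLRS pseudocode (0-indexed):

```python
def find_max_crossing_subarray(A, low, mid, high):
    """Best subarray A[i..j] with low <= i <= mid < j <= high."""
    left_sum, total, max_left = float("-inf"), 0, mid
    for i in range(mid, low - 1, -1):       # grow leftwards from mid
        total += A[i]
        if total > left_sum:
            left_sum, max_left = total, i
    right_sum, total, max_right = float("-inf"), 0, mid + 1
    for j in range(mid + 1, high + 1):      # grow rightwards from mid+1
        total += A[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return max_left, max_right, left_sum + right_sum

def find_maximum_subarray(A, low, high):
    if low == high:
        return low, high, A[low]            # base case: a single element
    mid = (low + high) // 2
    left = find_maximum_subarray(A, low, mid)
    right = find_maximum_subarray(A, mid + 1, high)
    cross = find_max_crossing_subarray(A, low, mid, high)
    return max(left, right, cross, key=lambda t: t[2])   # largest sum wins
```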

## Matrix multiplication

• Input: Two $n\times n$ (square) matrices, $A = (a_{ij})$ and $B = (b_{ij})$
• Output: $n\times n$ matrix $C = (c_{ij})$ where $C=A\cdot B$.

### Naive Algorithm

The naive algorithm simply calculates $c_{ij}=\sum_{k=1}^n a_{ik}b_{kj}$. It runs in $\Theta(n^3)$ and uses $\Theta(n^2)$ space.

We can do better.

### Divide-and-Conquer algorithm

• Divide each of A, B and C into four $n/2\times n/2$ matrices so that:

• Conquer: We can recursively solve 8 matrix multiplications that each multiply two $n/2 \times n/2$ matrices, since:
• $C_{11}=A_{11}B_{11} + A_{12}B_{21}, \qquad C_{12}=A_{11}B_{12} + A_{12}B_{22}$.
• $C_{21}=A_{21}B_{11} + A_{22}B_{21}, \qquad C_{22}=A_{21}B_{12} + A_{22}B_{22}$.
• Combine: Make the additions to get to C.
• $C_{11}=A_{11}B_{11} + A_{12}B_{21}$ is $\Theta(n^2)$.

#### Pseudocode and analysis

The whole recursion formula is $T(n) = 8\cdot T(n/2) + \Theta(n^2) = \Theta(n^3)$. So we did all of this for something that doesn’t even beat the naive implementation!!

### Strassen’s Algorithm for Matrix Multiplication

What really broke the Divide-and-Conquer approach is the fact that we had to do 8 matrix multiplications. Could we do fewer matrix multiplications by increasing the number of additions?

There is a way to do only 7 recursive multiplications of $n/2\times n/2$ matrices, rather than 8. Our recurrence relation is now:
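With 7 subproblems of size $n/2$ and $\Theta(n^2)$ work for the additions, case 1 of the Master Theorem gives:

$$
T(n) = 7T(n/2) + \Theta(n^2) = \Theta(n^{\log_2{7}}) \approx \Theta(n^{2.81})
$$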

Strassen’s method is the following:
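One standard way of writing the seven products and the combination step (the naming may differ from the course's slides):

$$
\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}) & M_2 &= (A_{21}+A_{22})B_{11} \\
M_3 &= A_{11}(B_{12}-B_{22}) & M_4 &= A_{22}(B_{21}-B_{11}) \\
M_5 &= (A_{11}+A_{12})B_{22} & M_6 &= (A_{21}-A_{11})(B_{11}+B_{12}) \\
M_7 &= (A_{12}-A_{22})(B_{21}+B_{22}) &&
\end{aligned}
$$

$$
C_{11} = M_1 + M_4 - M_5 + M_7, \quad
C_{12} = M_3 + M_5, \quad
C_{21} = M_2 + M_4, \quad
C_{22} = M_1 - M_2 + M_3 + M_6
$$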

Note that when the matrices’ size isn’t a power of two, we can just pad the operands with zeros until the size is a power of two. This still runs in $\Theta(n^{\log_2{7}})$.

• First to beat $\Theta(n^3)$ time
• Faster methods are known today: Coppersmith and Winograd’s method runs in time $\mathcal{O}(n^{2.376})$, which has recently been improved by Vassilevska Williams to $\mathcal{O}(n^{2.3727})$.
• How to multiply matrices in best way is still a big open problem
• The naive method is better for small instances because of hidden constants in Strassen’s method’s runtime.

## Heaps and Heapsort

### Heaps

A heap is a nearly complete binary tree (meaning that the last level may not be complete). The main property of a heap is:

• Max-heap: the key of i’s children is smaller or equal to i’s key.
• Min-heap: the key of i’s children is greater or equal to i’s key.

In a max-heap, the maximum element is at the root, while the minimum element takes that place in a min-heap.

The height of a node is the number of edges on a longest simple path from the node down to a leaf. The height of the heap is therefore simply the height of the root, $\Theta(\log{n})$.

We can store the heap in a list, where:

• Root is A[1]
• Left(i) is 2i
• Right(i) is 2i + 1
• Parent(i) is floor(i/2)

#### Max-Heapify

It’s a very important algorithm for manipulating heaps. Given an index i such that the subtrees rooted at i’s children are max-heaps, it ensures that the subtree rooted at i is a max-heap.

• Compare A[i], A[Left(i)], A[Right(i)]
• If necessary, swap A[i] with the largest of the two children to preserve heap property.
• Continue this process of comparing and swapping down the heap, until the subtree rooted at i is a max-heap.

This runs in $\Theta(\text{height of } i) = \mathcal{O}(\log{n})$ and uses $\Theta(n)$ space.
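A Python sketch (0-indexed, so Left(i) = 2i+1 and Right(i) = 2i+2 instead of the 1-based formulas above):

```python
def max_heapify(A, i, heap_size):
    """Sink A[i] down until the subtree rooted at i is a max-heap.
    Assumes the subtrees rooted at its children already are max-heaps."""
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and A[left] > A[largest]:
        largest = left
    if right < heap_size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # swap with the larger child
        max_heapify(A, largest, heap_size)    # continue down the heap
```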

#### Building a heap

Given an unordered array A of length n, build_max_heap outputs a heap.

This procedure operates in place. We can start at $\lfloor{\frac{n}{2}}\rfloor$ since all elements after that threshold are leaves, so we’re not going to heapify those anyway.

We have $\mathcal{O}(n)$ calls to max_heapify, each of which takes $\mathcal{O}(\log{n})$ time, so we have $\mathcal{O}(n\log{n})$ in total. However, we can give a tighter bound: the time to run max_heapify is linear in the height of the node it’s run on. Hence, the time is bounded by:
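Since a heap has at most $\lceil n / 2^{h+1} \rceil$ nodes of height $h$:

$$
\sum_{h=0}^{\lfloor \log{n} \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \mathcal{O}(h)
= \mathcal{O}\left( n \sum_{h=0}^{\infty} \frac{h}{2^h} \right)
$$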

Which is $\Theta(n)$, since:
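$$
\sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{1/2}{(1 - 1/2)^2} = 2
$$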

See the slides for a proof by induction.

### Heapsort

Heapsort is the best of both worlds: it’s $O(n\log{n})$ like merge sort, and sorts in place like insertion sort.

• Starting with the root (the maximum element), the algorithm places the maximum element into the correct place in the array by swapping it with the element in the last position in the array.
• “Discard” this last node (knowing that it is in its correct place) by decreasing the heap size, and calling max_heapify on the new (possibly incorrectly-placed) root.
• Repeat this “discarding” process until only one node (the smallest element) remains, and therefore is in the correct place in the array
• build_max_heap: $\mathcal{O}(n)$
• for loop: $n-1$ times
• Exchange elements: $\mathcal{O}(1)$
• max_heapify: $\mathcal{O}(\log{n})$

Total time is therefore $\mathcal{O}(n\log{n})$.
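A sketch of both procedures, reusing max_heapify from above (0-indexed, so every index from $\lfloor n/2 \rfloor$ onwards is a leaf):

```python
def build_max_heap(A):
    """Turn an arbitrary array into a max-heap, in place."""
    for i in range(len(A) // 2 - 1, -1, -1):   # leaves are skipped
        max_heapify(A, i, len(A))

def heapsort(A):
    build_max_heap(A)
    for end in range(len(A) - 1, 0, -1):
        A[0], A[end] = A[end], A[0]   # move the current maximum to its place
        max_heapify(A, 0, end)        # "discard" it and fix the new root
    return A
```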

### Priority queues

In a priority queue, we want to be able to:

• Insert elements
• Find the maximum
• Remove and return the largest element
• Increase a key

A heap efficiently implements priority queues.

#### Finding maximum element

This can be done in $\Theta(1)$ by simply returning the root element.

#### Extracting maximum element

We can use max heapify to rebuild our heap after removing the root. This runs in $\mathcal{O}(\log{n})$, as every other operation than max-heapify runs in $\mathcal{O}(1)$.

#### Increasing a value

This traverses the tree upward, and runs in $\mathcal{O}(\log{n})$.

#### Inserting into the heap

We know that this runs in the time for heap_increase_key, which is $\mathcal{O}(\log{n})$.

### Summary

• Heapsort runs in $\mathcal{O}(n\log{n})$ and is in-place. However, a well implemented quicksort usually beats it in practice.
• Heaps efficiently implement priority queues:
• insert(S, x): $\mathcal{O}(\log{n})$
• maximum(S): $\mathcal{O}(1)$
• extract_max(S): $\mathcal{O}(\log{n})$
• increase_key(S, x, k): $\mathcal{O}(\log{n})$

## More data structures

### Stacks

Stacks are LIFO (last-in, first-out). They have two basic operations:

• Insertion with push(S, x)
• Delete operation with pop(S)

We can implement a stack using an array where S[1] is the bottom element, and S[S.top] is the element at the top.

These operations are all $\mathcal{O}(1)$.

### Queues

Queues are FIFO (first-in, first-out). They have two basic operations:

• Insertion with enqueue(Q, x)
• Deletion with dequeue(Q)

Again, queues can be implemented using arrays with two pointers: Q[Q.head] is the first element, Q[Q.tail] is the next location where a newly arrived element will be placed.

Notice that we’re not deleting anything per se. We’re just moving the boundaries of the queue and sometimes replacing elements.

Positives:

• Very efficient
• Natural operations

Negatives:

• Limited support: for example, no search
• Implementations using arrays have fixed capacity

### Linked lists

Let’s take a look at the operations in linked lists.

#### Search

Searching for a key k runs in $\mathcal{O}(n)$, as we may have to traverse the whole list. If no element with key k exists, it will return nil.

#### Insertion

This runs in $\mathcal{O}(1)$. This linked list doesn’t implement a tail pointer, so it’s important to add the element to the start of the list and not the end, as that would mean traversing the list before adding the element.

#### Deletion

This is $\mathcal{O}(1)$.

#### Summary

• Insertion: $\mathcal{O}(1)$
• Deletion: $\mathcal{O}(1)$
• Search: $\mathcal{O}(n)$

Search in linear time is no fun! Let’s see how else we can do this.

### Binary search trees

The key property of binary search trees is:

• If y is in the left subtree of x, then y.key < x.key
• If y is in the right subtree of x, then y.key >= x.key

The tree T has a root T.root, and a height h (not necessarily log(n), it can vary depending on the organization of the tree).

#### Querying a binary search tree

All of the following algorithms can be implemented in $\mathcal{O}(h)$.

##### Maximum and minimum

By the key property, the minimum is in the leftmost node, and the maximum in rightmost.

##### Successor

The successor of a node x is the node y such that y.key is the smallest key that is strictly larger than x.key.

There are 2 cases when finding the successor of x:

1. x has a non-empty right subtree: x’s successor is the minimum in the right subtree
2. x has an empty right subtree: We go left up the tree as long as we can (until node y), and x’s successor is the parent of y (or y if it is the root)

In the worst-case, this will have to traverse the tree upward, so it indeed runs in $\mathcal{O}(h)$.
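A sketch, assuming each node carries key, left, right and parent attributes:

```python
def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)   # case 1: minimum of the right subtree
    y = x.parent                       # case 2: climb until we leave a left subtree
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y                           # None means x has no successor
```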

#### Printing a binary search tree

##### Inorder
• Print left subtree recursively
• Print root
• Print right subtree recursively

This would print:

This runs in $\Theta(n)$.

Proof of runtime

$T(n) =$ runtime of inorder_tree_walk on a tree with $n$ nodes.

Base: $n = 0, \quad T(0) = c$

Induction: Suppose that the tree rooted at $x$ has $k$ nodes in its left subtree, and $n-k-1$ nodes in the right.

Therefore:

Therefore, we do indeed have $\Theta(n)$.

Preorder and postorder follow a very similar idea.

##### Preorder
• Print root
• Print left subtree recursively
• Print right subtree recursively

This would print:

##### Postorder
• Print left subtree recursively
• Print right subtree recursively
• Print root

This would print:

#### Modifying a binary search tree

The data structure must be modified to reflect the change, but in such a way that the binary-search-tree property continues to hold.

##### Insertion
• Search for z.key
• When arrived at Nil insert z at that position

This runs in $\mathcal{O}(h)$.
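A sketch under the same node-attribute assumptions as before:

```python
def tree_insert(T, z):
    """Walk down from the root and attach z where the search for z.key ends."""
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:
        T.root = z            # the tree was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z
```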

##### Deletion

Conceptually, there are 3 cases:

1. If z has no children, remove it
2. If z has one child, then make that child take z’s position in the tree
3. If z has two children, then find its successor y and replace z by y

#### Summary

• Query operations: Search, max, min, predecessor, successor: $\mathcal{O}(h)$
• Modifying operations: Insertion, deletion: $\mathcal{O}(h)$

There are efficient procedures to keep the tree balanced (AVL trees, red-black trees, etc.).

### Summary

• Stacks: LIFO, insertion and deletion in $\mathcal{O}(1)$, with an array implementation with fixed capacity
• Queues: FIFO, insertion and deletion in $\mathcal{O}(1)$, with an array implementation with fixed capacity
• Linked Lists: No fixed capacity, insertion and deletion in $\mathcal{O}(1)$, supports search but in $\mathcal{O}(n)$ time.
• Binary Search Trees: No fixed capacity, supports most operations (insertion, deletion, search, max, min) in time $\mathcal{O}(h)$.

## Dynamic Programming

The main idea is to remember calculations already made. This saves enormous amounts of computation, as we don’t have to do the same calculations again and again.

There are two different ways to implement this:

### Top-down with memoization

Solve recursively but store each result in a table. Memoizing is remembering what we have computed previously.

As an example, let’s calculate Fibonacci numbers with this technique:
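A possible sketch, with a dictionary as the memoization table:

```python
def fib(n, memo=None):
    """Top-down Fibonacci: recurse, but remember every value already computed."""
    if memo is None:
        memo = {}
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]
```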

This runs in $\Theta(n)$.

### Bottom-up

Sort the subproblems and solve the smaller ones first. That way, when solving a subproblem, we have already solved the smaller subproblems we need.

As an example, let’s calculate Fibonacci numbers with this technique:
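A corresponding bottom-up sketch, solving the smaller subproblems first:

```python
def fib_bottom_up(n):
    if n <= 1:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):      # F(2), F(3), ..., F(n)
        prev, curr = curr, prev + curr
    return curr
```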

This is also $\Theta(n)$.

### Rod cutting problem

The instance of the problem is:

• A length $n$ of a metal rod
• A table of prices $p_i$ for rods of lengths $i = 1, ..., n$

The objective is to decide how to cut the rod into pieces and maximize the price.

There are $2^{n-1}$ possible solutions (not considering symmetry), so we can’t just try them all. Let’s introduce the following theorem in an attempt at finding a better way:

#### Structural Theorem

If:

• The leftmost cut in an optimal solution is after $i$ units
• An optimal way to cut a solution of size $n-i$ is into rods of sizes $s_1, s_2, ..., s_k$

Then, an optimal way to cut our rod is into rods of size $i, s_1, s_2, ..., s_k$.

Essentially, the theorem says that to obtain an optimal solution, we need to cut the remaining pieces in an optimal way. This is the optimal substructure property. Hence, if we let $r(n)$ be the optimal revenue from a rod of length $n$, we can express $r(n)$ recursively as follows:
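That is, we try every possible length $i$ of the leftmost piece:

$$
r(n) = \begin{cases}
0 & \text{if } n = 0 \\
\max\limits_{1 \leq i \leq n} \left( p_i + r(n - i) \right) & \text{if } n \geq 1
\end{cases}
$$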

#### Algorithm

Let’s try to implement a first algorithm. This is a direct implementation of the recurrence relation above:
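A sketch of that direct implementation (assuming p[i] is the price of a rod of length i, with p[0] = 0):

```python
def cut_rod(p, n):
    """Naive recursive rod cutting; exponential running time."""
    if n == 0:
        return 0
    best = float("-inf")
    for i in range(1, n + 1):               # length of the leftmost piece
        best = max(best, p[i] + cut_rod(p, n - i))
    return best
```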

But implementing this, we see that it isn’t terribly efficient — in fact, it’s exponential. Let’s try to do this with dynamic programming. Here, we’ll do it in a memoized top-down approach.

Every problem needs to check all the subproblems. Thanks to dynamic programming, we can just sum them instead of multiplying them, as every subproblem is computed once at most. As we’ve seen earlier on with insertion sort, this is:

The total time complexity is $\mathcal{O}(n^2)$. This is even clearer with the bottom up approach:
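A bottom-up sketch under the same price-table convention:

```python
def bottom_up_cut_rod(p, n):
    """Returns the maximum revenue r[n] for a rod of length n."""
    r = [0] * (n + 1)
    for j in range(1, n + 1):       # subproblems in order of increasing size
        best = float("-inf")
        for i in range(1, j + 1):   # try every leftmost cut
            best = max(best, p[i] + r[j - i])
        r[j] = best
    return r[n]
```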

There’s a for-loop in a for-loop, so we clearly have $\Theta(n^2)$.

Top-down only solves the subproblems actually needed, but recursive calls introduce overhead.

### When can dynamic programming be used?

• Optimal substructure
• An optimal solution can be built by combining optimal solutions for the subproblems
• Implies that the optimal value can be given by a recursive formula
• Overlapping subproblems

### Matrix-chain multiplication

• Input: A chain $(A_1, A_2, ..., A_n)$ of $n$ matrices, where for $i=1, 2, ..., n$, matrix $A_i$ has dimension $p_{i-1}\times p_i$.
• Output: A full parenthesization of the product $A_1 A_2 ... A_n$ in a way that minimizes the number of scalar multiplications.

We are not asked to calculate the product, only find the best parenthesization. Multiplying a matrix of size $p\times q$ with one of $q\times r$ takes $pqr$ scalar multiplications.

We’ll have to use the following theorem:

#### Optimal substructure theorem

If:

• The outermost parenthesization in an optimal solution is $(A_1 A_2 ... A_i)(A_{i+1}A_{i+2}...A_n)$
• $P_L$ and $P_R$ are optimal parenthesizations for $A_1 A_2 ... A_i$ and $A_{i+1}A_{i+2}...A_n$ respectively

Then $((P_L)\cdot (P_R))$ is an optimal parenthesization for $A_1 A_2 ... A_n$

See the slides for proof.

Essentially, to obtain an optimal solution, we need to parenthesize the two remaining expressions in an optimal way.

Hence, if we let $m[i, j]$ be the optimal value for chain multiplication of matrices $A_i, ..., A_j$ (meaning, how many multiplications we can do at best), we can express $m[i, j]$ recursively as follows:
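Trying every split point $k$:

$$
m[i, j] = \begin{cases}
0 & \text{if } i = j \\
\min\limits_{i \leq k < j} \left( m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \right) & \text{if } i < j
\end{cases}
$$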

That is the minimum of the left, the right, and the number of operations to combine them.

The runtime of this is $\Theta(n^3)$.

To know how to split up $A_i A_{i+1} ... A_j$ we look in s[i, j]. This split corresponds to m[i, j] operations. To print it, we can do:
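A sketch, assuming s is the split table computed by the DP (s[i][j] holds the best split point k for $A_i \dots A_j$):

```python
def print_optimal_parens(s, i, j):
    if i == j:
        print(f"A{i}", end="")
    else:
        print("(", end="")
        print_optimal_parens(s, i, s[i][j])        # left part A_i..A_k
        print_optimal_parens(s, s[i][j] + 1, j)    # right part A_{k+1}..A_j
        print(")", end="")
```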

### Longest common subsequence

• Input: 2 sequences, $X = (x_1, ..., x_m)$ and $Y = (y_1, ..., y_n)$
• Output: A subsequence common to both whose length is longest. A subsequence doesn’t have to be consecutive, but it has to be in order.

#### Theorem

Let $Z = (z_1, z_2, ..., z_k)$ be any LCS of $X_i$ and $Y_j$

1. If $x_i = y_j$ then $z_k = x_i = y_j$ and $Z_{k-1}$ is an LCS of $X_{i-1}$ and $Y_{j-1}$
2. If $x_i \neq y_j$ then $z_k \neq x_i$ and $Z$ is an LCS of $X_{i-1}$ and $Y_j$
3. If $x_i \neq y_j$ then $z_k \neq y_j$ and $Z$ is an LCS of $X_i$ and $Y_{j-1}$

Proof

1. If $z_k \neq x_i$ then we can just append $x_i = y_j$ to $Z$ to obtain a common subsequence of $X$ and $Y$ of length $k+1$, which would contradict the supposition that $Z$ is the longest common subsequence. Therefore, $z_k = x_i = y_j$

Now onto the second part: $Z_{k-1}$ is an LCS of $X_{i-1}$ and $Y_{j-1}$ of length $(k-1)$. Let’s prove this by contradiction; suppose that there exists a common subsequence $W$ with length greater than $k-1$. Then, appending $x_i = y_j$ to W produces a common subsequence of X and Y whose length is greater than $k$, which is a contradiction.

2. If there were a common subsequence $W$ of $X_{i-1}$ and $Y$ with length greater than $k$, then $W$ would also be a common subsequence of $X$ and $Y$ of length greater than $k$, which contradicts the supposition that $Z$ is the LCS.

3. This proof is symmetric to 2.

If $c[i, j]$ is the length of an LCS of $X_i$ and $Y_j$, then:
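$$
c[i, j] = \begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j \\
\max(c[i-1, j],\, c[i, j-1]) & \text{if } i, j > 0 \text{ and } x_i \neq y_j
\end{cases}
$$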

Using this recurrence, we can fill out a table of dimensions $i \times j$. The first row and the first column will obviously be filled with 0s. Then we traverse the table row by row to fill out the values according to the following rules.

1. If the characters at indices i and j are equal, then we take the diagonal value (above and left) and add 1.
2. If the characters at indices i and j are different, then we take the max of the value above and that to the left.

Along with the values in the table, we’ll also store where the information of each cell was taken from: left, above or diagonally (the max defaults to the value above if left and above are equal). This produces a table of numbers and arrows that we can follow from bottom right to top left:

The diagonal arrows in the path correspond to the characters in the LCS. In our case, the LCS has length 4 and it is ABBA.

The runtime of lcs-length is dominated by the two nested loops; runtime is $\Theta(m\cdot n)$.

The runtime of print-lcs is $\mathcal{O}(n)$, where $n=i+j$.

### Optimal binary search trees

Given a sequence $K=(k_1, k_2, \dots, k_n)$ of $n$ distinct keys, sorted so that $k_1 < k_2 < \dots < k_n$, we want to build a BST. Some keys are more popular than others: key $k_i$ has probability $p_i$ of being searched for. Our BST should have a minimum expected search cost.

The cost of searching for a key $k_i$ is $\text{depth}_T (k_i) + 1$ (we add one because the root is at height 0 but has a cost of 1). The expected search cost would then be:
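$$
\mathbb{E}[\text{search cost in } T] = \sum_{i=1}^{n} (\text{depth}_T(k_i) + 1) \cdot p_i = 1 + \sum_{i=1}^{n} \text{depth}_T(k_i) \cdot p_i
$$

(the last step uses the fact that the $p_i$ sum to 1).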

Optimal BSTs might not have the smallest height, and optimal BSTs might not have highest-probability key at root.

Let $e[i, j]$ denote the expected search cost of an optimal BST of $k_i \dots k_j$.

With the following inputs:

We can fill out the table as follows:

To compute something in this table, we should have already computed everything to the left, and everything below. We can start out by filling out the diagonal, and then filling out the diagonals to its right.

• e[i, j] records the expected search cost of optimal BST of $k_i, \dots, k_j$
• r[i, j] records the best root of optimal BST of $k_i, \dots, k_j$
• w[i, j] records $\sum_{\ell=i}^j{p_\ell}$

The runtime is $\Theta(n^3)$: there are $\Theta(n^2)$ cells to fill in, most of which take $\Theta(n)$ to fill in.

## Graphs

### Representation

One way to store a graph is in an adjacency list. Every vertex has a list of vertices to which it is connected. In this representation, every edge is stored twice for undirected graphs, and once for directed graphs.

• Space: $\Theta(V+E)$
• Time: to list all vertices adjacent to $u$: $\Theta(\text{degree}(u))$
• Time: to determine whether $(u, v)\in E: \mathcal{O}(\text{degree}(u))$

The other way to store this data is in an adjacency matrix. This is a matrix where:
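$$
a_{ij} = \begin{cases}
1 & \text{if } (i, j) \in E \\
0 & \text{otherwise}
\end{cases}
$$

with one row and one column per vertex.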

In an undirected graph, this makes for a symmetric matrix. The requirements of this implementation are:

• Space: $\Theta(V^2)$
• Time: to list all vertices adjacent to $u$: $\Theta(V)$
• Time: to determine whether $(u, v)\in E: \Theta(1)$.

We can extend both representations to include other attributes such as edge weights.

### Traversing / Searching a graph

#### Breadth-first search (BFS)

• Input: Graph $G=(V, E)$, either directed or undirected, and source vertex $s\in V$.
• Output: $v.d =$ distance (smallest number of edges) from s to v for all vertices v.

The idea is to:

• Send a wave out from $s$
• First hits all vertices 1 edge from it
• From there, it hits all vertices 2 edges from it…

This is $\mathcal{O}(V+E)$:

• $\mathcal{O}(V)$ because each vertex is enqueued at most once
• $\mathcal{O}(E)$ because every vertex is dequeued at most once and we examine the edge $(u, v)$ only when $u$ is dequeued. Therefore, every edge is examined at most once if directed, and at most twice if undirected.

BFS may not reach all vertices. We can save the shortest path tree by keeping track of the edge that discovered the vertex.
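A sketch of BFS, assuming the graph is given as a dict mapping each vertex to the list of its neighbors:

```python
from collections import deque

def bfs(adj, s):
    """Return the distance (in edges) from s to every reachable vertex."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # v has not been discovered yet
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```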

#### Depth-first search (DFS)

• Input: Graph $G = (V, E)$, either directed or undirected
• Output: 2 timestamps on each vertex: discovery time v.d and finishing time v.f

This runs in $\Theta(V+E)$.

### Classification of edges

In a DFS of an undirected graph we get only tree edges and back edges; no forward or cross edges.

### Parenthesis theorem

$\forall u, v$, exactly one of the following holds:

1. If v is a descendant of u: u.d < v.d < v.f < u.f
2. If u is a descendant of v: v.d < u.d < u.f < v.f
3. If neither u nor v are descendants of each other: [u.d, u.f] and [v.d, v.f] are disjoint (alternatively, we can say u.d < u.f < v.d < v.f or v.d < v.f < u.d < u.f)

### White-path theorem

Vertex v is a descendant of u if and only if at time u.d there is a path from u to v consisting of only white vertices (except for u, which has just been colored gray).

### Topological sort

• Input: A directed acyclic graph (DAG)
• Output: a linear ordering of vertices such that if $(u, v) \in E$, then u appears somewhere before v

Same running time as DFS, $\Theta(V+E)$.

Proof of correctness

We need to show that if $(u, v) \in E$ then v.f < u.f.
When we explore (u, v), what are the colors of u and v?

• u is gray
• Is v gray too?
• No, because then v would be an ancestor of u, which would imply a back edge, meaning the graph is not acyclic (see the lemma below)
• Is v white?
• Then it becomes a descendant of u, and by the parenthesis theorem, u.d < v.d < v.f < u.f
• Is v black?
• Then v is already finished. Since we are exploring (u, v), we have not yet finished u. Therefore, v.f < u.f

#### Lemma: When is a directed graph acyclic?

A directed graph G is acyclic if and only if a DFS of G yields no back edges.

### Strongly connected component

A strongly connected component (SCC) of a directed graph is a maximal set of vertices $C \subseteq V$ such that $\forall u, v\in C, u \leadsto v \text{ and } v\leadsto u$.

Below is a depiction of all SCCs on a graph:

If two SCCs overlap, then they are actually one SCC (and the two parts weren’t really SCCs because they weren’t maximal). Therefore, there cannot be overlap between SCCs.

#### Lemma

$G^{SCC}$ is a directed acyclic graph.

#### Magic Algorithm

The following algorithm computes SCC in a graph G:
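A sketch of the standard two-pass procedure (DFS on G to record finishing times, then DFS on $G^T$ in order of decreasing finishing time; each tree of the second DFS forest is one SCC). The graph is assumed to be a dict of adjacency lists with every vertex as a key:

```python
def strongly_connected_components(adj):
    order, seen = [], set()            # vertices by increasing finishing time

    def dfs(graph, u, out):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                dfs(graph, v, out)
        out.append(u)                  # u is finished

    for u in adj:                      # first pass: DFS on G
        if u not in seen:
            dfs(adj, u, order)

    transpose = {u: [] for u in adj}   # build G^T
    for u in adj:
        for v in adj[u]:
            transpose[v].append(u)

    seen.clear()
    sccs = []
    for u in reversed(order):          # second pass: decreasing finishing time
        if u not in seen:
            component = []
            dfs(transpose, u, component)
            sccs.append(component)     # one DFS tree = one SCC
    return sccs
```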

Where $G^T$ is G with all edges reversed. They have the same SCCs. Using adjacency lists, we can create $G^T$ in $\Theta(V+E)$ time.

## Flow Networks

A flow network is a directed graph with weighted edges: each edge has a capacity $c(u, v) \geq 0, (c(u, v) = 0 \text{ if } (u, v) \notin E)$

We talk about a source s and a sink t, where the flow goes from s to t.

We assume that there are no antiparallel edges, but this is without loss of generality; we make this assumption to simplify our descriptions of our algorithms. In fact, we can just represent antiparallel edges by adding a “temporary” node to split one of the anti-parallel edges in two.

Likewise, vertices can’t have capacities per se, but we can simulate this by replacing a vertex $u$ by two vertices $u_{\text{in}}$ and $u_{\text{out}}$, linked by an edge of capacity $c_u$.

Possible applications include:

• Fire escape plan: exits are sinks, rooms are sources
• Railway networks

We can model the flow as a function $f : V\times V \rightarrow \mathbb{R}$. This function satisfies:

Capacity constraint: We do not use more flow than our capacity:

Flow conservation: The flow into u must be equal to the flow out of u

The value of a flow is calculated by measuring at the source; it’s the flow out of the source minus the flow into the source:
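Spelled out, for source $s$ and sink $t$:

$$
0 \leq f(u, v) \leq c(u, v) \quad \forall u, v \in V \qquad \text{(capacity constraint)}
$$

$$
\sum_{v \in V} f(v, u) = \sum_{v \in V} f(u, v) \quad \forall u \in V \setminus \{s, t\} \qquad \text{(flow conservation)}
$$

$$
\| f \| = \sum_{v \in V} f(s, v) - \sum_{v \in V} f(v, s)
$$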

### Ford-Fulkerson Method

As long as there is a path from the source to the sink with available capacity on all of its edges, we send flow along it; then we look for another such path, and so on.

#### Residual network

The residual network consists of edges with capacities that represent how we can change the flow on the edges of the original network. The residual capacity is:
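$$
c_f(u, v) = \begin{cases}
c(u, v) - f(u, v) & \text{if } (u, v) \in E \\
f(v, u) & \text{if } (v, u) \in E \\
0 & \text{otherwise}
\end{cases}
$$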

We can now define the residual network $G_f = (V, E_f)$, where $E_f = \{ (u, v) \in V\times V : c_f(u, v) > 0 \}$.

See the diagram below for more detail:

### Cuts in flow networks

A cut of a flow network is a partition of V into two groups of vertices, S and T.

#### Net flow across a cut

The net flow across the cut (S, T) is the flow leaving S minus the flow entering S.

The net flow across a cut is always equal to the value of the flow, which is the flow out of the source minus the flow into the source. For any cut $(S, T), \| f \| = f(S, T)$.

The proof is done simply by induction using flow conservation.

#### Capacity of a cut

The capacity of a cut (S, T) is:
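Summing the capacities of the edges going from S to T:

$$
c(S, T) = \sum_{u \in S} \sum_{v \in T} c(u, v)
$$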

The flow is at most the capacity of a cut.

#### Minimum cut

A minimum cut of a network is a cut whose capacity is minimum over all cuts of the network.

The left side of the minimum cut is found by running Ford-Fulkerson and seeing what nodes can be reached from the source in the residual graph; the right side is the rest.

#### Max-flow min-cut theorem

The max-flow is equal to the capacity of the min-cut.

The proof is important and should be learned for the exam.

Proof of the max-flow min-cut theorem

We’ll prove equivalence between the following:

1. $f$ has a maximum flow
2. $G_f$ has no augmenting path
3. $\| f \| = c(S, T)$ for a minimum cut $(S, T)$

$(1) \Rightarrow (2)$: Suppose toward contradiction that $G_f$ has an augmenting path p. However, the Ford-Fulkerson method would augment f by p to obtain a flow of increased value, which contradicts that f is a maximum flow.

$(2) \Rightarrow (3)$: Let S be the set of nodes reachable from s in a residual network. Every edge flowing out of S in G must be at capacity, otherwise we can reach a node outside S in the residual network.

$(3) \Rightarrow (1)$: Recall that $\| f\| \leq c(S, T) \forall \text{cut } (S, T)$. Therefore, if the value of a flow is equal to the capacity of some cut, then it cannot be further improved.

### Time for finding max-flow (or min-cut)

• It takes $\mathcal{O}(E)$ time to find a path in the residual network (using, for example, BFS).
• Each time the flow value is increased by at least 1
• So running time is $\mathcal{O}(E\cdot \| f_{\text{max}} \|)$, where $\| f_{\text{max}} \|$ is the value of a max flow.

If capacities are irrational, then Ford-Fulkerson might not terminate. However, this will not happen if we take the shortest path or the fattest path, or if the capacities are integers (stated without proof).

| Augmenting path | Number of iterations |
|---|---|
| BFS / shortest path | $\leq\frac{1}{2}E\cdot V$ |
| Fattest path | $\leq E\cdot \log{(E\cdot U)}$ |

Where $U$ is the maximum flow value, and the fattest path is the path with largest minimum capacity (the bottleneck).

### Bipartite matching

Say we have N students applying for M jobs. Each gets several offers. Is there a way to match all students to jobs?

We add a source and a sink, direct all edges from left to right, and give every edge capacity 1. Running Ford-Fulkerson on this network then gives us a maximum matching.

#### Why does it work?

Every matching defines a flow of value equal to the number of edges in the matching: we put flow 1 on the edges of the matching, as well as on the corresponding edges from the source and to the sink. All other edges have 0 flow.

Works because flow conservation is equivalent to: no student is matched more than once, no job is matched more than once.

### Edmonds-Karp algorithm

It’s just like Ford-Fulkerson, but we pick the shortest augmenting path (in the sense of the minimal number of edges, found with BFS).

The runtime is $\mathcal{O}(VE^2)$

#### Lemma

Let $\delta_f(s, u)$ be the shortest path distance from s to u in $G_f, u\in V$.

$\forall u\in V$, $\delta_f(s, u)$ is monotonically non-decreasing throughout the execution of the algorithm.

Proof of runtime

Edmonds-Karp terminates in $\mathcal{O}(V\cdot E)$ iterations. An edge (u, v) is said to be critical if its capacity is smallest on the augmenting path.

Every edge in G becomes critical $\mathcal{O}(V)$ times (a critical edge is removed, but it can reappear later).

## Data structures for disjoint sets

• Also known as “union find”
• Maintain collection $\mathcal{S} = \{ S_1, \dots, S_k \}$ of disjoint dynamic (changing over time) sets
• Each set is identified by a representative, which is some member of the set. It doesn’t matter which member is the representative, as long as if we ask for the representative twice without modifying the set, we get the same answer both times

We want to support the following operations:

• make_set(x): Makes a new set $S_i=\{x\}$ and add it to $\mathcal{S}$
• union(x, y): If $x \in S_x, y \in S_y$, then $\mathcal{S} = \mathcal{S} - S_x - S_y \cup \{ S_x \cup S_y \}$
• Representative of new set is any member in $S_x \cup S_y$, often the representative of one of $S_x$ and $S_y$
• Destroys $S_x$ and $S_y$ (since sets must be disjoint)
• find(x): Return the representative of the set containing x.

### Application: Connected components

This algorithm runs in $\mathcal{O}(V\log{(V)}+E)$ in linked-lists with weighted union heuristic, and $\mathcal{O}(V+E)$ in forests with union-by-rank and path-compression.

### Implementation: Linked lists

This is not the fastest implementation, but it certainly is the easiest. Each set is a single linked list represented by a set object that has:

• A pointer to the head of the list (assumed to be the representative)
• A pointer to the tail of the list

Each object in the list has:

• Attributes for the set member (its value for instance)
• A pointer to the set object
• A pointer to the next object

Our operations are now:

• make_set(x): Create a singleton list in time $\Theta(1)$
• find(x): Follow the pointer back to the list object, and then follow the head pointer to the representative; time $\Theta(1)$.

Union can be implemented in a couple of ways. We can simply append y’s list onto the end of x’s list, updating x’s tail pointer accordingly.

Alternatively, we can use the weighted-union heuristic: always append the smaller list to the larger one. As a theorem, m operations on n elements then take $\mathcal{O}(m+n\log{n})$ time.

### Implementation: Forest of trees

• One tree per set, where the root is the representative.
• Each node only points to its parent (the root points to itself).

The operations are now:

• make_set(x): make a single-node tree
• find(x): follow pointers to the root
• union(x, y): make one root a child of another

To implement the union, we can make the root of the smaller tree a child of the root of the larger tree. In a good implementation, we use the rank instead of the size to compare the trees, since measuring the size takes time, and rank is an upper bound on the height of a node anyway. Therefore, a proper implementation would make the root with the smaller rank a child of the root with the larger rank; this is union-by-rank.

find_set can be implemented with path compression: every node met on the way up to the root is made a direct child of the root.

Below is one implementation of the operations on this data structure.
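A Python sketch with union-by-rank and path compression:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = self    # a fresh root points to itself
        self.rank = 0

def make_set(value):
    return Node(value)

def find_set(x):
    """Follow parent pointers to the root, compressing the path on the way."""
    if x.parent is not x:
        x.parent = find_set(x.parent)   # path compression
    return x.parent

def union(x, y):
    """Union by rank: hang the root of smaller rank under the other root."""
    rx, ry = find_set(x), find_set(y)
    if rx is ry:
        return
    if rx.rank < ry.rank:
        rx, ry = ry, rx
    ry.parent = rx
    if rx.rank == ry.rank:
        rx.rank += 1
```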

## Minimum Spanning Trees

A spanning tree of a graph is a set of edges that is:

1. Acyclic
2. Spanning (connects all vertices)

We want to find the minimum spanning tree of a graph, that is, a spanning tree of minimum total weights.

• Input: an undirected graph G with weight w(u, v) for each edge $(u, v)\in E$.
• Output: an MST: a spanning tree of minimum total weights

There are 2 natural greedy algorithms for this.

### Prim’s algorithm

Prim’s algorithm is a greedy algorithm. It works by greedily growing the tree T by adding a minimum weight crossing edge with respect to the cut induced by T at each step.

See the slides for a proof.

#### Implementation

How do we find the minimum crossing edges at every iteration?

We need to check all the outgoing edges of every node, so the running time could be as bad as $\mathcal{O}(V E)$. But there’s a more clever solution!

• For every node w keep value dist(w) that measures the “distance” of w from the current tree.
• When a new node u is added to the tree, check whether neighbors of u decreases their distance to tree; if so, decrease distance
• Maintain a min-priority queue for the nodes and their distances

When we start every node has the key infinity and our root has key 0, so we pick up the root. Now we need to find the edge with the minimal weight that crosses the cut.

• Initialize Q and first for loop: $\mathcal{O}(V\log{V})$
• decrease_key is $\mathcal{O}(\log{V})$
• The while loop:
• We run V times extract_min ($\mathcal{O}(V\log{V})$)
• We run E times decrease_key ($\mathcal{O}(E\log{V})$)

The total runtime is the max of the above, so $\mathcal{O}(E\log{V})$ (which can be made $\mathcal{O}(V\log{V})$ with careful queue implementation).

### Kruskal’s algorithm

Start from an empty forest T and greedily maintain forest T which will become an MST at the end. At each step, add the cheapest edge that does not create a cycle.

#### Implementation

In each iteration, we need to check whether the cheapest edge creates a cycle.

This is the same thing as checking whether its endpoints belong to the same component. Therefore, we can use a disjoint sets (union-find) data structure.
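A sketch reusing the union-find operations from the previous section (edges given as (weight, u, v) tuples, which is an assumption about the input format):

```python
def kruskal(vertices, edges):
    """Return the set A of MST edges for a connected weighted graph."""
    A = set()
    node = {v: make_set(v) for v in vertices}    # one singleton set per vertex
    for weight, u, v in sorted(edges):           # edges by increasing weight
        if find_set(node[u]) is not find_set(node[v]):
            A.add((u, v))                        # (u, v) creates no cycle
            union(node[u], node[v])
    return A
```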

• Initialize A: $\mathcal{O}(1)$
• First for loop: V times make_set
• Sort E: $\mathcal{O}(E\log{E})$
• Second for loop: $\mathcal{O}(E)$ times find_sets and unions

So this can run in $\mathcal{O}(E\log{V})$ (or, equivalently, $\mathcal{O}(E\log{E})$, since $E \leq V^2$ implies $\log{E} = \mathcal{O}(\log{V})$); the runtime is dominated by the sorting step.

### Summary

• Greedy is good (sometimes)
• Prim’s algorithm relies on priority queues for the implementation
• Kruskal’s algorithm uses union-find

## Shortest Path Problem

• Input: directed, weighted graph
• Output: a path of minimal weight (there may be multiple solutions)

The weight of a path $(v_0, v_1, \dots, v_k)$ is $\sum_{i=1}^k{w(v_{i-1}, v_i)}$.

### Problem variants

• Single-source: Find shortest paths from source vertex to every vertex
• Single-destination: Find shortest paths to given destination vertex. This can be solved by reversing edge directions in single-source.
• Single pair: Find shortest path from u to v. No algorithm known that is better in worst case than solving single-source.
• All pairs: Find shortest paths from u to v for all pairs u, v of vertices. This can be solved with single-pair for each vertex, but better algorithms are known

### Bellman-Ford algorithm

Bellman-Ford runs in $\Theta(E\cdot V)$. Unlike Dijkstra, it allows negative edge weights, as long as no negative-weight cycle (a cycle where the weights add up to a negative number) is reachable from the source. The algorithm is easy to implement in distributed settings (e.g. IP routing): each vertex repeatedly asks their neighbors for the best path.

• Input: Directed graph with weighted edges, source s, no negative cycles.
• Output: The shortest path from s to t

It starts by trivial initialization, in which the distance of every vertex is set to infinity, and their predecessor is non-existent:

Bellman-Ford updates shortest-path estimates iteratively by using relax:

This function reduces the distance of v if it is possible to reach it in a shorter path thanks to the (u, v) edge. It runs in $\mathcal{O}(1)$.

The entire algorithm is then:
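A Python sketch putting the initialization, relax and the main loop together (edges are assumed to be given as (u, v, w) triples):

```python
import math

def initialize_single_source(vertices, s):
    dist = {v: math.inf for v in vertices}   # every distance starts at infinity
    pred = {v: None for v in vertices}       # no predecessors yet
    dist[s] = 0
    return dist, pred

def relax(u, v, w, dist, pred):
    """Shorten dist[v] if going through the edge (u, v) is better."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        pred[v] = u

def bellman_ford(vertices, edges, s):
    dist, pred = initialize_single_source(vertices, s)
    for _ in range(len(vertices) - 1):       # n-1 iterations suffice
        for u, v, w in edges:
            relax(u, v, w, dist, pred)
    return dist, pred
```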

A property of Bellman-Ford:

• After one iteration, we have found all shortest paths that are one edge long
• After two iterations, we have found all shortest paths that are two edges long
• After three iterations, we have found all shortest paths that are three edges long

Let’s assume that the longest shortest path (in terms of number of edges) is from source s to vertex t. Let’s assume that there are n vertices in total; then the shortest path from s to t can at most be n-1 edges long, which is why we terminate at n-1 iterations in the above pseudocode.

Indeed, if there are no negative cycles, Bellman-Ford returns the correct answer after n-1 iterations. But we may in reality already be done before the n-1st iteration. If the longest shortest path (in terms of number of edges) is m edges long, then the mth iteration will produce the final result. The algorithm has no way of knowing the value of m, but what it could do is run an m+1st iteration and see if any distances change; if not, then it is done.

At any iteration, if Bellman-Ford doesn’t change any vertex distances, then it is done (it won’t change anything after this point anyway).

#### Detecting negative cycles

There is no negative cycle reachable from the source if and only if no distances change when we run one more (nth) iteration of Bellman-Ford.

### Dijkstra’s algorithm

• This algorithm only works when all weights are nonnegative.
• It’s greedy, and faster than Bellman-Ford.
• It’s very similar to Prim’s algorithm; could also be described as a weighted version of BFS.

We start with a source set $S = \{ s \}$, and greedily grow S. At each step, we add to S the vertex that is closest to S (where the distance of a vertex v outside S is defined as $u.d + w(u, v)$ for $u \in S$).

This creates the shortest-path tree: we can give the shortest path between the source and any vertex in the tree.

Since Dijkstra’s algorithm is greedy (it doesn’t have to consider all edges, only the ones in the immediate vicinity), it is more efficient. Using a binary heap, we can run in $\mathcal{O}(E\log{V})$ (though a more careful implementation can optimize it to $\mathcal{O}(V\log{V}+E)$).

## Probabilistic analysis and randomized algorithms

### Motivation

• Worst case does not usually happen
• Average case analysis
• Amortized analysis
• Randomization sometimes helps avoid worst-case and attacks by evil users
• Choosing the pivot in quick-sort at random
• Randomization necessary in cryptography
• Can we get randomness?
• How to extract randomness (extractors)
• Longer “random behaving” strings from small seed (pseudorandom generators)

### Probabilistic Analysis: The Hiring Problem

A basketball team wants to hire a new player. The taller the better. Each candidate that is taller than the current tallest is hired. How many players did we (temporarily) hire?

The worst-case scenario is that the candidates come in sorted order, from shortest to tallest; in this case we hire n players.

What is the expected number of hires we make over all the permutations of the candidates?

There are $n!$ permutations, each equally likely. The expectation is the sum of hires in each permutation divided by $n!$:

For 5 players, we have 120 terms, but for 10 we have 3 628 800… We need to find a better way of calculating this than pure brute force.

Given a sample space and an event A, we define the indicator random variable:

#### Lemma

For an event A, let $X_A = I\{A\}$. Then $\mathbb{E}[X_A] = Pr[A]$

Proof

#### Multiple Coin Flips

What about multiple coin flips? We want to determine the expected number of heads when we flip n coins. Let $X$ be a random variable for the number of heads in n flips. We could calculate:

… but that is cumbersome. Instead, we can use indicator variables.

For $i = 1, \dots, n$, define $X_i = I\{\text{the ith flip results in event H}\}$. Then:

By linearity of expectation (which holds even if X and Y are not independent), i.e. that $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$, we have:

For n coin flips, the above yields $\frac{n}{2}$.

#### Solving the Hiring Problem

Back to our hiring problem:

Therefore, the number of candidates we can expect to hire is:
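Candidate $i$ is hired exactly when they are the tallest of the first $i$ candidates, which in a uniformly random permutation happens with probability $\frac{1}{i}$, so:

$$
\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} \frac{1}{i} = H_n
$$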

This is the harmonic number, which is $H_n = \ln{n}+\mathcal{O}(1)$.

The probability of hiring one candidate is $\frac{1}{n}$ (this is the probability of the first being the tallest), while the probability of hiring all candidates is $\frac{1}{n!}$ (which is the probability of the exact permutation where all candidates come in decreasing sorted order).

#### Bonus trivia

Given a function RANDOM that returns 1 with probability $p$ and 0 with probability $1-p$.

### Birthday Lemma

If $q > 1.78 \sqrt{\|M\|}$, then the probability that a function $f: \{1, 2, \dots, q\} \rightarrow M$ chosen uniformly at random is injective is at most $\frac{1}{2}$.

Note that this is a weaker statement than finding the exact q and m for which f is 50% likely to be injective. The probability of such a function being injective is approximately $e^{-q(q-1)/(2m)}$, as in the proof below (plugging in $q=23, m=365$ shows that in a group of 23 people, there’s about a 50% chance of two of them sharing a birthday).

Proof

Let $m = \| M\|$. The probability that the function is injective is:

Since $e^{-x} > 1-x$ we have that this is less than:

Which itself is less than ½ if:

### Hash functions and tables

We want to design a computer system for a library, where we can:

• Insert a new book
• Delete book
• Search book
• All operations in (expected) constant time

There are multiple approaches to this; let’s start with a very naive one: a direct-address table, in which we save an array/table T with a position for each possible book (for every possible ISBN).

This is terribly inefficient; sure, the running time of each of the above operations is $\mathcal{O}(1)$, but it takes space $\mathcal{O}(U)$, where U is the size of the universe.

#### Hash tables

Instead, let’s use hash tables. Their running time is the same (constant, in average case), but their space requirement is $\mathcal{O}(K)$, the size of the key space (instead of the universe).

In hash tables an element with key k is stored in slot h(k). $h: U \rightarrow\{0, 1, \dots, m-1\}$ is called the hash function.

A good hash function is efficiently computable, should distribute keys uniformly (seemingly at random over our sample space, though the output should of course be deterministic, the same every time we run the function). This is the principle of simple uniform hashing:

The hashing function h hashes a new key equally likely to any of the m slots, independently of where any other key has hashed to

#### Collisions

When two items with keys $k_i$ and $k_j$ have $h(k_i) = h(k_j)$, we have a collision. How big a table do we need to avoid collisions with high probability? This is the same problem as in the Birthday Lemma: for h to be injective with good probability (meaning that the expected number of collisions is below 1), we need $m > K^2$.

This means that if a library has $K = 10 000$ books then it needs an array of size $m = K^2 = 10^8$ (at least).

Even then, we can’t avoid collisions, but we can still deal with them. We can place all elements that hash to the same slot into the same doubly linked list.

See linked lists for details on how to do operations on the lists.

#### Running times

• Insert: $\mathcal{O}(1)$
• Delete: $\mathcal{O}(1)$
• Search: Expected $\mathcal{O}(\frac{n}{m})$ (if good hash function)

Insertion and deletion are $\mathcal{O}(1)$, and the space requirement is $\mathcal{O}(m+K)$.

The worst case is that all n elements are hashed to the same slot, in which case search takes $\Theta(n)$, since we’re searching through a linked list. But this is exceedingly rare with a correct m and n (we cannot avoid collisions without having $m \gg n^2$).

Let the following be a theorem for running time of search: a search takes expected time $\Theta(1+\alpha)$, where $\alpha = \frac{n}{m}$ is the expected length of the list.

See the slides for extended proof!

If we choose the size of our hash table to be proportional to the number of elements stored, we have $m = \Theta(n)$. Insertion, deletion are then in constant time.

#### Examples of Hash Functions

• Division method: $h(k) = k \text{ mod } m$, where m is often selected to be a prime not too close to a power of 2
• Multiplicative method: $h(k) = \lfloor m\times\text{fractional part of }(Ak)\rfloor$. Knuth suggests choosing $A\approx \frac{\sqrt{5}-1}{2}$.

## Quick Sort

We recursively select an element at random, the pivot, and split the list into two sublists: the elements smaller than or equal to the pivot go to the left, the larger ones to the right. When we only have single elements left, we can combine them with the pivot in the middle.

This is a simple divide-and-conquer algorithm. The divide step looks like this:
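A sketch of a Lomuto-style partition matching the invariant listed below (pivot at A[r]; the randomized variant first swaps a uniformly random element into position r):

```python
import random

def partition(A, p, r):
    pivot = A[r]
    i = p - 1                        # end of the "<= pivot" region
    for j in range(p, r):
        if A[j] <= pivot:            # the comparison counted in the analysis
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]  # put the pivot between the two regions
    return i + 1

def randomized_partition(A, p, r):
    k = random.randint(p, r)         # pick the pivot uniformly at random
    A[k], A[r] = A[r], A[k]
    return partition(A, p, r)

def quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)
```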

See this visualization for a better idea of how this operates.

The running time is $\Theta(n)$ for a subarray of length n; it’s proportional to the number of comparisons we use (the number of times the comparison against the pivot is executed).

The loop invariant is:

1. All entries in A[p..i] are $\leq$ pivot
2. All entries in A[i+1 .. j-1] are $>$ pivot
3. A[r] is the pivot

The algorithm uses the above like so:

Quicksort runs in $\mathcal{O}(n \log{n})$ if the split is balanced (two equal halves), that is, if we’ve chosen a good pivot. In this case we have our favorite recurrence:

Which indeed is $\mathcal{O}(n \log{n})$.

But what if we choose a bad pivot every time? We would just have elements to the left of the pivot, and it would run in $\mathcal{O}(n^2)$. But this is very unlikely.

If we select the pivot at random, the algorithm will perform well on any input. Therefore, as a small modification to the partitioning algorithm, we should select a random number in the list instead of the last number.

Total running time is proportional to $\mathbb{E}[\text{# comparisons}]$. We can calculate this by using a random indicator variable:

Note: two numbers are only compared when one of them is the pivot; therefore, no numbers are compared to each other twice (we could also say that two numbers are compared at most once). The total number of comparisons performed by the algorithm is therefore:

The expectancy is thus:

Using linearity of expectation, this is equal to:

• If a pivot $x$ such that $z_i < x < z_j$ is chosen, then $z_i$ and $z_j$ will never be compared at any later time.
• If either $z_i$ or $z_j$ is chosen before any other element of $Z_{ij}$, then it will be compared to all other elements of $Z_{ij}$.
• The probability that $z_i$ is compared to $z_j$ is the probability that either $z_i$ or $z_j$ is the element first chosen
• There are $j-i+1$ elements and pivots chosen at random, independently. Thus the probability that any one of them is the first chosen one is $1 / (j-i+1)$

Therefore:
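$$
\Pr[z_i \text{ is compared with } z_j] = \frac{2}{j - i + 1}
$$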

To wrap it up:
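$$
\mathbb{E}[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}
< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} = \mathcal{O}(n \log{n})
$$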

Quicksort is $\mathcal{O}(n\log{n})$.

### Why always n log n?

Let’s look at all the sorting algorithms we’ve looked at or just mentioned during the course:

• Quick Sort
• Merge sort
• Heap sort
• Bubble sort
• Insertion sort

All algorithms we have seen so far have a running time based on the number of comparisons ($a \leq b$). Let’s try to analyze the number of comparisons to give an absolute lower bound on sorting algorithms; we’ll find out that it is impossible to do better than $\mathcal{O}(n\log{n})$.

We need $\Omega(n)$ time to even examine the inputs. If we look at the comparisons we need to make, we can represent them as a decision tree:

There are $n!$ leaves in the decision tree (one for each output permutation), and its height is therefore $\Omega(\log{n!})=\Omega(n\log{n})$. Any algorithm that takes all its information from comparisons (comparison sorting) corresponds to such a decision tree, and therefore cannot run faster than $\Omega(n\log{n})$. In that sense, merge sort, heapsort and quicksort are asymptotically optimal.

### Linear time sorting

Also known as non-comparison sort.

• Input: A[1..n], where $A[j] \in \{ 0, 1, \dots, k \}$ for $j = 1, 2, \dots, n$. Array A and values n and k are given as parameters
• Output: B[1..n] sorted
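A counting sort sketch matching this specification; its four loops are the ones counted just below:

```python
def counting_sort(A, k):
    """Stable sort of A, whose values lie in {0, 1, ..., k}, into a new array B."""
    C = [0] * (k + 1)
    for value in A:                  # count the occurrences of each value
        C[value] += 1
    for i in range(1, k + 1):        # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [None] * len(A)
    for value in reversed(A):        # place each element at its final position
        C[value] -= 1
        B[C[value]] = value
    return B
```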

The for-loops run in $\Theta(k), \Theta(n), \Theta(k), \Theta(n)$ respectively, so runtime is $\Theta(n+k)$.

How big a k is practical?

• 32-bit values? No
• 16-bit values? Probably not
• 8-bit values? Maybe depending on n (if it’s big then yes)
• 4-bit values? Probably, unless n is very small (if it’s small then no, comparison sorting is fine)

## Review of the course

### Growth of functions

1. The logs: $\log{N}, \log^2{N}, \dots$
2. The polynomials: $\sqrt{N}, 20N, N^2, \dots$
3. The exponentials: $\sqrt{4^N}=2^N, 3^N, ...$

### Sorting

• Insertion sort: Put the numbers in their correct order one at a time. $\Theta(n^2)$, worst case occurs when the input is in reverse sorted order
• Merge sort: A divide and conquer algorithm. The merge works by having two stacks of cards, adding a sentinel at the bottom, and then repeatedly just taking the smallest of the two.
• Time to divide: $\Theta(1)$
• Time to combine $\Theta(n), \text{ where } n=r-p$
• Number of subproblems and their size: 2 subproblems of size $n/2$.
• Recurrence: