A prerequisite for this course is CS-250 Algorithms.
When does the greedy algorithm work?
- Maximum weight spanning trees
- Matroid intersection
- Linear Programming
- Approximation algorithms
- Multiplicative Weights Algorithm
- Simplex method
- Randomized Algorithms
- Polynomial identity testing
- Sampling and Concentration Inequalities
- Streaming algorithms
- Locality sensitive hashing
- Submodular function minimization
- Submodular function maximization
- Online algorithms
- Spectral Graph Theory
When does the greedy algorithm work?
Maximum weight spanning trees
We’ll start with a problem called maximum weight spanning trees.
- Input: A graph with edge weights
- Output: A spanning tree of maximum weight
A spanning tree is a subgraph connecting all vertices of in the minimum possible number of edges. Being a tree, it is acyclic by definition.
1 2 3 4 5 6 7 8 9 10 11 12 def kruskal_greedy(G, w): """ Input: a connected undirected graph G=(V, E) edge weights w Output: a maximum weight spanning tree S """ S = set() sort edges in decreasing weight order for i = 1 to |E|: if S + E[i] is acyclic: S += E[i] return S
The running time of this algorithm is:
- Sorting: where .
- For-loop: to determine whether a graph is acyclic, we can use union-find, which is (where is almost a constant)
Thus, the total running time is .
Useful facts for the correctness proof
In the correctness proof below, we’ll need a few useful facts, which we won’t take the time to prove. We define .
- In a connected graph, the spanning tree has edges.
- Suppose we have a graph , where is an acyclic edge set of edges. Then the number of connected components is . Intuitively, we get to this result by observing the following:
- If , then there are no edges, and every node is a connected component, so there are connected components
- If , then there’s only one edge, then we have components (the added edge connected two components)
- If , then there are connected components
We will prove the following lemma by contradiction.
kruskal_greedy returns a maximum weight spanning tree.
We’ll suppose for the sake of contradiction that
kruskal_greedy doesn’t return a maximum weight spanning tree. This means that we suppose it picks edges 2, but that there exists a tree of higher weight.
Note that the edges are indexed in decreasing order. is of higher weight than , so there exists a first index such that . We can use this index to define two sets:
is the set of the first edges of . Because
kruskal_greedypicks edges in order of decreasing weight and because , all these weights are .
is the set of the first edges of . Because
kruskal_greedypicked these before , all these weights are .
We can notice that , which is useful because of the following key property:
Key property: As , such that is acyclic.3
Proof of the key property
This follows from the fact that an acyclic graph with edges has connected components, as we previously stated. Thus, for any acyclic edge sets with , the graph has fewer components than , so there must be an edge connecting two components in ; it follows that and is acyclic.
More succinctly, such that is acyclic.
This edge is , so . Since it has higher weight, must have been considered by the algorithm before .
When was considered, the algorithm checked whether could be added to the current set at the time, which we’ll call . In the above, we stated that is acyclic. Since and since a subset of acyclic edges is also acyclic4, is also acyclic.
Since adding the edge to doesn’t create a cycle, according to the algorithm, it must have been selected to be a part of . At this point, we have , but also (because of our initial assumption that ), which is a contradiction.
In the correctness proof above, we used two properties of acyclic graphs:
- A subset of acyclic edges is acyclic
- For two acyclic edge sets and such that , there is such that is acyclic
The generalization of these two properties will lead us to matroids.
A matroid is defined on a finite ground set of elements, and a family of subsets of , satisfying two properties:
- : If and then
- : If and and then
The property is called downward-closedness, or the hereditary property; it tells us that by losing elements of a feasible solution, we still retain a feasible solution.
The sets of are called independent sets: if , we say that is independent.
The property is the augmentation property. It implies that every maximal5 independent set is of maximum cardinality6. This means that all maximal independent sets have the same cardinality. A set of maximum cardinality is called a base of the matroid.
Since can be exponential in the size of the ground set , we assume that we have a membership oracle which we can use to efficiently check for a set .
1 2 3 4 5 6 7 8 9 10 11 12 def greedy(M, w): """ Input: a matroid M = (E, I), |E| weights w Output: a maximum weight base S """ S = set() sort elements of E in decreasing weight order for i = 1 to |E|: if S + E[i] in I: S += E[i] return S
We’d like to prove the following theorem:
For any ground set and a family of subsets ,
greedy finds a maximum weight base for any set of weights if and only if is a matroid.
The if direction () follows from the correctness proof we did for Kruskal’s algorithm.
For the only if direction (), we must prove the following claim, which we give in the contrapositive form7.
Suppose is not a matroid. Then there exists an assignment of weights such that
greedy does not return a maximum weight base.
To prove this, we’re going to cook up some weights for which
greedy doesn’t return a maximum weight base when isn’t a matroid (i.e. the tuple either violates or ).
First, consider the case where isn’t a matroid because it violates . Therefore, there exists two sets such that but . Let’s pick the following weights:
This weight assignment has been chosen so that the algorithm considers elements in the following order: first , then , then everything else. Since , the algorithm only selects (it’s a strict subset as ). Best case, the strict subset has size , so the solution’s weight is at most:
Second, consider the case where is violated (but isn’t). Let be two independent sets, such that , and , .
Let’s use the following weights, which again have been carefully chosen to have the algorithm consider elements in , then , then everything else. The value of the weights is also carefully chosen to get a nice closed form later.
Because of (and our assumption that ),
greedy will first select all elements in . But at this point, because we assumed to be violated, we have , . That means that we cannot pick any elements in to add to the solution, so the solution would have weight:
However, since and , we expect the optimal solution to have value .
In our initial example about maximum spanning trees, we were actually considering a matroid in which is the set of edges, and is defined as:
The k-Uniform matroid is a matroid in which is given by:
is satisfied because dropping elements still satisfies . is satisfied because and implies that .
In a partition matroid, the ground set is partitioned into disjoint subsets (we can think of them as representing different colors, for example). Each such subset has an integer associated to it, stating how many elements can be picked from each subset at most.
The linear matroid is defined for a matrix . The ground set is the index set of the columns of ; a subset gives us a matrix containing the columns indexed by the set . The matroid is then given by:
Matroids can be constructed from other matroids. For instance, the truncated matroid is constructed from the matroid , such that:
Laminar matroids are defined by a family of subsets of the ground set that satisfy the following. If , then either:
We can think of these subsets as being part of a tree, where each node is the union of its children. For each set , we define an integer . The matroid is then defined by:
Gammoid (“Netflix matroid”)
A gammoid describes a set of vertices that can be reached by vertex-disjoint paths in a directed graph. An example of this is Netflix, which may want to know how many people they can reach from their server nodes if each link has a certain max capacity.
Matroids form a rich set of problems that can be solved by the
greedy algorithm, but there are also many problems with efficient algorithms that aren’t matroids. This is the case for problems that aren’t matroids themselves, but can be defined as the intersection of two matroids.
The intersection of two matroids and is:
The intersection of two matroids satisfies , but generally not .
The following theorem adds a lot of power to the concept of matroids.
There is an efficient algorithm for finding a max-weight independent set in the intersection of two matroids.
Here, efficient means polynomial time, if we assume a polynomial time membership oracle. This is the case for all the matroid examples seen in class.
Definition of bipartite matching
For instance, we can consider the example of bipartite matching.
- Input: A bipartite graph , where and are two disjoint vertex sets
- Output: Matching of maximum weight (or maximum cardinality8).
A matching means that each vertex is incident to at most one edge, i.e. that:
For this problem, we can define with being the edge set, and defined as:
This satisfies downward closedness but not .
Bipartite matching as a matroid intersection
The bipartite matching problem can be defined as the intersection of two matroids:
We’ll use two partition matroids and imposing the same restrictions, but on and , respectively. The ground set for both matroids is . The partitioning of the ground set in is defined as follows, and similarly for :
Here, denotes the edges incident to a vertex . Note that since is bipartite, none of the vertices in are connected; this means that the above indeed creates a partitioning in disjoint sets. We let for every in , so we can define the family of independent sets for the first matroid is:
In other words, a set of edges is independent for (i.e. ) if it has at most one edge incident to every vertex of (no restrictions on how many edges can be incident to vertices on ). Defining similarly, we see why the matroid intersection corresponds to a matching in :
Colorful spanning trees
- Input: A graph , where every edge has one of colors, which we denote by , with
- Output: Whether the graph has a spanning tree in which all edges have a different color
We want to find a spanning tree, so we’ll use a graphic matroid for .
The color assignment gives us the partition of into disjoint sets , where each represents all the edges that have the same color . This gives us a hint for the second matroid: we need a partition matroid. We let be defined by:
- Input: Directed graph and a special root vertex that has no incoming edges
- Output: An arborescence of
An arborescence of a directed graph is a spanning tree (ignoring edge directions), directed away from . In an arborescence , every vertex in is reachable from .
More formally, is an arborescence if:
- is a spanning tree (ignoring edge direction)
- has in-degree 0 and all other nodes have in-degree 1 in the edge set (considering edge direction)
These two criteria give us a good hint of which matroids to choose. We let be a graphic matroid for (disregarding the edge direction), and be a partition matroid in which:
Here, denotes the set of arcs incoming to . In this partition matroid, the independent edge sets are those that have at most one arc incoming to every vertex .
Any arborescence is independent in both matroids, by definition. Conversely, a set independent in both matroids of cardinality (so that it is a base in both) is an arborescence: being a spanning tree, it has a unique path between and any vertex .
Maximum cardinality bipartite matching
Recall that a path is a collection of edges where all the ’s are distinct vertices. We can represent a path as .
A maximal path is a path to which we cannot add any edges to make it longer.
A maximum length path is the longest path. Note that it is also a maximal path, but that this is a stronger assertion.
An alternating path with respect to an edge set is a path that alternates between edges in and edges in .
An augmenting path with respect to an edge set is an alternating path in which the first and last vertices are unmatched, meaning that they are not incident to an edge in .
In a bipartite graph, the matching may define alternating paths in which we cannot revisit a node (by definition of a matching). Also note that an augmenting path is one that increases the matching; this is the core idea behind the following algorithm.
1 2 3 4 5 6 7 8 9 def augmenting_path_algorithm(G): """ Input: Bipartite graph G = (V, E) Output: Matching M of maximum cardinality """ M = set() while exists an augmenting path P wrt. M: M = M △ P return M
is the symmetric difference, which we can also denote:
An efficient algorithm to find an augmenting path is make edges in directed from to , and edges in directed from to . We can then run BFS from unmatched vertices in , looking for unmatched vertices in . This can be run in time for bipartite graphs (it is harder in general graphs). Seeing that there can be up to matchings in the graph and that each augmenting path increases the matching size by one, we have to run loops. Therefore, the total runtime of the algorithm is .
Here’s how the algorithm works. Suppose we already have a graph with a matching (in red):
The path corresponds to all the edges in the graph. The symmetric difference corresponds to a new matching of cardinality :
We now prove the correctness of this algorithm, which is to say that it indeed finds a maximum matching. The algorithm returns a set with respect to which there are no augmenting paths, so to prove correctness we must prove the following:
A matching is maximal if and only if there are no augmenting paths with respect to .
The proof is by contradiction.
First, let’s prove the direction. Suppose towards contradiction that is maximal, but that there exists an augmenting path with respect to . Then is a matching of greater cardinality than , which contradicts the optimality of .
Then, let’s prove the direction. We must prove that the lack of augmenting paths implies that is maximal. We’ll prove the contrapositive, so suppose is not maximal, i.e. that there is a maximal matching such that . Let ; intuitively, this edge set represents the edges that and disagree on.
From there on, we reason as follows:
- has more edges from than from (since , which implies that )
- In , every vertex has degree , with at most one edge from , and at most one edge from . Thus, every component in is either:
- a path (where middle nodes have degree two and the ends of the path have degree one), or
- a cycle (where all nodes have degree two)
- The cycles and paths that compose alternate between edges from and (we cannot have vertices incident to two edges of the same set, as and are matchings). This leads us to the following observations:
- In cycles, there is the same number of edges from and
- In paths, there number of edges from is than the number of edges from
- Let’s remove cycles from consideration, and concentrate on paths. We can do so and still retain the property of having more edges from than from . Since , there must be at least one path with strictly more edges from than from ; it must start and end with a edge, and alternate between the sets in between. This path is an augmenting path with respect to .
Therefore, there must exist an augmenting path with respect to . This proves the contrapositive, so the direction stating that no augmenting paths is maximal is proven.
In the above, we’ve seen an algorithm for unweighted bipartite matching. If we’d like to generalize it to weighted bipartite matching8, we’ll have to introduce a powerful algorithmic tool: linear programming.
Definition of Linear Programming
A linear program (LP) is the problem of finding values for variables that minimize (or equivalently, maximize) a given linear objective function, subject to linear constraints:
Strictly speaking, it isn’t necessary to allow equality and upper-bound constraints:
- We can get an equality constraint with a lower-bound and an upper-bound.
- We can transform an upper-bound into a lower-bound by multiplying all factors and the bound by .
Still, we allow them for simpler notation for now. Strict inequality, however, is disallowed (otherwise, we wouldn’t have a closed set, and there would never be an optimal solution, as we could always get closer to the bound by an infinitesimal amount).
A feasible solution is an extreme point if it cannot be written as a convex combination of other feasible solutions.
A convex combination of points is a point of the form where the satisfy .
To gain some intuition about what an extreme point is, we can take a look at the diagram below:
In the above, is not an extreme point because . If we can walk an amount in opposing directions, then it is not an extreme point.
Extreme points are important because they have useful structural properties that we can exploit to design algorithms and construct proofs. We can state the following theorem about extreme points:
If the feasible region is bounded, then there always exists an optimum which is an extreme point.
The proof is as follows. As the feasible region is bounded, there is an optimal solution . If happens to be an extreme point, we are done. The real work in this proof is for the case where isn’t an extreme point. To prove this, we’ll have to introduce a small lemma:
Any feasible point can be written as a convex combination of the extreme points.
This is essentially proven by the following diagram:
Indeed, if we draw a line from a feasible point in a bounded domain, we’ll hit the bounds in two locations and , which are convex combination of the extreme points , and , , respectively. is itself a convex combination of and , and thus of the extreme points , , and .
With this lemma in place, we can write the feasible solution as a convex combination of extreme points:
Where the are extreme points, and satisfy .
Let be the vector defining the objective that we wish to maximize (the proof is the same if we minimize). Then we have:
This proves . In other words, the value of the objective function is a weighted average of the extreme points.
We conclude the proof with the “tallest person in the class” argument. If we measured the height of all the people in a class, and got the average value of 170cm, we could say that at least one person has height 170cm. For the same reason, we can conclude from the above:
This extreme point is gives us a higher value for the objective function than . Since was chosen to be any feasible point, this means that is an optimal solution.
Maximum weight bipartite perfect matching
The problem corresponds to the following:
- Input: A bipartite graph where and , and edge weights
- Output: A perfect matching maximizing
A matching is perfect if every vertex is incident to exactly one edge in the matching (i.e. every vertex is matched). We need for this to work.
This problem can be formulated as the following linear program:
The constraints say that every vertex is chosen exactly once. The intended meaning for variables is:
But in linear programming, we cannot take on discrete values; instead, we can only have , which gives us the constraint . The upper bound of 1 is implied by the other constraints, so for the sake of conciseness, we only give the minimal set of constraints above.
Still, we will see that relaxing to be in still gives us an optimal solution for which .
For bipartite graphs, any extreme point solution to the LP is integral.
Integral means that it is an integer; coupled with the constraint that , being integral implies .
We’ll prove this by contradiction. Let be an extreme point for the graph , and let be the set of edges for which optimal extreme point solution is not integral. We suppose toward contradiction that the solution contains such edges, i.e. that .
We have the constraint that for any given vertex or , . This means that either:
- All the incident edges to the vertex have integral weights , and the vertex therefore has no incident edges in .
- At least one incident edge to the vertex has non-integral weight . But because of the above constraint, it takes at least two non-integral weights and to sum up to 1, so having one incident edge in implies having at least two incident edges in .
This implies that must contain a cycle. By construction, because we only have a finite number of vertices in and because they cannot have degree 1, the path must close back to the original vertex.
Further, cycles in a bipartite graph must have even length. This follows from the fact that to get back to a vertex in , we must go to and back to times, for a total of edges.
All these edges in the cycle are fractional (being in ). Since this is a proof by contradiction, we’ll try to find two feasible points and such that , which would be a contradiction of the assumption that is an extreme point.
Let be the edges of the cycle. Let and be defined for each edge as follows:
The degree constraints of and are satisfied, because they alternate between adding and subtracting in a cycle of even length, and we assumed that satisfies them.
To ensure feasibility, we must choose a small to guarantee that all . We can choose the following value to do so:
Note that this indeed satisfies , so we have a contradiction of the assumption that is an extreme point.
Bipartite perfect matching polytope
The polytope9 corresponding to the bipartite perfect matching LP constraints is called the bipartite perfect matching polytope.
We can solve the maximum weight bipartite matching problem by solving the above linear program.
Unfortunately, this is only for bipartite graphs; for general graphs, we have a problem with odd cycles. We could solve this by imposing an addition constraint on odd cycles:
Unfortunately, this adds exponentially many constraints. A proof 2 or 3 years ago established that there is no way around this.
Vertex cover for bipartite graphs
The vertex cover problem is defined as follows:
- Input: A graph with node weights
- Output: A vertex cover that minimizes
A vertex cover is a vertex set that ensures that every edge has at least one endpoint in . In other words, is a vertex cover if , we have or .
Just like maximum weight bipartite perfect matching, the vertex cover problem can be formulated as LP constraints:
The constraints tell us that at least one of the endpoints should be in the vertex cover, as the intended meaning for is:
For bipartite graphs, any extreme point to the vertex cover LP is integral.
We’ll prove this by contradiction. Let be an extreme point, and be the vertices with fractional values in . Suppose toward contradiction that .
Since is bipartite, the vertices are split into disjoint subsets and . In the same way, we can divide into disjoint subsets and , which are the fractional vertices in and , respectively.
As previously, we’ll define and such that , by setting , and letting:
Let’s verify that these are feasible solutions. We’ll just verify , but the proof for follows the same reasoning.
By the selection of , we have , , which satisfies the second constraint.
To verify the first constraint, we must verify that for every edge .
- If (or ), then (or ), so (or ), so the constraint holds.
- If , then and , which also respects the constraint, because:
The last inequality holds because is a feasible solution and thus satisfies the first LP constraint.
These and are feasible solutions, and therefore show a contradiction in our initial assumption that is an extreme point; the claim is therefore verified.
For general graphs, we have the same problem with odd cycles as for matchings. But the situation is actually even worse in this case, as the problem is NP-hard for general graphs, and we do not expect to have efficient algorithms for it. Still, we’ll see an approximation algorithm later on.
In the above, we have seen two integrality proofs. These both have the same general approach: construct and adding or subtracting from such that , and , and such that and are feasible solutions. To prove feasibility, we much choose a construction where all the cancel out. If that isn’t possible, then we must the variables for which the don’t cancel out are “on their own” (i.e. not with any other variables) in the constraint.
Consider the following linear program:
We’re looking for an optimal solution OPT. To find this value, we can ask about the upper bound and about the lower bound.
- Is there a solution of cost ? Is OPT ?
- Is there no better solution? Is OPT ?
- Is there a matching of value at least ? In other words, is ?
- Do all matchings have value at most ? In other words, is ?
Answering the first kind of question is easy enough: we can just find a feasible solution to the LP with an objective function that fits the bill (i.e. is ).
To answer the second kind of question though, we have to be a little smarter. The intuition is that a linear combination of primal constraints cannot exceed the primal objective function. This leads us to define the following dual LP with variables and :
Let’s formalize this approach for the general case.
Consider the following linear program with variables, for and constraints:
The matrix of factors is an matrix (with ranging over columns, and over rows). This means that the factors in the primal are in the same layout as in ; we’ll see that they’re transposed in the dual.
Indeed, the dual program has variables for and constraints:
Note that the dual of the dual is the primal, so we can convert problems both ways.
Weak duality tells us that every dual-feasible solution is a lower bound of primal-feasible solutions.
If is a feasible solution to the primal problem and is feasible to the dual, then:
We can prove this by rewriting the right-hand side:
The first inequality uses the constraints of the primal, and the second uses the constraints of the dual. We also use the fact that throughout.
This leads us to the conclusion that the optimal solution of the primal is lower bounded by the optimal solution to the dual. In fact, we can make a stronger assertion than that, which is what we do in strong duality.
Strong duality tells us that the dual solutions aren’t just a lower bound, but that the optimal solutions to the dual and the primal are equal.
If is an optimal primal solution and is an optimal dual solution, then:
Furthermore, if the primal problem is unbounded, then the dual problem is infeasible, and vice versa: if the dual is unbounded, the primal is infeasible.
Example: Maximum Cardinality Matching and Vertex Cover on bipartite graphs
These two problems are the dual of each other. Remember that max cardinality matching is the following problem:
- Input: a bipartite graph
- Output: a matching
We define a variable for each edge , with the intended meaning that represents the number of incident edges to in the matching . This leads us to defining the following LP:
The dual program is:
This dual problem corresponds to the relaxation of vertex cover, which returns a vertex set . By weak duality, we have , for any matching and vertex cover . Since both the primal and the dual are integral for bipartite graphs, we have strong duality, which implies Kőnig’s theorem.
Kőnig’s theorem describes an equivalence between the maximum matching and the vertex cover problem in bipartite graphs. Another Hungarian mathematician, Jenő Egerváry, discovered it independently the same year11.
Let be a maximum cardinality matching and be a minimum vertex cover of a bipartite graph. Then:
As a consequence of strong duality, we have a strong relationship between primal and dual optimal solutions:
Let be a feasible solution to the primal, and let be a feasible solution to the dual. Then:
Note that we could equivalently write instead of because we assume to have the constraint that (similarly for ).
The “space” between the value of a constraint and its bound is what we call “slackness”. In this case, since the right-hand side of the iff says that the variables being positive implies that the constraints are met exactly (and aren’t just bounds), we have no slackness.
We can prove this complementarity slackness by applying strong duality to weak duality.
First, let’s prove the direction. Let be the optimal primal solution. From the proof of weak duality, we have:
Since we’re assuming that are optimal solutions, we also have strong duality, which tells us that:
With this in mind, all the inequalities above become equalities:
From this, we can quite trivially arrive to the following conclusion:
This means that as long as , we have:
Now, let’s prove the direction. We know that:
Therefore, we can do just as in the proof of weak duality:
This is equivalent to being optimal solutions, by weak duality.
Duality of Min-Cost Perfect Bipartite Matching
- Input: , a bipartite weighted graph with edge weights
- Output: Perfect matching of maximum cost
We assume that is a complete bipartite graph, meaning that all possible edges exist. This is equivalent to not having a complete graph, as we can just consider missing edges in an incomplete graph as having infinite weight in a complete graph.
In the LP for this problem, we let be a variable for every edge , taking value 1 if is picked in the matching, and 0 if not. The problem is then:
As we saw previously, any extreme point of this problem is integral, so we can solve the min-cost perfect matching by solving the above LP. But that is not the most efficient approach. Using duality, we can reduce the problem to that of finding a perfect matching in an unweighted graph, which we saw how to solve using augmenting paths.
To obtain the dual, we’ll first write the above LP in a form using inequalities, which gives us the following primal:
This notation is a little tedious, so we’ll introduce a variable for each constraint. For each , we introduce for the first constraint and for the second; similarly for each , we introduce and for the third and fourth constraint respectively. These will play the same role as the dual variables, in that every constraint in the primal has a variable in the dual. The dual is thus:
By complementarity slackness, and are feasible if and only if the following holds:
We also get a series of less interesting implications, which are trivially true as their right-hand side is always true (because the original constraints already specified equality):
For the following discussion, we’ll define:
Note that these are not constrained to be . If we need to go back from notation to notation, we can define and when , and and if not (equivalently for ).
We can simplify the notation of our dual LP to the equivalent notation:
With this simplified notation, complementarity slackness gives us that and are feasible if and only if the following holds:
In other words, we can summarize this as the following lemma:
A perfect matching is of minimum cost if and only if there is a feasible dual solution such that:
In other words, if we can find a vertex weight assignment such that every edge has the same weight as the sum of its vertex weights, we’ve found a min-cost perfect matching. This is an interesting insight that will lead us to the Hungarian algorithm.
The Hungarian algorithm finds a min-cost perfect matching. It works with the dual problem to try to construct a feasible primal solution.
Consider the following bipartite graph:
The thin black edges have cost 1, and the thick red edge has cost 2. The Hungarian algorithm uses the lemma from above to always keep a dual solution