Lessons from Prim's Minimum Spanning Tree

28 Apr 2017

I am working on implementing Prim’s Minimum Spanning Tree algorithm. Initially, I had an implementation and the specs were passing.

On closer look, and more test cases, my implementation wasn’t as robust and I learned important lessons from this exercise.

Process > Result
Code reuse
Graph(viz) sooner
Testing all nodes
Generating test cases
Working with real input sooner

What’s a Minimum Spanning Tree?

A spanning tree is a tree that reaches every point in a system. It’s describes the connection to everything in a system from a point.

A minimum spanning tree is a spanning tree with the lowest cost.

Minimum Spanning Trees are useful in situations where there is a ‘connection’. If something is ‘connected’ to something else, consider minimum spanning tree.

Prim’s Minimum Spanning Tree

One thing that is special about Prim’s Algorithm compared to other minimum spanning tree algorithms is that Prim’s Algorithm is greedy.

Prim’s algorithm build out the a spanning tree, adding one node a time with the smallest edge weight.

Start from any node
Find smallest weight edge at node
Combine edge & node
Repeat

This is an animation of Prim’s Algorithm working:

prim mst animation

Process > Result

One of the most important thing I keep learning with graph algorithms: the process is more important than the result.

The final output of Prim’s algorithm is a value: the sum of weights that connect all the nodes.

To get to this single value, the algorithm must touch every node and edge of the graph, evaluate the weight, and move on to the next node that isn’t attached to the new collapsed graph.

Although the final output of Prim’s Algorithm is a value, most of the work done is collapsing the graph.

Reusing Karger’s Minimum Cut

Starting off, since Prim’s Algoithm follows a similar idea of cutting a graph, I looked at my previous implementation of Karger’s Minimum Cut algorithm.

The main function re-used was collapse graph, which creates a new larger node by collapsing together the added nodes.

One big difference I encountered was the data structure used for my implementation of Karger’s Minimum Cut I had did not fit weights in well, which would represent the above graph as:

{
  :a => [:b, :d],
  :b => [:c, :d, :e],
  :c => [:e],
  :d => [:e]
}

I tried to shoehorn the weights in the same structure by making each node itself a hash in the array:

{
  :a => [{ :b => 2 }, { :d => 6 }],
  :b => [{ :c => 3 }, { :d => 8 }, { :e => 5 }],
  :c => [{ :e => 7 }],
  :d => [{ :e => 9 }]
}

This became a little too complex for each graph collapse (and also finding the weights of connected edges!)

At the end, I took a simpler approach to representing connections and weights:

{
  :a => { :b => 2, :d => 6 },
  :b => { :c => 3, :d => 8, :e => 5 },
  :c => { :e => 7 },
  :d => { :e => 9 }
}

As I implemented Prim’s Algorithm, I am constantly reminded of Rule 5 of Rob Pike’s 5 Rules of Programming

Rule 5. Data dominates. If you've chosen the right data structures
and organized things well, the algorithms will almost always be
self-evident. Data structures, not algorithms, are central to
programming.

Prim’s algorithm is simple, but the data structure requirements are far more complex than just a hash. At this point, I am considering building a better data structure to store the graph.

Graph(viz) sooner!

To aid in developing the algorithm, I hooked up Graphviz to visualize how my implementation progressed.

It’s pretty rough debugging a graph algorithm in a hash!

Honestly, I hooked up Graphviz after I had Prim’s Algorithm working properly, but looking back, I really wished I had connected Graphviz in sooner.

Being able to visualize what my code was doing to the graph would have helped debug things sooner.

Test from ALL nodes

In the course of development, I only tested my implementation for the happy path of the test case, in this situation, this was first node.

As I progressed and used larger graphs, I continued the same pattern of testing from the first node only. Most other algorithms usually have one answer for each input.

At the end, I realized ALL nodes need to produce the same final result. In this case, a graph of n-nodes will have one answer, but it can tested n-times.

When revisiting previous test graphs, when I tested against any node, there were failures from some nodes but not others. This made debugging much harder on larger graphs than if I debugged from any node on smaller graphs earlier.

Find lots of test cases

When making my specs, I generate my own test cases to cover all of the possible scenarios. Working through my own test cases is effective, I want test cases that I did not generate.

I use the Coursera forums for a source of verified test cases to help develop my implementation. It’s the next best thing to the course material!

In the forum, I saw a post that linked an automatic test case generator, even for graphs! Combined with Graphviz, this was the source of endless test graphs…

Coursera recently changed their policy on courses and their forums. I lost access to the generator and had to search for it… and it took a LOT of searching, but I eventually found it again:

SPOJ Toolkit

Once I found the test generator again, I was able to use this to generate a lot of test cases for me. Combined with Graphviz, I was able to easily determine how my implementation was working.

Real input sooner

In developing my implementation, I fell into the trap of using perfect input to my functions, sanitized graphs.

When I started to incorporate input that was from the test case generator, which could just be large lists, I ran into bugs. It was a bit trickier to see what was going on when the graph wasn’t setup properly internally.

Reworking my algorithm to treat the adjacency list as if it was an undirected graph. (I did try to convert a directed to undirected graph on the fly, but having an internal representation that did not reflect the input caused more trouble than it was worth.)

Conclusion

I learned these things from Prim’s Algorithm: working with real input, sourcing test cases, testing algorithm rigorously sooner, Graphviz as a tool, and the process is greater than the result

Now with these lessons in my mind, I am looking forward to the next graph algorithm to implement!