stb_connected_components.h (stbcc) finds the connected components on a large 2D grid, and tracks those connected components as the grid is updated. This is trickier than you might think.

stbcc has many possible uses, but it was inspired in particular by a comment by the programmer of Dwarf Fortress that they kept a separate data structure that could efficiently answer the question of whether any given point Q on the map was reachable from some other point P, and they used this to prevent degenerate slowness in pathfinding from P to Q (which, if Q is unreachable, typically will explore the entirety of the map that is reachable from P).

The traditional algorithm for finding connected components in a graph is to just use depth-first-search; the equivalent on grids is a "flood-fill" algorithm. These techniques work fine, and are O(N) with N nodes (e.g. in a 1024x1024 grid, they take ~1M units of time). However, they offer no good approach for updating.

The classic update-friendly approach for this kind of problem is to use Disjoint Set Forests. The algorithm will work on arbitrary graphs, not just grids. It allows you to build groups of 'connected' objects (sets) efficiently; the initial build is essentially O(N) time (see the Wikipedia page for more details about its speed). This technique also allows you to dynamically add connections to the existing data structure in essentially O(1) (amortized) time. The implementation of disjoint set forests is straightforward, using only a dozen or so lines of code, as in this reference solution I coded at the beginning of the stbcc stream (see previous post):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | point leader[1024][1024]; point dsf_find(int x, int y) { point p,q; p = leader[y][x]; if (p.x == x && p.y == y) return p; q = dsf_find(p.x, p.y); leader[y][x] = q; // path compression return q; } void dsf_union(int x1, int y1, int x2, int y2) { point p = dsf_find(x1,y1); point q = dsf_find(x2,y2); if (p.x == q.x && p.y == q.y) return; leader[p.y][p.x] = q; } |

Unfortunately, there is no simple update-friendly data structure to allow

Think of a connected-components data structure as storing a label for every node on the graph, indicating which component that node belongs to. (To check if point P can reach point Q, we check if they have the same label.)

If you imagine a map split into two large components, when you connect them, you naively might have to update half of the nodes on the map, which will obviously not be O(1) time. However, if the nodes don't directly store a label, but store a pointer to a label, then we might be able to set up a situation where all the nodes in one component point to one label, and all the nodes in the other component point to another label, and when we join the two components, we only have to update one

But when we disconnect a large component on the map into two components, we might make that disconnection in many places; there's no way to guarantee that the existing pointers to labels point to two different locations in a way that exactly matches the disconnection that we have. Thus there is no straightforward solution to an update that creates a disconnection.

AFAIK most people simply rebuild the entire connected component data structure (i.e. periodically recompute the flood fill).

So the idea of stb_connected_components.h is to compromise; we introduce a moderate number of intermediate, indirected labels, so that we can rewrite as few labels as possible and minimize the number of nodes which we need to point to different labels entirely. We do this by relying on the graph being a grid and using the geometry of the grid to guide us.

There's a classic trick for making certain kinds of O(N^1.5) algorithms when you have an O(N^2) algorithm that it relies on: partition the input data into sqrt(N) pieces each of size sqrt(N), run the O(N^2) algorihm on the pieces; each one will take O(sqrt(N)^2), or O(N) time, so running on all the pieces will take sqrt(N)*O(N) time, i.e. O(N^1.5) time.

We do something similar for stbcc. We're going to chunk up our M*M grid into smaller pieces. Suppose that size is K. Then we have K*K pieces, each one (M/K)*(M/K) nodes. Suppose we run an O(N) algorithm within one (M/K)^2 piece and another O(N) algorithm on the K^2 coarse pieces; then we have O(K*K) + O((M/K)*(M/K)), or O(K*K+M*M/(K*K). If K is 1, this becomes O(M^2), if K is M, this becomes O(M^2), but if K is sqrt(M), this becomes O(M). It's minimized when K is sqrt(M).

Why a two-level hierarchy? Decades of experience has shown that you almost never want quadtree-esque fully-recursive hierarchies for speeding things up; you almost always want two-level hierarchies. (Maybe, maybe, if it's really big, you want three-level hierarchies. If you wanted to do a 65536x65536 map, maybe that would be better.)

So, as described in the previous section, stbcc splits an M*M grid into subgrids that are sqrt(M)*sqrt(M). For each subgrid, we cache the locally-connected components; each within-subgrid connected component has a single slot storing a label and all the nodes in that component point to the same slot; thus when connectivity changes we only have to update that one slot to update all of the nodes.

There are sqrt(M)*sqrt(M) total subgrids; which means we have (on typical maps) a small multiple of that many locally-connected components; in other words we usually have O(M) pieces--but that's from an M*M map, so if N=M*M, we're talking about O(N^0.5) components. If we also keep track of how locally-connected components connect

On a single update, we do three things:

- Rebuild the locally-connected components for the subgrid containing the updated node
- Rebuild the adjacency information from that subgrid to its neighbors, and vice versa
- Compute from scratch the globally-connected components from the locally-connected pieces

For an M*M map, step 1 takes essentially O(M), step 2 takes O(sqrt(M)) time [pretending array deletion is O(1)], and step 3 takes O(M) time for game-like maps but has a much worse case discussed below. (All these 'essentiallies' are based on pretending that DSF takes O(N) time for N operations, ignoring that (a) the actual cost is technically higher, and (b) the nearly-linear behavior is only if you implement both path compression and union by rank, and we only implement the former to reduce memory/cache pressure.)

Adjacency lists can be maintained as doubly-linked lists so that deletion time for them is actually O(1), but this requires a lot of extra memory, and in the common case where the arrays are small, searching the list to delete an item isn't a big deal.

The actual implementation is:

- Run a distjoint-set-forest on a subgrid
- Delete old adjacency information for the old subgrid (and use that to delete adjancey info pointing to this subgrid from its neighbors); then check every node on the subgrid edges to see if it connects across the edge to a neighbor component, and if so add a connection to the adjacency lists for both
- Run a disjoint-set-forest over all the locally-collected-components in the entire map using the adjacency info to define the connections (this is the slowest step, and has a bad worst case)

Just writing down the above suggested a few possible improvements:

The above describes the initial 0.90 release of stb_connected_components. The first update reduced memory usage.

stbcc allocates the worst-case memory usage needed so it never does any dynamic allocations. Because adjacency lists are variable sized, the initial version allocated a max-sized adjacency list for every locally-connected-component. This required foolish amounts of memory (~80MB for a 1K*1K map).

The max-sized adjacency list occurs when every connection out of a subgrid come from the same locally-connected component. If this happens, no other locally-connected component in that subgrid can have

The problem is that with a fixed-allocation scheme, we need to allocate variably-sized lists, and they might need to be resized. This could require writing a little "memory allocator" out of each subgrid's adjacency "pool", and then after multiple updates, these could become fragmented and need defragmentation. While this is possible, we use a much simpler strategy: when we create the initial lists for a subgrid, we divide up the list, and apportion any leftover space equally between all the components. If we try to add to a component's adjacency list and there is no room, we discard all the information for that subgrid and rebuild. (We're already always rebuilding 1 subgrid on every update; now we update at most 5, and this is rare in practice.)

This reduces worst-case memory to ~8MB for a 1K*1K map (equivalent to about 8 bytes per node, which I consider tolerable).

Although I don't believe the cases considered in this section are interesting for game maps, stb_connected_components doesn't claim to be only for game maps, so when I tested and found some degeneracies I decided it was worth putting the effort into speeding them up. This didn't noticeably speed up the game map case, but it also didn't hurt.

A worst case for the algorithm as described above is a checkerboard; each white square is a degenerate single-node connected component. In this case, an M*M grid has M*M/2 locally-connected components, and the disjoint-set-forest algorithm run on those components will take O(M*M) time (even though they're all disconnected). This is not theoretical; making such a map did make this step very slow.

To improve performance, we could simply eliminate degenerate 1-node connected-components from the global step, but we can do better, handling larger components as well. We could classify each locally-connected component by whether it has an empty adjacency list or not. However, that could change when an update happens immediately next to the upgrid, creating a new adjacent connection even if the subgrid itself doesn't change.

Instead we classify each locally-connected component with whether it touches an edge of the subgrid or not. Those that touch it need adjacency lists and need to participate in the globally-connected components step. Those that do not do not need lists and do not participate. (This also helps the previous section; splitting up the adjacency-list storage, only components that are touch edges are allocated extra adjacency-list storage.) Remember, if the subgrid ever changes, it will be discarded and fully rebuilt, so if a locally-connected component doesn't touch the edge now, it never will for the lifetime of this subgrid.

By separating the list of locally-connected components in each subgrid into two lists (those that are on-edge and those that aren't), we avoid traversing the latter entirely, bringing performance down from O(N) to O(N^0.75) (i.e. from O(M^2) to O(M^1.5). This is still a bit slow; in practice with M=1024 I saw times from 0.25ms for the O(M)-ish normal-map case to 2.5ms for an O(M^1.5)-ish worst case, all compared to say 50ms for the O(M^2) full-map disjoint set forest.

The checkerboard case is actually not the worst case for performance, because it actually has

To do significantly better we'd have to formalize the problem differently; as it stands, an M*M grid has M subgrids with worst case O(sqrt(M)) adjacencies, and we have to call dsf_union() for every adjacency, so the O(M^1.5) worst case is unavoidable with this way of factoring the problem. Of course, speeding up this degenerate worst case is also not necessarily the priority; however, measurements have shown that the global disjoint-set-forest step is also the slowest step in the update operation, even for 'normal' grids.

One way to speed up the current global disjoint-set-forest step would be to apply a similar 'incremental' algorithm to it, i.e. introduce a third hierarchical level. The easiest way is to imagine that this is another level of grids

Note that that worst case hasn't improved a lot (the existing worst case for a 1024*1024 grid is 65536), so this might cause an even bigger spread between worst case and average case (which is perhaps surprising if you thought it would help the worst case more).

But I don't plan to do this, since two-level hierarchies tend to be the sweet spot of complexity & performance.

I wrote the initial version entirely on a 5.5 hour twitch stream. This includes about 45 minutes creating some test data and a reference solution, and some time at the end optimizing things.

There were additional updates to the library since the initial version, one to reduce memory by a factor of 10 in the 1024x1024 case, and some which fix some degenerate cases to have better performance. It's currently at version 0.93.