Representing Large Networks by HIERARCHYID Chunks

If you recall, I wrote about “Polymorphic Graph Queries” a while ago. This exemplified the use of HIERARCHYID to represent the topology of a small computer network. As it turns out, there is a case in which the HIERARCHYID approach will explode in both numbers and size, making them an unwieldy choice, and it’s commonly seen in large networks. There is however a way to work around that issue. As far as I can tell, the graph tables in SQL Server still do not support polymorphic queries, so this workaround should be valuable.

Assume that we have a reasonably large computer network, with say a million or more devices. Representing the entire topology of the network efficiently turns out to require a combination of HIERARCHYID and traditional relational tables. HIERARCHYID performs well all the way down from locations, through enclosures, devices, and ports or antennas to the actual communication media (fiber, ethernet, wireless). Because of the large number of things connected to this layer, this is where they become unwieldy and explode in numbers. HIERARCHYID does not work well when you have intermediate layers with comparatively massive amounts of connections. Such a scenario could easily bring you into needing billions of HIEARCHYID:s. Storage skyrockets and performance goes down the drain.

Instead, by having a traditional many-to-many table represent such layers, in which different HIERARCHYID:s are related to each other, it is possible to get the best of both worlds and achieve the ability do sub second searches through the topology. Let’s call the structure (UID, HIERARCHYID) a chunk, where the UID can typically be an integer. The relational table can then be as simple as (UID, UID) indicating that two chunks are connected, only requiring as many rows as there are connections. Polymorphic queries now need to take this into account, by first finding a number of candidate chunks, then join these through the relational table to discard ones that are not connected, which yields the final result.

A similar recursive query used for testing a relational parent-child hierarchy of the same network had to be stopped after having run for several hours. The benefit of HIERARHYID is substantial, but only if you take special care of layers with high connectivity. For small uncomplicated hierarchies, like employees and managers at a company, a traditional representation with less complexity is likely sufficient. Some alternatives can be found in “Hierarchical Data in SQL” by Ben Brumm.

Published by

Lars Rönnbäck

Co-developer of the Anchor Modeling technique. Programmer of the online modeling tool. Site maintainer. Presenter and trainer.

Leave a Reply

Your email address will not be published. Required fields are marked *