Wednesday, July 8, 2009

Closure Trees

The code here is PHP and MySQL. It should be trivial to translate into other languages and databases.

I needed to store a tree in a database (mysql) the other day, and I stumbled across this: (It starts on slide 68)



The basic idea is to store paths to all ancestors and all descendants in a separate table. This makes a lot of actions very fast at the cost of space. Many of the basic modifications of the tree are covered in the slides, but some of them took me some thought to work out.

To represent the tree of (1,2(3,4(5))), we'll do:


INSERT INTO closure (ancestor, descendant) VALUES (1,1);
INSERT INTO closure (ancestor, descendant) VALUES (2,2);
INSERT INTO closure (ancestor, descendant) VALUES (1,2);
INSERT INTO closure (ancestor, descendant) VALUES (3,3);
INSERT INTO closure (ancestor, descendant) VALUES (2,3);
INSERT INTO closure (ancestor, descendant) VALUES (1,3);
INSERT INTO closure (ancestor, descendant) VALUES (4,4);
INSERT INTO closure (ancestor, descendant) VALUES (2,4);
INSERT INTO closure (ancestor, descendant) VALUES (1,4);
INSERT INTO closure (ancestor, descendant) VALUES (5,5);
INSERT INTO closure (ancestor, descendant) VALUES (4,5);
INSERT INTO closure (ancestor, descendant) VALUES (2,5);
INSERT INTO closure (ancestor, descendant) VALUES (1,5);


Building the Tree in Memory

If we want to build the tree in memory, first we grab a list of all of each node's ancestors:


SELECT
descendant AS id,
GROUP_CONCAT(ancestor) as ancestors
FROM
closure
GROUP BY (descendant);

+----+-----------+
| id | ancestors |
+----+-----------+
| 1 | 1 |
| 2 | 2,1 |
| 3 | 3,1,2 |
| 4 | 4,1,2 |
| 5 | 5,4,1,2 |
+----+-----------+


When we bring it into our language of choice, the first thing to do it sort the list by number of ancestors.
Top level nodes will only have themselves as an ancestor, and any node with two ancestors is dependant solely on a node with one ancestor. After we have sorted our ancestor list, we either create the first node with the first item in the list, or we create an empty node to start as $tree. I used an empty node, as I have many top level nodes. If you have only one node without ancestors other than itself, use it for the first node.

To traverse the tree, the next node will always be the intersection between the list of ancestors, and the available ancestors.

If there is only one ancestor left, it is to be a new child of whatever node you end up on.


function add_node($ancestors, &$tree) {
if (count($ancestors) == 1) {
$tree->add_child($ancestors);
return;
}
$next_node = array_intersect($ancestors, $tree->get_children_ids());
$this->add_node(
array_diff($ancestors, $next_node) ,
$tree->get_child_by_id(array_pop($next_node))
);
}

Moving a Segment

Moving a segment of the tree is probably the most painful task. It is divided into two parts.
1) Remove all the links from the node and its descendants to all of its ancestors
2) Insert all the links from the knode and its descendants to the new ancestors

So, if we want to move node 4 under node 3, first we:
delete these descendants:
SELECT descendant FROM closure WHERE ancestor=4;
from these ancestors:
SELECT ancestor FROM closure WHERE descendant=4 AND ancestor != 4;

With some help from FurnaceBoy on #MySQL, the query is:
DELETE q FROM closure
q NATURAL JOIN (
SELECT a.ancestor, d.descendant
FROM closure a
CROSS JOIN closure d
WHERE a.descendant=4 AND a.ancestor != 4 AND d.ancestor=4
) t;


Now, to insert the new links.

First we get the ancestors of the new node (3)
SELECT ancestor FROM closure WHERE descendant=3;
Then we get the descendants of the old node (4)
SELECT descendant FROM closure WHERE ancestor=4;
Then we cross join them.

SELECT a.ancestor, d.descendant
FROM closure a
CROSS JOIN closure d
WHERE a.descendant=2 AND d.ancestor=1;



INSERT INTO closure (ancestor, descendant)
SELECT a.ancestor, d.descendant
FROM closure a CROSS JOIN closure d
WHERE a.descendant=3 AND d.ancestor=4;