Trees in SQL databases

Introduction

Sooner or later in your database project you will need to store hierarchical information. For instance enterprise structure, categories of merchandise, product catalogs, or folders with documents all are good nominees for such structures. It is easy of course to implement that kind of store with self-related table. Problems arrive once you need to answer hierarchical questions. For example what if we have an enterprise structure and need to know how many employees reporting to manager �X�? There have been already some solutions of the problem.

See:

Joe Celko�s book and articles also covered it.

The above techniques are good when you only reading the data, but when you need to modify the tree you will have considerable performance deprivation.

I am going to introduce method that has similar performance for reading but better performance for modification of the tree.

The Idea

Suppose we have the following table:

Diagram1

Where NodeId is the primary key, ParentId is the foreign key, and NodeName is some extra field. In order to provide simple examples let�s assume that NodeName has unique constraint on it.

For all examples below I will use the following tree:

Sample tree

Let�s create another table having two relationships with the Node table:

Diagram2

Where both NodeId and ParentId are foreign keys referencing Node table.

Each record in the Tree table means that the Node pointed by Tree.NodeId has the ancestor pointed by Tree.ParentId. By ancestor I understand any node lying on the path from the node to the root of the tree, i.e. direct parent or grand parent or grand-grand parent and so on.

Now let�s ensure one simple invariant: all ancestors of each node from Node table are enumerated in Tree table.

For example for node 4 they are going to be 2 and 1.

The immediate result of this invariant is: this is very easy to get full path from the node to the root of the tree – just select all records from the Tree table related to the seeking node. Let�s write down this select. Suppose we need to find path from node 7 to the root.

Query 1:

Collapse
select p.*
from Node n, Tree t, Node p
where n.NodeName=7
    and n.NodeId = t.NodeId
    and t.ParentId = p.NodeId

It is a pretty strait forward select, isn�t it?

The second result of the above invariant is: we can select all descendants of the node. This is because we are enlisting all ancestors for every node from Node table in the Tree table including of course all ancestors of the seeking node. In order to select entire sub tree of the node we just query for all nodes that have the seeking node as their ancestor.

Query 2:

Collapse
select c.*
from Node n, Tree t, Node c
where n.NodeName=2
    and n.NodeId = t.ParentId
    and t.NodeId = c.NodeId

This is as simple as that. Note that the two queries are opposite in the direction passing through Tree table.

Implementation

Before going further let me make two notes.

# 1: from my experience it is very convenient to have one more field in the Tree table representing the distance on the tree between nodes pointed by NodeId and ParentId, in another words – level of ancestor. So the actual structure includes Level field:

Diagram3

# 2: in most probable cases we need sub tree of the node or path from the node to the root including the node itself. So it would be useful having in the Tree table the node itself as its own ancestor of the level 0.

It is possible to implement the above invariant either as stored procedures or as triggers. I personally prefer trigger one.

Let�s start with insert trigger for Node table. I will be using MS SQL server.

First of all the trigger must create a record to satisfy # 2. This can be done by the following statement:

Collapse
    insert into Tree(NodeId, ParentId, Level)
    select NodeId, NodeId, 0
    from inserted

Now we are ready to add the list of all ancestors of the inserted nodes. Take in consideration that the parent of each inserting node already has enumerated all his ancestors, so we can use them as a foundation. But we need to replace parent�s NodeId with the id of the new node and increment the level.

Collapse
    insert into Tree(NodeId, ParentId, Level)
    select n.NodeId, t.ParentId, t.Level + 1
    from inserted n, Tree t
    where n.ParentId = t.NodeId

So the overall trigger will be:

Collapse
create trigger NodeInsert on Node for insert as
begin
    set nocount on

    insert into Tree(NodeId, ParentId, Level)
    select NodeId, NodeId, 0
    from inserted

    insert into Tree(NodeId, ParentId, Level)
    select n.NodeId, t.ParentId, t.Level + 1
    from inserted n, Tree t
    where n.ParentId = t.NodeId
end
go

Now it is time to define update trigger which takes care of changing node�s parent. The goal of update trigger is to delete all obsolete records from Tree table and create new ones. In order to minimize changes been made by this trigger let�s reuse the information already stored in the Tree table in a manner we did in insert trigger. What is not going to be affected by changing node�s parent?

Apparently for updated node itself it is persistent only one record that satisfies #2. And for descendants they are going to be permanent all records about ancestors below updated node including this node. While every ancestor above the updated node will obsolete.

For example if we are changing parent for node 4 then for nodes 5, 6, and 7 will be changed only ancestors above node 4 i.e. 1 and 2.

On the first step we back up all descendants of updated nodes in the following table:

Collapse
    declare @child table(NodeId int, Level int)

We insert all descendants� primary keys and the level of each of them relatively to updated nodes. Condition t.Level > 0 allows to exclude updated node from been inserted to the @child table.

Collapse
    insert into @child(NodeId, Level)
    select t.NodeId, t.Level
    from inserted n, Tree t
    where n.NodeId = t.ParentId and t.Level > 0

The second step deletes all obsolete rows for all descendants:

Collapse
    delete Tree
    where
        Tree.NodeId in(select NodeId from @child)
        and Tree.ParentId in(
            select t.ParentId
            from inserted n, Tree t
            where n.NodeId = t.NodeId and t.Level > 0
        )

Condition t.Level > 0 excludes updated nodes themselves from the deletion.

The third step removes obsolete records for updated nodes:

Collapse
    delete Tree
    where Tree.NodeId in(select NodeId from inserted)
        and Tree.Level > 0

On the next two steps we will populate new information in the Tree table.

So the forth step is going to be:

Collapse
    insert into Tree(NodeId, ParentId, Level)
    select n.NodeId, t.ParentId, t.Level + 1
    from inserted n, Tree t
    where n.ParentId = t.NodeId

This is exactly how we did it in the insert trigger.

And finally the fifth step:

Collapse
    insert into Tree(NodeId, ParentId, Level)
    select c.NodeId, t.ParentId, t.Level + c.Level
    from inserted n, Tree t, @child c
    where n.NodeId = t.NodeId and t.Level > 0

It might look strange that there is no any joins with @child table in this statement. But this is what we really need here because we are going to repeat ancestor�s information for each child. Also note that we are adding child�s level stored in the @child table rather that just 1.

The entire trigger looks like:

Collapse
create trigger NodeUpdate on Node for update as
if update(ParentId)
begin
    set nocount on

    declare @child table(NodeId int, Level int)

    insert into @child(NodeId, Level)
    select t.NodeId, t.Level
    from inserted n, Tree t
    where n.NodeId = t.ParentId and t.Level > 0

    delete Tree
    where
        Tree.NodeId in(select NodeId from @child)
        and Tree.ParentId in(
            select t.ParentId
            from inserted n, Tree t
            where n.NodeId = t.NodeId and t.Level > 0
        )

    delete Tree
    where Tree.NodeId in(select NodeId from inserted) and Tree.Level > 0

    insert into Tree(NodeId, ParentId, Level)
    select n.NodeId, t.ParentId, t.Level + 1
    from inserted n, Tree t
    where n.ParentId = t.NodeId

    insert into Tree(NodeId, ParentId, Level)
    select c.NodeId, t.ParentId, t.Level + c.Level
    from inserted n, Tree t, @child c
    where n.NodeId = t.NodeId and t.Level > 0
end
go

As you can see it will be executed only if ParentId was updated and all other updates of Node table will be filtered out by the very first if statement.

Restrictions of the implementation

If you are going to insert or update more than one Node at a time please make sure you are not doing this for more then one level of the tree. From the practical point of view this is not a strong restriction at all.

Remaining steps

In order to make the functionality completed we need the ability to clean the entire Tree table and populate it from the scratch. It is probably a good idea to create a stored procedure for this purpose, and I am not going to take away from you the pleasure to write down such procedure in you own.

Examples

You can find SQL statements for creating both tables, both triggers, and statements that populating Node table in the attached file. It also contains Query 1 and 2 and statements to change node�s parent.

 

About eagle081183

Passionate, Loyal
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s