PostgreSQL 11 Server Side Programming Quick Start Guide
上QQ阅读APP看书,第一时间看更新

Recursive CTEs

A recursive CTE is a special construct that allows an auxiliary statement to reference itself and, therefore, join itself onto previously-computed results. This is particularly useful when we need to join a table an unknown number of times, typically to 'explode' a flat tree structure. The traditional solution would involve some kind of iteration, probably by means of a cursor that iterates one tuple at a time over the whole result set. However, with recursive CTEs, we can use a much cleaner and simpler approach.

A recursive CTE is made by an auxiliary statement that is built on top of the following:

  • A non-recursive statement, which works as a bootstrap statement and is executed when the auxiliary term is first evaluated
  • A recursive statement, which can either reference the bootstrap statement or itself

These two parts are joined together by means of a UNION predicate.

In order to better understand the use case of a recursive CTE, let's consider a tags table, populated as shown in the following listing. As you can see, each tag is related to a possible parent tag; as you can see, the tag music contains the tags rock, hard rock, and pop. The tag hard rock, in turn, contains metallica and foo fighters:

testdb=> SELECT t_name, pk, t_child_of FROM tags;

t_name | pk |t_child_of --------------+----+---------- holidays | 10 | family | 12 | music | 13 | metallica | 17 | 16 foo fighters | 18 | 16 rock | 14 | 13 pop | 15 | 13 hard rock | 16 | 13 sicily | 11 | 10 2017 | 19 | 11 2018 | 20 | 11 2018 | 21 |
Listing 17:  Tag table example data

So, how can we extract the whole tree structure for every tag? Let's compose a recursive CTE to do this for us. First, we need to declare the auxiliary statement with the RECURSIVE keyword, so that PostgreSQL knows that the CTE will refer itself in the auxiliary statement. Then, we have to write the non-recursive part of the auxiliary statement. In this specific example, the non-recursive part is the extraction of all tags that are not children of other tags, or in other words, the parent tags. The resulting tuples must be combined with each tuple that comes from the non-recursive statement by means of the UNION predicate. This is when t_child_of has a value set to the non-recursive result set, pk. The following listing implements such logic and provides the output of all of the tag names, where each level of the tree has been separated by a > sign. The important part here is that the recursive part of the auxiliary statement (aliased as ct) joins to the non-recursive part (aliased as tt) and that it does this however many times is necessary:

testdb=> WITH RECURSIVE tags_tree AS (
  -- non recursive statment
  SELECT t_name, pk
  FROM tags WHERE t_child_of IS NULL

  UNION
  -- recursive statement
  SELECT tt.t_name || ' > ' || ct.t_name, ct.pk
  FROM tags ct
  JOIN tags_tree tt ON tt.pk = ct.t_child_of
)
SELECT t_name FROM tags_tree ORDER BY t_name;

t_name ---------------------------------- family holidays holidays > sicily music music > hard rock music > hard rock > foo fighters music > hard rock > metallica music > pop music > rock
Listing 18:  Listing the tags tree

Note that we can also add computed-on-the-fly values, for example a level column that holds the depth of the current level within the tree. This can be useful when searching for tags that appear at different levels within a tree. As you can see in the following example, the 2018 tag can appear as a root tag or a leaf tag, and the recursive CTE extracts only the non-leaf tag hierarchy:

testdb=> WITH RECURSIVE tags_tree AS (
  -- non recursive statment
  SELECT t_name, pk, 1 AS level
  FROM tags WHERE t_child_of IS NULL

  UNION
  -- recursive statement
  SELECT tt.t_name || ' > ' || ct.t_name, ct.pk
        , tt.level + 1
  FROM tags ct
  JOIN tags_tree tt ON tt.pk = ct.t_child_of
)
SELECT t_name FROM tags_tree 
WHERE t_name like '%2018%' AND level > 1;

t_name
--------------------------
holidays > sicily > 2018
Listing 19:  Searching for a deeper tag

When dealing with recursive CTEs, it is important to be aware of infinite loops. These can arise if the recursion part of an auxiliary statement does not end properly. As an example, in the previous code snippet, all we need to do is change the recursive part of the auxiliary statement to reference the column tt.pk in the output of the SELECT statement, instead of the correct reference to the column ct.pk, to make the query loop infinitely by joining each row to itself. As you can see, it is really important to constrain the recursive term in such a way that it can end properly.