8.1.4 Binary Trees

The data structure we will use is known as a sorted binary tree. While a list provides constant time procedures for accessing the first element and the rest of the elements, a binary tree provides constant time procedures for accessing the root element, the left side of the tree, and the right side of the tree. The left and right sides of the tree are themselves trees. So, like a list, a binary tree is a recursive data structure.

Whereas we defined a List (in Chapter 5) as:

A List is either (1) null or (2) a Pair whose second cell is a List.

a Tree is defined as:

A Tree is either (1) null or (2) a triple while first and third parts are both Trees.

Symbolically:

The make-tree procedure can be defined using cons to package the three inputs into a tree:

(define (make-tree left element right)
  (cons element (cons left right)))

We define selector procedures for extracting the parts of a non-null tree:

(define (tree-element tree) (car tree))
(define (tree-left tree) (car (cdr tree)))
(define (tree-right tree) (cdr (cdr tree)))

The tree-left and tree-right procedures are constant time procedures that evaluate to the left or right subtrees respectively of a tree.

In a sorted tree, the elements are maintained in a sorted structure. All elements in the left subtree of a tree are less than (according to the comparison function) the value of the root element of the tree; all elements in the right subtree of a tree are greater than or equal to the value of the root element of the tree (the result of comparing them with the root element is false). For example, here is a sorted binary tree containing 6 elements using < as the comparison function:

The top node has element value 7, and its left subtree is a tree containing the tree elements whose values are less than 7. The null subtrees are not shown. For example, the left subtree of the element whose value is 12 is null. Although there are six elements in the tree, we can reach any element from the top by following at most two branches. By contrast, with a list of six elements, we need five cdr operations to reach the last element.

The depth of a tree is the largest number of steps needed to reach any node in the tree starting from the root. The example tree has depth 2, since we can reach every node starting from the root of the tree in two or fewer steps. A tree of depth $d$ can contain up to $2^{d+1}-1$ elements. One way to see this is from this recursive definition for the maximum number of nodes in a tree:

$$ \mathrm{TreeNodes}(d) = \left\{ \begin{array}{r@{\quad:\quad}l} 1 & d=0 \\ \mathrm{TreeNodes}(d-1) + 2 \times \mathrm{TreeLeaves}(d-1) & d > 0 \end{array} \right. $$

A tree of depth zero has one node. Increasing the depth of a tree by one means we can add two nodes for each leaf node in the tree, so the total number of nodes in the new tree is the sum of the number of nodes in the original tree and twice the number of leaves in the original tree. The maximum number of leaves in a tree of depth $d$ is $2^d$ since each level doubles the number of leaves. Hence, the second equation simplifies to

$$ \mathrm{TreeNodes}(d-1) + 2 \times 2^{d-1} = \mathrm{TreeNodes}(d-1) + 2^{d}. $$

The value of $\mathrm{TreeNodes}(d-1)$ is $2^{d-1} + 2^{d-2} + \ldots + 1 = 2^{d} - 1$. Adding $2^{d}$ and $2^{d}-1$ gives $2^{d+1} - 1$ as the maximum number of nodes in a tree of depth $d$. Hence, a well-balanced tree containing $n$ nodes has depth approximately $\log_2 n$. A tree is well-balanced if the left and right subtrees of all nodes in the contain nearly the same number of elements.

Procedures that are analogous to the list-first-half, list-second-half, and list-append procedures that had linear running times for the standard list representation can all be implemented with constant running times for the tree representation. For example, tree-left is analogous to list-first-half and make-tree is analogous to list-append.

The tree-insert-one procedure inserts an element in a sorted binary tree:

(define (tree-insert-one cf el tree)
  (if (null? tree) (make-tree null el null)
      (if (cf el (tree-element tree))
          (make-tree (tree-insert-one cf el (tree-left tree))
                     (tree-element tree)
                     (tree-right tree))
          (make-tree (tree-left tree)
                     (tree-element tree)
                     (tree-insert-one cf el (tree-right tree))))))

When the input tree is null, the new element is the top element of a new tree whose left and right subtrees are null. Otherwise, the procedure compares the element to insert with the element at the top node of the tree. If the comparison evaluates to true, the new element belongs in the left subtree. The result is a tree where the left tree is the result of inserting this element in the old left subtree, and the element and right subtree are the same as they were in the original tree. For the alternate case, the element is inserted in the right subtree, and the left subtree is unchanged.

In addition to the recursive call, tree-insert-one only applies constant time procedures. If the tree is well-balanced, each recursive application halves the size of the input tree so there are approximately $\log_2 n$ recursive calls. Hence, the running time to insert an element in a well-balanced tree using tree-insert-one is in $\Theta(\log n)$.

Using tree-insert-one, we define list-to-sorted-tree, a procedure that takes a comparison function and a list as its inputs, and outputs a sorted binary tree containing the elements in the input list. It inserts each element of the list in turn into the sorted tree:

(define (list-to-sorted-tree cf p)
  (if (null? p) null
      (tree-insert-one cf (car p) (list-to-sorted-tree cf (cdr p)))))

Assuming well-balanced trees as above (we revisit this assumption later), the expected running time of list-to-sorted-tree is in $\Theta(n\log n)$ where $n$ is the size of the input list. There are $n$ recursive applications of list-to-sorted-tree since each application uses cdr to reduce the size of the input list by one. Each application involves an application of tree-insert-one (as well as only constant time procedures), so the expected running time of each application is in $\Theta(\log n)$. Hence, the total running time for list-to-sorted-tree is in $\Theta(n\log n)$.

To use our list-to-sorted-tree procedure to perform sorting we need to extract a list of the elements in the tree in the correct order. The leftmost element in the tree should be the first element in the list. Starting from the top node, all elements in its left subtree should appear before the top element, and all the elements in its right subtree should follow it. The tree-extract-elements procedure does this:

(define (tree-extract-elements tree)
  (if (null? tree) null
      (list-append (tree-extract-elements (tree-left tree))
                   (cons (tree-element tree)
                         (tree-extract-elements (tree-right tree))))))

The total number of applications of tree-extract-elements is between $n$ (the number of elements in the tree) and $3n$ since there can be up to two null trees for each leaf element (it could never actually be $3n$, but for our asymptotic analysis it is enough to know it is always less than some constant multiple of $n$). For each application, the body applies list-append where the first parameter is the elements extracted from the left subtree. The end result of all the list-append applications is the output list, containing the $n$ elements in the input tree.

Hence, the total size of all the appended lists is at most $n$, and the running time for all the list-append applications is in $\Theta(n)$. Since this is the total time for all the list-append applications, not the time for each application of tree-extract-elements, the total running time for tree-extract-elements is the time for the recursive applications, in $\Theta(n)$, plus the time for the list-append applications, in $\Theta(n)$, which is in $\Theta(n)$.

Putting things together, we define list-sort-tree:

(define (list-sort-tree cf p)
  (tree-extract-elements (list-to-sorted-tree cf p)))

The total running time for list-sort-tree is the running time of the list-to-sorted-tree application plus the running time of the tree-extract-elements application. The running time of list-sort-tree is in $\Theta(n \log n)$ where $n$ is the number of elements in the input list (in this case, the number of elements in p), and the running time of tree-extract-elements is in $\Theta(n)$ where $n$ is the number of elements in its input list (which is the result of the list-to-sorted tree application, a list containing $n$ elements where $n$ is the number of elements in p).

Only the fastest-growing term contributes to the total asymptotic running time, so the expected total running time for an application of list-sort-tree-insert to a list containing $n$ elements is in $\Theta(n\log n)$. This is substantially better than the previous sorting algorithms which had running times in $\Theta(n^2)$ since logarithms grow far slower than their input. For example, if $n$ is one million, $n^2$ is over 50,000 times bigger than $n \log_2 n$; if $n$ is one billion, $n^2$ is over 33 million times bigger than $n \log_2 n$ since $\log_2 1000000000$ is just under 30.

There is no general sorting procedure that has expected running time better than $\Theta(n\log n)$, so there is no algorithm that is asymptotically faster than list-sort-tree (in fact, it can be proven that no asymptotically faster sorting procedure exists). There are, however, sorting procedures that may have advantages such as how they use memory which may provide better absolute performance in some situations.

Unbalanced Trees. Our analysis assumes the left and right halves of the tree passed to tree-insert-one having approximately the same number of elements. If the input list is in random order, this assumption is likely to be valid: each element we insert is equally likely to go into the left or right half, so the halves contain approximately the same number of elements all the way down the tree. But, if the input list is not in random order this may not be the case.

For example, suppose the input list is already in sorted order. Then, each element that is inserted will be the rightmost node in the tree when it is inserted. For the previous example, this produces the unbalanced tree shown in Figure 8.1. This tree contains the same six elements as the earlier example, but because it is not well-balanced the number of branches that must be traversed to reach the deepest element is 5 instead of 2. Similarly, if the input list is in reverse sorted order, we will have an unbalanced tree where only the left branches are used.

Figure 8.1: Unbalanced trees

In these pathological situations, the tree effectively becomes a list. The number of recursive applications of tree-insert-one needed to insert a new element will not be in $\Theta(\log n)$, but rather will be in $\Theta(n)$. Hence, the worst case running time for list-sort-tree-insert is in $\Theta(n^2)$ since the worst case time for tree-insert-one is in $\Theta(n)$ and there are $\Theta(n)$ applications of tree-insert-one. The list-sort-tree-insert procedure has expected running time in $\Theta(n \log n)$ for randomly distributed inputs, but has worst case running time in $\Theta(n^2)$.

Exercise 8.7. Define a procedure binary-tree-size that takes as input a binary tree and outputs the number of elements in the tree. Analyze the running time of your procedure.

Exercise 8.8. $\left[\star \right]$ Define a procedure binary-tree-depth that takes as input a binary tree and outputs the depth of the tree. The running time of your procedure should not grow faster than linearly with the number of nodes in the tree.

Exercise 8.9. $\left[\star \right]$ $\left[\star \right]$ Define a procedure binary-tree-balance that takes as input a sorted binary tree and the comparison function, and outputs a sorted binary tree containing the same elements as the input tree but in a well-balanced tree. The depth of the output tree should be no higher than $\log_2 n + 1$ where $n$ is the number of elements in the input tree.