8.2.2 Binary Search

If the data to search is structured, it may be possible to find an element that satisfies some property without examining all elements. Suppose the input data is a sorted binary tree, as introduced in Section 8.1.4. Then, with a single comparison we can determine if the element we are searching for would be in the left or right subtree. Instead of eliminating just one element with each application of the matching function as was the case with list-search, with a sorted binary tree a single application of the comparison function is enough to exclude approximately half the elements.

The binary-tree-search procedure takes a sorted binary tree and two procedures as its inputs. The first procedure determines when a satisfying element has been found (we call this the ef procedure, suggesting equality). The second procedure, cf, determines whether to search the left or right subtree. Since cf is used to traverse the tree, the input tree must be sorted by cf.

(define (binary-tree-search ef cf tree) ; requires: tree is sorted by cf
  (if (null? tree) false
      (if (ef (tree-element tree)) (tree-element tree)
          (if (cf (tree-element tree))
              (binary-tree-search ef cf (tree-left tree))
              (binary-tree-search ef cf (tree-right tree))))))

For example, we can search for a number in a sorted binary tree using = as the equality function and < as the comparison function:

(define (binary-tree-number-search tree target)
 (binary-tree-search (lambda (el) (= target el))
                     (lambda (el) (< target el))
                     tree))

To analyze the running time of binary-tree-search, we need to determine the number of recursive calls. Like our analysis of list-sort-tree, we assume the input tree is well-balanced. If not, all the elements could be in the right branch, for example, and binary-tree-search becomes like list-search in the pathological case.

If the tree is well-balanced, each recursive call approximately halves the number of elements in the input tree since it passed in either the left or right subtree. Hence, the number of calls needed to reach a null tree is in $\Theta(\log n)$ where $n$ is the number of elements in the input tree. This is the depth of the tree: binary-tree-search traverses one path from the root through the tree until either reaching an element that satisfies the ef function, or reaching a null node.

Assuming the procedures passed as ef and cf have constant running time, the work for each call is constant except for the recursive call. Hence, the total running time for binary-tree-search is in $\Theta(\log n)$ where $n$ is the number of elements in the input tree. This is a huge improvement over linear searching: with linear search, doubling the number of elements in the input doubles the search time; with binary search, doubling the input size only increases the search time by a constant.