8.1.5 Quicksort

Although building and extracting elements from trees allows us to sort with expected time in $\Theta(n\log n)$, the constant time required to build all those trees and extract the elements from the final tree is high.

In fact, we can use the same approach to sort without needing to build trees. Instead, we keep the two sides of the tree as separate lists, and sort them recursively. The key is to divide the list into halves by value, instead of by position. The values in the first half of the list are all less than the values in the second half of the list, so the lists can be sorted separately.

The list-quicksort procedure uses list-filter (from Example 5.5) to divide the input list into sublists containing elements below and above the comparison element, and then recursively sorts those sublists:

(define (list-quicksort cf p)
  (if (null? p) null
      (list-append
       (list-quicksort cf
        (list-filter (lambda (el) (cf el (car p))) (cdr p)))
       (cons (car p)
             (list-quicksort cf
              (list-filter (lambda (el) (not (cf el (car p)))) (cdr p)))))))

This is the famous quicksort algorithm that was invented by Sir C. A. R. (Tony) Hoare while he was an exchange student at Moscow State University in 1959. He was there to study probability theory, but also got a job working on a project to translate Russian into English. The translation depended on looking up words in a dictionary. Since the dictionary was stored on a magnetic tape which could be read in order faster than if it was necessary to jump around, the translation could be done more quickly if the words to translate were sorted alphabetically. Hoare invented the quicksort algorithm for this purpose and it remains the most widely used sorting algorithm.

As with list-sort-tree-insert, the expected running time for a randomly arranged list is in $\Theta(n\log n)$ and the worst case running time is in $\Theta(n^2)$. In the expected cases, each recursive call halves the size of the input list (since if the list is randomly arranged we expect about half of the list elements are below the value of the first element), so there are approximately $\log n$ expected recursive calls.

Each call involves an application of list-filter, which has running time in $\Theta(m)$ where $m$ is the length of the input list. At each call depth, the total length of the inputs to all the calls to list-filter is $n$ since the original list is subdivided into $2^d$ sublists, which together include all of the elements in the original list. Hence, the total running time is in $\Theta(n\log n)$ in the expected cases where the input list is randomly arranged. As with list-sort-tree-insert, if the input list is not randomly rearranged it is possible that all elements end up in the same partition. Hence, the worst case running time of list-quicksort is still in $\Theta(n^2)$.

Exercise 8.10. Estimate the time it would take to sort a list of one million elements using list-quicksort.

Exercise 8.11. Both the list-quicksort and list-sort-tree-insert procedures have expected running times in $\Theta(n\log n)$. Experimentally compare their actual running times.

Exercise 8.12. What is the best case input for list-quicksort? Analyze the asymptotic running time for list-quicksort on best case inputs.

Exercise 8.13. $\left[\star \right]$ Instead of using binary trees, we could use ternary trees. A node in a ternary tree has two elements, a left element and a right element, where the left element must be before the right element according to the comparison function. Each node has three subtrees: left, containing elements before the left element; middle, containing elements between the left and right elements; and right, containing elements after the right element. Is it possible to sort faster using ternary trees?