7.1 Empirical measurements

We can measure the cost of evaluating a given expression empirically. If we are primarily concerned with time, we could just use a stopwatch to measure the evaluation time. For more accurate results, we use the built-in (time Expression) special form.1 Evaluating (time Expression) produces the value of the input expression, but also prints out the time required to evaluate the expression (shown in our examples using slanted font). It prints out three time values:

  1.  
    The time in milliseconds the processor ran to evaluate the expression. CPU is an abbreviation for “central processing unit”, the computer’s main processor.

  2.  
    The actual time in milliseconds it took to evaluate the expression. Since other processes may be running on the computer while this expression is evaluated, the real time may be longer than the CPU time, which only counts the time the processor was working on evaluating this expression.

  3.  
    The time in milliseconds the interpreter spent on garbage collection to evaluate the expression. Garbage collection is used to reclaim memory that is storing data that will never be used again.

For example, using the definitions from Chapter 5,

(time (solve-pegboard (board-remove-peg (make-board 5)
                                                                (make-position 1 1))))

prints: cpu time: 141797 real time: 152063 gc time: 765. The real time is 152 seconds, meaning this evaluation took just over two and a half minutes. Of this time, the evaluation was using the CPU for 142 seconds, and the garbage collector ran for less than one second.

Here are two more examples:

> (time (car (list-append (intsto 1000) (intsto 100))))
cpu time: 531 real time: 531 gc time: 62
1
> (time (car (list-append (intsto 1000) (intsto 100))))
cpu time: 609 real time: 609 gc time: 0
1

The two expressions evaluated are identical, but the reported time varies. Even on the same computer, the time needed to evaluate the same expression varies. Many properties unrelated to our expression (such as where things happen to be stored in memory) impact the actual time needed for any particular evaluation. Hence, it is dangerous to draw conclusions about which procedure is faster based on a few timings.

Another limitation of this way of measuring cost is it only works if we wait for the evaluation to complete. If we try an evaluation and it has not finished after an hour, say, we have no idea if the actual time to finish the evaluation is sixty-one minutes or a quintillion years. We could wait another minute, but if it still hasn’t finished we don’t know if the execution time is sixty-two minutes or a quintillion years. The techniques we develop allow us to predict the time an evaluation needs without waiting for it to execute.

There’s no sense in being precise when you don’t even know what you’re talking about.
John von Neumann

Finally, measuring the time of a particular application of a procedure does not provide much insight into how long it will take to apply the procedure to different inputs. We would like to understand how the evaluation time scales with the size of the inputs so we can understand which inputs the procedure can sensibly be applied to, and can choose the best procedure to use for different situations. The next section introduces mathematical tools that are helpful for capturing how cost scales with input size.

Exercise 7.1. Suppose you are defining a procedure that needs to append two lists, one short list, short and one very long list, long, but the order of elements in the resulting list does not matter. Is it better to use (list-append short long) or (list-append long short)? (A good answer will involve both experimental results and an analytical explanation.)

Exploration 7.1: Multiplying Like Rabbits

Filius Bonacci was an Italian monk and mathematician in the 12th century. He published a book, Liber Abbaci, on how to calculate with decimal numbers that introduced Hindu-Arabic numbers to Europe (replacing Roman numbers) along with many of the algorithms for doing arithmetic we learn in elementary school. It also included the problem for which Fibonacci numbers are named:2

  1. A pair of newly-born male and female rabbits are put in a field. Rabbits mate at the age of one month and after that procreate every month, so the female rabbit produces a new pair of rabbits at the end of its second month. Assume rabbits never die and that each female rabbit produces one new pair (one male, one female) every month from her second month on. How many pairs will there be in one year?

Filius Bonacci

We can define a function that gives the number of pairs of rabbits at the beginning of the $n^{th}$ month as:

$$ \text{Fibonacci}(n) = \begin{cases}1 & : n = 1 \\ 1 & : n = 2 \\ \text{Fibonacci}(n-1) + \text{Fibonacci}(n-2) & : n > 1 \end{cases} $$

The third case follows from Bonacci’s assumptions: all the rabbits alive at the beginning of the previous month are still alive (the $\textrm{Fibonacci}(n-1)$ term), and all the rabbits that are at least two months old reproduce (the $\textrm{Fibonacci}(n-2)$ term).

The sequence produced is known as the Fibonacci sequence:

[ 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, \ldots ]

After the first two 1s, each number in the sequence is the sum of the previous two numbers. Fibonacci numbers occur frequently in nature, such as the arrangement of florets in the sunflower (34 spirals in one direction and 55 in the other) or the number of petals in common plants (typically 1, 2, 3, 5, 8, 13, 21, or 34), hence the rarity of the four-leaf clover.

Translating the definition of the Fibonacci function into a Scheme procedure is straightforward; we combine the two base cases using the or special form:

(define (fibo n)
        (if (or (= n 1) (= n 2)) 1
                (+ (fibo (- n 1)) (fibo (- n 2)))))

Applying fibo to small inputs works fine:

> (time (fibo 10))
cpu time: 0 real time: 0 gc time: 0
55
> (time (fibo 30))
cpu time: 2156 real time: 2187 gc time: 0
832040

But when we try to determine the number of rabbits in five years by computing (fibo 60), our interpreter just hangs without producing a value. The fibo procedure is defined in a way that guarantees it eventually completes when applied to a non-negative whole number: each recursive call reduces the input by 1 or 2, so both recursive calls get closer to the base case. Hence, we always make progress and must eventually reach the base case, unwind the recursive applications, and produce a value. To understand why the evaluation of (fibo 60) did not finish in our interpreter, we need to consider how much work is required to evaluate the expression.

To evaluate (fibo 60), the interpreter follows the if expressions to the recursive case, where it needs to evaluate (+ (fibo 59) (fibo 58)). To evaluate (fibo 59), it needs to evaluate (fibo 58) again and also evaluate (fibo 57). To evaluate (fibo 58) (which needs to be done twice), it needs to evaluate (fibo 57) and (fibo 56). So, there is one evaluation of (fibo 60), one evaluation of (fibo 59), two evaluations of (fibo 58), and three evaluations of (fibo 57).

The total number of evaluations of the fibo procedure for each input is itself the Fibonacci sequence! To understand why, consider the evaluation tree for (fibo 4) shown in Figure 7.1. The only direct number values are the 1 values that result from evaluations of either (fibo 1) or (fibo 2). Hence, the number of 1 values must be the value of the final result, which just sums all these numbers. For (fibo 4), there are 5 leaf applications, and 3 more inner applications, for 8 (= $\textrm{Fibonacci}(5)$) total recursive applications. The number of evaluations of applications of fibo needed to evaluate (fibo 60) is the 61st Fibonacci number — 2,504,730,781,961 — over two and a half trillion applications of fibo!

Figure 7.1: Evaluation of fibo procedure.

Although our fibo definition is correct, it is ridiculously inefficient and only finishes for input numbers below about 40. It involves a tremendous amount of duplicated work: for the (fibo 60) example, there are two evaluations of (fibo 58) and over a trillion evaluations of (fibo 1) and (fibo 2).

We can avoid this duplicated effort by building up to the answer starting from the base cases. This is more like the way a human would determine the numbers in the Fibonacci sequence: we find the next number by adding the previous two numbers, and stop once we have reached the number we want.

The fast-fibo procedure computes the $n^{th}$ Fibonacci number, but avoids the duplicate effort by computing the results building up from the first two Fibonacci numbers, instead of working backwards.

(define (fast-fibo n)
        (define (fibo-iter a b left)
                (if ( < = left 0) b
                        (fibo-iter b (+ a b) (- left 1))))
        (fibo-iter 1 1 (- n 2)))

This is a form of what is known as dynamic programming. The definition is still recursive, but unlike the original definition the problem is broken down differently. Instead of breaking the problem down into a slightly smaller instance of the original problem, with dynamic programming we build up from the base case to the desired solution. In the case of Fibonacci, the fast-fibo procedure builds up from the two base cases until reaching the desired answer. The additional complexity is we need to keep track of when to stop; we do this using the left parameter.

The helper procedure, fibo-iter (short for iteration), takes three parameters: a is the value of the previous-previous Fibonacci number, b is the value of the previous Fibonacci number, and left is the number of iterations needed before reaching the target. The initial call to fibo-iter passes in 1 as a (the value of $\textrm{Fibonacci}(1)$), and 1 as b (the value of $\textrm{Fibonacci}(2)$), and (- n 2) as left (we have $n - 2$ more iterations to do to reach the target, since the first two Fibonacci numbers were passed in as a and b we are now working on $\textrm{Fibonacci}(2)$). Each recursive call to fibo-iter reduces the value passed in as left by one, and advances the values of a and b to the next numbers in the Fibonacci sequence.

The fast-fibo procedure produces the same output values as the original fibo procedure, but requires far less work to do so. The number of applications of fibo-iter needed to evaluate (fast-fibo 60) is now only 59. The value passed in as left for the first application of fibo-iter is 58, and each recursive call reduces the value of left by one until the zero case is reached. This allows us to compute the expected number of rabbits in 5 years is 1548008755920 (over 1.5 Trillion)3.

 


  1. The |time| construct must be a special form, since the expression is not evaluated before entering |time| as it would be with the normal application rule. If it were evaluated normally, there would be no way to time how long it takes to evaluate, since it would have already been evaluated before |time| is applied. 

  2. Although the sequence is named for Bonacci, it was probably not invented by him. The sequence was already known to Indian mathematicians with whom Bonacci studied. 

  3. Perhaps Bonacci’s assumptions are not a good model for actual rabbit procreation. This result suggests that in about 10 years the mass of all the rabbits produced from the initial pair will exceed the mass of the Earth, which, although scary, seems unlikely!