7.4.2 Linear Growth

When the running time of a procedure increases by a constant amount when the size of the input grows by one, the running time of the procedure grows linearly with the input size. If the input size is $n$, the running time is in $\Theta(n)$. If a procedure has running time in $\Theta(n)$, doubling the size of the input will approximately double the execution time.

An example of a procedure that has linear growth is the elementary school addition algorithm from Section 6.2.3. To add two $d$-digit numbers, we need to perform a constant amount of work for each digit. The number of steps required grows linearly with the size of the numbers (recall from Section 7.3.1 that the size of a number is the number of input symbols needed to represent the number.

Many procedures that take a List as input have linear time growth. A procedure that does something that takes constant time with every element in the input List, has running time that grows linearly with the size of the input since adding one element to the list increases the number of steps by a constant amount. Next, we analyze three list procedures, all of which have running times that scale linearly with the size of their input.

Example 7.4: Append

Consider the list-append procedure (from Example 5.6):

(define (list-append p q)
  (if (null? p) q (cons (car p) (list-append (cdr p) q))))

Since list-append takes two inputs, we need to be careful about how we refer to the input size. We use $n_p$ to represent the number of elements in the first input, and $n_q$ to represent the number of elements in the second input. So, our goal is to define a function $\mathrm{RunningTimeL}(n_p, n_q)$ that captures how the maximum number of steps required to evaluate an application of list-append scales with the size of its input.

To analyze the running time of list-append, we examine its body which is an if expression. The predicate expression applies the null? procedure with is constant time since the effort required to determine if a list is null does not depend on the length of the list. When the predicate expression evaluates to true, the alternate expression is just q, which can also be evaluated in constant time.

Next, we consider the alternate expression. It includes a recursive application of list-append. Hence, the running time of the alternate expression is the time required to evaluate the recursive application plus the time required to evaluate everything else in the expression. The other expressions to evaluate are applications of cons, car, and cdr, all of which is are constant time procedures.

So, we can defined the total running time recursively as:

$$ \mathrm{RunningTimeL}(n_p, n_q) = C + \mathrm{RunningTimeL}(n_p - 1, n_q) $$

where C is some constant that reflects the time for all the operations besides the recursive call. Note that the value of $n_q$ does not matter, so we simplify this to:

$$ \mathrm{RunningTimeL}(n_p) = C + \mathrm{RunningTimeL}(n_p - 1). $$

This does not yet provide a useful characterization of the running time of list-append though, since it is a circular definition. To make it a recursive definition, we need a base case. The base case for the running time definition is the same as the base case for the procedure: when the input is null. For the base case, the running time is constant:

$$ \mathrm{RunningTimeL}(0) = C_0 $$

where $C_0$ is some constant.

To better characterize the running time of list-append, we want a closed form solution. For a given input $n$, $\mathrm{RunningTime}(n)$ is $C + C + C + C + \ldots + C + C_0$ where there are $n-1$ of the $C$ terms in the sum. This simplifies to $(n-1)C + C_0 = nC - C + C_0 = nC + C_2$. We do not know what the values of $C$ and $C_2$ are, but within the asymptotic notations the constant values do not matter. The important property is that the running time scales linearly with the value of its input. Thus, the running time of list-append is in $\Theta(n_p)$ where $n_p$ is the number of elements in the first input.

Usually, we do not need to reason at quite this low a level. Instead, to analyze the running time of a recursive procedure it is enough to determine the amount of work involved in each recursive call (excluding the recursive application itself) and multiply this by the number of recursive calls. For this example, there are $n_p$ recursive calls since each call reduces the length of the p input by one until the base case is reached. Each call involves only constant-time procedures (other than the recursive application), so the amount of work involved in each call is constant. Hence, the running time is in $\Theta(n_p)$. Equivalently, the running time for the list-append procedure scales linearly with the length of the first input list.

Example 7.5: Length

Consider the list-length procedure from Example 5.1:

(define (list-length p) (if (null? p) 0 (+ 1 (list-length (cdr p)))))

This procedure makes one recursive application of list-length for each element in the input p. If the input has $n$ elements, there will be $n+1$ total applications of list-length to evaluate (one for each element, and one for the null). So, the total work is in $\Theta(n \cdot \textrm{work for each recursive application})$.

To determine the running time, we need to determine how much work is involved in each application. Evaluating an application of list-length involves evaluating its body, which is an if expression. To evaluate the if expression, the predicate expression, (null? p), must be evaluated first. This requires constant time since the null? procedure has constant running time (see Section 7.4.1. The consequent expression is the primitive expression, 0, which can be evaluated in constant time. The alternate expression, (+ 1 (list-length (cdr p))), includes the recursive call. There are $n+1$ total applications of list-length to evaluate, the total running time is $n+1$ times the work required for each application (other than the recursive application itself).

The remaining work is evaluating (cdr p) and evaluating the + application. The cdr procedure is constant time. Analyzing the running time of the + procedure application is more complicated.

Cost of Addition Since + is a built-in procedure, we need to think about how it might be implemented. Following the elementary school addition algorithm (from Section 6.2.3), we know we can add any two numbers by walking down the digits. The work required for each digit is constant; we just need to compute the corresponding result and carry bits using a simple formula or lookup table. The number of digits to add is the maximum number of digits in the two input numbers. Thus, if there are $b$ digits to add, the total work is in $\Theta(b)$. In the worst case, we need to look at all the digits in both numbers. In general, we cannot do asymptotically better than this, since adding two arbitrary numbers might require looking at all the digits in both numbers.

But, in the list-length procedure the + is used in a very limited way: one of the inputs is always 1. We might be able to add 1 to a number without looking at all the digits in the number. Recall the addition algorithm: we start at the rightmost (least significant) digit, add that digit, and continue with the carry. If one of the input numbers is 1, then once the carry is zero we know now of the more significant digits will need to change. In the worst case, adding one requires changing every digit in the other input. For example, (+ 99999 1) is 100000. In the best case (when the last digit is below 9), adding one requires only examining and changing one digit.

Figuring out the average case is more difficult, but necessary to get a good estimate of the running time of list-length. We assume the numbers are represented in binary, so instead of decimal digits we are counting bits (this is both simpler, and closer to how numbers are actually represented in the computer). Approximately half the time, the least significant bit is a 0, so we only need to examine one bit. When the last bit is not a 0, we need to examine the second least significant bit (the second bit from the right): if it is a 0 we are done; if it is a 1, we need to continue.

We always need to examine one bit, the least significant bit. Half the time we also need to examine the second least significant bit. Of those times, half the time we need to continue and examine the next least significant bit. This continues through the whole number. Thus, the expected number of bits we need to examine is,

$$ 1 + \frac{1}{2}\left(1 + \frac{1}{2} \left(1 + \frac{1}{2}\left(1 + \frac{1}{2}\left(1 + \ldots\right)\right)\right)\right) $$

where the number of terms is the number of bits in the input number, $b$. Simplifying the equation, we get:

$$ 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \ldots + \frac{1}{2^b} $$

No matter how large $b$ gets, this value is always less than $2$. So, on average, the number of bits to examine to add 1 is constant: it does not depend on the length of the input number. %Although adding two arbitrary values cannot be done in constant time, adding 1 to an arbitrary value can, on average, be done in constant time.

This result generalizes to addition where one of the inputs is any constant value. Adding any constant $C$ to a number $n$ is equivalent to adding one $C$ times. Since adding one is a constant time procedure, adding one $C$ times can also be done in constant time for any constant $C$.

Excluding the recursive application, the list-length application involves applications of two constant time procedures: cdr and adding one using +. Hence, the total time needed to evaluate one application of list-length, excluding the recursive application, is constant.

There are $n+1$ total applications of list-length to evaluate total, so the total running time is $c(n+1)$ where $c$ is the amount of time needed for each application. The set $\Theta(c(n+1))$ is identical to the set $\Theta(n)$, so the running time for the length procedure is in $\Theta(n)$ where $n$ is the length of the input list.

Example 7.6: Accessing List Elements

Consider the list-get-element procedure from Example 5.3:

(define (list-get-element p n)
   (if (= n 1)
       (car p)
       (list-get-element (cdr p) (- n 1))))

The procedure takes two inputs, a List and a Number selecting the element of the list to get. Since there are two inputs, we need to think carefully about the input size. We can use variables to represent the size of each input, for example $s_p$ and $s_n$ for the size of $p$ and $n$ respectively. In this case, however, only the size of the first input really matters.

The procedure body is an if expression. The predicate uses the built-in = procedure to compare $n$ to 1. The worst case running time of the = procedure is linear in the size of the input: it potentially needs to look at all bits in the input numbers to determine if they are equal. Similarly to +, however, if one of the inputs is a constant, the comparison can be done in constant time. To compare a number of any size to 1, it is enough to look at a few bits. If the least significant bit of the input number is not a 1, we know the result is false. If it is a 1, we need to examine a few other bits of the input number to determine if its value is different from 1 (the exact number of bits depends on the details of how numbers are represented). So, the = comparison can be done in constant time.

If the predicate is true, the base case applies the car procedure, which has constant running time. The alternate expression involves the recursive calls, as well as evaluating (cdr p), which requires constant time, and (- n 1). The - procedure is similar to +: for arbitrary inputs, its worst case running time is linear in the input size, but when one of the inputs is a constant the running time is constant. This follows from a similar argument to the one we used for the + procedure (Exercise 7.13 asks for a more detailed analysis of the running time of subtraction). So, the work required for each recursive call is constant.

The number of recursive calls is determined by the value of $n$ and the number of elements in the list $p$. In the best case, when $n$ is 1, there are no recursive calls and the running time is constant since the procedure only needs to examine the first element. Each recursive call reduces the value passed in as $n$ by 1, so the number of recursive calls scales linearly with $n$ (the actual number is $n - 1$ since the base case is when $n$ equals 1). But, there is a limit on the value of $n$ for which this is true. If the value passed in as $n$ exceeds the number of elements in $p$, the procedure will produce an error when it attempts to evaluate (cdr p) for the empty list. This happens after $s_p$ recursive calls, where $s_p$ is the number of elements in $p$. Hence, the running time of list-get-element does not grow with the length of the input passed as $n$; after the value of $n$ exceeds the number of elements in $p$ it does not matter how much bigger it gets, the running time does not continue to increase.

Thus, the worst case running time of list-get-element grows linearly with the length of the input list. Equivalently, the running time of list-get-element is in $\Theta(s_p)$ where $s_p$ is the number of elements in the input list.

Exercise 7.10. Explain why the list-map procedure from Section 5.4.1 has running time that is linear in the size of its List input. Assume the procedure input has constant running time.

Exercise 7.11. Consider the list-sum procedure (from Example 5.2):

(define (list-sum p) (if (null? p) 0 (+ (car p) (list-sum (cdr p)))))

What assumptions are needed about the elements in the list for the running time to be linear in the number if elements in the input list?

Exercise 7.12. For the decimal six-digit odometer (shown in the picture below), we measure the amount of work to add one as the total number of wheel digit turns required. For example, going from 000000 to 000001 requires one work unit, but going from 000099 to 000100 requires three work units.

a. What are the worst case inputs?
b. What are the best case inputs?
c. $\left[\star \right]$ On average, how many work units are required for each mile? Assume over the lifetime of the odometer, the car travels 1,000,000 miles.
d. Lever voting machines were used by the majority of American voters in the 1960s, although they are not widely used today. Most level machines used a three-digit odometer to tally votes. Explain why candidates ended up with 99 votes on a machine far more often than 98 or 100 on these machines.

Exercise 7.13. $\left[\star \right]$ The list-get-element argued by comparison to +, that the - procedure has constant running time when one of the inputs is a constant. Develop a more convincing argument why this is true by analyzing the worst case and average case inputs for -.

Exercise 7.14. $\left[\star \right]$ Our analysis of the work required to add one to a number argued that it could be done in constant time. Test experimentally if the DrRacket + procedure actually satisfies this property. Note that one + application is too quick to measure well using the time procedure, so you will need to design a procedure that applies + many times without doing much other work.