5.1 Types

All data in a program has an associated type. Internally, all data is stored just as a sequence of bits, so the type of the data is important to understand what it means. We have seen several different types of data already: Numbers, Booleans, and Procedures (we use initial capital letters to signify a datatype).

A datatype defines a set (often infinite) of possible values. The Boolean datatype contains the two Boolean values, true and false. The Number type includes the infinite set of all whole numbers (it also includes negative numbers and rational numbers). We think of the set of possible Numbers as infinite, even though on any particular computer there is some limit to the amount of memory available, and hence, some largest number that can be represented. On any real computer, the number of possible values of any data type is always finite. But, we can imagine a computer large enough to represent any given number.

The type of a value determines what can be done with it. For example, a Number can be used as one of the inputs to the primitive procedures +, *, and =. A Boolean can be used as the first subexpression of an if expression and as the input to the not procedure (not can also take a Number as its input, but for all Number value inputs the output is false), but cannot be used as the input to +, *, or =.1

A Procedure can be the first subexpression in an application expression. There are infinitely many different types of Procedures, since the type of a Procedure depends on its input and output types. For example, recall bigger procedure from Chapter 3:

bigger (define (bigger a b) (if ( > a b) a b))

It takes two Numbers as input and produces a Number as output. We denote this type as:

Number $\times $ Number $\rightarrow $ Number

The inputs to the procedure are shown on the left side of the arrow. The type of each input is shown in order, separated by the $\times $ symbol.2 The output type is given on the right side of the arrow.

From its definition, it is clear that the bigger procedure takes two inputs from its parameter list. How do we know the inputs must be Numbers and the output is a Number?

The body of the bigger procedure is an if expression with the predicate expression (> a b). This applies the > primitive procedure to the two inputs. The type of the > procedure is Number $\times $ Number $\rightarrow $ Boolean. So, for the predicate expression to be valid, its inputs must both be Numbers. This means the input values to bigger must both be Numbers. We know the output of the bigger procedure will be a Number by analyzing the consequent and alternate subexpressions: each evaluates to one of the input values, which must be a Number.

Starting with the primitive Boolean, Number, and Procedure types, we can build arbitrarily complex datatypes. This chapter introduces mechanisms for building complex datatypes by combining the primitive datatypes.

Exercise 5.1 Describe the type of each of these expressions.

  1. 17
  2.  (lambda (a) (> a 0))
  3.  ((lambda (a) (> a 0)) 3)
  4.  (lambda (a) (lambda (b) ( > a b)))
  5.  (lambda (a) a)

Exercise 5.2 Define or identify a procedure that has the given type.

  1. Number $\times $ Number $\rightarrow $ Boolean
  2. Number $\rightarrow $ Number
  3. (Number $\rightarrow $ Number) $\times $ (Number $\rightarrow $ Number) $\rightarrow $ (Number $\rightarrow $ Number)
  4. Number $\rightarrow $ (Number $\rightarrow $ (Number $\rightarrow $ Number))

  1. The primitive procedure equal? is a more general comparison procedure that can take as inputs any two values, so could be used to compare Boolean values. For example, (equal? false false) evaluates to true and (equal? true 3) is a valid expression that evaluates to false

  2. The notation using $\times $ to separate input types makes sense if you think about the number of different inputs to a procedure. For example, consider a procedure that takes two Boolean values as inputs, so its type is Boolean $\times $ Boolean $\rightarrow $ Value. Each Boolean input can be one of two possible values. If we combined both inputs into one input, there would be $2 \times 2$ different values needed to represent all possible inputs.