11.1.4 Control Statements

Python provides many built-in data types. We describe three of the most useful data types here: lists, strings, and dictionaries.

Lists. Python provides a list datatype similar to lists in Scheme, except instead of building lists from simpler parts (that is, using cons pairs in Scheme), the Python list type is a built-in datatype. The other important difference is that Python lists are mutable like mlist from Section 9.3.

Lists are denoted in Python using square brackets. For example, [] denotes an empty list and [1, 2] denotes a list containing two elements. The elements in a list can be of any type (including other lists).

Elements can be selected from a list using the list subscript expression:

A subscript expression evaluates to the element indexed by value of the inner expression from the list. For example,

The expression p[0] in Python is analogous to (car p) in Scheme.

The subscript expression has constant running time; unlike indexing Scheme lists, the time required does not depend on the length of the list even if the selection index is the end of the list. The reason for this is that Python stores lists internally differently from how Scheme stores as chains of pairs. The elements of a Python list are stored as a block in memory, so the location of the $k^{th}$ element can be calculated directly by adding $k$ times the size of one element to the location of the start of the list.

A subscript expression can also select a range of elements from the list:

Subscript expressions with ranges evaluate to a list containing the elements between the low bound and the high bound. If the low bound is missing, the low bound is the beginning of the list. If the high bound is missing, the high bound is the end of the list. For example,

The expression p[1:] in Python is analogous to (cdr p) in Scheme.

Python lists are mutable (the value of a list can change after it is created). We can use list subscripts as the targets for an assignment expression:

Assignments using ranges as targets can add elements to the list as well as changing the values of existing elements:

In the tokenize procedure, we use tokens = [] to initialize tokens to an empty list, and use tokens.append(current) to append an element to the tokens list. The Python append procedure is similar to the mlist-append! procedure (except it works on the empty list, where there is no way in Scheme to modify the null input list).

Strings. The other datatype used in tokenize is the string datatype, named str in Python. As in Scheme, a String is a sequence of characters. Unlike Scheme strings which are mutable, the Python str datatype is immutable. Once a string is created its value cannot change. This means all the string methods that seem to change the string values actually return new strings (for example, capitalize() returns a copy of the string with its first letter capitalized).

Strings can be enclosed in single quotes (e.g., 'hello'), double quotes (e.g., ''hello''), and triple-double quotes (e.g., '' '' ''hello'' '' ''; a string inside triple quotes can span multiple lines). In our example program, we use the assignment expression, current = ' ' (two single quotes), to initialize the value of current to the empty string. The input, s, is a string object.

The addition operator can be used to concatenate two strings. In tokenize, we use current = current + c to update the value of current to include a new character. Since strings are immutable there is no string method analogous to the list append method. Instead, appending a character to a string involves creating a new string object.

Dictionaries. A dictionary is a lookup-table where values are associated with keys. The keys can be any immutable type (strings and numbers are commonly used as keys); the values can be of any type. We did not use the dictionary type in tokenize, but it is very useful for implementing frames in the evaluator.

A dictionary is denoted using curly brackets. The empty dictionary is {}. We add a key-value pair to the dictionary using an assignment where the left side is a subscript expression that specifies the key and the right side is the value assigned to that key. For example,

birthyear = {}
birthyear['Euclid'] = '300BC'
birthyear['Ada'] = 1815
birthyear['Alan Turing'] = 1912
birthyear['Alan Kay'] = 1940

defines birthyear as a dictionary containing four entries. The keys are all strings; the values are numbers, except for Euclid's entry which is a string.

We can obtain the value associated with a key in the dictionary using a subscript expression. For example, birthyear['Alan Turing'] evaluates to 1912. We can replace the value associated with a key using the same syntax as adding a key-value pair to the dictionary. The statement,

birthyear['Euclid'] = -300

replaces the value of birthyear['Euclid'] with the number -300.

The dictionary type also provides a method has_key that takes one input and produces a Boolean indicating if the dictionary object contains the input value as a key. For the birthyear dictionary,

The dictionary type lookup and update operations have approximately constant running time: the time it takes to lookup the value associated with a key does not scale as the size of the dictionary increases. This is done by computing a number based on the key that determines where the associated value would be stored (if that key is in the dictionary). The number is used to index into a structure similar to a Python list (so it has constant time to retrieve any element). Mapping keys to appropriate numbers to avoid many keys mapping to the same location in the list is a difficult problem, but one the Python dictionary object does well for most sets of keys.