5.6 Data abstraction

The mechanisms we have for constructing and manipulating complex data structures are valuable because they enable us to think about programs closer to the level of the problem we are solving than the low level of how data is stored and manipulated in the computer. Our goal is to hide unnecessary details about how data is represented so we can focus on the important aspects of what the data means and what we need to do with it to solve our problem. The technique of hiding how data is represented from how it is used is known as data abstraction.

The datatypes we have seen so far are not very abstract. We have datatypes for representing Pairs, triples, and Lists, but we want datatypes for representing objects closer to the level of the problem we want to solve. A good data abstraction is abstract enough to be used without worrying about details like which cell of the Pair contains which datum and how to access the different elements of a List. Instead, we want to define procedures with meaningful names that manipulate the relevant parts of our data.

The rest of this section is an extended example that illustrates how to solve problems by first identifying the objects we need to model the problem, and then implementing data abstractions that represent those objects. Once the appropriate data abstractions are designed and implemented, the solution to the problem often follows readily. This example also uses many of the list procedures defined earlier in this chapter.

Exploration 5.2: Pegboard Puzzle

For this exploration, we develop a program to solve the infamous pegboard puzzle, often found tormenting unsuspecting diners at pancake restaurants. The standard puzzle is a one-player game played on a triangular board with fifteen holes with pegs in all of the holes except one.

The goal is to remove all but one of the pegs by jumping pegs over one another. A peg may jump over an adjacent peg only when there is a free hole on the other side of the peg. The jumped peg is removed. The game ends when there are no possible moves. If there is only one peg remaining, the player wins (according to the Cracker Barrel version of the game, “Leave only one—you’re genius”). If more than one peg remains, the player loses (“Leave four or more’n you’re just plain ‘eg-no-ra-moose’.”).

Figure 5.1: Pegboard Puzzle. The blue peg can jump the red peg as shown, removing the red peg. The resulting position is a winning position.

Our goal is to develop a program that finds a winning solution to the pegboard game from any winnable starting position. We use a brute force approach: try all possible moves until we find one that works. Brute force solutions only work on small-size problems. Because they have to try all possibilities they are often too slow for solving large problems, even on the most powerful computers imaginable.1

The first thing to think about to solve a complex problem is what datatypes we need. We want datatypes that represent the things we need to model in our problem solution. For the pegboard game, we need to model the board with its pegs. We also need to model actions in the game like a move (jumping over a peg). The important thing about a datatype is what you can do with it. To design our board datatype we need to think about what we want to do with a board. In the physical pegboard game, the board holds the pegs. The important property we need to observe about the board is which holes on the board contain pegs. For this, we need a way of identifying board positions. We define a datatype for representing positions first, then a datatype for representing moves, and a datatype for representing the board. Finally, we use these datatypes to define a procedure that finds a winning solution.

Position. We identify the board positions using row and column numbers:

(1 1)
(2 1) (2 2)
(3 1) (3 2) (3 3)
(4 1) (4 2) (4 3) (4 4)
(5 1) (5 2) (5 3) (5 4) (5 5)

A position has a row and a column, so we could just use a Pair to represent a position. This would work, but we prefer to have a more abstract datatype so we can think about a position’s row and column, rather than thinking that a position is a Pair and using the car and cdr procedures to extract the row and column from the position.

Our Position datatype should provide at least these operations:

  1. make-position: Number $\times $ Number $\rightarrow $ Position :  
    Creates a Position representing the row and column given by the input numbers.

  2. position-get-row: Position $\rightarrow $ Number :  
    Outputs the row number of the input Position.

  3. position-get-column: Position $\rightarrow $ Number :  
    Outputs the column number of the input Position.

Since the Position needs to keep track of two numbers, a natural way to implement the Position datatype is to use a Pair. A more defensive implementation of the Position datatype uses a tagged list. With a tagged list, the first element of the list is a tag denoting the datatype it represents. All operations check the tag is correct before proceeding. We can use any type to encode the list tag, but it is most convenient to use the built-in Symbol type. Symbol Symbols are a quote () followed by a sequence of characters. The important operation we can do with a Symbol, is test whether it is an exact match for another symbol using the eq? procedure.

We define the tagged list datatype, tlist, using the list-get-element procedure from Example 5.3:

(define (make-tlist tag p) (cons tag p))
(define (tlist-get-tag p) (car p))

(define (tlist-get-element tag p n)
        (if (eq? (tlist-get-tag p) tag)
                (list-get-element (cdr p) n)
                (error (format "Bad tag:  a (expected  a)"
                        (tlist-get-tag p) tag))))

The format procedure is a built-in procedure similar to the printf procedure described in Section 4.5.1. Instead of printing as a side effect, format produces a String. For example, (format "list:  a number:  a." (list 1 2 3) 123) evaluates to the String "list: (1 2 3) number: 123.".

This is an example of defensive programming.defensive programming Using our tagged lists, if we accidentally attempt to use a value that is not a Position as a position, we will get a clear error message instead of a hard-to-debug error (or worse, an unnoticed incorrect result).

Using the tagged list, we define the Position datatype as:

(define (make-position row col) (make-tlist ’Position (list row col))) (define (position-get-row posn) (tlist-get-element ’Position posn 1)) (define (position-get-column posn) (tlist-get-element ’Position posn 2))

Here are some example interactions with our Position datatype:

> (define pos (make-position 2 1))  
> pos  
(Position 2 1)  
> (get-position-row pos)  
2  
> (get-position-row (list 1 2))  

Bad tag: 1 (expected Position)Error since input is not a Position.

Move. A move involves three positions: where the jumping peg starts, the position of the peg that is jumped and removed, and the landing position. One possibility would be to represent a move as a list of the three positions. A better option is to observe that once any two of the positions are known, the third position is determined. For example, if we know the starting position and the landing position, we know the jumped peg is at the position between them. Hence, we could represent a jump using just the starting and landing positions.

Another possibility is to represent a jump by storing the starting Position and the direction. This is also enough to determine the jumped and landing positions. This approach avoids the difficulty of calculating jumped positions. To do it, we first design a Direction datatype for representing the possible move directions. Directions have two components: the change in the column (we use 1 for right and -1 for left), and the change in the row (1 for down and -1 for up).

We implement the Direction datatype using a tagged list similarly to how we defined Position:

(define (make-direction right down)
        (make-tlist ’Direction (list right down)))
(define (direction-get-horizontal dir) (tlist-get-element ’Direction dir 1)) (define (direction-get-vertical dir) (tlist-get-element ’Direction dir 2))

The Move datatype is defined using the starting position and the jump direction:

(define (make-move start direction)
        (make-tlist ’Move (list start direction)))
        (define (move-get-start move) (tlist-get-element ’Move move 1))
        (define (move-get-direction move) (tlist-get-element ’Move move 2))

We also define procedures for getting the jumped and landing positions of a move. The jumped position is the result of moving one step in the move direction from the starting position. So, it will be useful to define a procedure that takes a Position and a Direction as input, and outputs a Position that is one step in the input Direction from the input Position.

(define (direction-step pos dir)
        (make-position
                (+ (position-get-row pos) (direction-get-vertical dir))
                (+ (position-get-column pos) (direction-get-horizontal dir))))

Using direction-step we can implement procedure to get the middle and landing positions.

(define (move-get-jumped move)
        (direction-step (move-get-start move) (move-get-direction move)))
(define (move-get-landing move)
        (direction-step (move-get-jumped move) (move-get-direction move)))

Board. The board datatype represents the current state of the board. It keeps track of which holes in the board contain pegs, and provides operations that model adding and removing pegs from the board:

  1. make-board: Number $\rightarrow $ Board :  
    Outputs a board full of pegs with the input number of rows. (The standard physical board has 5 rows, but our datatype supports any number of rows.)

  2. board-rows: Board $\rightarrow $ Number :  
    Outputs the number of rows in the input board.

  3. board-valid-position?: Board $\times $ Position $\rightarrow $ Boolean  
    Outputs true if input Position corresponds to a position on the Board; otherwise, false.

  4. board-is-winning?: Board $\rightarrow $ Boolean  
    Outputs true if the Board represents a winning position (exactly one peg); otherwise, false.

  5. board-contains-peg?: Position $\rightarrow $ Boolean  
    Outputs true if the hole at the input Position contains a peg; otherwise, false.

  6. board-add-peg: Board $\times $ Position $\rightarrow$ Board  
    Output a Board containing all the pegs of the input Board and one additional peg at the input Position. If the input Board already has a peg at the input Position, produces an error.

  7. board-remove-peg: Board $\times $ Position $\rightarrow $ Board  
    Outputs a Board containing all the pegs of the input Board except for the peg at the input Position. If the input Board does not have a peg at the input Position, produces an error.

The procedures for adding and removing pegs change the state of the board to reflect moves in the game, but nothing we have seen so far, however, provides a means for changing the state of an existing object.2 So, instead of defining these operations to change the state of the board, they actually create a new board that is different from the input board by the one new peg. These procedures take a Board and Position as inputs, and produce as output a Board.

There are lots of different ways we could represent the Board. One possibility is to keep a List of the Positions of the pegs on the board. Another possibility is to keep a List of the Positions of the empty holes on the board. Yet another possibility is to keep a List of Lists, where each List corresponds to one row on the board. The elements in each of the Lists are Booleans representing whether or not there is a peg at that position. The good thing about data abstraction is we could pick any of these representations and change it to a different representation later (for example, if we needed a more efficient board implementation). As long as the procedures for implementing the Board are updated the work with the new representation, all the code that uses the board abstraction should continue to work correctly without any changes.

We choose the third option and represent a Board using a List of Lists where each element of the inner lists is a Boolean indicating whether or not the corresponding position contains a peg. So, make-board evaluates to a List of Lists, where each element of the List contains the row number of elements and all the inner elements are true (the initial board is completely full of pegs). First, we define a procedure make-list-of-constants that takes two inputs, a Number, n, and a Value, val. The output is a List of length n where each element has the value val.

(define (make-list-of-constants n val)
        (if (= n 0) null (cons val (make-list-of-constants (- n 1) val))))

To make the initial board, we use make-list-of-constants to make each row of the board. As usual, a recursive problem solving strategy works well: the simplest board is a board with zero rows (represented as the empty list); for each larger board, we add a row with the right number of elements.

The tricky part is putting the rows in order. This is similar to the problem we faced with intsto, and a similar solution using append-list works here:

(define (make-board rows)
        (if (= rows 0) null
                (list-append (make-board (- rows 1))
                        (list (make-list-of-constants rows true)))))

Evaluating (make-board 3) produces ((true) (true true) (true true true)).

The board-rows procedure takes a Board as input and outputs the number of rows on the board.

(define (board-rows board) (length board))

The board-valid-position? indicates if a Position is on the board. A position is valid if its row number is between 1 and the number of rows on the board, and its column numbers is between 1 and the row number.

(define (board-valid-position? board pos)
        (and ( > = (position-get-row pos) 1) ( > = (position-get-column pos) 1)
                 ( < = (position-get-row pos) (board-rows board))
                 ( < = (position-get-column pos) (position-get-row pos))))

We need a way to check if a Board represents a winning solution (that is, contains only one peg). We implement a more general procedure to count the number of pegs on a board first. Our board representation used true to represent a peg. To count the pegs, we first map the Boolean values used to represent pegs to 1 if there is a peg and 0 if there is no peg. Then, we use sum-list to count the number of pegs. Since the Board is a List of Lists, we first use list-flatten to put all the pegs in a single List.

(define (board-number-of-pegs board)
(list-sum
        (list-map (lambda (peg) (if peg 1 0)) (list-flatten board))))

A board is a winning board if it contains exactly one peg:

(define (board-is-winning? board)
        (= (board-number-of-pegs board) 1))

The board-contains-peg? procedure takes a Board and a Position as input, and outputs a Boolean indicating whether or not that Position contains a peg. To implement board-contains-peg? we need to find the appropriate row in our board representation, and then find the element in its list corresponding to the position’s column. The list-get-element procedure (from Example 5.3) does exactly what we need. Since our board is represented as a List of Lists, we need to use it twice: first to get the row, and then to select the column within that row:

(define (board-contains-peg? board pos)
        (list-get-element (list-get-element board (position-get-row pos))
                (position-get-column pos)))

Defining procedures for adding and removing pegs from the board is more complicated. Both of these procedures need to make a board with every row identical to the input board, except the row where the peg is added or removed. For that row, we need to replace the corresponding value. Hence, instead of defining separate procedures for adding and removing we first implement a more general board-replace-peg procedure that takes an extra parameter indicating whether a peg should be added or removed at the selected position.

First we consider the subproblem of replacing a peg in a row. The procedure row-replace-peg takes as input a List representing a row on the board and a Number indicating the column where the peg should be replaced. We can define row-replace-peg recursively: the base case is when the modified peg is at the beginning of the row (the column number is 1); in the recursive case, we copy the first element in the List, and replace the peg in the rest of the list. The third parameter indicates if we are adding or removing a peg. Since true values represent holes with pegs, a true value indicates that we are adding a peg and false means we are removing a peg.

(define (row-replace-peg pegs col val)
        (if (= col 1)
                (cons val (cdr pegs))
                (cons (car pegs) (row-replace-peg (cdr pegs) (- col 1) val))))

To replace the peg on the board, we use row-replace-peg to replace the peg on the appropriate row, and keep all the other rows the same.

(define (board-replace-peg board row col val)
        (if (= row 1)
                (cons (row-replace-peg (car board) col val) (cdr board))
                (cons (car board) (board-replace-peg (cdr board) (- row 1) col val))))

Both board-add-peg and board-remove-peg can be defined simply using board-remove-peg. They first check if the operation is valid (adding is valid only if the selected position does not contain a peg, removing is valid only if the selected position contains a peg), and then use board-replace-peg to produce the modified board:

(define (board-add-peg board pos)
        (if (board-contains-peg? board pos)
                (error (format "Board already contains peg at position:  a" pos))
                (board-replace-peg board (position-get-row pos)
                                                                 (position-get-column pos) true)))
(define (board-remove-peg board pos)
        (if (not (board-contains-peg? board pos))
                (error (format "Board does not contain peg at position:  a" pos))
                (board-replace-peg board (position-get-row pos)
                                                                 (position-get-column pos) false)))

We can now define a procedure that models making a move on a board. Making a move involves removing the jumped peg and moving the peg from the starting position to the landing position. Moving the peg is equivalent to removing the peg from the starting position and adding a peg to the landing position, so the procedures we defined for adding and removing pegs can be composed to model making a move. We add a peg landing position to the board that results from removing the pegs in the starting and jumped positions:

(define (board-execute-move board move)
(board-add-peg
        (board-remove-peg
                (board-remove-peg board (move-get-start move))
                (move-get-jumped move))
                (move-get-landing move)))

Finding Valid Moves. Now that we can model the board and simulate making jumps, we are ready to develop the solution. At each step, we try all valid moves on the board to see if any move leads to a winning position (that is, a position with only one peg remaining). So, we need a procedure that takes a Board as its input and outputs a List of all valid moves on the board. We break this down into the problem of producing a list of all conceivable moves (all moves in all directions from all starting positions on the board), filtering that list for moves that stay on the board, and then filtering the resulting list for moves that are legal (start at a position containing a peg, jump over a position containing a peg, and land in a position that is an empty hole).

First, we generate all conceivable moves by creating moves starting from each position on the board and moving in all possible move directions. We break this down further: the first problem is to produce a List of all positions on the board.

We can generate a list of all row numbers using the intsto procedure (from Example 5.8). To get a list of all the positions, we need to produce a list of the positions for each row. We do this by mapping each row to the corresponding list:

(define (all-positions-helper board)
        (list-map (lambda (row) (list-map (lambda (col) (make-position row col))
                                                                          (intsto row)))
        (intsto (board-rows board)))

This almost does what we need, except instead of producing one List containing all the positions, it produces a List of Lists for the positions in each row. The list-flatten procedure (from Example 5.10) produces a flat list containing all the positions.

(define (all-positions board)
        (list-flatten (all-positions-helper board)))

For each Position, we find all possible moves starting from that position. We can move in six possible directions on the board: left, right, up-left, up-right, down-left, and down-right.

(define all-directions
        (list
                (make-direction -1 0) (make-direction 1 0) ; left, right
                (make-direction -1 -1) (make-direction 0 -1) ; up-left, up-right
                (make-direction 0 1) (make-direction 1 1))) ; down-left, down-right

For each position on the board, we create a list of possible moves starting at that position and moving in each possible move directions. This produces a List of Lists, so we use list-flatten to flatten the output of the list-map application into a single List of Moves.

(define (all-conceivable-moves board)
        (list-flatten
                (list-map
                        (lambda(pos) (list-map (lambda (dir) (make-move pos dir))
                                                                all-directions))
(all-positions board))))

The output produced by all-conceivable-moves includes moves that fly off the board. We use the list-filter procedure to remove those moves, to get the list of moves that stay on the board:

(define (all-board-moves board)
        (list-filter
                (lambda (move) (board-valid-position? board (move-get-landing move)))
                (all-conceivable-moves board)))

Finally, we need to filter out the moves that are not legal moves. A legal move must start at a position that contains a peg, jump over a position that contains a peg, and land in an empty hole. We use list-filter similarly to how we kept only the moves that stay on the board:

(define (all-legal-moves board)
        (list-filter
                (lambda (move)
                        (and
                                (board-contains-peg? board (move-get-start move))
                                (board-contains-peg? board (move-get-jumped move)) (not
                                (board-contains-peg? board (move-get-landing move)))))
        (all-board-moves board)))

Winning the Game. Our goal is to find a sequence of moves that leads to a winning position, starting from the current board. If there is a winning sequence of moves, we can find it by trying all possible moves on the current board. Each of these moves leads to a new board. If the original board has a winning sequence of moves, at least one of the new boards has a winning sequence of moves. Hence, we can solve the puzzle by recursively trying all moves until finding a winning position.

(define (solve-pegboard board)
        (if (board-is-winning? board)
                null ; no moves needed to reach winning position
                (try-moves board (all-legal-moves board))))

If there is a sequence of moves that wins the game starting from the input Board, solve-pegboard outputs a List of Moves representing a winning sequence. This could be null, in the case where the input board is already a winning board. If there is no sequence of moves to win from the input board, solve-pegboard outputs false.

It remains to define the try-moves procedure. It takes a Board and a List of Moves as inputs. If there is a sequence of moves that starts with one of the input moves and leads to a winning position it outputs a List of Moves that wins; otherwise, it outputs false.

The base case is when there are no moves to try. When the input list is null it means there are no moves to try. We output false to mean this attempt did not lead to a winning board. Otherwise, we try the first move. If it leads to a winning position, try-moves should output the List of Moves that starts with the first move and is followed by the rest of the moves needed to solve the board resulting from taking the first move (that is, the result of solve-pegboard applied to the Board resulting from taking the first move). If the first move doesn’t lead to a winning board, it tries the rest of the moves by calling try-moves recursively.

(define (try-moves board moves)
        (if (null? moves)
                false ; didn’t find a winner
                (if (solve-pegboard (board-execute-move board (car moves)))
                        (cons (car moves)
                                (solve-pegboard (board-execute-move board (car moves))))
                        (try-moves board (cdr moves)))))

Evaluating (solve-pegboard (make-board 5)) produces false since there is no way to win starting from a completely full board. Evaluating (solve-pegboard (board-remove-peg (make-board 5) (make-position 1 1))) takes about three minutes to produce this sequence of moves for winning the game starting from a 5-row board with the top peg removed:

((Move (Position 3 1) (Direction 0 -1))
 (Move (Position 3 3) (Direction -1 0))
 (Move (Position 1 1) (Direction 1 1))
 (Move (Position 4 1) (Direction 0 -1))
 ... ; 8 moves elided
 (Move (Position 5 1) (Direction 1 1)))
  1. $\left[\star \right]$Change the implementation to use a different Board representation, such as keeping a list of the Positions of each hole on the board. Only the procedures with names starting with board- should need to change when the Board representation is changed. Compare your implementation to this one.

  2. $\left[\star \right]$The standard pegboard puzzle uses a triangular board, but there is no reason the board has to be a triangle. Define a more general pegboard puzzle solver that works for a board of any shape.

  3. $\left[\star \star \right]$The described implementation is very inefficient. It does lots of redundant computation. For example, all-possible-moves evaluates to the same value every time it is applied to a board with the same number of rows. It is wasteful to recompute this over and over again to solve a given board. See how much faster you can make the pegboard solver. Can you make it fast enough to solve the 5-row board in less than half the original time? Can you make it fast enough to solve a 7-row board?

 


  1. The generalized pegboard puzzle is an example of a class of problems known as NP-Complete. This means it is not known whether or not any solution exists that is substantially better than the brute force solution, but it would be extraordinarily surprising (and of momentous significance!) to find one. 

  2. We will introduce mechanisms for changing state in Chapter 9. Allowing state to change breaks the substitution model of evaluation.