1.2.1 Information

Informally, we use information to mean knowledge. But to understand information quantitatively, as something we can measure, we need a more precise way to think about information.

The way computer scientists measure information is based on how what is known changes as a result of obtaining the information. The primary unit of information is a bit. One bit of information halves the amount of uncertainty. It is equivalent to answering a “yes” or “no” question, where either answer is equally likely beforehand. Before learning the answer, there were two possibilities; after learning the answer, there is one.

We call a question with two possible answers a binary question. Since a bit can have two possible values, we often represent the values as 0 and 1.

For example, suppose we perform a fair coin toss but do not reveal the result. Half of the time, the coin will land “heads”, and the other half of the time the coin will land “tails”. Without knowing any more information, our chances of guessing the correct answer are $\frac{1}{2}$. One bit of information would be enough to convey either “heads” or “tails”; we can use 0 to represent “heads” and 1 to represent “tails”. So, the amount of information in a coin toss is one bit.

Similarly, one bit can distinguish between the values 0 and 1:

Example 1.1: Dice

How many bits of information are there in the outcome of tossing a six-sided die?

There are six equally likely possible outcomes, so without any more information we have a one in six chance of guessing the correct value. One bit is not enough to identify the actual number, since one bit can only distinguish between two values. We could use five binary questions like this:

This is quite inefficient, though, since we need up to five questions to identify the value (and on average, expect to need $3\frac{1}{3}$ questions). Can we identify the value with fewer than 5 questions?

Our goal is to identify questions where the “yes” and “no” answers are equally likely—that way, each answer provides the most information possible. This is not the case if we start with, “Is the value 6?”, since that answer is expected to be “yes” only one time in six. Instead, we should start with a question like, “Is the value at least 4?”. Here, we expect the answer to be “yes” one half of the time, and the “yes” and “no” answers are equally likely. If the answer is “yes”, we know the result is 4, 5, or 6. With two more bits, we can distinguish between these three values (note that two bits is actually enough to distinguish among four different values, so some information is wasted here). Similarly, if the answer to the first question is no, we know the result is 1, 2, or 3. We need two more bits to distinguish which of the three values it is. Thus, with three bits, we can distinguish all six possible outcomes.

Three bits can convey more information that just six possible outcomes, however. In the binary question tree, there are some questions where the answer is not equally likely to be “yes” and “no” (for example, we expect the answer to “Is the value 3?” to be “yes” only one out of three times). Hence, we are not obtaining a full bit of information with each question.

Each bit doubles the number of possibilities we can distinguish, so with three bits we can distinguish between $2 * 2 * 2 = 8$ possibilities. In general, with $n$ bits, we can distinguish between $2^ n$ possibilities. Conversely, distinguishing among $k$ possible values requires $\log _2 k$ bits. The *logarithm* is defined such that if $a=b^ c$ then $\log _ b a = c$. Since each bit has two possibilities, we use the logarithm base 2 to determine the number of bits needed to distinguish among a set of distinct possibilities. For our six-sided die, $\log _2 6 \approx 2.58$, so we need approximately 2.58 binary questions. But, questions are discrete: we can’t ask $0.58$ of a question, so we need to use three binary questions.

Trees.Figure 1.1 depicts a structure of binary questions for distinguishing among eight values. We call this structure a binary tree. We will see many useful applications of tree-like structures in this book.

Computer scientists draw trees upside down. The root is the top of the tree, and the leaves are the numbers at the bottom (0, 1, 2, $\ldots $, 7). There is a unique path from the root of the tree to each leaf. Thus, we can describe each of the eight possible values using the answers to the questions down the tree. For example, if the answers are “No”, “No”, and “No”, we reach the leaf 0; if the answers are “Yes”, “No”, “Yes”, we reach the leaf 5. Since there are no more than two possible answers for each node, we call this a binary tree.

We can describe any non-negative integer using bits in this way, by just adding additional levels to the tree. For example, if we wanted to distinguish between 16 possible numbers, we would add a new question, “Is is $ > =$ 8?” to the top of the tree. If the answer is “No”, we use the tree in Figure 1.1 to distinguish numbers between 0 and 7. If the answer is “Yes”, we use a tree similar to the one in Figure 1.1, but add 8 to each of the numbers in the questions and the leaves.

The depth of a tree is the length of the longest path from the root to any leaf. The example tree has depth three. A binary tree of depth $d$ can distinguish up to $2^ d$ different values.

Figure 1.1: Using three bits to distinguish eight possible values.

Units of Information. One byte is defined as eight bits. Hence, one byte of information corresponds to eight binary questions, and can distinguish among $2^8$ (256) different values. For larger amounts of information, we use metric prefixes, but instead of scaling by factors of 1000 they scale by factors of $2^{10}$ (1024). Hence, one kilobyte is 1024 bytes; one megabyte is $2^{20}$ (approximately one million) bytes; one gigabyte is $2^{30}$ (approximately one billion) bytes; and one terabyte is $2^{40}$ (approximately one trillion) bytes.

Exercise 1.1.  Draw a binary tree with the minimum possible depth to:

  1. Distinguish among the numbers $0, 1, 2, \ldots , 15$.

  2. Distinguish among the 12 months of the year.

Exercise 1.2.  How many bits are needed:

  1. To uniquely identify any currently living human?

  2. To uniquely identify any human who ever lived?

  3. To identify any location on Earth within one square centimeter?

  4. To uniquely identify any atom in the observable universe?

Exercise 1.3.  The examples all use binary questions for which there are two possible answers. Suppose instead of basing our decisions on bits, we based it on trits where one trit can distinguish between three equally likely values. For each trit, we can ask a ternary question (a question with three possible answers).

  1. How many trits are needed to distinguish among eight possible values? (A convincing answer would show a ternary tree with the questions and answers for each node, and argue why it is not possible to distinguish all the values with a tree of lesser depth.)

  2. $\left[\star \right]$Devise a general formula for converting between bits and trits. How many trits does it require to describe $b$ bits of information?

Exploration 1.1: Guessing Numbers

The guess-a-number game starts with one player (the chooser) picking a number between 1 and 100 (inclusive) and secretly writing it down. The other player (the guesser) attempts to guess the number. After each guess, the chooser responds with “correct” (the guesser guessed the number and the game is over), “higher” (the actual number is higher than the guess), or “lower” (the actual number is lower than the guess).

  1. Explain why the guesser can receive slightly more than one bit of information for each response.

  2. Assuming the chooser picks the number randomly (that is, all values between 1 and 100 are equally likely), what are the best first guesses? Explain why these guesses are better than any other guess. (Hint: there are two equally good first guesses.)

  3. What is the maximum number of guesses the second player should need to always find the number?

  4. What is the average number of guesses needed (assuming the chooser picks the number randomly as before)?

  5. $\left[\star \right]$Suppose instead of picking randomly, the chooser picks the number with the goal of maximizing the number of guesses the second player will need. What number should she pick?

  6. $\left[\star \star \right]$How should the guesser adjust her strategy if she knows the chooser is picking adversarially?

  7. $\left[\star \star \right]$What are the best strategies for both players in the adversarial guess-a-number game where chooser’s goal is to pick a starting number that maximizes the number of guesses the guesser needs, and the guesser’s goal is to guess the number using as few guesses as possible.

Exploration 1.2: Twenty Questions

The two-player game twenty questions starts with the first player (the answerer) thinking of an object, and declaring if the object is an animal, vegetable, or mineral (meant to include all non-living things). After this, the second player (the questioner), asks binary questions to try and guess the object the first player thought of. The first player answers each question “yes” or “no”. The website http://www.20q.net/ offers a web-based twenty questions game where a human acts as the answerer and the computer as the questioner. The game is also sold as a $10 stand-alone toy (shown in the picture).

20Q Game Image from ThinkGeek

  1. How many different objects can be distinguished by a perfect questioner for the standard twenty questions game?

  2. What does it mean for the questioner to play perfectly?

  3. Try playing the 20Q game at http://www.20q.net. Did it guess your item?

  4. Instead of just “yes” and “no”, the 20Q game offers four different answers: “Yes”, “No”, “Sometimes”, and “Unknown”. (The website version of the game also has “Probably”, “Irrelevant”, and “Doubtful”.) If all four answers were equally likely (and meaningful), how many items could be distinguished in 20 questions?

  5. For an Animal, the first question 20Q sometimes asks is “Does it jump?” (20Q randomly selected from a few different first questions). Is this a good first question?

  6. $\left[\star \right]$How many items do you think 20Q has data for?

  7. $\left[\star \star \right]$Speculate on how 20Q could build up its database.