# 1.2.1 Information

Informally, we use *information* to mean knowledge. But to understand
information quantitatively, as something we can measure, we need a more
precise way to think about information.

The way computer scientists measure information is based on how what is
known changes as a result of obtaining the information. The primary unit
of information is a *bit*. *One bit* of information *halves* the amount
of uncertainty. It is equivalent to answering a “yes” or “no” question,
where either answer is equally likely beforehand. Before learning the
answer, there were two possibilities; after learning the answer, there
is one.

We call a question with two possible answers a *binary question*. Since
a bit can have two possible values, we often represent the values as 0
and 1.

For example, suppose we perform a fair coin toss but do not reveal the
result. Half of the time, the coin will land “heads”, and the other half
of the time the coin will land “tails”. Without knowing any more
information, our chances of guessing the correct answer are
$\frac{1}{2}$. One bit of information would be enough to convey
either “heads” or “tails”; we can use 0 to represent “heads” and 1
to represent “tails”. So, the amount of information in a coin toss is
one bit.

Similarly, one bit can distinguish between the values 0 and 1: a single
question, “Is it 1?”, identifies which value it is.

How many bits of information are there in the outcome of tossing a six-sided die?

There are six equally likely possible outcomes, so without any more information we have a one in six chance of guessing the correct value. One bit is not enough to identify the actual number, since one bit can only distinguish between two values. We could use five binary questions, asking in turn, “Is the value 1?”, “Is the value 2?”, “Is the value 3?”, “Is the value 4?”, and “Is the value 5?” (if all five answers are “no”, the value must be 6).

This is quite inefficient, though, since we need up to five questions to identify the value (and on average, expect to need $3\frac{1}{3}$ questions). Can we identify the value with fewer than five questions?
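To see where the average of $3\frac{1}{3}$ comes from: identifying the value $k$ takes $k$ questions for $k \le 5$, and identifying 6 takes the same five questions (all answered “no”), so the expected number of questions is

$$\frac{1 + 2 + 3 + 4 + 5 + 5}{6} = \frac{20}{6} = 3\frac{1}{3}.$$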

Our goal is to identify questions where the “yes” and “no” answers are
equally likely—that way, each answer provides the most information
possible. This is not the case if we start with, “Is the value 6?”,
since that answer is expected to be “yes” only one time in six. Instead,
we should start with a question like, “Is the value at least 4?”. Here,
we expect the answer to be “yes” half of the time, so the “yes” and
“no” answers are equally likely. If the answer is “yes”, we know the
result is 4, 5, or 6. With two more bits, we can distinguish between
these three values (note that two bits is actually enough to distinguish
among *four* different values, so some information is wasted here).
Similarly, if the answer to the first question is no, we know the result
is 1, 2, or 3. We need two more bits to distinguish which of the three
values it is. Thus, with three bits, we can distinguish all six possible
outcomes.
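To make the strategy concrete, here is a minimal Python sketch (the function and its predicate-style interface are our own invention, not from the text):

```python
def identify_die_roll(answer):
    """Identify a die roll (1-6) using at most three binary questions.

    `answer` takes a predicate on the hidden value and returns the
    truthful yes/no answer as True or False.
    """
    if answer(lambda v: v >= 4):      # "Is the value at least 4?"
        high, mid, low = 6, 5, 4
    else:
        high, mid, low = 3, 2, 1
    # Two more questions distinguish the three remaining values
    # (slightly wasteful: two bits could distinguish four values).
    if answer(lambda v: v == high):   # e.g., "Is the value 6?"
        return high
    if answer(lambda v: v == mid):    # e.g., "Is the value 5?"
        return mid
    return low

# Example: the hidden roll is 5; each question is applied to it.
roll = 5
assert identify_die_roll(lambda question: question(roll)) == 5
```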

Three bits can convey more information than just six possible outcomes, however. In the binary question tree, there are some questions where the answer is not equally likely to be “yes” and “no” (for example, we expect the answer to “Is the value 3?” to be “yes” only one out of three times). Hence, we are not obtaining a full bit of information with each question.

Each bit doubles the number of possibilities we can distinguish, so with three bits we can distinguish between $2 \times 2 \times 2 = 8$ possibilities. In general, with $n$ bits, we can distinguish between $2^n$ possibilities. Conversely, distinguishing among $k$ possible values requires $\log_2 k$ bits. The *logarithm* is defined such that if $a = b^c$ then $\log_b a = c$. Since each bit has two possibilities, we use the logarithm base 2 to determine the number of bits needed to distinguish among a set of distinct possibilities. For our six-sided die, $\log_2 6 \approx 2.58$, so we need approximately 2.58 binary questions. But, questions are discrete: we can’t ask $0.58$ of a question, so we need to use three binary questions.
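These counts are easy to verify (a small Python sketch; only the $\log_2 k$ formula comes from the text, the function name is ours):

```python
import math

def bits_needed(k):
    """Binary questions needed to distinguish among k equally likely
    values: log2(k), rounded up since questions are discrete."""
    return math.ceil(math.log2(k))

print(math.log2(6))    # 2.584962500721156
print(bits_needed(6))  # 3 (the six-sided die)
print(bits_needed(8))  # 3
print(bits_needed(2))  # 1 (the coin toss)
```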

**Trees.** Figure 1.1 depicts a
structure of binary questions for distinguishing among eight values. We
call this structure a *binary tree*. We will see many useful
applications of tree-like structures in this book.

Computer scientists draw trees upside down. The *root* is the top of the
tree, and the *leaves* are the numbers at the bottom (0, 1, 2, $\ldots$, 7).
There is a unique path from the root of the tree to each leaf.
Thus, we can describe each of the eight possible values using the
answers to the questions down the tree. For example, if the answers are
“No”, “No”, and “No”, we reach the leaf 0; if the answers are “Yes”,
“No”, “Yes”, we reach the leaf 5. Since there are no more than two
possible answers for each node, we call this a *binary* tree.
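Reading “Yes” as 1 and “No” as 0 along the path spells out the leaf’s value in binary, which matches both examples above. A minimal Python sketch (the function name is ours, assuming the Figure 1.1 tree):

```python
def leaf_for_answers(answers):
    """Map a root-to-leaf sequence of answers (True for "Yes",
    False for "No") in the Figure 1.1 tree to the leaf it reaches."""
    value = 0
    for answer in answers:
        value = 2 * value + (1 if answer else 0)
    return value

print(leaf_for_answers([False, False, False]))  # 0
print(leaf_for_answers([True, False, True]))    # 5
```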

We can describe any non-negative integer using bits in this way, by just adding additional levels to the tree. For example, if we wanted to distinguish between 16 possible numbers, we would add a new question, “Is it $\geq 8$?”, to the top of the tree. If the answer is “No”, we use the tree in Figure 1.1 to distinguish numbers between 0 and 7. If the answer is “Yes”, we use a tree similar to the one in Figure 1.1, but add 8 to each of the numbers in the questions and the leaves.

The *depth* of a tree is the length of the longest path from the root to
any leaf. The example tree has depth three. A binary tree of depth $d$
can distinguish up to $2^d$ different values.

**Units of Information.** One *byte* is defined as eight bits. Hence,
one byte of information corresponds to eight binary questions, and can
distinguish among $2^8$ (256) different values. For larger amounts of
information, we use metric prefixes, but instead of scaling by factors
of 1000 they scale by factors of $2^{10}$ (1024). Hence, one
*kilobyte* is 1024 bytes; one *megabyte* is $2^{20}$ (approximately
one million) bytes; one *gigabyte* is $2^{30}$ (approximately one
billion) bytes; and one *terabyte* is $2^{40}$ (approximately one
trillion) bytes.
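A quick check of these powers of two (a throwaway Python sketch, not from the text):

```python
# Binary-scaled unit sizes, as described above.
for prefix, power in [("kilo", 10), ("mega", 20), ("giga", 30), ("tera", 40)]:
    print(f"one {prefix}byte = 2**{power} = {2**power:,} bytes")
# one kilobyte = 2**10 = 1,024 bytes
# one megabyte = 2**20 = 1,048,576 bytes
# one gigabyte = 2**30 = 1,073,741,824 bytes
# one terabyte = 2**40 = 1,099,511,627,776 bytes
```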

**Exercise 1.1.** Draw a binary tree with the minimum possible depth to:

a. Distinguish among the numbers $0, 1, 2, \ldots, 15$.

b. Distinguish among the 12 months of the year.

**Exercise 1.2.** How many bits are needed:

a. To uniquely identify any currently living human?

b. To uniquely identify any human who ever lived?

c. To identify any location on Earth within one square centimeter?

d. To uniquely identify any atom in the observable universe?

**Exercise 1.3.** The examples all use binary questions for which there
are two possible answers. Suppose instead of basing our decisions on
bits, we based them on *trits*, where one trit can distinguish among
three equally likely values. For each trit, we can ask a ternary
question (a question with three possible answers).

a. How many trits are needed to distinguish among eight possible values? (A convincing answer would show a ternary tree with the questions and answers for each node, and argue why it is not possible to distinguish all the values with a tree of lesser depth.)

b. [$\star$] Devise a general formula for converting between bits and trits. How many trits are required to describe $b$ bits of information?

**Exercise 1.4.** The guess-a-number game starts with one player (the
*chooser*) picking a number between 1 and 100 (inclusive) and secretly
writing it down. The other player (the *guesser*) attempts to guess the
number. After each guess, the chooser responds with “correct” (the
guesser guessed the number and the game is over), “higher” (the actual
number is higher than the guess), or “lower” (the actual number is
lower than the guess). (A minimal simulation sketch follows the
questions below.)

a. Explain why the guesser can receive slightly more than one bit of information for each response.

b. Assuming the chooser picks the number randomly (that is, all values between 1 and 100 are equally likely), what are the best first guesses? Explain why these guesses are better than any other guess. (Hint: there are two equally good first guesses.)

c. What is the maximum number of guesses the second player should need to always find the number?

d. What is the average number of guesses needed (assuming the chooser picks the number randomly as before)?

e. [$\star$] Suppose instead of picking randomly, the chooser picks the number with the goal of maximizing the number of guesses the second player will need. What number should she pick?

f. [$\star\star$] How should the guesser adjust her strategy if she knows the chooser is picking adversarially?

g. [$\star\star$] What are the best strategies for both players in the adversarial guess-a-number game, where the chooser’s goal is to pick a starting number that maximizes the number of guesses the guesser needs, and the guesser’s goal is to guess the number using as few guesses as possible?
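As referenced above, here is a minimal Python simulation of the game (the always-guess-the-middle strategy is just one reasonable guesser; all names are our own):

```python
import random

def play_guess_a_number(low=1, high=100):
    """One game: the chooser picks uniformly at random; the guesser
    always guesses the middle of the remaining range."""
    secret = random.randint(low, high)   # the chooser's hidden number
    guesses = 0
    while True:
        guess = (low + high) // 2        # halve the remaining range
        guesses += 1
        if guess == secret:
            return guesses               # "correct"
        elif secret > guess:
            low = guess + 1              # "higher"
        else:
            high = guess - 1             # "lower"

# Estimate the average number of guesses over many random games.
trials = 10000
print(sum(play_guess_a_number() for _ in range(trials)) / trials)
```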

**Exercise 1.5.** The two-player game *twenty questions* starts with the
first player (the *answerer*) thinking of an object, and declaring if
the object is an animal, vegetable, or mineral (meant to include all
non-living things). After this, the second player (the *questioner*)
asks binary questions to try to guess the object the first player
thought of. The first player answers each question “yes” or “no”. The
website http://www.20q.net/ offers a web-based twenty questions game
where a human acts as the answerer and the computer as the questioner.
The game is also sold as a $10 stand-alone toy.

a. How many different objects can be distinguished by a perfect questioner for the standard twenty questions game?

b. What does it mean for the questioner to play perfectly?

c. Try playing the 20Q game at http://www.20q.net. Did it guess your item?

d. Instead of just “yes” and “no”, the 20Q game offers four different answers: “Yes”, “No”, “Sometimes”, and “Unknown”. (The website version of the game also has “Probably”, “Irrelevant”, and “Doubtful”.) If all four answers were equally likely (and meaningful), how many items could be distinguished in 20 questions?

e. For an Animal, the first question 20Q sometimes asks is “Does it jump?” (20Q randomly selects from a few different first questions). Is this a good first question?

f. [$\star$] How many items do you think 20Q has data for?

g. [$\star\star$] Speculate on how 20Q could build up its database.