2.1 Surface forms and meanings

A language is a set of surface forms and meanings, together with a mapping between the surface forms and their associated meanings. In the earliest human languages, the surface forms were sounds, but surface forms can be anything the communicating parties can perceive, such as drum beats, hand gestures, or pictures.

A natural language is a language spoken by humans, such as English or Swahili. Natural languages are very complex since they have evolved over many thousands of years of individual and cultural interaction. We focus on designed languages that are created by humans for a specific purpose, such as expressing procedures to be executed by computers.

In particular, we focus on languages where the surface forms are text. In a textual language, the surface forms are linear sequences of characters. A string is a sequence of zero or more characters. Each character is a symbol drawn from a finite set known as an alphabet. For English, the alphabet is the set $\{ a, b, c, \ldots, z \}$ (for the full language, capital letters, numerals, and punctuation symbols are also needed).
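To make these definitions concrete, here is a minimal Python sketch that treats an alphabet as a finite set of characters and a string as any sequence of zero or more characters drawn from it. The names `ALPHABET`, `is_string_over`, and `strings_of_length` are hypothetical helpers chosen for illustration, and the alphabet is restricted to the 26 lowercase letters as in the text.

```python
from itertools import product

# Only the 26 lowercase letters; as the text notes, full English would also
# need capital letters, numerals, and punctuation symbols.
ALPHABET = frozenset("abcdefghijklmnopqrstuvwxyz")


def is_string_over(s: str, alphabet: frozenset) -> bool:
    """True if every character of s is a symbol drawn from the alphabet.

    The empty string qualifies, since a string has zero or more characters.
    """
    return all(ch in alphabet for ch in s)


def strings_of_length(n: int, alphabet: frozenset) -> list:
    """All strings of exactly n characters over the alphabet, in sorted order."""
    return ["".join(chars) for chars in product(sorted(alphabet), repeat=n)]


print(is_string_over("drum", ALPHABET))     # True
print(is_string_over("Drum", ALPHABET))     # False: 'D' is not in the alphabet
print(is_string_over("", ALPHABET))         # True: the empty string
print(len(strings_of_length(2, ALPHABET)))  # 676, which is 26 ** 2
```

Because the alphabet is finite, there are only finitely many strings of any fixed length (26 to the power n here), yet the set of all strings, with no bound on length, is infinite.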

A simple communication system can be described using a table of surface forms and their associated meanings. For example, this table describes a communication system between traffic lights and drivers:

Surface Form    Meaning
Green           Go
Yellow          Caution
Red             Stop
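A table like this amounts to a finite lookup structure. The sketch below assumes a Python dictionary as that structure; `TRAFFIC_LIGHT` and `meaning_of` are hypothetical names, and "Flashing Red" is an invented input used only to show what happens when a surface form is not listed.

```python
from typing import Optional

# The traffic-light table: each surface form maps to exactly one meaning.
TRAFFIC_LIGHT = {
    "Green": "Go",
    "Yellow": "Caution",
    "Red": "Stop",
}


def meaning_of(surface_form: str) -> Optional[str]:
    """Look up a surface form; None means the table has no entry for it."""
    return TRAFFIC_LIGHT.get(surface_form)


print(meaning_of("Red"))           # Stop
print(meaning_of("Flashing Red"))  # None: not listed, so no meaning
```

Every meaning this system can convey must appear as a value in the dictionary; a surface form without an entry simply has no meaning.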

Communication systems involving humans are notoriously imprecise and subjective. A driver and a police officer may disagree on the actual meaning of the Yellow symbol, and may even disagree on which symbol is being transmitted by the traffic light at a particular time. Communication systems for computers demand precision: we want to know what our programs will do, so it is important that every step they take is understood precisely and unambiguously.

The method of defining a communication system by listing a table of $\langle \textit{Symbol}, \textit{Meaning} \rangle$ pairs can work adequately only for trivial communication systems. The number of possible meanings that can be expressed is limited by the number of entries in the table. It is impossible to express any new meaning since all meanings must already be listed in the table!

Languages and Infinity. A useful language must be able to express infinitely many different meanings. Hence, there must be a way to generate new surface forms and guess their meanings (see Exercise 2.1). No finite representation, such as a printed table, can contain all the surface forms and meanings in an infinite language. One way to generate infinitely large sets is to use repeating patterns. For example, most humans would interpret the notation “1, 2, 3, …” as the set of all natural numbers. We interpret the “…” as meaning keep doing the same thing forever; in this case, keep adding one to the preceding number. Thus, with only a few numbers and symbols we can describe a set containing infinitely many numbers. As discussed in Section 1.2.1, the language of the natural numbers is enough to encode all meanings in any countable set. But finding a sensible mapping between most meanings and numbers is nearly impossible: the surface forms do not correspond closely enough to the ideas we want to express to be a useful language.
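Both ideas in this paragraph can be sketched in Python: a finite rule that describes the infinite sequence “1, 2, 3, …”, and an encoding that assigns each string over the lowercase alphabet its own natural number. This is an illustrative sketch, not the construction from Section 1.2.1; the bijective base-26 numbering used by the hypothetical `encode` and `decode` functions is just one convenient choice among many.

```python
from itertools import islice
from typing import Iterator

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def naturals() -> Iterator[int]:
    """Finite rule for "1, 2, 3, ...": start at 1 and keep adding one forever."""
    n = 1
    while True:
        yield n
        n += 1


def encode(s: str, alphabet: str = ALPHABET) -> int:
    """Give every string over the alphabet its own natural number.

    Bijective base-k numbering: "" -> 0, "a" -> 1, ..., "z" -> 26, "aa" -> 27.
    Characters outside the alphabet raise ValueError (from str.index).
    """
    k = len(alphabet)
    n = 0
    for ch in s:
        n = n * k + alphabet.index(ch) + 1
    return n


def decode(n: int, alphabet: str = ALPHABET) -> str:
    """Inverse of encode: recover the unique string numbered n."""
    k = len(alphabet)
    chars = []
    while n > 0:
        n, r = divmod(n - 1, k)
        chars.append(alphabet[r])
    return "".join(reversed(chars))


print(list(islice(naturals(), 5)))  # [1, 2, 3, 4, 5]
print(encode("go"))                 # 197
print(decode(197))                  # go
```

The encoding is one-to-one in both directions, so every string gets a number and every number names a string, but the number 197 says nothing about going. That gap between surface form and idea is why numbers alone do not make a useful language.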