# 2.2 Language construction

To define more expressive infinite languages, we need a richer system for constructing new surface forms and associated meanings. We need ways to describe languages that allow us to define an infinitely large set of surface forms and meanings with a compact notation. The approach we use is to define a language by defining a set of rules that produce exactly the set of surface forms in the language.
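As an illustration only, here is a minimal Python sketch of a language defined by rules rather than by a list of its sentences. The grammar, the `RULES` table, and the `expand` function are hypothetical and invented for this example. Three short rules describe infinitely many sentences because one alternative for `Sentence` contains `Sentence` again. Since the full set can never be listed, `expand` returns only the forms reachable within a fixed number of nested rule applications.

```python
# A hypothetical toy grammar: each nonterminal maps to its alternative
# right-hand sides. Any symbol without a rule is a terminal word.
RULES = {
    "Sentence": [["Noun", "Verb"], ["Noun", "Verb", "and", "Sentence"]],
    "Noun": [["Alice"], ["Bob"]],
    "Verb": [["runs"], ["jumps"]],
}


def expand(symbol, depth):
    """Return every word sequence derivable from `symbol` using at most
    `depth` levels of nested rule applications."""
    if symbol not in RULES:  # a terminal word stands for itself
        return [[symbol]]
    if depth == 0:  # out of budget: a nonterminal yields nothing
        return []
    results = []
    for alternative in RULES[symbol]:
        # Combine the expansions of each part of the right-hand side, left to right.
        partials = [[]]
        for part in alternative:
            partials = [p + s for p in partials for s in expand(part, depth - 1)]
        results.extend(partials)
    return results


for words in expand("Sentence", 2):
    print(" ".join(words))
# Alice runs
# Alice jumps
# Bob runs
# Bob jumps
```

Raising the depth yields 20 sentences at depth 3 and 84 at depth 4. No finite depth exhausts the language, yet the rules that define it fit in a few lines.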

Components of Language. A language is composed of:

• primitives — the smallest units of meaning.

• means of combination — rules for building new language elements by combining simpler ones.

The primitives are the smallest meaningful units (in natural languages these are known as morphemes). A primitive cannot be broken into smaller parts whose meanings can be combined to produce the meaning of the unit. The means of combination are rules for building words from primitives, and for building phrases and sentences from words.

Since we have rules for producing new words, not all words are primitives. For example, we can create a new word by adding anti- in front of an existing word. The meaning of the new word can be inferred as “against the meaning of the original word”. Rules like this one mean that anyone can invent a new word and use it in communication in ways that will probably be understood by listeners who have never heard the word before.

For example, the verb freeze means to pass from a liquid state to a solid state; antifreeze is a substance designed to prevent freezing. English speakers who know the meaning of freeze and anti- could roughly guess the meaning of antifreeze even if they have never heard the word before.¹
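The anti- rule can be sketched in Python as a function that computes a word's meaning from the meanings of its parts. This is a deliberately naive sketch under stated assumptions: the `PRIMITIVES` table and the `meaning` function are hypothetical, meanings are modeled as plain strings, and the only means of combination is a leading "anti".

```python
# Hypothetical lexicon: each primitive (morpheme) maps to its meaning.
PRIMITIVES = {
    "freeze": "to pass from a liquid state to a solid state",
}

PREFIX = "anti"


def meaning(word: str) -> str:
    """Infer the meaning of `word` from its parts.

    The only means of combination modeled here is the prefix anti-,
    read as 'against' the meaning of the rest of the word."""
    if word in PRIMITIVES:  # a primitive's meaning is simply looked up
        return PRIMITIVES[word]
    if word.startswith(PREFIX) and len(word) > len(PREFIX):
        # Combine: the meaning of the whole is built from the meaning of the rest.
        return "against (" + meaning(word[len(PREFIX):]) + ")"
    raise KeyError(f"no primitive or rule explains {word!r}")


print(meaning("antifreeze"))
# against (to pass from a liquid state to a solid state)
```

Because the rule applies to any word it can explain, it also handles forms nobody has used, such as antiantifreeze. A word like antique would be split into anti- and que and then rejected, since que is not a known primitive; real English is less regular than this one rule.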

Primitives are determined by meaning, not by surface form. Both anti- and freeze are primitives; neither can be broken into smaller parts with meaning. We can break anti- into two syllables, or four letters, but those sub-components do not have meanings that could be combined to produce the meaning of the primitive.

Means of Abstraction. In addition to primitives and means of combination, powerful languages have an additional type of component that enables economical communication: means of abstraction.

Means of abstraction allow us to give a simple name to a complex entity. In English, the means of abstraction are pronouns like “she”, “it”, and “they”. The meaning of a pronoun depends on the context in which it is used. It abstracts a complex meaning with a simple word. For example, the it in the previous sentence abstracts “the meaning of a pronoun”, but the it in the sentence before that one abstracts “a pronoun”.

In natural languages, there are only a limited number of means of abstraction. English, in particular, has a very limited set of pronouns for abstracting people. It has she and he for abstracting a female or male person, respectively, but no gender-neutral pronouns for abstracting a person of either sex. The interpretation of what a pronoun abstracts in natural languages is often confusing. For example, it is unclear what the it in this sentence refers to. Languages for programming computers need means of abstraction that are both powerful and unambiguous.
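Programming languages remove this kind of ambiguity with precise rules about which binding a name refers to. The following Python sketch reuses the two meanings of "it" from the pronoun example; the function name `describe_context` is hypothetical. Python's scoping rules decide, for each use of `it`, exactly which binding is meant.

```python
# Module-level binding: here the name `it` abstracts "a pronoun".
it = "a pronoun"


def describe_context():
    # Assigning to `it` inside the function creates a new local binding.
    # Python's scoping rules fix which binding each use of `it` refers to.
    it = "the meaning of a pronoun"
    return it


print(describe_context())  # prints: the meaning of a pronoun
print(it)                  # prints: a pronoun
```

Unlike the English pronoun, whose referent a listener must guess from context, the name here has one meaning at each point in the program: the local assignment never changes the module-level binding.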

Exercise 2.1.  According to the Guinness Book of World Records, the longest word in the English language is floccinaucinihilipilification, meaning “The act or habit of describing or regarding something as worthless”. This word was reputedly invented by a non-hippopotomonstrosesquipedaliophobic student at Eton who combined four words in his Latin textbook. Prove Guinness wrong by identifying a longer English word. An English speaker (familiar with floccinaucinihilipilification and the morphemes you use) should be able to deduce the meaning of your word.

Exercise 2.2.  Merriam-Webster’s Word of the Year for 2006 was truthiness, a word invented and popularized by Stephen Colbert. Its definition is, “truth that comes from the gut, not books”. Identify the morphemes that are used to build truthiness, and explain, based on its composition, what truthiness should mean.

Exercise 2.3.  According to the Oxford English Dictionary, Thomas Jefferson is the earliest recorded user of more than 60 words in the dictionary. Jeffersonian words include: (a) authentication, (b) belittle, (c) indecipherable, (d) inheritability, (e) odometer, (f) sanction, (g) vomit-grass, and (h) shag. For each Jeffersonian word, guess its derivation and explain whether or not its meaning could be inferred from its components.

Exercise 2.4.  Embiggening your vocabulary with anticromulent words ecdysiasts can grok.

1. Invent a new English word by combining common morphemes.

2. Get someone else to use the word you invented.

3. $[\star\star]$ Convince Merriam-Webster to add your word to their dictionary.

Dictionaries are but the depositories of words already legitimated by usage. Society is the workshop in which new ones are elaborated. When an individual uses a new word, if ill formed, it is rejected; if well formed, adopted, and after due time, laid up in the depository of dictionaries.

Thomas Jefferson, letter to John Adams, 1820

1. Guessing that it is a verb meaning to pass from the solid to liquid state would also be reasonable. This shows how imprecise and ambiguous natural languages are; for programming computers, we need the meanings of constructs to be clearly determined.