Headline for an NPR story by Laura Santhanam on February 25th:

Linguists link English, Hindi to single ancestor language spoken 6,500 years ago

And the beginning of the story:

Linguists have traced the roots of English, Hindi, Greek and all Indo-European languages to a common ancestor tongue first spoken on the Russian steppes as much as 6,500 years ago

The headline seems to be claiming that the newsworthy event is the discovery of a single ancestor language for English and Hindi and adds the information that this language was spoken 6,500 years ago. But the reconstruction of this ancestor language, Proto-Indo-European (PIE), is news from roughly 200 years ago. What’s current news is the claim that we now have solid evidence about where and when PIE was spoken; the first sentence of the story begins to re-frame the story, by treating the concept of the Indo-European languages as a given and highlighting the where and when.

The problem for the journalists here is that readers cannot be expected to be familiar with the concepts of the IE languages and of PIE (in the way that readers can be expected to be familiar with, say, the concept of DNA). One of the great intellectual achievements of linguistics has not made it far into public consciousness.

(Hat tip to Ann Burlingham for the NPR link.)

There are actually three stories here, and PIE is the middle story.

The tale of PIE. People had long noticed similarities between languages, particularly between their stocks of words, but these could be attributed to sound symbolism or, most obviously, to borrowing. But once it had been appreciated that languages change over time (other than by borrowing), another possibility, descent from a common source, had to be taken seriously. (Still another possibility, accidental resemblance, was not fully appreciated for some time.) The problem was then to determine which similarities were attributable to common descent and, for the ones that were, what the shared ancestor was like.

In particular, in the 18th century, scholars looked at the languages of Europe and to the east (all the way to India), noting similarities that didn’t seem attributable to other causes, and they posited an Indo-European family of languages, with a common ancestor (which, in the words of Sir William Jones, “perhaps, no longer exists”).

We then see a substantial scholarly industry, devoted to questions like these:

(a) which languages are IE (an enormous number), and which not? (not: Basque, Finnish, Estonian, Hungarian, Turkish, Hebrew, Arabic, and many more)

(b) how to distinguish similarities due to common descent from those due to other causes?

(c) how to infer the features of the proto-language, in particular the pronunciation and meaning of words in the proto-language?

(d) how to distinguish subgroups of the languages, infer their features, and date their divergence from the others?

Answering such questions is a truly enormous task, requiring knowledge of large bodies of texts, ways of interpreting those texts, detailed information about the social and cultural contexts in which these languages were spoken, and more. At the center of this work is question (c), and the primary tool there is the method of comparative reconstruction, aka the comparative method (CM). In the CM, sets of cognates (words that have been argued to be related by common descent rather than by some other cause) are compared, with the object of inferring the most likely source for each array of words.
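To make the comparison step a bit more concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not the method as linguists actually practice it: it looks only at the first letter (or the digraph "th") of ordinary spellings, where real work uses phonetic forms, whole words, and many more languages. The function names and the small data set are made up for this example, though the Latin and English pairs themselves are standard textbook cognates.

```python
from collections import Counter

# Toy data: textbook Latin/English cognate pairs, in ordinary spelling.
# Real comparative work would use phonetic transcriptions instead.
COGNATE_SETS = {
    "father": {"Latin": "pater", "English": "father"},
    "foot": {"Latin": "pes", "English": "foot"},
    "fish": {"Latin": "piscis", "English": "fish"},
    "three": {"Latin": "tres", "English": "three"},
    "thin": {"Latin": "tenuis", "English": "thin"},
}

# Spellings in which two letters stand for one sound (assumption: only "th" matters here).
DIGRAPHS = ("th",)


def initial_segment(word):
    """Return the first sound of a word, treating listed digraphs as one unit."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[0]


def initial_correspondences(cognate_sets, languages):
    """Count how often each tuple of word-initial segments recurs across cognate sets."""
    counts = Counter()
    for forms in cognate_sets.values():
        key = tuple(initial_segment(forms[lang]) for lang in languages)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    table = initial_correspondences(COGNATE_SETS, ["Latin", "English"])
    for segments, n in table.most_common():
        print(" : ".join(segments), "occurs in", n, "cognate sets")
```

The point of the tally is that the matches recur: Latin p lines up with English f again and again, and Latin t with English th. Regular correspondences of this kind, rather than mere look-alike words, are what the CM treats as evidence of common descent. Deciding what the ancestral sound was (the standard reconstruction posits *p and *t here) takes further argument that this sketch does not attempt.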

Models for the CM. The CM didn’t develop out of thin air; instead, it uses forms of reasoning developed for other purposes, in the study of textual descent. The problem of textual descent is faced by philologists who are confronted with a set of variants of some text, generated by scribes who made copies of the text (this in the days before printing presses, photocopying, and other means of producing multiple copies) and so inadvertently introduced changes in the text; the problem then is inferring how the set of extant texts could have developed. Can one of them be argued to be the original, true, text? Or is the original not in the set of extant texts (that is, it no longer exists; think of Sir William Jones), so that it has to be inferred from the texts we have? In general, how to draw up a family tree for the texts, similar to the family trees that comparative linguists began drawing up in the 19th century?

Beyond linguistics. The striking successes of the CM in linguistics served as a model for other forms of scholarship where reasoning is needed to infer historical relationships and to posit earlier (currently unattested) states, in particular, in evolutionary biology. The CM in linguistics was an inspiration to Darwin — and so we get reconstructed organisms (not now extant), evolutionary family trees, and so on.

