Monday, March 15, 2021

Why Are Anatolian and Tocharian Languages The Most Divergent Indo-European Languages?

It is widely agreed that Anatolian and Tocharian are the most divergent Indo-European languages and the conventional view is to attribute this to a greater time depth of divergence from Proto-Indo-European.

The divergences are, in my humble opinion, not primarily due to time depth. 

(Note that this post expands upon a comment to this post at Gene Expression).

General Considerations In Historical Linguistics and Language Evolution

The naive model of language divergence, in which mutational variation accumulates over time, greatly overestimates the importance of that component of language change, which is actually much slower, and ignores the central role played by language contact. See, e.g., the overview here.

One example of that is Icelandic, which was, until very recent times when telecommunications and air travel became available, the closest of the Germanic languages to Old Norse (which is basically proto-Germanic), mostly because it had less contact with other languages due to its isolation at the frontier. See, e.g., here.

Another example is that phonetically, the Appalachian accent is the closest modern dialect of English to the Elizabethan English of Shakespeare, again, due to low levels of contact with other dialects of English. 

Likewise, the New Zealand accent was until recently more conservative of 19th century British dialect than modern British English, while adopting some Maori substrate words for concepts it didn't have words for and being influenced by contact with the Australian dialect.

Low population sizes also reduce mutational change in all of these cases.

Also, language divergence actually tends to be punctuated:
We used vocabulary data from three of the world's major language groups—Bantu, Indo-European, and Austronesian—to show that 10 to 33% of the overall vocabulary differences among these languages arose from rapid bursts of change associated with language-splitting events. Our findings identify a general tendency for increased rates of linguistic evolution in fledgling languages, perhaps arising from a linguistic founder effect or a desire to establish a distinct social identity.
The divergence between Old English and Middle English, for example, is largely due to the singular impact of French Norman influence on the language after the Norman Conquest of England, in the common case of language change due to emulation of elite dialects (one of the most common sources of homogenization of language in a region). 

Language replacement scenarios also usually involve strong substrate influences (e.g. the quirks of the South Asian dialects of English) especially for words with no superstrate language counterpart like local botany words. 

It is also often the case that simplifications of language structure arise due to mass language learner effects. But see this paper reviewing this hypothesis critically.

The differences in American English from British English, in contrast, reflect another common punctuated influence, where a community of people deliberately exaggerate local dialect differences in order to create shibboleths that expose outsiders and to distinguish themselves culturally from a community that they are alienated from.

Language contact usually has mostly lexical impact (i.e. loan words), but can also give rise to other areal and contact language features (like the sentence closing term “lah” in Malaysian and Singaporean dialects derived from Arabic traders), and sometimes place names (e.g. Punic place names in Britain and Ireland).

Distinct Indo-European Substrates

The other key point is that in almost all of the Indo-European language family’s European ranges, hunter-gatherer languages were extinct or all but extinct, and the substrate first farmer languages shared a descent from the language family of Western Anatolian farmers (probably in two main subfamilies, one for Linear Pottery Farmers in the Danubian basin and points north, and the other for the Cardial Pottery Farmers of the Mediterranean coast). See also here and here.

As societies lacking metal and horses, these Neolithic first farmers of Europe also had fairly low population density (even though it was 100x that of terrestrial hunter-gatherers), so due to low population density and frontier status, the amount of divergence between the first European Neolithic farmers and the struggling farmer societies a couple of thousand years later when Indo-Europeans filled a vacuum was probably modest.

This shared substrate over so many Indo-European subfamilies no doubt hides the extent of Anatolian Neolithic language family substrate influence in them. See, e.g., here (reviewing Bronze Age outlines of Indo-European expansion in Europe). But not all Indo-European language families shared this substrate.

Why Are Tocharian Languages Divergent?

Tocharian is divergent not because of great antiquity, but because it is the purest of the descendants of Indo-European. Its speakers had virtually no substrate influence or language contact, were on a frontier, and weren’t a particularly large language community despite fairly high population density, because the community was geographically constrained to a handful of towns.

Notably, J.P. Mallory, one of the leading Tocharian scholars, came around in about 2012 to the view that the Tarim basin civilization isn’t all that old, based upon archaeological evidence, stating:
[T]here is really no serious evidence for arable agriculture (domestic cereals) east of the Dnieper until after c. 2000 BCE (see also Ryabogina & Ivanov 2011; Mallory, in press:a). This means that there is also no evidence for domestic cereals in the Asiatic steppe until the Late Bronze Age (Andronovo etc). From the perspective of the Pontic-Caspian model, the ancestors of the Indo-Iranians and Tokharians should not cross the Ural before c. 2000 BCE at the very earliest. Hypotheses linking the Tokharians to earlier eastward steppe expansions associated with the Afanasievo or Okunevo cultures of the Yenisei or Altai (Mallory and Mair 2000) become very difficult if not impossible to sustain (as long as there is no evidence of arable agriculture in these cultures) as Tokharian retains elements of the Indo-European agricultural vocabulary.
– J. P. Mallory, “Twenty-first century clouds over Indo-European homelands” (Conference Presentation in Moscow, September 12, 2012).

Mallory made the case in a 2011 talk that R1b was a Tocharian genetic signature based upon West Eurasian Y-DNA haplogroups found in Uyghur populations that were direct successors to and brought about the fall of the Tocharians during a period of Turkic expansion. There is also R1b in Iron Age ancient DNA east of the Tarim basin from what appears to be a related West Eurasian culture. But ancient Tarim mummy DNA from ca. 1800 BCE, analyzed in 2009, showed uniformly R1a1a Y-DNA haplogroups (citing Li, Chunxiang, et al., “Evidence that a West-East admixed population lived in the Tarim Basin as early as the early Bronze Age” BMC Biology (February 17, 2010)).

Not exactly on point but relevant is that Tocharian cemeteries contained ephedra, a botanical extract commonly hypothesized to be the active ingredient in Indo-Iranian Soma/Homa. Tocharian culture also has primitive antecedents of physical culture attributes (e.g. basic Tartan weaves) typically associated with Celtic culture.

The Case For A Young Origin For Anatolian Languages

In the case of the Anatolian languages, in contrast, strong contact with a highly divergent substrate from the Anatolian Farmer substrate of Europe is what explains its divergence.

In the early metal ages, the Hattic language and civilization (probably derived from metal using civilizations of the Caucasus mountains and Zagros mountains in a language family that may have also included Hurrian and is probably most strongly related to one or more of the modern Caucasian languages) spread across Anatolia, replacing the Anatolian farmer language. There is suggestive evidence that the Minoan language was also from the same language family (e.g. the phonetic structure of the two languages, recorded in the Minoan case by Eteocretan inscriptions and Egyptian phonetic records of Minoan incantations).

Documentary and archaeological evidence, however, suggest that the Hittites occupied only a few towns in a sea of Hattic people ca. 1800 BCE, before their dramatic expansion, which was roughly contemporaneous with the appearance and expansion of the Mycenaean Greeks (the first Aegean people to speak Indo-European languages). While there are many Anatolian languages attested, all but a couple have the relationship to Hittite that the Romance languages have to Latin, and the couple of earlier ones are not attested significantly earlier than the Hittite language. Iron use and cremation were important litmus tests of Anatolian Indo-Europeans that are corroborated with documentary evidence and archaeological evidence in the post-1800 BCE time period. I review some of that evidence here. See also here.

It also isn’t clear how much of the spread of the Anatolian languages was driven by elite imitation (compare Hungarian ca. 1000 CE, which resulted in language shift without much demic impact), and how much was due to population replacement/introgression.

The frequency of R1b in modern Anatolian samples suggests a significant demic component, but is complicated by the multilayered palimpsest of periods in Anatolian history and prehistory including Hellenic sourced migration into Anatolia, and a period of Iranian steppe migration into the Levant, at least partially through Anatolia (See also here).

So, how do the Anatolian languages grow so divergent?

They grow divergent because the Hattic substrate in which they were immersed (to the extent that it influenced their choice of proper names and that Hattic remained a liturgical language in the Hittite empire centuries after it ceased to be used in daily life, like liturgical relicts of ancient Hebrew, ancient Latin, ancient Sumerian, and Coptic) was profoundly different from the Anatolian Neolithic farmer substrate in Europe or the Harappan substrate in Sanskrit (which may be shared by all Indo-Iranian languages, as BMAC was in the Harappan sphere of influence and both Harappan and BMAC languages may be derived at great time depth from the Caucasian Neolithic first farmer substrate).

Copper Age/early Bronze Age Hattic culture and language (like Copper Age/early Bronze Age Harappan culture) also had more staying power and influence than stone age European Neolithic culture because more advanced civilizations had more population density and couldn’t just be trampled into oblivion by Indo-European successors.

My suspicion is that pre-Indo-European conquest acquisition of metallurgy also was important in allowing Basque and related Vasconic languages to survive Indo-European obliteration. (See also here and here and here and here and here).

We don’t have enough Hittite, other Anatolian language speaker, and Hattic ancient DNA to confirm an appearance of steppe ancestry around 1800 BCE and its absence before then in Anatolia, but we also have no ancient or modern DNA evidence that isn’t a good fit to that hypothesis.

A Footnote On Armenian 

In the same vein, Armenian is hard to classify because it has mixed influences from different neighboring Indo-European language families, with Greek influence competing, for example, with Slavic and Indo-Iranian language influence. Armenian is attested in writing only in the mid- to late Iron Age, so a whole stew of diverse Indo-European linguistic influences is hard to parse out.

15 comments:

ryan said...

It seemed remarkable but believable that an early steppe group could have radiated out with some edge allowing it to out-compete and dominate so many cultures in so many areas.

But I had trouble understanding how this could happen over and over in different centuries and millennia. Repeatedly going back to the well and finding new advantages...? It didn't seem plausible.

This proposal, by compressing the time-scale and potentially linking several expansions previously considered independent to the same era and possibly the same underlying cause seems to provide needed parsimony.

Robert Hartmann said...

An enormously convincing interpretation of the facts we know. Respect!

andrew said...

Thanks Ryan and Robert.

Ryan said...

"both Harappan and BMAC languages may be derived at great time depth from the Caucasian Neolithic first farmer substrate"

IIRC there was a study that suggested the Caucasian ancestry in Harappa and BMAC pre-dates the Neolithic by maybe a factor of two.

andrew said...

@Ryan If you can find it, I'd love to see that paper.

Ryan said...

Found it (or at least the Eurogenes post about it).

https://eurogenes.blogspot.com/2019/09/on-surprising-genetic-origins-of_5.html

andrew said...

Thanks. I appreciate it.

andrew said...

Reading the post, it is a pretty powerful point.

We already knew that pre-Neolithic Levantine and Caucasian/Iranian hunter-gatherers were as distinct as Europeans and Chinese people are from each other today. We also know that the very first Fertile Crescent Neolithic farmers arose in situ from the hunter-gatherers of the Levant and the Caucasus/Iran with very little admixture at first.

This is true even though we know that the Fertile Crescent Neolithic package of domesticated plants and animals that was shared from the Levant through Anatolia and into Iran included some species with wild types native to the Levant, some with wild types native to Anatolia, and some species with wild types native to Iran. So, it follows that the Fertile Crescent Neolithic package was assembled through cultural diffusion and trade, rather than through colonization.

The European Neolithic picked up a couple of additional domesticates in the Balkans and some Balkan hunter-gatherer ancestry in a first wave of expansion by Western Anatolian first farmers. That wave was antecedent to both the Cardial and Linear Pottery waves of predominantly Anatolian farmer colonization of Europe, which carried the Balkan-expanded Fertile Crescent Neolithic package. The package didn't change much after that, and there was only very slight additional Western European hunter-gatherer admixture until the initial European Neolithic farmer package started to fail and there was more Western European hunter-gatherer introgression.

The Egyptian Neolithic, tightly bound to the narrow Nile River basin, was an almost complete replacement of the local hunter-gatherer population which basically went extinct outside the desert almost immediately, and made only three major additions to the Fertile Crescent package, the donkey, cats, and a different sub-species of cow with some hybridization. We also have strong evidence to suggest that the remainder of the North African Neolithic, until you get to about Algeria and Morocco (which had Cardial derivation from Spain), and the spread of the first wave Neolithic to Sudan and Ethiopia and Chad and points South was also probably Levantine->Egyptian->beyond derived but primarily via herding as the Fertile Crescent horticulture crops don't work in the different climate of the Sahel and points south.

andrew said...


We also know that the first wave Neolithic people of the IVC received the Fertile Crescent Neolithic package virtually unchanged (but without the Balkan or Egyptian additions) at about the same time as the Egyptian and Balkan expansions of Levantine and Anatolian farmers respectively. So, it is pretty surprising that the IVC people apparently received this via cultural diffusion like the original Levantine and Anatolian and Caucasian/Iranian farmers did, rather than by colonization like the Balkans, Europeans and Egyptians did at about the same time. After all, the IVC people got the whole horticulture farming package and not just herding like the Ethiopians and further Africans did, and it appears that the transition from hunter-gatherer to herder is a lot easier to accomplish mostly through cultural diffusion than the transition to farming, which has happened far fewer times in history.

What was different? Did a shared Caucasian/Iranian hunter-gatherer language and cultural antecedent make it easier to communicate and share the Fertile Crescent package through exchange of information rather than colonization? Did the IVC hunter-gatherers have a sedentary fishing culture that could transition more easily to farming than a terrestrial hunter-gatherer culture could? If so, why did this not happen in the Nile basin?

The absence of colonization also strengthens the hypothesis that the Harappan language was a local hunter-gatherer language of the IVC not derived from Caucasian/Iranian first farmer languages, while European farmer languages probably all derived from West Anatolian farmer languages, and the Afro-Asiatic languages probably all derived from the first Levantine farmer language (even though Semitic is not basal in that language family and probably represents a reflux back from Northeast Africa). And Iran itself, pre-Indo-European, probably spoke a Caucasian/Iranian farmer language, with Caucasian languages, Hattic, Hurrian, Kassite, Elamite and Sumerian probably all derived remotely from that at a time depth too deep to establish with conventional linguistic methods.

Instead, Harappan was probably a genuine isolate with respect to the other Neolithic languages (and I've already discussed at length that it was probably not related at all closely to Dravidian languages either). So all bets are off on any Bayesian priors about it based upon any of those languages. This is consistent with Sumerian descriptions of Harappan as spoken in Harappan trading outposts there, which was utterly foreign to them.

andrew said...

One more piece of the puzzle is that we know that the Fertile Crescent package did not spread from the Harappan culture to Southern India for essentially the same reason that it didn't spread from Egypt to the Sahel. Southern India and the Sahel have basically the same climate, and not enough of the Fertile Crescent horticulture package is suited to that climate to be viable there.

The IVC receives the Fertile Crescent package about 6000 BCE. Unlike Mesopotamia, which was frequently divided into warring city states, by all indications the IVC was unified more or less continuously and free of war until the 4.2 kiloyear event (i.e. 2200 BCE or later, with Indo-Aryans entering an already mostly collapsed Harappan region by about 1900 BCE). It also expanded its influence into the Indus Periphery of BMAC and Eastern Iran in what the ancient DNA suggests is almost complete replacement/colonization, judging by the nearly identical ancient DNA to the sole pre-Indo-Aryan IVC female sample.

The South Indian Neolithic, in contrast, only starts around 2500 BCE, about 3500 years after it does in the IVC, and this is made possible only by a large component of the Sahel Neolithic package that integrated domesticates of both West African and East African/Ethiopian wild types before exporting them to South India (with very little population genetic impact except possibly Y-DNA T magnified uniparentally while being diluted to almost nothing autosomally and in mtDNA). The South Indian Neolithic draws mostly on the Sahel Neolithic culturally and in crop/technology package, although it borrows select Fertile Crescent Neolithic package domesticates that are now viable with the Sahel package included as a total package.

The Dravidian language almost surely emerges from one of the terrestrial hunter-gatherer languages of South India that is at the peak of the South Indian Neolithic, but possibly with linguistic elite influence from the small number of elite Sahel transmitters of the package, judging from proto-Dravidian's similarity to fringe Niger-Congo languages that have some but not all of the features of core Niger-Congo languages (more like Swahili or the western fringe Niger-Congo languages of Mali than the core languages). It probably borrows almost nothing linguistically from Harappan, whose speakers had only very thin trade relations with South Indian hunter-gatherers at basically one outpost at their most southern extent for the previous few thousand years.

andrew said...

The only link of that narrative that makes me somewhat uncomfortable is explaining how the full Fertile Crescent Neolithic package was so rapidly transferred culturally to the IVC, while it was transmitted by colonization everywhere else.

And, colonization isn't just the norm there. This is how it works in the North Chinese and South Chinese Neolithic revolutions based in Millet and Rice respectively until they expand into each other, and the North Chinese Neolithic basically assimilated the South Chinese one, while the partially mixed but South Chinese dominated rice Neolithic expands by land with Austro-Asiatic and other mainland Southeast Asian languages sourced from South China all of the way to Northeast India, and by sea via Taiwan to island Southeast Asia, lowland Papuans, and Oceania. The Highland Papuan Neolithic doesn't expand or reach a metal age.

It is also what you see in the Americas, although somewhat more in fits and starts as the Meso-American Neolithic package is adapted to Peru, the Amazon and North America with differently bred crops and significant new additions.

There are lots of cases of reception of herding with only slight demic impact, but hunter-gatherers receiving a full farming package that they didn't invent in part themselves from someone else is almost unprecedented.

Admittedly, the sample size is small. There weren't many independent Neolithic revolutions in the entire world. Another possible explanation could be that Mesopotamia and the Harappan region are so very similar to each other in climate and ecology that the transition was more effortless than it was anywhere else.

But the fact that this is such a stark outlier, and the very significant genetic relationship between Caucasian/Iranian hunter-gatherers and the farmers who arose from them in situ, also does make me question whether there is a methodological flaw of some kind that is getting the timing of the common ancestry of Harappans and Iranian first farmers wrong. Elite dominance with modest genetic impact could be another explanation and might also help explain why the IVC got off to such a unified start that endured for so long, while Mesopotamia didn't.

Ryan said...

Very good points (which maybe should be their own blog post?).

I wonder if the main difference was population density. But if that was the case, how did the Mesolithic Iranians replace the previous inhabitants so completely? Or were the ASI groups the intrusive ones?

Tom Bridgeland said...

Yes, very nice analysis. Thanks.

I have been hoping for a breakthrough in Harappan information for literally decades, since I was a kid in the 1970s and first read about them.

Mueue said...

Afrasian urheimat is even more weird. Egypt (4 languages), Libya (central position), Levant (first farming), Ethiopia (divergence)

andrew said...

See here (the post is broader than the title): https://dispatchesfromturtleisland.blogspot.com/2012/05/nilo-saharan-homeland-proposed.html