It is widely agreed that Anatolian and Tocharian are the most divergent Indo-European languages and the conventional view is to attribute this to a greater time depth of divergence from Proto-Indo-European.
The divergences are, in my humble opinion, not primarily due to time depth.
(Note that this post expands upon a comment to this post at Gene Expression).
General Considerations In Historical Linguistics and Language Evolution
The naive model of language divergence, in which mutational variation simply accumulates over time, greatly overestimates the importance of that component of language change, which is actually much slower, and ignores the central role played by language contact. See, e.g., the overview here.
One example of that is Icelandic, which was, until very recent times when telecommunications and air travel became available, the closest of the Germanic languages to Old Norse (an early North Germanic language descended from Proto-Germanic), mostly because it had less contact with other languages due to its isolation at the frontier. See, e.g., here.
Another example is that phonetically, the Appalachian accent is the closest modern dialect of English to the Elizabethan English of Shakespeare, again, due to low levels of contact with other dialects of English.
Likewise, the New Zealand accent was until recently more conservative of 19th century British dialect than modern British English, while adopting some Maori substrate words for concepts it didn't have words for and being influenced by contact with the Australian dialect.
Low population sizes also reduce mutational change in all of these cases.
Also, language divergence actually tends to be punctuated:
We used vocabulary data from three of the world's major language groups—Bantu, Indo-European, and Austronesian—to show that 10 to 33% of the overall vocabulary differences among these languages arose from rapid bursts of change associated with language-splitting events. Our findings identify a general tendency for increased rates of linguistic evolution in fledgling languages, perhaps arising from a linguistic founder effect or a desire to establish a distinct social identity.
The divergence between Old English and Middle English, for example, is largely due to the singular impact of Norman French influence on the language after the Norman Conquest of England, an instance of the common pattern of language change due to emulation of elite dialects (one of the most common sources of homogenization of language in a region).
Language replacement scenarios also usually involve strong substrate influences (e.g. the quirks of the South Asian dialects of English) especially for words with no superstrate language counterpart like local botany words.
It is also often the case that language structure is simplified due to mass language learner effects. But see this paper reviewing this hypothesis critically.
The differences in American English from British English, in contrast, reflect another common punctuated influence, where a community of people deliberately exaggerates local dialect differences in order to create shibboleths that expose outsiders and to distinguish itself culturally from a community from which it is alienated.
Language contact usually has mostly lexical impact (i.e. loan words), but also can give rise to other areal and contact language features (like the sentence closing term “lah” in Malaysian and Singaporean dialects derived from Arabic traders), and sometimes place names (e.g. Punic place names in Britain and Ireland).
Distinct Indo-European Substrates
The other key point is that in almost all of the Indo-European language family’s European ranges, hunter-gatherer languages were extinct or all but extinct, and the substrate first farmer languages shared a descent from the language family of Western Anatolian farmers (probably in two main subfamilies, one for Linear Pottery Farmers in the Danubian basin and points north, and the other for the Cardial Pottery Farmers of the Mediterranean coast). See also here and here.
As societies lacking metal and horses, these Neolithic first farmers of Europe also had fairly low population density (even though it was 100x that of terrestrial hunter-gatherers), so due to low population density and frontier status, the amount of divergence between the first European Neolithic farmers and the struggling farmer societies a couple of thousand years later when Indo-Europeans filled a vacuum was probably modest.
This shared substrate over so many Indo-European subfamilies no doubt hides the extent of Anatolian Neolithic language family substrate influence in them. See, e.g., here (reviewing Bronze Age outlines of Indo-European expansion in Europe). But not all Indo-European language families shared this substrate.
Why Are Tocharian Languages Divergent?
Tocharian is divergent not because of great antiquity, but because it is the purest of the descendants of Indo-European: it had virtually no substrate influence or language contact, was on a frontier, and wasn’t a particularly large language community despite fairly high population density, because it was geographically constrained to a handful of towns.
Notably, J.P. Mallory, one of the leading Tocharian scholars, came around in about 2012 to the view that the Tarim basin civilization isn’t all that old, based upon archaeological evidence, stating:
[T]here is really no serious evidence for arable agriculture (domestic cereals) east of the Dnieper until after c. 2000 BCE (see also Ryabogina & Ivanov 2011; Mallory, in press:a). This means that there is also no evidence for domestic cereals in the Asiatic steppe until the Late Bronze Age (Andronovo etc). From the perspective of the Pontic-Caspian model, the ancestors of the Indo-Iranians and Tokharians should not cross the Ural before c. 2000 BCE at the very earliest. Hypotheses linking the Tokharians to earlier eastward steppe expansions associated with the Afanasievo or Okunevo cultures of the Yenisei or Altai (Mallory and Mair 2000) become very difficult if not impossible to sustain (as long as there is no evidence of arable agriculture in these cultures) as Tokharian retains elements of the Indo-European agricultural vocabulary.
– J. P. Mallory, “Twenty-first century clouds over Indo-European homelands” (Conference Presentation in Moscow, September 12, 2012).
Mallory made the case in a 2011 talk that R1b was a Tocharian genetic signature based upon West Eurasian Y-DNA haplogroups found in Uyghur populations that were direct successors to and brought about the fall of the Tocharians during a period of Turkic expansion. There is also R1b in Iron Age ancient DNA east of the Tarim basin from what appears to be a related West Eurasian culture. But ancient Tarim mummy DNA from ca. 1800 BCE, analyzed in 2009, showed uniformly R1a1a Y-DNA haplogroups (citing Li, Chunxiang, et al., “Evidence that a West-East admixed population lived in the Tarim Basin as early as the early Bronze Age” BMC Biology (February 17, 2010)).
Not exactly on point, but relevant, is that Tocharian cemeteries contained ephedra, a botanical drug commonly hypothesized to be the active ingredient in Indo-Iranian Soma/Homa. Tocharian culture also has primitive antecedents of physical culture attributes (e.g. basic tartan weaves) typically associated with Celtic culture.
The Case For A Young Origin For Anatolian Languages
In the case of the Anatolian languages, in contrast, what explains their divergence is strong contact with a substrate that was highly divergent from the Anatolian farmer substrate of Europe.
In the early metal ages, the Hattic language and civilization (probably derived from metal using civilizations of the Caucasus mountains and Zagros mountains in a language family that may have also included Hurrian and is probably most strongly related to one or more of the modern Caucasian languages), spread across Anatolia replacing the Anatolian farmer language. There is suggestive evidence that the Minoan language was also from the same language family (e.g. the phonetic structure of the two languages, recorded in the Minoan case by Eteocretan inscriptions and Egyptian phonetic records of Minoan incantations).
Documentary and archaeological evidence, however, suggest that the Hittites occupied only a few towns in a sea of Hattic people ca. 1800 BCE, before their dramatic expansion, which was roughly contemporaneous with the appearance and expansion of the Mycenaean Greeks (the first Aegean people to speak Indo-European languages). While there are many Anatolian languages attested, all but a couple stand in the same relationship to Hittite as the Romance languages do to Latin, and the couple of earlier ones are not attested significantly earlier than the Hittite language. Iron use and cremation were important litmus tests of Anatolian Indo-Europeans that are corroborated with documentary evidence and archaeological evidence in the post-1800 BCE time period. I review some of that evidence here. See also here.
It also isn’t clear how much of the spread of the Anatolian languages was driven by elite imitation (compare Hungarian ca. 1000 CE, which resulted in language shift without much demic impact), and how much was due to population replacement/introgression.
The frequency of R1b in modern Anatolian samples suggests a significant demic component, but is complicated by the multilayered palimpsest of periods in Anatolian history and prehistory including Hellenic sourced migration into Anatolia, and a period of Iranian steppe migration into the Levant, at least partially through Anatolia (See also here).
So, how did the Anatolian languages grow so divergent?
They grew divergent because the Hattic substrate in which they were immersed (to the extent that it influenced their choice of proper names and that Hattic remained a liturgical language in the Hittite empire centuries after it ceased to be used in daily life, like liturgical relicts of ancient Hebrew, ancient Latin, ancient Sumerian, and Coptic), was profoundly different from the Anatolian Neolithic farmer substrate in Europe or the Harappan substrate in Sanskrit (which may be shared by all Indo-Iranian languages, as BMAC was in the Harappan sphere of influence and both Harappan and BMAC languages may be derived at great time depth from the Caucasian Neolithic first farmer substrate).
Copper Age/early Bronze Age Hattic culture and language (like Copper Age/early Bronze Age Harappan culture) also had more staying power and influence than stone age European Neolithic culture because more advanced civilizations had greater population density and couldn’t just be trampled into oblivion by Indo-European successors.
My suspicion is that pre-Indo-European conquest acquisition of metallurgy also was important in allowing Basque and related Vasconic languages to survive Indo-European obliteration. (See also here and here and here and here and here).
We don’t have enough Hittite, other Anatolian language speaker, and Hattic ancient DNA to confirm an appearance of steppe ancestry around 1800 BCE and its absence before then in Anatolia, but we also have no ancient or modern DNA evidence that isn’t a good fit to that hypothesis.
A Footnote On Armenian
In the same vein, Armenian is hard to classify because it has mixed influences from different neighboring Indo-European language families, with Greek influence competing, for example, with Slavic and Indo-Iranian language influence. Armenian is attested in writing only in the mid- to late Iron Age, so a whole stew of diverse Indo-European linguistic influences is hard to parse out.