Eurogenes reports progress in understanding Kalash genetics (from his own analysis of public datasets including newly available ancient genomes).
A couple of years ago Moorjani et al. concluded that present-day Georgians of the Transcaucasus were the best available proxy for the ancient West Eurasian population that mixed into the South Asian gene pool. This was a solid statistical fit. . . . But it was also a big fat coincidence . . . .
Thus, the Indo-Iranian and hence Indo-European speaking Kalash no longer looks very similar to the Kartvelian speaking Georgian. In fact, [the Kartvelian speaking Georgian] appears to be most closely related to the supposedly Indo-European speaking Afanasievo and Yamnaya nomads of the Early Bronze Age Eurasian steppe. The rest of his ancestry is probably best described as South Central Asian, which is an unknown quantity to me at this stage, but probably in large part of indigenous South Asian origin (see here).
I'm only able to show this thanks to the ancient samples that are on the tree, for which, as far as I know, there aren't any useful substitutes among present-day populations. Obviously, Moorjani et al. didn't have this luxury, so they ended up with a model that was statistically sound, but didn't make much sense otherwise, especially in terms of linguistics.
My . . . model is easily reproducible with most of the other South Asian samples from the Human Origins, and it gels nicely with uniparental marker data too. For instance . . . not only do Pathans cluster among these ancients from the Eurasian steppe, but most of them also carry the same Y-chromosome haplogroup: R1a-Z93, which is derived from R1a-M417, and in all likelihood first expanded in a big way with the Proto-Indo-Iranians of the Trans-Ural steppe.
In another of his own posts, linked above in the block quote, the key conclusion is that:
One of the toughest nuts to crack in population genetics has proved to be the story of the people of the Hindu Kush. However, using Treemix and ancient genomes from the recent Allentoft et al. and Haak et al. papers, I'm seeing most of the Kalash and Pathan individuals from the HGDP modeled as ~65% Late Neolithic/Early Bronze Age (LN/EBA) European and ~35% Central Asian. . . . [T]he Kalash and Pathans come out ~65% LNE/EBA European (which includes substantial Caucasus or Caucasus-related ancestry), ~12% ASI, and ~23% something as yet undefined. If I had to guess, I'd say the mystery ~23% was Neolithic admixture from what is now Iran. But ancient DNA has thrown plenty of curve balls at us already, so that's a low confidence prediction, even though it does make good sense.
Kalash Y-DNA is about 45% West Eurasian, and the percentage of Kalash mtDNA that is potentially West Eurasian in character (a somewhat less definitive geographic attribution than for the available Y-DNA data) is about 43%. The absence of a strong gender imbalance in uniparental markers is notable. Also, LN/EBA Europeans themselves aren't necessarily purely European and have a significant indigenous steppe component. So those percentages aren't necessarily inconsistent, and given the small effective size of the Kalash population, genetic drift and founder effects are also to be expected.
It has long been recognized that the Kalash may look like a high-level branch of the population genetic history of non-African modern humans when, in fact, they are merely an admixed population that has been isolated and inbred for a sufficiently long time to look like something unique. (The alternative view that the Kalash were isolated for 11,800 years, expressed in Qasim Ayub, et al. (2015), is a completely implausible interpretation of the genetic data that they examined in what was otherwise a useful paper.) But this analysis is finally starting to establish, with more than guesswork, precisely what happened to form this genetically distinctive people of the Hindu Kush.
This puts the oldest possible point of Kalash ethnogenesis at about 4500 years ago, a few centuries before the earliest archaeological evidence (Cemetery H) of an Indo-European appearance in South Asia, but after the replacement of the Afanasievo culture with a genetically distinct successor culture in the Central Asian steppe around 2500 BCE to 2000 BCE. The Yamna culture is contemporaneous with it and more or less contiguous to the west of the Afanasievo culture.
The collapse of these cultures in favor of otherwise quite similar cultures to their north that were genetically distinct in their Y-DNA (the proto-Indo-Iranians), as those northern cultures penetrated South Asia, appears to be another remarkable untold story of prehistory. The timing, however, strongly suggests that the 4.2 kiloyear climate event was an important cause of this sudden upset.
The most recent possible date of Kalash ethnogenesis isn't quite as well constrained, but the fact that their Dardic language is very basal within the Indo-Aryan languages (or, alternately, is a fourth basal branch of Indo-Iranian) suggests that Kalash ethnogenesis is best sought near the earliest possible date. Asko Parpola suggests in a 1999 scholarly anthology that the Dardic languages broke off from proto-Rig Vedic Sanskrit around 1700 BCE, based upon Rig Vedic linguistic features found in Dardic languages and absent in other Indo-Aryan languages.
While Eurogenes understates the point a bit, I will underline it:
The long-standing hypothesis that the Y-DNA R1b dominated Afanasievo and Yamnaya peoples were linguistically Indo-European is increasingly ill-supported. The Afanasievo and Yamnaya peoples had closer ties to their Caucasian neighbors than to their definitely linguistically Indo-European neighbors to the north of them.
This also tends to support my hypothesis that the heavily Y-DNA R1b peoples of Western Europe were probably part of the same Vasconic language family as the modern Basque until around the time of the Bronze Age collapse, when there was a mass language shift to Germanic, Celtic and Italic languages in Western Europe, with only a fairly modest population genetic impact.
And it also supports the argument that the Vasconic languages are distant relatives (at a time depth of about 4000-5000 years ago) of languages spoken in the highlands of the Caucasus Mountains, Iran and/or Anatolia, perhaps with a strong Atlantic Megalithic linguistic substrate, whose closest surviving relatives are one or more of the modern languages of the Caucasus Mountains.
I remain agnostic regarding which of those languages are the closest relative and it could be that the proto-Vasconic languages were a sister language family to all of them.