Dienekes' Anthropology blog notes a recent paper by Moorjani, et al., estimating the date of admixture of genetically Ancestral North Indian (ANI) people and genetically Ancestral South Indian (ASI) people in South Asia. This post makes a conjecture about what kind of prehistoric narrative could have given rise to their data that makes more sense than the one provided by the authors.
ANI genes (which by definition tend to be more common in North India than in South India) are closer to those of other West Eurasians (e.g. in Iran, the Caucasus and Central Asia) than ASI genes are. This makes total geographic sense. The easiest way for large numbers of people to migrate from the rest of West Eurasia to India is via Northwestern India. If people migrate overland from West Eurasia to India, they will get to Northern India before they get to Southern India. And, you would expect people who are geographically close to each other to be more genetically similar than people who are more distant from each other, unless some geographic circumstance or remarkable and atypical folk migration led to a different result.
But, surprisingly, North Indian populations with higher levels of ANI admixture tend to show more recent dates of ANI-ASI admixture than those in South India. This doesn't make a lot of sense without an explanation. The first substantial ANI-ASI admixture almost certainly had to take place in North India before it did in South India, within the time period (ca. 2200 BCE to 100 CE) that the authors estimate if the data are all lumped together and a single admixture event is assumed.
What could give rise to this data?
A sensible explanation requires two things. First, an understanding of a quirk associated with the methodology they use in cases where there are multiple episodes of admixture between the same two populations that are separated greatly in time. Second, a historical narrative that could account for the data observed. This post provides each in turn below the break.
The study uses a methodology based on linkage disequilibrium, which looks at how thoroughly genes from one source population are mixed with genes from another source population in a given person's genome.
Once a mother and father have a child, the child has roughly half of the father's genes and roughly half of the mother's genes. The manner in which this happens for the millions of genetic markers in a genome is a nearly perfect match to a quite simple mathematical model in every kind of plant and animal that reproduces sexually.
In a child with one parent purely from one population and the other parent purely from the other, you see big chunks of ANI genes and big chunks of ASI genes. If the resulting mixed children start intermarrying randomly, the average size of the chunks of ANI or ASI genes respectively gets smaller to an extremely predictable extent.
In cases where there is a single admixture event between two populations (whose relative proportions need not be equal for the methodology to work), you can predict the average ANI and ASI gene chunk size any given number of generations later with great accuracy. So, if you see a population with a certain distribution of ANI and ASI gene chunks in the genomes of its members, you can very accurately determine how many generations ago the admixture event took place. You can also quantify the uncertainty of that best-guess estimate, which is a matter of fairly straightforward statistical calculation.
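To make the "extremely predictable" part concrete, here is a minimal sketch of the kind of simple model involved. It assumes the textbook approximation that the admixture signal between two markers a genetic distance d (in Morgans) apart decays as exp(-t * d) after t generations of random mating, and that breakpoints accumulate at roughly t per Morgan. The function names are made up for this illustration and are not taken from the paper.

```python
import math

def expected_ld(generations, distance_morgans):
    """Fraction of the original admixture signal still linking two markers
    that sit `distance_morgans` apart, after `generations` of random mating.
    Assumes a single admixture pulse and the simple exp(-t * d) model."""
    return math.exp(-generations * distance_morgans)

def breakpoint_spacing_cm(generations):
    """Rough average spacing, in centimorgans, between the recombination
    breakpoints accumulated since admixture (about `generations` per Morgan)."""
    return 100.0 / generations

def generations_from_ld(surviving_fraction, distance_morgans):
    """Invert the decay model: how many generations are needed to leave
    `surviving_fraction` of the signal at the given distance?"""
    return -math.log(surviving_fraction) / distance_morgans

# Chunks shrink steadily as the number of generations since admixture grows.
for t in (10, 64, 144):
    print(t, "generations:", round(breakpoint_spacing_cm(t), 2), "cM between breakpoints")
```

Under these assumptions the ancestry proportions do not enter the decay rate at all, which is one way to see why the two source populations need not contribute equally for the dating to work.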
These dates are precise enough to make meaningful distinctions between absolute dates of admixture that are more than several hundred years apart in the Holocene era that makes up the interesting parts of human history and prehistory.
Estimates in historical population genetics based upon gene mutation rates (particularly mutation rates in non-recombining Y-DNA lineages) are harder to calibrate. The rate used to convert the number of mutations observed into an absolute age is controversial, with the rates in use varying by about a factor of three, and there are good reasons to doubt that a consistent mutation rate can be reliably applied to all parts of the genome. By contrast, the only meaningful moving part involved in converting dates determined using linkage disequilibrium into absolute dates is the average generation length. And, there is wide consensus in the historical population genetics field that for modern humans in ancient and prehistoric periods the average generation length was a quite stable twenty-nine years.
The downside of this methodology, however, is that linkage disequilibrium estimates are more problematic in cases where there are multiple distinct episodes of significant admixture between the same two genetically distinct populations. It is hard to distinguish the raw data that a multiple admixture episode history produces from a scenario in which there is a single wave of admixture that took place close in time to what was really the last of more than one episode of admixture. One big wave of admixture between two populations mostly erases linkage disequilibrium evidence of previous episodes of significant admixture between the same two populations.
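The conversion from generations to calendar dates is simple enough to sketch. The snippet below uses the twenty-nine year generation length and applies it to the two extreme sub-sample estimates reported in the study (64 +/- 11 and 144 +/- 27 generations). The reference year of 2000 CE for "the present" is my own assumption, and the +/- values are treated as a plain plus-or-minus range, so the results are only good to the nearest century or so.

```python
YEARS_PER_GENERATION = 29  # generation length used in this post
REFERENCE_YEAR = 2000      # assumed "present"; not a figure from the paper

def years_ago(generations):
    """Convert a number of generations into years before the reference year."""
    return generations * YEARS_PER_GENERATION

def date_range(generations, plus_minus):
    """Return (older, younger) bounds as astronomical years, where
    year 0 corresponds to 1 BCE and negative years are further back."""
    older = REFERENCE_YEAR - years_ago(generations + plus_minus)
    younger = REFERENCE_YEAR - years_ago(generations - plus_minus)
    return older, younger

def label(year):
    """Format an astronomical year as a BCE or CE string."""
    return f"{1 - year} BCE" if year <= 0 else f"{year} CE"

for gens, err in ((64, 11), (144, 27)):
    older, younger = date_range(gens, err)
    print(f"{gens} +/- {err} generations: {label(older)} to {label(younger)}")
```

Changing the assumed generation length or reference year shifts every date proportionally, which is why the generation length is the one calibration choice that really matters here.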
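A toy calculation shows how this bias can arise. The sketch below builds an idealized signal from two admixture pulses (144 and 64 generations ago, weighted equally, all of which are illustrative choices of mine) and then fits it with a single-pulse model. The fit is an ordinary least-squares line through the log of the signal, which is a simplification I am using for illustration and not the fitting procedure of the paper. The distance grid of 0.5 to 20 centimorgans is likewise an assumption.

```python
import math

def mixed_ld_curve(pulses, distances):
    """Idealized admixture signal at each genetic distance (in Morgans) for
    a list of (generations_ago, weight) pulses, each decaying as exp(-t * d)."""
    return [sum(w * math.exp(-t * d) for t, w in pulses) for d in distances]

def fit_single_pulse(distances, ld_values):
    """Estimate one admixture date by least squares on log(signal) = a - t * d.
    Returns the fitted number of generations t."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_y = sum(logs) / n
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(distances, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances)
    return -sxy / sxx

# Assumed grid of genetic distances: 0.5 cM to 20 cM in 0.5 cM steps.
distances = [0.005 * i for i in range(1, 41)]

one_wave = mixed_ld_curve([(64, 1.0)], distances)
two_waves = mixed_ld_curve([(144, 0.5), (64, 0.5)], distances)

print("single pulse, fitted generations:", round(fit_single_pulse(distances, one_wave), 1))
print("two pulses, fitted generations:  ", round(fit_single_pulse(distances, two_waves), 1))
```

In this toy setup the older pulse's signal has almost entirely decayed at all but the shortest distances, so the single-pulse fit lands much nearer the recent date than the older one. How strong the effect is in real data depends on the relative sizes of the pulses and on the distance range used.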
Piecing Together A Historical Narrative That Makes Sense
The headline estimate of the inferred admixture date for the entire South Asian sample is a meaningless number when different sub-sample populations differ by more than a factor of two, from 64 +/- 11 to 144 +/- 27 generations, as they do in this study. Historical, archaeological and geographical realities, and the patterns seen in the data itself, all strongly support the notion that admixture in South Asia was not a single event happening at a single time in every place in the subcontinent.
But, the geographic distribution of the dates also isn't a good fit for a single admixture event that gradually expanded from a core area where admixture happened first to a gradually moving frontier where it happened later.
The only way to interpret the LD data in a way that makes any sense is to assume that there were at least two separate waves of ANI-ASI admixture many centuries apart from each other in what are mostly linguistically Indo-European populations in South Asia. This interpretation is possible because the LD method used mostly reflects the most recent admixture date, and ignores the time at which one or more additional previous episodes of ANI-ASI admixture that contribute to the current ANI-ASI genetic mix in modern populations could have taken place.
A narrative like the one below with a first episode of ANI-ASI admixture that spans the entire subcontinent and a second one that is limited to linguistically Indo-European areas can provide a plausible fit to the data that also makes sense in terms of the historical linguistics of the Dravidian language family and the historical precedents of events similar to the one imagined.
Imagine that Sanskrit-speaking, religiously proto-Hindu people (also known as the Indo-Aryans because Sanskrit is an Indo-European language) arrive in South Asia around 2000 BCE. These people may themselves be a mix between proto-Indo-European and Harappan people somewhere in Northwest India, Northern Pakistan or nearby parts of Central Asia, whose ethnogenesis as Indo-Aryans takes place sometime prior to the Indo-Aryan expansion.
Suppose that these Indo-Aryans, who are politically united, warlike and aided by Bronze Age technologies and horses not immediately available to the indigenous South Asian people that they are conquering, sweep across and conquer almost the entire South Asian subcontinent. Everywhere they contribute about 30% to 40% of the genes in the local gene pool, in a manner biased towards higher castes, and they make their particular dialect of Sanskrit the language of the lands they control in the time period from about 2000 BCE to 1000 BCE. In the process almost all of the indigenous South Asian language families are wiped out. Only a few holdout kingdoms (maybe even just one), midway down the eastern coast of the Deccan Peninsula, remain.
Then, the king or chief of one of the last holdout regions (probably midway down the east coast of the Deccan Peninsula), whose country speaks the proto-Dravidian language, leads a counterrevolution against the Indo-Aryans that unites the people of South India and restores the pre-Indo-Aryan Dravidian language in a huge empire covering most of the southern half of the subcontinent. The new kingdom adopts the technological and cultural innovations brought by the Indo-Aryans that it needs to successfully resist their military advances, while retaining as much of its pre-Indo-Aryan culture as it can. This counterrevolution, which may well have taken many generations to reach its full extent, takes place ca. 1000 BCE to 500 BCE. During this time period, there are no mass migrations into or out of Southern India. These are basically civil wars, not international ones.
The dynasty that rules this Dravidian empire lasts long enough for most of the people who live in it to start speaking proto-Dravidian. A few generations later, however, when the counterrevolution's ouster of a linguistically Indo-European ruling class is secure, the practical need to be united politically to defeat the Indo-Aryan conquerors dissipates. An extended period of unfavorable climate conditions and a great-great grandchild who is a lousy king make the kingdom weak, and the kingdom breaks up into fiefdoms whose languages start to differentiate, giving us the various languages of the modern Dravidian language family.
Then, ca. 400 BCE to 100 CE, there is a second wave of Indo-Aryan migration to those parts of South Asia where the Dravidian counterrevolution failed, resulting in an infusion of 10% to 30% (of the total) of additional ANI admixture in places where Indo-Aryans had successfully defeated counterrevolutions. But, these second wave migrants were unwelcome in, and kept out of, places where the Dravidian counterrevolution succeeded. So, the LD data make places where there was a second wave of Indo-Aryan migration appear to have experienced ANI-ASI admixture more recently than Dravidian areas.
Is this plausible?
A familiar analogy (in terms of political see-saws, not necessarily population genetics) would be the rapid Moorish conquest of Spain ca. 700 CE, followed by eight centuries of reconquest of Iberia by Europeans, with some holdout areas as late as 1492. During that time, significantly more North African Muslims might have migrated to Granada, which remained Moorish until almost the end, centuries after the original Moorish invasion, but they would not have migrated to territories that Europeans had retaken.
The surprisingly young time depth of the Dravidian language family fits this scenario well, so this narrative explains not only the population genetic data but the linguistic data as well. And, there is no strong evidence to contradict this scenario.