Explaining South Asian Genetics And Dravidian Linguistic Unity

Dienekes' Anthropology blog notes a recent paper by Moorjani, et al., estimating the date of admixture of genetically Ancestral North Indian (ANI) people and genetically Ancestral South Indian (ASI) people in South Asia.  This post makes a conjecture about what kind of prehistoric narrative could have given rise to their data that makes more sense than the one provided by the authors.

ANI genes (which by definition tend to be more common in North India than in South India) are closer to those of other West Eurasians (e.g. in Iran, the Caucasus and Central Asia) than ASI genes are.  This makes total geographic sense.  The easiest way for large numbers of people to migrate from the rest of West Eurasia to India is via Northwestern India.  If people migrate overland from West Eurasia to India, they will get to Northern India before they get to Southern India.  And, you would expect people who are geographically close to each other to be more genetically similar than people who are more distant from each other unless some geographic circumstance or remarkable and atypical folk migration led to a different result.

But, surprisingly, North Indian populations with higher levels of ANI admixture tend to show more recent dates of ANI-ASI admixture than those in South India.  This doesn't make a lot of sense without an explanation.  The first substantial ANI-ASI admixture almost certainly had to take place in North India before it did in South India, in the time period (ca. 2200 BCE to 100 CE) estimated by the authors if the data are all lumped together and a single admixture event is assumed.

What could give rise to this data?

A sensible explanation requires two things.  First, an understanding of a quirk associated with the methodology they use in cases where there are multiple episodes of admixture between the same two populations that are separated greatly in time.  Second, a historical narrative that could account for the data observed.  This post provides each in turn below the break.

The Methodology

The study uses a methodology based on linkage disequilibrium, which looks at how thoroughly genes from one source population in a given person's genome are mixed with genes from another source population.

Once a mother and father have a child, the child has roughly half of the father's genes and roughly half of the mother's genes, and the manner in which this happens for the millions of genes that make up a genome is a nearly perfect match to a quite simple mathematical model in every kind of plant and animal that reproduces sexually.

In a child with one parent purely from one population  and the other parent purely from the other, you see big chunks of ANI genes and big chunks of ASI genes.  If the resulting mixed children start intermarrying randomly, the average size of the chunks of ANI or ASI genes respectively gets smaller to an extremely predictable extent.

In cases where there is a single admixture event between two populations (whose relative proportions need not be equal for the methodology to work), you can predict the average ANI and ASI gene chunk size any given number of generations later with great accuracy.  So, if you see a population with a certain distribution of ANI and ASI gene chunks in the genomes of its members, you can very accurately determine how many generations ago the admixture event took place, and the margin of error of that best estimate is a matter of fairly straightforward statistical calculation.
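To make the chunk-size logic concrete, here is a minimal Python sketch.  It is not the method used in the paper.  It assumes, as a deliberate simplification, that recombination adds about one breakpoint per Morgan (100 centimorgans) per generation, so that the average chunk is roughly 100/g centimorgans long after g generations, and it ignores the effect of the mixing proportions.  The function names are made up for illustration.

```python
def mean_chunk_length_cm(generations):
    """Rough average ancestry chunk length, in centimorgans, after a
    single admixture event `generations` ago.

    Simplifying assumption: about one recombination breakpoint per Morgan
    (100 cM) per generation, ignoring the admixture proportions.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


def generations_from_chunk_length(mean_length_cm):
    """Invert the relationship: infer generations since admixture from an
    observed average chunk length (same simplifying assumption)."""
    if mean_length_cm <= 0:
        raise ValueError("mean_length_cm must be positive")
    return 100.0 / mean_length_cm


if __name__ == "__main__":
    # Chunks shrink predictably as generations of intermarriage pass.
    for g in (1, 10, 64, 144):
        print(g, "generations ->", mean_chunk_length_cm(g), "cM on average")
```

The point of the sketch is only that the relationship runs both ways: a known number of generations predicts a chunk size, and an observed chunk size implies a number of generations.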

These dates are precise enough to make meaningful distinctions between absolute dates of admixture that are more than several hundred years apart in the Holocene era that makes up the interesting parts of human history and prehistory.

Unlike estimates of historical population genetic evidence based upon gene mutation rates (particularly gene mutation rates associated with non-recombining Y-DNA lineages), for which the calibration of the number of mutations observed to absolute age is controversial, with the rates used varying by about a factor of three, and for which there are good reasons to doubt that a consistent mutation rate can be reliably applied to all parts of the genome, the only meaningful moving part involved in calibrating dates determined using linkage disequilibrium to absolute dates is the average generation length.  And, there is wide consensus in the historical population genetics field that for modern humans in ancient and prehistoric periods the average generation length was a quite stable twenty-nine years.

The downside of this methodology, however, is that linkage disequilibrium estimates are more problematic in cases where there are multiple distinct episodes of significant admixture between the same two genetically distinct populations.  It is hard to distinguish the raw data that a multiple admixture episode history produces from a scenario in which there is a single wave of admixture that took place close in time to what was really the last of more than one episode of admixture.  One big wave of admixture between two populations mostly erases linkage disequilibrium evidence of previous episodes of significant admixture between the same two populations.
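A toy Python illustration of this problem follows.  It is a sketch under stated assumptions, not the authors' procedure: it treats the admixture signal at genetic distance d (in Morgans) as decaying like exp(-g * d) for a pulse g generations ago, treats two pulses as a weighted sum of two such curves, and then fits a single decay rate by ordinary least squares on the logarithm of the curve.  The pulse weights, the distance range and the function names are all hypothetical choices made for illustration.

```python
import math


def ld_curve(distance_morgans, pulses):
    """Toy admixture-LD signal at a given genetic distance.

    `pulses` is a list of (weight, generations_ago) pairs; each pulse
    contributes weight * exp(-generations_ago * distance).
    """
    return sum(w * math.exp(-g * distance_morgans) for w, g in pulses)


def fit_single_date(distances, values):
    """Fit one exponential by least squares on log(values) and return the
    implied number of generations (the negated slope)."""
    n = len(distances)
    logs = [math.log(v) for v in values]
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    num = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    den = sum((d - mean_d) ** 2 for d in distances)
    return -num / den


# Distances from 0.5 cM to 20 cM (hypothetical fitting range).
distances = [0.005 * i for i in range(1, 41)]

# One pulse 64 generations ago versus pulses 144 and 64 generations ago.
single = [ld_curve(d, [(1.0, 64)]) for d in distances]
double = [ld_curve(d, [(0.5, 144), (0.5, 64)]) for d in distances]

single_estimate = fit_single_date(distances, single)
double_estimate = fit_single_date(distances, double)

if __name__ == "__main__":
    print("single pulse, fitted generations:", round(single_estimate, 1))
    print("two pulses, fitted generations:  ", round(double_estimate, 1))
```

In this toy model the single-date fit to the two-pulse curve returns one number that lies somewhere between the two true dates, and nothing in that number reveals that there were two episodes.  Where it lands depends on the relative sizes of the pulses and the range of distances fitted, because the older pulse's signal fades faster with distance.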

Piecing Together A Historical Narrative That Makes Sense

The headline estimate of the inferred admixture date for the entire South Asian sample is a meaningless number when different sub-sample populations differ by more than a factor of two, from 64 +/- 11 to 144 +/- 27 generations, as they do in this study.  Historical, archaeological and geographical realities, and the patterns seen in the data itself, all strongly support the notion that admixture in South Asia was not a single event happening at a single time every place in the subcontinent.
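For readers who want to translate those generation counts into calendar dates, here is a small Python sketch.  It uses the twenty-nine year generation length discussed above; the reference year of 2000 CE for "the present" is an assumption made here for illustration, not something taken from the paper, and the function names are hypothetical.

```python
YEARS_PER_GENERATION = 29   # generation length discussed in the post
REFERENCE_YEAR = 2000       # assumed "present"; not taken from the paper


def admixture_year(generations,
                   years_per_generation=YEARS_PER_GENERATION,
                   reference_year=REFERENCE_YEAR):
    """Astronomical year of admixture (0 or negative means BCE)."""
    return reference_year - generations * years_per_generation


def format_year(year):
    """Render an astronomical year as BCE/CE (there is no year zero)."""
    year = round(year)
    return f"{1 - year} BCE" if year <= 0 else f"{year} CE"


def date_range(generations, error):
    """Earliest and latest dates implied by generations +/- error."""
    earliest = admixture_year(generations + error)
    latest = admixture_year(generations - error)
    return format_year(earliest), format_year(latest)


if __name__ == "__main__":
    for gens, err in ((64, 11), (144, 27)):
        print(f"{gens} +/- {err} generations:",
              format_year(admixture_year(gens)),
              "with range", date_range(gens, err))
```

Under these assumptions the two central estimates land in the same neighborhood as the ca. 2200 BCE to 100 CE span mentioned earlier, and the error bars on each are several centuries wide.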

But, the geographic distribution of the dates also isn't a good fit for a single admixture event that gradually expanded from a core area where admixture happened first to a gradually moving frontier where it happened later.

The only way to interpret the LD data in a way that makes any sense is to assume that there were at least two separate waves of ANI-ASI admixture many centuries apart from each other in what are mostly linguistically Indo-European populations in South Asia.  This interpretation is possible because the LD method used mostly reflects the most recent admixture date, and ignores the time at which one or more additional previous episodes of ANI-ASI admixture that contribute to the current ANI-ASI genetic mix in modern populations could have taken place.

A narrative like the one below with a first episode of ANI-ASI admixture that spans the entire subcontinent and a second one that is limited to linguistically Indo-European areas can provide a plausible fit to the data that also makes sense in terms of the historical linguistics of the Dravidian language family and the historical precedents of events similar to the one imagined.

The Narrative

Imagine that Sanskrit-speaking, religiously proto-Hindu people (also known as the Indo-Aryans because Sanskrit is an Indo-European language) arrive in South Asia around 2000 BCE.  These people may themselves be a mix between proto-Indo-European and Harappan people somewhere in Northwest India, Northern Pakistan or nearby parts of Central Asia whose ethnogenesis as Indo-Aryans takes place sometime prior to the Indo-Aryan expansion.

Suppose that these Indo-Aryans, who are politically united, warlike and aided by Bronze Age technologies and horses not immediately available to the indigenous South Asian people that they are conquering, sweep across and conquer almost the entire South Asian subcontinent, everywhere contributing about 30% to 40% of the genes in the local gene pool, in a manner biased towards higher castes, and making their particular dialect of Sanskrit the language of the lands they control in the time period from about 2000 BCE to 1000 BCE.  In the process almost all of the indigenous South Asian language families are wiped out.  Only a few holdout kingdoms (maybe even just one), midway down the Eastern coast of the Deccan Peninsula, remain.

Then, the king or chief of one of the last holdout regions (probably midway down the East Coast of the Deccan Peninsula) whose country speaks the proto-Dravidian language leads a counterrevolution against the Indo-Aryans that unites the people of South India and restores the pre-Indo-Aryan Dravidian language in a huge empire covering most of the Southern half of the subcontinent.  The new kingdom adopts the technological and cultural innovations brought by the Indo-Aryans that it needs to successfully resist their military advances while retaining as much of its pre-Indo-Aryan culture as it can.  This counterrevolution, which may well have taken many generations to reach its full extent, takes place ca. 1000 BCE to 500 BCE.  During this time period, there are no mass migrations into or out of Southern India.  These are basically civil wars, not international ones.

The dynasty that rules this Dravidian empire lasts long enough for most of the people who live in it to start speaking proto-Dravidian.  A few generations later, however, when the counterrevolution's ouster of a linguistically Indo-European ruling class is secure, the practical need to be united politically to defeat the Indo-Aryan conquerors dissipates.  An extended period of unfavorable climate conditions and a great-great grandchild who is a lousy king make the kingdom weak, and the kingdom breaks up into fiefdoms whose languages start to differentiate, giving us the various languages of the modern Dravidian language family.

Then ca. 400 BCE to 100 CE, there is a second wave of Indo-Aryan migration to those parts of South Asia where the Dravidian counterrevolution failed, resulting in an infusion of 10% to 30% (of the total) of additional ANI admixture in places where Indo-Aryans had successfully defeated counterrevolutions.  But, these second wave migrants were unwelcome and kept out of places where the Dravidian counterrevolution succeeded.  So, in the LD data, places where there was a second wave of Indo-Aryan migration appear to have experienced ANI-ASI admixture more recently than Dravidian areas.

Is this plausible?

A familiar analogy (in terms of political see-saws, not necessarily population genetics) would be the rapid Moorish conquest of Spain ca. 700 CE, followed by eight centuries of reconquest of Iberia by Europeans, with some holdout areas lasting as late as 1492.  During that time, more North African Muslims might migrate to Granada, which remained Moorish until almost the end, centuries after the original Moorish invasion, but they did not migrate to territories that Europeans had retaken.

The surprisingly young time depth  of the Dravidian language family fits this scenario well, so this narrative explains not only the population genetic data but the linguistic data as well.  And, there is no strong evidence to contradict this scenario.


Congratulations, Andrew. It is a very well explained article that I enjoyed reading very much.

Now, as you know I am not at all confident of molecular clock estimates and what you explain in the article does not seem to underline their solidity: in pure theory there could have been even hundreds or thousands of older ANI-ASI admixture episodes that would not be accounted for, right?

As you may also know, I strongly doubt that the ANI component can be related to Indoeuropean invasions. If so it'd be very close to that of Eastern Europe and AFAIK it is not: it is closest to ASI, then to West Asia, etc. Feel free to correct me if I'm wrong, of course.

Just because ANI is closer to the wider West Eurasian component than ASI, it does not make it derived from West Eurasia, much less recently so. It could for example be the other way around (ANI → WEA instead of WEA → ANI) or belong to a much older back-migration from West Asia, for example in the Neolithic. My way to study this would be to run Admixture and check for Fst distances, which may be organized in a hierarchical tree-like form. I haven't yet done that so I don't know what the result is, but I have heard of this kind of criticism of the Eurocentric interpretation of the ANI/ASI duality.

For example all Indians are clearly less distant from Tuscans (Fst values: 6.2-11.5) than from NW Europeans (Fst values: 10.4-15.8), ref.

Also North Indians/Brahmins seem less diverse than South Indians, however they are more diverse than Europeans or East Asians. So I believe that there is an oversimplification going on here, probably caused by wrong approaches, in turn caused by a Eurocentric bias that disdains India's great diversity and antiquity and makes incoherent assumptions.

But I have not studied the matter in enough depth, so I am not sure.

I still suspect that the fact that the ANI-ASI admixture is older in the South may be caused by an ancient expansion from North India, and that the more recent ANI-ASI admixture dates in North India may be caused by an expansion from the South at a later time. But this is very hypothetical and needs testing.

BTW, can you send me a copy of the study to lialdamiz[at]gmail[dot]com? Thanks in advance.

I'm re-reading Metspalu 2011 and sadly there is no Fst table between components, only between "raw" populations. These however show much stronger affinity within South Asia than with any other population. Even Pakistan, which is more intermediate, does not show any particular European affinity.

The ANI component seems to be the Balochi-Caucasian one, i.e. a likely West Asian Neolithic component, totally unrelated to IE migrations.

I don't have an open access copy of the study. But, the quotes and discussions from others who do and my familiarity with the topic are enough to make sense of what is likely going on.

I have no doubt that Indo-Aryan expansion out of NW India happened and that it transformed India both demographically and linguistically at about the time traditionally attributed to it. Also, on the basis of the uniparental markers, particularly the male ones, I think it is fair to assume that there was some West Eurasian component to Indo-Aryan expansion.

However, I don't disagree that a lot of autosomal ANI is attributable to an Indus River Valley Civilization/Harappan substrate that was present before culturally and linguistically Indo-European people arrived on the scene around 2000 BCE. My suspicion is that this is the biggest source of it and probably dates from the IVC Neolithic ca. 6000-7000 BCE, with smaller additional contributions from pre-Neolithic clinal variation in forager populations and from West Eurasian Indo-Europeans (probably from a source in or near the Caucasus Mountains).

I suspect that the secondary expansion that is being picked up corresponds to the series of expansions originating in Northern India that culminated in the rule of King Ashoka, hailed as an enlightened prince who ushered in a brief era during which Buddhism was the official religion of India, a religious movement that didn't last in India but had a long-standing impact in SE Asia and E Asia.

While Ashoka's empire covered almost the entire subcontinent, unlike those of his predecessors in the dynasty and the kingdom that preceded the Maurya dynasty, he, unlike his predecessors, disavowed the brutal conquests of his father and grandfather, which likely had more of a demographic impact than his own more peaceful method of consolidating power.

Someone else read the comment, it seems, and sent me a copy already. I'll send you a copy, as you seem so interested.

"I have no doubt that Indo-Aryan expansion out of NW India happened and that it transformed India both demographically and linguistically at about the time traditionally attributed to it".

I have many doubts about all that, especially about the demographic impact. Instead we do know that Neolithic expansion and IVC influences, even after the IE conquest, were important. A component that is centered in Balochistan and West Asia looks anything but Indoeuropean. However, as some people have noticed, some groups (notably Brahmins) have a specifically European signature that is probably associated with the IE conquest. However it is rather minor in comparison with the aboriginal (ASI) and West Asian (ANI/Baloch-Caucasian) components.

... "series of expansions originating in Northern India that culminated in the rule of King Ashoka"...

If so, why is the ANI and IE signatures especially carried by Brahmins, who are the Hinduist priestly caste and the greatest enemies of Buddhism historically, as well as any other movement against caste hierarchy? Notice that Asoka's kingdom originated in what is now Bengal, Bihar, Orissa, etc., and not in the NW. It seems to me that Buddhism is a reaction to Vedic Hinduism (i.e. the archaic and more purely IE variant of non-Shaivite, non-Shakti, Hinduism, with Indra (~Zeus) and Vishnu (~Apollo) on top) and especially the hierarchical caste system, with much weaker roots in the South and East.

Other reactions are Shaivism and especially the Shakti variant. Modern Hinduism is a complex and often contradictory synthesis of Vedic remnants, Shaivism/Shaktism and diverse local cults. The opposition between these two currents is still apparent between for example the Babas (street monks) and the Brahmins (caste priests), who don't see each other too well.

Even Jainism is also a (very ascetic) reaction to caste Hinduism.

Going to the paper, it does have some powerful arguments for single-pulse admixture (table 2), at least for many populations. Probably their p<0.05 line is a bit too loose but, even if we are strict and use p<0.001, 6/11 populations are within bounds. So, assuming that the statistical methods applied are correct, then there was a single pulse of ANI-ASI admixture almost for sure in the following cases: Brahmin, Bhil, Mala, Dharkar, Sindhi and Pathan. These are all IE speakers from the NW excepting the Mala. All other Dravidians (Vysya and Madiga) are quite unlikely to be the product of a single pulse (p>0.05), while other IE groups like the Ksatriya (former warrior caste), the Kashmiri Pandit (upper caste) and the Chamar (tribe) are in between those values. Not sure what these differences mean but it seems that in many cases there was a single admixture event.

Also it is very clear in the paper that Pop1 (ANI) is of West Asian origin (Georgians as best reference) and not European (Basques as reference). Actually in Europe they are closest to Italians (Balkans not compared), which is consistent with West Asian origins of the so-called ANI component rather than IE origins from Eastern Europe and Central Asia.

andrew said...

"If so, why is the ANI and IE signatures especially carried by Brahmins, who are the Hinduist priestly caste and the greatest enemies of Buddhism historically"

The pre-Asoka leaders of the dynasty and its predecessor weren't Buddhists and were very unlike him in style and policy. He basically stopped the trend brought about by his predecessors in its tracks from the top, a bit like Gorbachev and the Soviet Union.