Tuesday, February 2, 2021

Harappan, Dravidian and Indo-Aryan Legacies

Razib Khan has made some interesting posts at Brown Pundits and at his other online forums in early 2021, one of which I agree with completely, and another of which I think is probably not on target. I started writing this post in response at the time, but got derailed. I've completed it and posted it now with only modest revision.

In the post that I fully agree with, regarding the Y-DNA of Brahmins in India, he writes at Brown Pundits (some links added by me):

I was talking to a person of South Indian Brahmin origin today about their genetics. Over the course of the conversation, he showed me Y and mtDNA haplogroup types amongst his jati. The vast majority of the Y haplogroups were not R1a.

Brahmin groups in India seem to be about 15% to 30% steppe in their overall genome. But their Y chromosomes are usually 50% or so R1a1a-Z93. The lineage associated with Indo-Iranian pastoralists.

So what’s going on with the other haplogroups? For example, J2, L, C, G, and H?

From what I can see J2 and L are the next most frequent haplogroups after R1a1a-Z93. This tells us something. These are haplogroups found in ancient “Indus Periphery” samples. And, these two haplogroups are found at high concentrations in the northwest of the subcontinent.

It doesn’t take a Brahmin to connect the dots here. Some of the gotra as early as the Vedic period were almost certainly derived from high-status individuals in the post-IVC society. Warriors and priests in the fallen civilization of the IVC, which had likely degraded itself to a level of barbarism by the time the Indo-Aryans became ascendant.

It seems very likely that the Brahmin class, in which mostly Indo-Aryan steppe people were a leading component, also included elites from the Harappan culture, which was already in some stage of collapse when the Indo-Aryans migrated to South Asia.

Relevant Y-DNA Haplogroups

Y-DNA H is the most definitively autochthonous Y-DNA haplogroup of India and South Asia. It is most common in Sri Lanka (25%), South India (26-27%) and Bangladesh (36%), and fairly common in North India (25% but varying from 10% to 44% by caste, with lower frequencies in high castes and higher frequencies in low castes) and Nepal (6%-39% with large regional variation and the high frequencies restricted to villages with founder effect issues).

Y-DNA H frequencies are much lower even in adjacent populations: Pakistani populations have 3-8% (except the genetically distinctive Kalash with 20%) and Afghanistan has 6-7%. But it is found in 30%-60% of men in most Roma populations of Europe (a.k.a. Gypsy, with all collective names for this people being somewhat problematic), which are derived from India.

Y-DNA L has a South Asian centered distribution, but one quite different from that of Y-DNA H, suggestive of West Asian origins. It has higher frequency among members of Dravidian castes (ca. 17-19%) (as opposed to sub-caste Dalit and "tribal" people of India) but is somewhat rarer in members of Indo-Aryan castes (ca. 5-6%). It is also present at high frequencies among the Kalash.


Y-DNA J2 and G have a distribution suggestive of West Asian origins.

Haplogroup J2 has been present in South Asia mostly as J2a-M410 and J2b-M102, since neolithic times (9500 YBP). J2-M172 was found to be significantly higher among Dravidian castes at 19% than among Indo-European castes at 11%. J2-M172 and J-M410 is found 21% among Dravidian middle castes, followed by upper castes, 18.6%, and lower castes 14%. . . . 
In Pakistan, the highest frequencies of J2-M172 were observed among the Parsis at 38.89%, the Dravidian speaking Brahui's at 28.18% and the Makrani Balochs at 24%. It also occurs at 18.18% in Makrani Siddis and at 3% in Karnataka Siddis. J2-M172 is found at an overall frequency of 16.1% in the people of Sri Lanka.

What role did Harappans play in Dravidian ethnogenesis?

I am much more skeptical of Razib Khan's conjecture in another post at Brown Pundits. He writes:

Peter Bellwood in First Farmers presents a hypothesis for the expansion of the Dravidian languages into southern India in the late Neolithic through the spread of an agro-pastoralist lifestyle through the western Deccan, pushing southward along the Arabian sea fringe. At the time I was skeptical, but now I am modestly confident that this is close to the reality. 
[The South Indian Neolithic was probably the source of the expansion of the Dravidian languages, from a focal point in southern India, not "into it". Given that it involved a mix of Sahel African domesticates, Fertile Crescent Neolithic domesticates, and a few local domesticates, I suspect that there was maritime influence from Africa, as well as borrowings from IVC agriculturists. But the IVC culture didn't spread south sooner because the Fertile Crescent package of crops didn't thrive well enough to support a Neolithic culture by itself in southern India, so the input of Sahel African domesticates was probably the critical final piece of the puzzle.]
There is always talk about “steppe” ancestry on this weblog. But there are groups that seem “enriched” from IVC ancestry, as judged by the Indus Periphery samples. The confidence is lower since we don’t have nearly as good a sample coverage…but I think I can pass on what we’ve seen so far: groups in southern Pakistan, non-Brahmin elites in South India, and some Sudra groups in Gujarat and Maharashtra, seem to be relatively enriched for IVC-like ancestry. Then there is the supposed existence of Dravidian toponyms in Sindh, Gujarat, and Maharashtra. And, their total absence in the Gangetic plain.

[Discussed below.] 

There have been decades of debate about Brahui. I’ve looked closely at Brahui genetics, and they are no different from the Baloch. Combined with evidence from Y chromosomes (the Baloch and Brahui have some of the highest frequencies of haplogroups found in IVC-related ancient DNA), I doubt the thesis they are medieval intruders (if they are, their distinctive genes were totally replaced).

[The case that they are medieval intruders whose distinctive genes were totally replaced is the stronger one as discussed below.] 

Genetically, we know that some southern tribes, such as the Pulliyar, have some IVC-related ancestry. But other groups, such as Reddy in Andhra Pradesh, have a lot more. How does this cline emerge? 

[Discussed below.] 

My conjecture is that there were several movements of “Dravidian” people from Sindh and Gujarat into southern India, simultaneous with the expansion of Vedic Aryans to the north into the Gangetic plain. The region the Vedic Aryans intruded upon, Punjab, was not inhabited by Dravidian speakers. Like Mesopotamia, the Indus Valley Civilization was probably multi-lingual, despite broad cultural affinities developed over time.

[I disagree with almost all of this as discussed below.] 

I don't dispute that he is correct about the distribution of Dravidian toponyms or the enhanced levels of IVC-type genetics where he notes them. But I do disagree with the narrative he provides for those facts.

The strongest evidence for a ca. 1000 CE origin for the Brahui, rather than a deep and ancient one, is linguistic. The Brahui language contains linguistic innovations not present in the Dravidian languages until after about 1000 CE that are specific to the North Dravidian languages, of which it is one.

All of the communities with North Dravidian language speakers also have traditions of an origin in the Deccan Peninsula, and such traditions are more likely to have persisted over time if the migration was a more recent one.

Brahui is spoken beyond and outside the range of widespread Dravidian toponyms, which once established tend to be stable.

Obviously, the genetics suggest a scenario of elite language shift with only a small demic contribution from the founding elite to the population genetics, possibly no longer discernible due to dilution. But, this is not unprecedented. Something similar took place in Hungary in about the same era. In Turkey, also in roughly the same era, there was language shift with a quite modest ethnically Turk genetic contribution, and more than twenty centuries before that, language shift to the Hittite language was also elite dominated. In South Asia, there was more demic impact, but the language shift to Sanskrit-derived Indo-Aryan languages was still elite dominated. This also happened in post-Columbian Latin America.

In the Indus River Valley itself and adjacent areas, IVC ancestry is likely due to Harappan migration (a society that was not Dravidian language speaking although it had contacts with Dravidian language speakers) prior to the arrival of the Indo-Aryans, and this is probably the source of IVC ancestry in the Brahui.

The archaeological evidence, ancient seals, and limited historical accounts from Mesopotamia all tend to disfavor the conclusion that the "Indus Valley Civilization was probably multi-lingual", and instead tend to favor a linguistic, cultural, and political unity maintained from the outset of its adoption of Fertile Crescent agriculture that persisted until close to its collapse. There isn't evidence of war or war-like fortifications in the core IVC region until very late, which is what political unity, which usually leads to linguistic and cultural unity, can produce.

There is also essentially no evidence for the narrative that "there were several movements of “Dravidian” people from Sindh and Gujarat into southern India, simultaneous with the expansion of Vedic Aryans to the north into the Gangetic plain." 

The evidence instead suggests that the migrations went the other way and ultimately collapsed in some areas, leaving only toponyms and a few isolated linguistic communities like the Brahui and other Northern Dravidian enclaves from a high water mark of Dravidian expansion from South India.

Moorjani, et al., "Genetic Evidence for Recent Population Mixture in India" 93 American Journal of Human Genetics 422-438 (September 5, 2013) estimated the admixture history of India based upon an analysis of a moderately sized sample of modern Indian genomes.

Moorjani found that the timing of admixture was later than the South Asian Neolithic era and was in some cases consistent with only a single wave of admixture. This was based not only on Linkage Disequilibrium methods (which are less prone to uncertainty than mutation rate methods), but also on confirming that the components that did admix were consistent with being from the same autosomal gene pools, as opposed to different ones as would be expected if there were two waves of admixture from different sources.
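
To make the Linkage Disequilibrium dating idea concrete, here is a minimal sketch in Python. It is not the rolloff software used in the paper. It assumes noise-free synthetic data in which admixture LD decays as amplitude * exp(-generations * distance), with distance in Morgans, and it recovers the number of generations with an ordinary log-linear least-squares fit. The function names, the amplitude of 0.02, and the choice of 100 generations are all hypothetical values picked for illustration.

```python
import math

def simulate_ld(amplitude, generations, distances_morgans):
    """Synthetic admixture LD: amplitude * exp(-generations * d). Illustrative only."""
    return [amplitude * math.exp(-generations * d) for d in distances_morgans]

def fit_admixture_date(distances_morgans, ld_values):
    """Fit ln(LD) = ln(amplitude) - generations * d by ordinary least squares.

    Returns (generations, amplitude). This log-linear fit is a simplification
    and assumes every LD value is positive.
    """
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Hypothetical inputs: SNP pairs spaced 0.5 cM to 20 cM apart, mixture 100 generations ago.
distances = [0.005 * k for k in range(1, 41)]
ld = simulate_ld(0.02, 100, distances)

est_generations, est_amplitude = fit_admixture_date(distances, ld)
print(f"estimated generations since admixture: {est_generations:.1f}")
```

With real data the decay curve is noisy, which is why the paper compares a single exponential against a sum of exponentials before concluding that a group is consistent with one wave of admixture.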

Ancestral North Indian (ANI) ancestry is a mix of steppe ancestry and ancestry from co-opted IVC people. The admixture into Dravidian peoples is older, reflecting a broad sweep of the subcontinent that went unchecked, but presumably was then beaten back in a Dravidian reconquest of Southern India, which accounts for both the fact that these regions don't speak Indo-Aryan languages now (although the proto-Hindu religion stuck), and the deceptively young apparent age of the Dravidian language family due to the extinction of all Dravidian dialects except the relict core from which the reconquest was mounted. In contrast, in linguistically Indo-Aryan North India, there was a second pulse of steppe introgression that came later.

The higher ratio of IVC genetics to steppe ancestry in South Indian Brahmins likely reflects a narrative in which lower status post-IVC local elites sought positions leading the conquest of South India at a greater rate than higher status steppe elites, because they had better opportunities for promotion there. They may also, more conjecturally, have found the climate less out of their comfort zone than the steppe elites, who were already at the fringe of their comfort zones in North India.

The abstract and body text of Moorjani (2013) note that:


Genetic evidence indicates that most of the ethno-linguistic groups in India descend from a mixture of two divergent ancestral populations: Ancestral North Indians (ANI) related to West Eurasians (people of Central Asia, the Middle East, the Caucasus, and Europe) and Ancestral South Indians (ASI) related (distantly) to indigenous Andaman Islanders. The evidence for mixture was initially documented based on analysis of Y chromosomes and mitochondrial DNA and then confirmed and extended through whole-genome studies.

Archaeological and linguistic studies provide support for the genetic findings of a mixture of at least two very distinct populations in the history of the Indian subcontinent. The earliest archaeological evidence for agriculture in the region dates to 8,000–9,000 years before present (BP) (Mehrgarh in present-day Pakistan) and involved wheat and barley derived from crops originally domesticated in West Asia. The earliest evidence for agriculture in the south dates to much later, around 4,600 years BP, and has no clear affinities to West Eurasian agriculture (it was dominated by native pulses such as mungbean and horsegram, as well as indigenous millets). 
Linguistic analyses also support a history of contacts between divergent populations in India, including at least one with West Eurasian affinities. Indo-European languages including Sanskrit and Hindi (primarily spoken in northern India) are part of a larger language family that includes the great majority of European languages. In contrast, Dravidian languages including Tamil and Telugu (primarily spoken in southern India) are not closely related to languages outside of South Asia. Evidence for long-term contact between speakers of these two language groups in India is evident from the fact that there are Dravidian loan words (borrowed vocabulary) in the earliest Hindu text (the Rig Veda, written in archaic Sanskrit) that are not found in Indo-European languages outside the Indian subcontinent.

Although genetic studies and other lines of evidence are consistent in pointing to mixture of distinct groups in Indian history, the dates are unknown. Three different hypotheses (which are not mutually exclusive) seem most plausible for migrations that could have brought together people of ANI and ASI ancestry in India. The first hypothesis is that the current geographic distribution of people with West Eurasian genetic affinities is due to migrations that occurred prior to the development of agriculture. Evidence for this comes from mitochondrial DNA studies, which have shown that the mitochondrial haplogroups (hg U2, U7, and W) that are most closely shared between Indians and West Eurasians diverged about 30,000–40,000 years BP. The second is that Western Asian peoples migrated to India along with the spread of agriculture; such mass movements are plausible because they are known to have occurred in Europe as has been directly documented by ancient DNA. Any such agriculture related migrations would probably have begun at least 8,000–9,000 years BP (based on the dates for Mehrgarh) and may have continued into the period of the Indus civilization that began around 4,600 years BP and depended upon West Asian crops. The third possibility is that West Eurasian genetic affinities in India owe their origins to migrations from Western or Central Asia from 3,000 to 4,000 years BP, a time during which it is likely that Indo-European languages began to be spoken in the subcontinent. A difficulty with this theory, however, is that by this time India was a densely populated region with widespread agriculture, so the number of migrants of West Eurasian ancestry must have been extraordinarily large to explain the fact that today about half the ancestry in India derives from the ANI. It is also important to recognize that a date of mixture is very different from the date of a migration; in particular, mixture always postdates migration. 
Nevertheless, a genetic date for the mixture would place a minimum on the date of migration and identify periods of important demographic change in India. . . . 
Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. . . . 
By using f4 ratio estimation that analyzes allele frequency correlation patterns to infer mixture proportions, we estimate that ANI ancestry along the Indian cline ranges from as low as 17% (Paniya) to as high as 71% (Pathan). Traditionally lower caste, Dravidian, and tribal groups tend to have lower proportions of ANI ancestry than traditionally upper caste and Indo-European groups (p < 0.001). . . . 
To date ANI-ASI mixture, we capitalized on the fact that admixture between two populations generates allelic association (linkage disequilibrium [LD]) between pairs of SNPs. The LD decays at a constant rate as recombination breaks down the contiguous chromosomal blocks inherited from the ancestral mixing populations. The expected value of the admixture LD is related to the genetic distance between SNPs (the probability of recombination per generation between them) and the time that has elapsed since mixture. We previously reported simulations showing that dating population mixture based on the scale of admixture LD is robust to the use of imperfect surrogates for the ancestral populations, fine-scale errors in the genetic map, and a history of founder events in the admixed population, and is able to provide unbiased estimates for the dates of events up to 500 generations ago. We confirmed this by using new simulations with demographic parameters relevant to India. 
We estimated admixture dates for all the groups on the Indian cline with more than five samples (a minimum sample size is important for measuring LD with precision). We observe a decay of LD with genetic distance for all groups. By fitting an exponential function using least-squares (via rolloff), our point estimates for the dates range from 64 to 144 generations ago, or 1,856 to 4,176 years assuming 29 years per generation.
We highlight two implications of these dates. 
First, nearly all groups experienced major mixture in the last few thousand years, including tribal groups like the Bhil, Chamar, and Kallar that might be expected to be more isolated. 
Second, the date estimates are typically more recent in Indo-Europeans (average of 72 generations) compared to Dravidians (108 generations). A jackknife estimate of the difference is highly significant at 35 ± 8 generations (Z = 4.5 standard errors from zero). A possible explanation is a secondary wave of mixture in the history of many Indo-European groups, which would decrease the estimated admixture date. . . . 
A caveat for these dating analyses is that they assume that the entire admixture occurred instantaneously (or over a small number of generations). However, population mixture can be noninstantaneous, such that the date we obtain from our method may actually be an average of multiple dates spread out over a substantial period. One way to detect a history of noninstantaneous gene flow is to fit a sum of exponential functions to the decay of admixture LD and to show that this provides a better fit to the data than a single exponential function, as we in fact find for the Kashmiri Pandit, Kshatriya, Sindhi, and Pathan. However, even if we fail to detect a nonexponential decay, we cannot rule out noninstantaneous gene flow, because the decay can be noisy, making the statistical detection of a mixture of exponential functions difficult. 
A particularly important scenario we could not rule out by this method is that several thousand years ago, Indian groups were already admixed, and thus the LD decay we detect is the result of mixture of already admixed ancestral groups with different proportions of ANI ancestry. If the initial admixture was more than 10,000 years old, the associated admixture LD would have decayed to such a short distance that our methods would have poor power to detect it. The LD we measure might in this case reflect only the final admixture events, complicating interpretation of the results. . . . 
[W]e identified previously undetected complexity in Indian history, with many sets of Indian groups not consistent with a simple ANI-ASI admixture. . . . . we find that the Indian groups consistent with simple ANI-ASI mixture are most often from tribal and traditionally lower-caste groups. Middle- and upper-caste groups tend to have evidence of more complex histories, with signals of multiple layers of ANI ancestry from slightly different ANI ancestral populations. Further evidence for multiple waves of admixture in the history of many traditionally middle- and upper-caste groups (as well as Indo-European and northern groups) comes from the more recent admixture dates we observe in these groups and the fact that a sum of two exponential functions often produces a better fit to the decay of admixture LD than does a single exponential. Evidence for multiple components of West Eurasian-related ancestry in northern Indian populations has also been reported by Metspalu et al. based on clustering analysis. 
Focusing on the largest set of Indo-Europeans (four groups) and the largest set of Dravidians (five groups) consistent with mixture of the same ANI and ASI ancestral populations, we find that the expected and observed admixture LD amplitudes are equivalent to within the limits of our resolution. . . . our data are consistent with all of the ANI ancestry in some selected sets of Indians (including groups speaking both Indo-European and Dravidian languages) being due to admixture events that we can date to within the past few thousand years. Accounting for statistical uncertainty, we estimate that the ANI ancestry that cannot be explained by a single wave of admixture in the last few thousand years has a 95% confidence interval (truncated to 0) of 0%–19% for Indo-Europeans and 0%–16% for Dravidians. Thus, all the ANI ancestry in some groups is consistent with deriving from admixture events that have occurred in the past few thousand years. 
Our analysis documents major mixture between populations in India that occurred 1,900–4,200 years BP, well after the establishment of agriculture in the subcontinent. We have further shown that groups with unmixed ANI and ASI ancestry were plausibly living in India until this time. This contrasts with the situation today in which all groups in mainland India are admixed. 
These results are striking in light of the endogamy that has characterized many groups in India since the time of mixture. For example, genetic analysis suggests that the Vysya from Andhra Pradesh have experienced negligible gene flow from neighboring groups in India for an estimated 3,000 years. Thus, India experienced a demographic transformation during this time, shifting from a region where major mixture between groups was common and affected even isolated tribes such as the Palliyar and Bhil to a region in which mixture was rare. Our estimated dates of mixture correlate to geography and language, with northern groups that speak Indo-European languages having significantly younger admixture dates than southern groups that speak Dravidian languages. This shows that at least some of the history of population mixture in India is related to the spread of languages in the subcontinent. 
One possible explanation for the generally younger dates in northern Indians is that after an original mixture event of ANI and ASI that contributed to all present-day Indians, some northern groups received additional gene flow from groups with high proportions of West Eurasian ancestry, bringing down their average mixture date. This hypothesis would also explain the nonexponential decays of LD in many northern groups and their higher proportions of ANI ancestry. . . .
The dates we report have significant implications for Indian history in the sense that they document a period of demographic and cultural change in which mixture between highly differentiated populations became pervasive before it eventually became uncommon. The period of around 1,900–4,200 years BP was a time of profound change in India, characterized by the deurbanization of the Indus civilization, increasing population density in the central and downstream portions of the Gangetic system, shifts in burial practices, and the likely first appearance of Indo-European languages and Vedic religion in the subcontinent. The shift from widespread mixture to strict endogamy that we document is mirrored in ancient Indian texts. The Rig Veda, the oldest text in India, has sections that are believed to have been composed at different times. The older parts do not mention the caste system at all, and in fact suggest that there was substantial social movement across groups as reflected in the acceptance of people with non-Indo-European names as kings (or chieftains) and poets. The four-class (varna) system, comprised of Brahmanas, Ksatriyas, Vaisyas, and Sudras, is mentioned only in the part of the Rig Veda that was likely to have been composed later (book 10). The caste (jati) system of endogamous groups having specific social or occupational roles is not mentioned in the Rig Veda at all and is referred to only in texts composed centuries after the Rig Veda, for example, the law code of Manu that forbade intermarriage between castes. Thus, the evolution of Indian texts during this period provides confirmatory support as well as context for our genetic findings. 
It is also important to emphasize what our study has not shown. Although we have documented evidence for mixture in India between about 1,900 and 4,200 years BP, this does not imply migration from West Eurasia into India during this time. On the contrary, a recent study that searched for West Eurasian groups most closely related to the ANI ancestors of Indians failed to find any evidence for shared ancestry between the ANI and groups in West Eurasia within the past 12,500 years (although it is possible that with further sampling and new methods such relatedness might be detected). 
An alternative possibility that is also consistent with our data is that the ANI and ASI were both living in or near South Asia for a substantial period prior to their mixture. Such a pattern has been documented elsewhere; for example, ancient DNA studies of northern Europeans have shown that Neolithic farmers originating in Western Asia migrated to Europe about 7,500 years BP but did not mix with local hunter gatherers until thousands of years later to form the present-day populations of northern Europe. 
The most remarkable aspect of the ANI-ASI mixture is how pervasive it was, in the sense that it has left its mark on nearly every group in India. It has affected not just traditionally upper-caste groups, but also traditionally lower-caste and isolated tribal groups, all of whom are united in their history of mixture in the past few thousand years. 
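
The year figures in the quoted passage follow directly from its stated assumption of 29 years per generation. As a quick check, the short Python sketch below converts the quoted generation counts to years. The dictionary labels and the function name are my own for illustration and do not come from the paper.

```python
YEARS_PER_GENERATION = 29  # generation length assumed in the quoted passage

def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation

# Generation counts as quoted above from Moorjani (2013).
point_estimates = {
    "youngest point estimate": 64,
    "oldest point estimate": 144,
    "Indo-European average": 72,
    "Dravidian average": 108,
}

dates_in_years = {
    label: generations_to_years(generations)
    for label, generations in point_estimates.items()
}

for label, years in dates_in_years.items():
    print(f"{label}: {years:,} years")
```

On these figures, the Indo-European and Dravidian averages come out to roughly 2,100 and 3,100 years ago respectively, a gap of about a thousand years.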

2 comments:

Guy said...

Well, I think that when we get significant amounts of aDNA from south east Asia that we will find that the story is much more complex than we imagined. Because that is the constant we have seen in Europe over the last five to seven years. And you might be right in some of the particulars, but I dare say it will be due to lucky guessing as much as penetrating analysis!

andrew said...

Southeast Asian aDNA has almost nothing to do with any of these disputes. If we had it, it would pertain to Munda origins and migrations about which there is far less controversy concerning the overall narrative and about which the linguistic, archeological, and genetic evidence are telling a more or less consistent and unambiguous story.

More ancient DNA from Eastern Iran, South India, and Southern Pakistan, on the other hand, could shed light on the issue addressed by this post, and might tell a more complex story.

Among other issues, we still don't have enough data to really understand the story of Y-DNA T in South Asia, which may be critical to the Dravidian story.