Tuesday, January 26, 2016

South Asian Population History

A more sophisticated autosomal genetic model of South Asian populations than Reich's 2009 ANI (ancestral North Indian) and ASI (ancestral South Indian) model, built on richer data, is now available.
India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveal four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.
Basu et al., Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure, PNAS, Published online before print January 25, 2016, doi: 10.1073/pnas.1513197113 (Open access).

The samples cover 13 tribal populations (three Dravidian, four Munda, one Dravidian and Munda, two Ongan, one Indo-European and two Tibeto-Burman), 1 lower-middle caste population (Dravidian), and 6 upper caste populations (four Indo-European, one Dravidian and one Tibeto-Burman). Thus, the sample may under-represent the vast middle of Indian society (lower caste Indo-European and Dravidian populations).

Further inspection reveals that the two new mainland genetic components, one for Austro-Asiatic Munda speaking populations and one for Tibeto-Burman populations, are completely expected and unsurprising, although the new component for Andaman and Nicobarese Islanders, previously equated with ASI, is an innovation, albeit a modest one (the supplemental materials show that the Andamanese are indeed very distinct from any mainland population and cluster with Papuans instead). As the paper explains:
Contemporary populations of India are linguistically, geographically, and socially stratified, and are largely endogamous with variable degrees of porosity. We analyzed high quality genotype data, generated using a DNA microarray (Methods) at 803,570 autosomal SNPs on 367 individuals drawn from 20 ethnic populations of India (Table 1 and SI Appendix, Fig. S1), to provide evidence that the ancestry of the hunter gatherers of A&N is distinct from mainland Indian populations, but is coancestral to contemporary Pacific Islanders (PI). Our analysis reveals that the genomic structure of mainland Indian populations is best explained by contributions from four ancestral components. In addition to the ANI and ASI, we identified two ancestral components in mainland India that are major for the AA-speaking tribals and the TB speakers, which we respectively denote as AAA (for “Ancestral Austro-Asiatic”) and ATB (for “Ancestral Tibeto-Burman”). Extant populations have experienced extensive multicomponent admixtures. Our results indicate that the census sizes of AA and TB speakers in contemporary India are gross underestimates of the extent of the AAA and the ATB components in extant populations. 
All of the South Asian mainland populations studied have ANI, ASI and AAA admixture, and while ATB admixture was much more geographically defined, some populations that are not linguistically TB had more than trace amounts of ATB admixture.

The paper's insights about when the caste system came into existence in its modern endogamous form are also notable. The leading schools of thought in conventional wisdom had been that caste either arose immediately at the time of the arrival of the Indo-Aryans, ca. 4000 years ago, or emerged only under British colonial rule, within the last 250 years or so. There is now convincing evidence to suggest that both of these theories are wrong (link added editorially).
We have inferred that the practice of endogamy was established almost simultaneously, possibly by decree of the rulers, in upper-caste populations of all geographical regions, about 70 generations before present, probably during the reign (319–550 CE) of the ardent Hindu Gupta rulers. The time of establishment of endogamy among tribal populations was less uniform.
Since the genetic evidence points to strict caste endogamy as arising during the historic era, historians may be able, with this hint, to pin down more precisely what occurred, using written documentation from the people who experienced that event. The authors may have the dates wrong, however. They assume a 22.5-year generation, whereas the usual assumption in this field is about 29 years. Over 70 generations that is roughly 2,030 years rather than 1,575, which would point to caste solidification three or four centuries before the Gupta era, although the historical case tends to favor the more recent Gupta era date.
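As a rough check on that arithmetic, here is a minimal sketch that converts 70 generations into calendar dates under both generation lengths. It assumes that "present" means 2016, the paper's publication year, which the paper may not define in exactly this way; the function names are my own and not anything from the paper.

```python
# Convert "generations before present" into an approximate calendar date.
# Assumption: "present" is 2016 (the publication year); names are illustrative.

PRESENT_YEAR = 2016
GENERATIONS = 70


def calendar_year(generations, years_per_generation, present=PRESENT_YEAR):
    """Count back from `present`; the result is an astronomical year (0 = 1 BCE)."""
    return present - generations * years_per_generation


def label(astronomical_year):
    """Format an astronomical year as CE/BCE (there is no year 0 in CE/BCE)."""
    year = round(astronomical_year)
    if year > 0:
        return f"{year} CE"
    return f"{1 - year} BCE"


for gen_len in (22.5, 29):
    print(gen_len, "years/generation ->", label(calendar_year(GENERATIONS, gen_len)))
# 22.5 years/generation -> 441 CE
# 29 years/generation -> 15 BCE
```

The 22.5-year figure lands inside the Gupta reign (319–550 CE), while the 29-year figure lands about 455 years earlier, so the choice of generation length alone decides which historical period the estimate points to.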

The paper does not extensively examine the affinities and structure of the ANI component and its arrival in South Asia, for example, to attempt to discern if there was more than one wave of ANI introgression or to see which West Eurasian populations show the greatest affinity to it.

UPDATE: Razib offers some methodological criticism. He argues that the paper misunderstands what ADMIXTURE does, in a way that understates the extent to which populations are admixed, and he faults the investigators for not incorporating 1000 Genomes data and for not doing f4 and D statistic analysis. (Ironically, his title, "South Asians are not descended from four populations", is probably not true and not supported by his analysis.)

7 comments:

terryt said...

"the new component for Andaman and Nicobarese Islanders, previously equated with ASI, is an innovation, albeit a modest one (the supplemental materials show that the Andamanese are indeed very distinct from any mainland population and cluster with Papuans instead)".

A 'modest' innovation? I don't think so. How does that fit with the idea of the Andaman and New Guinea ancestors having reached the region via India? Unless you're going to postulate a total replacement by the other four populations.

"we identified two ancestral components in mainland India that are major for the AA-speaking tribals and the TB speakers"

Both fairly obviously immigrants from the east.

"All of the South Asian mainland populations studied have ANI, ASI and AAA admixture, and while ATB admixture was much more geographically defined, some populations that are not linguistically TB had more than trace amounts of ATB admixture".

Now that is a modest innovation and makes complete sense. Genes and languages become unhitched from each other quite readily. Sometimes the languages move much further than the genes and sometimes the other way round.

"We have inferred that the practice of endogamy was established almost simultaneously, possibly by decree of the rulers, in upper-caste populations of all geographical regions, about 70 generations before present, probably during the reign (319–550 CE) of the ardent Hindu Gupta rulers".

That is fascinating.

"The paper does not extensively examine the affinities and structure of the ANI component and its arrival in South Asia"

I can't access the paper and so don't know what it says about the affinities of the ASI or its arrival. Any information?

andrew said...

The paper is open access so you should be able to look at it. It doesn't say much about the affinities of ASI or its arrival either, except to demonstrate the gap between ASI bearing populations and the Andamanese.

Some of that gap could be due to ANI or AAA contamination on the mainland over the last 20,000 years or so with the Andamanese being a relatively pure type, because they don't really compare a ghost ASI genome to the Andamanese one, and don't do the kind of f4 and D statistic comparisons that would really help nail the source of the distinction. Given Reich's association of ASI and Andamanese, I am cautious about reaching rash conclusions without a bit more solid data.

Ryan said...

The Manusmriti dates to 200 BCE to 200 CE, so their timeline isn't too far off.

terryt said...

Thanks Andrew. I didn't seem able to access the paper from Dienekes' blog. It does seem as though this paper does draw a well-defined distinction between ASI and Andamans though.

andrew said...

I don't doubt that ASI and Andamans are distinguishable. But, the question is why.

I guess it is correct that ASI and Andaman show up as different components, so it can't just be admixture. But, given the tendency of small inbred island populations to be artificially distinctive, I'd like to see more analysis of the two components and their differences. The PCAs don't give a take on where the components themselves sit in relation to each other, since while there is a pure AAN there is not a pure ASI in the sample. My guess is that a pure ASI would be closer to AAN than any of the extant samples.

andrew said...

The Manusmriti, which dates to 200 BCE to 200 CE, would be right on the button of the 15 BCE estimate you get from 70 generations at the standard 29 years per generation, and is very pertinent since this is essentially a law-proclaiming document that spells out the four tier caste system. Written codification of rules related to caste would coincide perfectly with a sudden ossification of caste endogamy at the same time.

This would precede the earliest Gupta kings by about 300 years and there is really nothing in the Gupta empire that screams out for a major change in caste practices. If anything, the fact that a warrior caste rather than Brahmin caste dynasty could rule India was considered somewhat scandalous at the time.

terryt said...

Andrew. The problem is that the Andaman Islanders are not 'artificially distinctive'. They fit quite well with New Guinea natives. I agree that 'a pure ASI would be closer to AAN than any of the extant samples'. But, as the AAN is definitely from the east, that would strongly indicate that ASI is eastern in origin too.

"I'd like to see more analysis of the two components and their differences".

Correct. And it is pretty obvious that New Guinea and Australia are more different than would be the case if the two had simply been separated by the rising sea level splitting Australia and New Guinea at some time during the recent geological past. It is long past time to differentiate these two as well. The 'belief' for long was in a single expansion. Now the belief has been expanded to two expansions. I'm absolutely certain reality is even far more complicated.