Basu et al., Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure, PNAS, Published online before print January 25, 2016, doi: 10.1073/pnas.1513197113 (Open access).
India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveal four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.
The samples cover 13 tribal populations (three Dravidian, four Munda, one Dravidian and Munda, two Ongan, one Indo-European and two Tibeto-Burman), 1 lower-middle caste population (Dravidian), and 6 upper caste populations (four Indo-European, one Dravidian and one Tibeto-Burman). Thus, the sample may underrepresent the vast middle of Indian society (lower caste Indo-European and Dravidian populations).
On closer inspection, the two new mainland genetic components, one for the Austro-Asiatic Munda-speaking populations and one for the Tibeto-Burman populations, are entirely expected. The separate component for the Andaman and Nicobar Islanders, previously equated with ASI, is an innovation, albeit a modest one (the supplemental materials show that the Andamanese are indeed very distinct from any mainland population and cluster with Papuans instead). As the paper explains:
Contemporary populations of India are linguistically, geographically, and socially stratified, and are largely endogamous with variable degrees of porosity. We analyzed high quality genotype data, generated using a DNA microarray (Methods) at 803,570 autosomal SNPs on 367 individuals drawn from 20 ethnic populations of India (Table 1 and SI Appendix, Fig. S1), to provide evidence that the ancestry of the hunter gatherers of A&N is distinct from mainland Indian populations, but is coancestral to contemporary Pacific Islanders (PI). Our analysis reveals that the genomic structure of mainland Indian populations is best explained by contributions from four ancestral components. In addition to the ANI and ASI, we identified two ancestral components in mainland India that are major for the AA-speaking tribals and the TB speakers, which we respectively denote as AAA (for “Ancestral Austro-Asiatic”) and ATB (for “Ancestral Tibeto-Burman”). Extant populations have experienced extensive multicomponent admixtures. Our results indicate that the census sizes of AA and TB speakers in contemporary India are gross underestimates of the extent of the AAA and the ATB components in extant populations.
All of the South Asian mainland populations studied have ANI, ASI and AAA admixture. ATB admixture was much more geographically localized, but some populations that are not linguistically TB had more than trace amounts of it.
The paper's insights about when the caste system came into existence in its modern endogamous form are also notable. The leading schools of thought in conventional wisdom had been that caste either arose immediately at the time of the arrival of the Indo-Aryans, ca. 4000 years ago, or emerged only under British colonial rule ca. 500 years ago. There is now convincing evidence to suggest that both of these theories are wrong (link added editorially).
Since the genetic evidence points to strict caste endogamy arising during the historic era, historians may be able, with this hint, to pin down more precisely what occurred, using written documentation from people who experienced the event. The dates may be off, however. The authors assume a 22.5-year generation, when the usual assumption of generation length in this field is about 29 years. Over 70 generations, the longer figure would push caste solidification back by about four and a half centuries, although the historical case tends to favor the more recent Gupta-era date. As the authors put it:
We have inferred that the practice of endogamy was established almost simultaneously, possibly by decree of the rulers, in upper-caste populations of all geographical regions, about 70 generations before present, probably during the reign (319–550 CE) of the ardent Hindu Gupta rulers. The time of establishment of endogamy among tribal populations was less uniform.
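How sensitive the date is to the assumed generation length can be checked with a short calculation. This is only a rough sketch: the 70 generations and the 22.5- and 29-year generation lengths come from the discussion above, while taking 2016 (the paper's publication year) as the "present" is an assumption made here for illustration, as are the function and variable names.

```python
GENERATIONS = 70        # generations since endogamy began, per the paper
PRESENT_YEAR = 2016     # assumption: the paper's publication year as "present"


def years_before_present(generations, generation_length):
    """Convert a count of generations into elapsed years."""
    return generations * generation_length


def calendar_year(years_ago, present=PRESENT_YEAR):
    """Rough calendar year; values <= 0 fall in the BCE range (no year 0 is ignored)."""
    return present - years_ago


paper_estimate = years_before_present(GENERATIONS, 22.5)  # 1575.0 years ago
alt_estimate = years_before_present(GENERATIONS, 29)      # 2030 years ago
shift = alt_estimate - paper_estimate                      # 455.0 years earlier

print(calendar_year(paper_estimate))  # 441.0, inside the Gupta reign (319-550 CE)
print(calendar_year(alt_estimate))    # -14, i.e. roughly the end of the 1st century BCE
print(shift)                          # 455.0
```

With the shorter generation the estimate lands in the Gupta period; with the longer one it moves back to around the turn of the era, so the historical argument carries real weight in choosing between them.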
The paper does not extensively examine the affinities and structure of the ANI component and its arrival in South Asia, for example, to attempt to discern if there was more than one wave of ANI introgression or to see which West Eurasian populations show the greatest affinity to it.
UPDATE: Razib offers some methodological criticism. He argues that the authors misunderstand what ADMIXTURE does and so understate the extent to which populations are admixed, and he faults them for not incorporating 1000 Genomes data and for not doing f4 and D statistic analysis. (Ironically, his title, "South Asians are not descended from four populations," is probably not true and not supported by his analysis.)