Archaeological excavations revealed artefacts used by Homo erectus as long ago as 500-200 kya. The moistening at the end of the last glacial period brought expanded subsistence; drying then spread agriculture from 8-5 kya, marking some of the earliest migrations and expansions. Around 5 kya, the Indus Valley civilization emerged, culminating in the mature Harappan civilization, whose de-urbanization led to the onset of the Vedic period. Further displacements followed as foreign rulers established dominance in the Indian subcontinent: from the Greeks and Scythians, to the earliest Muslim invasions, and later the Mughal Empire. In this phase, India had diverse rulers (including Afghans, Turks, and Mongols).
The migrations led to widespread admixture in the Indian population, influencing language, culture, caste endogamy, metallurgical technologies, and more, and resulting in a complex, differentiated structure. We set out to explore how modern genetic variation correlates with migration routes into the subcontinent, studying genomic variation at 48,570 SNPs genotyped in 1,484 individuals across 104 population groups. We propose COGG (Correlation Optimization of Genetics and Geodemographics), a novel optimization method that models genetic relationships together with social factors such as caste, language, and occupation, so as to maximize the correlation with geography. Using a novel approach, we calculated the shared ancestry between caste groups in the subcontinent and reference populations from Eurasia. We tested different theories of migration into the subcontinent using a Linear Discriminant Analysis of redescription clusters, and studied the recombination events that shaped the gene pool.
Our results demonstrate that COGG yields significantly higher correlations, with p-values below 10^-8. Identifying the significant components among caste, language, and genetics simplifies the complex structure. We find the varnas (Brahmins and Kshatriyas) to be closely related to reference Eurasian populations, whereas tribal groups show no shared ancestry with them; we conclude that the tribal groups resided in India before the migration from Eurasia happened. We identify probable migration routes from Mongolia through Central Asia, and another via Anatolia, into the subcontinent. Tibeto-Burman speaking populations share some ancestry with populations from East Asia; on the other hand, Austro-Asiatic speakers do not share ancestry with other Mon-Khmer speaking populations.
A. Bose; D.E. Platt; L. Parida; P. Paschou; P. Drineas, "Genetic variation reveals migrations into the Indian subcontinent and its influence on the Indian society." (October 2016).
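The abstract does not describe COGG in enough detail to reproduce it. For orientation only, here is the standard way to ask whether genetic distances track geography: a Mantel test. This is a textbook technique swapped in for illustration. It is not COGG and not the authors' code.

The sketch below is a minimal, stdlib-only Python version. It correlates the upper triangles of two distance matrices. It then builds a one-sided permutation p-value by jointly shuffling the rows and columns of one matrix. The sampling coordinates and "genetic distances" are made-up, hypothetical toy data. The names `haversine_km`, `upper_triangle`, `pearson` and `mantel` are illustrative helpers, not an existing library API.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def upper_triangle(m, order=None):
    """Flatten the strict upper triangle of a square matrix,
    optionally after permuting its rows and columns by `order`."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, n_perm=999, seed=0):
    """Mantel test (a standard technique, NOT the paper's COGG method).
    Returns the observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    geo_vec = upper_triangle(geo)
    r_obs = pearson(upper_triangle(gen), geo_vec)
    order = list(range(len(gen)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel populations in the genetic matrix
        if pearson(upper_triangle(gen, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical sampling locations (lat, lon) for six toy populations.
coords = [(28.6, 77.2), (13.1, 80.3), (22.6, 88.4),
          (19.1, 72.9), (34.1, 74.8), (26.1, 91.7)]
n = len(coords)
geo = [[haversine_km(coords[i], coords[j]) for j in range(n)] for i in range(n)]
# Made-up genetic distances: rescaled geography plus a small symmetric perturbation.
gen = [[0.0 if i == j else geo[i][j] / 1000 + 0.1 * ((i + j) % 3)
        for j in range(n)] for i in range(n)]

r, p = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, permutation p = {p:.4f}")
```

With 999 permutations the smallest attainable p-value is 1/1000. Very small p-values therefore call for far more permutations or an analytic approximation.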
Much of this clarifies what has already been strongly suspected, and nothing here could outweigh the impact of whatever ancient DNA results we expect to see from South Asia in the near future. But one result stands out. Conventional wisdom has always held South Asian tribal populations to be indigenous, yet earlier genetic data had shown early but not definitive indications of more recent origins, followed by regression to a less advanced subsistence strategy. So the finding that these tribal populations lack recent Eurasian origins is notable.
Davidski at Eurogenes has a low opinion of the paper and in particular its Anatolian origin hypothesis, but I'm content to wait and see in this case.