Razib Khan confirms at the Gene Expression blog, with independent DIY autosomal genetic analysis, what previous linguistic, archaeological and Y-DNA evidence had already strongly suggested: the Austro-Asiatic language subfamily of South Asia, called the Munda languages, now spoken by about 9 million people in the Northeastern part of South Asia, was intrusive to South Asia from Southeast Asia, rather than the other way around. The Austro-Asiatic languages are best known for the Vietnamese and Khmer (Cambodian) languages. These languages probably have their roots in the hills of southern Yunnan in China, between 4000 BCE and 2000 BCE. (As an aside, recent Y-DNA phylogeny evidence also supports the proposition that people who speak the Hmong-Mien languages are descended from the population that now speaks Austroasiatic Mon-Khmer languages.)
This is particularly notable because the speakers of the Munda languages are predominantly "tribal" peoples, and there has been a tendency to associate tribal peoples of South Asia with the most ancient layer of South Asia's population history. But, while there may be an ancient South Asian substrate in linguistically Munda populations of South Asia that is particularly strong in the matrilineally inherited mtDNA of the Munda peoples and to a still strong extent in the autosomal DNA of these peoples, there is a clear Y-DNA and autosomal contribution from Southeast Asia.
The Austro-Asiatic languages are believed to have expanded with agriculture, and in particular with early rice farming, so their likely time of arrival in South Asia can be associated with the earliest evidence for rice farming there.
According to the Wikipedia account:
The earliest remains of the grain in the Indian subcontinent have been found in the Indo-Gangetic Plain and date from 7000–6000 BC though the earliest widely accepted date for cultivated rice is placed at around 3000–2500 BC with findings in regions belonging to the Indus Valley Civilization. Perennial wild rices still grow in Assam and Nepal. It seems to have appeared around 1400 BC in southern India after its domestication in the northern plains. It then spread to all the fertile alluvial plains watered by rivers.
The "tribal" character of the current populations probably reflects language shift among populations that continued to be sedentary rice farmers (mostly to either the Indo-Aryan languages, or to the Tibeto-Burman languages, although the genetic evidence suggests that most of the latter languages were transmitted mostly demically rather than via language shift, particularly on the Y-DNA side); and language conservation among populations that "reverted" to a hunting-gathering or pastoralist means of subsistence in areas that became unsuitable for rice farming. The less sedentary populations were less easily subjugated by the Indo-Aryans.
As discussed below, this puts the arrival of the Munda languages in India earlier than the Indo-European languages (by 1000 to 1500 years), even earlier than the Tibeto-Burman languages (by considerably more centuries), and roughly contemporaneously with the proto-language era of Dravidian immediately prior to its division into subfamilies.
The Indo-European Languages of South Asia
It is already widely agreed that the Indo-European languages of South Asia (mostly the Indo-Aryan languages, which are derived from Sanskrit, but also the Iranian languages and Nuristani languages found in Western Pakistan) are intrusive to South Asia and arrived around 2000 BCE to 1500 BCE.
Of course, the widely spoken Indo-European lingua franca of South Asia, which is English, is a legacy of colonial rule by the British, starting in the 1700s via the British East India Company and ending in 1947, when South Asia gained its independence from the United Kingdom.
The Tibeto-Burman Languages of South Asia
Genetic evidence, cultural evidence and historical evidence likewise point to the Tibeto-Burman languages spoken along South Asia's Northern and Eastern borders as having arrived more recently than either the Austro-Asiatic languages or the Indo-European languages. And, these languages are spoken only on the very fringe of South Asia, rather than penetrating deeply into it.
The Dravidian Languages
The case of the Dravidian languages is less settled. With the exception of a pocket of Dravidian language speakers who speak the Brahui language of Pakistan, these languages are concentrated to the South and East of India. Moreover, while the evidence of place names suggests that Dravidian languages were once spoken over a much wider geographic range in Southern India and the far Southeast of Pakistan than they are today (presumably displaced by Indo-Aryan languages ca. 1500 BCE in these places), per Wikipedia:
The Brahui, Kurukh and Malto have myths about external origins. The Kurukh have traditionally claimed to be from the Deccan Peninsula, more specifically Karnataka. The same tradition has existed of the Brahui. They call themselves immigrants. Many scholars hold this same view of the Brahui such as L. H. Horace Perera and M. Ratnasabapathy.
Linguistic and genetic evidence, taken together, support an origin for Brahui sometime not long after 1000 CE, mostly via language shift rather than a demic migration of large numbers of Dravidian speakers from South Asia.
The 73 Dravidian languages are fairly closely related and there are attested versions of source languages for two of its main subfamilies by the 4th and 5th centuries BCE respectively (Old Tamil ca. 300 BCE and Old Telugu ca. 400 BCE). There was probably a single proto-Dravidian language that was ancestral to all modern Dravidian languages around 1500 BCE-2500 BCE. This linguistically estimated date coincides with the advent of farming in Southern India, an event known as the South Indian Neolithic.
Thus, whether Dravidian was an indigenous language of Paleolithic South Indians, or was instead intrusive like India's other major languages, the particular Proto-Dravidian language that is ancestral to all of the modern Dravidian languages probably did not start to expand more than about a thousand years before the Indo-Aryans began to make their way into Southern India.
Autosomal genetics and most uniparental genetic markers point to ancestral South Indian genetic origins for the core of the linguistically Dravidian people (in some cases with an Indo-European demic infusion into the highest caste populations that underwent language shift) that is far, far older than the proto-Dravidian language.
Any superstrate population associated with the Dravidian language must have been thin, and the only suggestive trace that I have identified as a possible proto-Dravidian marker would be Y-DNA haplogroup T, which is most common in India geographically right where proto-Dravidian should have originated, although haplogroup T is now found in populations with various linguistic affiliations. Y-DNA haplogroup T is found in appreciable frequencies among Somalis, Egyptians and Mesopotamians, and at low frequencies throughout the early Neolithic area, but isn't particularly common in the Indus River Valley area where Y-DNA haplogroup T's sister clade, Y-DNA haplogroup L, is more common.
The linguistic origins of Dravidian are unclear. Some of the more plausible linkages to Dravidian linguistically have been to the Elamite language, to the Uralic languages, and to the fringe members of the Niger-Congo linguistic family that show the simplifying impact of linguistic neighbors that speak languages from other language families.
The hypothesis that Proto-Dravidian was once the language of the Indus River Valley Civilization (mostly in modern Pakistan and the adjacent deserts of India which were once fertile river beds) has very little support in the modern distillation of the linguistic, genetic, and archaeological evidence. The Indus River Valley civilization area and the Dravidian area are genetically distinct (the deep Ancestral North Indian v. Ancestral South Indian divide). The archaeological evidence supports only a few border trading posts between the two civilizations. There is no real evidence of a Dravidian substrate in the earliest Rig Vedic Sanskrit texts (Dravidian influence comes only later through word borrowing and areal influences). And, the characteristic crops of the Dravidian linguistic area seem to be derived from Sahel African crops rather than from Fertile Crescent crops, as they are better adapted to the local seasons. This is one of the reasons that Fertile Crescent agriculture didn't immediately spread to South India once it arrived in the Indus River Valley area about four thousand years before the South Indian Neolithic. Beyond crops, there are also other cultural links to Sahel farmers.
This also tends to disfavor the Elamo-Dravidian theory of this language's origins, since that theory generally assumes that Dravidian reached India as an extension of the Harappan culture of the Indus River Valley which was, unlike South India, adjacent to Southern Iran where the Elamite language was spoken into the early historically attested era in the region.
There are a few tiny, nearly moribund possible exceptions: some language isolates in the Andaman Islands (which show genetic linkage to South India) and the highlands of the Himalayas, sometimes grouped into a conjectural and residual Indo-Pacific language family.
But, otherwise, all of the languages of South Asia arrived there from outside South Asia within the last five thousand years or so, with the possible exception of one geographically tiny pre-Neolithic Dravidian language family dialect if that language is indigenous rather than intrusive within the last ten thousand years or so.
The languages of Paleolithic India as of about 8000 BCE are probably entirely lost, as is the Harappan language, unless one takes the minority view that proto-Indo-European was Harappan rather than the majority view that proto-Indo-European was a language of the Pontic-Caspian steppe. The minority view is far less crazy than it is often decried as being, but still lacks the multidisciplinary evidence to support it that the Kurgan hypothesis of Indo-European language origins does.
The main argument for Harappan as proto-Indo-European is the absence of any easily discernible substrate component to the early Sanskrit writings, against a presumed proto-Indo-European background validated with early West Eurasian languages and Tocharian.
But, this should not be misconstrued to say that the major languages of South Asia involved demographic replacement. Genetically, South Asia's indigenous Paleolithic peoples appear to be more strongly represented than almost anywhere else in the world, as evidenced by private Y-DNA lineages, private mtDNA lineages and the nature and distributions of South Asia's autosomal genetics in relation to those elsewhere in Asia and Europe. The case that a majority of non-African modern humans lived in South Asia for much of the period from ca. 75,000 years ago until perhaps 30,000 years ago is quite strong. There are clear genetic superstrate populations and there is no longer any pure ancestral South Indian population (the Andamanese probably come closest). But, the ASI percentage of ancestry in much of India is quite high.