Monday, September 12, 2011

Genetic Links of Mon-Khmer and Hmong-Mien Peoples Confirmed


A PCA chart and branching tree cluster analysis of various Asian linguistic communities, from the linked PLoS paper. It doesn't fully reproduce the conclusions reached by the sublineage analysis in the cited paper.

An expanded study of Y-DNA population genetics in Southeast Asia by Chinese researchers confirms the close genetic ties of the Mon-Khmer (a.k.a. Austro-Asiatic when the Munda of India are also included) and Hmong-Mien peoples, at least in the patriline. It focuses on the O3a3b-M7 Y-DNA haplogroup, in which the Mon-Khmer appear at a basal position while Hmong-Mien and Tibeto-Burman individuals with this haplogroup have subhaplogroups more on the fringes of this patriline tree. O3a3c1-M117, the dominant East Asian haplogroup, shows a similar pattern. These haplogroups also corroborate the Southeast Asian origins of most of the East Asian Y-DNA patrilines.


The study also provides a helpful contextualizing recap of prior findings about the population genetic history of East Eurasia.

The 147 Mon-Khmer languages are spoken by more than 90 million people, mostly in Southeast Asia, of which the dominant languages of Vietnam and Cambodia are the best known (there are isolated pockets elsewhere).

There are 38 Hmong-Mien languages spoken by about 13 million people, of whom 12 million are in Southern China and the rest are in Southeast Asia, in separated linguistic islands across that region.

Sino-Tibetan languages (e.g. Chinese, Tibetan and Burmese) and Tai-Kadai languages (e.g. Thai) are also spoken in Southeast Asia, but are believed, for historical, archaeological and genetic reasons, to be arrivals from the North and East in the last 3,000 years. Thus, Mon-Khmer and Hmong-Mien language speakers are likely to be more closely related to the aboriginal peoples of Southeast Asia.

The paper notes that 57% of East Asian men have some Y-DNA derivative of O-M175, with subhaplogroups O3-M122, O2a-M95 and O1a-M119 all clearly entering East Asia from the south. A study of Hainan aboriginal people also concludes that while O1a-M119 and O2a-M95 probably arrived in East Asia via a coastal southern route in the east of East Asia, O3-M122 probably entered East Asia by another route that could be revealed by study of the subhaplogroups of O3-M122. This study focused on exploring the origins of East Asian men whose patriline ancestors did not arrive in East Asia via a coastal southern route in the east of East Asia.

Key Conclusions

The abstract to the paper states:
These patterns indicate an early unidirectional diffusion from Southeast Asia into East Asia, which might have resulted from the genetic drift of East Asian ancestors carrying these two haplogroups through many small bottle-necks formed by the complicated landscape between Southeast Asia and East Asia. The ages of O3a3b-M7 and O3a3c1-M117 were . . . followed by the emergence of the ancestors of HM lineages out of MK and the unidirectional northward migrations into East Asia.
The study also briefly, and without much in the way of helpful illustrations with maps, speculates on the geographic conditions in pre-history that could have shaped these developments and the paths these peoples could have taken from Southeast Asia into China.

A genetically based conclusion that the Hmong-Mien peoples are an offshoot of the Mon-Khmer peoples, something that linguistic analysis has not reached consensus upon, is a finding of considerable importance in parsing out the prehistory of Southeast Asia, even though the hypothesis that Mon-Khmer and Hmong-Mien both belong to a macrolinguistic family (sometimes called Yangtzean) has existed for some time as one of several efforts to link the languages of South China and Southeast Asia into linguistic macrofamilies.

If anything, linguistic geography, which shows the Hmong-Mien to be scattered while the Mon-Khmer have retained a more compact distribution, had been argued to show the Hmong-Mien's greater antiquity, although this evidence is not very strong. The state of the academic discussion of linguistic origins is summed up in the Wikipedia article on the concept of Urheimat, to which I have contributed.

Tai-Kadai languages also appear to be spoken by populations closely related to Mon-Khmer and Hmong-Mien peoples, although not quite so closely related as those two populations are to each other. The Austronesian language speakers, in contrast, who have their origins in the aboriginal peoples of Taiwan and possibly their ancient coastal neighbors in South China, end up looking like something of an outgroup to the other three, and as an outgroup even to a group composed of the Mon-Khmer, Hmong-Mien, Tai-Kadai, and Sino-Tibetans, almost to the same extent as Altaic language speakers.

Details of the Y-DNA Haplogroup Findings 

In addition to various haplogroup O Y-DNA types, there were a number of individuals in multiple subsets of the HM and MK language groups who had Y-DNA haplogroups C and F (which share a clade with the derived haplogroup O) and D1 (which is more closely related to the dominant African Y-DNA haplogroup E). Both C and D are viewed as being among the earliest waves of modern human migration into Asia following the Out of Africa event.

Y-DNA haplogroup DE* was tested for and not found. One Hmong-Daw individual was found with Y-DNA haplogroup Q1a1 out of 51 individuals in that community, and P* was found in just two individuals in a total combined sample of 1,652 HM and MK individuals, outlier results that might indicate that these haplogroups, associated with North Asian populations, weren't part of the aboriginal population at all. O1a2-M110, which is associated with Tai-Kadai populations, was found only in the Palyu sample among the four dozen groups studied, and the Palyu have historically had Kadai influence sometime in the last 3,000 years.
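To give a sense of how rare those outlier results are, the counts quoted above can be turned into frequencies. This is only a rough sketch: the function and variable names are made up for illustration and do not come from the paper, and only the counts themselves are taken from the text.

```python
# Counts are taken from the post; all names here are illustrative assumptions.
def frequency_percent(carriers: int, sample_size: int) -> float:
    """Return the share of a sample carrying a haplogroup, as a percentage."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 100.0 * carriers / sample_size


# One Q1a1 man among the 51 Hmong-Daw men sampled.
q1a1_hmong_daw = frequency_percent(1, 51)

# Two P* men in the combined sample of 1,652 HM and MK men.
p_star_combined = frequency_percent(2, 1652)

print(f"Q1a1 in Hmong-Daw: {q1a1_hmong_daw:.2f}%")
print(f"P* in combined HM and MK sample: {p_star_combined:.2f}%")
```

Both figures are small enough that a handful of recent outside ancestors could account for them, which is the point being made about these haplogroups.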

Mutation Dating

Given recent findings that cast serious doubt on Y-DNA mutation rate dating, the absolute time period estimates for this migration don't bear mention, although the conclusion that the most common Y-DNA haplogroups of Southeast Asia, O2a-M95 (which accounts for 87.18% of MK men in the study and 45.16% of HM men in the study and is also common in the Munda and fairly common in Sino-Tibetan, Tai-Kadai and Austroasiatic populations) and O3-M122 (of which the two O3 haplogroups studied are subtypes that appear at high frequency in Mon-Khmer and Hmong-Mien but low frequency in other groups), are much older than the studied subhaplogroups is credible, as is the conclusion that the studied subhaplogroups are similar in age.

Aren't Autosomal Whole Genome Studies Better?

Autosomal whole genome studies have their place. They capture contributions from populations that patrilines and matrilines, which make up only a small percentage of the genome, miss, and they can be much more informative when small samples from populations old enough to have reached fixation of most of their population genetic gene frequencies are studied. As the Gene Expression blog has recently demonstrated with do-it-yourself genetic studies of one or two genomes from Tutsi and Malagasy populations, the difference between N=0 and N=1 or N=2 in whole genomes is immense. Studies of mtDNA and Y-DNA take much larger samples to be informative. But mtDNA and Y-DNA branching tree analysis (i.e. phylogeny) is much more informative for questions of cause and effect, because it can show the direction of genetic links in a way that autosomal studies cannot.

Combined Y-DNA, mtDNA and autosomal DNA, in addition to historical records, linguistics, geography, artifacts, archaeoclimate data, ancient DNA, and DNA studies of species like domesticated animals and food crops that were fellow travelers with prehistoric populations, are best, of course. And Y-DNA alone always leaves open the question of whether there was a dominance or conquest settlement pattern, or patrilocal marriage patterns, that make the Y-DNA patriline patterns significantly different from the mtDNA matriline patterns.

Frequently, as noted in a new paper cited in a recent post by Dienekes, where there is a discrepancy, the Y-DNA patterns show more migration over long distances into different geographic zones while the matriline patterns are more representative of the indigenous population. In an ideal world, we could link the Y-DNA, mtDNA and autosomal DNA patterns, incorporating the other evidence, to form a coherent picture of a rather complex story of prehistoric population migrations and admixtures, enhancing the credibility of all of the related lines of evidence. Linguistic history tends to follow patrilines more closely than matrilines, so Y-DNA phylogeny is more informative as a means of confirming linguistic phylogenies.

But Y-DNA studies of patrilines of secondary frequency are quite informative for the purposes for which they are used in this study. Indeed, they are better for this purpose than autosomal data. Autosomal data could corroborate the close genetic relationship of the Mon-Khmer and Hmong-Mien peoples (or confound it), but could not show which population was most likely to have descended from the other. In contrast, an mtDNA study, if it confirmed the Y-DNA conclusion regarding the Mon-Khmer and Hmong-Mien peoples, would be particularly powerful and would also provide more accurate dating of the splits than Y-DNA dating can provide, given the inherent problems of Y-DNA dating, such as loci that mutate at different rates.


The main paper is: Cai X, Qin Z, Wen B, Xu S, Wang Y, et al. (2011) Human Migration through Bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum Revealed by Y Chromosomes. PLoS ONE 6(8): e24282. doi:10.1371/journal.pone.0024282

The paper cited by Dienekes is: Forster P, Renfrew C (2011) Mother Tongue and Y Chromosomes. Science 333(6048): 1390-1391. doi:10.1126/science.1205331


Maju said...

I must question the idea that an NJ tree in Y-DNA (or mtDNA) studies is more helpful than in autosomal ones. You say:

"Autosomal data could corroborate the close genetic relationship of the Mon-Khmer and Hmong-Mien peoples (or confound it), but could not show which population was most likely to have descended from the other".

I say that Y-DNA (or mtDNA) can't tell us either. Much less using a mere NJ tree. It may be suggestive (and in this case it is congruent with my idea of South to North general direction of migration) but it is evidence of nothing at all.

Much less it is any evidence of linguistic affinity, that's something that only Linguistics can address (if at all).


1. It is a pity that C has not been tested for its subclades, especially C3 and C5.

2. D(xD1) is found only among the Laven (Southern Laos), and this is IMO quite suggestive of the genetic flow for this haplogroup having gone through (yet unsampled) Burma, rather than the South China Sea's coast. In Hong-Shi 2008, D* appeared to have a Thailand origin and indeed only D1 was reported among Hmong-Mien (no Austroasiatics were sampled then), with D2 and D3 being almost specific to Japan and Tibet respectively. D1 could well be of Indochina origin, as well as D*.

Andrew Oh-Willeke said...

The papers with detailed Y-DNA haplogroup D phylogenies show that the subhaplogroups found in the Andaman Islands, India and Tibet form a common cluster (including D* in Tibet) for which Japan's D2 and D3 are outliers. North Asian Y-DNA hg D haplogroups are pretty much fringes of the Tibetan phylogeny. Y-DNA hg D in India is associated with ASI autosomal genetics.

Thus, I see D making a path from South Asia to Southeast Asia from which it seeds the Andaman Islands.

There is no Y-DNA hg D in Australian Aborigines or Melanesians (they have M, N and R on the maternal side, and C and MNOPS (a branch of K and F) on the paternal one), and it is not associated with elevated levels of Denisovan admixture autosomally. So, it can't be in the first wave of modern human migration into Southeast Asia and East Asia on the coastal route ca. 50,000 years ago. But the Jomon Japanese arrive 30,000 years ago, presumably carrying the forebears of D2 and D3, and the Y-DNA hg D in Tibet probably dates to the pre-LGM era. It also seems likely that the female counterpart to all of the Y-DNA hg D populations was pretty much exclusively mtDNA hg M (x derived hgs of M).

I agree that genetic flow for D went through Burma rather than the South China Sea coast. I would put an origin for D1 in Eastern India or Burma. D* is trickier because this has to derive from DE which is found in both West Africa and IIRC Tibet.

My personal guess is that DE was a small, secondary, boat-based Out of Africa wave of migration (40,000 years ago or so) that swiftly reached India and then was absorbed into local ASI populations in India and Southeast Asia outside of the Andaman Islands and Japan and Tibet, which had to seek islands or marginal territory for themselves since Asia was already inhabited.