Thursday, December 31, 2020

An Overview Of East Asian and Southeast Asian Historical Linguistics


There are eight main language families in Southeast Asia and East Asia: Japonic (Japanese and a couple of related languages spoken on remote Japanese islands), Korean, Ainu (a language isolate spoken by indigenous people of Northern Japan), Sino-Tibetan (including Chinese, the Tibetan languages and Burmese), Austronesian (also spoken in Polynesia), Tai-Kadai a.k.a. Kra-Dai (of which the language of Thailand is best known), Austroasiatic a.k.a. Mon-Khmer (the best known of which are Vietnamese, the Khmer languages of Cambodia, and the Munda languages of India) and Hmong-Mien (a language family of an important minority population in Southeast Asia and South China).

Genetic and linguistic evidence, however, shows that the Hmong-Mien language family is an offshoot of the Austroasiatic language family. There is strong but not universally accepted evidence that the Austronesian languages and the Tai-Kadai languages are part of a larger macro-language family. There is also strong but not universally accepted evidence that the Japonic and Korean languages are part of the same language family.

Sino-Tibetan languages have their roots in the North Chinese millet farming Neolithic revolution on the Yellow River. 

Four of the other language families have their roots in the Southern Chinese rice farming Neolithic revolution on the Yangtze River. The Austronesian and possibly the Tai-Kadai language families originate in the lower Yangtze River basin. The Austroasiatic, Hmong-Mien and possibly the Tai-Kadai language families originate in the middle Yangtze River basin.

It is likely that none of the major languages of Southeast Asia were spoken there prior to about 2200 BCE. Prior to that point, Southeast Asia was largely populated by Hoabinhian hunter-gatherer populations whose languages are largely lost.

Y-DNA evidence suggests that Austronesian language family speakers are an outgroup to the other four major Southeast Asian and East Asian language families (Sino-Tibetan, Austroasiatic, Hmong-Mien, Tai-Kadai), with people speaking those languages linked to Austronesians only at a greater time depth.[1] This could be a misleading signal driven by the admixture of Austronesians with Australo-Melanesian populations, however.

The World In 4100 BCE

In 4100 BCE, domesticated rice was not available anywhere in the world except (1) in the Yangtze River basin and (2) in the Ganges River basin in India (where it was just a few hundred years old as a domesticated crop). It would be domesticated in South America about a century after that.

At that time, domesticated millet was not available anywhere in South Asia, Southeast Asia or East Asia except in the Yellow River basin and its near vicinity.

The World In 3100 BCE 

In 3100 BCE, farming was absent, and Paleolithic hunter-gatherers were the sole inhabitants of Japan, Taiwan, mainland Southeast Asia, island Southeast Asia, Papua New Guinea and the immediately adjacent islands, the Philippines, Australia, and all of South Asia to the east of the Indus River Valley, the now barren Sarasvati River basin and the Ganges River basin.

None of the languages spoken in Southeast Asia (other than the Papuan languages) or Taiwan or the Philippines today were spoken there. The only language now spoken in the Japanese islands that existed then was Ainu.

None of the languages now spoken in South Asia or Iran were spoken there, with the possible exception of the Dravidian languages. The Dravidian languages, if spoken at all, were confined to the small tribes of hunter-gatherers who would adopt farming for the first time in Southern India in the South Indian Neolithic Revolution.

Oceania from Hawaii and Easter Island to Guam and the Mariana Islands to Fiji and Tonga and the Cook Islands and New Zealand, and also Antarctica, were places where no primate, let alone a modern human, had ever set foot.

In Africa, Bantu expansion had not yet begun, so pretty much all of Africa to the south of the Congo River basin and the jungles of the Congo, and much of East Africa, was inhabited only by hunter-gatherers, and racial and linguistic diversity in Africa was much greater. Likewise, the ethnogenesis of the Berber people had not yet occurred.

In Europe, people with the now predominant Northern European phenotypes (e.g. people with blond hair and blue eyes) did not exist and Indo-European languages were confined to Eastern Europe. The percentage of ancestry of modern Irish people traceable to the inhabitants of Ireland in 3100 BCE is close to negligible, as the island underwent near total population replacement.

The Ainu Language

The Ainu language is derived from one of the languages of the Jomon hunter-gatherers of Japan who inhabited the area from the Paleolithic era. It was probably influenced by or related to Paleo-Siberian languages. All of these languages are now nearly moribund.

The Japonic and Korean Languages

The Japonic languages arose when the Yayoi people migrated to Japan from Korea ca. 400 BCE to 300 BCE. They derive from the language these people already spoke, which was probably a sister language to Korean. The deep origins of the Korean language in pre-history aren't well understood.

The Austronesian Languages

The Austronesian expansion from Taiwan began about 3000 BCE, and this expansion of one of the indigenous languages of Formosa is arguably what really defines this language family.[2] The indigenous Formosan languages are probably all derived from the Neolithic Dapenkeng culture that abruptly appeared and quickly spread around the coast of the island around 4000 BCE to 3000 BCE (displacing prior Negrito hunter-gatherer populations). That culture's arrival preceded the migration of one of its descendant cultures to other islands by an interval that is within the margin of error of available dating methods.[3] The particular archaeological culture of mainland South China from which this culture was derived is unresolved, in part because archaeological data from that time period is sparse and undeveloped.[4]

The Sino-Tibetan Languages

[T]he main ancestry of high-altitude Tibeto-Burman speakers originated from the ancestors of Houli/Yangshao/Longshan ancients in the middle and lower Yellow River basin, consistent with the common North-China origin of Sino-Tibetan language and dispersal pattern of millet farmers.[5]

This conclusion is contrary to many 20th century and early 21st century proposals (putting a homeland in Northeast India or Southern China) but now probably represents conventional wisdom in the field.[5]

So, the Sino-Tibetan languages date to the North Chinese Neolithic Revolution, which was based on millet farming and was independent in origin from South Chinese and Southeast Asian rice farming. The earliest archaeological culture in this region is the Nanzhuangtou culture, which started around 8500 BCE, but it isn't clear that that culture was in linguistic continuity with the cultures that gave rise to the Sino-Tibetan language family. Two major studies in 2019 favor this model but assign a more recent origin to the language family than the first Neolithic culture of the region:
Zhang et al. (2019) performed a computational phylogenetic analysis of 109 Sino-Tibetan languages to suggest a Sino-Tibetan homeland in northern China near the Yellow River basin. The study further suggests that there was an initial major split between the Sinitic languages and the Tibeto-Burman languages approximately 4,200 to 7,800 years ago [2200 BCE to 5800 BCE] (with an average of 5,900 years ago [3900 BCE]), associating this expansion with the Yangshao culture and/or the later Majiayao culture. Sagart et al. (2019) also performed another phylogenetic analysis based on different data and methods to arrive at the same conclusions with respect to the homeland and divergence model, but proposed an earlier root age of approximately 7,200 years ago [5200 BCE], associating its origin with the late Cishan and early Yangshao culture.[6]
Around 3500 BCE, the northern millet farmers and southern rice farmers started to integrate into a common culture, in which the Han Chinese component eventually became dominant.[7]

Other Southeast And East Asian Language Families

Rice was domesticated in Southern China around 7400 BCE.[8] But that doesn't mean that the initial rice domesticating culture was in linguistic continuity with the first Austroasiatic populations. And the time depth of the various languages of the region is muddy.
There are two most likely centers of domestication for rice as well as the development of the wetland agriculture technology. 
The first, and most likely, is in the lower Yangtze River, believed to be the homelands of the pre-Austronesians and possibly also the Kra-Dai, and associated with the Kuahuqiao, Hemudu, Majiabang, Songze, Liangzhu, and Maqiao cultures. It is characterized by pre-Austronesian features, including stilt houses, jade carving, and boat technologies. Their diet was also supplemented by acorns, water chestnuts, foxnuts, and pig domestication.

The second is in the middle Yangtze River, believed to be the homelands of the early Hmong-Mien-speakers and associated with the Pengtoushan, Nanmuyuan, Liulinxi, Daxi, Qujialing, and Shijiahe cultures. Both of these regions were heavily populated and had regular trade contacts with each other, as well as with early Austroasiatic speakers to the west, and early Kra-Dai speakers to the south, facilitating the spread of rice cultivation throughout southern China.

By the late Neolithic (3500 to 2500 BC), population in the rice cultivating centers had increased rapidly, centered around the Qujialing-Shijiahe culture and the Liangzhu culture. Liangzhu and Shijiahe declined abruptly in the terminal Neolithic (2500 to 2000 BC), with Shijiahe shrinking in size and Liangzhu disappearing altogether. This is largely believed to be the result of the southward expansion of the early Sino-Tibetan Longshan culture. ... This period also coincides with the southward movement of rice-farming cultures to the Lingnan and Fujian regions, as well as the southward migrations of the Austronesian, Kra-Dai, and Austroasiatic-speaking peoples to Mainland Southeast Asia and Island Southeast Asia. A genomic study also indicates that at around this time, a global cooling event (the 4.2 k event) led to tropical japonica rice being pushed southwards, as well as the evolution of temperate japonica rice that could grow in more northern latitudes.[7]

The Tai-Kadai Languages

The Tai-Kadai languages are now most widely spoken in the mainland Southeast Asian country of Thailand, but this is actually the most recent language family to arrive in Southeast Asia from Southern China.
The high diversity of Kra–Dai languages in Southern China points to the origin of the Kra–Dai language family in Southern China. The Tai branch moved south into Southeast Asia only around 1000 CE.[9]
Genetically, the biggest difference between the Tai-Kadai people within Southeast Asia, and the Austroasiatic populations of Southeast Asia, both of whom have origins in the early Neolithic rice farmers of the Yangtze River basin of Southern China, is that Austroasiatic people, having arrived earlier, have more Hoabinhian hunter-gatherer admixture.[10]

There is strong but not universally accepted evidence that the Tai-Kadai language family and the Austronesian language family are part of a larger macro-language family.[11]

The Austro-Asiatic and Hmong-Mien Languages

The Austroasiatic language family is probably derived from Southern China's Neolithic rice farmers, who trace their cultural origins to the domestication of Chinese rice in an independent domestication event. Ancient DNA suggests that this was the first of the language families of modern Southeast Asia to be spoken there by people who resided there.

The expansion from South China to Southeast Asia took place around 4,000 years ago and close in time to the spread of the Austronesians in Southeast Asia. They displaced Hoabinhian hunter-gatherer populations in Southeast Asia.[12] 

Ancient DNA shows populations genetically similar to modern Austroasiatic populations in Vietnam, Laos, and mainland Malaysia by 2200 BCE.[13] But the archaeological record is thin in the relevant time period from the South Chinese Neolithic revolution to 2200 BCE, so dating the arrival is tricky. Still, the oldest ancient DNA may have been from close to the time that the language family arrived there since:
The spread of japonica rice cultivation to Southeast Asia started with the migrations of the Austronesian Dapenkeng culture into Taiwan between 3500 and 2000 BC (5,500 BP to 4,000 BP). The Nanguanli site in Taiwan, dated to ca. 2800 BC, has yielded numerous carbonized remains of both rice and millet in waterlogged conditions, indicating intensive wetland rice cultivation and dryland millet cultivation. A multidisciplinary study using rice genome sequences indicate that tropical japonica rice was pushed southwards from China after a global cooling event (the 4.2k event) that occurred approximately 4,200 years ago.[13] 

A genetically based conclusion that the Hmong-Mien peoples are a comparatively recent offshoot of the Mon-Khmer peoples, something that linguistic analysis has not reached consensus upon, is a finding of considerable importance in parsing out the prehistory of Southeast Asia. This is so even though the hypothesis that Mon-Khmer and Hmong-Mien both belong to a macrolinguistic family (sometimes called Yangtzean) has existed for some time as one of several efforts to link the languages of South China and Southeast Asia into linguistic macrofamilies. An expanded study of Y-DNA population genetics in Southeast Asia by Chinese researchers confirms the close genetic ties of the Mon-Khmer (a.k.a. Austroasiatic when the Munda of India are also included) and Hmong-Mien peoples, at least in the patriline. It focuses on the O3a3b-M7 Y-DNA haplogroup, in which the Mon-Khmer appear at a basal position while Hmong-Mien and Tibeto-Burman individuals with this haplogroup have subhaplogroups more on the fringes of this patriline tree. O3a3c1-M117, the dominant East Asian haplogroup, shows a similar pattern.[1]


[1] Cai X, et al. "Human Migration through Bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum Revealed by Y Chromosomes." PLoS ONE 6(8): e24282 (2011). doi:10.1371/journal.pone.0024282 

[3] Wikipedia article on History of Taiwan: Early settlement

[6] Wikipedia article on Sino-Tibetan languages: Homeland citing Zhang, Menghan; Yan, Shi; Pan, Wuyun; Jin, Li (2019), "Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic", Nature, 569 (7754): 112–115, and Sagart, Laurent; Jacques, Guillaume; Lai, Yunfan; Ryder, Robin; Thouzeau, Valentin; Greenhill, Simon J.; List, Johann-Mattis (2019), "Dated language phylogenies shed light on the history of Sino-Tibetan", Proceedings of the National Academy of Sciences of the United States of America, 116 (21): 10317–10322.

[7] Wikipedia article on Rice: Origins in China

[9] Wikipedia article on Kra-Dai languages

[11] Wikipedia article on Austro-Tai languages.

[13] Wikipedia article on Rice: Southeast Asia


ramones1986 said...

If Austroasiatic and Hmong-Mien were confirmed to be related genetically, that would be intriguing.

In relation to Sino-Tibetan originating in northern China, I wonder if there were some clans (not necessarily the Di-Qiang) that migrated northward to present-day Mongolia and adopted the steppe lifestyle.

andrew said...

It is intriguing and in my humble opinion, pretty much established.

I think that your second conjecture is doubtful.

ramones1986 said...

Well, I thought it would be implausible/incorrect at best, but at least I shared it.

Just forget about that, by the way. ✌️

Samuel Andrews said...

Awesome Post

DDeden said...

Did you intend to claim that the central African /Congo rainforest was uninhabited before 5ka?

ramones1986 said...

By the way, will there be any update/s on the origins of both Japonic and Koreanic languages?

andrew said...

@DDeden Not uninhabited, but lacking farmers or herders.

@ramones1986 I recently updated my conjectures page to note the conclusion that Japonic and Koreanic languages are very likely sister languages in the same family even if the larger Altaic family does not hold, but I may or may not have a post spelling it out for a while. It is also remarkable how very, very little Jomon/Ainu substrate influence there is on the Japonic languages, despite the population genetic contribution of the Jomon to the modern Japanese being something on the order of 1/4 to 1/2, and despite the Jomon being a conquered hunter-gatherer population (there are residual non-Ainu impacts in NE Japan, which was taken ca. 1000 CE).

andrew said...

Revised to address points in comments.

DDeden said...

WRT Congo, ok, I agree.

WRT Ainu, I strongly disagree that Ainu stemmed from Jomon. Ainu oral history differentiates themselves from the early occupants. They did converge, resulting in mixture, unfortunately genetic sampling of mixed descendants reported to be Ainu has produced confusion. I think that Old Japanese had a Jomon substrate, but not sure if identifiable.

BTW, traces of Hmong genetic influence in modern Japanese indicate some contact in the past.

DDeden said...

Siberian DNA research

Abstract: We present genome-wide data from 40 individuals dating to c.16,900 to 550 years ago in northeast Asia. We describe hitherto unknown gene flow and admixture events in the region, revealing a complex population history. While populations east of Lake Baikal remained relatively stable from the Mesolithic to the Bronze Age, those from Yakutia and west of Lake Baikal witnessed major population transformations, from the Late Upper Paleolithic to the Neolithic, and during the Bronze Age, respectively. We further locate the Asian ancestors of Paleo-Inuits, using direct genetic evidence. Last, we report the most northeastern ancient occurrence of the plague-related bacterium, Yersinia pestis. Our findings indicate the highly connected and dynamic nature of northeast Asia populations throughout the Holocene.

Science Advances

ramones1986 said...

Specific link, please...

ramones1986 said...


So, you think that Ainu came from the Okhotsk culture (northeast), as suggested in this paper:


DDeden said...

I can't be so specific, but yes, I think Ainu came from Amur and met Jomon in Japan. I have conjectured that the Ainu of Japan had been linked to the Aynu of the Tarim Basin, based on the town Khotan there and the fact that the word for village in Ainu is kotan, and the somewhat caucasian appearance of both Ainu and Aynu.

DDeden said...

Note that the map shows the Okhotsk (Ainu) barley route had passed not far from the northern Tarim Basin where Khotan (Aynu) was located.

andrew said...

Thanks for all the input. I'll look into it.

AlanL said...

So if the current Tibetans are post-Neolithic migrants, when and from whom did they pick up the (Denisovan?) high altitude adaptations? From previous hunter-gatherers? Or Denisovan yetis still extant in the last 5000 years?

andrew said...


Previous hunter-gatherers.

Keep in mind too that the trajectory of gene frequencies for genes that confer selective fitness is very different from that of fitness-neutral genes that are merely ancestry informative. In the former case, even a tiny number of instances of introgression will fairly rapidly rise to a high percentage of the population. In the latter case, small introgressions have a high probability of leaving the gene pool entirely or being diluted to a small percentage.