Friday, October 25, 2024

New Direct Dark Matter Search Exclusions From Lux-Zeplin

The Lux-Zeplin direct dark matter detection experiment continues to squeeze the parameter space of WIMP dark matter, which has, for all practical purposes, been ruled out.

We report results of a search for nuclear recoils induced by weakly interacting massive particle (WIMP) dark matter using the LUX-ZEPLIN (LZ) two-phase xenon time projection chamber. This analysis uses a total exposure of 4.2 ± 0.1 tonne-years from 280 live days of LZ operation, of which 3.3 ± 0.1 tonne-years and 220 live days are new. A technique to actively tag background electronic recoils from 214Pb β decays is featured for the first time. Enhanced electron-ion recombination is observed in two-neutrino double electron capture decays of 124Xe, representing a noteworthy new background. After removal of artificial signal-like events injected into the data set to mitigate analyzer bias, we find no evidence for an excess over expected backgrounds. World-leading constraints are placed on spin-independent (SI) and spin-dependent WIMP-nucleon cross sections for masses ≥9 GeV/c^2. The strongest SI exclusion set is 2.1×10^−48 cm^2 at the 90% confidence level at a mass of 36 GeV/c^2, and the best SI median sensitivity achieved is 5.0×10^−48 cm^2 for a mass of 40 GeV/c^2.
J. Aalbers, et al., "Dark Matter Search Results from 4.2 Tonne-Years of Exposure of the LUX-ZEPLIN (LZ) Experiment" arXiv:2410.17036 (October 22, 2024).

The cross-section of interaction of a neutrino with a nucleon is a little less than 10^-38 cm^2. The maximum cross-section of dark matter particles with masses from 9 GeV to 10,000 GeV in light of the latest Lux-Zeplin data is 10^-45 cm^2 (i.e. ten million times smaller), and for masses of 11 GeV to 150 GeV it is 10^-47 cm^2 (i.e. a billion times smaller). This is far below the threshold for dark matter candidates such as Higgs portal, Z portal, W portal, and millicharged dark matter candidates. Those thresholds were already passed in 2018.
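The "ten million" and "a billion" comparisons above are just ratios of powers of ten. A minimal Python check, taking the rounded neutrino-nucleon figure of 10^-38 cm^2 quoted above as the reference (the variable and function names are illustrative, not from the LZ paper):

```python
import math

# Rounded reference value quoted in the text (cm^2).
NEUTRINO_NUCLEON_XSEC = 1e-38

# Approximate upper limits quoted in the text for two WIMP mass ranges (cm^2).
limits = {
    "9 GeV to 10 TeV": 1e-45,
    "11 GeV to 150 GeV": 1e-47,
}

def orders_of_magnitude_below(limit, reference=NEUTRINO_NUCLEON_XSEC):
    """Return how many powers of ten the limit lies below the reference."""
    return round(math.log10(reference / limit))

for mass_range, limit in limits.items():
    n = orders_of_magnitude_below(limit)
    print(f"{mass_range}: 10^{n} times smaller")
```

A factor of 10^7 is ten million and a factor of 10^9 is a billion, matching the figures in the text.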

Basically, if 9 GeV to 10 TeV mass dark matter particles exist, they have to be "sterile", i.e. have no non-gravitational interactions with ordinary matter.

By comparison, the heaviest atom currently known, Oganesson (element 118), has a mass of about 274 GeV, and the heaviest fundamental particle, the top quark, has a mass of about 173 GeV. Any hypothetical stable hexaquark candidate (and there is no credible evidence that any hadron other than the proton and neutron is ever stable) would still be under 50 GeV and would have a strong cross-section of interaction with ordinary matter. The top of the exclusion range reaches up to the mass of a large molecule with several hundred atoms in it, while the bottom of the exclusion range is about half the mass of a water molecule.
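These mass comparisons can be checked by converting between atomic mass units (u) and GeV. The sketch below assumes 1 u ≈ 0.931494 GeV/c^2, approximates the mass of Og-294 by its mass number, and uses 18.015 u for water; those inputs and the function names are assumptions for illustration, not figures from the post.

```python
# 1 atomic mass unit (u) in GeV/c^2 (rounded; an assumed input).
GEV_PER_U = 0.931494

def u_to_gev(mass_u):
    """Convert a mass in atomic mass units to GeV/c^2."""
    return mass_u * GEV_PER_U

def gev_to_u(mass_gev):
    """Convert a mass in GeV/c^2 to atomic mass units."""
    return mass_gev / GEV_PER_U

oganesson_gev = u_to_gev(294)   # Og-294, mass number used as the mass in u
water_gev = u_to_gev(18.015)    # H2O molecular mass in u

print(f"Og-294: about {oganesson_gev:.0f} GeV")
print(f"Water molecule: about {water_gev:.1f} GeV")
print(f"9 GeV as a fraction of a water molecule: {9 / water_gev:.2f}")
print(f"10 TeV in atomic mass units: about {gev_to_u(10_000):.0f} u")
```

Under these assumptions the bottom of the exclusion range is a bit over half the mass of a water molecule, and the top is a little under 11,000 u.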

Particle physics experiments place tight complementary bounds on the exclusion range for dark matter particles that have a cross section of interaction with ordinary matter that is even a small fraction of the weak force interaction, from the meV mass scale of neutrino masses to hundreds of GeV. The collective experience of particle physics is particularly compelling in the mass range from the mass of an electron (511 keV) to half of the mass of the Z boson (about 45 GeV) and the Higgs boson (about 62.5 GeV), which has been thoroughly explored experimentally for decades.

Basically, in light of these experimental non-detections of dark matter, any dark matter particle with a mass on the meV scale or up has to have no non-gravitational interactions with ordinary matter strong enough to be discernible experimentally. These direct detection null results do not rule out, however, a fifth force interaction between dark matter particles and other dark matter particles, a category of dark matter which is called self-interacting dark matter (SIDM).

The correlations between ordinary matter distributions and inferred dark matter distributions, and the shape of inferred dark matter halos, however, strongly disfavor both "sterile" and SIDM dark matter particle candidates.

A variety of other data disfavors heavier dark matter particle candidates (including wave-like dynamics from astronomy data, and the non-detection of the gravitational effects of compact objects in a sufficient quantity in a mass range that covers basically everything more massive than an asteroid), such as primordial black hole dark matter candidates.

A viable dark matter particle candidate must be both very low in mass and have some sort of interaction with ordinary matter beyond gravity (and must reproduce the radial acceleration relation over many orders of magnitude of galaxy sizes, and must also replicate the external field effect of MOND). 

Collectively, these points argue strongly in favor of a gravitational explanation for dark matter phenomena rather than a dark matter particle theory.

Looking For Planet X

There are some hints from the orbits of known objects in the solar system that there might be a Planet X out there that has not been discovered. A recent study by Siraj et al. analyzes a bigger data set (orbits of 51 objects vs. 11 in previous studies) to work out where this Planet X should be based upon these hints. It concludes that:

We find that the unseen planet parameters that best fit the data are a mass of m(p) = 4.4 ± 1.1M⊕, a semimajor axis of a(p) = 290 ± 30 AU, an eccentricity of e(p) = 0.29 ± 0.13, and an inclination of i(p) = 6.8 ± 5.0∘ (all error bars are 1σ). 

Only 0.06% of the Brown & Batygin (2021) reference population produce probabilities within 1σ of the maximum within our quadrivariate model, indicating that our work identifies a distinct preferred region of parameter space for an unseen planet in the solar system. If such an unseen planet exists, it is likely to be discovered by LSST.

So, we are looking for a planet with about 3-5 times the mass of Earth, at the fringe of the solar system well past Neptune (about 9-12 times as far away from the Sun as Neptune). Its orbit would be rather strongly elliptical, comparable to Mercury (e = 0.205) or Pluto (e = 0.248), as opposed to the nearly circular orbits of all of the other planets and most of the moons, but not nearly as non-circular as the well-known comets. It would also orbit quite close to the plane in which the other planets of our Solar System orbit the Sun. It also can't have an exceptionally reflective surface (i.e. high albedo), since otherwise we would have seen it already. This dramatically narrows down the places we should look for it, and what we are looking for.

As a reference: The distance from the Sun to Mercury is 0.39 AU, to Venus is 0.72 AU, to Earth is 1.00 AU, to Mars is 1.52 AU, to Jupiter is 5.20 AU, to Saturn is 9.54 AU, to Uranus is 19.22 AU, and to Neptune is 30.06 AU. Pluto averages 39 AU and ranges from 30-49 AU from the Sun. 
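To put the best-fit orbit on the same scale as these reference distances, the closest and farthest points of an elliptical orbit follow from the semimajor axis a and eccentricity e as a(1 - e) and a(1 + e). A minimal sketch using the central values quoted above (error bars ignored; the function names are illustrative):

```python
NEPTUNE_AU = 30.06  # Sun-Neptune distance quoted in the text

def perihelion(a, e):
    """Closest distance to the Sun for semimajor axis a and eccentricity e."""
    return a * (1 - e)

def aphelion(a, e):
    """Farthest distance from the Sun for semimajor axis a and eccentricity e."""
    return a * (1 + e)

# Central values for the proposed planet: a = 290 AU, e = 0.29.
a_x, e_x = 290.0, 0.29
for label, distance in [
    ("perihelion", perihelion(a_x, e_x)),
    ("semimajor axis", a_x),
    ("aphelion", aphelion(a_x, e_x)),
]:
    print(f"Planet X {label}: {distance:.1f} AU "
          f"({distance / NEPTUNE_AU:.1f} x Neptune's distance)")

# Sanity check against Pluto, using the rounded figures in the text
# (a of about 39 AU, e = 0.248); this should land near the quoted 30-49 AU.
print(f"Pluto: {perihelion(39, 0.248):.0f}-{aphelion(39, 0.248):.0f} AU")
```

With these inputs the hypothetical planet would range from roughly 206 AU to roughly 374 AU from the Sun over the course of its orbit.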

The mass of the Sun is roughly 333,000 times that of Earth. The giant planets have masses ranging from Jupiter at about 318 Earth masses to Saturn at about 95 Earth masses to Neptune at about 17 Earth masses to Uranus at about 14.5 Earth masses. Apart from Earth itself, all of the other planets, dwarf planets, moons, and other objects in our solar system are less massive than Earth. So, Planet X, if it exists, would have a mass significantly greater than Earth, but significantly smaller than Uranus.

It might be a gas dwarf, an ice giant, a super-Earth, or a super-puff.

The mass range suggests a radius larger than the 6,371 km of Earth and less than the 24,622 km of Neptune (which has a smaller radius than Uranus despite being more massive than Uranus). Realistically, it would have a radius in the ballpark of 9,000 to 18,000 km, unless it is a super-puff (which would have a radius larger than Neptune despite having a much smaller mass).

The rather precise parameters for a potential Planet X mean that it should be seen or ruled out in a matter of not all that many years. If it is ruled out, then another hypothesis for the hints attributed to Planet X needs to be established.

The paper lays out the hints that suggest a Planet X:

There is a long history of theoretically proposed planets in the outer solar system, dating back to the mid-1800s. Recently, the structure of the distant Kuiper belt has led to speculation regarding the possibility of an unseen planet. Some of these recent studies have been motivated by apparent clustering of distant trans-Neptunian objects (TNOs) in various orbital parameters, including longitude of perihelion (ϖ), longitude of the ascending node (Ω), argument of perihelion (ω ≡ ϖ−Ω) and inclination relative to the ecliptic (i). An unseen planet in the outer solar system could potentially shepherd the orbits of distant TNOs into clustered configurations. The observational search for such a planet has, to date, been unsuccessful. 

There is an ongoing debate over whether the claimed clustering of distant TNOs is real or spurious, perhaps arising from observational selection effects or limited statistics. Shankman et al. (2017) and Bernardinelli et al. (2020) could not conclude that distant TNOs were clustered in Dark Energy Survey (DES) data alone, and Bernardinelli et al. (2022) similarly could not conclude that distant TNOs were clustered in Outer Solar System Origins Survey (OSSOS) data alone. Brown & Batygin (2019) and Napier et al. (2021) examined larger samples of TNOs and reached opposite conclusions about the statistical significance of clustering in the orbital elements. In addition to clustering in angular orbital elements, an unseen planet could also produce a population of high-inclination Centaurs. 

Brown & Batygin (2021) ran a suite of 121 n-body simulations testing various parameters for an additional planet in the solar system. Each simulation contained several tens of thousands of test particles, whose orbital parameters were compared to the ϖ, Ω, and i distributions of 11 distant TNOs to identify a preferred region of parameter space for the additional planet. Additionally, Batygin et al. (2024) argued that an unseen planet may produce a population of Neptune-crossing TNOs significantly more consistent with the observed population of such objects than if no unseen planet were present. 

In this paper, we re-examine the question of whether or not the current distribution of distant TNO detections suggests clustering in ϖ, Ω, and i. Furthermore, we explore the parameter space of hypothetical unseen planets and ask what parameters are most likely given the current state of observations. 

A novel feature of this work is that we determine the long-term stability for a large set of distant TNOs– such stability is crucial for evaluating the plausibility of an unseen planet because it takes ∼ 1 Gyr for such a planet to induce clustering amongst TNOs. Using this information and new discoveries of distant TNOs in addition to a broader range of allowed semimajor axes we are able to expand the sample of TNOs that we search for clustering to 51 objects.

Finding a planet this big, so long after the last major discovery of a large object in the Solar System, would be remarkable.

The Top Quark Mass From ATLAS 13 TeV Data

A summary and review of ATLAS 13 TeV data at the Large Hadron Collider (LHC), from a talk presented at a recent conference, explains a direct measurement of the top quark mass from fully leptonic decays. It notes that: "this top-mass extraction turns out to be the most precise single measurement from the reconstruction of the top-decay products, i.e. m(t) = 174.41 ± 0.39 (stat.) ± 0.66 (syst.) ± 0.25 (recoil)." Combining all three sources of uncertainty in quadrature, this is 174.41 ± 0.81 GeV. The brief paper recaps an earlier disclosure of this measurement published on June 5, 2023.
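Combining in quadrature means taking the square root of the sum of the squared uncertainties, which treats the three sources as independent. A minimal sketch of that arithmetic (the function name is illustrative):

```python
import math

def combine_in_quadrature(*uncertainties):
    """Root-sum-of-squares of independent uncertainty components."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Statistical, systematic, and recoil uncertainties quoted above (GeV).
total = combine_in_quadrature(0.39, 0.66, 0.25)
print(f"m(t) = 174.41 ± {total:.2f} GeV")
```

The sum of squares is about 0.650, whose square root rounds to the ± 0.81 GeV quoted above.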

This top quark mass is at the high end of recent top quark measurements. The Particle Data Group value is 172.57 ± 0.29 GeV (determined by inflating uncertainty estimates by a factor of 1.5 because they otherwise wouldn't all be reasonably consistent). But this result, which is included in the PDG world average, is only in mild tension with the world average: it is 1.84 GeV above the PDG value, which is about 2.1 sigma when the ± 0.81 GeV and ± 0.29 GeV uncertainties are combined in quadrature.

This result is close to the Tevatron combined measurement from 2016 of 174.30 ± 0.64 GeV. But seven CMS experiment measurements (and one different ATLAS measurement from 2019) over the time period from 2016 to 2023 measure values ranging from 171.77 ± 0.37 GeV to 173.06 ± 0.84 GeV which drag down the world average.

A determination of the top quark pole mass from cross-section measurements is close to the world average from direct measurements and is competitive in precision. The world average top quark pole mass from cross-section measurements is 172.4 ± 0.7 GeV.

The inverse-error weighted average top quark mass from direct measurements and cross-section measurements combined (which is arguably more robust) is 172.52 GeV, with an uncertainty of a bit less than ± 0.29 GeV (probably something like ± 0.25 GeV).
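One way to reproduce this figure is sketched below. It assumes the two inputs are the PDG direct-measurement average (172.57 ± 0.29 GeV) and the cross-section pole mass (172.4 ± 0.7 GeV) quoted above, and that the weights are 1/σ; that reading of the weighting is an assumption, not something stated in the post. The more conventional inverse-variance weighting (1/σ^2), which also gives a combined uncertainty if the inputs are treated as independent, is shown for comparison.

```python
import math

# (central value, uncertainty) in GeV, taken from the text above.
measurements = [
    (172.57, 0.29),  # PDG average of direct measurements
    (172.4, 0.7),    # pole mass from cross-section measurements
]

def weighted_mean(values, power):
    """Weighted mean using weights 1 / sigma**power."""
    weights = [1.0 / sigma**power for _, sigma in values]
    return sum(w * m for w, (m, _) in zip(weights, values)) / sum(weights)

def inverse_variance_sigma(values):
    """Uncertainty of the inverse-variance mean, assuming independent inputs."""
    return 1.0 / math.sqrt(sum(1.0 / sigma**2 for _, sigma in values))

mean_inverse_error = weighted_mean(measurements, 1)
mean_inverse_variance = weighted_mean(measurements, 2)
sigma_combined = inverse_variance_sigma(measurements)

print(f"1/sigma weights:   {mean_inverse_error:.2f} GeV")
print(f"1/sigma^2 weights: {mean_inverse_variance:.2f} ± {sigma_combined:.2f} GeV")
```

With 1/σ weights the result rounds to 172.52 GeV. Inverse-variance weighting gives a slightly higher central value with a combined uncertainty of about ± 0.27 GeV, in the same ballpark as the rough estimate above. The two inputs are probably not fully independent, so that uncertainty is only indicative.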

I suspect that the true value is probably about 173 GeV.

Thursday, October 24, 2024

Korean, Japanese, Vietnamese, and Cantonese

Korean and Cantonese (a.k.a. Yue Chinese) share some similarities because Cantonese preserves more ancient pronunciations than other Chinese languages, and Korean was influenced by Chinese at a time when these ancient pronunciations were used for words that it borrowed from Chinese. Japanese loan words from Chinese also sound more like Cantonese than other Chinese languages for the same reason. Specifically, Cantonese and these loanwords all have pronunciations that were used by Middle Chinese.

Presumably, then, these loan word pronunciations in Korean and Japanese date to the time period from roughly the 6th to 10th centuries, or perhaps as late as a couple of centuries later, when Middle Chinese was spoken. In the case of Korean, this corresponds to Old Korean. In the case of Japanese, this corresponds to Old Japanese and early Middle Japanese. 

Prior to Old Korean and Old Japanese, which were written with Chinese characters, neither the Korean nor the Japanese languages had a written form, and the Chinese loan words were also probably mostly absent. There were three languages or dialects in Korea before the language of the Silla Kingdom, which united the southern two-thirds of the peninsula, led to the extinction of the two sister languages spoken by the kingdoms it conquered. The Proto-Japonic language that preceded Old Japanese probably arrived in Japan from Korea. "Most scholars believe that Japonic was brought to northern Kyushu from the Korean peninsula around 700 to 300 BC by wet-rice farmers of the Yayoi culture and spread throughout the Japanese archipelago, replacing indigenous languages." This was during the Korean Iron Age and prior to the proto-Three Kingdoms period. In this time period:
Gojoseon was the first Korean kingdom, located in the north of the peninsula and Manchuria, later alongside the state of Jin in the south of the peninsula. . . . 

The historical Gojoseon kingdom was first mentioned in the Chinese record in a text called Guanzi, attributed to 7th century BCE. By about the 4th century BC, Gojoseon had developed to the point where its existence was well known in China, and around this time, its capital moved to Pyongyang.

In 194 BC, the King of Gojoseon was overthrown by Wi Man (also known as Wei Man), a Korean-Chinese refugee from the Han vassal state of Yan. Wi Man then established Wiman Joseon. In 128 BC, Nan Lü (南閭), a leader of Ye who was receiving pressure from Wiman Joseon, surrendered to the Han dynasty and became the Canghai Commandery. In 108 BC, the Chinese Han dynasty defeated Wiman Joseon and installed four commanderies in the northern Korean peninsula. . . . 

Around 300 BC, a state called Jin arose in the southern part of the Korean peninsula. Very little is known about Jin, but it established relations with Han China and exported artifacts to the Yayoi of Japan. Around 100 BC, Jin evolved into the Samhan confederacies.

Many smaller states sprang from the former territory of Gojoseon such as Buyeo, Okjeo, Dongye, Goguryeo, and Baekje. The Three Kingdoms refer to Goguryeo, Baekje, and Silla, although Buyeo and the Gaya confederacy existed into the 5th and 6th centuries respectively.


Modern Korean is derived from Middle Korean:
Middle Korean is the period in the history of the Korean language succeeding Old Korean and yielding in 1600 to the Modern period. The boundary between the Old and Middle periods is traditionally identified with the establishment of Goryeo in 918, but some scholars have argued for the time of the Mongol invasions of Korea (mid-13th century). Middle Korean is often divided into Early and Late periods corresponding to Goryeo (until 1392) and Joseon respectively. It is difficult to extract linguistic information from texts of the Early period, which are written using adaptations of Chinese characters. The situation was transformed in 1446 by the introduction of the Hangul alphabet, so that Late Middle Korean provides the pivotal data for the history of Korean.

Given this timing, it would appear that the bulk of Chinese loan words into Korean were received in the Old Korean time period, which coincided with Middle Chinese. And, since the earliest Korean writing is in Chinese characters and dates to about the 6th or 7th centuries, prior to that time, Korean may not have had a written language at all. Since written Chinese is much older than that, it makes sense that Chinese loan words into Korean date to around the same time that Korean started to be written with Chinese characters.

Old Korean (North Korean name: 고대 조선어; South Korean name: 고대 한국어) is the first historically documented stage of the Korean language, typified by the language of the Unified Silla period (668–935).

The boundaries of Old Korean periodization remain in dispute. Some linguists classify the sparsely attested languages of the Three Kingdoms of Korea as variants of Old Korean, while others reserve the term for the language of Silla alone. Old Korean traditionally ends with the fall of Silla in 935. This too has recently been challenged by South Korean linguists who argue for extending the Old Korean period to the mid-thirteenth century, although this new periodization is not yet fully accepted. This article focuses on the language of Silla before the tenth century.

Old Korean is poorly attested. Due to the paucity and poor quality of sources, modern linguists have "little more than a vague outline" of the characteristics of Old Korean. The only surviving literary works are a little more than a dozen vernacular poems called hyangga. Hyangga use hyangchal writing. Other sources include inscriptions on steles and wooden tablets, glosses to Buddhist sutras, and the transcription of personal and place names in works otherwise in Classical Chinese. All methods of Old Korean writing rely on logographic Chinese characters, used to either gloss the meaning or approximate the sound of the Korean words. Thus, the phonetic value of surviving Old Korean texts is opaque. Its phoneme inventory seems to have included fewer consonants but more vowels than Middle Korean. In its typology, it was a subject-object-verb, agglutinative language, like both Middle and Modern Korean. However, Old Korean is thought to have differed from its descendants in certain typological features, including the existence of clausal nominalization and the ability of inflecting verb roots to appear in isolation.

Despite attempts to link the language to the putative Altaic family and especially to the Japonic languages, no links between Old Korean and any non-Koreanic language have been uncontroversially demonstrated. . . . 
For what it is worth, I have almost no doubts about the linguistic relatedness of the Korean and Japanese languages, and I see their connection to at least parts of the larger Altaic family as at least more likely than not. Some scholars suspect that Japanese is a descendant of one of the two pre-Old Korean languages of Korea that have since died.
Old Korean is generally defined as the ancient Koreanic language of the Silla state (BCE 57–CE 936), especially in its Unified period (668–936). Proto-Koreanic, the hypothetical ancestor of the Koreanic languages understood largely through the internal reconstruction of later forms of Korean, is to be distinguished from the actually historically attested language of Old Korean.

Old Korean semantic influence may be present in even the oldest discovered Silla inscription, a Classical Chinese-language stele dated to 441 or 501. Korean syntax and morphemes are visibly attested for the first time in Silla texts of the mid- to late sixth century, and the use of such vernacular elements becomes more extensive by the Unified period.

Initially only one of the Three Kingdoms of Korea, Silla rose to ascendancy in the sixth century under monarchs Beopheung and Jinheung. After another century of conflict, the kings of Silla allied with Tang China to destroy the other two kingdoms—Baekje in 660, and Goguryeo in 668—and to unite the southern two-thirds of the Korean Peninsula under their rule. This political consolidation allowed the language of Silla to become the lingua franca of the peninsula and ultimately drove the languages of Baekje and Goguryeo to extinction, leaving the latter only as substrata in later Korean dialects. Middle Korean, and hence Modern Korean, are thus direct descendants of the Old Korean language of Silla.

Little data on the languages of the other two kingdoms survive, but most linguists agree that both were related to the language of Silla. Opinion differs as to whether to classify the Goguryeo and Baekje languages as Old Korean variants, or as related but independent languages. Lee Ki-Moon and S. Roberts Ramsey argue in 2011 that evidence for mutual intelligibility is insufficient, and that linguists ought to "treat the fragments of the three languages as representing three separate corpora". Earlier in 2000, Ramsey and Iksop Lee note that the three languages are often grouped as Old Korean, but point to "obvious dissimilarities" and identify Sillan as Old Korean "in the truest sense". Nam Pung-hyun and Alexander Vovin, on the other hand, classify the languages of all three kingdoms as regional dialects of Old Korean. Other linguists, such as Lee Seungjae, group the languages of Silla and Baekje together as Old Korean while excluding that of Goguryeo. The LINGUIST List gives Silla as a synonym for Old Korean while acknowledging that the term is "often used to refer to three distinct languages".

In a similar exercise, the different stages of the evolution of the Japanese language can be compared to the era when Middle Chinese was spoken. These correspond to Old Japanese, when Chinese characters were used to make it a written language, and Early Middle Japanese.

Prehistory

Proto-Japonic, the common ancestor of the Japanese and Ryukyuan languages, is thought to have been brought to Japan by settlers coming from the Korean peninsula sometime in the early- to mid-4th century BC (the Yayoi period), replacing the languages of the original Jōmon inhabitants, including the ancestor of the modern Ainu language. Because writing had yet to be introduced from China, there is no direct evidence, and anything that can be discerned about this period must be based on internal reconstruction from Old Japanese, or comparison with the Ryukyuan languages and Japanese dialects.

Old Japanese

The Chinese writing system was imported to Japan from Baekje around the start of the fifth century, alongside Buddhism. The earliest texts were written in Classical Chinese, although some of these were likely intended to be read as Japanese using the kanbun method, and show influences of Japanese grammar such as Japanese word order. The earliest text, the Kojiki, dates to the early eighth century, and was written entirely in Chinese characters, which are used to represent, at different times, Chinese, kanbun, and Old Japanese. As in other texts from this period, the Old Japanese sections are written in Man'yōgana, which uses kanji for their phonetic as well as semantic values.

Based on the Man'yōgana system, Old Japanese can be reconstructed as having 88 distinct morae. Texts written with Man'yōgana use two different sets of kanji for each of the morae now pronounced き (ki), ひ (hi), み (mi), け (ke), へ (he), め (me), こ (ko), そ (so), と (to), の (no), も (mo), よ (yo) and ろ (ro). (The Kojiki has 88, but all later texts have 87. The distinction between mo1 and mo2 apparently was lost immediately following its composition.) This set of morae shrank to 67 in Early Middle Japanese, though some were added through Chinese influence. Man'yōgana also has a symbol for /je/, which merges with /e/ before the end of the period.

Several fossilizations of Old Japanese grammatical elements remain in the modern language – the genitive particle tsu (superseded by modern no) is preserved in words such as matsuge ("eyelash", lit. "hair of the eye"); modern mieru ("to be visible") and kikoeru ("to be audible") retain a mediopassive suffix -yu(ru) (kikoyu → kikoyuru (the attributive form, which slowly replaced the plain form starting in the late Heian period) → kikoeru (all verbs with the shimo-nidan conjugation pattern underwent this same shift in Early Modern Japanese)); and the genitive particle ga remains in intentionally archaic speech.

Early Middle Japanese

Early Middle Japanese is the Japanese of the Heian period, from 794 to 1185. It formed the basis for the literary standard of Classical Japanese, which remained in common use until the early 20th century.

During this time, Japanese underwent numerous phonological developments, in many cases instigated by an influx of Chinese loanwords. These included phonemic length distinction for both consonants and vowels, palatal consonants (e.g. kya) and labial consonant clusters (e.g. kwa), and closed syllables. This had the effect of changing Japanese into a mora-timed language.

Late Middle Japanese

Late Middle Japanese covers the years from 1185 to 1600, and is normally divided into two sections, roughly equivalent to the Kamakura period and the Muromachi period, respectively. The later forms of Late Middle Japanese are the first to be described by non-native sources, in this case the Jesuit and Franciscan missionaries; and thus there is better documentation of Late Middle Japanese phonology than for previous forms (for instance, the Arte da Lingoa de Iapam). Among other sound changes, the sequence /au/ merges to /ɔː/, in contrast with /oː/; /p/ is reintroduced from Chinese; and /we/ merges with /je/. Some forms rather more familiar to Modern Japanese speakers begin to appear – the continuative ending -te begins to reduce onto the verb (e.g. yonde for earlier yomite), the -k- in the final mora of adjectives drops out (shiroi for earlier shiroki); and some forms exist where modern standard Japanese has retained the earlier form (e.g. hayaku > hayau > hayɔɔ, where modern Japanese just has hayaku, though the alternative form is preserved in the standard greeting o-hayō gozaimasu "good morning"; this ending is also seen in o-medetō "congratulations", from medetaku).

Late Middle Japanese has the first loanwords from European languages – now-common words borrowed into Japanese in this period include pan ("bread") and tabako ("tobacco", now "cigarette"), both from Portuguese.

Modern Japanese

Modern Japanese is considered to begin with the Edo period (which spanned from 1603 to 1867). Since Old Japanese, the de facto standard Japanese had been the Kansai dialect, especially that of Kyoto. However, during the Edo period, Edo (now Tokyo) developed into the largest city in Japan, and the Edo-area dialect became standard Japanese. Since the end of Japan's self-imposed isolation in 1853, the flow of loanwords from European languages has increased significantly. The period since 1945 has seen many words borrowed from other languages—such as German, Portuguese and English. Many English loan words especially relate to technology—for example, pasokon (short for "personal computer"), intānetto ("internet"), and kamera ("camera"). Due to the large quantity of English loanwords, modern Japanese has developed a distinction between [tɕi] and [ti], and [dʑi] and [di], with the latter in each pair only found in loanwords. . . . 
The language experienced a massive influx of Sino-Japanese vocabulary after the introduction of Buddhism in the 6th century and peaking with the wholesale importation of Chinese culture in the 8th and the 9th centuries. The loanwords now account for about half the lexicon. They also affected the sound system of the language by adding compound vowels, syllable-final nasals, and geminate consonants, which became separate morae. Most of the changes in morphology and syntax reflected in the modern language took place during the Late Middle Japanese period (13th to 16th centuries). . . . 
There is fragmentary evidence suggesting that now-extinct Japonic languages were spoken in the central and southern parts of the Korean peninsula. Vovin calls these languages Peninsular Japonic and groups Japanese and Ryukyuan as Insular Japonic.

The most-cited evidence comes from chapter 37 of the Samguk sagi (compiled in 1145), which contains a list of pronunciations and meanings of placenames in the former kingdom of Goguryeo. As the pronunciations are given using Chinese characters, they are difficult to interpret, but several of those from central Korea, in the area south of the Han River captured from Baekje in the 5th century, seem to correspond to Japonic words. Scholars differ on whether they represent the language of Goguryeo or the people that it conquered.

Traces from the south of the peninsula are very sparse:
  • The Silla placenames listed in Chapter 34 of the Samguk sagi are not glossed, but many of them can be explained as Japonic words.
  • Alexander Vovin proposes Japonic etymologies for two of four Baekje words given in the Book of Liang (635).
  • A single word is explicitly attributed to the language of the southern Gaya confederacy, in Chapter 44 of the Samguk sagi. It is a word for 'gate' and appears in a similar form to the Old Japanese word to2, with the same meaning.
  • Vovin suggests that the ancient name for the kingdom of Tamna on Jeju Island, tammura, may have a Japonic etymology tani mura 'valley settlement' or tami mura 'people's settlement'. . . . 
According to Shirō Hattori, more attempts have been made to link Japanese with other language families than for any other language. None of the attempts has succeeded in demonstrating a common descent for Japonic and any other language family.

The most systematic comparisons have involved Korean, which has a very similar grammatical structure to Japonic languages. Samuel Elmo Martin, John Whitman, and others have proposed hundreds of possible cognates, with sound correspondences. However, Alexander Vovin points out that Old Japanese contains several pairs of words of similar meaning in which one word matches a Korean form, and the other is also found in Ryukyuan and Eastern Old Japanese, suggesting that the former is an early loan from Korean. He suggests that to eliminate such early loans, Old Japanese morphemes should not be assigned a Japonic origin unless they are also attested in Southern Ryukyuan or Eastern Old Japanese. That procedure leaves fewer than a dozen possible cognates, which may have been borrowed by Korean from Peninsular Japonic.

There was also contemporaneous Chinese linguistic influence in Vietnam, with the main waves of Chinese loan words into all three languages (Vietnamese, Korean, and Japanese) more or less corresponding to the Tang Dynasty of China:

During the Tang dynasty (618–907), Chinese writing, language and culture were imported wholesale into Vietnam, Korea and Japan. Scholars in those countries wrote in Literary Chinese and were thoroughly familiar with the Chinese classics, which they read aloud in systematic local approximations of Middle Chinese. With those pronunciations, Chinese words entered Vietnamese, Korean and Japanese in huge numbers.

The plains of northern Vietnam were under Chinese control for most of the period from 111 BC to AD 938. After independence, the country adopted Literary Chinese as the language of administration and scholarship. As a result, there are several layers of Chinese loanwords in Vietnamese. The oldest loans, roughly 400 words dating from the Eastern Han, have been fully assimilated and are treated as native Vietnamese words. Sino-Vietnamese proper dates to the early Tang dynasty, when the spread of Chinese rime dictionaries and other literature resulted in the wholesale importation of the Chinese lexicon.

Isolated Chinese words also began to enter Korean from the 1st century BC, but the main influx occurred in the 7th and 8th centuries after the unification of the peninsula by Silla. The flow of Chinese words into Korean became overwhelming after the establishment of civil service examinations in 958.

Japanese has two well-preserved layers and a third that is also significant: 
  • Go-on readings date to the introduction of Buddhism to Japan from Korea in the 6th century. Based on the name, they are widely believed to reflect pronunciations of Jiankang in the lower Yangtze area during the late Northern and Southern dynasties period. However, this cannot be substantiated, and Go-on appears to reflect an amalgam of different Chinese varieties transmitted through Korea.
  • Kan-on readings are believed to reflect the standard pronunciation of the Tang period, as used in the cities of Chang'an and Luoyang. They were transmitted directly by Japanese who studied in China.
  • Tōsō-on readings were introduced by followers of Zen Buddhism in the 14th century and are thought to be based on the speech of Hangzhou. . . . 
In contrast, vocabulary of Chinese origin in Thai, including most of the basic numerals, was borrowed over a range of periods from the Han (or earlier) to the Tang.

Since the pioneering work of Bernhard Karlgren, these bodies of pronunciations have been used together with modern varieties of Chinese in attempts to reconstruct the sounds of Middle Chinese. They provide such broad and systematic coverage that the linguist Samuel Martin called them "Sino-Xenic dialects", treating them as parallel branches with the native Chinese dialects. The foreign pronunciations sometimes retain distinctions lost in all the modern Chinese varieties, as in the case of the chongniu distinction found in Middle Chinese rime dictionaries. Similarly, the distinction between grades III and IV made by the Late Middle Chinese rime tables has disappeared in most modern varieties, but in kan-on, grade IV is represented by the Old Japanese vowels i1 and e1 while grade III is represented by i2 and e2.

Vietnamese, Korean and Japanese scholars also later each adapted the Chinese script to write their languages, using Chinese characters both for borrowed and native vocabulary. Thus, in the Japanese script, Chinese characters may have both Sino-Japanese readings (on'yomi) and native readings (kun'yomi). Similarly, in the chữ Nôm script used for Vietnamese until the early 20th century, some Chinese characters could represent both a Sino-Vietnamese word and a native Vietnamese word with similar meaning or sound to the Chinese word, but would often be marked with a diacritic when the native reading was intended. However, in the Korean mixed script, Chinese characters (hanja) are only used for Sino-Korean words. The character-based Vietnamese and Korean scripts have since been replaced by the Vietnamese alphabet and hangul respectively, although Korean does still use Hanja to an extent.

The Tang Dynasty was a big deal in the history of China (and incidentally, coincided with the early Islamic Empire and the Migration Period in the West):


The Tang empire in 661, at its greatest extent.

The Tang dynasty (/tɑːŋ/, [tʰǎŋ]; Chinese: 唐朝), or the Tang Empire, was an imperial dynasty of China that ruled from 618 to 907, with an interregnum between 690 and 705. It was preceded by the Sui dynasty and followed by the Five Dynasties and Ten Kingdoms period. Historians generally regard the Tang as a high point in Chinese civilization, and a golden age of cosmopolitan culture. Tang territory, acquired through the military campaigns of its early rulers, rivaled that of the Han dynasty.

The Li family founded the dynasty after taking advantage of a period of Sui decline and precipitating their final collapse, in turn inaugurating a period of progress and stability in the first half of the dynasty's rule. The dynasty was formally interrupted during 690–705 when Empress Wu Zetian seized the throne, proclaiming the Wu Zhou dynasty and becoming the only legitimate Chinese empress regnant. The devastating An Lushan Rebellion (755–763) led to the decline of central authority in the dynasty's latter half. Like the previous Sui dynasty, the Tang maintained a civil-service system by recruiting scholar-officials through standardized examinations and recommendations to office. The rise of regional military governors known as jiedushi during the 9th century undermined this civil order. The dynasty and central government went into decline by the latter half of the 9th century; agrarian rebellions resulted in mass population loss and displacement, widespread poverty, and further government dysfunction that ultimately ended the dynasty in 907.

The Tang capital at Chang'an (present-day Xi'an) was the world's most populous city for much of the dynasty's existence. Two censuses of the 7th and 8th centuries estimated the empire's population at about 50 million people, which grew to an estimated 80 million by the dynasty's end. From its numerous subjects, the dynasty raised professional and conscripted armies of hundreds of thousands of troops to contend with nomadic powers for control of Inner Asia and the lucrative trade-routes along the Silk Road. Far-flung kingdoms and states paid tribute to the Tang court, while the Tang also indirectly controlled several regions through a protectorate system. In addition to its political hegemony, the Tang exerted a powerful cultural influence over neighboring East Asian nations such as Japan and Korea.
Cantonese preservation of ancient Chinese pronunciations is mentioned, for example, by the Encyclopedia Britannica, which states that:
Cantonese preserves more features of Ancient Chinese than do the other major Chinese languages; its various dialects retain most of the final consonants of the older language and have at least six tones, in contrast to the four tones of Modern Standard Chinese, to distinguish meaning between words or word elements that have the same arrangement of consonant and vowel sounds. The language has fewer initial consonants than Modern Standard Chinese and about twice as many distinctively different syllables. 
According to Wikipedia, per the link above:
Cantonese is the traditional prestige variety of Yue Chinese, a Sinitic language belonging to the Sino-Tibetan language family. It originated in the city of Guangzhou (formerly known as Canton) and its surrounding Pearl River Delta [ed. a.k.a. the Guangdong–Hong Kong–Macao Greater Bay Area].
Cantonese is regarded as an integral and inextricable component of the cultural identity of its native speakers across a vast expanse of southeastern China, Hong Kong, and Macau, as well as in overseas communities. In mainland China, Cantonese is the lingua franca of the Chinese province of Guangdong (being the majority language of the Pearl River Delta) and neighbouring areas such as Guangxi. It is also the dominant and co-official language of Hong Kong and Macau. Furthermore, Cantonese is widely spoken among overseas Chinese in Southeast Asia (most notably in Vietnam and Malaysia, as well as in Singapore and Cambodia to a lesser extent) and the Western world.

Despite the considerable overlap in vocabulary between Cantonese and Mandarin, as well as other varieties of Chinese, these Sinitic languages are not mutually intelligible. This is due to a combination of factors, including phonological differences and variations in grammar and vocabulary.


 

Thus, the similarity exists despite the significant geographic distance between the areas where Cantonese has its origins and is most widely spoken on one hand, and Korea and Japan on the other, and despite the fact that less similar topolects of Chinese exist between these two regions.

The Chinese language began to be spoken on a widespread basis in the Pearl River Delta region starting around 214 BCE, when ethnically Han Chinese people began to migrate to the region in large numbers after it was conquered by the Qin Dynasty. According to the same link:

Successive waves of immigration followed at times of upheaval in Northern and Central China, such as the collapse of the Han, Tang and Song dynasties. The most popular route was via the Xiang River, which the Qin had connected to the Li River by the Lingqu Canal, and then into the valley of the Xi Jiang. A secondary route followed the Gan River and then the Bei Jiang into eastern Guangdong. Yue-speakers were later joined by Hakka speakers following the North River route, and Min speakers arriving by sea.

After the fall of Qin, the Lingnan area was part of the independent state of Nanyue for about a century, before being incorporated into the Han empire in 111 BC. After the Tang dynasty collapsed, much of the area became part of the state of Southern Han, one of the longest-lived states of the Five Dynasties and Ten Kingdoms, between 917 and 971.

Large waves of Chinese migration throughout succeeding Chinese dynasties assimilated huge numbers of Yue aborigines, with the result that today's Southern Han Chinese Yue-speaking population is descended from both groups. 
The colloquial layers of Yue varieties contain elements influenced by the Tai languages formerly spoken widely in the area and still spoken by people such as the Zhuang and Dong.

The port city of Guangzhou lies in the middle of the Pearl River Delta, with access to the interior via the Xi, Bei, and Dong rivers, which all converge at the delta. It has been the economic centre of the Lingnan region since Qin times, when it was an important shipbuilding centre. By 660, it was the largest port in China, part of a trade network stretching as far as Arabia. During the Southern Song, it also became the cultural centre of the region. Like many other Chinese varieties it developed a distinct literary layer associated with the local tradition of reading the classics. The Guangzhou dialect (Cantonese) was used in the popular Yuèōu, Mùyú and Nányīn folksong genres, as well as Cantonese opera. There was also a small amount of vernacular literature, written with Chinese characters extended with a number of non-traditional characters for Cantonese words.

Guangzhou became the centre of rapidly expanding foreign trade after the maritime ban was lifted, with the British East India Company establishing a chamber of commerce in the city in 1715. The ancestors of most of the Han Chinese population of Hong Kong came from Guangzhou after the territory was ceded to Britain in 1842. As a result, Hong Kong Cantonese, the most widely spoken language in Hong Kong and Macau, is an offshoot of the Guangzhou dialect. . . . 
Yue varieties are among the most conservative of Chinese varieties regarding the final consonants and tonal categories of Middle Chinese, so that the rhymes of Tang poetry are clearer in Yue dialects than elsewhere. However they have lost several distinctions in the initial consonants and medial vowels that other Chinese varieties have retained.
Initials and medials

In addition to aspirated and unaspirated voiceless initials, Middle Chinese had a series of voiced initials, but voicing has been lost in Yue and most other modern Chinese varieties apart from Wu and Old Xiang. In the Guangfu, Siyi and Gao–Yang subgroups, these initials have yielded aspirated consonants in the level and rising tones, and unaspirated consonants in the departing and entering tones. These initials are uniformly unaspirated in Gou–Lou varieties and uniformly aspirated in Wu–Hua.
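The conditioning just described can be written down as a small lookup. This is only an illustrative sketch in Python: the function name `voiced_initial_reflex` and the string labels are invented for this example, subgroup names are spelled with plain hyphens, and the rule covers only the five subgroups named in the passage.

```python
# Reflex of Middle Chinese voiced initials in Yue subgroups, as summarized above.
# Function name and labels are illustrative assumptions, not standard terminology.

SPLIT_BY_TONE = {"Guangfu", "Siyi", "Gao-Yang"}  # outcome depends on the tone
ALWAYS_UNASPIRATED = {"Gou-Lou"}
ALWAYS_ASPIRATED = {"Wu-Hua"}
TONES = {"level", "rising", "departing", "entering"}


def voiced_initial_reflex(subgroup: str, tone: str) -> str:
    """Return 'aspirated' or 'unaspirated' for a Middle Chinese voiced initial."""
    if tone not in TONES:
        raise ValueError(f"unknown tone category: {tone}")
    if subgroup in ALWAYS_UNASPIRATED:
        return "unaspirated"
    if subgroup in ALWAYS_ASPIRATED:
        return "aspirated"
    if subgroup in SPLIT_BY_TONE:
        # Level and rising tones yield aspirates; departing and entering do not.
        return "aspirated" if tone in {"level", "rising"} else "unaspirated"
    raise ValueError(f"subgroup not covered by the passage: {subgroup}")


if __name__ == "__main__":
    for tone in ("level", "rising", "departing", "entering"):
        print("Guangfu", tone, "->", voiced_initial_reflex("Guangfu", tone))
```

The lookup deliberately raises an error for any subgroup the passage does not mention rather than guessing at its behavior.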

In many Yue varieties, including Cantonese, Middle Chinese /kʰ/ has become [h] or [f] in most words; in Taishanese, /tʰ/ has also changed to [h], for example, in the native name of the dialect, "Hoisan". In Siyi and eastern Gao–Yang, Middle Chinese /s/ has become a voiceless lateral fricative [ɬ].

Most Yue varieties have merged the Middle Chinese retroflex sibilants with the alveolar sibilants, in contrast with Mandarin dialects, which have generally maintained the distinction. For example, the words 將; jiāng and 張; zhāng are distinguished in Mandarin, but in modern Cantonese they are both pronounced as jēung.

Many Mandarin varieties, including the Beijing dialect, have a third sibilant series, formed through a merger of palatalized alveolar sibilants and velars, but this is a recent innovation, which has not affected Yue and other Chinese varieties. For example, 晶, 精, 經 and 京 are all pronounced as jīng in Mandarin, but in Cantonese the first pair is pronounced jīng, while the second pair is pronounced gīng. The earlier pronunciation is reflected in historical Mandarin romanizations, such as "Peking" for Beijing, "Kiangsi" for Jiangxi, and "Tientsin" for Tianjin.

Some Yue speakers, such as many Hong Kong Cantonese speakers born after World War II, merge /n/ with /l/, but Taishanese and most other Yue varieties preserve the distinction. Younger Cantonese speakers also tend not to distinguish between /ŋ/ and the zero initial, though this distinction is retained in most Yue dialects. Yue varieties retain the initial /m/ in words where Late Middle Chinese shows a shift to a labiodental consonant, realized in most Northern varieties of Chinese as [w]. Nasals can be independent syllables in Yue words, e.g. Cantonese 五; ńgh; 'five', and 唔; m̀h; 'not', although Middle Chinese did not have syllables of this type.

In most Yue varieties (except for Tengxian), the rounded medial /w/ has merged with the following vowel to form a monophthong, except after velar initials. In most analyses velars followed by /w/ are treated as labio-velars.

Most Yue varieties have retained the Middle Chinese palatal medial, but in Cantonese it has also been lost to monophthongization, yielding a variety of vowels.

Final consonants and tones

Middle Chinese syllables could end with glides /j/ or /w/, nasals /m/, /n/ or /ŋ/, or stops /p/, /t/ or /k/. Syllables with vocalic or nasal endings could occur with one of three tonal contours, called 平; 'level', 上; 'rising', or 去; 'departing'. Syllables with final stops were traditionally treated as a fourth tone category, the entering tone 入; rù, because the stops were distributed in the same way as the corresponding final nasals.

While northern and central varieties have lost some of the Middle Chinese final consonants, they are retained by most southern Chinese varieties, though sometimes affected by sound shifts. They are most faithfully preserved in Yue dialects. Final stops have disappeared entirely in most Mandarin dialects, including the Beijing-based standard, with the syllables distributed across the other tones. For example, the characters 裔, 屹, 藝, 憶, 譯, 懿, 肄, 翳, 邑, and 佚 are all pronounced yì in Mandarin, but they are all distinct in Yue: in Cantonese, yeuih, ngaht, ngaih, yīk, yihk, yi, yih, ai, yāp, and yaht, respectively.

Similarly, in Mandarin dialects the Middle Chinese final /m/ has merged with /n/, but the distinction is maintained in southern varieties of Chinese such as Hakka, Min and Yue. For example, Cantonese has 譚; taahm and 壇; tàahn versus Mandarin tán, 鹽; yìhm and 言; yìhn versus Mandarin yán, 添; tìm and 天; tìn versus Mandarin tiān, and 含; hàhm and 寒; hòhn versus Mandarin hán.
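The four pairs quoted above can be checked mechanically. The following Python sketch uses only the romanized forms given in the passage (spelled exactly as quoted); the helper names `final_nasal` and `cantonese_keeps_contrast` are made up for illustration, and reading the final consonant off the last letter works only because none of these syllables ends in a digraph.

```python
# Cantonese (Yale romanization) and Mandarin (pinyin) forms quoted in the passage above.
PAIRS = [
    # (character with -m, Cantonese, character with -n, Cantonese, shared Mandarin form)
    ("譚", "taahm", "壇", "tàahn", "tán"),
    ("鹽", "yìhm", "言", "yìhn", "yán"),
    ("添", "tìm", "天", "tìn", "tiān"),
    ("含", "hàhm", "寒", "hòhn", "hán"),
]


def final_nasal(syllable: str) -> str:
    """Return the last letter of a romanized syllable (here always m or n)."""
    return syllable[-1]


def cantonese_keeps_contrast(pairs=PAIRS) -> bool:
    """True if each pair ends in -m vs. -n in Cantonese but shares one -n form in Mandarin."""
    return all(
        final_nasal(c_m) == "m" and final_nasal(c_n) == "n" and final_nasal(mandarin) == "n"
        for _, c_m, _, c_n, mandarin in pairs
    )


if __name__ == "__main__":
    for char_m, c_m, char_n, c_n, mandarin in PAIRS:
        print(f"{char_m} {c_m} / {char_n} {c_n}  vs. Mandarin {mandarin}")
    print("contrast kept in Cantonese:", cantonese_keeps_contrast())
```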

Middle Chinese is described in contemporary dictionaries as having four tones, where the fourth category, the entering tone, consists of syllables with final stops. Many modern Chinese varieties contain traces of a split of each of these four tones into two registers, an upper or yīn register from voiceless initials and a lower or yáng register from voiced initials. Most Mandarin dialects retain the register distinction only in the level tone, yielding the first and second tones of the standard language (corresponding to the first and fourth tones in Cantonese), but have merged several of the other categories. Most Yue dialects have retained all eight categories, with a further split of the upper entering tone conditioned by vowel length, as also found in neighbouring Tai dialects. A few dialects spoken in Guangxi, such as the Bobai dialect, have also split the lower entering tone.

Middle Chinese, meanwhile, is first attested around 500 CE by Shen Yue:

Middle Chinese (formerly known as Ancient Chinese) or the Qieyun system (QYS) is the historical variety of Chinese recorded in the Qieyun, a rime dictionary first published in 601 and followed by several revised and expanded editions. The Swedish linguist Bernhard Karlgren believed that the dictionary recorded a speech standard of the capital Chang'an of the Sui and Tang dynasties. However, based on the preface of the Qieyun, most scholars now believe that it records a compromise between northern and southern reading and poetic traditions from the late Northern and Southern dynasties period. This composite system contains important information for the reconstruction of the preceding system of Old Chinese phonology (early 1st millennium BC).

The fanqie method used to indicate pronunciation in these dictionaries, though an improvement on earlier methods, proved awkward in practice. The mid-12th-century Yunjing and other rime tables incorporate a more sophisticated and convenient analysis of the Qieyun phonology. The rime tables attest to a number of sound changes that had occurred over the centuries following the publication of the Qieyun. Linguists sometimes refer to the system of the Qieyun as Early Middle Chinese and the variant revealed by the rime tables as Late Middle Chinese.

The dictionaries and tables describe pronunciations in relative terms, but do not give their actual sounds. Karlgren was the first to attempt a reconstruction of the sounds of Middle Chinese, comparing its categories with modern varieties of Chinese and the Sino-Xenic pronunciations used in the reading traditions of neighbouring countries. Several other scholars have produced their own reconstructions using similar methods.

The Qieyun system is often used as a framework for Chinese dialectology. With the exception of Min varieties, which show independent developments from Old Chinese, modern Chinese varieties can be largely treated as divergent developments from Middle Chinese. The study of Middle Chinese also provides for a better understanding and analysis of Classical Chinese poetry, such as the study of Tang poetry.
. . . 
The tone system of Middle Chinese is strikingly similar to those of its neighbours in the Mainland Southeast Asia linguistic area (proto-Hmong–Mien, proto-Tai and early Vietnamese), none of which is genetically related to Chinese. Moreover, the earliest strata of loans display a regular correspondence between tonal categories in the different languages. In 1954, André-Georges Haudricourt showed that Vietnamese counterparts of the rising and departing tones corresponded to final /ʔ/ and /s/, respectively, in other (atonal) Austroasiatic languages. He thus argued that the Austroasiatic proto-language had been atonal, and that the development of tones in Vietnamese had been conditioned by these consonants, which had subsequently disappeared, a process now known as tonogenesis. Haudricourt further proposed that tone in the other languages, including Middle Chinese, had a similar origin. 
Other scholars have since uncovered transcriptional and other evidence for these consonants in early forms of Chinese, and many linguists now believe that Old Chinese was atonal.

Around the end of the first millennium AD, Middle Chinese and the southeast Asian languages experienced a phonemic split of their tone categories. Syllables with voiced initials tended to be pronounced with a lower pitch, and by the late Tang dynasty, each of the tones had split into two registers conditioned by the initials, known as the "upper" and "lower". When voicing was lost in most varieties (except in the Wu and Old Xiang groups and some Gan dialects), this distinction became phonemic, yielding up to eight tonal categories, with a six-way contrast in unchecked syllables and a two-way contrast in checked syllables. Cantonese maintains these tones and has developed an additional distinction in checked syllables, resulting in a total of nine tonal categories. However, most varieties have fewer tonal distinctions. For example, in Mandarin dialects the lower rising category merged with the departing category to form the modern falling tone, leaving a system of four tones. Furthermore, final stop consonants disappeared in most Mandarin dialects, and such syllables were reassigned to one of the other four tones.
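The arithmetic behind those category counts is easy to lose track of, so here is a minimal Python sketch that enumerates them. The function names and the "short vowel" / "long vowel" labels are assumptions made for illustration; the only facts used are the four Middle Chinese tone categories, the two registers, and the extra Cantonese split of the upper entering tone by vowel length mentioned earlier in the quoted material.

```python
from itertools import product

TONES = ["level", "rising", "departing", "entering"]  # Middle Chinese categories
REGISTERS = ["upper", "lower"]  # from voiceless vs. voiced initials


def split_categories():
    """All (register, tone) pairs after the register split."""
    return [(register, tone) for tone, register in product(TONES, REGISTERS)]


def cantonese_categories():
    """The split categories plus the further split of the upper entering tone."""
    categories = []
    for register, tone in split_categories():
        if (register, tone) == ("upper", "entering"):
            # Labels are illustrative; the split is conditioned by vowel length.
            categories.append((register, tone, "short vowel"))
            categories.append((register, tone, "long vowel"))
        else:
            categories.append((register, tone))
    return categories


if __name__ == "__main__":
    cats = split_categories()
    unchecked = [c for c in cats if c[1] != "entering"]
    checked = [c for c in cats if c[1] == "entering"]
    print(len(cats), "categories:", len(unchecked), "unchecked,", len(checked), "checked")
    print(len(cantonese_categories()), "categories in Cantonese")
```

Counting the entering tone as "checked" and the other three as "unchecked" reproduces the six-way and two-way contrasts in the passage.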

Changes from Old to Modern Chinese

Middle Chinese had a structure similar to many modern varieties, especially conservative ones like Cantonese, with largely monosyllabic words, little or no derivational morphology, three tones, and a syllable structure consisting of initial consonant, glide, main vowel and final consonant, with a large number of initial consonants and a fairly small number of final consonants. Without counting the glide, no clusters could occur at the beginning or end of a syllable.

Old Chinese, on the other hand, had a significantly different structure. There were no tones, a smaller imbalance between possible initial and final consonants, and many initial and final clusters. There was a well-developed system of derivational and possibly inflectional morphology, formed using consonants added onto the beginning or end of a syllable. The system is similar to the system reconstructed for Proto-Sino-Tibetan and still visible, for example, in Classical Tibetan; it is also largely similar to the system that occurs in the more conservative Austroasiatic languages, such as modern Khmer.

The main changes leading to the modern varieties have been a reduction in the number of consonants and vowels and a corresponding increase in the number of tones (typically through a Pan-East-Asiatic tone split that doubled the number of tones and eliminated the distinction between voiced and unvoiced consonants). That has led to a gradual decrease in the number of possible syllables. Standard Mandarin has only about 1,300 possible syllables, and many other varieties of Chinese even fewer (for example, modern Shanghainese has been reported to have only about 700 syllables). The result in Mandarin, for example, has been the proliferation of the number of two-syllable compound words, which have steadily replaced former monosyllabic words; most words in Standard Mandarin now have two syllables.

Middle Chinese continues to be attested through at least 1150 CE. Old Chinese, meanwhile, is much older:

Old Chinese, also called Archaic Chinese in older works, is the oldest attested stage of Chinese, and the ancestor of all modern varieties of Chinese. The earliest examples of Chinese are divinatory inscriptions on oracle bones from around 1250 BC, in the Late Shang period. Bronze inscriptions became plentiful during the following Zhou dynasty [ed. 1046 BCE to 256 BCE]. The latter part of the Zhou period saw a flowering of literature, including classical works such as the Analects, the Mencius, and the Zuo Zhuan. These works served as models for Literary Chinese (or Classical Chinese), which remained the written standard until the early twentieth century, thus preserving the vocabulary and grammar of late Old Chinese.

The transition from Middle Chinese to modern Chinese topolects appears to date from sometime between the 10th and 13th centuries.

After the fall of the Northern Song dynasty and subsequent reign of the Jurchen Jin and Mongol Yuan dynasties in northern China, a common speech (now called Old Mandarin) developed based on the dialects of the North China Plain around the capital. The 1324 Zhongyuan Yinyun was a dictionary that codified the rhyming conventions of new sanqu verse form in this language. Together with the slightly later Menggu Ziyun, this dictionary describes a language with many of the features characteristic of modern Mandarin dialects.

Up to the early 20th century, most Chinese people only spoke their local variety. Thus, as a practical measure, officials of the Ming and Qing dynasties carried out the administration of the empire using a common language based on Mandarin varieties, known as 官话; 官話; Guānhuà; 'language of officials'. For most of this period, this language was a koiné based on dialects spoken in the Nanjing area, though not identical to any single dialect. By the middle of the 19th century, the Beijing dialect had become dominant and was essential for any business with the imperial court.

In the 1930s, a standard national language (国语; 國語; Guóyǔ), was adopted. After much dispute between proponents of northern and southern dialects and an abortive attempt at an artificial pronunciation, the National Language Unification Commission finally settled on the Beijing dialect in 1932. The People's Republic founded in 1949 retained this standard but renamed it 普通话; 普通話; pǔtōnghuà; 'common speech'. The national language is now used in education, the media, and formal situations in both mainland China and Taiwan.

In Hong Kong and Macau, Cantonese is the dominant spoken language due to cultural influence from Guangdong immigrants and colonial-era policies, and is used in education, media, formal speech, and everyday life—though Mandarin is increasingly taught in schools due to the mainland's growing influence.

Influence

Historically, the Chinese language has spread to its neighbors through a variety of means. Northern Vietnam was incorporated into the Han dynasty (202 BCE – 220 CE) in 111 BCE, marking the beginning of a period of Chinese control that ran almost continuously for a millennium. The Four Commanderies of Han were established in northern Korea in the 1st century BCE but disintegrated in the following centuries. Chinese Buddhism spread over East Asia between the 2nd and 5th centuries CE, and with it the study of scriptures and literature in Literary Chinese. Later, strong central governments modeled on Chinese institutions were established in Korea, Japan, and Vietnam, with Literary Chinese serving as the language of administration and scholarship, a position it would retain until the late 19th century in Korea and (to a lesser extent) Japan, and the early 20th century in Vietnam. Scholars from different lands could communicate, albeit only in writing, using Literary Chinese.

Although they used Chinese solely for written communication, each country had its own tradition of reading texts aloud using what are known as Sino-Xenic pronunciations. Chinese words with these pronunciations were also extensively imported into the Korean, Japanese and Vietnamese languages, and today comprise over half of their vocabularies. This massive influx led to changes in the phonological structure of the languages, contributing to the development of moraic structure in Japanese and the disruption of vowel harmony in Korean.

Borrowed Chinese morphemes have been used extensively in all these languages to coin compound words for new concepts, in a similar way to the use of Latin and Ancient Greek roots in European languages. Many new compounds, or new meanings for old phrases, were created in the late 19th and early 20th centuries to name Western concepts and artifacts. These coinages, written in shared Chinese characters, have then been borrowed freely between languages. They have even been accepted into Chinese, a language usually resistant to loanwords, because their foreign origin was hidden by their written form. Often different compounds for the same concept were in circulation for some time before a winner emerged, and sometimes the final choice differed between countries. The proportion of vocabulary of Chinese origin thus tends to be greater in technical, abstract, or formal language. For example, in Japan, Sino-Japanese words account for about 35% of the words in entertainment magazines, over half the words in newspapers, and 60% of the words in science magazines.

Vietnam, Korea, and Japan each developed writing systems for their own languages, initially based on Chinese characters, but later replaced with the hangul alphabet for Korean and supplemented with kana syllabaries for Japanese, while Vietnamese continued to be written with the complex chữ Nôm script. However, these were limited to popular literature until the late 19th century. Today Japanese is written with a composite script using both Chinese characters called kanji, and kana. Korean is written exclusively with hangul in North Korea, although knowledge of the supplementary Chinese characters called hanja is still required, and hanja are increasingly rarely used in South Korea. As a result of its historical colonization by France, Vietnamese now uses the Latin-based Vietnamese alphabet.

Why Is Cantonese linguistically conservative?

The Sino-Tibetan language family has its origins in Northern China, not Southern China. Prior to the arrival of the Han Chinese in 214 BCE, languages belonging to the Tai language family thrived where Cantonese is now spoken. Other language families now spoken mostly in Southeast Asia also had their origins in Southern China. 

Presumably the preservation of ancient pronunciations in Cantonese reflects the general linguistic principle that the historic features of a language tend to be more strongly preserved on the frontiers of its range than near the core. See also Appalachian English (which is often described as the modern English dialect closest to Shakespearean English), New Mexican Spanish (which preserves some archaic features of Spanish that have not survived elsewhere), and the Icelandic language (which is the modern language closest to Old Norse, the ancestor of the North Germanic languages).

The Chinese Min languages, which have some features that predate Middle Chinese, are likewise Chinese languages that developed on the frontier of the range of Chinese languages.

Many Min languages have retained notable features of the Old Chinese language, and there is linguistic evidence that not all Min varieties are directly descended from Middle Chinese of the Sui–Tang dynasties. Min languages are believed to have a significant linguistic substrate from the languages of the inhabitants of the region before its sinicization. . . .

The Min homeland of Fujian was opened to Han Chinese settlement by the defeat of the Minyue state by the armies of Emperor Wu of Han in 110 BC. The area features rugged mountainous terrain, with short rivers that flow into the South China Sea. Most subsequent migration from north to south China passed through the valleys of the Xiang and Gan rivers to the west, so that Min varieties have experienced less northern influence than other southern groups. As a result, whereas most varieties of Chinese can be treated as derived from Middle Chinese—the language described by rhyme dictionaries such as the Qieyun (601 AD)—Min varieties contain traces of older distinctions. Linguists estimate that the oldest layers of Min dialects diverged from the rest of Chinese around the time of the Han dynasty. However, significant waves of migration from the North China Plain occurred: 
Jerry Norman identifies four main layers in the vocabulary of modern Min varieties: 
  1. A non-Chinese substratum from the original languages of Minyue, which Norman and Mei Tsu-lin believe were Austroasiatic.
  2. The earliest Chinese layer, brought to Fujian by settlers from Zhejiang to the north during the Han dynasty (compare Eastern Han Chinese).
  3. A layer from the Northern and Southern dynasties period, which is largely consistent with the phonology of the Qieyun dictionary (Early Middle Chinese).
  4. A literary layer based on the koiné of Chang'an, the capital of the Tang dynasty (Late Middle Chinese).
Laurent Sagart (2008) disagrees with Norman and Mei Tsu-lin's analysis of an Austroasiatic substratum in Min. The hypothesis proposed by Jerry Norman and Mei Tsu-lin arguing for an Austroasiatic homeland along the middle Yangtze has been largely abandoned in most circles and left unsupported by the majority of Austroasiatic specialists. Rather, recent analyses of archeological evidence posit an Austronesian layer rather than an Austroasiatic one.
As a footnote, I have argued repeatedly that the highly divergent features of the Anatolian languages, relative to the time when these languages are first attested and when the first archaeological evidence of Indo-European cultural litmus tests appears in Anatolia, are due to language contact with languages such as the pre-Hittite Hattic language, which is very different from the substrate Indo-European languages of Europe. 

But another possibility is that the Anatolian languages preserved archaic features of the language family because they were located at a language frontier of the Indo-European linguistic range. Again, not because they have so much more time depth than the Indo-European languages of Europe, but because they were isolated on the frontier from developments in the Indo-European linguistic community elsewhere, which were more connected to each other, that led those languages to collectively evolve in similar ways.