Dispatches From Turtle Island: January 2015

Wednesday, January 28, 2015

Reviewing the State of QCD

A nice review article sums up the state of QCD (the part of the Standard Model pertaining to the strong force). Craig D. Roberts, "Hadron Physics and QCD: Just the Basic Facts" (January 26, 2015). The abstract is as follows (typo in original corrected):

With discovery of the Higgs boson, the Standard Model of Particle Physics became complete. Its formulation is a remarkable story; and the process of verification is continuing, with the most important chapter being the least well understood. Quantum Chromodynamics (QCD) is that part of the Standard Model which is supposed to describe all of nuclear physics and yet, almost fifty years after the discovery of quarks, we are only just beginning to understand how QCD moulds the basic bricks for nuclei: pions, neutrons, protons. QCD is characterized by two emergent phenomena: confinement and dynamical chiral symmetry breaking (DCSB), whose implications are extraordinary. This contribution describes how DCSB, not the Higgs boson, generates more than 98% of the visible mass in the Universe, explains why confinement guarantees that condensates, those quantities that were commonly viewed as constant mass-scales that fill all spacetime, are instead wholly contained within hadrons, and elucidates a range of observable consequences of confinement and DCSB whose measurement is the focus of a vast international experimental programme.

In theory, QCD is a solved problem.

We believe that we know the exact form of all of the equations, and we have decent experimental measurements of all of the relevant fundamental physical constants of the Standard Model that go into those equations (some of which are redundant degrees of freedom): the strong force coupling constant, the two electroweak coupling constants, the six quark masses, the three charged lepton masses, the W boson, Z boson, and Higgs boson masses, the Higgs field vacuum expectation value, the four parameters of the CKM matrix, and a few general purpose constants not particular to the Standard Model like the speed of light, Planck's constant, and pi. We would like more precise measurements of all of them, but that only limits the precision of the calculations we can do in the Standard Model.

(The other fundamental physical constants, the three absolute neutrino masses, the four parameters of the PMNS matrix, Newton's constant, and the cosmological constant, are not pertinent to QCD.)

But the promise of QCD to explain all of nuclear physics from first principles has yet to be realized in practice. This is mostly because the mathematics is hard, largely because (1) higher order terms in the relevant infinite series approximations of the equations' predictions matter much more than they do in calculations involving the other forces, since the strong force coupling constant is large at the energy scales relevant to hadrons, (2) the self-interactions of gluons greatly complicate the equations, and (3) some of the key physical constants, such as the light quark masses and the strong force coupling constant, aren't known with great precision.

These obstacles are interrelated. Self-interactions are one of the reasons that higher order terms are more relevant, and the difficulty involved in doing the calculations is one of the reasons that the fundamental physical constants inferred from observables like hadron masses aren't very precisely known.

For example, the rest masses of the proton, the neutron, and the electron are all known to eight significant digits, and we know the masses of many of the baryons and mesons predicted by QCD and the quark model to four to six significant digits (often +/- 1 MeV in absolute terms). At least seven baryons and fifteen mesons have absolute mass measurements with better than +/- 0.15 MeV precision, and these are disproportionately hadrons that include only up, down and strange quarks, the quarks whose fundamental masses are particularly imprecisely determined.

But the strong force coupling constant is known only to an accuracy of about 1%, and the masses of the proton and neutron calculated from first principles using QCD are likewise accurate only to about 1% (about 10 MeV).

Similarly, the least accurately known mass of a hadron made up only of up and down quarks (the spin-3/2 delta baryon) has an experimental measurement accuracy of about 0.2% (about 2 MeV) and a theoretically predicted mass (calibrated using the proton and neutron masses) that differs from the experimental value by roughly 3% (about 30 MeV). The linked paper in this paragraph explains at some length why these calculations are so difficult.
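For readers who want to check the conversion between percentages and MeV in the last two paragraphs, here is a minimal Python sketch. The central masses (roughly 938.3 MeV for the proton and 1232 MeV for the delta baryon) are approximate values assumed here for illustration, and the helper name is hypothetical.

```python
# Rough conversion of relative (fractional) uncertainties into absolute
# uncertainties in MeV. Central values are approximate, assumed inputs.
PROTON_MASS_MEV = 938.3   # approximate proton rest mass
DELTA_MASS_MEV = 1232.0   # approximate delta baryon mass


def absolute_uncertainty(central_mev: float, relative: float) -> float:
    """Return the absolute uncertainty in MeV for a fractional uncertainty."""
    return central_mev * relative


lattice_proton = absolute_uncertainty(PROTON_MASS_MEV, 0.01)     # ~1% first-principles QCD
delta_experiment = absolute_uncertainty(DELTA_MASS_MEV, 0.002)   # ~0.2% measurement
delta_theory_gap = absolute_uncertainty(DELTA_MASS_MEV, 0.03)    # ~3% theory vs. experiment

for label, value in [("proton, 1%", lattice_proton),
                     ("delta, 0.2%", delta_experiment),
                     ("delta, 3%", delta_theory_gap)]:
    print(f"{label}: about {value:.1f} MeV")
```

It prints roughly 9.4 MeV, 2.5 MeV and 37 MeV, which lines up with the rounded figures quoted above.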

Tuesday, January 27, 2015

Observed CP Violation In Hadron Decays Still Consistent With Standard Model

Background on CP violation in the Standard Model

The Standard Model of Particle Physics includes a three-by-three matrix, called the CKM matrix, that determines the probability that an up-type quark that emits a W+ boson will transform into each of the three possible down-type quarks, and the probability that a down-type quark that emits a W- boson will transform into each of the three possible up-type quarks (the probability is given by the square of the absolute value of the relevant, complex-valued, entry in the matrix).

Because the matrix is unitary, and because some of its complex phases can be absorbed into the definitions of the quark fields, it can be fully described with four parameters. There are an infinite number of ways to choose those four parameters, although only two or three are commonly used. One common approach is to use three real-valued mixing angles and one CP-violating phase, which enters the matrix through complex exponential factors.
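As an illustration of how four parameters pin down all nine entries, the sketch below builds the CKM matrix in the standard three-angle, one-phase parameterization and checks that each row of squared magnitudes sums to one, as it must if the entries describe transition probabilities. The numerical angles and phase are approximate values assumed for illustration, and `ckm_matrix` is a hypothetical helper name.

```python
import cmath
import math


def ckm_matrix(theta12, theta23, theta13, delta):
    """Build the 3x3 CKM matrix in the standard parameterization from three
    mixing angles and one CP-violating phase (all in radians).
    Rows are (u, c, t); columns are (d, s, b)."""
    s12, c12 = math.sin(theta12), math.cos(theta12)
    s23, c23 = math.sin(theta23), math.cos(theta23)
    s13, c13 = math.sin(theta13), math.cos(theta13)
    e_pos = cmath.exp(1j * delta)    # e^{+i delta}
    e_neg = cmath.exp(-1j * delta)   # e^{-i delta}
    return [
        [c12 * c13, s12 * c13, s13 * e_neg],
        [-s12 * c23 - c12 * s23 * s13 * e_pos,
         c12 * c23 - s12 * s23 * s13 * e_pos,
         s23 * c13],
        [s12 * s23 - c12 * c23 * s13 * e_pos,
         -c12 * s23 - s12 * c23 * s13 * e_pos,
         c23 * c13],
    ]


# Approximate, illustrative parameter values (assumed, not from the post).
V = ckm_matrix(math.radians(13.04), math.radians(2.38),
               math.radians(0.201), 1.20)

# Squared magnitudes give the relative transition probabilities.
probabilities = [[abs(v) ** 2 for v in row] for row in V]
for quark, row in zip("uct", probabilities):
    print(quark, [round(p, 5) for p in row], "row sum:", round(sum(row), 12))
```

Because the row sums come out to one for any choice of angles and phase, the four parameters can be fitted freely without ever producing probabilities that fail to add up.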

The CP-violating phase makes the probability of a transition from a particular starting quark to a particular finishing quark larger than the probability of the same transition in the reverse direction (i.e., it is equivalent to a time asymmetry). This is the only source of CP violation in the Standard Model. Any other kind of observed CP violation would constitute "New Physics" beyond the Standard Model.
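A standard, parameterization-independent way to quantify CP violation in the CKM matrix is the Jarlskog invariant J, and CP-violating effects in the Standard Model quark sector are proportional to it. The sketch below computes J from four CKM entries in the standard parameterization, assuming approximate, illustrative angle and phase values. It shows that J vanishes when the phase is zero and flips sign when the phase flips sign (in this parameterization, flipping the phase is the same as complex conjugating the matrix, which is what a CP transformation does to the couplings). The function name `jarlskog` is hypothetical.

```python
import cmath
import math


def jarlskog(theta12, theta23, theta13, delta):
    """Jarlskog invariant J = Im(V_us V_cb V_ub* V_cs*), computed from CKM
    entries in the standard parameterization (angles and phase in radians)."""
    s12, c12 = math.sin(theta12), math.cos(theta12)
    s23, c23 = math.sin(theta23), math.cos(theta23)
    s13, c13 = math.sin(theta13), math.cos(theta13)
    v_us = s12 * c13
    v_cb = s23 * c13
    v_ub = s13 * cmath.exp(-1j * delta)
    v_cs = c12 * c23 - s12 * s23 * s13 * cmath.exp(1j * delta)
    return (v_us * v_cb * v_ub.conjugate() * v_cs.conjugate()).imag


# Approximate, illustrative mixing angles (assumed values).
angles = (math.radians(13.04), math.radians(2.38), math.radians(0.201))
j_with_phase = jarlskog(*angles, 1.20)      # phase of about 69 degrees
j_flipped = jarlskog(*angles, -1.20)        # same angles, phase sign flipped
j_no_phase = jarlskog(*angles, 0.0)         # same angles, phase switched off
print(f"J with phase:    {j_with_phase:.2e}")
print(f"J phase flipped: {j_flipped:.2e}")
print(f"J without phase: {j_no_phase:.2e}")
```

With these illustrative inputs, J comes out at about 3 × 10^-5, a small number, which is part of why CP violation is subtle in most processes.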

When applied to actual weak force decays of hadrons, the Standard Model predicts that baryons, vector mesons and charged mesons will generally have quite small CP violation, which current experiments will struggle to detect except in the case of the heaviest B mesons, but that there will be significant observable CP violation in the decays of neutral pseudoscalar mesons, i.e., neutral kaons, neutral D mesons, and neutral B mesons (both with and without a strange quark component).

In fact, five sigma evidence of CP violation at Standard Model predicted rates (the particle physics gold standard for calling a phenomenon a "discovery" rather than a mere potential statistical fluke) has been observed in neutral kaons, in neutral B mesons (both with and without a strange quark component), and even in charged B mesons.
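The "five sigma" threshold can be translated into a probability. Assuming the one-sided Gaussian tail convention commonly used in particle physics, this sketch computes the chance that a pure statistical fluctuation would look at least that significant; the function name is hypothetical.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a standard normal variable fluctuates upward by at
    least n_sigma standard deviations (one-sided Gaussian tail)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


for n in (3, 5):
    p = one_sided_p_value(n)
    print(f"{n} sigma: p = {p:.2e} (about 1 in {1 / p:,.0f})")
```

Five sigma corresponds to a probability of roughly 3 × 10^-7, or about one chance in 3.5 million; three sigma is about one in 740.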

The Latest Experimental Results For Neutral D Mesons

But CP violation has not yet been observed in neutral D mesons, according to the most recent LHC results, announced in a preprint today. The new results continue to show CP violation in neutral D meson decays that is consistent with zero to within experimental error (consistent with past attempts to measure the same thing, which have been ongoing since the charm quark was first discovered).

This non-observation of CP violation, however, turns out to confirm the Standard Model rather than contradict it. This is because the expected amount of CP violation in D meson decays is on the order of 0.1% of the decay rates or less (although this rate hasn't been calculated very accurately, because QCD calculations are very difficult). Current experimental measurements of these decay rates are not precise enough to make a statistically significant detection of this subtle amount of CP violation.

Why Is CP Violation Hard To See In D Mesons?

Why is CP violation so uncommon in the decays of neutral D mesons, compared to other neutral mesons?

Among the four mixing systems (K0, D0, Bd and Bs) the D0 system is in a sense unique. The mixing mechanism relies on internal d-type quarks; due to the smaller mass of the b quark compared to the t quark, the kinematics of the dispersive part of the mixing amplitude are not completely dominated by the heavy third generation quark. Furthermore, due to the specific structure of the Cabibbo-Kobayashi-Maskawa couplings, the absorptive part Γ12 will feature an extremely efficient Glashow-Iliopoulos-Maiani (GIM) mechanism. We will discuss this in detail later on and show that it leads to a suppression of the leading contribution by several orders of magnitude.

Put another way, CP violation is generally greater in heavier mesons, and the fact that B mesons are much heavier than D mesons makes CP violation easier to see in B mesons than in D mesons.

Neutral kaons are an exception to this rule because they are really a linear combination of two different kinds of mesons (the K long and the K short), which decay at very different rates (their decay rates differ by a factor of more than 500). We are seeing the CP violation in the disparity between K long and K short decays (rather than within the subgroup of K long, or K short, decays), whereas in D and B mesons, we are measuring CP disparities in the decays of a single kind of meson with a single decay rate.
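The factor separating the two neutral kaon states can be checked from their mean lifetimes, since a decay rate is the inverse of a mean lifetime. The lifetimes below are approximate values assumed for illustration.

```python
# Approximate mean lifetimes in seconds (assumed, illustrative values).
K_SHORT_LIFETIME_S = 8.954e-11
K_LONG_LIFETIME_S = 5.116e-8

# A decay rate is the inverse of the mean lifetime, so the ratio of decay
# rates (K short over K long) equals the ratio of lifetimes (K long over K short).
rate_short = 1.0 / K_SHORT_LIFETIME_S
rate_long = 1.0 / K_LONG_LIFETIME_S
ratio = rate_short / rate_long
print(f"K short decays about {ratio:.0f} times faster than K long")
```

The ratio comes out at roughly 570.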

Prospects For Future Discoveries Of CP Violation In D Mesons

Experimental measurements need to be on the order of 100 times more precise than existing ones to be confidently expected to directly or indirectly observe CP violation in D meson decays (and perhaps the decays of some other kinds of hadrons) at the levels that the Standard Model predicts theoretically when the Standard Model's CP violating phase is fitted to the observed levels of CP violation in B meson and neutral kaon decays.

Even with the full LHC data set, the LHC is predicted to be just barely able to discover CP violation in D meson decays, if it can see it at all, given its expected sensitivity of about 10 times the precision of the results reported in preprint form today. Direct observation of this phenomenon predicted by the Standard Model would require a new collider much more powerful than the LHC (which is the most ambitious fundamental physics experiment in human history).
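Large gains in precision are expensive because, when statistical errors dominate, the uncertainty of a measurement based on N recorded decays shrinks only as 1/sqrt(N). The sketch below assumes purely statistical errors and ignores systematic uncertainties, so it is only a rough guide; the function names are hypothetical.

```python
import math


def required_sample_factor(precision_gain: float) -> float:
    """Factor by which the number of recorded decays must grow to shrink a
    purely statistical uncertainty by `precision_gain`, assuming the
    uncertainty scales as 1/sqrt(N) and ignoring systematic errors."""
    return precision_gain ** 2


def relative_uncertainty(n_events: float) -> float:
    """Purely statistical relative uncertainty for n_events counts."""
    return 1.0 / math.sqrt(n_events)


for gain in (10, 100):
    print(f"{gain}x better precision needs ~{required_sample_factor(gain):,.0f}x more data")

for n in (10_000, 1_000_000):
    print(f"N = {n:,}: statistical uncertainty ~{relative_uncertainty(n):.2%}")
```

Under that assumption, a tenfold improvement in precision needs roughly a hundred times as much data, and a hundredfold improvement needs roughly ten thousand times as much.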

Thus, while further LHC measurements are expected to put even tighter bounds on beyond the Standard Model physics models, they are not expected to directly observe CP violation in neutral D meson decays.

Of course, any beyond the Standard Model physics that enhances CP violation in neutral D mesons by a factor of 10 to 100 or so will be detected at the LHC.

Implications for BSM physics

While direct or indirect confirmation that there is CP violation in neutral D meson decays does not yet exist, the experimental measurements to date can rule out any beyond the Standard Model theory that predicts a significant enhancement of CP violation in neutral D mesons relative to the Standard Model expectation greater than about 0.3%, or that otherwise deviates materially from the Standard Model expectation for CP violation in the instances where CP violation has been observed.

CP violation is particularly sensitive to the existence of undiscovered heavy quarks and these measurements provided some of the earliest predictions of the charm and top quarks. The non-detection of elevated levels of CP violation, therefore, tends to suggest particularly strongly that there really are just three generations of quarks, rather than four or more.

Sunday, January 25, 2015

Polynesian Ancient DNA

A review of ancient DNA tests from Polynesia reveals that the human samples date from early contact with Europeans and, unsurprisingly, are essentially the same as modern samples. But modern human samples from the region and animal DNA samples from Polynesia have shed light on the subject. The following translation of the linked blog post, via Google Translate, states (emphasis added) that:

Archaeology has shown that Oceania was settled from Near Oceania and not from America. This has been validated by the first genetic studies, which highlighted the Polynesian mitochondrial pattern. Thus the first human settlement of the area, dated to 50,000 to 30,000 years ago, has been associated with mitochondrial haplogroups M, O, P and S in Australia, and P, Q and some specific branches of M (M27, M28 and M29) in New Guinea and Near Oceania, and with the Y chromosome haplogroups K, M and C.

Population genetics of Polynesia has also shown that the origin of this population is in Southeast Asia, and more specifically on the island of Taiwan. Mitochondrial haplogroup B4a1a1 is characteristic of these populations and defines the Polynesian motif. The ancestral haplogroup B4a1a is clearly of Asian origin, and is found in Taiwan and the islands of Southeast Asia. The Y chromosome haplogroup O also corresponds to the Austronesian dispersal in the Pacific.

Whole genome studies have also been performed. They have shown that Asian ancestry remains low among the populations of New Guinea (under 20%). By contrast, in Polynesian populations, Asian ancestry is 87%, while Near Oceanian ancestry reaches only 13%. . . .

The first approach was to study the animals and plants associated with human migrations. The first animal studied was the Pacific rat, which served as food in the area. This animal does not exist in Taiwan and therefore must have been incorporated into the Lapita culture along its dispersal route. Mitochondrial analysis of Pacific rats did not show a single original homeland, but several distinct populations. Genetic studies of pigs have shown an origin in Vietnam. Chicken bones were found at Lapita sites in Near and Remote Oceania. Mitochondrial DNA testing of ancient chicken remains showed that there were at least two different strains of chickens. Surprisingly, chicken bones were also found in Chile at pre-Columbian sites, from before the arrival of Europeans. The mitochondrial DNA of these Chilean remains belongs to the same lineages as the remains from the Lapita culture of Oceania, which seems to show that the Polynesians reached America before the Europeans.

So, there is new ancient DNA confirmation, from pre-Columbian chicken bones in Chile, of Polynesian contact with South America. But pre-Columbian Polynesian contact still dates only to sometime in the vicinity of 800 CE to 1200 CE, three to seven centuries before Columbus arrived, and at about the same time, plus or minus a couple of centuries, as Leif Erikson reached North America via Greenland, with a similarly minor amount of sustained impact.

Near Oceania, in this context, refers to Pacific Islands settled before the Austronesians arrived, a region that includes the island of Yap discussed in the previous post at this blog (which is point number 12 on the map in this 1998 open access genetics paper).

Saturday, January 24, 2015

Do Glottal Consonants Tell Important Deep Historical Linguistic Tales?

Glottal consonants, also called laryngeal consonants, are consonants articulated with the glottis. They come in three subtypes: ejectives, implosives, and glottalized resonants (also explained here).

Were Glottal Consonants Present In Proto-Indo-European?

One of the leading theories regarding the Proto-Indo-European language is that it contained laryngeal consonants which were lost in all of the successor Indo-European language families except the Anatolian languages, but whose loss explains grammatical irregularities in the daughter language families. Further, some loan words from Proto-Indo-European into the Uralic languages may have carried traces of lost laryngeal consonants.

Hittite and the other Anatolian languages are the only Indo-European languages where at least some of them are attested directly and consistently as consonantal sounds. Otherwise, their presence is to be inferred mostly through the effects they have on neighboring sounds, and on patterns of alternation that they participate in. When a laryngeal is attested directly, it is usually as a special type of vowel and not as a consonant.

The simultaneous loss of laryngeal consonants in every daughter language family other than the Anatolian languages after they broke away from Proto-Indo-European (e.g. Sanskrit, Iranian, Tocharian, Germanic, Balkan, Italic, Celtic and Greek), however, is arguably less plausible than a substrate or areal influence limited to the Anatolian languages and Eastern Armenian, since the substrate languages of Anatolia were probably part of the modern Caucasian language families.

Substrate influences from a radically different substrate language family than other Indo-European languages may also explain why the Anatolian languages seem so lexically divergent from the other Indo-European languages.

The time depth of Tocharian's connection to Proto-Indo-European (ca. 3500 BCE or earlier) is particularly notable in this regard, since the extinct Anatolian languages are themselves not attested before 2000 BCE when there is historically attested evidence of a major Hittite expansion, around the same time that Indo-European Greek started to replace the pre-Greek languages of the Aegean, and around the same time that archaeological evidence of Sanskrit associated civilizations appears in the Cemetery H culture of Northwest India.

In contrast, the expansion of Indo-European languages into Central and Eastern Europe as part of the Corded Ware culture dates to around 3500 BCE (most of these languages were replaced by Slavic languages in the 1st millennium CE in an event that also had a meaningful demic component) and one or two of the Corded Ware languages, in turn, probably provided a source for the Germanic, Italic and Celtic branches of the Indo-European language family, that started to expand into Europe in the very late Bronze Age and Iron Age (ca. 1300 BCE).

I am personally inclined to think that laryngeal consonants in Proto-Indo-European, if they were present at all, probably derive from, and were limited to, loan words from neighboring North Caucasian languages. Those consonants were quickly shed upon being borrowed by the Indo-Europeans, who lacked that phoneme and did not manage to reproduce it as the words were adopted into their own language, and who in some cases passed these words, in turn, along to the Uralic languages with similar traces of the laryngeal consonants that were once present in the Caucasian sources for the wanderwort.

What Else Do Glottal Consonants Suggest About Historical Linguistics?

In General

1. Glottal consonants, like click phonemes, appear to be easier for a language to lose than to gain in the absence of strong areal or substrate influences. Every language family where they are found has members that lack them. And, there are large entire language families (counting language isolates as language families) that seem to lack them, or lack them with a small number of easily explained outliers:

* Modern Indo-European languages (except Sindhi and Eastern Armenian)
* Tocharian
* Basque
* Uralic
* narrow Altaic (i.e. Turkic and Mongolian languages)
* Siberian languages (with one single exception in one subfamily probably due to areal effects from North America)
* Inuit
* Tibeto-Burmese (with one small exception in Burma probably due to substrate influence)
* Austronesian (with one small exception on Yap island probably due to substrate influence)
* Papuan languages (with one exception)
* Australian aboriginal languages
* Munda languages (whose parent language probably lost implosives due to substrate influence)
* Semitic (with two small exceptions in Ethiopia probably due to substrate influences and one exception on an island off Arabia due to Ethiopian areal influence)
* Berber
* Coptic
* Dravidian languages
* Many of the Native American language families of Eastern North America.

In Africa

2. Implosives are found in a large enough share of the Niger-Congo languages and Nilo-Saharan languages to suggest that they were present in the proto-languages of each and may have a common origin.

3. Ejectives are found in seven of the Khoisan languages, broadly defined as click consonant languages that are not a part of another major language family (one of which also uses implosives and one of which also uses glottalized resonants), and in the Zulu language of South Africa, which is a Bantu language with a local substrate that used the Khoisan click consonants. It is fair to guess that ejectives were present in a proto-language that is the parent to all of the click languages of Africa. This language probably originated in East Africa or Ethiopia and subsequently migrated to South Africa.

4. Ejectives are found in all of the language families centered in Ethiopia: two Ethio-Semitic languages, four Omotic languages (two of which use implosives as well), and three Cushitic languages, in Ethiopia, Kenya and Tanzania respectively (each of which also uses implosives).

The Chadic languages show linguistic and population genetic signs of being derived in substantial part from the Cushitic languages. Two Chadic languages use both ejectives and implosives.

Taking the evidence as a whole, it seems likely that the Proto-Cushitic language, the Proto-Chadic language derived from a Cushitic language, and the Proto-Omotic languages all used both ejectives and implosives (possibly due to a substrate from a sister language family to the Khoisan languages in which the click consonants were lost).

5. The Ethio-Semitic languages that have ejective consonants probably do so due to influence from a substrate that had either already lost the implosives, or lost them at the time of the Ethio-Semitic language shift. These languages did not exist more than 3500 years ago and replaced languages that were probably similar to Cushitic and/or Omotic.

6. The presence of ejectives and glottalized resonants in Soqotri, the South Arabian Semitic language of the island of Socotra, is probably an areal effect from Ethiopian Cushitic influence (the Omotic languages of Ethiopia, the Chadic languages, and the Khoisan languages of East Africa are all comparatively geographically remote) sometime in the last 3500 years.

7. Implosives and ejectives are found in three Ethiopian Nilo-Saharan languages, with the ejectives probably due to substrate or areal effects since the ejectives are absent in other Nilo-Saharan languages, and Nilo-Saharan languages are a relatively late arrival to Ethiopia compared to the Afro-Asiatic languages found there (apart from the Ethiosemitic languages).

Similarly, the presence of glottal resonants in one Nilo-Saharan language spoken near the trinational boundary of Uganda, the Democratic Republic of the Congo and South Sudan, in addition to implosives in that language, is probably due to substrate or areal effects.

8. Glottal consonants are not found in the Northern Afro-Asiatic languages: Berber, Coptic (i.e. Ancient Egyptian), or any Semitic languages other than Soqotri and the Ethio-Semitic languages, and hence were probably absent from all of these proto-languages.

But, contrary to this hypothesis, the earliest attested Semitic language, Akkadian, which is well attested in writing, appears to have had an implosive glottal consonant. There also appears to be temporal and geographic variation in the use of glottal consonants in different dialects of Arabic (e.g. with some dialects losing a previously attested glottal stop).

Likewise, liturgical Coptic, now used mostly by the Ethiopian Orthodox church, has both implosive and ejective glottal consonants, although it is not clear if the fact that the Coptic language is now used liturgically by people who have those consonants in their native living languages means that it is a reliable indicator of the consonants present in the ancient Egyptian language thousands of years before it was used in Ethiopia. Since the hieroglyphic writing system was not purely phonetic, this is not easy to determine despite the abundance of available written materials in Coptic.

It could be that the Afro-Asiatic languages that became Berber, Coptic and Semitic originally had a full range of glottal consonants but lost them as these languages were eventually adopted by common people learning them as second languages in regions where these consonants did not exist in now lost substrate languages of North Africa and the Near East.

In The New World

9. The presence of glottal consonants in Na-Dene languages appears to arise from substrate or areal effects of other pre-existing Native American languages. The Na-Dene are heavily admixed with the founding population of the Americas and have been present in the Americas long enough for this to be plausible. The lack of glottal consonants in all but one Siberian language very close to North America also suggests that Yeniseian is the parent language family (as opposed to a back migrating language family from North America, a conclusion also supported by Ket and Na-Dene population genetics).

Another possible substrate influence is that the Yeniseian Ket language has a sex-based grammatical gender system, while the Na-Dene languages, like all of the Native American languages in the area where the Na-Dene languages are spoken, do not.

In my opinion, however, the linguistic distinction between gender systems that are not sex-based or that have more than three genders (i.e., more than male, female and neuter), a common feature of, for example, Niger-Congo languages, Papuan languages, and Australian Aboriginal languages, and noun cases that are not called genders, which are numerous in many other languages (for example, Caucasian and Dravidian languages), is a distinction without a difference. It obscures possible relationships between languages based merely on regional conventions about how grammatical features are named.

10. The proto-Amerind languages ca. 20,000 years ago probably had ejective glottal consonants and glottalized resonants, both of which are widespread in the Americas, but some of which were lost in subsequent daughter languages or language families.

Implosive glottal consonants are found in nine Native American languages: alone in two Brazilian languages and one Bolivian language; together with glottalized resonants in one Northern Californian language, one language on the Yucatan Peninsula in Southern Mexico, and one language across the Southern Mexican border in Guatemala; and with both glottalized resonants and ejectives in three more Native American languages, one in Washington State, one in Southern Mexico, and one in Brazil near the Bolivian border.

This pattern tends to favor a hypothesis in which there was population structure within the founding population of the Americas: the Pacific route subpopulation had all three kinds of glottal consonants, which its daughter languages preserved in various combinations while losing others. In contrast, the North American Native Americans who took an Eastern route and then migrated West back towards the Pacific across North America either were from a subpopulation that never had implosive glottal consonants, or lost them early on.

The non-Pacific route Native Americans are also notable for having mtDNA X2a, which is not present in Southern coastal route intermediate populations between West Asia and the Bering Strait, unlike most of the other Y-DNA and mtDNA clades found in Native Americans. One could imagine a scenario in which the Bering land bridge is home to an mtDNA X2a rich population with stronger North Asian influences that loses the implosive glottal consonants of the Native American founding population, and another one on the Bering land bridge without mtDNA X2a or the same amount of North Asian influences that retains implosive glottal consonants.

In Eurasia

11. Ejectives are found in ten languages of the Caucasus, both North Caucasian and South Caucasian, as well as in geographically adjacent Eastern Armenian (an Indo-European language clearly experiencing an areal effect).

This is one of many distinguishing features of the languages of the Caucasus Mountains that point to the Northwest Caucasian languages, the Northeast Caucasian languages (a.k.a. Nakh-Daghestanian), and the South Caucasian languages (a.k.a. Kartvelian) all having a deep common origin.

These languages are the only extant plausible source for glottal consonant influences on Proto-Indo-European of any kind, if there was one.

The Caucasian languages could have developed glottal consonants independently in deep linguistic history (the early Neolithic at least), or these consonants could have been present in an early Out of Africa population which lost implosive glottal consonants and glottal resonants, but unlike other Eurasian languages retained ejective consonants.

12. Implosives are found in the Indo-European Sanskrit derived Sindhi language of Southern Pakistan. This could be a residual result of linguistic borrowing through maritime trade with Ethiopia (which dates back to the Copper Age, at least) or trade with Southeast Asia (which is probably at least 1000 years old) or both.

13. Ejective consonants in Itelmen, a Chukotko-Kamchatkan language spoken on the Kamchatka Peninsula near the Bering Strait, are probably an areal effect involving back migration from, or contact with, Alaskan languages, in which these consonants are common.

14. The presence of ejective glottal consonants in Korean is probably the hardest of the data points to explain. Korean probably derives from somewhere to the north of the Korean peninsula, and it has been widely hypothesized that Korean's ultimate linguistic roots are in the Altaic language family, from a homeland which autosomal genetic evidence suggests is also home to the modern Eurasians most closely related to the Native American founding population. An independent innovation in the Korean language itself is also possible.

15. While my primary source (WALS) does not identify any Japonic languages with glottal consonants, it appears that the Japonic North Ryukyuan languages, such as the language of Okinawa, do have them, and that these languages, in general, preserve features found in Old Japanese that are absent in modern Japanese. The fact that the North, rather than the South, Ryukyuan languages have these consonants also suggests (in accord with other lines of evidence regarding the prehistory and ancient history of these islands) that glottal consonants in the North Ryukyuan languages likely derive from the language spoken by the Yayoi migrants to Japan, rather than from some kind of areal influence from the island of Formosa (Taiwan) or Southern China.

If both proto-Japanese and Korean had glottal consonants, then the glottal consonants in these languages probably indicate a shared common origin (reasonably enough, since the Japanese language almost certainly arrived in Japan via Korea) that explains these consonants in both languages (and disfavors North American areal effects).

16. Implosives alone are also found in ten languages of Southeast Asia, one of Southeast China, one of Taiwan, one of Indonesia, and one of Papua New Guinea. Glottalized resonants are found together with implosive glottal consonants in two Mon-Khmer languages of Vietnam (Khmu' and Sedang) and one Tai-Kadai language of Southwest China (Sui). Glottalized resonants alone are also found in two languages of Southeast Asia, one spoken in Burma (Chin) and one spoken in Malaysia (Semelai).

All of these probably have a common origin, particularly in light of the likelihood that all Southeast Asian, South Chinese and Formosan languages have genetic ties to each other that are distinct from Tibeto-Burmese ties. The presence of these consonants in Tibeto-Burmese languages is likely due to substrate influences as Tibeto-Burmese languages are comparatively recent arrivals in the region.

17. As discussed below, the use of ejective glottal consonants and glottal resonant consonants in Yapese is probably either an independent innovation that arose after thousands of years of pre-Austronesian isolation on the island of Yap, or a deep substrate influence reflecting an extremely conservative language feature lost everywhere else in successive language sweeps from tens of thousands of years ago.

18. Notably, it is unclear if the extinct Sumerian language had glottal consonants or not. If it did, then the apparent presence of glottal consonants in the Semitic Akkadian language that replaced Sumerian in Mesopotamia may have been a substrate influence particular to Akkadian, rather than a feature of the proto-Semitic language that withered away over time.

Overlapping Language Feature Distribution

The geographic distribution of languages with tone systems is similar, although not identical, to the geographic distribution of languages with glottal consonants. Both are most common in sub-Saharan Africa, Southeast Asia, and the Americas. But the Chinese dialect family uses tone even though it does not use glottal consonants. As the term is used here:

Tone is the term used to describe the use of pitch patterns to distinguish individual words or the grammatical forms of words, such as the singular and plural forms of nouns or different tenses of verbs.

Postscript on Yapese

The Yapese language of the people of the island of Yap in the Pacific Ocean, between Papua New Guinea and the Philippines, fits clearly within the Oceanic subfamily of the Austronesian languages, but is a language isolate within that family. It is distinguished by its widespread use of ejective glottal consonants and glottalized resonants, as well as by its VSO word order, unlike the superficially similar Admiralty Island languages.

There are no other Papuan or Austronesian languages with these consonants. But, like other Admiralty Island languages, Yapese does appear to have a Papuan substrate, perhaps from a population that reached the island before the Austronesians arrived, although without maritime technology sufficient to maintain ongoing ties to its place of origin.

On the whole, this looks like a case where this pre-Austronesian population was so isolated for so long that it either (1) retained highly conservative first-wave hominin phonemes lost everywhere else in the region in one or more subsequent waves of migration or losses of language complexity, or (2) innovated independently in phonology in a manner not suppressed by influences from other cultures from which it was isolated for so long.

[Yapese] belongs to the Austronesian languages, more specifically to the Oceanic languages. It has been suggested that Yapese may be one of the Admiralty Islands languages, though Ethnologue lists it as a language isolate within the Oceanic languages. The glottal stop is a leading feature of Yapese. Words beginning with a vowel letter (with a few grammatical exceptions) begin with a glottal stop. Adjacent vowels have the glottal stop between them. There are many word-final glottal stops.

Yapese is also notable for its VSO word order, in contrast with the SVO word order of most Admiralty Island languages. The same link notes that:

The Oceanic languages were first shown to be a language family by Sidney Herbert Ray in 1896 and, besides Malayo-Polynesian, they are the only established large branch of Austronesian languages. Grammatically, they have been strongly influenced by the Papuan languages of northern New Guinea, but they retain a remarkably large amount of Austronesian vocabulary.

There are about thirty Admiralty Island languages, with Yapese and Nguluwan as outliers that might or might not fit. Many of the Manus languages in the Admiralty Island language family have bilabial trills or prenasalised consonants, but the Baluan-Pam language of that family does not, and it does have a glottal fricative consonant (h), although it is a very marginal phoneme in that language.

Yap is home to about 11,000 people today and probably far fewer in pre-modern times, and is known for its use of stone coins ranging in size from 1.4 inches to 12 feet in diameter. Per the Wikipedia article on Yap:

Yap was initially settled by ancient migrants from the Malay Peninsula, the Indonesian Archipelago, New Guinea, and the Solomon Islands. The people of Yap's outer islands are descendants of Polynesian settlers, and as such have significant ethnic dissimilarities from the people of Yap proper. Their culture and languages (Ulithian and Woleaian) are closely related to those of the neighboring islands of Chuuk. . . Yapese society is based on a highly complex caste system involving at least seven tiers of rank. Historically, the caste rank of an entire village could rise or fall in comparison to other villages depending on how it fared in inter-village conflicts. Winning villages would rise in rank as a part of a peace settlement, while losing villages would have to accept a decline in comparative rank. In many cases lower ranked villages were required to pay tribute to higher ranked villages. Further, dietary taboos might be imposed on lower ranking villages, i.e., they might be prohibited from harvesting and eating the more desirable fish and animals of the sea. Further, within each village each family had its own rank comparative to the others.

Until the arrival of the German colonizers, the caste ranking system was fluid and the ranks of villages and families changed in response to inter-village intrigues and confrontations. In the late 19th century, however, the German colonial administration "pacified" Yap and enforced a prohibition against violent conflict. The caste ranking of each village in modern Yap thus remains the same as it was when the system was frozen in place by the Germans. The result of the freeze left Yap with three highest ranks of the villages of Teb, Gachpar, and Ngolog. The village of Teb from the municipality of Tomil remains as the highest of the three. The first recorded sighting of Yap by Europeans came during the Spanish expedition of Álvaro de Saavedra in 1528. Its sighting was also recorded by the Spanish expedition of Ruy López de Villalobos on 26 January 1543, who charted them as Los Arrecifes ("the reefs"). At Yap the Villalobos' expedition received the same surprising greeting as previously in Fais Island from the local people approaching the ships in canoes: "Buenos días Matelotes!". Again, "Good day sailors!" in perfect sixteenth century Spanish evidencing previous presence of the Spaniards in the area. The original account of this story is included in the report that the Augustinian Fray Jerónimo de Santisteban, travelling with the Villalobos' expedition, wrote for the Viceroy of New Spain, while in Kochi during the voyage home. Yap also appeared in Spanish charts as Los Garbanzos (The Chickpeas in Spanish) and Gran Carolina (Great Caroline in Spanish).

Nguluwan, per Wikipedia, is "a 'mixed' language spoken on an atoll of that name between Yap and Palau. The grammar and lexicon are Yapese, but the phonology has been affected by Ulithian, and speakers are shifting to that language."

If I were looking for unusual genes outside of Africa, the people of the island of Yap would be a particularly fruitful place to look.

Appendix

Languages with Ejectives

Ejectives Without Other Glottal Consonants

Languages with ejectives are widespread throughout North America and South America.

Ejectives are found in only two Asian languages: Korean, and Itelmen, a Chukotko-Kamchatkan language spoken on the Kamchatka Peninsula, near the Bering Sea.

Ejectives are found in ten languages of the Caucasus, both North Caucasian and South Caucasian, as well as in geographically adjacent Eastern Armenian.

Ejectives are found in two of the many Ethio-Semitic languages (Tigre and Amharic), two of the Ethiopian Omotic languages (Kefa and Dizi), and a Nilo-Saharan Ethiopian language (Berta), as well as in two crypto-Khoisan languages of Tanzania (Hadza and Sandawe) and three Khoisan languages of Southern Africa.

Ejectives With Glottalized Resonants

Glottalized resonants are found together with ejective glottal consonants in seventeen Native American languages: eleven near the Pacific Coast of the United States; one near Canada's Arctic coast (Slave Lake, a Na-Dene language); three in the Southern United States (Acoma, Wichita and Yuchi); and two in South America (Jebero and Wichi).

Glottalized resonants are also found together with ejective glottal consonants in three other languages: a Khoisan language of South Africa; Soqotri, a South Arabian Semitic language spoken on an island off the Arabian coast; and Yapese, spoken on the island of Yap in Micronesia.

Glottalized Resonants Alone

Glottalized resonants alone are found in two Native American languages, one in Southern Mexico (Chinantec) near where Mazahua is spoken, and one in Brazil near the Bolivian border (Wari) near where Nambikuara is spoken.

Glottalized resonants alone are also found in two languages of Southeast Asia, one spoken in Burma (Chin) and one spoken in Malaysia (Semelai).

Implosives

Implosives Alone

Implosives alone are found in three languages of South America, two in Brazil and one in Bolivia.

Implosives alone are also found in ten languages of Southeast Asia, in one of Southeast China, in one of Taiwan, in one of Indonesia, in one of Papua New Guinea, and in one of Southern Pakistan (Sindhi).

Implosives alone are also found in many Niger-Congo languages of West Africa and Central Africa, and in many Nilo-Saharan languages of East Africa (as well as in a couple of languages attributed to the small Kadu language family, which is sometimes classified as Niger-Congo and sometimes as Nilo-Saharan).

Implosives With Glottalized Resonants

Glottalized resonants are found together with implosive glottal consonants in two Mon-Khmer languages of Vietnam (Khmu' and Sedang) and one Tai-Kadai language of Southwest China (Sui), as well as in one Nilo-Saharan language spoken near the trinational boundary of Uganda, the Democratic Republic of the Congo and South Sudan (Lugbara).

Implosives With Ejectives

Implosives and Ejectives Without Glottalized Resonants

Implosives and ejectives are found in three Native American languages (one in Northern California, one on the Yucatan Peninsula in Southern Mexico and one just over the Southern Mexican border in Guatemala).

Implosives and ejectives are also found in eleven African languages: two Chadic languages (Hausa and Kotoko), a Khoisan language in Southern Africa (Deti), a Bantu language in Southern Africa (Zulu), the Cushitic Oromo language in Ethiopia, a Cushitic language in Tanzania (Iraqw), a Cushitic language in Kenya (Dahalo), two Nilo-Saharan languages (Komo and Ik) near the South Sudan-Ethiopian border, and two Omotic languages of Ethiopia (Kullo and Hamer).

Ejectives With Glottalized Resonants and Implosives

Ejectives, Glottalized Resonants and Implosives are all found together only in three Native American languages, one in Washington State (Lushootseed), one in Southern Mexico (Mazahua), and one in Brazil near the Bolivian border (Nambikuara).

Friday, January 23, 2015

The Economics Of Caste Formation and Maintenance In India

An interesting new economics article attempts to elucidate economic forces that would lead to the formation and continued survival of India's hereditary caste system.

While the suggestions about the factors at play are highly speculative, and the mathematical analysis in the paper is really mere sophisticated window dressing for a set of ideas that could just as well be explained in words, the speculation is nonetheless interesting.

One of the lessons that economics teaches us is that actions that appear to be economically irrational often reflect our incomplete understanding of the details of the transaction. Few pervasive and stable economic institutions are actually economically irrational, although the traditional narrative for why any particular economic or political institution works is often mostly or completely wrong.

Indeed, it isn't implausible that the system was devised for reasons entirely different from those stated in this article, but survived because the economic logic set forth there works.

The article is Chris Bidner and Mukesh Eswaran, "A Gender-Based Theory of the Origin of the Caste System of India." (December 11, 2012). The abstract is as follows:

This paper proposes a theory of the origins of India’s caste system by explicitly recognizing the productivity of women in complementing their husbands’ skills. We explain the emergence of caste and also the core features of the caste system: its hereditary nature, its insistence on endogamy (marriage only within castes), and its hierarchical character. We demonstrate why the caste system requires the oppression of women to be viable: punishments for violations of endogamy are more severe for women than for men. When there are such violations, our theory explains why hypergamy (women marrying up) is more acceptable than hypogamy (women marrying down). Our model also speaks to other aspects of caste, such as notions of purity, pollution, commensality restrictions, and arranged/child marriages. We also suggest what made India’s caste system so unique and durable. Finally, our theory shows that, contrary to claims made by the most dominant anthropological theory, economic considerations were of utmost importance in the emergence of the caste system.

Also, although it has little bearing on the paper's conclusions (the introduction simply provides background for an economic analysis rather than discussing history or anthropology per se), the paper's account of the history of caste in its introduction is not well supported by the broader academic literature on the subject. The introduction states (footnotes omitted) that:

Historians of caste since the 19th Century had long argued that the caste system arose after an Aryan invasion from the north-west around 1,500 BCE after which the victors imposed an oppressive system on the vanquished. This was a conjecture based on references in the Rig Veda, the earliest of Hindu scriptures, to an Aryan race. However, this claim has been largely discredited in recent decades. There is no archeological evidence of any such invasion; the Vedic culture, which started after 1,500 BCE and which spawned the caste system, seems to have been an indigenous innovation of an earlier culture at Harappa [Shaffer (1984), Shafer and Lichtenstein (2005)]. Recently, genetic evidence has also confirmed that there could not have been any large scale infusion of genes into India since 3,500 BCE [e.g. Sahoo et al (2006)]. Since both archaeological and genetic evidence firmly imply that the caste system of India was an entirely indigenous development—not one foisted by foreign invaders—it therefore has to be explained in these terms

In fact, there is overwhelming and solid archaeological, linguistic, genetic and legendary-historical evidence for an Aryan invasion by proto-Indo-Iranian people around 1900-1800 BCE that had a profound impact on Hindu Indian ethnogenesis.

This transition is marked archaeologically by the arrival of a new class of metal goods (e.g., the very earliest iron goods and ironworking, and substantial new volumes of bronze goods), pottery techniques (e.g., the Black and Red Ware culture was a transitional one showing Indo-European influences), chariot technologies, and burial methods (e.g., a shift from burial to cremation in the Cemetery H culture) that first appeared on the Eastern European steppe. These corroborate passages in the Rig Veda such as RV 10.15.14, which demonstrates that the Rig Veda is appropriately viewed as legendary history, even though it is a religious text with fictional elements.

Linguistically, the Indo-Aryan invasion is marked by the emergence of Sanskrit, an Indo-European language that is the source of all of the other Indo-Aryan languages of India in much the same way that Latin is the source of the Romance languages of Europe, or Old Norse is the source of the North Germanic languages of Europe. Sanskrit undeniably belongs to the Indo-European family, which is overwhelmingly believed to have originated outside India. And, in the prehistoric, preliterate era, it was impossible for language shift to occur without the presence of a substantial superstrate population to bring the new language to a people. The time depth of the Sanskrit-derived languages fits the Indo-Aryan hypothesis well.

Genetically, there are strong signs of an influx into India of people with West Eurasian affinities at about the right time depth. Markers that are common in other Indo-European populations (e.g., Y-DNA R1a1a1b) are found in India, and their frequencies are greater in the Brahmin ruling class and in populations that speak Indo-European rather than Dravidian or Tibeto-Burman languages. There is also a discernible distinction between Ancestral North Indian and Ancestral South Indian components of autosomal DNA in India. While only some of that distinction is attributed to migration ca. 1900-1800 BCE and a subsequent expansion over centuries to the rest of India, there is strong evidence of major admixture between the two populations at about the right time.

Indo-Aryans probably contributed less than 20% of the ancestry of Northwest Indians who speak Indo-Aryan languages (more heavily concentrated in Brahmins), and less as one moves south, into a subcontinent that already showed a genetic distinction between the Harappan North and the non-Harappan South. But the fact that the genetic evidence does not support wholesale replacement of the bulk of the population of South Asia (which clearly didn't happen) does not mean that there is no genetic evidence for a substantial demic migration of linguistically Indo-European Indo-Aryans who predominantly became the new ruling class in most of India.

The genetic impact of the Indo-Aryans on India, for example, is greater than the genetic impact of the Turks on the country now known as Turkey (about 8%), yet the Turkic superstrate clearly had a profound cultural impact on Anatolia, which had been largely Hellenistic in culture immediately before that transition (from the time of the conquests of Alexander the Great until the Turkic conquests that began in the 11th century CE).

Overall, the evidence supports the arrival of an Indo-European superstrate population known as the Aryans around 1900-1800 BCE in Northwest India which expanded into much of India, and became a superstrate ruling class that had profound cultural impact on India.

There are three main points on which the picture remains unclear.

1. How much of the transition was imposed by force, as opposed to accepted voluntarily by the indigenous people of India?

The Harappan civilization collapsed on its own prior to the advent of the Indo-Aryans in connection with the 4.2 kiloyear climate event, an arid period that was accompanied by the drying up and disappearance of the Sarasvati River of the Rig Veda, around which much of the earlier Harappan civilization was organized, as ruins recovered along its ancient, now dry, riverbanks reveal.

It could be that the survivors of a Harappan civilization left in disarray welcomed these new rulers (and it is true that there isn't much archaeological evidence of sustained heavy military conflict at the time of the Harappan-Aryan transition), or it could be that they were conquered militarily in a manner decisive and swift enough to leave few archaeological traces. There is no serious doubt, however, that the Indo-Aryans formed the core of a new ruling class, first in Northwest India and, over a few centuries, across much more of India.

2. How much cultural influence did the Harappan substrate have on the Indo-European culture brought by the Indo-Aryans?

Certainly, some aspects of the indigenous Harappan civilization of the Indus River Valley contributed materially to the culture of the Indo-Aryan invaders, who came to rule the indigenous majority of India. For example, we know that curry, the staple Indian dish, is of Harappan origin.

While Hinduism has elements of historically documented Indo-European paganism also found in Greek, Italic, Celtic, Hittite and Germanic societies in the West, and commonalities with the Old Persian religion documented in the ancient Iranian scripture known as the Avesta (which, taken together, can be used to infer the proto-Indo-European religious system), it is clear that substrate Harappan influences materially shaped the religion that came to be known as Hinduism. For example, Hinduism has fewer human-like deities than other Indo-European religions, probably due to Harappan substrate influence, and the central role of the psychoactive substance "Soma", which is less prominent in other Indo-European religions, is probably also a case of substrate influence. The sacred cow taboo of India is another feature of Hinduism not shared by other Indo-Europeans.

One of the leading explanations of the formation of the caste system in India sees the Brahmin priestly caste as one invented as a way to graft an Indo-European ruling caste (probably male dominated and taking local wives from prominent families in many cases), onto a pre-existing caste system that had previously consisted only of the other three of the four varnas which Wikipedia describes as: the Kshatriya (those with governing functions), the Vaishya (agriculturalists, cattle rearers and traders) and the Shudra (who serve the other varna).

This interpretation is supported by the fact that Brahmins in India are genetically more similar to Indo-Europeans than members of the other varna in India are.

One might imagine a pre-Aryan caste system in Northwest India with a hereditary aristocracy (seen in ancient and feudal societies across the world), a hereditary class of freeholders and merchants (perhaps viewed as Harappan citizens), and a hereditary class of serfs (perhaps made up of ethnically distinct non-Harappan Dravidians conquered by Harappans prior to the Indo-Aryan invasion, perhaps mostly when Harappans fleeing their homeland, where their primary river system had dried up, relocated to the Northeast, an archaeologically established migration). Dalits, aka "untouchables", may have been hunter-gatherer populations or other less technologically advanced farmers or herders conquered after the Indo-Aryan-era formation of the four varna system.

There is certainly no archaeological evidence that supports the existence of India's four varna plus Dalit caste system during the pre-Indo-Aryan Harappan era.

Efforts to discern the nature of the Harappan language from their proto-linguistic system of seals, or the non-Indo-European substrate influences in Sanskrit, have largely failed so far.

3. How did Hinduism, and the Indo-Aryan genetic influences that are particularly common in Brahmins, extend to areas that are now, or were in the historic era, Dravidian-speaking?

Prior to the arrival of the Indo-Aryans ca. 1900-1800 BCE, Harappan civilization did not extend beyond Northwest India, which is where Indo-Aryan influence on South Asia commenced.

Now, the Hindu religion is found throughout India, and Brahmins, even in Dravidian-speaking areas, show heightened levels of Indo-Aryan genetic contributions.

One possibility is that, after the Indo-Aryan invasion, Brahmins carried out a missionary effort in Dravidian areas that successfully secured voluntary acceptance there of their role as the highest, priestly caste and of the Hindu religious and caste system, but that this missionary effort was insufficient to bring about the language shift that the initial Indo-Aryan invasion did.

Another possibility is that the Indo-Aryans, over time, conquered all but a small pocket of Southeast India, instituted Hinduism there and wiped out the existing languages, and that an expansionist Dravidian counter-campaign then recaptured some, but not all, of the parts of India that had never been Harappan, after the Hindu religious and caste system had been put in place there. On this view, the religious and caste elements survived the Dravidian reconquest of these areas.

This second theory would also help to explain the shallow time depth of the Dravidian language family (with a common proto-language dated as recently as 500 BCE by some, and to the range of 1100-700 BCE by others), even though the family has supposedly had many thousands of years to develop indigenously. Under this theory, most of the languages in the family would have been wiped out in the Indo-Aryan invasion, leaving the surviving languages all derived from the Dravidian dialects spoken in the small pocket of Southeast India that managed to resist the Indo-Aryans, which then expanded into areas where other autochthonous Indian languages had once been spoken before being wiped out by the Indo-Aryans.

It is hard to find an equally parsimonious explanation for the lack of Dravidian linguistic variation in a theory in which Dravidian is the thousands-of-years-old ancestral language of India, in which it derives from the Harappan language, or in which it arose locally or arrived from abroad around the time of the South Indian Neolithic ca. 2500 BCE. The Dravidian languages even show less linguistic time depth than the Indo-Aryan languages of South Asia, despite strong circumstantial evidence that they were spoken in India before the Indo-European languages.

Hat Tip: Marginal Revolution.

Monday, January 19, 2015

Ancient Jomon DNA

Japan's Y-DNA profiles suggest that a bit less than half of the patrilineal genetic ancestry of Japan is from the Jomon people, who predominantly shared a subset of Y-DNA haplogroup D that is unique to Japan and deeply diverged from the other Y-DNA D found mostly in Tibet, the Andaman Islands and Northeast Asia.

The rest of Japan's patrilineal ancestry (outside the indigenous Ainu minority population) is largely traceable to historic-era migration from Korea and China, which brought the Japanese language to the island chain and gave birth to what became modern Japanese culture as these rice-growing, horse-riding, metal-sword-wielding newcomers (the Yayoi) admixed with the sedentary fishing populations that preceded them (a remarkably high percentage of ancestral survival for a pre-Neolithic population whose language leaves almost no trace in the modern Japanese language).

Partial samples of ancient Jomon autosomal DNA seem to largely confirm this analysis.

The abstract of a symposium paper soon to be presented on the subject states:

Hideaki Kanzawa-Kiriyama, Nuclear Genome Analysis of Ancient Japanese Archipelago Humans

The Jomon period, characterized by chord-marked potteries, lasted from ~16,000 to less than 3,000 years before present (YBP), and abundant human skeletal remains have been excavated from shell mounds and other sites throughout the Japanese Archipelago. However, their genetic origin and the relationships with modern populations are largely unknown. Here we determined 10% and 80% of the genomic DNA sequences from two Jomon individuals, excavated at Yugura cave site, Nagano, and Shitsukariabe cave site, Aomori, respectively, and compared their genome sequences with worldwide populations. We found a unique genetic position of the Jomon people who had diverged before the diversification of most of present-day East Eurasian populations including East Eurasian Islanders. This indicates that Jomon people were a basal population in East Eurasia and genetically isolated from other East Eurasians for long time. However, their genetic affinities to modern East Eurasians are uneven. The heterogeneity might be a hint to clarify human migration and gene flow in East Eurasia after the divergence of Jomon ancestors.

I have some doubts about this proposed sequence of population splits. There is a good case to be made that the Jomon arrived after the first wave of modern humans in East Asia, rather than before, and settled in places like Tibet, the Andaman Islands, and Japan because those were all that was left unclaimed at that time, and because they had superior maritime capabilities compared to their contemporaries.

Background on the issue of Jomon DNA can be found here. An ancient Jomon DNA study from 2009 is here. A 2013 study is here. Realistically, about 43% of Japanese Y-DNA and a third of Japanese mtDNA is attributable to Jomon ancestors, with more in Northern Japan and less in Western Japan, consistent with a history of Yayoi migration into Japan starting ca. 1000 BCE, or perhaps a century or two later, with full control of the Japanese islands secured only many centuries after that point.

A new paper also documents archaeologically that the arrival of Fertile Crescent crops such as barley and wheat ca. 1600 BCE strengthened agriculture in Tibet, which had previously relied on crops such as broomcorn and foxtail millet, and that this cemented the much older human presence there. Sheep may also have arrived in Tibet around this time.

The Case For A Copper Age Indo-European Expansion

Marnie at the Linear Population Model blog takes issue with the widely held view that Y-DNA R1a arrived in Europe as a result of a Copper Age expansion of Indo-Europeans into the regions of Europe where it is found today. Her post is a reaction specifically to a strong defense of this position in the comments to a recent Eurogenes blog post.

Linear Population Model is a no-comments blog, for the understandable reason that moderating blog comments takes too much time, so rather than posting comments there, I'll engage with some of the concerns raised by Marnie here at this blog.

Marnie opens the issue with this premise:

You may know that the Eurogenes blog (along with some very prominent researchers) has been heavily and stubbornly promoting the theory of a very sudden invasion of Europe during the Copper Age from either the Ponto-Caspian Steppe or the Central Asian Steppe.

After fairly accurately setting out the position expressed by Davidski at the Eurogenes blog and discussing some other relevant data, Marnie gets to the bottom-line issue she is raising:

So what's with thinking that R1 lineages were simply confined to the Central Eurasian steppe for 20,000 years (and not to Europe until the Copper age)?

There definitely is a shift of modern Europeans between the Mesolithic and today. I agree with that.

But I don't agree that the shift happened due only to replacement from the Ponto-Caspian or Central Eurasian Steppe and only during the Copper Age.

It's just as likely that the genetic shift of Europeans is due to Western Europeans fusing with populations of Finland, Scandinavia, the Baltic, Balkans (including Greece), Central Europe, the Ukraine, Russia, Anatolia, and the Levant, starting in the Mesolithic and continuing to the present day. It would have been a complex process, with waves of people possibly moving both in and out of the Steppe and in and out of Europe, since the R/Q/P split.

The most important data supporting the Eurogenes position come from several data sets of Central and Eastern European ancient DNA (e.g. Germany and Hungary).

Ancient DNA samples from this region cluster into three groups.

1. Samples that precede the Neolithic revolution (the Neolithic revolution begins when the first farmers appear in this part of Europe; this initial Neolithic archaeological culture is often called the "Linear Pottery Culture" or LBK for short), including Upper Paleolithic and Mesolithic samples and also samples from European hunter-gatherer populations in the region (as determined based upon the archaeological context of the ancient DNA) who were contemporaneous with the first farmers of the region.

These samples are predominantly mtDNA U4 and U5, and the handful of autosomal samples that are available are very distinct from those of subsequent farming populations, which derive only part of their ancestry from European hunter-gatherer-like sources. There are very few pre-Neolithic European samples of ancient Y-DNA available, but all of those available are Y-DNA I2.

While there is a decent case that Southern Europe saw some enrichment in its mtDNA diversity in the Mesolithic era, I don't think that the case is nearly as strong for what became the LBK cultural zone in the early Neolithic era.

2. Samples from farmers of the early Neolithic era (i.e. from the beginning of the LBK culture until roughly the start of the Corded Ware culture in the Copper Age).

These samples have a minor component of mtDNA haplogroups similar to those found in the Mesolithic era and among hunter-gatherers contemporaneous with them, but with many new mtDNA haplogroups that aren't found in older samples. The dominant Y-DNA haplogroup is G2.

3. Samples from the Copper Age (roughly coincident with the arrival of the Corded Ware culture) to the present in this region.

From the Copper Age onward, the ancient DNA strongly resembles the modern population genetics of the region. It is highly enriched in mtDNA haplogroup H relative to the early Neolithic, when H was uncommon, and to the Mesolithic era, when mtDNA H was absent.

The frequency of Y-DNA G2 is dramatically reduced and Y-DNA I2 is also quite uncommon, while Y-DNA R1a (really a very specific subset of Y-DNA R1a with a most recent common ancestor estimated, by admittedly imperfect mutation rate dating methods, to date from the Copper Age) is predominant. In some regions, such as areas near the Southern Baltic coast of Europe, there is a blend with Y-DNA R1b (again, actually a specific European subset of R1b with a similar estimated time depth), the predominant Y-DNA type of much of Western Europe. These areas lie near the biogeographic boundary between R1a predominance and R1b predominance, which roughly corresponds to the boundary between the historical ranges of the Bell Beaker people (or cultures in continuity with them) in the West and the Corded Ware people (or cultures in continuity with them) in the East during the Copper and Bronze Ages in Continental Europe.

The limited ancient autosomal DNA fits this clustering: it shows a shift from the LBK DNA profile towards the modern profile, marked by a new ancestry component that appears in ancient DNA starting around the Copper Age.

There is also Bronze Age ancient DNA and physical anthropology data showing genetic continuity and continuity in physical appearance and bone structure of people from the Corded Ware region of Europe all of the way to the Tarim Basin at the fringe of the East Asian highlands of greater China, where mummies reveal individuals who were Eastern European in their genes, coloration and bone structure, who we know spoke a long lost Indo-European language called Tocharian until around the time of an expansion of linguistically Altaic peoples across the Eurasian steppe all of the way to Europe itself.

Evidence from the time depth of language families in Europe also tends to support this analysis.

In the interests of not letting this post get too stale, I am posting it now. But, I hope to enrich it with additional links and references over the next few days.

UPDATE: A new ancient DNA find reported at Eurogenes appears to further support this hypothesis.

Kennewick Man Has Native American DNA And Confirms Paradigm

Preliminary results of ancient DNA tests being conducted in Denmark on the nearly complete Paleo-American remains known as Kennewick Man, dated to ca. 7000 BCE to 6900 BCE and discovered on the banks of the Columbia River in the state of Washington in 1996 CE, confirm that he had an ordinary Native American DNA profile, despite having an inferred facial structure with features more typical of contemporary Europeans. Legal disputes delayed the result.

The genetic results confirm the prevailing paradigm on the settlement of the New World by modern humans. While there were some instances of pre-Columbian human contact with the Americas after the arrival of its pre-Holocene era founding population and the closing of the Bering land bridge, the exceptions were few, are increasingly well understood, and didn't have much of a demic impact on the majority of Native Americans.

Specifically, there were three major pre-Columbian waves of post-founding circumpolar migration to North America in the time period since 3500 BCE, and somewhere between two and half a dozen or so minor instances of contact in the thousand years before the arrival of Columbus, which had a much smaller demographic impact.

There is no evidence of any human contact with the New World after the arrival of the founding population in the roughly nine thousand years from 12,500 BCE to 3,500 BCE. The Kennewick Man ancient DNA appears to confirm this hypothesis, despite coming from an area that experienced multiple waves of much later circumpolar migration.

South America has only a subset of the genetic diversity found in North America (due to modest population structure in the founding population), and was not impacted by any of the circumpolar migrations that had a major demographic impact. It may have experienced a small number of instances of contact with Pacific origins in the last 1,500 years or so that had a very minor demographic impact.

UPDATE: Razib tells basically the same story, but he makes explicit in a footnote a point that, in the first draft of this post, I was merely careful not to get affirmatively wrong, without mentioning it.

In contrast, Kennewick Man is likely to belong to the first ur-North Americans, who arrived as a relatively small population from Beringia ~15,000 years ago. This is the overwhelming majority of indigenous ancestry, and south of the Rio Grande basically the totality.*

* From my Twitter exchanges with Pontus Skoglund I believe there is some population structure in the founding “First American” group, though not a great deal.

Some of the relevant papers (see also here) are:

[1] Erika Tamm, et al., "Beringian Standstill and Spread of Native American Founders" (PLOS One 2007)

Noting three minor mtDNA clades found in North America but not South America (X2a, D2 and D3) in addition to the four ubiquitous founding mtDNA clades (A2, B2, C1 and D1), and discussing alternative viable scenarios for First American migration. Two new founding mtDNA clades also present in South America, C4c and D4h3, are identified, and C1b, C1c and C1d are identified as founding subclades while C1a is Asian. They hypothesize that A2a and C1a in Asia are back migrations from the New World and that D2 (found in Inuits and the Na-Dene) and D3 (found only in Inuits) are subsequent late circumpolar arrivals in the New World. X2a is envisioned as part of the founding population despite its lack of a South American presence. Thus, the study imagines nine founding mtDNA lineages in the Native American founding population, all of which, except X2a, are found on both continents.

Specifically, haplogroup D2 consists of two sister clades, one found only in Siberia (D2b) and the other found in northernmost Eskimos, Chukchi, Aleut, and Athapaskans (D2a). While sub-haplogroup D2a is shared between ethno-historically close related Beringian Aleuts and Eskimos, its sister clade D2b is spread among populations from distantly related linguistic groups (Tungusic, Turkic, Mongolic). A close relationship of matrilineal ancestry between individuals from different linguistic groups may be due to an overlap of geographic range of their ancestors approximately at the time of the Pleistocene-Holocene boundary. Alternatively, some populations may have received the D2b variant through more recent gene flow. It is also worthwhile to note the absence of D2 in all other Native American populations, suggesting that D2 diversified in Beringia after the initial migration into the Americas had occurred. Haplogroup D3 may have also reached America through more recent genetic exchange. It is spread in Nganasans, Mansi, Evenks, Ulchi, Tuvas, Chukchi and Siberian Eskimos and recently reported in Greenland and Canadian Inuit populations, but absent in other Native Americans.

Surprisingly, we also found a Native American sub-type of haplogroup A2 among Evenks and Selkups in southern and western Siberia. Previously, this HVS I motif is reported in one Yakut-speaking Evenk in northwestern Siberia. A novel demographic scenario of relatively recent gene flow from Beringia to deep into western Siberia (Samoyedic-speaking Selkups) is the most likely explanation for the phylogeography of haplogroup A2a, which is nested within an otherwise exclusively Native American A2 phylogeny.

[2] Sijia Wang, et al., "Genetic Variation and Population Structure in Native Americans" (PLOS Genetics 2007). The abstract of the paper notes:

We observe gradients both of decreasing genetic diversity as a function of geographic distance from the Bering Strait and of decreasing genetic similarity to Siberians—signals of the southward dispersal of human populations from the northwestern tip of the Americas. We also observe evidence of: (1) a higher level of diversity and lower level of population structure in western South America compared to eastern South America, (2) a relative lack of differentiation between Mesoamerican and Andean populations, (3) a scenario in which coastal routes were easier for migrating peoples to traverse in comparison with inland routes, and (4) a partial agreement on a local scale between genetic similarity and the linguistic classification of populations.

In autosomal genetics:

To search for signals of similarity to Siberians in the Native American populations, we used a supervised cluster analysis in which Native Americans were distributed over five clusters. Four of these clusters were forced to correspond to Africans, Europeans, East Asians excluding Siberians, and Siberians (Tundra Nentsi and Yakut), and the fifth cluster was not associated with any particular group a priori. Most Native American individuals were seen to have majority membership in this fifth cluster, and considering their estimated membership in the remaining clusters, Native Americans were genetically most similar to Siberians. A noticeable north-to-south gradient of decreasing similarity to Siberians was observed. . . . Genetic similarity to Siberia is greatest for the Chipewyan population from northern Canada and for the more southerly Cree and Ojibwa populations. Detectable Siberian similarity is visible to a greater extent in Mesoamerican and Andean populations than in the populations from eastern South America

With regard to languages:

In the neighbor-joining tree, a reasonably well-supported cluster (86%) includes all non-Andean South American populations, together with the Andean-speaking Inga population from southern Colombia. Within this South American cluster, strong support exists for separate clustering of Chibchan–Paezan (97%) and Equatorial–Tucanoan (96%) speakers (except for the inclusion of the Equatorial–Tucanoan Wayuu population with its Chibchan–Paezan geographic neighbors, and the inclusion of Kaingang, the single Ge–Pano–Carib population, with its Equatorial–Tucanoan geographic neighbors). Within the Chibchan–Paezan and Equatorial–Tucanoan subclusters several subgroups have strong support, including Embera and Waunana (96%), Arhuaco and Kogi (100%), Cabecar and Guaymi (100%), and the two Ticuna groups (100%). When the tree-based clustering is repeated with alternate genetic distance measures . . . higher-level groupings tend to differ slightly or to have reduced bootstrap support. However, local groupings such as Cabecar and Guaymi, Arhuaco and Kogi, Aymara and Quechua, and Ticuna (Arara) and Ticuna (Tarapaca) continue to be supported (100%). This observation of strongly supported genetic relationships for geographically proximate linguistically similar groups coupled with smaller support at the scale of major linguistic groupings is also seen in Native American mitochondrial data.

[3] D. Reich, et al., "Reconstructing Native American Population History" (Nature 2012). The abstract notes that:

Native Americans descend from at least three streams of Asian gene flow. Most descend entirely from a single ancestral population that we call ‘First American’. However, speakers of Eskimo–Aleut languages from the Arctic inherit almost half their ancestry from a second stream of Asian gene flow, and the Na-Dene-speaking Chipewyan from Canada inherit roughly one-tenth of their ancestry from a third stream. We show that the initial peopling followed a southward expansion facilitated by the coast, with sequential population splits and little gene flow after divergence, especially in South America. A major exception is in Chibchan speakers on both sides of the Panama isthmus, who have ancestry from both North and South America.

[4] Fagundes, et al. "Mitochondrial Population Genomics Supports a Single Pre-Clovis Origin with a Coastal Route for the Peopling of the Americas" (American Journal of Human Genetics 2008) (noting similarity between diversity and TMRCA date for X2a and the other four founding clades of mtDNA in the Americas). This paper notes that:

Our results strongly support the hypothesis that haplogroup X, together with the other four main mtDNA haplogroups, was part of the gene pool of a single Native American founding population; therefore they do not support models that propose haplogroup-independent migrations, such as the migration from Europe posed by the Solutrean hypothesis. We infer that haplogroup X experienced a more limited expansion in intensity than the former four haplogroups, and this is compatible with its current very limited distribution. Outside America, haplogroup X has always been found in small frequencies. In Europe, it usually makes up less than 5% of mtDNA diversity. In Siberia, it has been described in only a few populations, none of which currently inhabit eastern Siberia. It is likely that this haplogroup is absent in eastern Siberian populations because of drift effects, which impact rare variants more strongly. Thus, its probability of being lost through random effects would be high. In support for this hypothesis, we note that current Siberia and Native American sequences belonging to the haplogroup X are distantly related, suggesting that the intermediate lineages have been lost. Finally, it is noteworthy that haplogroup X is not the only one of the Native American haplogroups that is more frequent in the New World than in Siberia; haplogroups A and B also show this pattern.

In the Americas, a likely explanation for the observation that haplogroup X has a much more restricted distribution would be that if we assume it was relatively rare in the founding population, then it could have been lost by successive founder effects and genetic drift as the expansion wave moved southward. Actually, it was recently shown that the probability that an allele (e.g., a founding haplotype) survives and expands spatially and in frequency by ‘‘surfing’’ on the wave of a range expansion depends on its presence in the wave of expansion, which in turn depends largely on its proximity to the edge of the wave. Therefore, using this framework, one could conceive that haplogroup X may have ‘‘failed’’ to expand simply as a result of its location in the expansion wave and/or its low initial frequency. A similar explanation may be used to account for the existence of other similarly rare haplogroups in the Americas, such as the ‘‘cayapa’’ subhaplogroup D, as well as the distribution of some rare Y chromosome haplogroups, without the need to postulate independent colonization events.

In addition, the existence of additional, rare founding haplotypes agrees well with the moderate bottleneck estimated here. Such strong and old demographic expansion inferred from our data might also indicate that this was the most important time frame in which major changes in haplogroup composition could occur. Interestingly, two studies with ancient DNA samples scattered over most of the Holocene suggested regional continuity in the frequency of mtDNA haplogroups, indicating that in these populations drift has not played a major role in more recent times.

The fact that the five most common Native American mtDNA haplogroups display similar diversity patterns strongly indicates that they have not been much affected by natural selection. Because human mtDNA does not recombine, directional selection upon a specific substitution would favor the haplotype in which this variant occurs, mimicking a demographic expansion. It is very unlikely that in all haplogroups specific variants that would be favored by natural selection with similar intensity would have occurred by chance and at a similar time. Therefore, our results strongly indicate that the diversity pattern in Native American mtDNA results from a demographic expansion in the founding population in which all founding haplotypes were present.
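The drift and "surfing" argument in the Fagundes et al. excerpt can be illustrated with a toy simulation. This is my own illustration, not the method used in any of the papers cited: it assumes a haploid, neutral marker (a reasonable stand-in for an mtDNA haplogroup), a chain of colonies each founded by a small group drawn from the previous colony, and a few generations of Wright-Fisher drift in each colony. All of the parameter values (`founders`, `deme_size`, and so on) are arbitrary assumptions chosen for illustration.

```python
import random


def serial_founder_walk(initial_freq, n_demes=15, founders=50,
                        growth_gens=3, deme_size=200, seed=None):
    """Follow one neutral haplogroup's frequency along a chain of colonies.

    Each colony is founded by `founders` individuals drawn at random from the
    previous colony, then drifts for `growth_gens` Wright-Fisher generations
    at a constant size of `deme_size`. Returns one frequency per colony.
    """
    rng = random.Random(seed)
    freq = initial_freq
    history = []
    for _ in range(n_demes):
        # Founder event: a small random sample of colonists moves on.
        carriers = sum(rng.random() < freq for _ in range(founders))
        freq = carriers / founders
        # Genetic drift while the new colony persists at a fixed size.
        for _ in range(growth_gens):
            carriers = sum(rng.random() < freq for _ in range(deme_size))
            freq = carriers / deme_size
        history.append(freq)
    return history


def loss_rate(initial_freq, trials=100, **kwargs):
    """Share of replicate runs in which the haplogroup is gone from the last colony."""
    lost = sum(
        serial_founder_walk(initial_freq, seed=i, **kwargs)[-1] == 0.0
        for i in range(trials)
    )
    return lost / trials


if __name__ == "__main__":
    for start in (0.03, 0.1, 0.3):
        print(f"starting frequency {start}: lost in {loss_rate(start):.0%} of runs")
```

Comparing `loss_rate(0.03)` with `loss_rate(0.3)` should show the rare haplogroup disappearing in a much larger share of runs than the common one, which is the qualitative point the excerpt makes about haplogroup X. The exact percentages depend entirely on the assumed parameters.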

FOOTNOTE: The distribution of X2 in Europe is a decent genetic hint that the South Caucasus is a likely source of the Bell Beaker people.

Friday, January 16, 2015

Lognormal Distributions

When I was in college, I proposed a senior honors thesis in my major (mathematics) on a question relevant to a wide variety of social science issues: what is the expected amount of inequality in a data set composed of points, like individual incomes, that have a normal distribution? One would multiply each income amount by the probability of having that income to get a baseline expected level of inequality to which actual data could be compared.

Rather than doing this, social scientists usually use tools like the Gini index, which measures inequality on a scale whose only reference points are the extremes of perfect equality and perfect inequality (everything is concentrated in one person), without any acknowledgement that these are wildly unrealistic assumptions, or that it is possible to look at what would be expected under more realistic assumptions, like a normal distribution of income.

My proposal was denied, in part, I think, because I didn't convey its value and intellectual depth to my adviser. (Honestly, looking back on it, this is probably one of the two or three most disappointing moments of my entire higher education.) But if I had, I would soon have discovered that this was deeply related to the mathematics of lognormal distributions.
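The kind of baseline described above can be computed directly. Below is a minimal Python sketch, not taken from any particular statistics package, of the usual sorted-rank formula for the Gini coefficient. It is paired with the closed-form Gini of a normal distribution, σ/(μ√π), which follows from the fact that the mean absolute difference between two independent draws from a normal distribution is 2σ/√π. The income figures in the demo (mean 50,000, standard deviation 10,000) are hypothetical.

```python
import math
import random


def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula.

    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    Returns 0 for perfect equality and (n - 1) / n when one person has everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    rng = random.Random(0)
    mean, sd = 50_000, 10_000
    incomes = [rng.gauss(mean, sd) for _ in range(100_000)]
    # Closed form for a normal distribution; valid when negative incomes are
    # negligible, i.e. when the mean is several standard deviations above zero.
    expected = sd / (mean * math.sqrt(math.pi))
    print(f"sample Gini: {gini(incomes):.4f}, normal baseline: {expected:.4f}")
```

`gini([5, 5, 5, 5])` returns 0 (perfect equality) and `gini([0, 0, 0, 10])` returns 0.75, the maximum (n − 1)/n for four people. A large normally distributed sample lands close to the closed-form baseline of about 0.113 for these hypothetical figures, which is far from either extreme.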
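The connection shows up in a standard closed-form result: if the logarithm of income is normally distributed with standard deviation σ, the Gini coefficient is 2Φ(σ/√2) − 1 = erf(σ/2), independent of the median. The Python sketch below checks that formula against a Monte Carlo estimate based on the mean absolute difference of random pairs, G = E|X − Y| / (2E[X]). The mean and standard deviation in the demo are hypothetical.

```python
import math
import random


def lognormal_gini(sigma):
    """Closed-form Gini coefficient of a lognormal distribution: erf(sigma / 2)."""
    return math.erf(sigma / 2)


def lognormal_params(mean, sd):
    """Convert an arithmetic mean and standard deviation to lognormal (mu, sigma)."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def monte_carlo_gini(mu, sigma, pairs=200_000, seed=0):
    """Estimate the Gini coefficient as E|X - Y| / (2 E[X]) from random pairs."""
    rng = random.Random(seed)
    total_diff = 0.0
    total = 0.0
    for _ in range(pairs):
        x = rng.lognormvariate(mu, sigma)
        y = rng.lognormvariate(mu, sigma)
        total_diff += abs(x - y)
        total += x + y
    mean_abs_diff = total_diff / pairs
    mean = total / (2 * pairs)
    return mean_abs_diff / (2 * mean)


if __name__ == "__main__":
    mu, sigma = lognormal_params(50_000, 10_000)
    print(f"sigma       = {sigma:.4f}")
    print(f"closed form = {lognormal_gini(sigma):.4f}")
    print(f"Monte Carlo = {monte_carlo_gini(mu, sigma):.4f}")
```

For small σ, erf(σ/2) ≈ σ/√π, and a lognormal with small σ has a coefficient of variation close to σ, so its Gini is nearly the same as that of a normal distribution with the same mean and standard deviation. The two baselines only diverge as inequality grows and a normal distribution starts predicting negative incomes.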

Thursday, January 15, 2015

Quick Dark Matter and Dark Energy Hits

* The Xenon 100 direct dark matter detection experiment has put strict new constraints on potential dark matter masses and cross-sections of interactions, including limitations on axion dark matter. It has not observed any dark matter signal.

* The DarkSide50 direct dark matter detection experiment has been calibrated and is ready to start making high precision observations.

* A new way of quantifying how weak lensing would differ from the predictions of general relativity under modified laws of gravity has been devised.

* Modified gravity theories more naturally explain the fact that many galaxies are bulgeless or have only pseudo-bulges, while dark matter models generically expect almost all galaxies to have true bulges.

* Astronomy data is consistent with a cosmological constant in the formulas of general relativity, but is arguably not consistent with any form of dark energy fluid, because any such fluid would have to violate some very basic stability conditions that the laws of thermodynamics impose on all fluids. As the conclusion to this paper explains:

We have shown that the thermal and mechanical stability conditions forbid the existence of negative pressure fluids with a constant EoS parameter which excludes the vacuum energy as a candidate to explain the cosmic acceleration. We also show that the observational data are in conflict with the thermodynamic constraints that a general dark energy fluid with a time-dependent EoS parameter must satisfy. This result suggests that adding dark energy to the content of the Universe may not be the answer to the cosmic acceleration problem.

We must note that, although our analysis excludes the vacuum energy, this does not represent the end of the cosmological constant. A bare geometrical Λ-term remains in the game if interpreted as a constant of nature whose value must be determined by observations.

However, what happens to the vacuum energy? Is it null? If so, why? We know that the vacuum energy has a significant role in the quantum world but, should it play any significant role in the Universe at large scales? Can we add the quantum vacuum energy so naively to classical general relativity field equations? We know that the fluid description works for relativistic and non relativistic matter but, can we describe the vacuum energy simply as a fluid? Perhaps only a quantum theory of gravity can provide an answer to these issues.

Beyond the issues raised above a question still remains: is the geometrical cosmological constant the explanation to the accelerated expansion? The Λ-term certainly is the simplest solution but nobody can guarantee that it is the true answer. Thus, finding out deviations of the cosmological term will remain as one of the hottest theoretical investigation lines concerning cosmic acceleration. If the dark energy is out of the game, approaches such as the kinematic method developed in can be a useful tool to search for such deviations.

Obviously, we can sacrifice the thermodynamical stability conditions to keep the dark energy hypothesis alive. For example, if we relax the mechanical stability condition, but keep the thermal stability we have, from (28), that w ≥ −1 saving vacuum energy and quintessence. If additionally we give up the thermal stability, phantom fields (w < −1) are also allowed. However, we do not think that this is a good way to address the problem.

* An effort is made to fit gamma ray emission data from the Milky Way and its nearby dwarf galaxies to a self-interacting dark matter model.
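Returning to the equation-of-state parameter w in the paper quoted above: the reason negative pressure matters at all can be seen from the standard acceleration equation of Friedmann cosmology. This sketch treats the dark energy as a single perfect fluid with pressure p = wρc², which is the simplifying assumption the quoted paper is testing, and ignores the other contents of the Universe.

```latex
\frac{\ddot{a}}{a} = -\frac{4\pi G}{3}\left(\rho + \frac{3p}{c^{2}}\right),
\qquad p = w\,\rho c^{2}
\quad\Longrightarrow\quad
\frac{\ddot{a}}{a} = -\frac{4\pi G}{3}\,\rho\,(1 + 3w)
```

So a fluid with positive energy density drives accelerated expansion (ä > 0) only if w < −1/3. Within that range, w = −1 corresponds to vacuum energy (equivalent to a cosmological constant), −1 < w < −1/3 to quintessence, and w < −1 to phantom fields, which are the categories the quoted conclusion uses when it discusses relaxing the stability conditions.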

UPDATE January 19, 2015: Resonaances has a nice chart on spin-dependent dark matter exclusions in direct searches by the IceCube experiment, and the ATLAS experiment at the LHC has new exclusions on dark matter production there, which are more strict for bosonic matter and less strict for fermionic matter. The ATLAS exclusion is also a de facto exclusion of heavy fertile neutrinos up to 150 GeV.

Physics Word Of The Day: Pretzelosity

I honestly had no idea what this word meant in the context of particle physics until today, but it sounds yummy:

pretzelosity

The Wiktionary definition, "the condition of having the knotted form of a pretzel," accurately identifies this as a term from mathematics and physics and is not completely misleading, but it is also not very helpful for understanding what the word means as it is used in practice in physics. In physics, at least, the term is used exclusively in a much narrower and more specific sense.

Basically, pretzelosity is one of a number of properties that describe the structure of a composite particle made up of more than one quark at the moment that something moving at close to the speed of light slams into it. These properties, collectively, are a hadron's parton distribution function.

Background: What Are Parton Distribution Functions?

A "parton distribution function" (PDF), in particle physics, is a formula or chart usually fitted with large empirical data sets, that sets forth the probability that a particle moving at close to the speed of light will hit a particular particle in a hadron when it flies into the hadron. This is more complicated than it seems because "virtual particles" within the hadron have a material probability of being hit, even though we don't normally think of those virtual particles as being present in a particular hadron at all.

(I spent a good part of my downtime, while waiting to judge at a recent speech and debate tournament hosted by my daughter's high school, perusing the open access, 194-page "Handbook of perturbative QCD," which is basically the standard introduction to the world of parton distribution functions for people first learning about this obscure, but practically important, corner of the particle physics world, like incoming graduate assistants working at particle colliders for the first time.)

In an ideal world, one could use the Standard Model's QCD Lagrangian (i.e. the equation that governs the strong force, one of the four fundamental forces of nature) and the fundamental constants of the Standard Model to calculate the parton distribution functions of a hadron exactly.

But, in reality, the math is too hard to get more than a qualitative idea of what parton distribution functions should look like from first principles, even for near-genius physicists who got perfect math SAT scores and then spent many years studying advanced math, and even with the help of computers (although not for want of increasingly fruitful attempts to do so, such as this recent study).

In practice, these PDFs are based instead upon voluminous data gathered from actual high-speed particle collisions (similar in size to the Student's t-test reference chart at the back of your college statistics textbook), which are then fitted to smooth mathematical formulas that mere particle physics graduate students can understand. These formulas, in turn, are then used to predict how similar future collisions will behave, with the exquisite precision that the Standard Model is famous for providing.

A PDF is a bit of a cheat when it comes to the question of the excellent accuracy of the Standard Model. But PDF formulas still leverage a comparatively modest amount of data into a vastly larger number of potential predictions, even if they don't manage it with the comparative elegance of the Standard Model itself: one long but compact formula and a couple dozen fundamental physical constants, from which the PDFs should follow if we could actually do all of the QCD calculations that it calls for in a reasonable amount of time.
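For reference, the QCD Lagrangian is usually written in roughly the following form (sign and normalization conventions differ between textbooks, and the gauge-fixing, ghost, and θ terms are omitted here):

```latex
\mathcal{L}_{\mathrm{QCD}} = \sum_{q} \bar{\psi}_{q}\left(i\gamma^{\mu}D_{\mu} - m_{q}\right)\psi_{q}
  - \frac{1}{4}\,G^{a}_{\mu\nu}G^{a\,\mu\nu},
\qquad
D_{\mu} = \partial_{\mu} - i g_{s} T^{a} A^{a}_{\mu},
\qquad
G^{a}_{\mu\nu} = \partial_{\mu}A^{a}_{\nu} - \partial_{\nu}A^{a}_{\mu} + g_{s} f^{abc} A^{b}_{\mu} A^{c}_{\nu}
```

Here the sum runs over the six quark flavors with masses $m_q$, the $A^{a}_{\mu}$ are the eight gluon fields, $T^{a}$ and $f^{abc}$ are the generators and structure constants of SU(3), and $g_s$ is the strong coupling. Part of what makes the calculations so hard is that $g_s$ is large at the energy scales relevant to hadron structure, so expansions in powers of it break down, and the $f^{abc}$ term makes gluons interact with each other.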
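As a rough illustration of what "fitted to smooth mathematical formulas" means, many PDF fits start from a functional form like $x^{a}(1-x)^{b}$ for each parton type at some reference energy scale, where $x$ is the fraction of the hadron's momentum the parton carries. The Python sketch below is a toy, not any real fitting group's parametrization: the exponents are made-up values, and it only enforces the valence-quark counting rule that a proton has, on net, two up quarks and one down quark. Real fits use more parameters, include gluons and sea quarks, and evolve the result to other energy scales.

```python
import math


def beta_fn(a, b):
    """Euler beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def valence_pdf(n_quarks, a, b):
    """Toy valence distribution q_v(x) = A * x**a * (1 - x)**b.

    A is chosen so that the integral over 0 < x < 1 equals n_quarks
    (2 for up valence quarks, 1 for down, in a proton). Requires a > -1, b > -1.
    """
    norm = n_quarks / beta_fn(a + 1, b + 1)
    return lambda x: norm * x ** a * (1 - x) ** b


def momentum_fraction(n_quarks, a, b):
    """Integral of x * q_v(x): the share of the proton's momentum these quarks carry."""
    return n_quarks * beta_fn(a + 2, b + 1) / beta_fn(a + 1, b + 1)


if __name__ == "__main__":
    # Hypothetical exponents, chosen only for illustration.
    up = momentum_fraction(2, -0.5, 3.0)
    down = momentum_fraction(1, -0.5, 4.0)
    print(f"up valence: {up:.3f}, down valence: {down:.3f}, total: {up + down:.3f}")
```

With these made-up exponents, the valence quarks carry only about 0.31 of the proton's momentum (2/9 + 1/11); in real fits the remainder is carried by gluons and sea quarks.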

Back To Pretzelosity

To a good approximation, under the kinds of conditions where this property of composite particles made up of quarks is studied, the following relationship holds true:

helicity − transversity = pretzelosity
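In the notation of the research literature, helicity, transversity and pretzelosity are the distributions $g_1$, $h_1$ and $h_{1T}^{\perp}$, and the relation above is usually written in terms of the first transverse-momentum moment of pretzelosity. As I understand it, this is a relation that holds in a number of quark models (for example, the bag model and light-front constituent quark models), not an exact consequence of QCD:

```latex
g_{1}^{q}(x) - h_{1}^{q}(x) = h_{1T}^{\perp(1)\,q}(x),
\qquad
h_{1T}^{\perp(1)\,q}(x) = \int d^{2}k_{T}\,\frac{k_{T}^{2}}{2M^{2}}\,h_{1T}^{\perp\,q}(x, k_{T}^{2})
```

Here $q$ labels the quark flavor, $x$ is the fraction of the hadron's momentum carried by the quark, $k_T$ is the quark's transverse momentum, and $M$ is the nucleon mass.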

The word pretzelosity is used in the following sentence from a recent physics preprint:

In addition, we examine the model relation between the orbital angular momentum and pretzelosity, and find it is violated in the axial-vector case.

There are fifteen other papers in the arXiv abstract database that use the term, most of which are published in respectable academic journals or physics conference proceedings. The earliest of them, published in 2008 by Avakian, et al., defines the term more exactly:

The leading twist transverse momentum dependent parton distribution function $h_{1T}^{\perp}$, which is sometimes called “pretzelosity,” is studied.

A Couple More Background Concepts

Helicity is a concept closely related to spin: it is the projection of a particle's spin onto its direction of motion.

Transverse momentum is momentum at a right angle to the direction in which the incoming particle hitting the hadron is moving (which is the axial or longitudinal direction).
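To make the longitudinal/transverse split concrete, here is a minimal Python sketch that decomposes a momentum vector relative to the incoming particle's direction of motion. The function name and the vector components in the example are hypothetical, with arbitrary units.

```python
import math


def split_momentum(p, direction):
    """Split a 3-momentum p into components along and across `direction`.

    Returns (p_long, p_t): the signed longitudinal component along the
    probe's direction of motion, and the magnitude of the transverse momentum.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be a non-zero vector")
    unit = [d / norm for d in direction]
    # Longitudinal component: projection of p onto the unit direction vector.
    p_long = sum(pi * ui for pi, ui in zip(p, unit))
    # Transverse part: whatever is left after removing the longitudinal projection.
    transverse = [pi - p_long * ui for pi, ui in zip(p, unit)]
    p_t = math.sqrt(sum(c * c for c in transverse))
    return p_long, p_t


if __name__ == "__main__":
    # Probe moving along the z axis; hypothetical parton momentum components.
    print(split_momentum((0.3, 0.4, 2.0), (0.0, 0.0, 1.0)))
```

For a probe moving along the z axis and a parton momentum of (0.3, 0.4, 2.0), the longitudinal component is 2.0 and the transverse momentum is 0.5 (up to floating-point rounding).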

Visualizing Pretzelosity

The pretzelosity of a particle is basically the extent to which the probability that a relativistic particle hitting a composite particle made of quarks will hit some particular subparticle of it is influenced by the interaction between the target particle's spin and its momentum in the direction at right angles to the motion of the particle hitting it.

Imagine the hadron as a spinning baseball being thrown from the pitcher's mound to home base as a fast-moving electron zooms towards it from the direction of first base. The helicity of the baseball is related to the way the baseball spins as the electron moves towards it. The transverse momentum of the baseball is the direction and speed of the baseball as it moves towards home base multiplied by the baseball's mass. The extent to which the fast-moving electron's interactions with the baseball/hadron are influenced by the interplay of these two components of the baseball/hadron's motion is its pretzelosity.

To understand why someone might use the term pretzelosity for this property, imagine the line through space that a single point on the surface of the baseball/hadron would trace as it moved towards home base. For those of you who are spatial-visualization-challenged, this line would be basically pretzel shaped. This would be even more true in the case of a particle accelerator like the LHC, where, rather than moving in straight lines, the colliding particles are steered by high-powered magnets to move at extremely high speeds around a great big circular tube enclosing an area the size of a small city, many times each second, until they hit each other.

What We Know About Pretzelosity

Pretzelosity, in practice, turns out, for reasons of fundamental physics, to have a much smaller impact on what will be hit in the target hadron than lots of other factors that influence that outcome, to an extent that is well defined mathematically. For example, in the case of one QCD observable measured at the HERMES experiment, the effect was possibly as large as 1%-2% of the total observed result, although the measured results were consistent with a 58% chance that pretzelosity actually had zero impact. Up quark components and down quark components of a hadron have different (and approximately opposite) pretzelosity.

The pretzelosity of a composite particle made of quarks is closely related to the nature of the motion of the subcomponents of a particle around the composite particle's center (their "orbital angular momentum"), although no one has described in a usable formula the exact relationship between pretzelosity and orbital angular momentum that is observed in nature.

Pretzelosity is one part of understanding the "proton spin puzzle." The proton spin puzzle, in turn, is the mystery of where the proton's spin comes from. To oversimplify, a proton is made up of three quarks held together by gluons, and the naive expectation is that the proton's spin is just the sum of the spins of its quarks. Yet experiments seem to show that much less of the proton's spin than expected (in the earliest measurements, almost none) actually comes from the spins of its quarks.
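One common way of writing down the bookkeeping behind the puzzle is the Jaffe-Manohar decomposition of the proton's spin (other decompositions exist, and they divide up the gluon and orbital pieces differently):

```latex
\frac{1}{2} = \frac{1}{2}\,\Delta\Sigma + \Delta G + L_{q} + L_{g},
\qquad
\Delta\Sigma = \sum_{q}\int_{0}^{1} dx\,\left[\Delta q(x) + \Delta\bar{q}(x)\right]
```

Here $\Delta\Sigma$ is the contribution of quark spins, $\Delta G$ that of gluon spin, and $L_q$ and $L_g$ the orbital angular momentum of quarks and gluons. The model relation between orbital angular momentum and pretzelosity mentioned in the preprint quoted earlier is, as I understand it, written in some quark models as $L_q = -\int dx\, h_{1T}^{\perp(1)q}(x)$, using the first transverse-momentum moment of pretzelosity; it is not an exact identity of QCD, which is consistent with that preprint finding it violated in one case.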

The proton spin puzzle is one of the important unsolved problems in physics, so pretzelosity is actually part of something that is kind of a big deal to physicists trying to understand the mysteries of the universe, even if it is a bit of an obscure and hard to understand and explain concept.
