Dispatches From Turtle Island: October 2015

Friday, October 23, 2015

Deur et al make breakthrough link between non-perturbative and perturbative QCD

Alexandre Deur is a physicist at Jefferson Lab whom I have previously praised for his work on a graviton self-interaction based alternative to dark matter, which is analogous to the effects of gluon self-interaction in QCD (the part of the Standard Model of Particle Physics describing strong force interactions between quarks). QCD is his primary research area.

But, he's no slouch at his day job either, as a recent paper shows. It puts him on the path to being a major innovator in the field, rather than one cog among the hundreds or thousands of physicists collaborating in conducting Big Science experiments (where he has most of his publications).

Deur and several colleagues posted a groundbreaking preprint this week that analytically (i.e. with equations) links the fundamental scale constant of perturbative QCD (applied mostly to high energy collisions, a.k.a. ultraviolet QCD) to the fundamental scale constant of non-perturbative QCD, which is used mostly to study the properties of quarks confined in hadrons that are not interacting at such high energies (a.k.a. infrared QCD). This is a big deal because, in practice, ultraviolet QCD research is much more expensive than infrared QCD research. The link allows comparatively cheap low energy research (experiments costing millions or tens of millions of dollars in many cases for a related set of experiments, including lots of data that has already been collected) to boost expensive high energy research (costing billions or tens of billions of dollars for each sustained experimental program, each of which is forging into unknown territory about which we have no prior data).

Both the strength of high energy strong force interactions, described by perturbative QCD, and the lion's share of the masses of hadrons comprised of lighter quarks (which in turn provide the lion's share of the mass of the ordinary matter, as opposed to dark matter and dark energy, in the universe) ultimately flow from the strong force coupling constant and the strength of the strong force color charge of quarks (which is identical in magnitude for all quarks).

But, in practice, physicists use the physical constant "lambda_s" to do many of the strong force calculations in perturbative QCD, and use the physical constant "kappa" to do many of the strong force calculations involved in establishing hadron masses from first principles in non-perturbative QCD. Both of these approaches are phenomenological approximations, in their respective domains of applicability, of the exact equations of QCD, which are believed to be known but are too mathematically intractable to calculate with directly.

This breakthrough means that experiments from perturbative QCD can now be used to provide the key QCD physical constants used to calculate hadron masses, while hadron mass measurement, in turn, can be used to determine the key QCD physical constant for making calculations in perturbative QCD. Previously, the two constants had to be measured separately in practice, even though everyone knew that there must be some analytical relationship between them.

The results are largely consistent with experimental data from both regimes (although there is one calculation that has a two sigma tension between the theoretical prediction determined in this manner and the experimental data), and the uncertainties are predominantly due to issues involved in calculating a numerical approximation of equations with infinite numbers of terms.

The actual accuracy with which the physical constants involved are known, roughly 3%-6%, isn't terribly impressive. But, in the long run, there is a clear path to reducing theoretical uncertainty by simply putting more resources from more powerful supercomputers on the problem so that the theoretical calculations can have less uncertainty by including far, far more terms in the calculation than anyone has been able to do to date with limited resources. And, since the light hadron masses are known to far more accuracy than the extremely high energy measurements of perturbative QCD, this connection could ultimately use constants determined in the precisely measured low energy QCD regime to dramatically improve the accuracy of calculations in the high energy perturbative QCD regime that applies, for example, in the high energy particle accelerator collisions conducted at the Large Hadron Collider (LHC) by the ATLAS and CMS experiments.

This paper is also a critical intermediate step in linking both the perturbative QCD and non-perturbative QCD calculations done in the real world to the exact equations of QCD and to the fundamental physical constant of the Standard Model from which they are both in principle derived, the strong force coupling constant. It thereby helps to make it possible to calculate real world quantities from first principles using the actual exact equations of QCD. The equations and constants of perturbative QCD and non-perturbative QCD are both informed by knowledge of what the exact equations of QCD look like, but ultimately they are phenomenological approximations of the exact equations, rather than being rigorously and exactly derived mathematically from them.

Popular accounts of QCD and the Standard Model often read as if this is a solved problem. But while we believe that we know the exact equations of QCD, the quark masses and the strong force coupling constant with sufficient precision to make these calculations in principle, in fact, no one has yet managed to do it without major approximations, in practice.

The most precisely measured hadron masses (the proton, neutron and pion) are known to six significant digits, and even the least precisely determined ones (heavy hadrons with bottom quarks) are known to six significant digits. But, the strong force coupling constant is known only to about 0.5% precision. But, theoretically, it should be possible using only the light quark masses known to their current accuracy, the most precise several hadron masses, and the known exact equations of QCD, to calculate the strong force coupling constant to roughly 200 times as much accuracy as it is known today without conducting another experiment ever, if one has sufficient computational capacity. This paper is a major intermediate step in that direction.
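To make the arithmetic in the previous paragraph concrete, here is a minimal sketch in Python. It uses only the round figures quoted above (about 0.5% and roughly 200 times); the variable names are my own and nothing here comes from the paper itself.

```python
# Illustrative arithmetic only; both inputs are the round figures quoted in the text.
current_rel_uncertainty = 0.005   # about 0.5% relative uncertainty today
improvement_factor = 200          # "roughly 200 times as much accuracy"

# Relative uncertainty that would result from the hoped-for improvement.
improved_rel_uncertainty = current_rel_uncertainty / improvement_factor

# The same number expressed in parts per million, for easier comparison.
improved_ppm = improved_rel_uncertainty * 1e6

print(f"improved relative uncertainty: {improved_rel_uncertainty:.6%}")
print(f"in parts per million: {improved_ppm:.0f}")
```

In other words, the hoped-for improvement would take the strong force coupling constant from a relative uncertainty of about one part in 200 to about one part in 40,000.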

Moreover, a significant amount of the uncertainty in the experimentally determined quark masses in the Standard Model is due to the uncertainty in the strong force coupling constant, together with the accuracy lost in numerical approximations of the true equations of QCD. So, improvement in measurement of the strong force coupling constant facilitated by this research has the potential to greatly improve the accuracy with which six other Standard Model fundamental constants are known using existing experimental data. And, knowing both the strong force coupling constant and the quark masses with more precision, in turn, also makes it possible to greatly improve the statistical power of experiments done to determine the four CKM mixing matrix parameters. This is because uncertainties regarding the Standard Model background predictions from QCD greatly reduce the statistical power of experiments measuring other Standard Model constants.

Finally, greater precision in all of the physical constants going into the QCD calculations that are used to determine Standard Model backgrounds in high energy particle accelerator experiments, in turn, greatly improves the statistical power of experiments setting out to identify beyond the Standard Model physics.

For example, the primary decay path of the Standard Model Higgs boson is to quark-antiquark pairs of bottom quarks. But, lots of other Standard Model processes also produce quark-antiquark pairs of bottom quarks. The measurement of the Higgs boson signal in the bottom quark decay channel is determined by using perturbative QCD to make estimates of Standard Model bottom quark decay backgrounds from other processes, which have quite significant error bars of their own, and then looking at the total number of bottom quark decays observed to estimate the number of Higgs boson sourced bottom quark decays observed. But, since quantum mechanics is stochastic, the number of Higgs boson bottom quark decays expected even with perfect backgrounds is a gaussian distribution around a most likely number of bottom quark decays for any given Higgs boson mass, the expected number of Higgs bosons produced is subject to further statistical variation, and the backgrounds have error bars (only some of which are irreducible statistical variation) that are large compared to the expected signal. So, it is hard to see the Higgs boson in its main decay channel even when there are lots of Higgs boson bottom quark decays out there to be seen, even at fairly low Tevatron energies. But, if you can dramatically reduce the non-statistical errors in the Standard Model background prediction, it would be much easier to distinguish the signal of bottom quark decays from Higgs bosons from other Standard Model backgrounds, even with Tevatron data, which is far inferior to LHC data in energy scale and total number of events observed.
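The effect of background error bars on the visibility of a signal can be sketched with a toy counting experiment. The Python below uses a common rough approximation, significance = S / sqrt(B + (f*B)^2), where S is the expected signal count, B the expected background count, and f the fractional non-statistical uncertainty on the background. The event counts are made up for illustration and are not taken from any Tevatron or LHC analysis, and real analyses use far more sophisticated statistics.

```python
import math

def approx_significance(signal, background, bkg_frac_syst):
    """Rough significance of a counting experiment: S / sqrt(B + (f*B)^2).

    The first term under the square root is the irreducible statistical
    (Poisson) variance of the background; the second is the variance from
    a fractional systematic uncertainty f on the background prediction.
    """
    stat_var = background
    syst_var = (bkg_frac_syst * background) ** 2
    return signal / math.sqrt(stat_var + syst_var)

# Hypothetical numbers: 100 signal events on top of 10,000 background events.
S, B = 100.0, 10_000.0

z_10pct = approx_significance(S, B, 0.10)     # 10% background uncertainty
z_1pct = approx_significance(S, B, 0.01)      # 1% background uncertainty
z_stat_only = approx_significance(S, B, 0.0)  # perfectly known background

print(f"10% background uncertainty: {z_10pct:.2f} sigma")
print(f" 1% background uncertainty: {z_1pct:.2f} sigma")
print(f"statistical errors only:    {z_stat_only:.2f} sigma")
```

In this toy setup, shrinking the background uncertainty from 10% to 1% raises the significance about sevenfold with exactly the same data, while the statistics-only case marks the ceiling that no improvement in the theory can beat.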

Going forward, reducing error bar noise in Standard Model backgrounds in the current LHC experiments would significantly improve the ability of ATLAS and CMS to confirm that the Higgs boson seen at the LHC at 125 GeV or so has all of the decays expected, at the frequencies expected, for a Standard Model Higgs boson of that mass, or in the alternative, to see statistically significant differences from the Standard Model Higgs boson expectation even if they are quite subtle differences.

Similarly, if new physics manifests at some characteristic energy scale "lambda_BSM", the highest energy scale at which the new physics can be detected experimentally could be raised by an order of magnitude or two, if we were able to leverage our existing precision knowledge of light hadron masses into more precise values of the Standard Model fundamental constants of QCD, using improved mathematical approximations of the exact equations of QCD which papers like this one are bringing closer to reality.

If BSM physics exists at some level between the electroweak scale of O(100 GeV) and the GUT scale of O(10^16 GeV), we might be able to find it at the scale of O(1000-10,000 GeV) using the LHC with currently available technology and current perturbative QCD calculation accuracies. But, the kind of improvements that may be possible in QCD with much more accurately known QCD physical constants could stretch our experimental research to revealing or ruling out new physics up to scales of O(100,000-10,000,000 GeV) (i.e. 100-10,000 TeV).

These estimates may be a bit optimistic (because some Standard Model backgrounds have inherent statistical variation that is large relative to the expected signal even if the background is calculated perfectly, requiring experimenters to look at signals with little or no Standard Model background instead). But, testing new physics up to the several hundred TeV scale without any major technological breakthroughs beyond what is present at the LHC today is not unthinkable if we can make better progress on the math of QCD. That may be possible to achieve to a significant extent with nothing more than a big investment in the supercomputers available to QCD physicists (without any advances in supercomputing technology itself from current levels).

This paper is an important step in making these advances a function of our willingness to spend the money to allow our scientists to make absolutely inevitable and certain progress, as opposed to a gamble on whether new conceptual breakthroughs can be devised by physics geniuses, if those breakthroughs are even out there waiting to be discovered, which at some point they might not be.

Right now, most new physics scenarios have strong minimum energy scales, but their maximum energy scales are far in excess of the minimum energy scales at which they can be ruled out. But, if the parameter space in which new physics can be sought expands enough, it may be possible to confirm many BSM theories at particular values, or to rule them out across their entire parameter space, if we can simply improve the statistical power of experiments using present day LHC technology by using more precise knowledge of Standard Model fundamental constants to more precisely predict the expected Standard Model backgrounds.

For example, the non-detection of proton decay and neutrinoless double beta decay places an energy scale ceiling on many kinds of supersymmetry (SUSY) theories. But, this ceiling is much higher than the minimum energy scales at which new physics from SUSY theories can be excluded using the LHC and other experimental data that is available. Increased experimental power from a more precise knowledge of the fundamental constants of QCD, however, might make it possible to close that gap for many kinds of SUSY theories. And, since string theory almost universally assumes that its low energy approximation resembles fairly generic versions of SUSY, this could even make it possible to experimentally rule out immense swaths of the string theory landscape.

Lest I overhype too much, I do need to provide some perspective. Physicists have known that what Deur and his colleagues did was possible in principle for half a century. We knew already that this was a problem with a correct solution that was out there waiting to be found. But, the fact that it took half a century to get from knowing that the answer to this intermediate result was out there to actually discovering it is also a testament to how non-trivial an effort this very lucid paper really is, even if it seems deceptively simple. The authors of this paper have not only reached an important intermediate result, but have also artfully made it look much more elementary and obvious than it actually was (much of the really hard stuff is hidden in results from QCD methods such as the light front method, which are described only by bottom line result and citation in this paper).

Monday, October 19, 2015

Did Dogs Originate In Mongolia Or Tibet?

Dog genetics are most diverse in Mongolia and Tibet and show a roughly clinal trend towards less diversity with distance away from that area, suggesting that the domesticated dog may have originated there. But, there are conflicting indications from different kinds of data.

Dogs were the first domesticated species, originating at least 15,000 y ago from Eurasian gray wolves. Dogs today consist primarily of two specialized groups—a diverse set of nearly 400 pure breeds and a far more populous group of free-ranging animals adapted to a human commensal lifestyle (village dogs). Village dogs are more genetically diverse and geographically widespread than purebred dogs making them vital for unraveling dog population history. Using a semicustom 185,805-marker genotyping array, we conducted a large-scale survey of autosomal, mitochondrial, and Y chromosome diversity in 4,676 purebred dogs from 161 breeds and 549 village dogs from 38 countries. Geographic structure shows both isolation and gene flow have shaped genetic diversity in village dog populations. Some populations (notably those in the Neotropics and the South Pacific) are almost completely derived from European stock, whereas others are clearly admixed between indigenous and European dogs. Importantly, many populations—including those of Vietnam, India, and Egypt—show minimal evidence of European admixture. These populations exhibit a clear gradient of short-range linkage disequilibrium consistent with a Central Asian domestication origin.

L. Shannon et al., Genetic structure in village dogs reveals a Central Asian domestication origin, PNAS (Published online October 19, 2015).

Siberian Genetics

Siberia and Western Russia are home to over 40 culturally and linguistically diverse indigenous ethnic groups. Yet, genetic variation of peoples from this region is largely uncharacterized. We present whole-genome sequencing data from 28 individuals belonging to 14 distinct indigenous populations from that region. We combine these datasets with additional 32 modern-day and 15 ancient human genomes to build and compare autosomal, Y-DNA and mtDNA trees. Our results provide new links between modern and ancient inhabitants of Eurasia. Siberians share 38% of ancestry with descendants of the 45,000-year-old Ust-Ishim people, who were previously believed to have no modern-day descendants. Western Siberians trace 57% of their ancestry to the Ancient North Eurasians, represented by the 24,000-year-old Siberian Malta boy. In addition, Siberians admixtures are present in lineages represented by Eastern European hunter-gatherers from Samara, Karelia, Hungary and Sweden (from 8,000-6,600 years ago), as well as Yamnaya culture people (5,300-4,700 years ago) and modern-day northeastern Europeans. These results provide new evidence of ancient gene flow from Siberia into Europe.

Valouev et al., "Reconstructing Genetic History of Siberian and Northeastern European Populations" (2015).

Eurogenes has noted that while there is a close correspondence between Malta-like ancestry and Eastern European Hunter-Gatherer genetics, there is not a good correspondence between this autosomal component and the Y-DNA N1c1 commonly found in Uralic and Scandinavian populations. This marker may instead distinguish circumpolar from non-circumpolar populations.

Time depth is a tricky issue in Siberian populations. There appear to have been multiple population sweeps from East to West and then back from West to East again across the region, even in historical times, and this area was completely depopulated during the LGM, with less clarity than in some other places about where the refugia from which it was repopulated were located.

There are Paleo-Siberian layers (pre-modern Siberian), modern indigenous Siberian layers (pre-Uralic), Uralic layers (ca. 35th to 25th centuries BCE), Proto-Indo-European layers (Tocharians ca. 20th century BCE to ca. 6th century CE), Turkic migrations (1st to 6th centuries CE), Islamic expansions (West to East starting in the 7th century CE), Mongolian migrations (East to West ca. 13th-14th centuries CE), and Russian migrations (West to East starting in the 17th century or so CE). It is possible that instances of Y-DNA C in Europe in ancient DNA may represent traces of old East to West migration, but they are outliers and it is hard to say why they turn up. This list is illustrative only and surely contains mistakes and overlooks nuances.

Honestly, the extent to which Siberian ancestry is Paleolithic is remarkably high and probably reflects the fact that the region is largely ill suited to intense farming. It isn't clear to what extent "Ust-Ishim ancestry" and "Malta ancestry" overlap from my initial glance at the paper. But, it does appear that modern Siberia is more eastern influenced than western influenced. It also isn't clear to what extent the chosen populations minimize some of the migrations known to have occurred historically.

Tuesday, October 13, 2015

Scores Of New Ancient European Genomes

The Mathieson, et al. (2015) paper on "Eight thousand years of natural selection" has few surprises on the subject of its title that weren't either widely known or strongly suspected before the study was published, but its barrage of new raw ancient DNA data, and ancient Y-DNA data in particular, is impressive. The study has very little ancient DNA from the Atlantic region, however.

Saturday, October 10, 2015

Did The Yamnaya Die Or Run?

Razib Khan has tweeted the biggest story of the ASHG 2015 Conference (key points in bold):

@iosif_lazaridis revision of paper @mathiesoniain with more ancestry stuff on biorxiv soon
@iosif_lazaridis [Neolithic Anatolian] mtDNA look familiar to EEF. Y mostly G2a2. also J2 H and I at low frequency. C1 too
@iosif_lazaridis anatolian neolithic close to EEF on pca. but EEF shifted toward WHG #ASHG15
@iosif_lazaridis anatolian neolithic different from modern anatolian and se europe populations.
@iosif_lazaridis eurasian steppe, population transect done. 5,500 to 1,200 BC. author told me some R1a1a possibile stuff here yesterday
@iosif_lazaridis indo-european steppe = EHG + near eastern. new data eneolithic samara. 75% EHG ancestry. 25% "armenian" 5,200 to 4,000 BCE
@iosif_lazaridis poltavka people 3000 to 2200 BC basically like yamnaya. 50% EHG and 50% armenian-like. then srubnaya different.
@iosif_lazaridis srubnaya 2/3 yamnaya 1/3 middle neolithic european
@iosif_lazaridis yamnaya/poltavka went from R1b to R1a in the srubnaya period. z93 group found on bronze age steppe samara (s asian R1a)
@iosif_lazaridis there was back migration of EEF to the steppe after the initial yamnaya migration.

Via Eurogenes.

This is a huge new set of facts with potentially profound implications for how we understand European history.

In the span of a few centuries, starting around 2200 BC, a Y-DNA R1b dominated population, the Yamnaya and their descendants the Poltavka, were replaced in the southern part of the European steppe (or at least the men were) by a Y-DNA R1a dominated population with strongly overlapping autosomal genetic profiles, the Srubnaya.

One possibility is that the Yamnaya men were slaughtered by the Srubnaya men who may have assimilated some of the Yamnaya women, in a scenario mirroring that of the battles described in the Biblical Book of Numbers.

But, something remarkable happens in Western Europe right around the time that Y-DNA R1b men disappear from the southern part of the European steppe. All of a sudden, Y-DNA R1b, which was virtually absent from Western Europe, rapidly becomes the predominant Y-DNA haplogroup of Western Europe, and there are substantial shifts in the mtDNA mix of Western Europe.

The distribution of Y-DNA R1b sub-haplogroups in Europe, and their phylogenetic relationship, suggests a route from East to West of Y-DNA R1b carriers from the European steppe to central France, and from there spoke-like migrations in all directions.

The insight in today's talk provides a push factor: the Srubnaya (whom Davidski at Eurogenes describes as more militaristic and technologically advanced than the Yamnaya). Meanwhile, a collapse of Western European first wave Neolithic farming societies as a consequence of the 4.2 kiloyear event may have left those societies in turmoil (including population collapse), leaving a political vacuum and slack in food production capacity once the event's harsh climate abated. The Yamnaya people could have moved into that vacuum in Western Europe, participating in a folk migration much like the "migration period" travels of Germanic tribes like the Goths, Visigoths, and Vandals in the Dark Ages.

The Yamnaya were basically steppe pastoralists, which is to say, herders. Faced with a potentially deadly military adversary, farmers stand their ground, upon which they rely to survive, even if the consequences are dire. But, herders in a culture that exalts cattle and bulls rather than corn and wheat don't have to suffer the consequences of standing and fighting against opponents who may be superior to them militarily, or just more determined. They can run, as an entire community, taking the cattle and horses that provide the source of their wealth with them, at a lower cost that does not have to be paid in blood.

And, wouldn't it stand to reason that people in a cattle herding society would be more likely to have LP genes that allow them to drink cow's milk as adults which was gradually selected for over thousands of years, which they would bring with them in their genes to their new homeland, than a society of farmers would be to suddenly develop this gene through explosive and rapid natural selection?

And, given that these people could have been ancestral to Europe's Basques (who have high frequencies of Y-DNA R1b, have traditions that place an emphasis on cattle, arrived in their current relict homeland, in my view, from France, have high levels of the LP genes, and speak a language that is distinctive in being ergative, just like the language of the Georgians, with whom the Yamnaya's non-Eastern Hunter-Gatherer autosomal genetic component shows strong affinity), it is highly plausible that their language (and hence Basque and other Vasconic languages) was an offshoot of the Kartvelian language family, possibly after creolization with Eastern Hunter-Gatherer languages that also contributed strongly to the Proto-Indo-European language, and with substrate influences from whatever first farmer Neolithic language was spoken in Western Europe before they arrived.

Razib recently hypothesized that Proto-Indo-European and Afro-Asiatic were hunter-gatherer substrate languages that were adopted by Early European farmers who admixed with them.

But, I don't think that scenario is plausible. When hunter-gatherers and farmers collide, usually, the farmers' language prevails (see, e.g., Japan, where the rice farming Yayoi's language became the backbone of the Japanese language, but almost no words from the hunter-gatherer Jomon, who spoke a language in the same language family as the Ainu, made it into Japanese, despite the fact that something like 40% of the genetic ancestry of the Japanese is Jomon, including a large share of the male Y-DNA), although sometimes there are some substrate influences that shape the superstrate language dialect that comes to be spoken in the blended community.

The alternative, which usually happens when the superstrate population is greatly outnumbered by a substrate population and the two populations have no linguistic common ground, is the development of a creole language, which Indo-European shows some elements of (or at a minimum simplification of the language driven by a large community of second language learners in the society), and which has been suggested on archaeological grounds by the mixed ethnicity communities that existed around the time and place that PIE came into being, long before ancient DNA tools were available.

The lexical similarities that Maju recently discovered between Proto-Indo-European and Nilotic Nubian languages may very well be real, but he may have misapprehended the direction of the connection. Perhaps the lexical similarities in Nilotic Nubian are the result of Neolithic migrants to Africa who arrived via the Sinai and the Nile, bringing the words of their language, related to Kartvelian, with them, where the existing residents adopted them, rather than the other way around.

The Yamnaya folk migration hypothesis which I have just sketched out, which is strongly motivated by powerful ancient DNA evidence, has the potential to pull together myriad puzzle pieces of European prehistory in a single stroke.

It doesn't answer all of the questions.

What connection did the Bell Beaker culture have to the Yamnaya?

Is there any archaeological evidence to support this hypothesis? And, if there was such evidence, what would we expect it to look like? Is the lack of evidence of an apocalyptic war that destroyed the vast majority of Yamnaya men itself evidence favoring this hypothesis? Before dismissing this conjecture for a lack of archaeological evidence, at a minimum, the archaeological evidence should be reviewed with fresh eyes informed by this hypothesis.

Did other members of a Yamnaya diaspora make their way to Western Anatolia (perhaps Troy I?) or Crete?

Finally, when did the migration(s) start?

Perhaps the Poltavka people are the Yamnaya who held out on the southern European steppe longer, but the transition from the Yamnaya culture to the Poltavka culture was a product of the disruption caused by the folk migration of the rest of the Yamnaya to Western Anatolia, Crete and Western Europe. A 3000 BCE start date for these migrations is a better fit to the archaeological culture that could potentially reflect their arrival in their putative destinations.

This also illustrates the fact that dying and running are not necessarily mutually exclusive possibilities. Perhaps some of the Yamnaya ran in various directions, giving rise to the various European cultures in which Y-DNA R1b is common, while other Yamnaya stood their ground, becoming the Poltavka, and ultimately died at the hands of Y-DNA R1a dominated peoples from the northern European steppe who slaughtered the Poltavka and took their land.

Fortunately, it is likely, given the stunning improvements that have been made in ancient DNA extraction, that these are questions to which the answer probably isn't, "we may never know." Instead, stay tuned. More answers seem to lurk around every corner.

Thursday, October 8, 2015

A Brief History of Exponents

The Math With Bad Drawings blog has a nice little post explaining very lucidly how the notion of exponents as repeated multiplication was generalized, in a way that is pretty much unique, to allow for exponents that have values other than whole numbers.

This fact, typically first taught in middle school or high school algebra, has been well known for a long time. Euclid toyed with the idea a little. The ancient Greek scientist Archimedes first generalized the concept and proved the law of exponents. A fairly efficient form of exponential notation was invented by Nicolas Chuquet in 1484. More than three hundred years ago, in 1637, René Descartes established the modern superscript notation for exponents, a few decades before Newton formulated his laws of gravity and motion and before Newton and Leibniz invented calculus (the modern notation used in undergraduate calculus follows the practice of Leibniz and not Newton's much more awkward notation).

There has been one notable elaboration of a similar concept in mathematics since then, called the fractal dimension, which was first formally defined using that name by the late Benoit Mandelbrot in 1967 and entered the upper level college mathematics curriculum in the late 1980s and early 1990s, around the time I was an undergraduate math major. This concept was also invented in Newton's day, but then consigned to the dustbin of history as a curiosity until the late 1800s, when several mathematicians developed it some more. It then remained out of sight until Mandelbrot more or less single-handedly repopularized the concept in a way that actually stuck and found practical applications.

The fractal dimension generalizes the notion of a dimension in a manner similar to the way that the law of exponents generalizes the notion of repeated multiplication by relating change in detail to change in scale. For example, the smaller the ruler you use to measure a shoreline, the longer the shore gets in ruler lengths, because the ragged pattern of a shoreline has a high fractal dimension, while a smooth shoreline would have a low fractal dimension and doesn't change in length at all based upon the length of the ruler used to measure it.
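The ruler idea can be made concrete with a small Python sketch. Instead of a real shoreline, it uses the Koch curve (my choice of stand-in, not something from the blog post), where shrinking the ruler by a factor of 3 always requires 4 times as many rulers, and it compares that to a straight line, where shrinking the ruler by 3 requires exactly 3 times as many. The dimension computed is the self-similarity dimension, log(pieces) / log(shrink factor), and the function names are hypothetical.

```python
import math

def measured_length(pieces_per_step, steps, shrink=3):
    """Length measured with a ruler of size shrink**-steps.

    Each time the ruler shrinks by `shrink`, the number of rulers needed
    to cover the curve is multiplied by `pieces_per_step`.
    """
    ruler = shrink ** -steps
    count = pieces_per_step ** steps
    return ruler * count

def similarity_dimension(pieces_per_step, shrink=3):
    """Self-similarity dimension: how the ruler count scales with ruler size."""
    return math.log(pieces_per_step) / math.log(shrink)

for steps in range(5):
    smooth = measured_length(3, steps)   # straight line: length never changes
    ragged = measured_length(4, steps)   # Koch curve: length keeps growing
    print(f"ruler = 3^-{steps}: straight line {smooth:.3f}, Koch curve {ragged:.3f}")

print(f"dimension of straight line: {similarity_dimension(3):.4f}")
print(f"dimension of Koch curve:    {similarity_dimension(4):.4f}")
```

The straight line keeps the same measured length at every ruler size and has dimension 1, while the Koch curve grows by a factor of 4/3 at each step and has a dimension of about 1.26, somewhere between a line and a plane.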

I probably wouldn't ordinarily have found any of the blog post on exponents notable at all. But, earlier just this week, I had been thinking about the precise issue of how the generalized notion of an exponent is so much more subtle than the naive repeated multiplication definition, in the context of thinking about Euler's formula and Euler's number "e", which is equal to approximately 2.71828 and is an irrational (indeed, transcendental) number, meaning that it cannot be produced from the ratio of any two integers (such a ratio is called a rational number). It felt remarkable to see, in illustrated print found at random on the Internet, almost exactly the same line of thought.
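For anyone who wants to poke at the generalized exponent directly, here is a minimal Python sketch. It defines a power with any real exponent as exp(x * ln a) for a positive base, which is the standard way the generalization is usually written down, checks that the law of exponents forces a^(1/2) to be the square root, and then evaluates Euler's formula at pi. The helper name general_power is my own.

```python
import cmath
import math

def general_power(base, exponent):
    """a**x for a positive base and any real exponent, defined as exp(x * ln a)."""
    return math.exp(exponent * math.log(base))

# Repeated multiplication still works: 2^3 = 2 * 2 * 2.
cube = general_power(2.0, 3.0)

# The law of exponents a^x * a^y = a^(x + y) forces 2^(1/2) * 2^(1/2) = 2^1,
# so 2^(1/2) has to be the square root of 2.
half = general_power(2.0, 0.5)

# Euler's formula, e^(i*x) = cos(x) + i*sin(x), evaluated at x = pi.
euler = cmath.exp(1j * math.pi)

print(f"2^3 = {cube:.6f}")
print(f"2^(1/2) squared = {half * half:.6f}")
print(f"e = {math.e:.5f}")
print(f"e^(i*pi) = {euler.real:.6f} + {euler.imag:.6f}i")
```

Up to floating point rounding, the last line comes out as -1 with a vanishing imaginary part, which is about as far from "multiply e by itself pi times" as an exponent can get.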

I guess I still belong to the math tribe, even though I'm a lawyer now.

4500 Year Old Ethiopian Ancient DNA

UPDATE 3 (January 25, 2016): The portion of this paper pertaining to Eurasian admixture in people outside East Africa was due to an IT error and was retracted. Some key inaccurate conclusions have been stricken below, but a careful reread is necessary to confirm the accuracy of all statements below.

UPDATE 2 (October 9, 2015): More figures at this tweet.

UPDATE: This is the first African autosomal ancient DNA sample that I am aware of, a remarkable technological feat, and it is paradigm shifting. There is so much data in the whole genome of even a single individual, and the accumulated genomes of various modern and ancient populations are sufficiently significant already, that it is possible to make reliable and powerful inferences even with a sample size of just N=1 from a new population, ancient or modern, as this paper does.

ORIGINAL POST:

I had originally read this paper to imply that the 4500 year old Southwest Ethiopian male Mota in the sample was Eurasian admixed. Upon a more careful reading, it appears that this individual predates significant recent Eurasian admixture and can be used as a reference point to establish the levels of Eurasian back migration found in other African populations, since he has no measurable Eurasian admixture himself.

Characterizing genetic diversity in Africa is a crucial step for most analyses reconstructing the evolutionary history of anatomically modern humans. However, historic migrations from Eurasia into Africa have affected many contemporary populations, confounding inferences. Here, we present a 12.5x coverage ancient genome of an Ethiopian male (‘Mota’) who lived approximately 4,500 years ago.

We use this genome to demonstrate that the Eurasian backflow into Africa came from a population closely related to Early Neolithic farmers, who had colonized Europe 4,000 years earlier. The extent of this backflow was much greater than previously reported, reaching all the way to Central, West and Southern Africa, affecting even populations such as Yoruba and Mbuti, previously thought to be relatively unadmixed, who harbor 6-7% Eurasian ancestry.

M. Gallego Llorente et al, "Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent" Science (October 8, 2015) DOI: 10.1126/science.aad2879

Hat tip to Dienekes.

Uniparental Haplogroups

Eurogenes notes that "this individual belongs to Y-haplogroup E1b1 and mtDNA haplogroup L3." These uniparental haplogroups are disclosed in the supplemental materials to the paper.

The mtDNA haplogroup is more specifically L3x2a which "is restricted to the Horn of Africa and the Nile Valley in modern Ethiopian samples, suggesting a degree of maternal continuity in Ethiopia over the past 4,500 years. . . . Mutation E-P2, present in Mota, represents the most widespread subclade of haplogroup E and has been found at high frequency in modern Ethiopians."

This individual also strengthens the case for an African origin of Y-DNA E, relative to a back migration hypothesis, because trace levels of Neanderthal ancestry found in other Africans can now be firmly attributed to recent Eurasian sources and are not present in this relatively recent African individual, who has Y-DNA E but no Neolithic era Eurasian admixture. This suggests that any back migration of Y-DNA E would have happened, if it did happen, prior to any Neanderthal admixture, which is present in all modern non-Africans.

The Source and Context of the Ancient DNA Sample

Mota Cave, situated 1,963 meters above sea level in the Gamo highlands of southwest Ethiopia, overlooks the Kulano River, a tributary of the Deme-Omo River. The cave was found in 2011 in collaboration with local Gamo elders and partially excavated in 2012. It measures 14 meters in width and 9 meters in depth and contains more than 60 centimeters of anthropogenic deposits and substantial rock fall. The cave’s deposits suggest at least seven different human occupations from the middle to late Holocene (c. 5295 BP to c. 300 BP), and contains the only middle Holocene burial known in southwest Ethiopia. This burial consists of a complete but fragmentary male adult skeleton dated via AMS radiocarbon to the fifth millennium BP (OxA-29631: 3997 ± 29 BP; 4524-4418 Cal BP).

This site, in southwestern Ethiopia, is part of an endorheic basin that drains into Lake Turkana.

The context isn't well enough established to know if this individual was part of a hunter-gatherer, Neolithic, or metal age material culture, but there are hints that he might have been a hunter-gatherer. One hint is the set of relics found with the body: "A geode and at least 27 obsidian, chert, and basalt flaked stone tools were found in the grave; such artefacts are characteristic of the Later Stone Age lithic tool assemblage present in much of the cave’s deposits." Another is his genetic affinity to the Sandawe people, discussed below, who are a click-speaking people who were a relict hunter-gatherer population of East Africa until about 150 years ago.

Genetic Affinities Of Linguistic Groups

This is right in the vicinity of the homeland of the Southern Omotic languages like Ari. Cushitic languages are also spoken in the region, which has a high level of linguistic, religious and ethnic diversity in an area that is 90% rural.

The supplement also notes that "Principal component analysis shows that Ari and Sandawe are the closest contemporary populations to Mota," and that Mota has no discernible Neanderthal component relative to modern African populations.

Mota was placed close to the Ethiopian samples, in between the clusters formed by the Ari and the Sandawe (but very close to an Ari individual that stands out from the rest of that group). The Ari can be split into two castes, Ari Cultivator and Ari Blacksmith, which share a common origin within the last 4,500 years. Since data on a larger number of SNPs are available for Ethiopian populations, we repeated the PCA using this higher quality dataset, which gave us 484,161 usable SNPs that could be called in Mota. Once again, Mota fell in between the Ari and the Sandawe cluster. . . . The Ari speak a language classified as Omotic, which is the most differentiated branch of the Afro-Asiatic languages. Gumuz, a population member of the Nilo-Saharan family (also an Afro-Asiatic language), also shows a high level of shared drift with Mota, but significantly less than the Ari. Sandawe, which are closer to Mota in the PCAs, do not show high shared drift with Mota in the f3, possibly because they are closer to the Khoisan populations than the other Eastern African populations.

Southwest Ethiopia is about as far from the place where any putative migration across the Gate of Tears would have taken place as one can be in Ethiopia and is in an area where Omotic languages are among the languages currently spoken. The Sandawe people who also cluster with Mota currently live in central Tanzania but almost surely had a much larger geographic range in the past. The highly tonal features of the Omotic languages may perhaps reflect a modified click language heritage.

The date is also a bit early for any Eurasian admixture to have an Ethio-Semitic source. And, other recent studies have suggested that Eurasian admixture in non-Ethio-Semitic populations of Ethiopia (presumably arriving via the Blue Nile) took place at about the same time as Ethio-Semitic admixture.

Mota is not particularly close to Nilotic, Cushitic or Ethio-Semitic populations genetically. Nor was Mota close to the Hadza people, another relict population of Paleo-Africans in East Africa. Only the Sandawe and Omotic populations were reasonably close to Mota in the PCA. The study also tends to show that the Sandawe and the Ari people of the Omo Valley cluster together rather closely genetically relative to other African populations, possibly shedding some light on the linguistic position of the Omotic languages. Both Cushitic and Ethio-Semitic populations deviate from the cluster that includes Mota in the same direction, while Nilotic and Hadza populations are essentially orthogonal to the Afro-Asiatic populations in the PCA. The Omotic people, Sandawe people, and Mota are clustered together midway between the Afro-Asiatic, Nilotic and Hadza spokes, which are at about 120 degree angles from each other.

Implications For Eurasian Ancestry In Other Africans

We used f4 ratio analysis to formally assess the extent of back-migration to Africa by West Eurasians . . . . Mota does not show any evidence of a West Eurasian component. . . . This contrasts in particular with the Ari, their closest contemporary relatives, which show large West Eurasian components (17.8%±1.0% and 14.9%±1.2% for Ari Cultivator and Ari Blacksmith, respectively). We confirmed that such a difference is not due to a comparison of a single individual to population estimates by recomputing the f4 ratio for each individual belonging to an Ethiopian population in our dataset.

The absence of a West Eurasian component in Mota supports the dating of the backflow into Africa, which, at ~3.5kya, is younger than our ancient genome (dated to 4.5 kya). Given that Mota predates the backflow, it potentially provides a better unadmixed African reference than contemporary Yoruba. Thus, we recomputed the extent of the West Eurasian component in contemporary African populations using Mota . . . instead of Yoruba in our f4 ratio. By using this better reference, we estimated West Eurasian admixture to be significantly larger than previously estimated, with an additional 6-9% of the genome of contemporary African populations being of Eurasian origin. Importantly, this analysis shows that the West Eurasian component can be found also in West Africa, albeit at lower levels than in Eastern Africa. Importantly, a sizeable West Eurasian component is also found in the Yoruba and Mbuti, which are often used a representative of an unadmixed African population.

~~Ethiopians have more Eurasian admixture than other Africans, but essentially all modern Africans have significant levels of Eurasian admixture relative to Mota.~~

~~With respect to Neanderthal and Denisovan ancestry:~~

Given that Mota is our best example of an unadmixed African population, we used it as a reference to assess the affinity of a number of contemporary genomes with Neanderthals. We also investigated the effect of using Mota as a reference when estimating Denisovan introgression. We performed this analysis using the complete genomes (rather than a subset of SNPs as in earlier analyses), since a large number of SNPs is needed to obtain accurate estimates. . . . Both Yoruba and Mbuti were shown to have a small Neanderthal component, in line with their West Eurasian ancestry. As expected, estimates for French and Han were higher than for either of the two contemporary African genomes (from 0.21% in Mbuti to 2.96% in Han).

No evidence of any Denisovan ancestry was found in Mota or any of the other African samples tested with him as a reference for an unadmixed African genome.

The Nature of the Neolithic Eurasian Population Migrating To Africa

The fact that the inferred Eurasian component in other Africans determined with reference to Mota is similar to early Neolithic farmers in Europe is also notable.

Since we have in Mota an unadmixed African population, we can look for the origin of the West Eurasian backflow by modelling contemporary Ari as a mixture of Mota and possible source populations.

We do this by using the admixture f3-statistics . . . from our global panel or a Eurasian ancient genome. For the latter, we used a representative of Mesolithic hunter-gatherers (Loschbour), and one of the Early Neolithic farmers (LBK, also known as Stuttgart); these two genomes were chosen for their high coverage, allowing us to use most of the SNPs available for contemporary populations and Mota....

LBK (an early Neolithic farmer) and Sardinians are the two most likely sources (showing the most negative admixture f3 values) for the Eurasian admixture in the Ari. A number of other analyses have shown Sardinians to be the closest contemporary population to early Neolithic farmers that came into Europe from the Near East, as contemporary populations from that region have been affected by large-scale populations movements in the last few millennia. Thus, the West Eurasian backflow originated from the direct descendants of the same early farmers who brought agriculture into Europe. Given that we have a putative source for the West Eurasian component, we can re-estimate its extent by using LBK as its source in our estimation of the f4 ratio . . . . without having to worry about West African ancestry in the source.

We next tested whether the West Eurasian component found in Yoruba, which had been previously suggested to be older than Mota [dated to 9.6k±1.8k yrs ago . . .], comes from the same source found for the Ari. We use the D statistics . . . . from our global panel or a Eurasian ancient genome. Sardinians and LBK were again found to be the most likely source of the West Eurasian component (giving the strongest positive values that indicate excess affinity between X and Yoruba compared to Mota). This result suggests that there was a single source for the West Eurasian component found throughout Africa.
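To give a feel for why "the most negative admixture f3 values" point to the likely source, here is a minimal sketch of the admixture f3 statistic in its simplest textbook form, f3(C; A, B) = the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies. Everything in it is an assumption for illustration: the allele frequencies are synthetic random numbers, the 15% mixing proportion is hypothetical, and the finite sample bias correction and genetic drift that real analyses must handle are left out. It is not the paper's pipeline.

```python
import random

def f3(target, source_a, source_b):
    """Average of (c - a) * (c - b) across SNPs, without any bias correction."""
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

random.seed(0)
n_snps = 10_000

# Synthetic allele frequencies for two unrelated source populations (made-up data).
pop_a = [random.random() for _ in range(n_snps)]
pop_b = [random.random() for _ in range(n_snps)]

# A target population built as a pure mixture: a hypothetical 15% from A, 85% from B.
alpha = 0.15
admixed = [alpha * a + (1 - alpha) * b for a, b in zip(pop_a, pop_b)]

f3_admixed = f3(admixed, pop_a, pop_b)     # target is a mixture of the two sources
f3_unadmixed = f3(pop_b, pop_a, admixed)   # target is itself a source, not a mixture

print("f3(admixed; A, B):", f3_admixed)
print("f3(B; A, admixed):", f3_unadmixed)
```

In this toy setup the mixed population's frequencies always sit between those of its two sources, so the first value is negative, while the second, where the "target" is not a mixture of the other two, is positive.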

The admixture from Copper and Bronze Age steppe populations and from indigenous European hunter-gatherers that transformed the gene pool of early Neolithic Europe did not, by and large, extend to Africa. But, given recent ancient DNA results from Neolithic Western Anatolia and the shifts that Near Eastern populations have seen genetically since then, I'm inclined to call a population that is similar to LBK and Sardinian individuals Western Anatolian rather than Near Eastern.

This is consistent with previous studies showing that Eurasian admixture in East African and Khoisan people was more similar to Levantine people than to South Arabians and (except among highly West Eurasian Ethio-Semitic individuals with some South Arabian affinities) was of a uniform character throughout Africa similar in proportion and type to modern Omotic people. It would be interesting to see, however, if the Chadic people, who live mostly in the Sahel between North Africa and Sub-Saharan Africa, have different autosomal Eurasian affinities to match their unique Y-DNA Eurasian affinities.

As long as the people with Early European Farmer type genetics began their migration that culminated in sub-Saharan Africa before the influx of Steppe-like people into Europe, this doesn't pose a paradox. As summarized in a blockbuster paper earlier this year:

By ~6,000-5,000 years ago, a resurgence of hunter-gatherer ancestry had occurred throughout much of Europe, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and is ubiquitous in present-day Europeans.

Allowing at least 500-1,500 years for a group of Early European Farmer-like people to migrate from Western Anatolia to Ethiopia before the resurgence of hunter-gatherer ancestry or the steppe ancestry had changed the European gene pool is not an unreasonable scenario. This trip involves a march of about 1200 miles more or less due South (although, obviously, the route would not be as the crow flies).

This is comparable to the time needed for Early European farmers to advance that far (i.e. to the Northern coast of Continental Europe and Southern Scandinavia) and with that much of a change in latitude in Europe during the first wave of the Neolithic revolution in Europe.

The time depth and distribution of Y-DNA T (which is present at relatively high levels in Omotic and Cushitic populations relative to Ethio-Semitic populations) suggests that this may have been an important Y-DNA haplogroup of the EEF-like Neolithic farmers whose autosomal DNA contributed to Africa's gene pool via the Levant, possibly with Y-DNA J mixed in (although the multiple possible historical events that could have spread Y-DNA J complicate the analysis). But, Y-DNA T is too young to be a plausible candidate accompanying the spread of mtDNA M1 (as has been suggested by some) and U6 in their migrations back from Eurasia ca. 30,000 years ago, and is a poor fit to mtDNA clades that probably arrived in Africa via Iberia and then spread across North Africa to East Africa.

Y-DNA F* is basically absent from Africa, and Y-DNA I, while old enough, has a distribution that is too thin and patchy to be a very strong candidate for a companion to mtDNA M1 and U6.

Y-DNA J has about the right geographic spread in Africa to match mtDNA M1 and U6 as part of the same back migration, but it is hard to know how much of Y-DNA J is due to Semitic migration to Africa (Ethio-Semitic and Phoenician first, and then Arab later) in the last 4,000 years, and how much is due to earlier Neolithic and Paleolithic migrations. Another possibility is that a Y-DNA E population migrated to Iberia early in the Upper Paleolithic era (where it left genetic traces) and then back migrated to NW Africa ca. 30,000 years ago with mtDNA M1 and U6 women from Europe.

The Origins of African Herding And Farming

The apparent timing of the Eurasian Neolithic admixture in almost all modern Africans is also relevant to determining what role, if any, migrant people from food producing societies played in the conversion of wild African plants into domesticated crops that sustained early African-style farming.

Well dated plant remains can determine when domestication happened, and this can be compared to the apparent dates of admixture of people who resembled Early European Farmers genetically (probably about 1500 BCE - 2500 BCE with the Ethio-Semites arriving closer to 1500 BCE and the best guess for other farmers via the Nile closer to 2000 BCE).

Some of the data on the switch to food production is here. Cattle reached Egypt in the earliest part of the Neolithic revolution in Africa around 7000 BCE, the donkey was locally domesticated around 6000 BCE, and sheep and goats appear around 5000 BCE.

Fertile Crescent crops were mostly unsuited to sub-Saharan climates, so farming came much later to Africa than herding. Dillon (2007) argues that "The domestication of sorghum has its origins in Ethiopia and surrounding countries, commencing around 4000–3000 BC." Pearl millet cultivation began ca. 3200-2700 BCE in Africa and started in the West with a transfer to the East and to India by 1700 BCE. An early type of locally developed farming in Ethiopia was originally conducted primarily by Omotic people and was flourishing when the Ethio-Semites arrived, but was largely displaced and set aside when Ethio-Semites brought their more developed farming techniques to Ethiopia ca. 1500 BCE.

The balance of the evidence, therefore, favors the development of African domesticated plants after Fertile Crescent pastoralist populations arrived in Africa, but before a major demic contribution of people with Early European Farmer genetics.

Of course, Mota, because he lived in such an isolated area, could have been one of the last unadmixed people of Africa when he died. There is a decent chance that there would have been some Omotic farmers within a couple hundred miles or so of the place he died at the time.

Functional Traits

Functional traits discerned from Mota's genome include the following:

Skin colour could not be determined although Mota did not have common European variants associated with light skin colour (rs16891982 and rs1426654). Mota was determined to have had brown eyes (p-value = 0.997) and dark (p-value = 0.996), probably black (p-value = 0.843) hair. . . . Mota did not have any of the major alleles known to cause lactase persistence. . . . Mota . . . lived at high altitude and was . . . likely adapted to hypoxia.

Thus, Mota was probably black, brown eyed and dark haired, lacked lactase persistence associated with many herding and farming populations in Africa, and was genetically adapted to high altitudes.

Wednesday, October 7, 2015

Strict New Limits On BSM Physics

Increasingly, complex theories of particle physics and cosmology are disfavored by experiments.

* There are new combined limits on dark matter production from the ATLAS and CMS experiments at the Large Hadron Collider (LHC) based upon complete Run I data. No dark matter signal has been observed at the LHC.

The LUX direct dark matter detection experiment still places the strictest bounds on the cross-section of interaction with nucleons for spin independent dark matter (about 10^-45 cm^2) for dark matter particles of about 10 GeV/c^2 or more of mass. But, for lighter dark matter particles (certainly below 1 GeV), the maximum cross section of interaction with nucleons is set by CMS at about 10^-40 cm^2 for spin independent dark matter and about 10^-41 cm^2 for spin dependent dark matter.

The cross-section of interaction of a neutrino with a nucleon is on the order of 4*10^-39 to 8*10^-39 cm^2 per GeV of neutrino energy. Thus, the bounds on dark matter cross-sections of interaction from CMS are comparable to those of neutrinos with hundreds of MeV of kinetic energy for dark matter particles up to about 10 GeV. For dark matter particles with masses of 10 GeV or more, exclusion from LUX is comparable to that of neutrinos with less than 10 eV of kinetic energy (still relativistic by about three orders of magnitude, but nevertheless a very low energy for a neutrino).

Also, as recently noted, experimental observations of cosmic rays emitted by dwarf galaxies which are dark matter dominated in the dark matter particle theories, place strict bounds on the mean lifetime and dark matter annihilation cross-sections of any potential dark matter particle. Dark matter must have a mean lifetime much longer than the age of the universe and must very rarely annihilate. But, this limitation is more model dependent than some of the other boundaries.

None of these experiments, of course, can rule out any kind of dark matter particles whose only interactions with ordinary matter are via gravity, a particularly simple kind of dark matter model that is increasingly favored.

* Theories with an additional Higgs doublet predict an additional pseudo-scalar neutral Higgs boson, often called A, which could be light. The BESIII collaboration has put increasingly tight boundaries on this possibility in the 212 MeV to 3 GeV mass range, where maximum branching fractions in J/Psi decays can now be not more than 4.7*10^-6, and are about 100 times smaller than that in parts of that mass range.

Previous experiments have excluded the A boson in other mass ranges. Generally, these experiments rule out light A bosons for masses from about 212 MeV to 9 GeV with significant branching fractions in a quite model independent fashion, and rule out supersymmetric A bosons with masses of less than that of the Z boson (about 91.2 GeV).

There is simply no meaningful experimental evidence to support theories with multiple Higgs doublets, including supersymmetry.

* New, more strict, limits have been set on the maximum magnetic moment of the neutrino.

The scattering of solar neutrinos off electrons in Borexino provides the most stringent restrictions, due to its robust statistics and the low energies observed, below 1 MeV. Our new limit on the effective neutrino magnetic moment which follows from the most recent Borexino data is 3.1 x 10^-11 mu_B at 90% C.L. This corresponds to the individual transition magnetic moment constraints: |Lambda_1| less than 5.6 x10^-11 mu_B, |Lambda_2| less than 4.0 x 10^-11 mu_B, and |Lambda_3| less than 3.1 x 10^-11 mu_B (90% C.L.), irrespective of any complex phase.

The Standard Model expectation with a simple Dirac mass neutrino model is 3*10^-19 mu_B. This is non-zero mostly because there is a chance that the neutrino will emit a virtual W boson and a virtual charged lepton that emits a photon at the one loop level. But, it can be much higher (to the point of approaching thresholds of experimental detection) in models where neutrinos have Majorana mass and in supersymmetric models.

Essentially, this is yet more evidence (along with the continuing non-detection of neutrinoless double beta decay) tending to show that violations of baryon number conservation and lepton number conservation are non-existent, or at least virtually non-existent (high energy sphalerons aside) to the point where they are insufficient to account for the baryon asymmetry of the universe, if you assume that the starting point of the universe had matter and antimatter in equal amounts, or was pure energy.

* There are some two sigma tensions between SM predictions and experimental data in the areas of CP violation and the CKM matrix at the LHC, but researchers think it is likely that this is due to "penguin pollution" in the Standard Model predicted value (i.e. the impact of often ignored Feynman diagrams, called "penguins" because of the way they look visually, that go into the final prediction but are hard to calculate). Overall, however, the new data "set strong constraints on models" beyond the Standard Model.

Tuesday, October 6, 2015

Neutrino Physicists Win 2015 Nobel Prize

Two leading neutrino physicists have won the 2015 Nobel Prize in Physics for their work on neutrino oscillation.

Neutrino physics remains a work in progress.

The three main parameters governing neutrino oscillation (mixing angles theta12, theta13 and theta23) and the differences between the masses of the three neutrino mass eigenstates (delta m12 and delta m23) have been established (although not always with the precision desired).

But, there are important outstanding questions regarding the nature of neutrino mass (Dirac or Majorana, through the Higgs mechanism or otherwise such as a see-saw mechanism), the absolute values of the neutrino masses, the CP violating phase of neutrino oscillation (if any), and the possibility that more particles (e.g. "sterile" right handed neutrinos) or more complex models than the PMNS matrix may be necessary to describe neutrino oscillation completely.

Backreaction examines the work of the prize winners in more detail.

Monday, October 5, 2015

Parasites

The 2015 Nobel Prize for Medicine goes to researchers who found novel treatments for diseases caused by parasites. The map in the linked article shows where parasitic diseases are a concern. These areas seem not very different from some important geographic regions in the study of prehistory, suggesting that parasitic diseases could be an important big picture factor in shaping prehistory, especially after accounting for shifting climates over time.

Dark Matter Annihilation Must Be Very Rare If It Happens

Examination of the cosmic rays produced by a dwarf galaxy with an apparent high proportion of dark matter places strict limits on the dark matter annihilation cross-section and mean dark matter lifetime for dark matter candidates with 10 GeV or more of mass.

The age of the universe is about 4.35*10¹⁷ seconds (13.8 billion years). The minimum mean lifetime of dark matter with various assumptions given the observations made in this study is from 10²⁵ to 10²⁷ seconds. Thus, 99.99999% or more of the dark matter that was ever in existence during the lifetime of the universe, if it exists and has 10 GeV or heavier particles, must still exist.
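The arithmetic behind that survival estimate can be checked with a short sketch. It assumes simple exponential decay, so that the fraction surviving after a time t is exp(-t/tau) for a mean lifetime tau, and it uses the age of the universe and the two lifetime bounds quoted above. The function names are hypothetical.

```python
import math

AGE_OF_UNIVERSE_S = 4.35e17  # about 13.8 billion years, in seconds

def surviving_fraction(mean_lifetime_s, elapsed_s=AGE_OF_UNIVERSE_S):
    """Fraction of particles still present after elapsed_s, assuming exponential decay."""
    return math.exp(-elapsed_s / mean_lifetime_s)

def decayed_fraction(mean_lifetime_s, elapsed_s=AGE_OF_UNIVERSE_S):
    """Fraction that has decayed; expm1 keeps precision when the ratio is tiny."""
    return -math.expm1(-elapsed_s / mean_lifetime_s)

for tau in (1e25, 1e27):
    print(f"mean lifetime {tau:.0e} s: "
          f"decayed fraction {decayed_fraction(tau):.3e}, "
          f"surviving {100 * surviving_fraction(tau):.8f}%")
```

Because the age of the universe is so much shorter than either lifetime bound, the decayed fraction is very nearly the plain ratio t/tau, a few parts in a hundred million for the shorter bound and a hundred times less for the longer one.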