Dispatches From Turtle Island: September 2015

Wednesday, September 30, 2015

The Big LGM Bottleneck

The 1000 Genomes paper has been published and has few surprises, given that the data that went into it has mostly been widely used for many years.

The one really striking point made by the paper, however, is that modern humans experienced the most intense bottleneck in the history of the species during the Last Glacial Maximum (LGM) and the several thousand years thereafter (i.e. 20,000 to 15,000 years before present). The effective population size of each of the non-African populations in the sample at that time depth was under 1,500. The bottleneck was less intense in Africa where the effective population size was somewhat in excess of 4,500 at the bottom (but still more intense than the bottleneck at any point in our species' history going all of the way back to Homo erectus). Basically, our entire species rebooted at that point, while we were still all hunter-gatherers.

Effective population size is not strictly comparable to census population. There were probably more than 7,000 people alive outside of Africa even at the most dire moment of the LGM, and probably more than 10,000 people alive in Africa at that point. But, it is quite surprising that the bottleneck during the LGM was more severe, for example, than the bottleneck at the time of the Out of Africa migration, or the one associated with the Toba eruption that may have facilitated the passage of modern humans from South Asia to Southeast Asia and beyond by the Southern route.

The bottleneck around the time of the settling of the New World in the population ancestral to the first wave of Native American migration, of course, is well known. The founding population of the Americas at its lowest point may have numbered in the hundreds, or even fewer.

Likewise, it is widely known that Northern Europe and Northern Asia were entirely depopulated and covered with a glacier at that point in time, with relict populations surviving only in three Southern European refugia (the Franco-Cantabrian refugium, one in Italy, and one in the mountains of far Southeast Europe).

But, the intensity of the bottleneck in Asia was a surprise to me. South Asia, Southeast Asia and East Asia were not rendered uninhabitable during the Last Glacial Maximum. They were not buried under a glacier. The weather in India wouldn't have been all that much worse than the weather in much of Africa. Yet, South Asians, Southeast Asians, and East Asians suffered a bottleneck just as severe.

Monday, September 28, 2015

Sargin and Faizal Refuse To Amend Physics Paper While Knowing It Is Wrong

Good physicist Sabine Hossenfelder rightly calls out the authors of a new paper about the black hole physics of loop quantum gravity, who make a claim in the title of the paper that they later admitted to her was incorrect when she contacted them and pointed out their error.

But, they wouldn't change the paper and minimized their error, even though the claim that they got wrong is in the very title of their paper. This pretty much totally offends the entire process of posting pre-prints at arxiv.org, and the entire peer review process. So, she wrote a blog post that will track back to the paper.

You might get away with the impression that we have here two unfortunate researchers who were confused about some terminology, and I’m being an ass for highlighting their mistakes. And you would be right, of course, they were confused, and I’m an ass. But let me add that after having read the paper I did contact the authors and explained that their statement that the LQG violates the Holographic Principle is wrong and does not follow from their calculation. After some back and forth, they agreed with me, but refused to change anything about their paper, claiming that it’s a matter of phrasing and in their opinion it’s all okay even though it might confuse some people. And so I am posting this explanation here because then it will show up as an arxiv trackback. Just to avoid that it confuses some people.

The offending authors are Ozan Sargın and Mir Faizal, and the paper is: "Violation of the Holographic Principle in the Loop Quantum Gravity" http://arxiv.org/abs/1509.00843.

Friday, September 25, 2015

Unsolved Math Problem Cracked

Eccentric and highly collaborative mathematician Paul Erdős

It took more than 80 years, but a problem posed by a mathematician who delighted in concocting tricky ones has finally been solved.

UCLA mathematician Terence Tao has produced a solution to the Erdős discrepancy problem, named after the enigmatic Hungarian numbers wizard Paul Erdős. Tao’s proof, posted online September 18 at arXiv.org, shows that the difference (or discrepancy) between the quantities of two elements within certain sequences can grow without bound, even if someone does the best possible job of minimizing the discrepancy.

From Science News.

The full proof is here. A post about the proof at Terence Tao's blog is here.

Basically, Tao has proved that for any infinite sequence of +1 and -1 terms, the sums taken along evenly spaced terms (every d-th term, up to some cutoff) can be made arbitrarily large in absolute value, even though it would seem naively that the +1 and -1 terms could be arranged to balance out.
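To make the notion of discrepancy concrete, here is a minimal Python sketch (mine, not from Tao's paper). It brute-forces the largest absolute sum along the terms at positions d, 2d, 3d, ... of a finite ±1 sequence. The function name `discrepancy` and the two sample sequences are illustrative assumptions.

```python
def discrepancy(seq):
    """Largest |x_d + x_2d + ... + x_kd| over all step sizes d and cutoffs k.

    `seq` is a finite list of +1/-1 values; positions are 1-indexed.
    """
    n = len(seq)
    best = 0
    for d in range(1, n + 1):
        total = 0
        for i in range(d, n + 1, d):  # positions d, 2d, 3d, ...
            total += seq[i - 1]
            best = max(best, abs(total))
    return best


# Strict alternation looks balanced, but every even position holds -1,
# so the step d = 2 picks out nothing but -1 terms.
alternating = [1 if i % 2 == 1 else -1 for i in range(1, 13)]

# A length-11 sequence arranged so that every such sum stays within +/-1.
careful = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1]

print(discrepancy(alternating))  # 6
print(discrepancy(careful))      # 1
```

The alternating sequence shows why the naive "it balances out" intuition fails: balance along the whole sequence says nothing about balance along every evenly spaced subsequence. Tao's result is that no infinite sequence keeps all of these sums bounded.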

This conjecture has been around for more than eighty years and was published as one of a group of unsolved problems in mathematics by Paul Erdős in the article "Some unsolved problems," 4 Michigan Math. J. 299-300 (1957).

Meanwhile, another famous unsolved problem in mathematics, the Riemann hypothesis, remains unsolved. The connections between the Riemann hypothesis and physics are explored here. While billions of numerical tests of the Riemann hypothesis have not yet found a single exception, just one counter-example would disprove it and could be stated in a single line.

Tuesday, September 22, 2015

Comprehensive Tree Of Life Prepared

Above is a visualization of the entire tree of life for all species on Earth in a single image.

A first draft of the tree of life for all 2.3 million named species of animals, plants, fungi and microbes has been released. Thousands of smaller trees have been published over the years for select branches, but this is the first time those results have been combined into a single tree. The end result is a digital resource that is available online for anyone to use or edit, much like a 'Wikipedia' for evolutionary relationships.

From here. There is also a PNAS paper announcing the draft and discussing how issues like conflicting opinions regarding taxonomy are addressed.

This is one of those "wicked" problems that can only be solved once. Plenty will remain to be done in biology and taxonomy afterward, but no future research will ever fundamentally change the overall structure once it reaches a fully up to date draft version.

It isn't entirely clear from the press release how horizontal gene transfers and hybrids are described, since not all of evolution is tree-like. The initial draft is based upon 500 pre-existing sub-trees that have already been published. Also, there have long been comprehensive trees of life that don't go all of the way down to the level of detail of specific species, but do include all of the top several levels of taxonomic classification (traditionally, Kingdom/Domain, Phylum, Class, Order, Family, Genus, Species, Subspecies/Race/Breed). The modern trend is to identify "clades" without worrying about exactly which level of classification is represented by a clade. For example, a dispute over whether a clade above the species level should be a "class" or an "order" is largely considered anachronistic.

The first draft includes some trees that are known to be outdated but are easily uploaded as a stop gap measure until better publications that are less easily translated into digital form are incorporated.

Stone Monuments

A recent paper discussed at Bernard's blog examines the similarities between stone monuments (especially anthropomorphic stelae) in the Mediterranean region and those on the Steppe. The Old European culture blog makes a similar, more focused comparison of such stone monuments in the Balkans and Ireland.

These stylistic connections are too striking and idiosyncratic to be mere coincidences. But, when did these styles migrate, who was associated with that migration, and in what direction did the cultural transfer take place? Were these aspects of physical culture associated with a particular language or language family, and if so, which one? Is this a case of "pots are people" or were these aspects of physical culture transmitted without any major demic migration?

Consider the many cultural connections between the Balkans and the Irish that are explored at the Old European culture blog. The most recent and obvious source for linguistic similarities would be via proto-Celtic or proto-Celtic/Italic, which probably originated not far from the Balkans and then migrated to Ireland, evolving into Celtic en route and arriving there sometime within the last 2,500 to 3,000 years. Many of the cultural objects which are similar could also date to this time period.

Even more recently, but less plausibly, some of these connections could be due to shared exposure to the Roman Empire.

But, if there are older cultural links, that would suggest connections via some non-Indo-European peoples, such as the first wave Neolithic population, or a later Copper or Bronze Age transmission. A link via a post-first wave Neolithic population that is pre-Celtic would be particularly interesting because this era of prehistory is still not well understood.

Some scholars have associated the population genetic shift in Western Europe from low frequencies of Y-DNA R1b-M269 to high frequencies of that haplogroup with a culture that was the means by which the practice of making these stone monuments was transferred from the East to Western Europe, via "the Stele people".

It also isn't entirely clear if the pre-Celtic peoples of Western Europe were non-Indo-European, as I tend to believe, or if a pre-Celtic Indo-European language prevailed in the region earlier (although still probably after the first wave Neolithic migration to Western Europe).

Monday, September 21, 2015

Euler's Formula

Woit's blog has a nice set of materials in textbook section format on Euler's formula (e.g. e^(iπ) = -1) for use in a Calculus II class, which also does a good job of explaining some of the identities of the trig functions.
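As a quick numerical illustration (my own sketch, not taken from Woit's notes), the Python below checks Euler's formula e^(ix) = cos(x) + i sin(x), the special case at x = π, and the cosine angle-sum identity that falls out of multiplying e^(ix) by e^(iy). The helper name `euler_rhs` and the test angles are arbitrary assumptions.

```python
import cmath
import math


def euler_rhs(x):
    """Right-hand side of Euler's formula: cos(x) + i*sin(x)."""
    return complex(math.cos(x), math.sin(x))


x, y = 0.7, 1.9  # arbitrary test angles in radians

# Special case: e^(i*pi) + 1 should vanish up to floating-point rounding.
identity_gap = abs(cmath.exp(1j * math.pi) + 1)

# General formula at an arbitrary angle.
formula_gap = abs(cmath.exp(1j * x) - euler_rhs(x))

# e^(i(x+y)) = e^(ix) * e^(iy); taking real parts gives
# cos(x+y) = cos(x)cos(y) - sin(x)sin(y).
angle_sum_gap = abs(
    math.cos(x + y)
    - (math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y))
)

print(identity_gap, formula_gap, angle_sum_gap)  # all tiny
```

The point of the third check is that the trig identities stop being things to memorize: they are just the real and imaginary parts of the rule for multiplying exponentials.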

Austro-Asiatic Origins

Another ASHG 2015 paper. Linguistically Austro-Asiatic populations (e.g. the Vietnamese) have their homeland in China and migrated through Southeast Asia to India (the Munda), not the other way around.

XM. Zhang, et al., "Y-chromosome diversity suggests southern origin and Paleolithic backwave migration of Austro-Asiatic speakers from eastern Asia to the Indian subcontinent."

Analyses of an Asian-specific Y-chromosome lineage (O2a-M95)—the dominant paternal lineage (60.65% on average) in Austro-Asiatic (AA) speaking populations, who are found on both sides of the Bay of Bengal—led to two competing hypothesis of this group’s geographic origin and migratory routes. One hypothesis posits the origin of the AA speakers in India and an eastward dispersal to Southeast Asia, while the other places an origin in Southeast Asia with westward dispersal to India.

Here, we collected samples of AA-speaking populations from mainland Southeast Asia and southern China and then analyzed both the Y-chromosome and mtDNA diversities. Combining our samples with previous data, we generated a comprehensive picture of the O2a-M95 lineage in Asia, including both AA and Daic speaking populations.

We demonstrated that the O2a-M95 lineage originated in the southern East Asia among the Daic-speaking populations ~20-40 thousand years ago and then dispersed southward to Southeast Asia after the Last Glacial Maximum before moving westward to the Indian subcontinent. This migration resulted in the current distribution of this Y-chromosome lineage in the AA-speaking populations. Further analysis of mtDNA diversity showed a different pattern, supporting a previously proposed sex-biased admixture of the AA-speaking populations in India.

A Pre-Yayoi, Post-Jomon Wave Of Hmong Migration To Japan?

This too is from the ASHG 2015 Conference abstracts. Mostly it confirms past investigations, but it is notable for detecting a previously unsuspected wave of Hmong migration to Japan between the original Jomon wave of migration to Japan and the Yayoi migration of East Asian rice farmers from Korea ca. 300 BCE.

W. Ko, et al., "Genetic origins and admixed ancestry characterization of Japanese people."

A modern human population found at a certain geographic location is often descended from multiple ethnic groups owing to the complex migration history of human expansion. In Japan, although it has been studied extensively over the past decades, the genetic origins of Japanese people remain controversial.

Current genetic evidence supports a dual model which suggested that the Japanese people are constituted mainly by an early settlement of human populations during the Upper Paleolithic period (i.e., Jomon people) followed by an admixture event with the people migrated from the Korean peninsula around 2300 year ago (i.e., Yayoi people).

However, the genetic origin(s) of the native Jomons remains unclear. Tracing the genomic signatures of admixture history can not only reveal the unknown human migration events but also provide critical information that can facilitate the genetic profiling of disease susceptibility, which is critical for the success of personalized medicine. Here, we analyzed a combined dataset of the whole genome SNP genotyping data from 2,277 individuals sampled globally across >100 populations for a total of 19,290 SNPs (after intersecting the two datasets). We performed principal component analysis to project individuals onto a series of orthogonal axes to reveal the genetic structure among diverse ethnic groups.

After separating the genetic components contributed from the populations representing the Yayoi, we identified several candidate populations that share common non-Yayoi ancestry with the modern Japanese people. Our results suggest that the genetic origins of Jomons may consist of multiple migration events from both Southeast and Northeast Asia. Surprisingly, we also identified an additional migration wave from the Hmong population.

We assigned local ancestry (LA) on the phased chromosomes of the mainland and Okinawa Japanese by performing RFmix (which used the identified candidate ancestral populations to infer the LA tracts in admixed chromosomes by finding the most likely sequence of ancestries through maximum a posteriori estimation). Because an ancient population admixture would allow more recombination events to break LA tracts into shorter segments than a recent admixture event, our results of the LA tract-length distributions differ significantly between the Yayoi, Hmong, and Jomon ancestries (in descending order), suggesting that the Hmong migration may have occurred before the Yayoi migration.
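The tract-length logic in the abstract can be illustrated with a back-of-the-envelope Python sketch. This is not the authors' method. It assumes the common rule of thumb that, after a single admixture pulse g generations ago, ancestry tracts average roughly 100/g centimorgans. The rule ignores admixture proportions, and the 29-year generation time and the 5,000-year date for the older pulse are hypothetical inputs chosen only for illustration.

```python
def years_to_generations(years, years_per_generation=29.0):
    """Convert calendar years to generations (generation time is an assumption)."""
    return years / years_per_generation


def mean_tract_length_cm(generations):
    """Rough mean ancestry-tract length in cM after a single admixture pulse.

    Uses the simplified 100/g rule of thumb; real tract-length models also
    depend on admixture proportions and on any ongoing gene flow.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


# Yayoi admixture roughly 2,300 years ago (date from the abstract above).
yayoi_cm = mean_tract_length_cm(years_to_generations(2300))

# Hypothetical older pulse 5,000 years ago, for comparison only.
older_cm = mean_tract_length_cm(years_to_generations(5000))

print(round(yayoi_cm, 2), round(older_cm, 2))
```

Whatever the exact numbers, older admixture implies shorter tracts. That is why tract lengths ordered Yayoi > Hmong > Jomon point to a Hmong-related contribution that predates the Yayoi one.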

The Out of Africa Founding Population Size And Gender Imbalances

It takes a village with five men for every woman to settle the world (except Africa). Specifically, about 330 men and 65 women of reproductive age. So says another notable ASHG 2015 paper:

M. H. Quiver, et al., "Selective constraint and sex-biased demography of human populations from X chromosome-autosome comparisons"

Because the number of X chromosomes differs for men and women, comparisons between sex-linked and autosomal genetic loci reveal sex-biased patterns of human demography. Using 44 high-coverage whole genomes from a diverse global set of 11 human populations we quantified the strength of selective constraint on different chromosomes, found evidence of sex-biased colonization, and determined whether recent migrations are matrilocal or patrilocal. Relative amounts of genic and intergenic diversity were similar across all studied populations regardless of subsistence pattern or geography. The strength of selective constraint on genes was greater for X-linked loci compared to autosomal loci – a pattern that is consistent with selection against deleterious recessive alleles. The ratio of X chromosome to autosome diversity (Q) was greater than the null expectation of 0.75 for African populations and less than 0.75 for non-African populations, with lower values of Q for populations located farther from Africa.

This pattern is consistent with a male-biased serial founder effect model, and computer simulations suggest a plausible out-of-Africa bottleneck size of 320-340 males and 60-70 females.
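To see why a male-biased founding group pushes Q below 0.75, here is a minimal Python sketch using the standard textbook formulas for effective population size with unequal numbers of breeding males and females. This is only the idealized expectation, not the serial founder simulation the authors ran. The function names are mine, and 330 and 65 are simply the midpoints of the ranges quoted above.

```python
def ne_autosomal(n_males, n_females):
    """Effective size for autosomes: 4*Nm*Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x(n_males, n_females):
    """Effective size for the X chromosome: 9*Nm*Nf / (4*Nm + 2*Nf)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Expected ratio of X to autosomal effective size (idealized Q)."""
    return ne_x(n_males, n_females) / ne_autosomal(n_males, n_females)


q_equal = x_to_autosome_ratio(100, 100)    # equal sex ratio
q_founders = x_to_autosome_ratio(330, 65)  # midpoints of the quoted ranges

print(round(q_equal, 3))     # 0.75
print(round(q_founders, 3))  # 0.613
```

With equal numbers of men and women the ratio is exactly 0.75, the null expectation mentioned in the abstract. Because females carry two of every three X chromosomes, a shortage of women hits X diversity harder than autosomal diversity, so the ratio drops.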

Using PSMC, we found evidence of large historic population sizes for West African Pygmies, but not Hadza or Sandawe populations. Genetic distances revealed female-biased gene flow between Hadza and Sandawe hunter-gatherers, between Maasai pastoralists and African farmers, and between Chinese and Japanese populations. We found evidence of male-biased gene flow between African farmers and hunter-gatherers, and between different African farmer populations. This calls into question the idea that patrilocality is coupled with the emergence of agriculture.

Another paper from the conference using the same methods and researching the same issues has a less bold abstract, but appears to reach essentially the same conclusions.

Arbiza and Keinan, "The relative effective population size of chromosome X and the autosomes along distinct branches of the human population tree."

In recent years, many studies have focused on the effective population size of chromosome X relative to the autosomes. This comparison can be useful to reveal past demographic processes, differences in the histories of males and females, and the action of natural selection. We have recently shown how the ratio of nucleotide diversity between the two (X-to-Autosome ratio; X/A), when compared between pairs of populations (relative X/A), can be used to uncover sex-biased processes in human history. While this strategy serves to alleviate the response of genetic diversity to the influence of events in a time range that largely predates the split of the studied populations, a different and more natural approach to capture recent changes occurring after populations split can be formulated based on the differentiation of allele frequencies between populations, as commonly summarized by the F_ST statistic. Here, we consider population differentiation in humans, and extend beyond simple pairwise comparisons, using allele frequency differences across several populations to learn about the ratio of X-to-autosomal effective population size along distinct branches in the tree of human populations. We then test these for differences from the expectation of equal female-to-male breeding ratios, as well as differences between different branches. Using coalescent simulations of a variety of previously published human demographic models, we show that our approach is able to capture the ratio of interest and is more accurate than estimates based only on pairwise F_ST across all pairs of populations. We then turn to the latest data from the 1000 Genomes Project, controlling for the effect of uncertainty associated with low coverage sequencing, as well as the influence of linked selection (background selection or hitchhiking), all of which differentially affect the X chromosome and the autosomes.

Estimating the X-to-autosomal effective population size ratio for branches leading to different 1000 Genomes populations, as well as for internal branches in the population tree, points to a higher female effective population size in African-specific population history, but not in non-Africans. More interestingly, we localize previously-debated observations to a significant increase in male effective population size on the branch leading to all non-African populations, suggesting male-biased processes associated to the Out-of-Africa event.

Yet another ASHG 2015 paper looks at similar issues with Y-DNA:

F. L. Mendez, et al., "Estimation of growth rates for populations and haplogroups using full Y chromosome sequences."

Evolutionary processes affecting a population influence gene genealogies across the genome. Coalescent theory provides the mathematical framework to connect realized genealogies to the underlying evolutionary processes. However, in most cases, information about the genealogies is obtained only indirectly through the observation of genetic variation. Therefore, in general, very limited information about any individual locus is available. As the longest non-recombining portion of the human genome, the Y chromosome accumulates mutations relatively quickly. When large amounts of sequence are used, the Y chromosome provides an unparalleled ability to resolve the structure and coalescence times of its genealogy. Because patterns of variation in the Y chromosome are only influenced by processes affecting men, they can be used to study both demographic and social phenomena. The 1000 Genomes Project includes whole Y-chromosome data from more than 1000 men and has an extensive representation of most lineages that have experienced recent massive expansions in size. Though the dynamics of population growth have likely changed over time, we are more interested in the growth rates at the times of these rapid expansions than on an average effect. To study this, we have developed a new method that takes advantage of the temporal resolution provided by Y-chromosome data and of historical data, while accounting for the uncertainties associated with the coalescent and mutational processes.

We estimate the growth rates for several branches of the Y-chromosome tree, including those in Europe, sub-Saharan Africa and South Asia. We estimate that several lineages within the European R1b, sub-Saharan African E1b, and South Asian R1a haplogroups experienced growth rates of at least 20-60% per generation at the onset of their massive expansions, some 3-5 thousand years ago. These high growth rates are comparable to those experienced by human populations during the 20th century. However, we find that most observed genealogies are unlikely to be the result of whole population expansion or of natural selection.
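To put growth rates of 20-60% per generation in perspective, here is a small Python sketch (mine, not the authors' method). It assumes simple compounding, i.e. that a rate r means each generation is (1 + r) times the size of the previous one, and the function names are illustrative.

```python
import math


def doubling_time_generations(rate):
    """Generations needed to double at a constant per-generation growth rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.log(2) / math.log(1 + rate)


def fold_increase(rate, generations):
    """Total multiplicative growth after the given number of generations."""
    return (1 + rate) ** generations


slow = doubling_time_generations(0.20)  # about 3.8 generations
fast = doubling_time_generations(0.60)  # about 1.5 generations

print(round(slow, 2), round(fast, 2))
print(round(fold_increase(0.20, 10), 1))  # growth over ten generations at 20%
```

Under this assumption, even the low end of the range doubles a lineage in under four generations and multiplies it roughly sixfold in ten. That is why such rates can only describe the onset of an expansion and not a sustained trend.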

A fourth conference paper thinks that it sees two distinct Out of Africa waves (something previously suggested by the divergent geographical distributions of Y-DNA D and Y-DNA F and its descendants), but previous genetic studies have shown the timing of the waves to be nearly degenerate within the accuracy of the available genome based dating techniques. I'm skeptical that their methods can really be as definitive as they claim.

Metspalu, et al., "Demographic inferences from 447 complete human genome sequences from 148 populations worldwide."

Complete high coverage individual genome sequences carry the maximum amount of information for reconstructing the evolutionary past of a species in the interplay between random genetic drift and natural selection. Here we use a novel dataset of 447 human genomes sequenced at 40X on the same platform (Complete Genomics) and uniform bioinformatic pipelines. Based on SNP-chip data we generally chose three samples to represent each population of interest. We cover a wide range of mostly Eurasian populations with additional populations from Oceania, South America and Africa. Here we describe the dataset in terms data quality and new recovered genetic variation that originates predominantly from previously subsampled continental regions.

Using MSMC, D-statistics and Finestructure we have shown that peopling of the World from Africa is best explained by at least two migration waves (See Lawson et al abstract …). Here we expand on these conclusions by investigating short IBD segment sharing patterns using diCal, Hapfabia etc. We also disentangle split times involving the two migrations out of Africa (OoA), by running MSMC separately on genome chunks derived from OoA1 and OoA2. We also present detailed regional population histories in reconstructions of past dynamics of effective population size and population split times.

Roger Blench, et al, has a 2006 paper (pdf) that explores a similar two migrations out of Africa hypothesis. The abstract from that paper states:

Recent hypotheses on the early expansion of early modern humans out of Africa have emphasised the coastal route, crossing the Red Sea, following the coast of Arabia, India and eventually reaching insular SE Asia and Australia. Given the ca. 50,000 BP dates for these sites, a date of ca. 80,000 BP has been proposed for the preliminary move out of Africa. Indeed, this has been linked with the explosion of Mount Toba at this period. However, there is a striking lack of direct archaeological evidence for the greater part of this route; explanations for this lacuna are varied but none are wholly satisfactory.

Nonetheless, there seems to be an array of linguistic, cultural and genetic evidence that links together relic populations throughout this area. The paper proposes that the Malagasy Vazimba, the Sri Lankan Vedda, the Andamanese, perhaps the Shom Pen of the Nicobar Islands, the Negritos of SE Asia all provide evidence for this early expansion.

Recent proposals for features common to the languages of Africa and the Pacific will be considered in the light of this model.

More On mtDNA C

My children and many of my other relatives through marriage have mtDNA C5. So, an ASHG 2015 abstract on mtDNA C naturally attracted my attention:

A. Askapuli, et al., "Haplogroup C Phylogeny for Altaian Populations and its Implications for the Peopling of Siberia and the Americas."

Characterization of mitochondrial DNA at a genomic level is very important since it provides opportunities for more accurately estimating the timing and directionality of prehistoric human migrations from a maternal perspective. The Altai Mountains are located at the geographic center of the Eurasian landmass, and have been a hotspot of human activities since ancient times due to its geographic location and rich natural resources.

Aiming to contribute to a better understanding of the prehistoric human expansions in Siberia and subsequent colonization of the Americas, we sequenced and characterized eighteen whole mtDNA genomes belonging to haplogroup C from Altaian populations. The sequenced Altaian mtDNAs represent all four subgroups of haplogroup C (C1, C4, C5, and C7), and two of them belong to C1a, the Asian sister branch of Native American C1.

The Altaian whole mitochondrial sequences were analyzed together with 313 previously published haplogroup C sequences from different parts of the world. The analyses of whole mitochondrial genomes reveal that haplogroup C lineages in Siberia are distributed without any specific association with geography or language, and suggest northeastern Siberia as a place of origin for haplogroup C and its subbranches C1, C4, C5, and C7.

The analyses also indicate that Native American haplogroup C types are distantly related with their Siberian sister branches.

Given the distribution pattern of haplogroup C in Eurasia, the timing of expansions could be inferred from the age estimates of the lineages within haplogroup C. Age estimation of haplogroup C sequences in our data set via ρ statistics shows that haplogroup C has a TMRCA of 31.25 kyr (24.13-38.56), and its subbranches C1, C4, C5, and C7 have TMRCAs of 21.64 kyr (16.83-26.55), 24.88 kyr (16.65-33.41), 19.76 kyr (13.63-26.08), and 27.2 kyr (16.69-38.17), respectively.

Still, it is almost impossible to pinpoint geographic origin of Native Americans and directionality of prehistoric migrations in Siberia with certainty. Based on the results of the current study, the Amur region in northeastern Siberia could be the geographic origin for ancestral Native Americans. In order to obtain clearer picture of human population movements in Siberia and the Americas from a maternal perspective, more mitochondrial genomes need to be sequenced, especially mitochondrial genomes belonging to the relatively diverse haplogroups C and D.

Vasco-Nubian

Maju, who in addition to being a public intellectual is a Basque person fluent in the Basque language, discusses the possibility that the hypothetical Vasconic language family (of which Basque is a part, with all other members extinct) may be derived from the Nilo-Saharan Nubian languages of Africa, with substantial later contributions from Proto-Indo-European (as opposed, for example, to its later variants), i.e. a Vasco-Nubian hypothesis.

He suggests a Mesolithic (i.e. immediately pre-invention of farming) presence in the Levant of Nilo-Saharan language speakers, whose language becomes the language of the first farmers of Europe and then is bit by bit replaced by Indo-European languages at a later date.

He makes an effort at mass lexical comparison with a Swadesh list of words that shows a much stronger than random chance relationship. I've looked at the phonetics and grammar and it isn't too much of a stretch on that front at that time depth.

No one is declaring this conjecture to be "the truth" at this point, but the evidence is serious enough that it has leaped to the front of the line as a viable hypothesis compared to other existing hypothetical proposals. He has also used mass lexical comparison to pretty much definitively rule out several other hypothetical connections between Basque and other language families that have been proposed by credentialed linguists but don't deserve serious consideration.

Friday, September 18, 2015

Another Couple of Papers On Gypsy Origins

A new paper on the origins of the European Roma (aka Gypsies) quantifies the likelihood that particular subregions within South Asia are their place of origin based upon Y-DNA and mtDNA haplogroup frequencies. They migrated out of India sometime around 1000 CE and had reached the Balkans by the 1300s.

The bottom line

Y-DNA origins in North India are favored with a 65.6% probability of that subregion being the place of origin, and mtDNA origins in Northwest India are favored with a 71.3% probability of that region being the place of origin. Both of these probabilities are much greater than the runner up and are consistent with each other.

The nuance

East India is the runner up for mtDNA (19.7%); but, the probability of an East India origin for Roma Y-DNA is only 2.7%.

The runner up options for Y-DNA origins are Central India (19.0%), and South India (10.3%). The probability of an mtDNA origin in South India is just 6.1% (all of which is from Southeast India as opposed to Southwest India). There is no separate Central India estimate for the mtDNA data which are broken up into somewhat different regional bins.

Background and analysis

For the most part, this confirms prior genetic and linguistic research on the issue, which is quite substantial, although some prior research has suggested a migration of men from elsewhere in South Asia to Northwest India, followed by marriage to local women, and a second migration from there to Europe.

I've previously blogged research on this topic on May 12, 2012, January 11, 2011, and August 31, 2009. There is also a passing reference to mtDNA U6 in Roma people here.

This study doesn't rule out these more complex scenarios, but does suggest that a more parsimonious theory is sufficient to explain the uniparental genetic data.

The paper and its abstract

The Roma, also known as ‘Gypsies’, represent the largest and the most widespread ethnic minority of Europe. There is increasing evidence, based on linguistic, anthropological and genetic data, to suggest that they originated from the Indian subcontinent, with subsequent bottlenecks and undetermined gene flow from/to hosting populations during their diaspora. Further support comes from the presence of Indian uniparentally inherited lineages, such as mitochondrial DNA M and Y-chromosome H haplogroups, in a significant number of Roma individuals. However, the limited resolution of most genetic studies so far, together with the restriction of the samples used, have prevented the detection of other non-Indian founder lineages that might have been present in the proto-Roma population.

We performed a high-resolution study of the uniparental genomes of 753 Roma and 984 non-Roma hosting European individuals. Roma groups show lower genetic diversity and high heterogeneity compared with non-Roma samples as a result of lower effective population size and extensive drift, consistent with a series of bottlenecks during their diaspora. We found a set of founder lineages, present in the Roma and virtually absent in the non-Roma, for the maternal (H7, J1b3, J1c1, M18, M35b, M5a1, U3, and X2d) and paternal (I-P259, J-M92, and J-M67) genomes. This lineage classification allows us to identify extensive gene flow from non-Roma to Roma groups, whereas the opposite pattern, although not negligible, is substantially lower (up to 6.3%). Finally, the exact haplotype matching analysis of both uniparental lineages consistently points to a Northwestern origin of the proto-Roma population within the Indian subcontinent.

Martinez-Cruz, et al., "Origins, admixture and founder lineages in European Roma" European Journal of Human Genetics (16 September 2015) doi:10.1038/ejhg.2015.201

Y-DNA I-P259 is also known as I1c and as I-M507 and is currently considered a "private" haplogroup exclusive to one individual, surname or group of closely related individuals.

Y-DNA J-M67 is also known as J2a1b and J-M92 is also known as J2a1b1.

A paper at the ASHG 2015 Conference also addresses this subject and is less definitive in its conclusions: B. Melegh, et al., "Refining the South Asian origin of the Roma people."

Purpose: Historical and linguistic studies have suggested that Roma people, living mainly in Europe, migrated into the continent from South Asia about 1000-1500 years ago. Genetic studies, based on the examination of Y chromosome and mitochondrial DNA data, confirmed these findings. Recent genetic studies based on genome-wide Single Nucleotide Polymorphism (SNP) data further investigated the history of Roma and, among many other findings, suggested that the source of South Asian ancestry in Roma originates mainly from the Northwest region of India.

Methods: In this study, using also genome-wide SNP data, we attempted to refine these findings using significantly larger amount of European Roma samples. We also had the opportunity to use more data of distinct Indian ethnic groups, which provided us a higher resolution of the Indian population. The study uses several ancestry estimation methods based on the algorithmic method principal component analysis and model-based methods that apply Bayesian approach and uses Markov chain Monte Carlo or maximum likelihood estimation.

Results: According to our analyses, Roma showed significant common ancestry with Indian ethnic groups of Jammu and Kashmir, Punjab, Rajasthan, Gujarat, Uttarakhand states, e.g. with Kashmiri Pandit, Punjabi, Meghawal, Gujarati and Tharu. However, we found strong common ancestry with Pashtun and Sindhi, ethnic groups living in Pakistan. Populations of Northeast India have also strong common ancestry with Roma. These ethnic groups are Brahmin, Kshatriya, Vaish.

Conclusion: We can conclude, that Northwest India plays an important role in the South Asian ancestry of Roma, but they have similarly strong ancestry with some Pakistani ethnic groups and we can find populations in the east region of North India, which also could function as a source of Indian ancestry of Roma. However, ethnic groups of the southern region of India do not show strong relationship with Roma people, living in Europe.

Wednesday, September 16, 2015

No Surprises In LHC Strong Force Coupling Constant Measurements

Background

In the Standard Model, the strong force coupling constant and the six quark masses are the only experimentally measured parameters of QCD, and the value of the strong force coupling constant is known much less exactly than the electromagnetic coupling constant and the weak force coupling constant.

Like the other two Standard Model forces, the strength of the strong force coupling constant in the Standard Model is a function of the energy scale of the interactions in which it is measured. Its strength peaks in the general vicinity of the energy scale of the rest mass of the proton and gets weaker at lower and at higher energy scales according to an exactly known formula whose precise terms depend upon the conventions of the renormalization scheme used to define the quark masses, of which there are several versions in widespread use (which are equivalent once operationalized in an experimental setting). It is customary to quote the strong force coupling constant at the value it would have at the energy scale of the Z boson mass (about 91.2 GeV/c^2).
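To make the idea of "running" concrete, here is a minimal sketch of the leading-order (one-loop) approximation to the running at scales well above the proton mass. It assumes five active quark flavors, ignores flavor thresholds and all higher-order terms, and uses function names of my own choosing, so it is only an illustration and should not be expected to reproduce the more careful figures quoted elsewhere in this post.

```python
import math

M_Z = 91.19  # approximate Z boson mass in GeV


def beta0(n_f=5):
    """Leading-order QCD beta-function coefficient for n_f active quark flavors."""
    return (33 - 2 * n_f) / (12 * math.pi)


def alpha_s_one_loop(q_gev, alpha_ref=0.1185, q_ref_gev=M_Z, n_f=5):
    """One-loop estimate of alpha_s at scale q_gev, given its value at q_ref_gev.

    Assumption: a fixed number of flavors and no higher-order corrections,
    so this is only meaningful well above roughly 1 GeV.
    """
    log_term = math.log(q_gev ** 2 / q_ref_gev ** 2)
    return alpha_ref / (1.0 + alpha_ref * beta0(n_f) * log_term)


# Print the approximate coupling at a few illustrative energy scales.
for q in (M_Z, 500.0, 1000.0, 1500.0):
    print(f"alpha_s({q:7.1f} GeV) ~ {alpha_s_one_loop(q):.4f}")
```

The only point of the sketch is the shape of the curve: above the Z boson mass the coupling falls slowly (logarithmically) as the energy scale rises.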

The New Data

The CMS and ATLAS experiments at the LHC released data today on their measurements of the strong force coupling constant (alpha S) in LHC Run 1, both normalized to strength at the Z boson mass using the running of the strong force coupling constant with energy scale prescribed by the Standard Model, and with raw measurements at energy scales that for the first time exceed 1 TeV.

The New Alpha S Measurements

All of the strong force coupling constant strengths normalized to the Z boson mass are consistent at a one standard deviation level with the Particle Data Group world average, a dimensionless 0.1185(6), although the consistency of the new measurements with the old world average is partially due to the fact that the CMS and ATLAS measurements have much greater margins of error than the world average.

All but one of the CMS and ATLAS measurements, however, were below the world average and one was identical to the world average with larger error bars, so the LHC data is going to pull down the world average to some value below 0.1185 in coming years, although how much is hard to tell. The Particle Data Group uses a weighted average in which more precise measurements count for more (the weights generally scale inversely with the square of each measurement's margin of error), so the high margin of error LHC measurements may not have all that much weight in the average.
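A rough sketch of why imprecise measurements barely move a precise average follows. It assumes a plain inverse-variance weighted mean of independent measurements, which is a simplification of what the Particle Data Group actually does for this parameter. The second data point is a made-up placeholder with a large error bar, not a real CMS or ATLAS result.

```python
import math


def weighted_average(measurements):
    """Inverse-variance weighted mean of (value, sigma) pairs.

    Returns the mean and its uncertainty, assuming the measurements
    are independent and have Gaussian errors.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)


# First entry: the world average quoted above, treated as a single measurement.
# Second entry: a HYPOTHETICAL low-side measurement with a large margin of error.
inputs = [(0.1185, 0.0006), (0.1150, 0.0050)]
mean, sigma = weighted_average(inputs)
print(f"combined value: {mean:.5f} +/- {sigma:.5f}")
```

With these placeholder inputs the combined value moves down only in the fifth decimal place, because the second measurement carries a weight roughly 70 times smaller than the first.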

New Measurements Of The Running of Alpha S

The running of the strong force coupling constant as measured by the LHC is also consistent with the Standard Model expectation, which constrains the parameter space of theories such as SUSY in which the running of the strong force coupling constant with higher energy scales differs significantly from the Standard Model expectation. This constraint isn't all that strict yet, however, for two reasons. First, because the margins of error in these measurements are great. Second, because these measurements are at energy scales only modestly above those of previous measurements, even though symbolically, breaking the 1 TeV energy scale barrier is a big deal.

How much precision would we need to get more than an interesting hint of SUSY at the LHC?

As I noted previously at this blog in January of 2014 (emphasis and material in the square brackets added):

The strong force coupling constant, which is 0.1184(7) at the Z boson mass, would be about 0.0969 at 730 GeV and about 0.0872 at 1460 GeV, in the Standard Model and the highest energies at which the strong force coupling constant could be measured at the LHC is probably in this vicinity.

In contrast, in the MSSM [i.e. the Minimal Supersymmetric Model], we would expect a strong force coupling constant of about 0.1024 at 730 GeV (about 5.7% stronger) and about 0.0952 at 1460 GeV (about 9% stronger).

Current individual measurements of the strong force coupling constant at energies of about 40 GeV and up (i.e. without global fitting or averaging over multiple experimental measurements at a variety of energy scales), have error bars of plus or minus 5% to 10% of the measured values. But, even a two sigma distinction between the SM prediction and SUSY prediction would require a measurement precision of about half the percentage difference between the predicted strength under the two models, and a five sigma discovery confidence would require the measurement to be made with 1%-2% precision (with somewhat less precision being tolerable at higher energy scales).
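The arithmetic behind those percentages can be checked with a few lines of code. This is a back-of-the-envelope sketch that assumes a single measurement with Gaussian errors, measures the gap relative to the SM value, and takes the precision needed for an n sigma separation to be roughly the gap divided by n. The function names are mine.

```python
def percent_excess(sm_value, mssm_value):
    """Percentage by which the MSSM prediction exceeds the SM prediction."""
    return 100.0 * (mssm_value - sm_value) / sm_value


def required_precision(percent_gap, n_sigma):
    """Rough percentage precision needed to separate two predictions by n_sigma."""
    return percent_gap / n_sigma


# Energy scale in GeV -> (SM prediction, MSSM prediction), as quoted above.
predictions = {730: (0.0969, 0.1024), 1460: (0.0872, 0.0952)}

for energy, (sm, mssm) in predictions.items():
    gap = percent_excess(sm, mssm)
    print(
        f"{energy} GeV: gap {gap:.1f}%, "
        f"2 sigma needs ~{required_precision(gap, 2):.1f}%, "
        f"5 sigma needs ~{required_precision(gap, 5):.1f}%"
    )
```

Under these assumptions the five sigma requirement comes out between 1% and 2% at both energy scales, in line with the range stated in the quoted passage.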

Total uncertainty in the latest measurements is 3.5%-5.5% in one of the measurements, 4.7% in another, 6%-15% in a third (with uncertainty in energy scale measurements dominating), 4%-6% in a fourth, and 2.4% in a fifth. These error bars, in general, are somewhat smaller than anticipated, but are still not small enough to definitively distinguish between the SM and SUSY predictions.

The world average measurement of the strong force coupling constant normalized to the Z boson mass has an uncertainty of less than 0.6%, but that requires aggregating many independent measurements. This isn't possible when trying to determine the running of the strong force coupling constant at energy scales that only the LHC can reach.

Hints So Far

At a bit over 1 TeV the MSSM expectation for the strong force coupling constant is about 7-8% stronger than the Standard Model expectation. Yet, so far, the high energy measurements of the running of the strong force coupling constant have been weaker than the quite precisely measured world average value of this Standard Model parameter.

This is the opposite of what we would expect if SUSY were an accurate description of Nature.

The LHC data so far favor the SM hypothesis relative to the SUSY hypothesis, although not yet to a statistically significant degree (realistically somewhat less than 2 sigma). Look elsewhere effects do not reduce the significance of this result any further, however, because this is basically a single combined measurement from both experiments and there is no comparable measurement anywhere else at the LHC.

The error bars would have to be about 2.5 times smaller to rule out the MSSM expectation and to greatly constrain SUSY parameter space. But, only some of the simplifications of SUSY theories present in the MSSM are relevant to determining the terms of the beta function that governs the running of the strong force coupling constant in SUSY theories. So, this constraint, when and if it is established, will have much broader applicability than many of the other constraints on SUSY parameter space that are derived in a model dependent way using the MSSM.

Prospects For LHC Run 2

Measurements of the running of the strong force coupling constant at even higher energies and with smaller margins of error will be one of the most important experimental results to watch at LHC Run 2 because it has the potential to discriminate between SUSY and SM predictions at a lower energy scale than the energy scale at which new particles would be discovered in SUSY theories with the same parameters. We should almost surely see anomalies in the running of the strong force coupling constant before we definitively discover new particles, because changes in the running of the strong force coupling constant should manifest at lower energy scales.

LHC Run 2 may provide less insight, however, into the absolute value of the strong force coupling constant because even quite small differences between the measured values of the strong force coupling constant at high energies translate into quite big differences in the value of the strong force coupling constant once they are normalized to the Z boson mass according to the Standard Model formula for the running of the strong force coupling constant. In other words, errors at high energy scales are magnified when measurements are converted to lower energy scale equivalents.
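The magnification can be illustrated with a small numerical experiment. The sketch below again assumes the leading-order (one-loop) running formula with five quark flavors, which is cruder than the calculations the experiments use, and the 5% error at 1 TeV is an illustrative figure rather than a reported one.

```python
import math

M_Z = 91.19  # approximate Z boson mass in GeV
B0 = (33 - 2 * 5) / (12 * math.pi)  # one-loop coefficient, assuming 5 flavors


def evolve(alpha, q_from, q_to):
    """Evolve alpha_s from scale q_from to scale q_to (GeV) at one loop."""
    return alpha / (1.0 + alpha * B0 * math.log(q_to ** 2 / q_from ** 2))


def error_at_mz(alpha_q, sigma_q, q):
    """Translate a value and its error at scale q into the equivalent at M_Z."""
    central = evolve(alpha_q, q, M_Z)
    high = evolve(alpha_q + sigma_q, q, M_Z)
    low = evolve(alpha_q - sigma_q, q, M_Z)
    return central, (high - low) / 2.0


# Illustrative inputs: the coupling implied at 1 TeV, with an assumed 5% error.
alpha_tev = evolve(0.1185, M_Z, 1000.0)
sigma_tev = 0.05 * alpha_tev
alpha_mz, sigma_mz = error_at_mz(alpha_tev, sigma_tev, 1000.0)

print(f"at 1 TeV: {alpha_tev:.4f} +/- {sigma_tev:.4f}")
print(f"at M_Z  : {alpha_mz:.4f} +/- {sigma_mz:.4f}")
print(f"absolute error grows by a factor of ~{sigma_mz / sigma_tev:.2f}")
```

In this approximation the absolute error grows by roughly the square of the ratio of the coupling at the Z boson mass to the coupling at the high energy scale, so the magnification gets worse the higher the energy at which the raw measurement is made.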

Unless the scientists at CERN can make some breakthroughs in reducing systematic error and in particular in reducing uncertainty in energy scale determinations in QCD events, however, this kind of data is unlikely by itself to produce a breakthrough. All it can provide are strong hints, and so far, those strong hints favor the Standard Model rather than beyond the Standard Model physics.

Tuesday, September 15, 2015

ASHG 2015 Abstracts Available

Razib has noted some of the most interesting conference paper abstracts from the ASHG 2015 conference.

Eurogenes has focused on the most interesting of the lot, describing 34 ancient NW Anatolian genomes from 6300 BCE which look just like Early European LBK and CP farmers. The paper is I. Lazaridis, et al. "Genome-wide data on 34 ancient Anatolians identifies the founding population of the European Neolithic." (2015). This strongly supports the current paradigm regarding the origins of the first wave Neolithic farmers of Europe developed based upon other data. I discussed this paper in some depth earlier at this blog, when only its title and not its abstract was available.

But, there are dozens of other very noteworthy papers in the ASHG 2015 abstract collection that Razib has posted.

Eurogenes also has noted an interesting PNAS paper on the migrations of people from the seminal Kura-Araxes culture of the South Caucasus mountains which may have been one of the primary sources of the Bronze Age technological package of Europe, West Asia, the Levant and beyond.

I was out of town on business Sunday and Monday, and will be out of town in court for a client on Wednesday, but I'll try to give these more attention when I have time.

Thursday, September 10, 2015

New Species of Genus Homo Announced

The long awaited results of a major fossil find in South Africa are out. Researchers discovered a new extinct species from the genus Homo, a genus that also includes modern humans. In classic evolutionary fashion, it is a transitional species with both modern and archaic features.

Overall, members of this species are similar in size to shorter modern humans.

The species has teeth similar in size to those of modern humans, but a much smaller brain.

Homo naledi is a previously-unknown species of extinct hominin discovered within the Dinaledi Chamber of the Rising Star cave system, Cradle of Humankind, South Africa. This species is characterized by body mass and stature similar to small-bodied human populations but a small endocranial volume similar to australopiths. Cranial morphology of H. naledi is unique, but most similar to early Homo species including Homo erectus, Homo habilis or Homo rudolfensis. While primitive, the dentition is generally small and simple in occlusal morphology. H. naledi has humanlike manipulatory adaptations of the hand and wrist. It also exhibits a humanlike foot and lower limb. These humanlike aspects are contrasted in the postcrania with a more primitive or australopith-like trunk, shoulder, pelvis and proximal femur. Representing at least 15 individuals with most skeletal elements repeated multiple times, this is the largest assemblage of a single species of hominins yet discovered in Africa.

Lee R. Berger, et al., Homo naledi, a new species of the genus Homo from the Dinaledi Chamber, South Africa (September 10, 2015). DOI: http://dx.doi.org/10.7554/eLife.09560.001

National Geographic, which provided some of the funding for the expedition, tells the story in a way directed at a more general audience.

About National Geographic

Incidentally, in an unfortunate development, National Geographic, which has since 1888 been the flagship publication of a leading non-profit publisher of geographic inquiry, was converted yesterday into a for-profit venture owned 73% by 21st Century Fox, part of Rupert Murdoch's media empire, which also includes the conservative Fox News outlet, in exchange for $725 million. It is not obvious that conservative political activist Rupert Murdoch, who also owns one of the most anti-science and anti-geography news sources on the planet, will be a good steward of the fabled and highly credible National Geographic brand that was developed while it was a science-first non-profit.

The economics driving the deal, however, are obvious. Circulation of National Geographic, which reached 12 million in the 1980s, has fallen to 3.5 million in the U.S. and 3 million in non-English language editions, and advertising revenues have plunged.

The partnership was in some ways obvious. Fox has run the National Geographic cable channels since 1997, which together with some smaller affiliated TV channels produce $400 million a year of profit for the joint venture (it isn't clear what part of that take goes to the National Geographic Society whose endowment now exceeds $1 billion as a result of this deal). But, National Geographic journalists dread the future under the new management and aren't pleased with what some of the content on the TV channels suggests about the future of the magazine. Imminent cost cutting initiatives under the new venture also have magazine employees and freelancers worried.

Tuesday, September 8, 2015

New Northern Iberian Ancient DNA Doesn't Shake Basque Bell Beaker Hypothesis

On November 4, 2011, in a post at this blog, I laid out some thoughts about the origins of the Basque people in Europe that I've developed somewhat over the years, and that I still believe form the most plausible narrative to explain the facts. A newly released, open access PNAS paper with eight ancient DNA samples from the Copper Age and early Bronze Age in what is now Basque Country may tweak this hypothesis, but doesn't seriously undermine it. (See also the Supplemental Materials.)

In my view, the Bell Beaker culture was a linguistically Vasconic source of Y-DNA R1b in Europe. The archaeology reported in the new PNAS paper is consistent with the hypothesis that these people (including two males who were not Y-DNA R1b) are pre-Basque people who lived where the Basques live now, not actual culturally Basque individuals. The introduction of the PNAS paper explains the archaeological context (citations and references omitted):

We investigated the remains of eight individuals from the Chalcolithic and Bronze Age periods excavated from the cave of the El Portalón de Cueva Mayor, of the Sierra de Atapuerca—a site with a remarkably rich archaeological record, with human occupation from the Paleolithic to the historical period. The human remains were associated with offerings, such as domestic animals and pottery vessels corresponding to the pre-Bell Beaker culture, and were directly radiocarbon-dated to between ∼5,500 (Chalcolithic) and ∼3,500 cal yBP (Bronze Age). Seven of the burials contained fragmentary human remains whereas one burial was a near-complete skeleton of a male child showing signs of chronic malnutrition.

The harder question, however, is the strong autosomal similarity between these people and modern Basques, who have somewhat more European hunter-gatherer component, and somewhat less Early European farmer component than these individuals, but only modestly so, and don't seem to have any ancestral components not found in some proportion in Early European farmers.

How could the Y-DNA landscape of Western Europe change so dramatically, while making such a negligible change to the autosomal and mtDNA mix?

The Basque ethnicity, in my Bell Beaker hypothesis, probably had their ethnogenesis around 2900 BCE in Southern Portugal, with a migrant population drawn to the area's rich resources of copper and tin from someplace far to the East of Iberia (probably ultimately from the general vicinity of the Southern Caucasus Mountains give or take 200 km or so), arriving either by land or by sea. Either initially, or as they expanded, they then married a succession of local women, some descendants of first wave Cardial Pottery farmers, some descendants of European hunter-gatherers, in an ever expanding pool of local women as the men expanded their horizons to the frontier of Atlantic and Western Europe. Socially, sons of Bell Beaker men were favored and assumed the reins of local megalithic farming communities, while daughters of Bell Beaker men were absorbed into the general gene pool.

In my view, the modern day Basques were relatively late on the scene in modern Basque country (ca. 2500 BCE) and probably arrived from France, but unlike fellow members of their linguistic community in Western Europe, Central Europe, and Northern Europe, were not overrun by Indo-Europeans around the time of Bronze Age collapse (starting ca. 1300 BCE), for a variety of reasons.

By the time that the Indo-Europeans arrived, 50 generations or so of introgression of local women into Bell Beaker patrilineages diluted the Bell Beaker autosomal contribution and mtDNA contribution to an imperceptible level, while causing Y-DNA R1b which was predominant (but probably not universal) in the founding population of Bell Beaker men, to become the most common Y-DNA type in Europe. And, 14 generations or so would have passed between Basque ethnogenesis in Southern Portugal and their arrival in modern Basque Country.

The cultural and Y-DNA descendants of the Bell Beaker people who were the most important contributors to Vasconic ethnogenesis in Portugal ca. 2900 BCE were, by the time that they reached modern day Basque country, so genetically diluted by contributions from local women that they were autosomally similar to other Iberians, except that they had less North African influence because more of their ancestors were from further North in Western Europe where there was no possibility for trace North African influences to penetrate the gene pool.
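How quickly this kind of dilution happens can be seen in a toy calculation. The sketch below assumes a strictly patrilineal line in which each generation's wife is either entirely local or has the same founder ancestry as her husband. The `local_bride_share` parameter is a hypothetical knob of mine, not an estimate from any study. The Y-DNA passes unchanged from father to son, while the founder's autosomal share shrinks each generation.

```python
def founder_autosomal_fraction(generations, local_bride_share=1.0):
    """Expected autosomal share from the founding men after n generations.

    Toy model: each generation, a fraction local_bride_share of wives carry
    no founder ancestry and the rest carry as much as their husbands do.
    """
    if not 0.0 <= local_bride_share <= 1.0:
        raise ValueError("local_bride_share must be between 0 and 1")
    per_generation = 1.0 - local_bride_share / 2.0
    return per_generation ** generations


# The 14 and 50 generation spans are the ones mentioned in the text above.
for n in (1, 5, 14, 50):
    share = founder_autosomal_fraction(n)
    print(f"after {n:2d} generations: {share:.2e} of autosomal ancestry")
```

In the extreme case where every wife is local, the founder share halves each generation and falls below a hundredth of a percent within 14 generations, even though every man in the line still carries the founders' Y-DNA.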

There is some hope that we may before too much longer have more data to explain the contradictions with evidence from the same site. The Supplemental Materials note that:

43 additional human bone fragments have been recovered in Middle Bronze Age levels at the space known as the Salón del Coro or Galería Principal, which is also part of the El Portalón site itself

The Supplemental Materials have this to say regarding historical linguistics:

The Neolithic cultures appear in the Iberian Peninsula around 7,500 cal BP as a result of dispersal of human groups along the Mediterranean coastal areas and (eventually) visible as the Cardial culture in the western part of the Mediterranean. According to [104] and others, the dispersal of the Neolithic communities was related to the spread of Indo-European languages to Europe. However, this model (the Anatolian Hypothesis) coexists with a number of competing models. In particular, other models have placed the origin of Indo-European languages (or Proto-IndoEuropean) in the East, North of the Caspian and Black Seas, and in a chronologically younger period (often termed the Steppe Hypothesis). The recently confirmed eastern migration of human groups (linked to the Yamnaya group) into Europe around 4500 cal BP has been interpreted as evidence for the Steppe Hypothesis, but the population movement is also consistent with a secondary expansion under the Anatolian Hypothesis.

The linguistic implications of this are still under debate. There are a number of different linguistic scenarios consistent with these genetic results, although as always it is difficult to associate genetic information confidently with archaeological groups or language families. The Basque language (Euskara) is a linguistic isolate, and is believed to be the last surviving pre-Indo-European language in Western Europe. The only known precursor to Basque is Aquitanian, reported during the Roman empire and spoken in southwest Gaul, the Pyrenees, and some adjoining areas. This language is clearly related to Basque, but is probably a relative rather than the direct ancestor. Basque has always been a magnet for extravagant linguistic speculation, but the hypothesis of Paleolithic roots of Basque is most widely accepted on the grounds of explanatory parsimony, and in the absence of adequate evidence for other hypotheses.

One intriguing suggestion is that the Basque language exhibits similarities to the pre-Roman language of Sardinia (Paleosardo) based on, for example, place-names on Sardinia. The number of linguistic forms is small, but this is particularly interesting given Sardinians and Basques are the two modern populations with the highest genetic proportion of early farmer ancestry. Contacts between Iberia and Sardinia in the Neolithic are indicated by recent studies of Obsidian artifacts, facilitated by maritime (and coastal) movement. This suggests the Basque might be the remnant of a much larger Vasconic speaking area, suggesting the possibility that this language family spread along with the first farmers. If so it would be tempting to suppose that it was the only language of the first farmers, which would support the Steppe Hypothesis of Indo-European origins over the Anatolian Hypothesis.

Language isolates are however not uncommon outside of Europe. Of the approximately 350 language families in the world, 121 are isolates. The existence of such isolates is not really surprising given the highly skewed distribution of linguistic diversity, and isolates are sporadically encountered embedded within the ranges of most large language families worldwide. Within the Indo-European languages, the Greek, Armenian and Albanian subgroups are also (near) isolates, consistent with the prediction of the Anatolian Hypothesis that the center of linguistic diversity in Europe would coincide with the entry points of the first farmers (and not contradicted by [68]). It is not implausible that Basque is an indigenous language that expanded in place after adoption of agriculture, or that Basque entered Europe alongside these other Indo-European languages. There is some hope that advances in Proto-Basque reconstruction will shed light onto these issues. Proposals of linguistic similarities between the Basque and other languages must however be evaluated with caution.

For what it is worth, the Anatolian Hypothesis isn't remotely credible given the facts as we know them today. And, it is almost certain that Basque is the remnant of a much larger Vasconic speaking area at some point in the past.

In my view, the real hard question is not between the Steppe Hypothesis and Anatolian Hypothesis of Indo-European language origins, which is largely resolved, or between a Paleolithic and post-Paleolithic origin for the Basque languages.

The hard question, one which the Supplemental Materials don't quite seem to grasp, is whether Basque is a language associated with the Early European Farmers of Europe, or the period of time starting with the Bell Beaker culture's appearance and ending with Bronze Age collapse. Neither position, by the way, is inconsistent with an apparent link between Paleosardo and Basque as Sardinia also had a Bell Beaker period during which there was maritime trade between the island and the nearby mainland. Literate history post-dates the beginning of the Bell Beaker period in Sardinia, and all oral historical and naming conventions had to pass through the Bell Beaker period in Sardinia from the early Neolithic to reach the present.

Secondary Hunter-Gatherer Admixture

The increase in European hunter-gatherer proportion relative to the first wave of farmers (whose hunter-gatherer component shows more affinity to Hungary than to Western Europe), as these individuals indicate, may have come from new infusions of local hunter-gatherers into the gene pool during the collapse of the first wave of farming.

This new data strongly favors the idea that the first wave of European farmers were formed somewhere to the Southeast of Hungary and probably either in Anatolia or just beyond it, and fused at about a 60-40 ratio with local hunter-gatherers (with the farmer contribution disproportionately from men and the hunter-gatherer contribution disproportionately from women).

Support for a basically common first wave of hunter-gatherer admixture into European farmer populations before they went on to have additional local hunter-gatherer admixture is found in the following excerpt from the Supplemental Materials (citations and references omitted):

In this case we used – as others before - KO1 (individual found in a farming context) as a proxy for Hungarian hunter-gatherers since he grouped with Mesolithic individuals in all other analyses. The highest proportion of Mesolithic ancestry in the Portalón individuals seems to be related to central European hunter-gatherers (KO1, Loschbour) and not to the geographically close LaBrana (several |Z|>2).

Central European farmers (CO1, Iceman, NE1) exclude only Mesolithic Scandinavians (Motala12) as a possible source so it seems likely that their admixture happened in Central Europe as well.

Surprisingly, Mesolithic Scandinavians (Motala12) are excluded as a possible source of admixture into the Scandinavian farmer Gok2 whereas all other hunter gatherer groups (including the Neolithic Scandinavian Ajv58) are consistent with the data. This suggests multiple admixture events into Scandinavian farmers which happened in different parts of Europe.

However, we note that the currently available data does not allow us to detect a strong population structure in Mesolithic Europe. Only Scandinavia seems to be an outlier from a relatively uniform Mesolithic population.

After this Early European Farmer ethnogenesis, these Early European Farmers probably migrated across Europe on a mostly endogamous basis until the first wave of farming collapsed at various times in various places, leading to a partial reversion to hunting and gathering and substantial introgression of hunter-gatherer populations into their communities (probably 15%-35% of total ancestry, with higher percentages on the frontiers and lower percentages in Central Europe), until this ceased when farming recovered in the Bronze Age. The modeling in the new study suggests a 30% introgression of European hunter-gatherers similar to the ancestor of two hunter-gatherers from whom we have ancient DNA, one from Iberia and one from Luxembourg. Otzi the Iceman appears to have a 19% introgression from a Scandinavian hunter-gatherer, while a Scandinavian first wave farmer appears to have a 33% introgression from a Scandinavian hunter-gatherer.

The closest extant populations to hunter–gatherers from Iberia, Scandinavia, and Central Europe are Northern Europeans; however, the hunter–gatherers fell outside the range of modern-day European genetic variation. In contrast, early farmers from Iberia, Scandinavia, and Central Europe grouped with modern-day Southern Europeans, consistent with outgroup f3 statistics. These results demonstrate that early European farmers, including those in Iberia, emerged from a common group of people. This observation indicates that farming was brought to Iberia via migration, similar to the process in Scandinavia and Central Europe.

Chalcolithic farmers (Iberian ATP2 and the Tyrolean Iceman) and Scandinavian Neolithic farmers (Gok2) traced a substantial amount of their genetic ancestry to European HG groups, in contrast to the earliest farmers of Central Europe (NE1 and Stuttgart), and this increase in HG admixture across Europe was significant as a function of time (R^2 = 0.69, P = 0.001). The best fitting source for the HG admixture into the El Portalón individuals was the common ancestor of the nearby La Braña Mesolithic individual and a Mesolithic individual from Luxembourg (Loschbour) whereas contemporary farmers from Central Europe (Iceman) and Scandinavia received their (best-fit) HG admixture from Scandinavian hunter–gatherers.

These inferred admixture events demonstrate that different farmer populations had different HG groups as the best proxy for the source of admixture (D-tests showed similar results of multiple admixture events in different parts of Europe). These analyses showed that, whereas early farmers—who were likely more numerous than the hunter–gatherers —spread across Europe, they assimilated HG populations, a process that continued locally for several millennia.

This discussion continues in the Supplemental Materials:

Chalcolithic and Scandinavian Neolithic farmers (ATP2, Gok2, Iceman, which are all dated to approx. 5000 BP) seem to harbor a higher proportion of hunter-gather related ancestry than the first Neolithic farmers of central and eastern Europe (Stuttgart, NE1).

This is, of course, perfectly consistent with the hypothesis that substantial new, local hunter-gatherer introgression into farming populations occurs after the universally experienced first collapse of farming among first wave European farmers, although this doesn't happen at exactly the same time in each place.

The New Data Points

All eight of the ancient DNA samples produced mtDNA haplogroups, but only four produced autosomal genetic profiles and only two of those were men. The other two men had only 3% genome coverage so Y-DNA haplotyping was not possible. Three of the autosomal samples were Copper Age, while one was a woman from the early Bronze Age.

The Y-DNA Data

Of the two men for whom Y-DNA profiles could be determined, one had Y-DNA I2a2a (dated to about 2,960-2,829 BCE), and the other had Y-DNA H2 (dated to about 2,849-2,628 BCE).

Y-DNA I2 appears to have been predominant among European hunter-gatherers, but is not terribly uncommon in early European farmers. The Supplemental Materials have this to say about this individual's Y-DNA after discussing the specific genetic markers used to classify this Y-DNA sample (citations and references omitted):

While the almost European-specific haplogroup I arose approximately 20000 to 25000 years ago, haplogroup I2a2a may have diverged as a subclade, around 15000 years ago, possibly during the recolonization of Europe following the Last Glacial Maximum (LGM). Unlike the more common subclades of I1 and I2a1, haplogroup I2a2a appears at relatively low frequencies across much of Europe. Its highest levels (10-12%) are found in modern-day Germany and the Netherlands, with frequencies of around 5%, notably occurring in parts of modern-day France as well as Mordvin in the Volga region of central Eastern Europe. Other members of haplogroup I have been discovered previously in ancient individuals; e.g. I* in Mesolithic Scandinavians, I1 in Hungary, I2a in Neolithic individuals from Hungary and France, I2a1 in Neolithic Croatia and a late hunter-gatherer from Sweden, and I2a1b in Mesolithic individuals from Luxembourg and Sweden.

Other haplogroups found among ancient specimens include C* in Upper Paleolithic Russia and Mesolithic Spain, C6 in Neolithic Hungary, E1b1b1 in Neolithic Spain, F* in Neolithic Germany and Neolithic Hungary, G2 in Neolithic Hungary, G2a in Neolithic France, Neolithic Germany, Neolithic Hungary, Chalcolithic Italy, and Neolithic Spain, J2a1 in Bronze Age Hungary, K (xLT) in Upper Paleolithic western Siberia, N in Iron Age Hungary, R* in south central Upper Paleolithic Siberia, R1a in Neolithic Germany, and Neolithic R1b in Germany.

The citation for the "Neolithic R1b in Germany" is Lee EJ, Makarewicz C, Renneberg R, Harder M, Krause-Kyora B, Müller S, et al. Emerging genetic patterns of the european neolithic: Perspectives from a late neolithic bell beaker burial site in Germany. American Journal of Physical Anthropology. 2012;148: 571–579. doi:10.1002/ajpa.22074 The Bell Beaker context makes clear that it wasn't really Neolithic as opposed to Copper Age or Bronze Age.

The H2, as discussed further below, is rare but has a generally West Eurasian non-hunter-gatherer distribution.

The big story there is that Y-DNA R1b is absent from these two individuals despite the fact that modern Basque people have one of the highest percentages of R1b in Europe.

The mtDNA Data

On the mtDNA side:

The eight individuals, genetically inferred to be four males and four females, carried mtDNA haplogroups associated with early farmers of Europe (e.g., haplogroups K, J, and X), with hunter–gatherers (e.g., haplogroup U5), or with both groups (e.g., haplogroup H).

The Supplemental Materials have a nice further analysis of the mtDNA findings in the context of the larger literature (citations and references omitted):

All eight individuals from Atapuerca displayed unique haplotypes. The most abundant haplogroup, U5, was found in three temporally non-overlapping individuals. Two belonged to subtypes of U5b (U5b3 and U5b1b) and one belonged to U5a (U5a1c).... Two individuals belonged to H3. They were dated to within the same time-frame but were not maternally related as one of them carried a T to C transition at np 12957 classifying it to H3c. The remaining three individuals belonged to the haplogroups J, K and X (J1c1b1, K1a2b and X2c).

The mitochondrial lineages of the ATP individuals show a heterogeneous ancestry and can be traced back both to hunter-gatherer (HG) and subsequent farmer contexts. The most frequent haplogroup in ATP, U5, is commonly found in HG groups in Iberia and across Europe and Scandinavia. U5 subhaplogroups are also found in Neolithic farmer populations in Europe although at lower frequencies. The remaining four haplogroups found in ATP, H, J, K and X, are present in other farmer populations from the Neolithic and onwards. In southern Europe (e.g. Spain, Portugal and Italy), however, haplogroup H is also frequent in Paleolithic and Mesolithic HG populations.

Even though some haplogroups (U5b and H) are shared between ATP and HGs from Mesolithic Iberia (southern hunter-gatherers SHG), the general haplogroup composition between the groups differ, similar to the differences between other farmer and HG populations in Europe. None of the previously investigated Neolithic farmer populations from Iberia have similar haplogroup distribution as ATP. These farmer groups also differ from each other. Analysis of haplogroup frequency data have for example shown that early Neolithic north-eastern Iberian populations cluster with early- and middle Neolithic populations from central Europe while other Neolithic Iberian populations (from Basque Country and Navarre,NBQ and Portugal, NPO) share a closer affinity to HG populations. NBQ is the population that share the largest number of haplogroups with ATP (X, H, J, U5b and K although the frequencies differ and NBQ also display additional haplogroups (U, T, HV and I). The Chalcolithic individuals from El Mirador (MIR), a cave located in the same mountain system as ATP (Sierra de Atapuerca), present a somewhat different haplogroup distribution than ATP. MIR clusters with early Neolithic Iberians and early and middle Neolithic central European populations. They lack the U5 subhaplogroups found in ATP and instead display T2 and U3. RFLP data from another Chalcolithic population from the Basque Country show the same main haplogroups as found in ATP and MIR (38% H, 17% U, 13% J, 21% K and 9% T+X), although the lower resolution of the data cannot specify which population (ATP or MIR) that it is most similar to. It has further been suggested that the mt-haplogroup composition of Basque populations differs between Chalcolithic and historical times (600-700 AD) with increasing frequencies of H and V haplotypes and with increasing similarities to present-day western European populations.

The picture of the ancient farmers in Iberia remains unresolved and the limited level of information retrieved from mitochondrial DNA has not been able to go beyond the above described observations. Present-day European populations are genetically quite homogenous in terms of mitochondrial haplogroup distributions and it is mainly haplogroup frequency differences that separate different populations. It is therefore not a straightforward process to assess the potential connections between ATP and specific present-day populations. We note that the most abundant lineages in the ATP individuals are found in higher frequencies in some Basque-speaking populations (U5b and H3) than in other European populations. Further, several haplotypes have been suggested to be autochthonous to present-day Basque populations. Two of these are J1c1, a lineage ancestral to the J2c1b1 haplotype in ATP7, and H3c2a, a lineage that derives from the H3 and H3c haplotypes found in ATP17 and ATP12-1420.

The whole genome data

In autosomal genetics, these four individuals cluster together in both PCA and a ten population admixture analysis.

Like modern Basque persons, they have essentially no North African component and no Caucasian/Central Asian component. Other modern Spanish populations have some trace North African component (perhaps 1-3%); the time depth of this component isn't entirely clear, and much of it could be from as late as the Moorish era in Spain. Modern French populations (except some people from Southern France) and modern Spanish populations (apart from the Basque) also all have small Caucasian/Central Asian components (perhaps 2%-10%), probably due to Indo-European migration into the region in the Iron Age.

These individuals, like modern Basque persons and Sardinians and ancient DNA from other first wave early European farmers are a mix of two admixture components: European hunter-gatherer (which is pretty much the sole component of ancient DNA from European hunter-gatherers) and Early European Farmer. Sardinians and first wave European farmers have the highest percentage of Early European farmer (more than half). Basque people have similar amounts of Early European farmer to other Spanish and French people. These four individuals have an intermediate amount of Early European farmer, suggesting hunter-gatherer introgression beyond the portion that went into the ethnogenesis of the LBK and Cardial Pottery farmers.

The PCA chart in the paper is a bit of a puzzle. PC1 clearly represents a hunter-gatherer to early European farmer proportion continuum. But, PC2 is harder to make sense of. Most European populations are at roughly the same spot on PC2 as the Sardinians and the ancient DNA from first wave farmers across Europe.

But, European hunter-gatherers and the Basque are significantly to the left and many Spanish people lean in that direction on PC2, while Cyprus and Malta are strongly to the right on PC2. It might be some Middle Eastern contribution which was present in modest amounts in early European farmers, is present at higher levels in island populations near the Middle East, and is absent in European hunter-gatherers.

But, if the Basque simply diluted this "Middle Eastern" PC2 contribution with European hunter-gatherer contributions, then they would be much higher on PC1 than they are in fact. This suggests some sort of "anti-Middle Eastern" farmer contribution in Basque and Spanish farmers that counteracts the pull of the Middle Eastern tendency of the early European farmer contribution, perhaps one too much like the early European farmer contribution to be distinct from it in a K=10 ancestry analysis, but which might pop out at a higher number of ancestral populations.

The Supplemental Materials have this to say about additional PCA Analysis done that was not discussed in the main paper (citations and references omitted):

We repeated this analysis including North African populations in order to look for any additional component of the Iberian farmers to modern North Africans. PC1 separates North Africans from Europeans while PC2 seems to be correlated with the amount of Near Eastern ancestry. All ancient samples line up along this gradient with one extreme in Druze and the other in Mesolithic Europeans. Farmers from El Portalon and Sweden are slightly shifted towards hunter-gatherers in comparison to central European farmers. The PCA suggests no additional North African ancestry in any of the ancient farmers.

An additional PCA including modern populations from the Caucasus was conducted since ADMIXTURE results suggest some Eastern ancestry in some samples. PC1 correlates largely with Near Eastern ancestry with Druze and Mesolithic Europeans as the two extremes. PC2 has Sardinians and Tajiks as extremes suggesting some correlation with longitude. Ancient farmers group around Sardinians and the Chalcolithic El Portalon individuals form a line between Sardinians and Basques whereas central European farmers are shifted towards Near Eastern populations. There is no specific affinity to modern-day Caucasian populations for any of the ancient individuals.

Another discussion of potential North African genetic ties from the Supplemental Materials is here (citations and references omitted):

The geographic proximity of Iberia to Northern Africa opens up possibilities to migrations across the Strait of Gibraltar. In fact, farming reached Northern Africa and Southern Spain long before Northern Iberia, and modern Iberian populations show a significant proportion of North African ancestry. Admixture estimates and outgroup f3 statistics do not support a strong contribution of North African populations to the individuals of El Portalón.

Modern-day North Africans are highly admixed with contributions from Europe, sub-Saharan Africa, the Near East and Neandertals, and the level of admixture vary among groups. In order to avoid other components in reference populations from confounding the D-tests, we assume that all early European farmers contain the same Near Eastern component (which is also found in North Africa to some degree) and conduct D-test in the form of (Mbuti, modernday North African; ancient farmer 1, ancient farmer 2).

We use Mozabite, Saharawi, Algerian, Tunisian and Burbur as representatives of modern-day North African populations since a particular ancestry component (the ‘North African component’) is maximized in these groups in the admixture analysis.

These analyses demonstrate that ATP2 and ATP12-1420 have similar genetic affinities to North Africans as Central European early farmers have.

However, ATP16 shows higher affinities to North Africa than other ancient farmers, suggesting that there was at least some contribution from North Africa ~5,000 years ago (in one out of eight Portalón individuals).

Surprisingly, ATP9 shows the lowest North African affinity of all ancient farmers. Since ATP9 also represents the youngest individual (Bronze Age) in the analysis, we suspect that this is the result of increased admixture with other European groups in the Bronze age, which contained less North African or Near Eastern ancestry. Generally, genomic data from Neolithic North Africans is needed to solve the question whether there was a strong Neolithic African contribution to the Iberian Neolithic population.
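For readers who have not met the D-test used in the passage above, here is a minimal Python sketch of how a statistic of the form D(A, B; X, Y) can be computed from per-site allele frequencies, using the standard frequency-based estimator. The function name and the toy frequencies are hypothetical and chosen only for illustration; the real analysis uses genome-wide data and a block jackknife to assess significance, which is omitted here.

```python
def d_statistic(a, b, x, y):
    """Frequency-based D(A, B; X, Y).

    Each argument is a list of allele frequencies (0.0 to 1.0) for the same
    allele at the same sites, e.g. A = Mbuti, B = a North African group,
    X and Y = two ancient farmers.
    """
    if not (len(a) == len(b) == len(x) == len(y)):
        raise ValueError("all populations must cover the same sites")

    numerator = 0.0
    denominator = 0.0
    for pa, pb, px, py in zip(a, b, x, y):
        # Positive contribution when B and Y (or A and X) share the allele.
        numerator += (pa - pb) * (px - py)
        # Normalising term: sites fixed in either pair contribute nothing.
        denominator += (pa + pb - 2 * pa * pb) * (px + py - 2 * px * py)

    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


if __name__ == "__main__":
    # Toy data only (hypothetical): four sites, outgroup fixed for the
    # ancestral allele, North African group fixed for the derived allele.
    outgroup = [0.0, 0.0, 0.0, 0.0]
    north_african = [1.0, 1.0, 1.0, 1.0]
    farmer_1 = [1.0, 0.0, 1.0, 0.0]
    farmer_2 = [0.0, 1.0, 0.0, 1.0]
    print(d_statistic(outgroup, north_african, farmer_1, farmer_2))
```

With this sign convention, a value near zero means the two farmers are symmetrically related to the North African group, while a positive value means the second farmer shares more alleles with it than the first does.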

Phenotype and Inbreeding Data

The Supplemental Materials also discuss phenotype conclusions that can be drawn from the genes of these individuals, and inbreeding in the entire sample of ancient DNA (citations omitted):

[T]he inhabitants of the El Portalon cave were probably all lactose intolerant in adulthood. This suggests a much later spread of this variant that has been the target of adaptation to a milk-rich diet in modern-day northwestern Europeans which also occurs at reasonably high frequencies in Northern Spain and Basques.

All sequence for the SLC24A5 (rs1426654) variant showed the derived state in the El Portalon individuals, which together with two derived variants at SLC45A2 (rs16891982), suggest that the pigmentation of the Chalcolithic Iberians was lighter than the Mesolithic LaBrana1 individual who carried the ancestral states at these major pigmentation loci. rs1805007 in MC1R which is associated with red hair and light skin is ancestral in all but one sequence (out of eleven in all El Portalon individuals) but that single derived base call might also be due to post-mortem damage. rs12913832, a SNP that explains more than 56% of the variation between blue and brown eyes, has been shown to be derived in the Mesolithic LaBrana1. Two of the El Portalon individuals show only ancestral alleles at this site whereas one individual shows both variants suggesting the individual is heterozygot at the site. These observations suggest some eye color variation but also a tendency towards brown eyes in the Chalcolithic Iberians.

To summarize, Chalcolithic Iberian farmers seem to be lactose intolerant as the Mesolithic inhabitants of the Peninsula. However, their pigmentation was fairer and their eyes were darker than in the hunter-gatherer LaBrana1. . . .

Diversity was estimated for all sites or cultures with two reasonably contemporary individuals and decent coverage: sites Ajvide (using Ajv58 and Ajv70), Motala (Motala12 and Motala1), Gökhem (Gok2 and Gok4), El Portalón (ATP2 and ATP12-1420) and the culture Alföld Linear Pottery (ALP; NE1 and NE5). This procedure was chosen to avoid the effects of potential inbreeding. The Scandinavian hunter-gatherers show the lowest diversity of all groups whereas the Scandinavian farmers from Gökhem are intermediate between those and the central European and Iberian farmers. Generally, farmers show a higher diversity than hunter-gatherers which is consistent with previous results and might be attributed to the increased carrying capacity of farming groups and/or the admixture with hunter-gatherers.

Boundaries In Time on Basque Origins

What we know about the genetics makes it increasingly unlikely that the Basque culture and language emerged from European hunter-gatherers of Western Europe (particularly the Franco-Cantabrian refugium). These autochthonous people may have had a substrate influence and may have made genetic contributions, particularly maternally (although mtDNA H now looks like a likely Mesolithic contribution to Iberia that expanded with the Bell Beaker/Vasconic surge or with the early megalithic expansion).

But, the earliest plausible time that the Basque culture and the language could have emerged is from the first wave of European farmers, who reached Iberia relatively late (ca. 5,500 BCE).

The first farmers of Europe were apparently very similar to each other genetically all across Europe. We only have a couple of data points of first farmer Y-DNA in Western Europe, but not one of them is Y-DNA R1b, and the same is true of the first farmer data points we have from elsewhere in Europe, which are more numerous.

We know that the Basque culture and language had already come into being and was losing ground to the Indo-Europeans (first the Urnfield culture, ca. 1,300 BCE, and then the Celts and then the Romans), by the time of Bronze Age collapse and the Iron Age.

This is roughly a 4,200 year window.

Something happened to turn Western Europe which had almost no Y-DNA R1b, when the first farmers arrived, into a place where Y-DNA R1b was the dominant Y-DNA type of men in the region.

Yet, the fact that Basque men have one of the highest Y-DNA R1b percentages in Europe, rather than one of the lowest ones, almost surely implies that Indo-Europeans were not the source of Y-DNA R1b in Europe the way that Indo-Europeans were almost surely the source of Y-DNA R1a in Central and Eastern Europe (as we now know from ample direct ancient DNA evidence).

Likewise, the high rate of lactase persistence (LP) in Basque persons, which this most recent paper strongly indicates arrived with the Bell Beaker culture or later, similarly can't have had Indo-European origins and had to have had fairly recent origins. The LP dynamics are different than the R1b, which is merely ancestry informative, because LP clearly conferred some strong selective fitness advantage, even if the exact mechanism by which this happened isn't entirely clear. So, a very low frequency introgression into the population combined with strong selective fitness effects could produce a dramatic change in the frequency of this genotype without having to hypothesize a major demographic event. But, in all likelihood, given the timing involved, R1b and the LP gene probably entered the population of Northwest Spain at about the same time. (It would be interesting to know the RH negative or positive blood type of these individuals, as the high rate of RH negative individuals is another distinctive aspect of the modern Basque gene pool and this might shed additional light on the extent to which they are ancestral to modern Basque people.)
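The point about a rare introgressed variant rising quickly under selection can be illustrated with the textbook deterministic model for a dominant advantageous allele. This is only a minimal Python sketch: the 1% starting frequency, the 5% fitness advantage and the 150 generations are illustrative assumptions, not estimates from the paper, and the model ignores drift, migration and population structure.

```python
def next_frequency(p, s):
    """One generation of selection on a dominant advantageous allele.

    p is the current allele frequency; carriers (one or two copies) have
    relative fitness 1 + s, non-carriers have fitness 1.
    """
    q = 1.0 - p
    # Mean fitness: carriers make up (1 - q^2) of the population.
    mean_fitness = 1.0 + s * (1.0 - q * q)
    return p * (1.0 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Allele frequency at generation 0, 1, ..., generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the curve.
    freqs = trajectory(p0=0.01, s=0.05, generations=150)
    for generation in range(0, 151, 25):
        print(f"generation {generation:3d}: frequency {freqs[generation]:.3f}")
```

Running it shows the general pattern the argument relies on: the allele stays rare for a long time and then climbs rapidly, with no change in the underlying population required.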

The fact that the range of the Bell Beaker culture matches the Vasconic linguistic substrate in Europe and the approximate range where Y-DNA R1b is found in Europe now, and that I can't find any other plausible sources of this change, makes me conclude that the Bell Beaker culture was Vasconic linguistically and was the source of the transformation of Western Europe's Y-DNA mix, and pinpoints the transition very precisely in time.

I just can't find any other explanation that can account for these singular data points. While it could certainly be a coincidence, particularly taking this find in isolation, that these four new autosomal ancient genomes, two from men, don't contain Y-DNA R1b and are pre-Bell Beaker, I don't think that it is a coincidence. I think it is much more likely that these individuals are some of the last people living in what is now Basque Country who were not Basque and instead were pure first farmer descendants with an extra infusion of local hunter-gatherer ancestry picked up in the wake of the collapse of the first wave of farming that produced sad situations like the skeleton of the little boy who died of starvation in this most recent find.

A few centuries later or even contemporaneously a few valleys over, Vasconic people from France would arrive and put in place their culture in the only place where it would ultimately survive. The extent to which this happened through the cultural influence of the thin Vasconic ruling class, and the extent to which this happened through population replacement, is hard to know, especially since the incoming Vasconic people, and the resident first wave farmer people who already lived there, may not have been that different from each other genetically at least in autosomal and mtDNA population genetics.

The need to explain the modern high proportion of Y-DNA R1b and what is increasingly clear was the near total absence of Y-DNA R1b in either European hunter-gatherers, or in the first wave of European farmers, compels some sort of solution, and makes otherwise less plausible narratives seem like the only possible explanations for the facts.

What About The Man From El Trocs Cave?

This said, there is one chink in this argument. In a 2015 paper by Haak et al. that reports 69 ancient DNA results from Europe, there is a reference to a man from Neolithic Spain ca. 5100 BCE whose body was found in the El Trocs cave in the Pyrenees Mountains in Northern Aragon whose Y-DNA haplogroup was found to be R1b1* ancestral to all extant forms of Y-DNA R1b (both V-88 from Africa and the Eurasian haplogroups; he is not R1b-V88 as has been frequently misreported). This long predates Bell Beaker and coincides with the very early Neolithic era in the region, and is also not far from modern Basque country. Unfortunately, since so much ancient DNA was dumped on the world in this one paper all at once, there is essentially no detailed analysis of the context of these El Trocs remains, even though they have the potential to be paradigm shaping.

My inclination is to think that this instance is a fluke outlier individual whose ancestors joined the wave of the expanding Neolithic revolution but ultimately left no modern living descendants (perhaps they died off in the bust that followed the first wave Neolithic), because the phylogeny of R1b and its distribution around Europe are not a good fit to this man or his kin being an important source of Y-DNA R1b in Europe. It isn't a good fit for the pattern of Y-DNA R1b haplotype diversity, for example, or the apparent path from the Steppe to Western Europe that phylogeny analysis of modern R1b haplogroups supports. But, if more pre-Bell Beaker Y-DNA R1b turns up in Southern Europe, I might be persuaded otherwise.

A North African origin for this individual is even more unlikely than the narrative I suggest, since the El Trocs individual's autosomal genetics bear no similarity to North Africans. He is autosomally pretty much identical to lots of other first wave European farmers. Also Neolithic Spain in that vicinity at the time involves cereals and legumes with either wild caught or domesticated pigs or rabbits, while the Chadic people associated with R1b-V88 in Africa at approximately the same time, have at least sheep and goats (and possibly cattle) and seafood, but don't seem to have farmed cereals or legumes.

The oldest instance of R1b1* is found in Samara, Russia in a hunter-gatherer individual about four hundred years earlier (with autosomal genetics similar to other European hunter-gatherers) where the region is teeming with ancient Y-DNA R1b similar to that found in Europe two thousand years later in the Yamnaya culture.

A Footnote Regarding Y-DNA H

Y-DNA H, like its parent clade, Y-DNA F, is centered around South Asia.

Y-DNA H1 and H3 in South Asia

Most Y-DNA H is found in South Asia and among South Asian expatriates, including the Romani of Europe. But, this is predominantly Y-DNA H1 (aka H-M69) with a sprinkling of Y-DNA H3 (aka H-Z5857) in South Asia. Y-DNA H1 is present at the highest percentages in Southern India (25%-40%). It is found in about 10% of upper caste men in South Asia, and in about 25%-35% of tribal men in India. Its other sister clade, Y-DNA H3, is much rarer than Y-DNA H1 and is found in some individuals in South Asia.

Outside of India, the most common clade of Y-DNA H is Y-DNA H1a1 (aka H-M82). This is found at rates of 13% to 50% in European Romani men (i.e. European Gypsies). But, a 2003 study of 20 Andaman Islanders found it absent there.

Y-DNA H1a1 Outside South Asia

In Southeast Asia, a 2006 study found one of six Cambodian men tested had Y-DNA H1a1, and a 2000 study found one of eighteen men in Cambodia and Laos had Y-DNA H1a1, as did only 2 of 1090 men in a 2012 study in Northeast India.

At the Northern fringe of South Asia, Y-DNA H1a1 was found in 8 of 188 men in Nepal in a 2007 study, and in 7 of 204 men in Afghanistan in a 2012 study. But, a 2007 study found none of 156 men in Tibet had Y-DNA H1a1, and a 2006 study of 26 men in Japan and 18 Siberian men found that it was absent.

Y-DNA H1a1 is found at more than trace levels in Iran and at trace levels in adjacent areas. In West Asia, a 2004 study of 523 men from Turkey found one man with Y-DNA H1a1. A 2009 study found 2 of 150 men with Y-DNA H1a1, and a 2012 study of Iran found 11 such men out of 938. It was not found in a 2011 study of 1789 Caucasian men, or a 2009 study of 66 Georgian men, but was found in 1 of 38 Balkarian men (a Caucasian ethnicity) in the same 2009 study. In the Middle East, a study of 298 men in Yemen, UAE and Qatar found 3 men with Y-DNA H1a1. A 2009 study of 1891 men in Saudi Arabia, Oman, Egypt, Somalia, Lebanon, Jordan and Iraq found one man with Y-DNA H1a1 (in Saudi Arabia where 157 men were tested).

In Europe, Y-DNA H1a1 is found at only trace levels and only in populations that historically probably had contact with the Romani people. In non-Romani European samples, Y-DNA H1a1 was found in 1 of 92 Ukrainian men in a 2009 study, in 1 of 113 Serbian men in a 2005 study (but none among 141 Herzegovinians in the same study), and in 2 of 57 Macedonian Greeks in a 2008 study. The same 2009 study that included the Ukrainian men found no Y-DNA H1a1 in 92 Greeks, 55 Albanians, 324 Bosniaks, 75 Slovenians, 67 Northeastern Italians, 53 Hungarians, 75 Czechs, and 99 Poles.
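Because the counts above come from samples of very different sizes, it helps to put them on a common percentage scale. The following is a minimal Python sketch that uses only counts quoted in this post; the labels are shorthand for the studies mentioned above, and the function names are hypothetical. Several of these samples are small, so the percentages are rough.

```python
# (carriers, sample size) exactly as quoted in the text above.
H1A1_COUNTS = {
    "Nepal (2007)": (8, 188),
    "Afghanistan (2012)": (7, 204),
    "Iran (2012)": (11, 938),
    "Turkey (2004)": (1, 523),
    "Ukraine (2009)": (1, 92),
    "Serbia (2005)": (1, 113),
    "Macedonian Greeks (2008)": (2, 57),
}


def percent(carriers, sample_size):
    """Share of carriers in a sample, as a percentage."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return 100.0 * carriers / sample_size


def frequency_table(counts):
    """Rows of (label, carriers, sample size, percent), highest first."""
    rows = [
        (label, carriers, size, percent(carriers, size))
        for label, (carriers, size) in counts.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for label, carriers, size, pct in frequency_table(H1A1_COUNTS):
        print(f"{label:26s} {carriers:>3d}/{size:<5d} {pct:5.2f}%")
```

On these figures Nepal and Afghanistan come out at roughly 3% to 4%, Iran at a little over 1%, and Turkey at well under 1%, which matches the "more than trace in Iran, trace in adjacent areas" description.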

Y-DNA H2

Y-DNA H2 (aka H-P96, formerly known as H* and F3) is less familiar and has a very different distribution. It is found in Western Europe in France, Switzerland, Germany, and the Netherlands, and also among Armenians, in Iran and in India.

The Supplemental Materials state with regard to Y-DNA H2 after clarifying the history of this label and identifying the loci used to make the classification that:

While only a few H2 individuals have ever been found, the haplogroup appears to have a west Eurasian distribution; with a low level Middle Eastern presence in modern-day Iran, Turkey, Bahrain, Kuwait and Qatar (Family Tree DNA), as well as minor occurrences in modern-day England, France, Sardinia, Sweden and the Netherlands (Family Tree DNA). H2 also seems to occur at low frequencies in Neolithic sample.*

* Citing for that sentence Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015; doi:10.1038/nature14317

This is suggestive of a couple of possibilities. One is that Y-DNA H2 could have been a Cardial Pottery minor component that spread from Iberia to the megalithic first wave farmers further to the North as well.

Perhaps as few as one Y-DNA H2 man, whose ancestors had reached Southwestern Anatolia (once home to Armenians) from Iran, could have made his way from there and then migrated further to wind up in the Cardial Pottery founding population, without leaving descendants along the way whose lines survived to the present.

It would be interesting to see if there is any Y-DNA H2 in Tuscany where the Etruscans had their non-Indo-European civilization.

Given the dates of the Y-DNA H2 individual of 2,849-2,628 BCE, the pre-Bell Beaker archaeological context, and the location of the remains, any theory that the H2 was associated with Bell Beaker origins has to be dramatically less likely. The oldest Bell Beaker wares in Iberia are from ca. 2900 BCE, and the Bell Beaker culture reached Northwest Iberia much later than that. In principle, a single man could have made his way from Southwestern Iberia to Northwestern Iberia and been incorporated into this community of pre-Bell Beaker Cardial Pottery farmers, but that does not seem very likely. And if Y-DNA H2 had more than a trace presence in a population that was a source of the Y-DNA R1b expansion in Western Europe, Y-DNA H2 would probably be much more common in Europe today than it is in fact, unless there were only one or two individuals with Y-DNA H2 in the entire founding population of the group that caused Y-DNA R1b to expand in Western Europe.