Tuesday, May 22, 2018

How Big Was The Founding Population Of The Americas?

A current best estimate, based upon genetic data, places the size of the founding population of the Americas in the range of 229 to 300 with a best fit of about 284 people.

The source scientific journal article for the material in the link is as follows:
In spite of many genetic studies that contributed for a deep knowledge about the peopling of the Americas, no consensus has emerged about important parameters such as the effective size of the Native Americans founder population. Previous estimates based on genomic datasets may have been biased by the use of admixed individuals from Latino populations, while other recent studies using samples from Native American individuals relied on approximated analytical approaches. 
In this study we use resequencing data for nine independent regions in a set of Native American and Siberian individuals and a full-likelihood approach based on isolation-with-migration scenarios accounting for recent flow between Asian and Native American populations. Our results suggest that, in agreement with previous studies, the effective size of the Native American population was small, most likely in the order of a few hundred individuals, with point estimates close to 250 individuals, even though credible intervals include a number as large as ~4,000 individuals. 
Recognizing the size of the genetic bottleneck during the peopling of the Americas is important for determining the extent of genetic markers needed to characterize Native American populations in genome-wide studies and to evaluate the adaptive potential of genetic variants in this population.
Nelson J.R. Fagundes, et al., "How strong was the bottleneck associated to the peopling of the Americas? New insights from multilocus sequence data", 41(1) Genetics and Molecular Biology (2018).

This number is the "effective population size" of the founding population of the Americas, which is generally significantly smaller than the actual adult census size of the same population, and a further adjustment is needed for adults who are not of reproductive age. Depending upon the circumstances, effective population size could be anywhere from somewhat more than half to less than 1% of the total census size of the population.
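As a rough illustration of what that conversion implies, here is a minimal Python sketch that turns an effective size into a range of census sizes. The function name `census_range` and the default ratio bounds of 0.5 and 0.01 are my own assumptions, chosen only to mirror the "more than half to less than 1%" rule of thumb above; they are not taken from the paper.

```python
def census_range(effective_size, ratio_min=0.01, ratio_max=0.5):
    """Census sizes implied by an effective size Ne when the ratio Ne/N
    lies between ratio_min and ratio_max (both bounds are assumptions)."""
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    if not 0 < ratio_min <= ratio_max <= 1:
        raise ValueError("need 0 < ratio_min <= ratio_max <= 1")
    # A larger Ne/N ratio implies a smaller census size, and vice versa.
    return effective_size / ratio_max, effective_size / ratio_min


# Effective sizes quoted at the top of this post.
for ne in (229, 284, 300):
    smallest, largest = census_range(ne)
    print(f"Ne = {ne}: census size roughly {smallest:,.0f} to {largest:,.0f}")
```

With these assumed bounds, the best-fit value of 284 corresponds to somewhere between several hundred and a few tens of thousands of people, which shows how loosely an effective size constrains the actual head count.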

The paper's literature review recounts many previous estimates of the same quantity, after which the authors argue that their approach is better than their predecessors' approaches, despite the quite small data set they analyzed.
The first quantitative approach to infer the effective population size of the founder Native American population was developed by Hey (2005), who did a meta-analysis of nine sequence loci, used a likelihood-based inference and assumed a isolation with migration (IM) population model to suggest an extreme population bottleneck with an effective population size of ~70 individuals. Since this pioneer work, other groups tried to replicate this result using multilocus autosomal data, with partial success. Kitchen et al. (2008) re-analyzed Hey’s dataset, adding mtDNA genomic data under different priors for migration rates and suggested an effective population size ranging from 1,000 to 5,400 individuals. Ray et al. (2010), using a dataset of 401 STRs, estimated an effective founder population size between 42 and 140 individuals (with a median of 87 individuals). Between these two extremes, Fagundes et al. (2007), based on the re-sequencing of 50 short loci, estimated an effective founder size of ~450 individuals (with a 95% credible interval (CI) ranging from 71 to 1,280 individuals). Recent autosomal data generated from admixed Latino populations also provided very different figures. Gutenkunst et al. (2009), based on a very large dataset of more than 13,000 SNPs, suggested a value of 800 effective individuals, with a confidence interval between 140 and 1,600 individuals; while Wall et al. (2011), using resequencing data, estimated a bottleneck effective population size not larger than 150 individuals. Gravel et al. (2013) proposed intermediate values of about 514 effective individuals, ranging between 316 and 2,264 individuals.


NeilB said...

Dear Andrew, a quick question as you have obviously read the paper in great detail. Surely as the authors sequenced the genomes of Native American individuals, they would have missed a lot of the genetic diversity that was lost post European contact? Estimates of the catastrophic effect, in terms of percentage population loss, vary, but couldn't this loss have affected their estimate? If the genetic diversity in the Americas wasn't fully sampled, doesn't this invalidate their findings? NeilB

andrew said...

Effective population size is a popular thing to study with whole genomes because even if you have only a small sample size (this one involved about a dozen samples), you can get quite accurate results. Each SNP in the genome is a different N in the sample size, although you have to adjust for the fact that adjacent SNPs are not fully independent of each other; their correlation is known quite precisely based on linkage disequilibrium principles. The methods used in this paper are particularly well tuned to samples involving only a small number of individuals.

These results are very robust to bottlenecks comparable to the catastrophe experienced by Native Americans in the post-Columbian era, and the non-random sample is diverse enough to address most of the worst biases that could arise from population structure. Basically, the mixing that would have taken place in the first 14,000 years should be thorough enough to cause most variants to be extremely widespread prior to the bottlenecks of ca. 500 years ago. And, no large regional populations were entirely exterminated in those catastrophic events. Even a loss of 99% of a sub-population of hundreds of thousands of people from a total population of ca. 3M+ does almost nothing to remove the variants found in the surviving population, except for a handful of disease resistance genes that were positively selected for in some hard sweeps, which are a tiny part of the total genome. If 99.999% of the population of, say, Native Americans from the U.S. Southwest died, that could reduce effective population size, however.
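To put rough numbers on that, here is a small Python sketch of a standard back-of-the-envelope calculation: if survivors are a random draw from the pre-collapse population, a variant at frequency p is still present among N diploid survivors with probability 1 - (1 - p)^(2N). The random-survival assumption, the function name `retention_probability`, and the illustrative figures (a sub-population of 300,000 reduced by 99% to 3,000) are mine, not the paper's, and the sketch ignores population structure and selection.

```python
import math


def retention_probability(freq, survivors):
    """Probability that at least one copy of a variant at frequency `freq`
    is carried by `survivors` diploid individuals drawn at random."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if survivors < 0:
        raise ValueError("survivors must be non-negative")
    if freq == 1.0:
        return 1.0 if survivors > 0 else 0.0
    gene_copies = 2 * survivors  # two gene copies per diploid survivor
    # 1 - (1 - freq) ** gene_copies, computed stably for tiny frequencies.
    return -math.expm1(gene_copies * math.log1p(-freq))


# Hypothetical scenario: 300,000 people reduced by 99% to 3,000 survivors.
for freq in (0.05, 0.01, 0.001):
    kept = retention_probability(freq, 3_000)
    print(f"variant at frequency {freq}: retained with probability {kept:.4f}")
```

Under these assumptions, anything present in even 1% of gene copies before the collapse is essentially certain to survive it, and a variant at 0.1% is still retained well over 99% of the time; only much rarer variants are at real risk of being lost.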

It also helps that some of the individuals, IIRC, are pre-Columbian ancient DNA, pre-catastrophe.

Sampling just one individual in a population that has even modest admixture with neighboring populations (the rule of thumb is one individual per generation exchanged) should suffice to cover both populations.

Also, since the goal is merely to capture the founding population of the Americas, Na-Dene and Inuit populations and all post-Columbian migrants are excluded by definition, so lack of inclusion of those populations in a sample doesn't matter.

The biggest concern, which would be a problem and would cause diversity to be underestimated, is that an almost completely genetically isolated population that is not descended from any of the other sampled populations and is genetically distinctive could be entirely omitted from the sample. But, in the places where it is possible that a population has been almost completely genetically isolated for many thousands of years, mostly in South America, we can mostly be comfortable that serial founder effects mean that their genomes are a subset of other populations in the sample. A few Amazonian populations in the far NW of that river basin who appear to have "Paleo-Asian" ancestry are pretty much the only populations that could fit that profile, and I think that there is one sample from one of those populations in the overall sample that solves that issue.

The proof is in the pudding. The result of the paper is of the same order of magnitude as similar estimates made by other methods, and it is actually a larger potential population size than other recent estimates described in the literature review, some of which had larger samples in terms of number of individuals but used the available data from each individual less efficiently. The extreme lack of uniparental marker diversity in Native Americans, when extrapolated back 14,000 years, corroborates these conclusions, for example.