Was Amazonian agriculture a local innovation or transmitted by early Neolithic cultures of the Americas? Is there population genetic structure corresponding to mtDNA patterns showing a more or less exclusively Pacific coastal clade? This and some other recent studies are also starting to show the interplay of prehistoric civilizations in the Americas.
Studies of Native South American genetic diversity have helped to shed light on the peopling and differentiation of the continent, but available data are sparse for the major ecogeographic domains. These include the Pacific Coast, a potential early migration route; the Andes, home to the most expansive complex societies and to one of the most spoken indigenous language families of the continent (Quechua); and Amazonia, with its understudied population structure and rich cultural diversity. Here we explore the genetic structure of 177 individuals from these three domains, genotyped with the Affymetrix Human Origins array. We infer multiple sources of ancestry within the Native American ancestry component; one with clear predominance on the Coast and in the Andes, and at least two distinct substrates in neighboring Amazonia, with a previously undetected ancestry characteristic of northern Ecuador and Colombia. Amazonian populations are also involved in recent gene-flow with each other and across ecogeographic domains, which does not accord with the traditional view of small, isolated groups. Long distance genetic connections between speakers of the same language family suggest that languages had spread not by cultural contact alone. Finally, Native American populations admixed with post-Columbian European and African sources at different times, with few cases of prolonged isolation. With our results we emphasize the importance of including under-studied regions of the continent in high-resolution genetic studies, and we illustrate the potential of SNP chip arrays for informative regional scale analysis.
Chiara Barbieri, et al., "The genomic landscape of western South America: Andes, Amazonia and Pacific Coast" bioRxiv (December 23, 2018).
From the Introduction:
In South America, genetic studies robustly recovered a substantial differentiation between the Andes and Amazonia, which has been framed within a model of large communities connected by gene-flow in the Andes vs. small, isolated communities in Amazonia. This model builds on evidence for major complex societies in the Andes (culminating with the well-known but short-lived Inca empire) which fostered population movements and connections, counterbalanced by the traditional view of the Amazon basin as the homeland of small, isolated tribes. The latter view is challenged by increasing evidence of large-scale societies, the role of rivers as primary routes for gene-flow, and the presence of important centers of plant domestication. To gain a better representation of the highly diverse cultural landscape of Amazonia, a more intense archaeological effort is needed, together with a more fine-grained sampling of living and ancient human populations.
In particular, this model of South American genetic structure typically overlooks the Pacific Coast, a key context for the early migration history of the continent and the cradle of the earliest complex societies in South America, from 3000 BCE. Recent studies have begun to investigate human variation on the Pacific coast through aDNA and by sampling urban areas, but to fill out this picture requires further, complementary genetic studies on living populations (especially from non-urban areas).
Language diversity can also be a factor shaping population relatedness. The diffusion of major language families is traditionally associated with demographic movements: this association was validated with genetic data for some of the largest language families of the world, but no strong candidates are found in South America, where genetic diversity overall does not correlate with linguistic diversity. Previous genetic work evaluated alternative models of cultural vs. demographic diffusion for Quechua, the most spoken language family of the Andes, present also in small pockets of the Amazonian lowlands. These studies, based on uniparental markers, revealed intense contact routes in the southern highlands, but not in the northern regions nor in neighboring Amazonia. Relatively few genetic studies have addressed the diffusion of the main language families of Amazonia (notably Arawak, Tupí or Carib), although very recent research does focus on sub-branches or smaller families. Some scholars suggest a major role for cultural contact alone behind the diffusion of Arawak. The particularly fragmented distribution of the three major language families across much of lowland South America calls for a more fine-grained sampling to test for potential connections between their speaker populations.
Here we focus on western South America to investigate environmental and cultural influences on population genetic structure over the three ecogeographic domains: the Andes, the Amazonia and the Pacific Coast, and transitional environments in between. The analysis of new genetic samples from populations with different cultural, linguistic and historical backgrounds should contribute to understanding both the modes of early migration and the immediate consequences of colonial contact. A first open question concerns the scale of the genetic impact of major complex societies, which arose in two main focal regions: the north coast of Peru, and the Andean highlands of Central and Southern Peru and northern Bolivia. Large-scale societies possibly left a trace in the demographic profiles of indigenous populations, associated with high population density, but the extent of long-distance population movements and the origins of the populations that developed such societies remain largely unknown. A second open question concerns the diffusion mechanisms of major language families. We aim at tracing genetic connections over the scattered and widespread diffusion of representative Andean and Amazonian languages, focusing in particular on a vast region where different varieties of Quechua are spoken. Finally, a third open question concerns the demographic events occurring over the last five centuries since European contact, and how they impacted upon different South American populations. Gene flow from European and African sources can be easily distinguished in the genomic ancestry of the American populations. The timing and intensity of the European-mediated admixture has been estimated for urban, heavily admixed regions, but has yet to be investigated systematically across South America.
From the Discussion:
[W]e investigated Native American ancestry across the continent. One major Native American ancestry component is shared by all populations, as seen in the ADMIXTURE plot (Fig. S2, K=3, associated with the lowest Cross Validation error), in line with results from other living populations and from ancient DNA, which support a single major migration. This Native American component exhibits a marked population structure; the main signal of structure separates Amazonia from a shared Andes/Coast ancestry. Populations from North America were shown to have a distinct ancestry component in the ADMIXTURE analysis (Fig. S2), with a smaller differentiation between populations from southern North America (Yaquis and Pima) and Meso-/Central America (Mixe and Cabecar, Fig. S4). The diversification of these ancestry blocks from the initial single Native American gene pool does not bear traces of a north-south gradient of differentiation, or of serial founder effects, as the genetic distances between populations display a radial structure rather than a sequential one (Fig. S4). A previously observed early diverging component similar to the Mixe or the ancient Californian Channel Islands is not captured by our data, which focuses more on South American genetic diversity. Amazonian ancestry is further split into two components: one more widespread in the Amazon Basin (here called “Amazonian Core”), the other corresponding to the piedmont populations of Ecuador and south-western Colombia (“Amazonian North”, Fig. 1 and 2). This latter component is strongly differentiated (it is one of the first ancestry components to appear in the ADMIXTURE analysis, at K=4 - Fig. S2), suggesting that it either diverged early in the migration process, or reflects stronger drift than in the rest of Amazonia. It is unclear if this ancestry could correspond to one branch of the earliest continental population structure, which was hypothesized from aDNA admixture graph reconstructions.
This “Amazonian North” ancestry represents a previously unreported source of ancestry from a transitional environment: this region is in fact geographically close to the Northern Andes between Colombia and Ecuador, but its populations are traditionally associated with the Amazonian cultural domain. An early human settlement of Ecuador and northern Peru (between 16.0 and 14.6 kya) has previously been inferred from high resolution mtDNA data, in line with the archaeological record. Meanwhile, the presence of pockets of diversity in Ecuador and Colombia is paralleled by the presence of distinctive Native American lineages, such as Y chromosome haplogroup C3, otherwise rare in the continent. This haplogroup is also reported for other populations in Colombia. Interestingly, haplogroup C is found in the sample from Ecuador (Fig. S8, Table S1), too, but further analysis is needed to precisely identify the Y chromosome subhaplogroup and confirm the C3 affiliation. Finally, populations from the Coast and the Central Andes (both north and south) show close genetic proximity to each other, as visualized by the PCA in Figs. 2A and 2B and by the same ancestry component profile up until K=6 in Fig. S2. This strongly suggests a common origin and/or extensive contact, which may be associated with a coastal migration route and a colonization process from the coast inland into the highlands. Previous analyses have already noted the common history of these two regions, with first settlement from around 12,000 years ago. . . .
[W]e explored signatures of demographic history and haplotype sharing patterns. The ROH variation profile of most populations from the Coast and the Andes (both North-East and Central-Southern) is consistent with a history of a relatively large population size, with some exceptions (Sechura, Narihuala, Cusco) that may have experienced isolation and drift only very recently (Fig. 3, Fig. S5). The long-term presence of large-scale state societies in the Andes and on the Coast can be expected to have promoted gene flow across wider geographical scales and merged previously structured populations, contributing to the higher genetic diversity of the current inhabitants. On the north coast of Peru, the Moche culture was one of the largest entities from the 1st century CE, followed by the Chimú from the 12th century. Their political influence over the coast would have promoted human connections that overcame the long stretches of desert that separate the main river valleys and the Humboldt current and wind regime that make long-distance seaborne trade difficult. In the Chachapoyas region (North-East Andes), a number of structured societies flourished from the 12th to the 15th century. In the Central-Southern Andes, the Wari and Tiahuanaco ‘Middle Horizon’ (c. 500-1000 CE) and especially the Inca ‘Late Horizon’ (c. 1470-1532 CE) established vast networks that mobilized and moved large labor forces for agricultural production (terracing, irrigation, raised fields), operated resource exchanges through camelid caravans, and resettled populations as explicit state policy. The impact of the Wari and Inca Empires is widely associated also with the diffusion of the two main surviving Andean language families, Quechua and Aymara.
There is some genetic linkage with language families.
The Coast and our two Andean sub-regions share a similar ancestry (as discussed above) and a similar history of large population size, but they are differentiated at a finer scale, with localized patterns of IBD segment sharing. Longer IBD blocks are shared almost exclusively within each of the three geographic macro-regions: Coast, North-East Andes and Central-Southern Andes, with the latter sharing fewer and shorter segments, suggesting more ancient contact over a large region, not sustained until recent times (Fig. 4). By contrast, the Amazonian populations in most cases have longer ROH blocks and overall high levels of consanguinity. This could reflect the previously proposed model of larger groups in the Andes vs. small, isolated groups in Amazonia. Nevertheless, by including more populations from a wider range of cultural and geographical backgrounds, we find exceptions to this model with some Amazonian populations characterized by a smaller number of larger blocks, belonging to group 1c and group 2 in Fig. S5. The populations of Amazonia therefore display different demographic histories rather than a uniform history of small sample size (according to the ROH profiles) and are connected by sharing of IBD blocks within the region. Moreover, Amazonian populations also show long distance sharing of large and short fragments with the Andes and the Coast (Fig. 4), which is not consistent with the traditional portrait of isolation between Amazonian populations. This genetic diversity complements the evidence from other disciplines that the region was also home to dynamic, non-isolated population groups. In particular, the linguistic diversity of Amazonia includes not just language isolates but major, expansive language families, with far reaching geographic distributions, reflecting long-range migrations of at least some speakers, and possibly major demographic expansions.
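As an aside for readers unfamiliar with ROH (runs of homozygosity): the excerpt reads demographic history from how many homozygous runs an individual carries and how long they are. The Python sketch below is only a toy illustration of that idea, not the paper's pipeline. The genotype encoding (0 or 2 for homozygous, 1 for heterozygous), the `min_snps` threshold, and the function names `find_roh` and `summarize` are all assumptions made up for this example; real ROH callers use sliding windows and tolerate some heterozygous or missing calls.

```python
from typing import Dict, List, Tuple


def find_roh(genotypes: List[int], positions: List[int],
             min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    Toy encoding (an assumption): 0 or 2 = homozygous, 1 = heterozygous.
    Runs shorter than `min_snps` markers are discarded.
    """
    runs = []
    run_start = None  # index of the first SNP in the current run, if any
    for i, g in enumerate(genotypes):
        if g != 1:
            if run_start is None:
                run_start = i
        else:
            # A heterozygous call closes the current run.
            if run_start is not None and i - run_start >= min_snps:
                runs.append((positions[run_start], positions[i - 1]))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and len(genotypes) - run_start >= min_snps:
        runs.append((positions[run_start], positions[-1]))
    return runs


def summarize(runs: List[Tuple[int, int]]) -> Dict[str, int]:
    """Count the runs and report their total and maximum length in bp."""
    lengths = [end - start for start, end in runs]
    return {
        "n_runs": len(runs),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }


# Made-up data: ten SNPs spaced 100 bp apart.
toy_genotypes = [0, 2, 0, 1, 2, 2, 2, 2, 1, 0]
toy_positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
toy_runs = find_roh(toy_genotypes, toy_positions)
toy_summary = summarize(toy_runs)
```

In this framing, many long runs point toward recent isolation or consanguinity, while few and short runs are what a larger, better-connected population would be expected to show.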
There is also clear linguistic evidence for intensive interactions in convergence zones, and (more weakly) across Amazonia as a whole. We explored these potential connections by checking for gene-flow among speakers of the same language or language family. An interesting case is represented by the speakers of Cocama, a language of the Tupí family. The ROH profile for the Cocama of Colombia is lacking in long ROH segments (Fig. 3, Fig. S5), suggesting no recent bottlenecks or isolation. The analysis of shared IBD segments, which indicate shared ancestry through recent contact, reveals a long distance connection between this population and geographically-distant populations of Peruvian Amazonia (Fig. 4). In particular, three Cocama speakers included in the LoretoMix sample from Peru are slightly different from the rest of the LoretoMix sample, and genetically closer to the Cocama of Colombia (Fig. 2A). Archaeological and ethnohistorical evidence indicates that the ancestors of the Cocama and Omagua were widespread in pre-Columbian times, inhabiting large stretches of the Amazon Basin and several of its upper tributaries. Thus, the sharing of IBD segments as well as the lack of long ROH in the Cocama could be explained by large, widespread populations that were connected in pre-Columbian times. Alternatively, more recent migrations could have carried the Cocama language between Colombia and Peru. Both time-frames and both scenarios suggest a parallel between genetic and linguistic history, with language acting as a preferential vector of population mobility.
There is evidence of patrilocality and bride exchange over substantial distances.
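The IBD comparisons in the excerpts above come down to counting shared segments between pairs of populations and separating longer blocks from shorter ones, since block length is what the authors use to distinguish more recent from more ancient contact. A minimal Python sketch of that bookkeeping follows. It is not the authors' method: the function name `ibd_sharing`, the 10 cM cutoff, the population labels, and the segment lengths are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[str, str, float]  # (population A, population B, length in cM)


def ibd_sharing(segments: List[Segment],
                long_cm: float = 10.0) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count short and long shared IBD segments per unordered population pair.

    `long_cm` is an arbitrary illustrative cutoff, not a value from the paper.
    """
    counts = defaultdict(lambda: {"short": 0, "long": 0})
    for pop_a, pop_b, length_cm in segments:
        pair = tuple(sorted((pop_a, pop_b)))  # treat (A, B) and (B, A) alike
        size = "long" if length_cm >= long_cm else "short"
        counts[pair][size] += 1
    return {pair: dict(c) for pair, c in counts.items()}


# Made-up segments between hypothetical populations A, B and C.
toy_segments = [
    ("A", "B", 3.2),
    ("B", "A", 12.5),
    ("A", "C", 4.0),
    ("A", "A", 15.0),
]
toy_sharing = ibd_sharing(toy_segments)
```

With counts like these, a pair of populations sharing mostly short segments would fit the "more ancient contact" reading, and a pair sharing long segments would fit more recent contact.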
Weak evidence for long-distance linguistic connections is observed not only within Amazonia, but also between Amazonia and the Andes. This is the case for Quechua-speakers of lowland Ecuador (Kichwa Orellana) and lowland north-eastern Peru (Wayku), who share relatively short IBD fragments with Central-Southern Andes and North-East Andes respectively. Previous results based on Y chromosome haplotype sharing did find a similar pattern of connections between lowland Quechua-speakers in Ecuador and north-eastern Peru, but did not find such long-distance connections with the Central and Southern Andes. These different results can possibly be justified by sex-biased gene-flow (i.e. less male mobility), which should be further investigated with denser sampling and high-resolution mtDNA genome sequences. Overall, this new genomic evidence points towards a demographic connection behind the diffusion of Quechua varieties not only in the southern highlands, as previously attested, but also in the north, across ecogeographic domains. The genetic signature reconstructed can inform the historical reconstructions for the diffusion of this language family.