A new preprint looking at ancient DNA from 11,500-year-old remains near Denali Mountain in Alaska shows that the roots of the Founding population of the Americas are more complex and have more structure than previously believed. The abstract doesn't do it justice, so I quote from its discussion of its results instead. Click on the preprint to read the introduction as well, which artfully lays out the context of the study and the key facts.
Essentially, they argue that pre-existing population structure in Northeast Asia carried over into the American Founding population. Thus, they argue that the proto-Northern Native Americans and the proto-Southern Native Americans, whose genetics are subtly different, had already diverged into separate genetic communities before they arrived in the Americas even though they have considerable common origins from a Northeast Asian refugium where they diverged from each other.
Linguistic connections to Northeast Asia strongly corroborate and enhance this analysis.
In theory, this analysis also helps to explain the Paleo-Asian ancient DNA found in a few isolated populations in the Amazon. But the narrative doesn't really work, because the Paleo-Asian ancient DNA is embedded in Southern Native American populations, while the delayed linkage to Northeast Asia, genetically and linguistically, is confined to Northern Native Americans (basically Canada and the Northern United States shown in dark red below).
Fig. 4. Maps of archaeological cultures and migrations in Northeast Asia, North and Central America. Seven “time slices” are shown. Key archaeological sites and ancient genomes are marked with yellow stars, archaeological cultures with blue text. Past shorelines and glaciers are also shown. The following abbreviations are used: AB, Ancient Beringians sensu Moreno-Mayar et al.; ANE, ancient North Eurasians sensu Raghavan et al.; BHG, Baikal hunter-gatherers sensu Damgaard et al.; FAM, the First American lineage in Asia and America; LGM, last glacial maximum; LUP, Late Upper Paleolithic; MUP, Middle Upper Paleolithic; NNA and SNA, northern and southern Native American lineages sensu Raghavan et al.; NS and PS, Neo- and Paleo-Siberians (the meaning of the terms is different from that used by Sikora et al.); UKY, the Ust’-Kyakhta-3 site.
We question a major result on American pre-history, namely that Ancient Beringians represented by the late Pleistocene/early Holocene USR1 Alaskan individual [the Denali ancient DNA subject] form the deepest branch in the American clade. Here we introduce an alternative interpretation of Ancient Beringian ancestry that is significantly more likely than the one suggested by Moreno-Mayar et al.
We revealed a new admixture cline that can be represented as a mixture between NNA [Northern Native American] and a late Pleistocene Siberian population related to contemporary and later groups in the Amur River Basin (ARB). Only two individuals were confidently placed on this cline: USR1 (11500 cal yr BP) in Alaska and Kolyma1 in Chukotka (9800 cal yr BP). These individuals are roughly contemporaneous with the earliest ARB individual reported here and dated to ~12000 cal yr BP. Ancient Beringians were shown to be genetically unique among Americans, and we offer a different explanation for this uniqueness: a unique admixture event in the history of this group instead of their unique phylogenetic position. . . .
Lexical remnants of a substrate language presumably spoken on the Denali territory before the Athabaskan radiation ca. 2000 – 3000 cal yr BP show similarity with the Chukotko-Kamchatkan language family. This suggests that this substrate pre-Athabaskan language could be a member of the hypothetical Chukotko-Kamchatkan–Nivkh linguistic clade. There is also evidence for a distant relationship between the latter clade on the Asian side and Salishan and Algic language families in North America. Since the Nivkh speakers are prominent present-day representatives of the ARB gene pool, it is possible that these linguistic traces reflect the Pleistocene gene flow revealed in this study between the ARB cluster and Ancient Beringians.
We believe that the admixture signals in the USR1 and Kolyma1 individuals remained on the verge of statistical significance in previous studies because the dates of divergence of the First American and ARB lineages from the Asian stem were separated by a relatively short period likely coinciding with the Beringian standstill, and the ARB lineage(s) contributing to Kolyma1 and USR1 split from the ARB stem shortly after the emergence of the ARB population. In other words, the phylogeny is nearly star-like and thus hardly amenable to standard analytical methods. Although without further ancient DNA sampling these split times are hard to estimate, this hypothesis seems probable considering the known divergence dates for First Americans and Siberians and the radiocarbon dates of the individuals mentioned above.
Notably, another analysis revealed a statistically significant signal of Asian admixture in USR1. Four simple demographic coalescent models were tested. . . . A clean split model was rejected for all partner populations tested: present-day Aymara, Karitiana, Athabaskans, Han, Koryaks and Nivkhs. Remarkably, only one model was significantly better than the other three models for the Han/USR1 and Nivkh/USR1 pairs, and that was a second contact model. And Nivkhs are probably the closest present-day relatives of the ARB cluster according to our graph mapping results, with 81% of their ancestry derived from it. We expect that cryptic gene flows similar to the one detected here are common in published admixture graphs, and these undetected gene flows may be important for deciding between competing archaeological interpretations.
The most parsimonious archaeological interpretation of our results, given the geographic anchor points for early NNA, SNA [Southern Native American], Ancient Beringians, and ARB individuals between 13000 and 8000 cal yr BP, is that the divergence of the SNA and NNA clades occurred in Asia, immediately after the end of the Beringian standstill that, in fact, could have taken place outside of Beringia. This model explains the ARB signal in Ancient Beringians without invoking numerous independent crossings of the Bering strait, for which there is no clear archaeological evidence. We hypothesize that SNA ancestors were the first to move to Beringia towards the end of the isolation period, NNA ancestors were the second, and ancestors of Ancient Beringians (an NNA sub-group) were the last to migrate into North America, and thus interacted more with the ARB group ancestors spreading from other refugia at the same time.
The hypothesis that the major First American clades diverged in East Siberia, and not in Beringia, also makes the "population Y" signal in present-day and ancient Amazonians less surprising since it provides more opportunities for contacts between recently diverged American groups and various Asian groups.
The Supplemental Materials contain two mini-papers with separate authorships on linguistic issues which are reproduced below the fold (without indentation of the quoted material to indicate that this is quoted material).
The bottom line of the linguistic analysis in this paper, informed by the newly developed ancient DNA data, is to produce four rather than three linguistic macro-families of pre-Columbian languages in the Americas, which I have summarized in lightly paraphrased text below from the Supplemental Material text, simplified by my own editorial exclusion of some possibilities that I consider to be implausible.
14. Linguistic implications of the gene flow between Siberians and Ancient Beringians
a. Potential Chukotko-Kamchatkan-like linguistic substrate in Alaska
Alexei S. Kassian
Below I evaluate some words of Tanaina, a modern Athabaskan (Dene) language spoken in coastal Alaska, which belong to basic vocabulary yet have no etymological connection to other Athabaskan languages. These words probably originate from a pre-Athabaskan substrate language. Several basic terms (including Swadesh items) demonstrate phonetic and semantic closeness to the reconstructed Proto-Chukotko-Kamchatkan vocabulary. The assumed pre-Tanaina substrate language might either belong to the Chukotko-Kamchatkan family or be an outlier in the Chukotko-Kamchatkan–Nivkh clade.
Tanaina, a.k.a. Dena'ina, is an endangered language of the Athabaskan group of the Na-Dene family (103, 104). It consists of several dialects spoken in the southern part of Alaska. Modern Tanaina's territory partially overlaps with the area of the Denali archaeological complex. A specific feature of Tanaina is that it lacks inherited Athabaskan roots for some semantic concepts (including Swadesh items), using non-etymologizable expressions instead. These cases were collected by James Kari (104: xxi, 105: 545, 106: 59), who called them "elite replacements", explaining them as either substrate loans or taboo replacements. Actually, the Tanaina stems analyzed by Kari can be divided into two classes.
The first class consists of nominalized verbal forms, which have superseded original roots; e.g., ‘head' is expressed as ‘tip that extends', and so on. In these cases, we are most probably dealing with normal gradual evolution of the original lexicon during natural language development. Note that we are not aware of specific evidence that such replacements were indeed taboo-driven in Tanaina.
The second class consists of Tanaina words, which are morphologically unanalyzable and unetymologizable. For example, ‘fire' is expressed by the enigmatic form tazʔi in the bulk of the Tanaina dialects, whereas the inherited Athabaskan term qʰǝn ‘fire' is only used in the Upper Inlet dialect. In these cases, it is likely that we are dealing with remnants of a substrate language which was previously spoken in this territory and later assimilated by Tanaina centuries ago. For the sake of convenience one can label this potential substrate language as "Pre-Tanaina".
The potential Pre-Tanaina words partially belong to the basic vocabulary. First of all, these include several Swadesh items:
• ʁǝs ‘bone' (104: 12).
• (ǝ)ɬtʰuʁ ‘eye' (104: 88).
• tazʔi ‘fire' (104: 248).
• čʼix ‘hair (of head)' (104: 87).
• miɬni ~ piɬni ~ vinɬni ‘water' (104: 121, 293).
• kʼis-ǝn ‘woman; girl' (104: 72), kʼis-i ‘female animal' (104: 13).
Other Tanaina basic terms of unclear origin are:
• čʰikʼa ‘wood, dry wood, stick, firewood' (104: 63, 250).
• iƛʼ ‘clothing (in general)' (104: xxi, 165).
• izin ‘arrow' (104: xxi, 207).
• kʼaƛʼ ‘snow showers with moist large flakes' (104: xxi, 153).
• qʰičʰi ‘old woman, old lady' (104: 73).
• χenzi ~ χeɬni ‘coma, unconscious' (104: 99).
• yusti ‘open fire, fireplace, outdoor hearth' (104: 248).
More cultural Pre-Tanaina words are:
• ʔa- (incorporated) ‘to hunt bear at night' (104: xxi).
• ʔǝk ‘to act as a shaman' (104: 307).
• čanču ‘medicine object, dream object kept as source of power; shaman's spirit that can travel' (104: 306).
• ič ‘a contagious disease (unidentified)' (104: 99).
• kǝš ~ kʰǝš ‘webbing holes in snowshoe frame' (104: 239).
• nǝnli ~ nǝli ~ ǝnli ~ nǝlni ‘stream bath' (104: 221).
• niqitay ‘large skin boat with flat-bottom pole frame (downstream use)' (104: 243).
Tanaina names of endemic species form a specific subset of the potential substrate vocabulary:
• aqema ‘arctic ground squirrel' (104: 8).
• kʼ=čʰǝʁuxa ‘marten, pine marten' (104: 5), indefinite possessive prefix kʼ-.
• kʼ=qʰušiya ‘marmot' (104: 5), indefinite possessive prefix kʼ-.
• palušuk ‘chiton (mollusk)' (104: 22).
• qʰančʰi ‘porcupine' (104: 7).
• qʰiyχi ‘marmot' (104: 5).
• qʰunša ‘arctic ground squirrel' (104: 8).
• tʰaza ‘sea lion' (104: 9).
Out of the 13 aforementioned words from Pre-Tanaina basic vocabulary, 8 stems have phonetically similar and semantically compatible counterparts in the Chukotko-Kamchatkan family.
The Chukotko-Kamchatkan family consists of two branches (sub-families): Chukotian and Itelmen (a.k.a. Kamchatkan). A genetic relationship between Chukotian and Itelmen is generally accepted by experts (107-113). The opposite opinion was expressed by Worth (114) and Volodin (115, 116: 224–228), who supposed that the observed Chukotian-Itelmen matches, which cover not only basic vocabulary but also some main grammatical exponents (109), are contact loans from one language group to another. Worth-Volodin's scenario, however, clearly contradicts the theory of language contacts which predicts that cultural vocabulary is borrowed first, whereas basic vocabulary and main grammatical exponents are most protected from borrowing (117).
My Proto-Chukotian reconstruction is based on Fortescue's (110) etymological dictionary as well as on main lexicographic sources on synchronic Chukotian languages. Reconstruction of Proto-Itelmen is a less trivial task, because 18th-19th century data on the extinct Eastern and Southern Itelmen languages are not very reliable and consistent. I generally follow Mudrak's reconstruction of Proto-Itelmen (112, 118, 119) with emendations and/or simplifications, if needed.
There are too many unsolved obstacles in the Chukotian-Itelmen comparison at the current stage of research which makes reliable phonological reconstruction of a Proto-Chukotko-Kamchatkan vocabulary impossible. Instead we are forced to use separate Proto-Chukotian and Proto-Itelmen vocabularies.
The most striking case of coincidence between Pre-Tanaina and Chukotko-Kamchatkan with direct semantic match is:
(1) miɬni, piɬni, vinɬni ‘water' (104: 121, 293), the variants are distributed among the Tanaina dialects. Initial m- points to a non-inherited form, it was denasalized m > p~v in some dialects in order to avoid a foreign nasal labial phoneme as, for instance, in Tanaina milo ~ vilo ‘soap' < Russian mɨlo ‘id.' or muka ~ puka ‘flour (food)' < Russian muka ‘id.'. ► Cf. Proto-Chukotian *mi-məl (partial reduplication) ‘water' (110: 99).
Another interesting case is:
(2) (ǝ)ɬtʰuʁ ‘eye' (104: 88). ► The Tanaina stem ɬtʰuʁ may contain cognates of Chukotko-Kamchatkan *lV ‘eye', reduplicated pl. *lV-lV (> Chukotian *lǝ-lä ‘eye', Itelmen *lo- ‘eye', pl. *lu-l-, 110: 163; 118) and the Chukotko-Kamchatkan plural or rather dual ending *-t(V) (110: 426, 433). Final -ʁ remains without explication, however.
Other cases with weaker phonetic compatibility and/or without direct semantic matches are:
(3) kʼaƛʼ ‘snow showers with moist large flakes' (104: xxi, 153). ► Chukotko-Kamchatkan *qǝl ‘snow (esp. on ground)' (> Chukotian *ʁǝl-ʁǝl, Itelmen *qal-ʔal, reduplication in both cases, 110: 243). Dissimilation q-q > q-ʔ in Itelmen suggests how the ejectivity in the Athabaskan form might emerge.
(4) ʁǝs ‘bone' (104: 12). ► Itelmen *kas(-)x- ‘shank, shin' (118). For a similar semantic shift, cf., e.g., Icelandic leggur ‘shank; bone (of arm or leg)'.
(5) kʼis-ǝn ‘woman, girl' (104: 72), kʼis-i ‘female animal' (104: 13); final -ǝn and -i are human and non-human nominalizers respectively. ► Chukotian *čakǝ-ɣet ‘sister' (110: 42), Itelmen *čik- ‘girl, young woman' (118). A metathesis?
(6) iƛʼ ‘clothing (in general)' (104: xxi, 165). ► Chukotian *iðʁǝ-n ‘parka', Itelmen *ečʼ ‘clothing' in the complex negated stems ‘naked' and ‘to get undressed' (110: 94; 118).
(7) čʼix ‘hair (of head)' (104: 87). ► Chukotian *ðǝɣ ‘fur' (110: 64).
(8) yusti ‘open fire, fireplace, outdoor hearth' (104: 248). ► Chukotko-Kamchatkan *uyi ‘to make fire, cook' (> Chukotian *uyi, Itelmen *uyi, 110: 305). The assumed nominal suffix -st- in the Tanaina form requires additional explanation.
The lexical similarities offered above may indicate that the assumed linguistic substrate of Tanaina (Pre-Tanaina) represented a close relative of the Chukotko-Kamchatkan family. In other words, it might be a trace of an ancient migration from Siberia to Alaska which probably predated the Athabaskan radiation.
Assessing the offered phonetic and semantic similarities between Proto-Chukotko-Kamchatkan and a recent Tanaina substrate, one should keep in mind that we are dealing with a very long chronological distance probably comparable, say, with the distance between Proto-Balto-Slavic (a representative of the Indo-European family) and modern Finnish (a representative of the Uralic family). The split between Indo-European and Proto-Uralic branches is dated back to ca. 8-10 millennia BC (120). Out of 7 Proto-Indo-European–Proto-Uralic lexical etymologies within the Swadesh 110-item wordlist (121: 320), 5 roots have Proto-Balto-Slavic and modern Finnish descendants without semantic changes, but with very different degrees of phonetic similarity:
• ‘I (obl.)': Proto-Balto-Slavic *me-n- / Finnish minä.
• ‘name': Proto-Balto-Slavic *inmen- / Finnish nimi.
• ‘thou': Proto-Balto-Slavic *tuː / Finnish sinä.
• ‘water': Proto-Balto-Slavic *vad-aː / Finnish vesi.
• ‘who?': Proto-Balto-Slavic *k- / Finnish kuka.
Note that I intentionally do not use the hyphen in the Finnish forms mi-nä ‘I', si-nä ‘thou' and ku-ka ‘who', since we know nothing about morphology of the hypothetical Tanaina substrate and are not able to propose a synchronous morphological analysis of the Tanaina substrate forms.
Can the observed phonetic similarities between Pre-Tanaina and Chukotko-Kamchatkan be chance coincidences? Definitely yes, but unfortunately we do not have available tools to estimate the probability of getting these similarities due to chance. The weighted permutation test developed in Changmai et al. (forthcoming) returns an insignificant p = 0.333 if the 6 available Swadesh items of Pre-Tanaina and Proto-Chukotko-Kamchatkan are compared (Table S22). A significant p-value is not expected at such a time distance, however, since the accumulated phonetic mutations should seriously obscure the original phonetic similarities. For example, the permutation test applied to the Proto-Balto-Slavic and modern Finnish 110-item wordlists also returns an insignificant p >> 0.05, but a comparison between wordlists with a shorter time distance, Proto-Indo-European and Proto-Uralic, yields a significant p = 0.019 or 0.005, depending on the procedure (121).
Nevertheless, from the traditional linguistic point of view the available matches, especially Tanaina miɬni ‘water' / Proto-Chukotian *mi-məl ‘water', look persuasive enough to hypothesize about a Chukotko-Kamchatkan-like substrate in Alaska.
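The general logic of a permutation test on wordlists can be sketched as follows. This is a minimal, unweighted illustration and not the weighted procedure of Changmai et al., whose details are not given here. The function name `permutation_p_value` and the similarity scores in `toy_sim` are hypothetical, invented only to show the mechanics: the observed score pairs each meaning with itself, and the p-value is the share of all re-pairings that score at least as high.

```python
from itertools import permutations


def permutation_p_value(sim):
    """Exhaustive permutation test on a square similarity matrix.

    sim[i][j] is a (hypothetical) phonetic similarity score between the
    word for meaning i in wordlist A and the word for meaning j in
    wordlist B. The observed statistic pairs each meaning with itself.
    """
    n = len(sim)
    identity = tuple(range(n))

    def score(pairing):
        # Total similarity when meaning i in A is paired with pairing[i] in B.
        return sum(sim[i][pairing[i]] for i in range(n))

    observed = score(identity)
    total = 0
    at_least_as_good = 0
    for pairing in permutations(range(n)):
        total += 1
        if score(pairing) >= observed:
            at_least_as_good += 1
    # The identity pairing is always counted, so p is never zero.
    return at_least_as_good / total


# Toy scores for three meanings (assumed values, not real linguistic data).
toy_sim = [
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.4, 0.8],
]
p = permutation_p_value(toy_sim)
print(p)
```

With only six items there are just 6! = 720 possible pairings, so an exhaustive test of this kind can only return p-values in steps of 1/720, which is one reason such a short list has little statistical power.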
meaning | Tanaina | Proto-Chukotian | Proto-Itelmen | Proto-Nivkh
‘bone' | ʁǝs | *qətʁəm | *ktχʷəm | *ŋa=ɲɣǝv
‘eye' | ɬtʰuʁ ~ ǝɬtʰuʁ | *lə-lä | *lo- | *ɲŋaɣ
‘fire' | tazʔi | *milɣə- | *xi=mɬx | *tu-ɣr ~ *tuɣ-r
‘hair' | čʼix | *kəð=wir | *kʼʷimi | *ŋa=mɣ
‘water' | miɬni | *mi=məl | *iʔ | *ȶaʁ
‘woman' | kʼis-ǝn | *ŋäv- | *ŋim-sx | *taŋq
Table S22. Six Swadesh items of the assumed Tanaina substrate language (Pre-Tanaina) and their Proto-Chukotian, Proto-Itelmen and Proto-Nivkh semantic counterparts.
This substrate scenario raises the question of how the assumed Tanaina substrate correlates with Nivkh, a compact language family spoken in the Amur river region and on Sakhalin island, since it is likely that Nivkh is the closest linguistic relative of Chukotko-Kamchatkan (36). See Changmai et al. forthcoming for probabilistic evaluation. In this case, lexical matches between Pre-Tanaina and Proto-Nivkh also could be revealed, but I failed to find convincing etymologies between the Pre-Tanaina forms and the Proto-Nivkh basic vocabulary (as reconstructed by Fortescue (122)). The factual absence of recognizable matches between Pre-Tanaina and Proto-Nivkh allows for at least two explications. First, the cognate words in the Tanaina substrate and/or Proto-Nivkh might have mutated to such a degree that they currently are not discernible to the naked eye. Second, since the number of Pre-Tanaina words known to us is very modest (e.g., out of 110 Swadesh items, only 6 words are available), it would not be surprising if all historical Pre-Tanaina–Nivkh cognates fell into a non-attested part of the Pre-Tanaina vocabulary. If we assume that historically there were, say, 10 Pre-Tanaina–Proto-Nivkh etymological matches within the Swadesh 110-item wordlist, the probability of missing all these etymologies in the random subset of 6 concepts (as in the Tanaina case where only 6 Swadesh items likely belong to a substrate language) is very high: p = 0.557.
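The p = 0.557 figure can be reproduced with a short calculation, assuming the 6 attested concepts are a simple random draw without replacement from the 110-item list (a hypergeometric setup; the text does not spell out the method, so this is an assumption). The function name `prob_miss_all` is illustrative only.

```python
from math import comb


def prob_miss_all(total_items, cognates, sampled):
    """Probability that a random sample contains none of the cognates.

    Counts the ways to choose `sampled` concepts from the non-cognate
    items and divides by the ways to choose them from the whole list.
    """
    return comb(total_items - cognates, sampled) / comb(total_items, sampled)


# 110 Swadesh items, 10 assumed cognates, 6 attested substrate items.
p_miss = prob_miss_all(110, 10, 6)
print(round(p_miss, 3))  # 0.557
```

Under the same assumption, the chance of missing every cognate falls as the attested list grows, which is why a substrate known from only 6 Swadesh items says little either way about a Nivkh connection.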
In summary, the available linguistic data (if we are not dealing with a series of chance coincidences) suggest that the assumed pre-Athabaskan substrate language might either belong to the Chukotko-Kamchatkan family or be an outlier in the Chukotko-Kamchatkan–Nivkh clade.
b. Modeling diverse Asian origins for indigenous North American language families
by Edward Vajda
The discovery of a Late Pleistocene gene flow from Northeast Asia into Northern Native Americans (NNA) dated to at least ~12,000 yBP that in this publication is termed Houtaomuga-related (from the eponymous archaeological site) has implications for understanding linguistic prehistory on both sides of Bering Strait. The genes in question are shared on the Asian side by contemporary speakers of Nivkh, Tungusic, Japanese, and Korean languages, but in North America this genetic component was found only in an 11,500-year-old individual, USR1, whose genome is the best-quality representative of the Denali archaeological culture. A signal found in a much later ancient Athabaskan individual (ca. 730 yBP, Data S6) was supported on one dataset only, and thus unequivocal evidence for the survival of this component in recent and contemporary peoples is lacking. But given the location of USR1 in Alaska and subsequent development and contacts of the Denali culture attested archaeologically, we expect to find Houtaomuga-related genetic heritage among speakers of four major primary families (Algic, Salishan, Eskaleut, Na-Dene) and several smaller primary families or isolates indigenous to the Pacific Northwest (Tsimshianic, Wakashan, Haida, and possibly also Chimakuan and Kutenai). "Primary family", "family", and "isolate" – a language family with only one member – all refer to genealogical taxons with no currently proven deeper relatives. The goal here is to triangulate data from linguistics, genetics, folklore studies, and archaeology to assess whether chronologically deeper genealogical relationships might exist among these families that would parallel this human genetic link. Several distinct scenarios for the genesis of linguistic diversity in North America and its more ancient roots in Eurasia are identified.
While archaeology, genetics and the study of folklore motifs have already yielded many facts of potential relevance for modeling the prehistoric origins of indigenous New World language families, several factors continue to argue against the likelihood that language relatedness can be firmly demonstrated at time depths older than the Early to Mid-Holocene. Languages change at variable and unpredictable rates. Attempts to determine a separation time between languages known to be genealogically related based on equating the percentage of shared cognates with a particular time depth (glottochronology) are unreliable. Only when parallel information from historical records, human genetics, or archaeology happens to be available has it been possible to determine the timing of a language family's breakup. One primary family dated with the help of extra-linguistic methods is Indo-European (in the sense of Indo-Hittite), for which archaeological parallels help demonstrate a time depth range of 5800 to 6500 yBP (123). Possibly the oldest universally accepted primary family proven using the Comparative Method is Afroasiatic, which contains Arabic, Hebrew, Amharic, Berber and many other languages of northwestern Africa, and is claimed to have a time depth of ~8,000 years (124). The apparent temporal limitations of the Comparative Method cannot be overcome by substituting human genetic findings for linguistic evidence. The lack of strong correlation between population genetics and language relatedness over time precludes the reliability of the former in contributing actual evidence of linguistic relatedness. The best that can be achieved at present in exploring the Old-World origins of New World language families from an interdisciplinary perspective is a clear articulation of which linguistic scenarios are indicated as plausible by the currently known facts from other scientific disciplines, and which are not. 
In other words, which speculative hypotheses of distant language relatedness have parallel support from human genetics and archeology, and which do not?
The discovery of a Late Pleistocene Northeast Asian gene flow into at least some NNA (Northern Native American) populations opens the logical possibility that one or more indigenous language families of northwestern North America may descend not from an ancestor spoken in Beringia throughout the Beringian Standstill but rather from one brought after the end of the standstill only about 16,000 years ago. It had previously been thought that all First Peoples of the Americas (defined as indigenous New World populations whose ancestors entered North America from Asia in Pleistocene times) descended from a population that had remained isolated in Beringia during the Last Glacial Maximum from ~23,000 to ~15,500 yBP, a period called the Beringian Standstill. This population later contributed to the genomes of indigenous populations across the Americas. Given the time depth involved, any linguistic connections between contemporary Old-World populations and First Peoples of the Americas mediated through descent from a language spoken in Beringia during the Standstill would necessarily date back deeper than ~23,000 yBP, far too ancient to be demonstrated using accepted comparative linguistic methods. The newly discovered human genetic link between certain populations of Northeast Asia and North America at the much younger time depth of at least 12,000 years ago opens new possibilities for investigating cross-Bering linguistic connections, though the time depth involved may still be too remote to evaluate them with full confidence.
Aside from this gene flow, the ancient Beringian population appears to have been isolated from contact with other populations of Eurasia throughout the last 10,000 years of the Late Pleistocene. At the time of European contact in 1492, the indigenous peoples of the Americas spoke over 120 primary language families (including many isolates) spread from Alaska to Tierra del Fuego. All of these peoples, including speakers of Eskaleut and Na-Dene, possess at least some genetic ancestry derived from the population living in Beringia during the Standstill. The population ancestral to all Native Americans underwent a further split ~15,500 yBP into Southern Native Americans (SNA) and Northern Native Americans (NNA) (13, 15). SNA groups exclusively populated all of South and Central America, as well as much of North America. NNA genes are found in the remaining indigenous populations of northern North America. Looking from the vantage of genetics and archaeology, it is therefore possible that every primary language family of the New World descends from a language (or several languages) spoken in Beringia during the later phase of the Standstill and later spread south into the Americas by either NNA or SNA populations.
For SNA populations, a language origin from Beringia early in the Standstill remains the only logical possibility so far indicated. This encompasses over 90% of the indigenous primary language families documented in the Americas (about 115). If it is true that language genealogy cannot be reliably traced back beyond eight or ten thousand years, then it would not be possible to determine whether SNA ancestors brought multiple distinct proto-languages into the Americas from Beringia in the Late Pleistocene or only one. The Early Beringian population derived about 70% of their genetic ancestry from East Asians and 30% from a deep branch of the European clade termed Ancient North Eurasians or ANE (13, 15, 40). An analysis of the comparative folklore motifs presented in Berezkin (125) suggests that several founding groups with distinct cultures emerged from the post-Standstill Beringian bottleneck to give rise to New World populations. It is therefore plausible that multiple distinct languages connected with either the ANE or East Asian genetic components of Beringia's population were spoken in Beringia at the beginning of the Standstill and that the descendant(s) of more than one of them were brought into the Americas in the Late Pleistocene. The language(s) spoken by the earliest SNA populations in the Americas might therefore descend from: 1) a single language of ANE origin; 2) a single language of ancient East Asian origin, or 3) multiple languages from both or either origin.
These same possibilities equally apply to the linguistic origins of NNA populations, given their shared origin with SNA populations earlier in Pleistocene Beringia at a time depth of ~15,000 yBP. As is true of languages spoken by SNA populations, all languages spoken by populations with NNA genes could likewise derive from a proto-language (or several proto-languages) with ancestral roots present in Beringia during the early phase of the Standstill. However, there exists a second possibility for the linguistic provenance of NNA populations that is not indicated for SNA populations: some language families spoken by NNA populations may descend from a proto-language connected instead with the newly discovered Northeast Asian gene flow brought into Late Beringia after the end of the standstill. The ramifications of this possibility for historical-comparative linguistics will be explored in detail below.
Only two primary language families spoken by populations with NNA genes show additional possibilities. Proto-Na-Dene could have entered Alaska much later, during the Mid-Holocene about 5,000 years ago in connection with the so-called Paleo-Inuit (Paleo-Eskimo) migration from Siberia (11), rather than deriving from a source in either Ancient Beringia or Late Pleistocene Northeast Asia. Finally, for Eskaleut speakers there exists both this possibility as well as a fourth: while Proto-Eskaleut probably first arrived in North America ~5000 yBP, the population directly ancestral to Yupik- and Inuit-speaking groups may have crossed Bering Strait only ~2000 yBP.
Table S23 summarizes the four logical possibilities for indigenous New World language origins so far suggested by genetics and archaeology:
Table S23. Plausible Origins of Indigenous New World Language Families
Hypothesis One: Early Beringian Pleistocene Origin: The language descends from one or more proto-languages spoken in Beringia already at the earliest phase of the Standstill with deeper origins connected either with ANE or Ancient East Asians. New World language families with Early Beringian origins would therefore be separated by a time depth of at least ~23,000 years from any Old-World relatives and probably not recoverable by the Comparative Method. For indigenous languages spoken by populations with SNA genetic origin (about 115 of the recognized 123 primary New World families), this is the only possibility currently indicated by human genetic evidence. Therefore, the Early Beringian origin hypothesis encompasses the vast majority of indigenous language families of the Americas. Although a linguistic origin in Early Beringia is likewise theoretically possible for the seven to nine primary families spoken by populations with the NNA genetic component (Eskaleut, Na-Dene, Haida, Tsimshianic, Wakashan, Salishan, Algic, as well as possibly Quileute and Kutenai), there exist additional possibilities to account for their Old World origins.
Hypothesis Two: Late Pleistocene Northeast Asian Origin: This hypothesis is new and considered here for the first time. The language descends from a proto-language not present in Early Beringia but rather brought from Northeast Asia after the end of the standstill about 16,000 years ago, and therefore potentially separated from possible linguistic relatives in Northeast Asia at approximately the same time depth. This "Trans North Pacific" hypothesis appears plausible only for languages spoken by NNA populations, including Na-Dene and Eskaleut, and not for populations in the southern or eastern areas of North America or in Central or South America, for which only Hypothesis One applies.
Hypothesis Three: Mid-Holocene Siberian Origin: The ancestral language arrived in Alaska ~5000 yBP in conjunction with the so-called Paleo-Inuit migration, which contributed at least 50% genetic ancestry to Eskaleut speakers and 30-40% to early Na-Dene speakers (11). This hypothesis is not applicable to other NNA populations, including Haida speakers, who lack this gene flow except where later admixed with neighboring Tlingit or Dene speakers. Based on the sharp linguistic difference between Eskaleut and Na-Dene, which are universally recognized as completely unrelated, a Mid-Holocene Siberian origin for both families is plausible only if two linguistically distinct groups crossed Bering Strait and entered Alaska at this time (126).
Hypothesis Four: Late-Holocene Siberian Origin: The ancestral language was brought across Bering Strait during the past 2000 years by an Asian population closely related to that involved in Hypothesis Three. This scenario is applicable only for Eskaleut and probably only for its Yupik-Inuit branch, if Aleut had already been brought into Alaska by the migration from Siberia that occurred ~5000 yBP. See a discussion of the back and forth movement of Eskaleut ancestors across the Bering Strait and the special position of Aleuts in Flegontov et al. (11).
Genetics and archaeology thus indicate four distinct possibilities for the origin of language families spoken by NNA populations in North America. How can findings from comparative linguistics help choose between them? Identifying language genealogies across Bering Strait would narrow down these possibilities. Success in demonstrating cross-Bering language relationships using the Comparative Method, assumed not applicable at time depths earlier than the Holocene, would support a Holocene-Era entry into the Americas for both Na-Dene and Eskaleut. Na-Dene shows increasingly robust evidence of relatedness to the Yeniseian language family in Siberia (Dene-Yeniseian), while Eskaleut appears related to Uralic and Yukaghir in what has been called the Uralo-Siberian family (126). The fact that Eskaleut and Na-Dene have never plausibly been related either to one another or to any of the other primary families spoken in the Americas could be explained if both Eskaleut and Na-Dene were brought into Alaska during the Mid or Late Holocene (126). A firm demonstration of Dene-Yeniseian and Uralo-Siberian would support a mid-Holocene entry into Alaska of proto-Na-Dene and proto-Eskaleut and would also attest to the unrelatedness of each of these two families to other indigenous North American families, whose speakers share no Mid-Holocene Siberian gene flow, except where due to late admixture with adjacent Tlingit, Dene, or Inuit-speaking populations. For the remaining five (or possibly seven) primary language families spoken by populations with NNA genes, the data from genetics do not reveal a Holocene-era entry and therefore indicate only a choice between Hypothesis One and Hypothesis Two.
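The way the four hypotheses in Table S23 narrow down by family can be restated as a small lookup. The following Python sketch is only an illustration of the genetic and archaeological constraints as summarized above, not anything taken from the preprint itself; the names `NNA_CORE`, `NNA_POSSIBLE`, `applicable_hypotheses` and `only_one_or_two` are hypothetical, and the sketch ignores the later linguistic arguments that further favor one hypothesis over another for particular families.

```python
# Hypothetical restatement of Table S23: which origin hypotheses (1-4)
# remain open for a family on genetic and archaeological grounds alone.

# Families the text lists as spoken by populations with the NNA component.
NNA_CORE = ("Eskaleut", "Na-Dene", "Haida", "Tsimshianic",
            "Wakashan", "Salishan", "Algic")
# Families the text marks only as "possibly" NNA.
NNA_POSSIBLE = ("Quileute", "Kutenai")

# Hypothesis Three (Mid-Holocene) applies only to these two families.
MID_HOLOCENE = {"Eskaleut", "Na-Dene"}
# Hypothesis Four (Late Holocene) applies only to Eskaleut.
LATE_HOLOCENE = {"Eskaleut"}


def applicable_hypotheses(family):
    """Return the set of hypothesis numbers still open for a family."""
    hypotheses = {1}  # Early Beringian origin is possible for every family
    if family in NNA_CORE or family in NNA_POSSIBLE:
        hypotheses.add(2)  # Late Pleistocene Northeast Asian origin
    if family in MID_HOLOCENE:
        hypotheses.add(3)
    if family in LATE_HOLOCENE:
        hypotheses.add(4)
    return hypotheses


def only_one_or_two(families):
    """Families left with a choice between Hypothesis One and Two only."""
    return [f for f in families if applicable_hypotheses(f) == {1, 2}]


if __name__ == "__main__":
    for name in NNA_CORE + NNA_POSSIBLE + ("Siouan",):
        print(name, sorted(applicable_hypotheses(name)))
```

Under these assumptions, `only_one_or_two(NNA_CORE)` returns five families and `only_one_or_two(NNA_CORE + NNA_POSSIBLE)` returns seven, matching the "five (or possibly seven)" count in the paragraph above, while a family outside the NNA group, such as Siouan, is left with Hypothesis One alone.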
The Trans-North-Pacific Hypothesis (Hypothesis Two) is intriguing since a time depth of ~16,000 years rather than ~23,000 years holds more potential for tracing relatives in the Old World. Of languages spoken by contemporary Asian peoples with a human genetic link to NNA populations, Japanese, Korean and Tungusic show a deep typological patterning with the so-called Altaic (or Uralo-Altaic) band of language families that also encompasses Mongolic and Turkic. None of these languages show structural affinities with North American languages, and it is plausible that the Late Pleistocene Amur River Basin ("Houtaomuga") genetic component present in contemporary Japanese, Korean and Tungusic populations represents a substrate unconnected with the origin of the languages these populations speak today. Only Nivkh and Chukotko-Kamchatkan, which themselves may be genealogically related (36, 127), remain as possible candidates for a distant relationship to languages spoken by NNA populations. Both families have already been investigated by a number of linguists working independently of one another as potential relatives to Algic, Salish, Wakashan, and possibly also Chimakuan and Kutenai (108, 127-129). It is not clear whether Chimakuan and Kutenai speakers actually possess the NNA genetic component, but their locations make this plausible. The Chimakuan microfamily contains the closely related Quileute and Chemakum languages, both indigenous to Washington's Olympic Peninsula. The Kutenai (also spelled Ktunaxa or Kootenai) language isolate is spoken in Montana and adjacent areas of Canada. Kutenai has been plausibly but not definitively linked to Salishan based on both lexical and morphological evidence (130). Dryer (131) investigated Kutenai parallels to Algic morphosyntactic patterns that may be due to deep genealogical relatedness. Another isolate occasionally claimed to be related to Algic is the extinct Beothuk of Newfoundland (132).
Nothing definitive has yet been published about Beothuk genetics, but the scant attestations of Beothuk vocabulary do not indicate a relationship with Algonquian. In any event, the homeland of Algic speakers is more plausibly located far to the west, near the Columbia Plateau in the area where there is a primary split between Blackfoot (Siksika) and the remaining Algonquian languages (133). The Siouan languages of the upper Great Plains likely have their ancestral homeland in southeastern North America, far from NNA populations. Finally, the various Penutian languages of eastern and southern Washington State do not pattern with Pacific Northwest coastal language families in most regards and instead appear connected with languages farther south in Oregon and interior California. Siouan, Penutian, Iroquoian and other language families spoken farther south in North America do not appear to have any genealogical link with languages spoken by NNA populations.
What linguistic features could have survived for 16,000 years that might indicate a genealogical relationship between languages spoken by NNA populations in North America and Nivkh or Chukotko-Kamchatkan languages in northeastern Asia? The Comparative Method relies on a combination of evidence from cognate vocabulary and homologous grammatical systems, unified by a system of interlocking sound correspondences. Claims of genealogical relatedness between languages based entirely on resemblances in basic vocabulary or on typological similarities without cognate morphological combinations cannot be regarded as convincing evidence of genealogical language relatedness (134). As mentioned already, if by eight to ten thousand years the pace of language change generally erodes the quantity of evidence needed to fully demonstrate language relatedness, it might simply not be possible to prove linguistic genealogy deeper than the Holocene, even if some evidence of language relatedness did survive from that time. However, if at least partial evidence of this linguistic link survives in contemporary languages, what might this be? It is likely that some cognates in basic vocabulary roots have survived in detectable form among languages with Pleistocene-era genealogical relationships. But finding them based purely on phonetic resemblance without any overarching morphological comparison is likely to yield either too many false matches (due purely to coincidence or, in some cases, to ancient borrowing) or too few positive matches to posit a verifiable system of sound correspondences. Even when examining basic vocabulary, sorting out false lookalikes from true cognates necessarily requires a convincing and systematic morphological analysis to justify the division of root morphemes from fused affixal elements.
For this reason, sound correspondences calculated from putative cognates in basic vocabulary without parallel evidence of an inherited morphological system have never proven a single language family. A recent study of primary families in northern Asia (135) concluded that grammatical homologies are in fact more persistent over time than phonological and lexical patterns.
When attempting a preliminary hypothesis of possible distant language relationship (either areal or genealogical) it is therefore preferable to search first for shared morphological patterns of the type explored by Nichols (124) and Fortescue (108). These include unusual combinations of morphosyntactic traits (like the rare combination of prefixes with SOV syntax and postpositions shared between Yeniseian and Na-Dene). Other potential evidence includes first- and second-person pronoun configurations such as the M/T pattern in Eurasia or the N/M pattern in the western portions of the Americas. Both patterns could possibly be remnants of ancient linguistic genealogies. Another possibility includes shared inflectional systems of case endings, tense/aspect/mood affixes, or other idiosyncratic homologies in complex word structure, including specific types of polysynthetic verb structures. Though this sort of "typological-comparative" approach cannot replace the Comparative Method in proving genealogical relationship, its results can furnish preliminary and speculative support for certain possibilities over others and point the way toward more intensive comparison that includes both vocabulary and morphological systems.
As already mentioned above, Japanese, Korean, and the Tungusic languages pattern typologically with Turkic and Mongolic, which are spoken farther west in Eurasia, rather than with Native American families. All of these families are strongly suffixing and agglutinating; they allow strings of inflectional suffixes in which each morpheme expresses a single, separate grammatical meaning (such as 'plural' or 'locative'), unlike the fusional Indo-European languages, where words have one or at most two inflectional suffixes, usually with multiple meanings (such as 'singular' + 'locative' being fused in a single suffix and 'plural' + 'locative' fused in a completely different suffixal form). These shared patterns arose at least in part due to extensive language contact but may also reflect distant genealogical relationship between some or all of these languages, a problem still not resolved by historical linguistics. The B/S pattern of first- and second-person singular personal pronouns shared between Tungusic, Mongolic and Turkic has never been adequately explained as arising either from language contact or through sheer coincidence. The Altaic B/S pattern, in turn, has been linked genealogically with the M/T pattern shared by Indo-European, Uralic, Yukaghir, Eskaleut and possibly Chukotko-Kamchatkan. Languages of North America's Subarctic regions or Pacific Northwest Coast show no discernable affinities with any of these languages. The most plausible conclusion is that the Amur River Basin genetic component in Asian populations speaking Japanese, Korean, or Tungusic is a genetic substrate not historically connected with their language origins.
The situation is quite different with Nivkh and Chukotko-Kamchatkan. Both Nivkh and Chukotko-Kamchatkan are polysynthetic, "Paleosiberian" languages with verb structures that display typological affinities with certain languages spoken in North America. Neither of these primary language families fits the broader Uralo-Altaic suffixal-agglutinating type, though both share oblique case systems that pattern typologically with other Inner and Northern Asian languages. Chukotko-Kamchatkan also appears to exhibit the M/T pronoun pattern that seems to tie it to Uralic, Yukaghir and Eskaleut, though other forms of first- and second-person pronoun markers also exist in the family and it is not clear which should be reconstructed as primary. There has been extensive areal contact between Eskaleut and Chukotko-Kamchatkan, presumably occurring in the Late Holocene (126), that must be sorted out before attempting a deeper comparison with Nivkh or languages of northwestern North America. Fortescue (36) has already worked out a promising body of preliminary evidence for a genealogical connection between Nivkh and Chukotko-Kamchatkan that includes lexical cognates, grammatical homologies and the beginnings of systematic sound correspondences. Importantly, none of this putative evidence can be found in Yeniseian (also considered a "Paleosiberian" language) or in Na-Dene; instead, both of these families share a completely different type of polysynthetic verb structure with a prefixing templatic grid that contrasts starkly with that of all other families in Northern Asia and North America (136). Similarities between Nivkh and Chukotko-Kamchatkan shared with North American languages thus skip over both Eskaleut and Na-Dene, a fact that is explainable if the latter two families entered North America during the Mid- or Late Holocene.
The comparison with languages spoken by the remaining NNA populations undertaken below will consider Nivkh and Chukotko-Kamchatkan together, though these two families have yet to be fully proven as genealogically related.
The most promising possibility of determining a linguistic parallel to the Late Pleistocene gene flow into NNA populations concerns Nivkh and Chukotko-Kamchatkan on the Asian side and Algic and Salishan in North America. The resulting hypothesis of genealogical relationship could be called "the Trans North Pacific Hypothesis" or "Trans North Pacific Phylum", with the use of the word "phylum" indicating a time depth that might not be amenable to conclusive demonstration using the Comparative Method. Several important prior studies have already been published working toward this goal. As earlier mentioned, Fortescue (36) has made a plausible case that Chukotko-Kamchatkan languages could be distantly related to Nivkh, citing morphological homologies as well as lexical cognates with the beginnings of a system of sound correspondences. Bakker (137) describes a range of idiosyncratic structural homologies between Algic and Salishan, some of which also subsume Kutenai. Nikolaev (128, 129) has compiled an impressive list of potential lexical cognates shared between Algic, Salishan and Nivkh, upon which basis he posits a system of interlocking sound correspondences. Lexical correspondences with Chukotko-Kamchatkan, however, are identified by Nikolaev as ancient loanwords rather than potential cognates. Although none of this work has yet conclusively proven the relationship between any of these primary families, it is instructive to note that all of it was conducted before the discovery of any human genetic connection between the populations speaking these languages, in much the same way that the Dene-Yeniseian linguistic hypothesis was formulated prior to the discovery of a Siberian genetic component shared between modern Kets and Na-Dene-speaking populations.
Other Pacific Northwest language families spoken by populations with NNA genes are less likely to be connected with this hypothesized "Trans North Pacific Phylum". Wakashan and Chimakuan are known to form a complex Sprachbund with Salishan, with intensive language contact between the three groups going back many centuries (138). The idea that these three language families are genealogically related, i.e. Sapir's "Mosan" hypothesis (139), has been almost universally abandoned. Nikolaev (128, 129) claims that Wakashan and Chimakuan are genealogically related to Nivkh and Salish-Algic at a deeper time level, but such a genealogy would contradict the known human genetic and archaeological parallels. If all of these languages are genealogically related, the families on the North American side would almost certainly form a taxon together, as would Nivkh and Chukotko-Kamchatkan on the Asian side. The status of Wakashan and Chimakuan remains that of two separate primary families linked with Salishan (and to a lesser extent with other Pacific Northwest language families) through pervasive areal contact.
The two remaining primary families of the greater Pacific Northwest language area are Tsimshianic and Haida. Both appear to be genealogically unrelated to one another as well as to the other families of the region. The inclusion of Haida in Na-Dene has been almost universally abandoned, with shared features between Haida and Tlingit better explained as arising through long-term contact. The suffixing Haida verb is fundamentally different from the prefixing Na-Dene (or deeper Dene-Yeniseian) finite verb complex, with its many shared idiosyncratic homologies (136). Tsimshianic has been hypothesized to be somehow connected to the Penutian languages spoken farther south in western North America. One shared feature of these languages is the N/M pronoun pattern found widely in western portions of the Americas. If this pattern is indeed a relic of distant language relatedness in some or all of the primary families in which it appears, the language family (or phylum) in question would probably date back to the end of the Pleistocene, possibly spreading in connection with the Clovis Culture, though other scenarios are plausible as well. In any event, Tsimshianic and Haida appear to be relics in the Pacific Northwest that descend from languages originating in ancient Beringia rather than brought from Northeast Asia in the Late Pleistocene. The same also may be true of Wakashan and Chimakuan. The ancestors of all of these languages may have been spoken by NNA populations that did not originally mix with these Asian newcomers, the mixing of genes (as well as linguistic traits) occurring only later in the broader Pacific Northwest convergence zone.
Table S24 below presents a theoretically plausible layering of language connections across Bering Strait. The term "family" is used for Holocene-Era genealogical units that might be proven using the traditional Comparative Method, while the term "phylum" refers to hypothetical unity dating back to Pleistocene times that may not be fully demonstrable.
Table S24. Hypothesized Cross-Bering Linguistic Layers (youngest to oldest)
A. Uralo-Siberian Family, containing Uralic, Yukaghir, Eskaleut (historically connected with Mid- to Late-Holocene gene flow out of northeastern Asia)
B. Dene-Yeniseian Family, linking Yeniseian with Tlingit, Eyak and Dene languages (associated with Mid-Holocene gene flow out of central Siberia)
C. Trans North Pacific Phylum, linking Nivkh and Chukotko-Kamchatkan with Algic and Salishan, and possibly also with Wakashan, Chimakuan, and/or Kutenai (historically connected with the hypothetical Late Pleistocene gene flow between proto-Siberians and proto-Americans that happened in Siberia or in Beringia)
D. Languages descending from phyla originating in the Pleistocene Beringian Standstill, including Tsimshianic, Haida, and all language families south of the Pacific Northwest (presumably the ancestors of all of these families have Ancient Beringian origins with possible deeper connections either to ANE or Ancient East Asians)
The languages subsumed under D in Table S24 are presently classified into at least 110 primary families that cannot plausibly be linked to any known Holocene or Late Pleistocene migration out of Eurasia. These languages, with roots in the Beringian Standstill, might represent a single phylum or multiple phyla, depending upon how many distinct proto-languages were brought south into the Americas during the Late Pleistocene. The sheer typological diversity of documented families that are not connectable to groups A, B or C in Table S24, together with the multiple distinct folklore connections with Old World groups observed between the peoples who speak them (125), suggests that ancestral Native American populations brought several proto-languages into the Americas from an earlier source in ancient Beringia. Furthermore, these languages may have had completely distinct origins in ancient northern Eurasia or East Asia. In any case, the existence of a Trans North Pacific Phylum with its distinct, non-Beringian origin would already demonstrate the untenability of Greenberg's (132) Amerind Hypothesis, which claimed that all indigenous New World languages except Eskaleut and Na-Dene (into which Haida was erroneously included) comprised a single phylum.
Although linguistics cannot provide the degree of confidence found in archaeological and human genetic studies of ancient prehistory dating back to Pleistocene times, it still can contribute something to the emerging synthesis about the early peopling of the Americas. The language layering across Bering Strait described here involves the logical consideration of speculative possibilities rather than actual evidence or proof. The foregoing discussion has modeled a variety of logically possible scenarios for how distinct population movements out of Beringia or Asia could be reflected in North America's linguistic diversity. In particular it has identified a possible linguistic parallel matching the gene flow between proto-Siberians and proto-Americans ~16,000 yBP with what has been called the "Trans North Pacific language phylum" encompassing Nivkh and Chukotko-Kamchatkan on the Asian side and Salishan and Algic (as well as possibly Kutenai, and less likely also Wakashan and Chimakuan) in North America. This hypothesis could also explain the existence of shared Trans North Pacific folklore motifs such as the Raven Creator. The degree of affinity in both folklore and language typology would be difficult to explain as arising either from Holocene-Era contact, for which there is no evidence, or surviving from Ancient Beringian times, which would not explain the discrete patterning on both sides of the North Pacific Rim.
Finally, it should be noted that there is no evidence that Eskaleut, Na-Dene, or Algic are genealogically related. The stark differences between these major language families spoken by populations with the NNA genetic component argue against their ancestral unity and favor the scenario of multiple Asian origins for the populations and languages indigenous to northern North America.