
Tuesday, September 9, 2025

The Huns Were Paleo-Siberian, Not Linguistically Turkic (Also Slavic Origins)

A new paper makes a strong case that the Huns, a group of "barbarians" (in the eyes of Roman historians) who made multiple attempts to invade the Roman Empire, spoke a Paleo-Siberian language (to which the Na-Dene languages of North America, such as Navajo, are distantly related), rather than a Turkic language, as conventional wisdom in historical linguistics had held before this paper.



The Xiōng-nú were a tribal confederation who dominated Inner Asia from the third century BC to the second century AD. Xiōng-nú descendants later constituted the ethnic core of the European Huns. It has been argued that the Xiōng-nú spoke an Iranian, Turkic, Mongolic or Yeniseian language, but the linguistic affiliation of the Xiōng-nú and the Huns is still debated. 
Here, we show that linguistic evidence from four independent domains does indeed suggest that the Xiōng-nú and the Huns spoke the same Paleo-Siberian language and that this was an early form of Arin, a member of the Yeniseian language family. This identification augments and confirms genetic and archaeological studies and inspires new interdisciplinary research on Eurasian population history.
Svenja Bonmann et al, "Linguistic Evidence Suggests that Xiōng‐nú and Huns Spoke the Same Paleo‐Siberian Language," Transactions of the Philological Society (June 16, 2025). DOI: 10.1111/1467-968X.12321

A news report about the paper spells out this hypothesis at greater length:
New linguistic findings show that the European Huns had Paleo-Siberian ancestors and do not, as previously assumed, originate from Turkic-speaking groups. The joint study was conducted by Dr. Svenja Bonmann at the University of Cologne's Department of Linguistics and Dr. Simon Fries at the Faculty of Classics and the Faculty of Linguistics, Philology and Phonetics at the University of Oxford.

The results of the research, "Linguistic evidence suggests that Xiōng-nú and Huns spoke the same Paleo-Siberian language," have been published in the journal Transactions of the Philological Society.

On the basis of various linguistic sources, the researchers reconstructed that the ethnic core of the Huns—including Attila and his European ruling dynasty—and their Asian ancestors, the so-called Xiongnu, shared a common language. This language belongs to the Yeniseian language family, a subgroup of the so-called Paleo-Siberian languages. These languages were spoken in Siberia before the invasion of Uralic, Turkic and Tungusic ethnic groups. Even today, small groups who speak a Yeniseian language still reside along the banks of the Yenisei River in Russia.

From the 3rd century BCE to the 2nd century CE, the Xiongnu formed a loose tribal confederation in Inner Asia. A few years ago, during archaeological excavations in Mongolia, a city was discovered that is believed to be Long Cheng, the capital of the Xiongnu empire. The Huns, in turn, established a relatively short-lived but influential multi-ethnic empire in southeastern Europe from the 4th to 5th centuries CE.

Research has shown that they came from Inner Asia, but their ethnic and linguistic origins have been disputed until now, as no written documents in their own language have survived. A great deal of what we know about the Huns and the Xiongnu is therefore based on written documents about them in other languages; for example, the term "Xiōng-nú" derives from Chinese.

 

[Based on the "World Topographic Map" by Esri. Sources: Esri, HERE, Garmin, Intermap, INCREMENT P, GEBCO, USGS, FAO, NPS, NRCAN, GeoBase, IGN, Kadaster NL, Ordnance Survey, Esri Japan, METI, Esri China (Hong Kong), OpenStreetMap contributors, GIS User Community, Simon Fries. Created with QGIS 3.36.]. Credit: Transactions of the Philological Society (2025). DOI: 10.1111/1467-968X.12321

From the 7th century CE, Turkic peoples expanded westwards. It was therefore assumed that the Xiongnu and the ethnic core of the Huns, whose own westward expansion dates back to the 4th century CE, also spoke a Turkic language. However, Bonmann and Fries have found various linguistic indications that these groups spoke an early form of Arin, a Yeniseian language, in Inner Asia around the turn of the millennium.

"This was long before the Turkic peoples migrated to Inner Asia and even before the splitting of Old Turkic into several daughter languages. This ancient Arin language even influenced the early Turkic languages and enjoyed a certain prestige in Inner Asia. This implies that Old Arin was probably the native language of the Xiongnu ruling dynasty," says Bonmann.

Bonmann and Fries analyzed linguistic data based on loan words, glosses in Chinese texts, proper names of the Hun dynasty as well as place and water names. Taken by itself, the data on each of these aspects would have comparatively little significance, but taken together it is hard to argue with the conclusion that both the ruling dynasty of the Xiongnu and the ethnic core of the Huns spoke Old Arin.

The findings of the study also made it possible for the first time to reconstruct how the Huns came to settle in Europe: For the two researchers, place and water names still prove today that an Arin-speaking population once left its mark on Inner Asia and migrated westwards from the Altai-Sayan region. Attila the Hun probably also bears an ancient Arin name: Until now, "Attila" was thought to be a Germanic nickname ("little father"), but according to the new study, "Attila" could also be interpreted as a Yeniseian epithet, which roughly translates as "swift-ish, quick-ish."

The new linguistic findings support earlier genetic and archaeological findings that the European Huns are descendants of the Xiongnu. "Our study shows that alongside archaeology and genetics, comparative philology plays an essential role in the exploration of human history. We hope that our findings will inspire further research into the history of lesser-known languages and thereby contribute further to our understanding of the linguistic evolution of mankind," concludes Fries.

In the body text, a section of the paper explores the previous conventional wisdom and its difficulties:

Although direct evidence is lacking, Iranian, Turkic and Mongolic languages have all been proposed as the language of the ruling dynasty of the Xiōng-nú (cf. e.g. Shiratori 1900; Benzing 1959; Pritsak 1982; Bailey 1985; Dybo 2007; Janhunen 2010; Beckwith 2018; Beckwith 2022) and of the Huns (cf. e.g. Doerfer 1973; Pritsak 1982; Savelyev 2020; Savelyev & Jeong 2020), because in the 1st millennium AD languages from these three families were spoken in Inner Asia. Inscriptions dating between the 4th and 9th century AD demonstrate that Iranian languages (Sogdian, early 4th to 6th century AD, Sims-Williams 2011; Vovin 2018) and Mongolic ones (Khüis Tolgoi and Bugut inscriptions of the 5th–6th centuries AD, Vovin 2018) as well as, much later, Turkic languages (isolated Turkish phrases in Bactrian manuscripts of the 7th century AD, Orkhon and Yenisei Kirgiz inscriptions between the early 8th and 9th century AD, Erdal 2004: 4–8) were spoken in the territory between the Yenisei River in the West, the Tian Shan range in the South and Mongolia in the East. Other Indo-European languages were spoken in oasis cities along the northern and southern ridges of the Takla Makan desert in the 1st millennium AD including Indo-Iranian (Iranian Khotanese and Tumshuqese Saka, Bactrian, Indo-Aryan Prakrit, Sanskrit) and ‘Tocharian’ languages (Agnean and Kuchean).

However, this linguistic situation of a coexistence of Iranian, Turkic and Mongolic in Inner Asia can only be reliably established as such for the late 1st millennium AD. Hypotheses on an Iranian, Mongolic or Turkic identity of the Xiōng-nú primarily rest on written sources post-dating the Xiōng-nú era.
While the theoretical possibility of a Mongolic or Turkic presence in Inner Asia already at the beginning of the common era cannot be ruled out a priori, it is important to note that there is, on the other hand, also no robust evidence – especially from textual sources – that could directly imply or prove a Turko-Mongolic presence in this area at such an early date. 
The earliest sources from the Tarim Basin and the territories alongside the Oxus River/Amu Darya (Chorasmia, Sogdia, Bactria) only document Indo-European languages from the Indo-Iranian and ‘Tocharian’ branches (to which might be added, as a cultural import, also Ancient Greek in Macedonian colonies). Judging by more indirect evidence – especially loanwords in other languages, toponyms, etc. – other Iranian languages, namely different Sakan varieties (Tremblay 2005) and ‘Old Steppe Iranian’ (Bernard 2023), must have been spoken in the steppe corridor from the Kazakh steppe to Dzungaria, and perhaps even to Gansu (see Beckwith 2022). It is only centuries later, namely in the Migration Period of the 5th–6th centuries AD, that a (Para-)Mongolic language might be attested in Inner Asia (Vovin 2018), and fragments of this (Para-)Mongolic language, in turn, are still much earlier documented than the earliest secure Turkic words dating from the 7th century AD.

There is thus neither direct nor indirect evidence supporting the claim of a Mongolic or Turkic presence in Inner Asia between the 3rd century BC and the 2nd century AD, and the hypothesis of a Mongolic or Turkic identity of the ethnic core of the Xiōng-nú (as proposed by Benzing 1959, Pritsak 1982; Tenišev 1997; Dybo 2007; Janhunen 2010; Savelyev 2020) is thus rather unlikely from the outset, as is the hypothesis of a completely unknown or unclassifiable language without any living descendants (as proposed by Doerfer 1973). The same applies to the Huns: there is a complete lack of evidence supporting claims of a Turkic presence among the Huns. On the other hand, an Iranian component in the Xiōng-nú Empire is possible, and indeed quite likely, although, as we intend to point out with the present study, such Indo-European ethnicity must not necessarily have been shared by the ruling dynasty or ethnic core of the Xiōng-nú (pace Bailey 1985; Beckwith 2022) or the Huns.

Concerning such an Iranian component, (Beckwith 2018, 2022) has argued recently that Xiōng-nú words preserved in Chinese texts are indicative of an Iranian language, which he calls ‘East Scythian’. However, his interpretation depends on a reconstruction of the Old and Middle Chinese pronunciation of Chinese signs which significantly differs from established reconstructions such as the classic one of Pulleyblank, and which has also been criticised by Vovin et al. (2016: 129–30). In addition to this, his Iranian etymologies must be met with serious doubts. For instance, the ethnonym ‘Aryan’, which is amply attested in many Indo-Iranian languages, is given by Beckwith with a word-initial laryngeal sound (discussion in Beckwith 2022: 183–86, cf. particularly p. 186): ‘East Scythian *ḥarya [ɣa.rya] “noble, royal; Scythian” → Old Chinese *ḥaryá 夏/*ḥâryá 華 “royal; Chinese, China”’. This would indeed be a remarkable Iranian word form, because no Indo-Iranian language points to an initial laryngeal (†Hā̆ri̯a- vel sim.): A word-initial laryngeal should have left direct traces in Persianide languages (see Kümmel 2018), but Old Persian <ariy-> /ariya-/ or inscriptional Middle Persian ēr ‘Iranian’ do not preserve such a sound. The hypothetical (East) Scythian would be the only Iranian language to preserve it, and independent evidence for this is entirely lacking. Other etymologies equally rest upon highly questionable ad hoc assumptions on Iranian historical phonology and must accordingly be dismissed (e. g. the etymology of Old Turkic täŋri ‘heaven’ that Beckwith 2022: 195, 203 wants to derive from an East Scythian *tagri through the application of an alleged Scythian syllable contact law of nasalization completely unheard of in the specialist literature and remaining without any reliable parallel; on this word rather cf. Georg 2001).

It must therefore be conceded that, while it is a priori likely that Iranian tribes were one factor among others in the ethnolinguistic melting pot of the eastern Eurasian steppe some 2000 years ago (the Sakan languages would be a good starting point for further research in this direction), the evidence adduced by scholars in favour of a dominant role of Iranian groups and their languages in the Xiōng-nú empire so far does not follow the rigorous methodological standards of Historical-Comparative Linguistics and is therefore insufficient to allow for any reliable inferences.

Etymological analyses of Xiōng-nú glosses in Chinese sources (collected by Pulleyblank 1962, criticised and reanalysed by Dybo 2007), complemented by the interpretation of the so-called Jié couplet, the only short text preserved in the Xiōng-nú language, have led to a more promising alternative hypothesis. This hypothesis acknowledges both the multi-ethnic composition of the Xiōng-nú empire as such and the presence of Indo-European and specifically Iranian languages in Inner Asia at the beginning of the common era, yet adds to the complexity the idea that the native language of the ruling dynasty of the Xiōng-nú empire might have been a Yeniseian one (Ligeti 1950; Pulleyblank 1962; Dul'zon 1966; Dul'zon 1968; Vovin 2000; Vovin 2003; Vovin 2007; Werner 2014; Vovin 2020). Yeniseian languages are usually considered remnants or survivors of the original linguistic diversity of Siberia, historically spoken in retreat areas as the result of several waves of superimposition or displacement by expanding Uralic/Samoyedic, Turkic and Tungusic languages. Therefore, Yeniseian languages are also known as Paleo-Siberian languages. Several different Yeniseian languages were spoken in the 18th century AD alongside the middle reaches of the Yenisei River and some of its tributaries, yet this probably reflects a northward migration from a point of departure further south, around the headwaters of the Yenisey, the Ob and the Irtyš rivers (see Dul'zon 1959a; Dul'zon 1959b; Dul'zon 1964; Maloletko 1992; Maloletko 2000; Vajda 2019: 194–95; cf. also Janhunen 2020: 167). From the six historically attested Yeniseian languages Ket, Yugh, Kott, Assan, Arin and Pumpokol, it has so far been suggested that Ket/Yugh (Ligeti 1950; Pulleyblank 1962) or Pumpokol (Vovin 2000, 2003, 2007, 2020; Vovin et al. 2016) may have been the native language of the Xiōng-nú ruling dynasty.

Adding value to this hypothesis is the fact that the northward migration of Yeniseian-speaking groups, as reflected in toponyms, from the Altai-Sayan area would well agree with detailed historical studies considering Indic, Iranian and Chinese written sources (de la Vaissière 2005; de la Vaissière 2014). These studies indicate that, following the eventual demise of their steppe empire, remnants of the Xiōng-nú migrated to the north of the Altai-Sayan Mountain ranges in the mid-2nd century AD and that this retreat area was the starting point of a secondary expansion of Xiōng-nú descendants roughly two hundred years later, between ca. 350–370 AD. This expansion occurred in three directions: One migratory trajectory led northward and left traces in the form of toponyms. This population movement downstream of the major rivers Yenisey, Ob and Irtyš perfectly explains the linguistic situation as documented for the first time in the 18th century and provides a direct link between Yeniseian languages and the Xiōng-nú. Another migratory route led to southern Asia and involved groups known from Iranian and Indic sources as Chionites, Kidarites, Hephthalites, Alchons as well as the so-called Huṇa (cf. Pfisterer 2013). A third migratory trajectory led westward, into Europe and involved the Huns who appeared in Eastern Europe in 370 and posed a threat to Roman hegemony until Attila's death in 453, the Battle of Nedao shortly afterwards and the ensuing disintegration of their confederation (cf. e.g. Heather 1996; Bóna 2002; Halsall 2007; Schmauder 2009; Maas 2014; Pohl 2022).

Several nomadic groups of late Antiquity that originated in Inner Asia and migrated to the southern and western peripheries of the Eurasian landmass apparently used the same ethnonymic constituent (Chion-ites – Al-chon – Huṇa – Huns; cf. de la Vaissière 2005; de la Vaissière 2014, but see Atwood 2012), and the traditional hypothesis of a link between the ethnic core of the European Huns of the 4th–5th centuries AD and the Inner Asian Xiōng-nú of the 3rd century BC–2nd century AD, first proposed by the French scholar Joseph de Guignes in the 18th century, has, strictly speaking, never been falsified (de la Vaissière 2005: 15). 
A genetic connection between the Xiōng-nú and the Huns is usually considered unlikely in modern archaeological and historical scholarship (e.g. Beckwith 2009: 72; Savelyev & Jeong 2020; Pohl 2022; Maenchen-Helfen 1944–1945; Maenchen-Helfen 1955; Maenchen-Helfen 1973; Schmauder 2009), partly because of the large chronological gap between the dissolution of the Xiōng-nú empire in the 2nd century AD and the appearance of the Huns in the 4th century AD, and partly because only two archaeological features render a connection likely: large bronze cauldrons of a certain type and artificially deformed or elongated skulls (Pohl 2022: 147).

Despite the prevailing scepticism of historians and archaeologists, the hypothesis of a connection between the Xiōng-nú and the Huns has been corroborated recently by previously unknown and unavailable genetic data analysed by Gnecchi-Ruscone et al. (2025): ‘(…) long-shared genomic tracts provide compelling evidence of genetic lineages directly connecting some individuals of the highest Xiongnu-period elite with 5th to 6th century AD Carpathian Basin individuals, showing that some European Huns descended from them’.
On the provision that there was indeed some continuation between the ethnic core of the European Huns and the former Xiōng-nú, the ruling classes of both multi-ethnic confederations may have spoken the same language in two different diachronic stages (an older form and a younger one), implying that the identification of the linguistic affiliation of one of these groups probably also means identifying the native language of the other group.
In the following, we will discuss previously unknown linguistic evidence from four domains independently supporting such a connection and thus corroborating the recent archaeological and genetic findings: (1) loanwords, (2) glosses, (3) anthroponyms and (4) toponyms/hydronyms.

This analysis, which places the Turkic and Tungusic migrations several centuries later in history than previously believed, is also relevant to the Altaic linguistic hypothesis and to our understanding of these ethnic mass migrations more generally.

Close in time and space: Slavic ethnogenesis

The Slavic people emerged around the same time as the fall of the Roman Empire and the demise of the short-lived Hunnic Kingdom in the Balkans, but before the Magyar conquest of what is now called Hungary and before the appearance of the Roma in Europe. This period was traditionally called the "Dark Ages" in Europe. Some historical sources, however, suggest Slavic origins several centuries earlier (from Wikipedia):

Ancient Roman sources refer to the Early Slavic peoples as "Veneti", who dwelt in a region of central Europe east of the Germanic tribe of Suebi and west of the Iranian Sarmatians in the 1st and 2nd centuries AD, between the upper Vistula and Dnieper rivers. Slavs – called Antes and Sclaveni – first appear in Byzantine records in the early 6th century AD. Byzantine historiographers of the era of the emperor Justinian I (r. 527–565), such as Procopius of Caesarea, Jordanes and Theophylact Simocatta, describe tribes of these names emerging from the area of the Carpathian Mountains, the lower Danube and the Black Sea to invade the Danubian provinces of the Eastern Empire.

Jordanes, in his work Getica (written in 551 AD), describes the Veneti as a "populous nation" whose dwellings begin at the sources of the Vistula and occupy "a great expanse of land". He also describes the Veneti as the ancestors of Antes and Slaveni, two early Slavic tribes, who appeared on the Byzantine frontier in the early-6th century.

Procopius wrote in 545 that "the Sclaveni and the Antae actually had a single name in the remote past; for they were both called Sporoi in olden times". The name Sporoi derives from Greek σπείρω ("to sow"). He described them as barbarians, who lived under democracy and believed in one god, "the maker of lightning" (Perun), to whom they made sacrifice. They lived in scattered housing and constantly changed settlement. In war, they were mainly foot soldiers with shields, spears, bows, and little armour, which was reserved mainly for chiefs and their inner circle of warriors. Their language is "barbarous" (that is, not Greek), and the two tribes are alike in appearance, being tall and robust, "while their bodies and hair are neither very fair or blond, nor indeed do they incline entirely to the dark type, but they are all slightly ruddy in color. And they live a hard life, giving no heed to bodily comforts..."

Jordanes describes the Sclaveni as having swamps and forests for their cities. Another 6th-century source refers to them living among nearly-impenetrable forests, rivers, lakes, and marshes.

Menander Protector mentions Daurentius (r. c. 577 – 579) who slew an Avar envoy of Khagan Bayan I for asking the Slavs to accept the suzerainty of the Avars; Daurentius declined and is reported as saying: "Others do not conquer our land, we conquer theirs – so it shall always be for us as long as there are wars and weapons".

The Slavic languages are a relatively recent offshoot of the Indo-European Baltic languages, which in turn may be the most direct descendants of the language(s) of the Corded Ware culture (ca. 3000 BCE to 2350 BCE).

At his blog, Eurogenes reports on new discoveries drawn from the earliest ancient Slavic DNA.

A paper dealing with the origin of Slavic speakers, titled Ancient DNA connects large-scale migration with the spread of Slavs, was just published at Nature by Gretzinger et al.

The dataset from the paper includes ten fascinating ancient samples from Gródek upon the Bug River in Southeastern Poland. These individuals are dated to the so called Tribal Period (8th – 9th centuries), and, as far as I know, they represent the earliest Slavic speakers in the ancient DNA record.

The really interesting thing about these early Slavs is that they already show some Germanic and other Western European-related ancestries. Nine of the samples made it into my G25 analysis. In the Principal Component Analysis (PCA) plots . . . five of them cluster near present-day Ukrainians, while the rest are shifted towards present-day Northwestern and Western Europeans. . . . GRK015, a female belonging to Western European-specific mtDNA haplogroup H1c, shows Scandinavian ancestry. On the other hand, GRK014, a female belonging to the West Asian-specific mtDNA haplogroup U3b, probably has Southern European ancestry.
These results aren't exactly shocking, because the people who preceded the early Slavs in the Gródek region were Scandinavian-like and associated with the Wielbark archeological culture. In other words, they were probably Goths who also had significant contacts with the Roman Empire.

However, it's not a given that the ancestors of the Tribal Period Slavs mixed with local Goths. It's also possible that they brought the western admixture, or at least some of it, from the Slavic homeland, wherever that may have been.

That's because the early Slavs who migrated deep into what is now Russia also showed Western European-related admixture. This is what Gretzinger et al. say on page 74 of their supplementary info (emphasis is mine):
The only deviation from this pattern is observed for ancient samples from the Russian Volga-Oka region, where we measure higher genetic affinity between present-day Southern/Western Europeans and the SP population compared to the pre-SP population (Fig. S17). This agrees with the pattern observed in PCA and ADMIXTURE that, in contrast to the Northwestern Balkan, Eastern Germany, and Poland-Northwestern Ukraine, the arrival of Slavic-associated culture in Northwestern Russia was associated with a shift in PCA space to the West, a decrease of BAL [Baltic] ancestry, and the introduction of Western European ancestries such as CNE [Continental North European] and CWE [Continental Western European].
Thus, it's highly plausible that the Tribal Period Slavs from Gródek were very similar, perhaps even practically identical, to the proto-Slavs who lived in the original Slavic homeland. Hopefully we won't have to wait too long to discover whether that's true or not. More Migration period and Slavic period samples from the border regions of Belarus, Poland and Ukraine are needed to sort that out.

Eurogenes goes on to criticize a claim in the supplemental materials to the Slavic ancient DNA paper that:

Scythian groups from Ukraine show varying fractions of South Asian ancestry (between 5% and 12%), a component present in many ancient individuals from Moldova, Ukraine, Western Russia, and the Caucasus, but (nearly) absent in the SP genomes from Central and East-Central Europe (<5%). [Ed. references to specific samples showing this omitted.]

Eurogenes, rightly, explains that the data are really showing European introgression into South Asia arising from the Indo-Aryan invasion of the region in the Bronze Age, and before that from Iran. 

Wednesday, September 3, 2025

The Climate Driven Narrative Of The Prehistory of India

 


These maps are useful in shedding light on the prehistory of India. 

The Harappans

The Gangetic plain is predominantly in the temperate zone, which has the highest population density in India and the most agriculturally oriented economy.

This region had agriculture very early on, probably no later than 4000 BCE and possibly sooner, initially using Fertile Crescent Neolithic crops that arrived via the Caucasus and Iran in its formative era. The migrating farmers who brought them account for a significant share of the genetic roots of the Indus Valley Civilization (a.k.a. the Harappan civilization a.k.a. the Meluhha). The Harappan civilization later adopted rice as a crop as well, via Austroasiatic migrants from Southeast Asia who gave rise to the Munda languages of India, ca. 2000 BCE.

We know from well-dated residues on Harappan pots, by the way, that curry was a Harappan invention that predated the Indo-Aryans and the Munda people.

The prevailing view is that Harappan society was united politically, perhaps in a federation of city-states, and was largely free of war or fortifications, apart from some trade outposts on its frontiers, until it collapsed. It had relatively modern plumbing for the Bronze Age, and its cities lack the obviously palatial complexes that one might associate with a more hierarchical society dominated by local kings who simply enriched themselves.

Harappan society had its own script. The majority view is that this was a proto-script used for accounting and trade purposes, similar to the Vinča script in the Neolithic Balkans and the earliest Sumerian cuneiform inscriptions, but not a full-fledged written language.

Harappans had trade connections to Sumeria, where they had a historically attested trade colony, and based on Sumerian records, we know that the Harappans were known there as the Meluhha:

Meluḫḫa or Melukhkha (Sumerian: 𒈨𒈛𒄩𒆠 Me-luḫ-ḫaKI) is the Sumerian name of a prominent trading partner of Sumer during the Middle Bronze Age. . . . most scholars associate it with the Indus Valley Civilisation. . . . Sumerian texts repeatedly refer to three important centers with which they traded: Magan, Dilmun, and Meluhha. The Sumerian location of Magan is now accepted to be the area currently encompassing the United Arab Emirates and Oman. Dilmun was a Persian Gulf civilization which traded with Mesopotamian civilizations. The current scholarly consensus is that Dilmun encompassed Bahrain, Failaka Island and the adjacent coast of Eastern Arabia in the Persian Gulf.

The Harappans also had trade (and possibly a sphere of influence) in the adjacent region of Central Asia known as the BMAC (for Bactria–Margiana Archaeological Complex a.k.a. Oxus culture).


The BMAC culture (ca. 2400 BCE to 1700 BCE with some dispute over the dates on each end of this range) was wedged between the Indo-European Andronovo culture (ca. 2000 BCE to 1150 BCE) and the terminal Harappan/Indo-Aryan transition Cemetery H (ca. 1900 BCE to 1300 BCE), and Painted Gray Ware (ca. 1300 BCE to 300 BCE) cultures. 

The BMAC would have fallen to Indo-European advances before the Indo-Aryans arrived in what remained of the Harappan region, and before the proto-Indo-Iranians arrived in Iran.

The Yaz culture, which had previously been part of the BMAC cultural region, was abruptly replaced by an Indo-Iranian early Iron Age culture around 1500 BCE, and is a likely source of the Avestan language and what became the Zoroastrian religion associated with it. The Yaz culture persisted until it was absorbed by the Achaemenid Empire in the 500s BCE.

The Ghaggar-Hakra river f.k.a. the Saraswati River, which features prominently in the Rig Veda, once coursed through what is now arid Indian steppe along the dashed orange route shown in the map above. Its now mostly dry bed is the home of many Harappan ruins.

But the Saraswati River dried up around 2000 BCE, as part of a major aridification event that stretched, at a minimum, from Egypt to India and caused the Middle Bronze Age Late Harappan society to collapse.

As explained, for example, in Sujay Rao Mandavilli, "The Demise of the Dravidian, Vedic and Paramunda Indus Hypotheses: A brief explanation as to why these three Hypotheses are no longer tenable" SSRN (August 25, 2020) and the sources cited therein:
Dravidian languages, Sanskrit or Paramunda languages could not have been candidates for the [language of the] Indus Valley Civilization which flourished from 2600 BC to 1900 BC in the North-West of India and Pakistan.

If anything, the case against all three of these historical linguistic hypotheses is stronger now than it was twenty-five years ago.

Like the Paleo-European languages (of which the only survivor is Basque), the Harappan language can probably never be fully reconstructed, even if we can glean some knowledge of it from inscriptions in Harappan script, and from substrate influences in Sanskrit and related languages, and areal influences on the Munda languages.

The Indo-Aryans

The Indo-Aryans, who had wheeled chariots and domesticated horses, and were used to a more arid climate, rushed in to fill the political vacuum. They may have had some farming, or may have ruled subject societies of farmers, but were ancestrally herders.

The Indo-Aryans arrived from Central Asia, ca. 2000 BCE - 1500 BCE, bringing with them Sanskrit, which later diversified into the various Indo-European languages of South Asia. The language of the Indo-Aryans extinguished the Harappan language as anything but a substrate influence on Sanskrit.

Another branch of the same people at about the same time gave rise to the oldest Indo-Iranians in what is now Iran and to the Mittani Kingdom in northern Mesopotamia, in what is now mostly northern Syria and southeastern Turkey.

We know that Indo-Aryans were the conquerors and not the conquered, because Indo-Aryan genes are more common in higher castes in India, and because these genes have origins (confirmed with ancient DNA) from outside the Indian sub-continent.

The religion of the Indo-Aryan migrants and the Harappan religion mutually influenced each other to produce the early Vedic religion, which would eventually give rise to modern Hinduism. We know that both traditions fed into this fused religion because the Vedic religion differs from the religions that emerged in other places that the Indo-Europeans conquered, and because the Rig Veda refers to things like the Saraswati river societies that were gone or almost gone by the time that the Indo-Aryans arrived.

Genetic evidence tells us that more than one wave of Indo-Aryan migration affected the area of India where Indo-Aryan languages are now spoken.

The Dravidians

The monsoon driven tropical region (in a medium blue on the first map) that makes up most of Southern India didn't adopt agriculture until the South Indian Neolithic revolution, around 2500 BCE. That agriculture relied significantly upon crops domesticated in the African Sahel, and even then, the region wasn't as optimal for agriculture. The Fertile Crescent package of crops wasn't naturally suited to this climate region and it took many centuries for these crops, under the close guidance of early farmers, to adapt to this tropical monsoon driven climate.

The Dravidian language family probably has its roots in the South Indian Neolithic revolution, probably from one of the South Asian hunter-gatherer populations that was among the first to adopt agriculture, possibly with some linguistic influence from the Africans who brought the Sahel African crops that made this Neolithic Revolution possible.

While the Harappan society had trading posts at the fringe of what was Dravidian India at the time, mostly along the northwest coast of the Deccan peninsula, the Harappans almost surely did not speak a Dravidian language and had only thin trade ties with Dravidian society.

The first wave of Indo-Aryan migration reached the Dravidian society, leaving traces in genetic admixture found in lower amounts in almost everyone in India, even Dravidians. In this initial wave, the fused Vedic religion took hold (although the preference for vegetarianism found among the formerly Harappan regions did not), and all but a small core of this region probably experienced language shift, with their local Dravidian dialects going extinct. The Dravidian society on the eve of the Indo-Aryan arrival was not as technologically advanced as the Harappan one, but also may not have been in as advanced a state of collapse as the Harappan civilization. The climate event that impacted the Harappans so decisively, may not have affected their part of the Deccan peninsula so strongly. 

But in Southern India, the Indo-Aryans were spread thin, were in an eco-region less familiar to them, and less completely dominated the Dravidian society. The core region (probably within the region where Telugu is now spoken) that held out expanded and reconquered almost all of the former Dravidian territory, bringing its sole surviving Dravidian dialect with it (which is why the Dravidian language family seems much younger than one that stretches back to the South Indian Neolithic). But this reconquest never ended up replacing the Vedic religion that had replaced or absorbed its own religion (which may not have been as well-developed as the Harappan religion, and arose in an illiterate society). Unlike Northern India, which had multiple waves of Indo-Aryan migration, no later big wave of Indo-Aryan migration followed the Dravidian reconquest of Southern India.

The geographic range of the Dravidian languages prior to the arrival of the Indo-Aryans was probably wider than it is today, possibly extending to the fringes of the Indus River Valley civilization, as indicated by toponyms in these regions.

But the North Dravidian languages (Brahui in what is now Pakistan, and the Kurukh-Malto languages of Northeast India) were probably not part of the original Dravidian language range and were the product of much later colonizations. In the case of Brahui, this probably happened around 1000 CE, as a result of an elite driven language shift similar to the one that occurred at about the same time in what is now Hungary, rather than a mass migration of Dravidians to the region. Oral traditions among the Kurukh-Malto peoples, at least, assign their origins to Dravidian homelands further south.

Thursday, August 28, 2025

Some Linguistic Hypotheses

* I think that it is very likely that the Korean language family and the Japanese language family are related, even if it is challenging to find "smoking gun" evidence of it today. Japanese may also have some Manchurian linguistic influence. The broader Altaic hypothesis has less strong support, but there may be something to it.

* I think that it is very likely that the Dravidian language family was influenced by an African language family, with the vectors of that transmission probably being people from the Horn of Africa who also brought some key African Sahel domesticates to Southern India around the time of the South Indian Neolithic ca. 2500 BCE. 

* The Harappan language is almost surely not Indo-European, not Dravidian, and not Munda as a language family. It could conceivably have some connection to language isolates in the general region known as Indo-Pacific languages, or it might not. It is probably the main substrate influence on Sanskrit and through Sanskrit on the other Indo-European languages of India. The script associated with it was probably a proto-script, like a set of emojis or trademarks, and not a full written language. The same is true of the early Vinca script used in the Neolithic Balkans.

* I think that it is very likely that Indo-Aryans (Sanskrit speaking derived people) conquered almost all of India sometime in pre-history, imposing their language and the Hindu religion (although not as faithfully to some of its tenets, like vegetarianism). The exception was a small last stronghold, more or less in the vicinity of the modern city of Visakhapatnam, which then reconquered territory from the Indo-Aryans, restoring their dialect of the Dravidian language, but not effectively displacing the Hindu religion that the Indo-Aryan conquerors brought with them. This is why the Dravidian language family seems younger than it really is; its historic linguistic diversity was wiped out at this point, with most of its variants extinguished. As I noted in a post at Wash Park Prophet:

[A]reas that are linguistically Indo-Aryan are more likely to be vegetarian than areas that are linguistically Dravidian, Munda or Tibeto-Burmese. Meat eating may reflect a thinner Indo-Aryan influence even in places that experienced a language shift to Indo-Aryan languages. Vegetarianism may alternatively reflect a stronger influence from the pre-Indo-Aryan Harappan society.

* Brahui, a Dravidian language pocket found far from the geographic range of the other Dravidian languages, probably was not within the historic range of the Dravidian languages. Instead, it is probably a result of language shift through elite dominance around 1000 CE or so, by some foreign Dravidian warlord or king.

* Sometime around the Copper Age (a.k.a. the Eneolithic) in Anatolia, people from the eastern highlands brought the Hattic language (which preceded the Hittite language) to Anatolia. It is related to Kassite, other Iranian highland languages, and more remotely to most of the Caucasian languages (which are related to each other even if the connections are hard to establish), to Sumerian, and probably to Elamite. It is also probably related to Minoan. One of the litmus tests of all of these languages is that they were ergative. 

Hattic probably replaced the Neolithic language(s) of Anatolia, including the Western Neolithic language, which was very different from Hattic and which spread across Europe in two main branches: the Linear Pottery culture (LBK) through the rivers of the north, and the Cardial Pottery culture more or less along the Mediterranean coast. The Western Anatolian Neolithic languages were the substrate languages for the Indo-European language in most of Europe, but not in Anatolia where the Hattic language was the substrate. Hattic substrate influence is the reason that Anatolian Indo-European languages like Hittite seem so diverged from other Indo-European languages, because the Hattic society was much healthier when the Indo-Europeans arrived than in other places where the Indo-Europeans conquered Neolithic societies in a state of collapse. The most basal branch of Indo-European was probably that spoken in the Tarim Basin, which was on a frontier with almost no substrate influence.

* It is very likely that the languages of the European hunter-gatherers are completely lost. The Uralic languages arrived much later. In the Americas and Japan and Australia, we know that indigenous hunter-gatherer language substrates had very little impact on the food producing conqueror languages, even when indigenous peoples made a large genetic contribution to the people speaking the food producer languages.

* Basque, therefore, is very unlikely to be an indigenous European hunter-gatherer language. It could be the last survivor of the language family of the first farmers of Europe rooted in Western Anatolia from around 6000 BCE to 4000 BCE, or it could reflect a very distant outpost of a Copper Age language probably in the same language family as Hattic and Minoan. I probably lean towards the Neolithic hypothesis, as the corpus of Hattic (which remained a written liturgical language for a thousand years after the Hittites took over) and of Basque are both large enough that a connection would have been established by linguists by now if it was present, even though both are ergative languages. But the rarity of ergative languages outside the West Asian highlands, ancient Mesopotamia, and places to the east of that favors a Copper Age origin for it. The Paleo-Hispanic languages may have all been a coherent group and Tartessos in Southwest Iberia was metal rich and a strong candidate for the source for Plato's Atlantis story. The "Tartessian culture was born around the 9th century B.C. as a result of hybridization between the Phoenician settlers and the local inhabitants. Scholars refer to the Tartessian culture as 'a hybrid archaeological culture'."

* Etruscan, Raetic, and Lemnian are together called the Tyrsenian languages. This is an areal designation, since while the connection of Etruscan and Raetic is pretty solid, the linguistic family connection to Lemnian is not. We know that these languages, and possibly Camunic as well (although it could also be related to Celtic), are also not Indo-European languages and pre-date Indo-European:

  • Etruscan: 13,000 inscriptions, the overwhelming majority of which have been found in Italy; the oldest Etruscan inscription dates back to the 8th century BC, and the most recent one is dated to the 1st century AD.
  • Raetic: 300 inscriptions, the overwhelming majority of which have been found in the Central Alps; the oldest Raetic inscription dates back to the 6th century BC.
  • Lemnian: 2 inscriptions plus a small number of extremely fragmentary inscriptions; the oldest Lemnian inscription dates back to the late 6th century BC.
  • Camunic: may be related to Raetic; about 170 inscriptions found in the Central Alps; the oldest Camunic inscriptions date back to the 5th century BC.

Ergative substrate influence probably explains the presence of ergativity in Indo-European Pashto, Kurdish languages and Indo-Aryan languages, a trait which is shared with Basque and is absent from most Indo-European languages. It suggests that Harappan was probably ergative. The Tyrsenian languages' apparently non-ergative character suggests that they aren't part of the same language family as Basque, and tends to favor a Copper Age origin for Basque rather than a Neolithic origin for it.

But we haven't deciphered the Tyrsenian languages very well since the corpus of those writings has mostly been lost, and what we have left is mostly monolingual and short. We can't even say with complete confidence that they were all in the same language family, although ancient Raetic spoken to the north of Etruscan (not linguistically related to the similarly named modern Indo-European minority language of Switzerland) was probably in the same language family with Etruscan. Somewhat conflicting historical evidence suggests that Lemnians were migrants from the Alps and/or northern Italy, probably during the Greek dark ages after Bronze Age collapse had run its course.

We also don't know much about the substrate language that influenced Mycenaean Greek.

Monday, August 18, 2025

A Potentially Good New World Population Genetics Study Bumbled

A paper was published last year about the population genetics and historical genetics of the Blackfoot people. It compared a modest sample of Blackfoot affiliated genomes with other New World and Old World genomes. A small sample size isn't a big problem for a paleo-genetic study, however, because each individual's DNA has so many data points and generations of intermarriage in a fairly closed gene pool make each individual highly representative of the population as a whole. But while the study does lots of things right, it makes a critical error that seriously detracts from the reliability of its analysis. This error arises from a weak review of the literature and deficient peer-review.

The big problem with the paper is that it makes flawed assumptions about the peopling of the Americas. It relies on a model in which all Native Americans fit into two groups: Native North Americans (ANC-B) and Central and Southern Americans (ANC-A), and tries to determine where the Blackfoot people fit into that model.

The trouble is that the established paradigm is more complicated. While ANC-A is a valid and pretty much unified group that descends from basically Pacific coast route peoples in a primary founding population wave perhaps 14,000 years ago, Native Americans in North America have a more complex ancestry.

North American Native Americans have the lineages found in ANC-A (which results from a serial founder effect) and probably at least two other clades close in time to the initial founding era that spread into different parts of North America.

Then, around 3500-2400 BCE, the ancestors of the Na-Dene people migrated to Alaska from Northeast Asia and admixed with pre-existing populations. Their languages have remote but traceable connections to that of the Paleo-Siberian Ket people, whose language family is named after the Yenisei River in central Siberia. They are associated with the Saqqaq Paleo-Eskimo culture, who were also the source of the Dorset Paleo-Eskimo populations (see also here and here). About 10% of Na-Dene ancestry is distinct from the initial founding population of the Americas.[2] The Na-Dene, like Inuits, have Y-DNA haplogroups that are specific to them and of more recent origin than the founding Y-DNA haplogroups of the Americas.[3]

And then, a final significant pre-Columbian wave with lasting demographic impact arrived from Northeast Asia, perhaps in the 500s and 600s CE. These migrants are the ancestors of the Inuits (a.k.a. modern Eskimo-Aleut peoples), who have their roots in an Arctic and sub-Arctic population also known as the Thule. The 6th to 7th century CE Beringian Birnirk culture (in turn derived from Siberian populations) is the source of the proto-Inuit Thule people, who were the last substantial and sustained pre-Columbian peoples to migrate to the Americas.

A paper in 2020 refined and confirmed this analysis, and the 2024 paper even adopts its NNA v. SNA classification while failing to recognize the distinct temporal waves involved in the pre-Columbian peopling of the Americas.

See generally:

[1] Maanasa Raghavan, et al., "The genetic prehistory of the New World Arctic", Science 29 August 2014: Vol. 345 no. 6200 DOI: 10.1126/science.1255832.
[2] David Reich, et al., "Reconstructing Native American population history", Nature 488, 370-374 (16 August 2012) doi: 10.1038/nature11258
[4] Erika Tamm, et al., "Beringian Standstill and Spread of Native American Founders", PLOS One doi: 10.1371/journal.pone.0000829 (September 5, 2007).
[5] Alessandro Achilli, et al., "Reconciling migration models to the Americas with the variation of North American native mitogenomes", 110 PNAS 35 (August 27, 2013) doi: 10.1073/pnas.1306290110
[7] Judith R. Kidd, et al., "SNPs and Haplotypes in Native American Populations", Am J. Phys Anthropol. 146(4) 495-502 (Dec. 2011) doi: 10.1002/ajpa.21560

The critical problem with the paper is that Athabascans are a poor representative of Northern Native American lineages from the founding era ca. 14,000 years ago, because they have significant Na-Dene wave admixture. This admixture is also shared, for example, with the Navajo, who in turn migrated from what is now central to western Canada to the American Southwest around 1,000 CE (possibly, in part, due to the push factor of the incoming wave of proto-Inuits).

In contrast, the vast majority of North American Native Americans have no Na-Dene or Inuit ancestry and are in population genetic continuity with one or more of the several founding populations of North America. Almost any other choice of a North American Native American comparison population would have been much, much better.

The Karitiana, on the other hand, are indeed representative of (and the standard choice to represent) the ANC-A population.

It is entirely plausible that the Blackfoot are indeed from a wave of North American founding population that is under sampled and that their lineage is not represented in prior published works. 

Latin American indigenous peoples (and to a lesser extent and more recently, Canadian First Peoples) have, in general, been more receptive to population genetic work by anthropologists than Native American populations in the United States, who have given these researchers the cold shoulder until very recently, due to a historical legacy that has understandably fostered distrust of people associated with the establishment in the U.S., including anthropologists. So, Native Americans in the U.S. are greatly under sampled.

But, because the thrust of the paper heavily relies on comparisons between Blackfoot DNA and Athabascan DNA with misguided assumptions about the Athabascan population histories entering into the calculations and analysis, it is hard to confidently extract reliable conclusions from that analysis. The Athabascan may be mostly ANC-B, but are probably the most divergent sample one could use to represent that population, particularly since no attempt is made to distinguish the ancestry components in that population. This seriously confounds the efforts to pin down the prehistoric time line.

A good quality peer-review should have caught this problem, but peer-review in practice is less effective than it is given credit for being.

Realistically, the only way to really do it right would be to withdraw the 2024 paper and replace it with a new paper that reanalyzes the Blackfoot genetic data by comparing it to a more suitable representative of North American Native American ancestry.




Studies of human genomes have aided historical research in the Americas by providing rich information about demographic events and population histories of Indigenous peoples, including the initial peopling of the continents. The ability to study genomes of Ancestors in the Americas through paleo-genomics has greatly increased the power and resolution at which we can infer past events and processes. However, few genomic studies have been completed with populations in North America, which could be the most informative about the initial peopling process. Those that have been completed in North America have identified Indigenous Ancestors with previously undescribed genomic lineages that evolved in the Late Pleistocene, before the split of two lineages [called the “Northern Native American (NNA)” or “ANC-B” and “Central and Southern American (SNA)” or “ANC-A” lineages] from which all present-day Indigenous populations in the double continent that have been sampled derive much, if not all, their ancestry before European contact. Specifically, the lineage termed “Ancient Beringian” was ascribed to a genome in an Ancestor who lived 11,500 years ago at Xaasaa Na’ (Upward Sun River) and named Xach’itee’aanenh t’eede gaay (USR1) by the local Healy Lake Village Council in Alaska. An Ancestor who lived 9500 years ago at what is now called Trail Creek Caves on the Seward Peninsula, Alaska, also belongs to the Ancient Beringian lineage. In addition, another Ancestor, under the stewardship of Stswecem’c Xgat’tem First Nation, who lived in what is now called British Columbia, belongs to a distinct genomic lineage that predates the NNA-SNA split but postdates the split from Ancient Beringians on the Americas’ genomic timeline. This Ancestor was identified at Big Bar Lake near the Frasier River and lived 5600 years ago. Thus, these previous studies of North American Indigenous Ancestors have successfully helped to identify previously unknown genomic diversity. 
However, the ancient lineages identified in these studies have not been observed in samples of Indigenous peoples of the Americas living today. Research in Mesoamerica and South America suggests that certain sampled populations (e.g., Mixe) have at least partial ancestry in present-day Indigenous groups from unknown genomic lineages in the Americas, possibly dating as far back as 25,000 years ago. . . .

With multiple genomic analyses showing the ancient Blood/Blackfoot clustering together with present-day Blood/Blackfoot but on a separate lineage from other North and South American groups, we created a demographic model using momi2, which used the site frequency spectra of present-day Blood/Blackfoot, Athabascan (as a representative of Northern Native American lineage), Karitiana (as a representative of Southern Native American lineage), and Han, English, Finnish, and French representing lineages from Eurasia. The best-fitting model shows a split time of the present-day Blood/Blackfoot at 18,104 years ago, followed by a split of Athabascan and Karitiana at 13,031 years ago.

The paper and its abstract are as follows:

Mutually beneficial partnerships between genomics researchers and North American Indigenous Nations are rare yet becoming more common. Here, we present one such partnership that provides insight into the peopling of the Americas and furnishes another line of evidence that can be used to further treaty and aboriginal rights. We show that the genomics of sampled individuals from the Blackfoot Confederacy belong to a previously undescribed ancient lineage that diverged from other genomic lineages in the Americas in Late Pleistocene times. Using multiple complementary forms of knowledge, we provide a scenario for Blackfoot population history that fits with oral tradition and provides a plausible model for the evolutionary process of the peopling of the Americas.
Dorothy First Rider, et al., "Genomic analyses correspond with deep persistence of peoples of Blackfoot Confederacy from glacial times" 10(14) Science Advances (April 3, 2024).

Tuesday, July 8, 2025

Steppe Ancestry In Italy


Blonde hair percentages, at a population statistics level, are a good proxy for Indo-European steppe ancestry levels (they're not as good as autosomal DNA, but the sample size and amount of fine grained geographic detail are much better). You'd need an estimate for the amount of steppe ancestry in Italians to calibrate this litmus test, however.

The first farmers of Europe had essentially 0% blonde hair, much like modern Sardinians, who are their closest genetic match. Blonde hair in Europe arrived more or less exclusively via steppe migration in late Neolithic to early Bronze Age from an ultimate homeland in the vicinity of modern Ukraine, although plenty of migration happened within Europe after this migration and not all steppe migrants had blonde hair. It is also possible to have very little steppe ancestry while still having the blonde hair gene. 

The chart shows the percentage of blond haired people in the regions shown on the map in (or near) Italy. Overall, about 8% of Italians are naturally blonde (another estimate suggests 15%). It suggests that Indo-European migration to Italy was largely north to south (with exceptions for urban centers) and reached southern Italy in far smaller proportions than northern Italy, although it is hard to know how much of the migration was modern, how much was medieval, how much was from the Roman era, and how much dates to pre-history.
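The calibration step mentioned above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it treats steppe ancestry as directly proportional to the blond share (a line through zero, in keeping with the first farmers having essentially 0% blonde hair), it uses the roughly 8% national figure quoted above, and it uses a made-up placeholder for the national steppe ancestry fraction, which this post does not supply. The function names and the regional values are hypothetical. Since not all steppe migrants were blond, and the gene can occur with little steppe ancestry, the output is a rough ranking aid at best.

```python
def calibrate_scale(reference_blond_share, reference_steppe_fraction):
    """Steppe ancestry implied per unit of blond share, from one reference population."""
    if reference_blond_share <= 0:
        raise ValueError("reference blond share must be positive")
    return reference_steppe_fraction / reference_blond_share


def estimate_steppe(blond_share, scale):
    """Proportional estimate of steppe ancestry, capped at 100%."""
    return min(1.0, blond_share * scale)


# About 8% of Italians are naturally blonde (figure quoted in the post).
NATIONAL_BLOND_SHARE = 0.08
# PLACEHOLDER, not a measured value: swap in a real autosomal DNA estimate.
ASSUMED_NATIONAL_STEPPE = 0.25

scale = calibrate_scale(NATIONAL_BLOND_SHARE, ASSUMED_NATIONAL_STEPPE)

# Hypothetical regional blond shares, for illustration only.
regional_blond_share = {"northern_region": 0.15, "southern_region": 0.02}

for region, share in regional_blond_share.items():
    print(f"{region}: estimated steppe ancestry {estimate_steppe(share, scale):.1%}")
```

A single reference point fixes only the slope of the line, so a real calibration would want several populations with both measured steppe ancestry and known blond shares.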

Until the 1860s, Italy was not a unified country, with Southern Italy belonging to the poorer Kingdom of the Two Sicilies with a more agricultural economy, and Northern Italy belonging to a number of smaller and more prosperous states with more mercantile economies, which could have impacted migration patterns by increasing migration from areas with more blonde people.


From Reddit.

In the medieval era Northern Europeans, including the Normans and Vikings and Germanic tribes, had greater interactions with Northern Italy than with Southern Italy, as well. 

In the Roman era, migration to the Roman capital and its major cities from North Africa, Egypt, and the Levant might have diluted the percentage of people with steppe ancestry.

Shortly before the classical Roman era, there were a number of Greek colonies in Italy, which could be reflected in the purple regions on the map (about 4% of Greeks are naturally blonde), with some blurring out due to admixture with regions near former Greek colonies.


From Wikipedia.