Inferring Historical Linguistic Affiliations With Genes

How does one come to conclusions about what languages were spoken by prehistoric peoples whose languages were never attested in writing?

What role does genetic evidence play in assessing an archaeological culture's linguistic affiliations?

Some key assumptions guide my use of genetic data to infer language shift in the absence of historical evidence. I would argue that each of the twelve assumptions set forth below the jump has a solid theoretical and empirical basis.  The parts of the body of each point that relate to genetics are in bold.  There are sometimes close calls involved in applying these tests, but these assumptions are not speculative and do not involve a blind equation of genes and linguistic affinity.

In any given place there are typically only half a dozen or fewer archaeological cultures that were not historically attested.  In any given continent or subcontinental sized region, there are at most a few dozen.  The nature of the task of inferring historical linguistic affiliations is to assign a suspected linguistic affiliation to each group of one or more related archaeological cultures based on the nature of their transitions from prior cultures, and to connect them to historically attested languages when possible.  Genetics provides valuable evidence in evaluating these small numbers of transitions in any given language area, but genetics has to be used properly and not just blindly associated with a language without other context to support that connection.

Taken together, the sum of these linguistic stories provides an understandable narrative that shows how the peoples of prehistoric eras provided the cultural source for modern populations and languages.  This post merely states the assumptions involved, rather than applying them, a task of a lifetime left for other posts.

I.  Five Assumptions About Linguistic Continuity

1. Linguistic inertia.  The same language persists in any given place unchanged from generation to generation in the absence of a reason for it to change.

2. The evolution of languages over time.   Languages change over time for reasons including linguistic drift, substrate and language learner influences, intentional community efforts to develop a distinctive dialect following a schism from another population speaking the same language, areal effects and the adoption of loan words.  Contrary to an assumption found in many computational linguistic models, most language change is a product of outside influences or changing circumstances that require new words or linguistic features to describe them, not mere random linguistic drift (see, e.g., Icelandic).  Even when changing circumstances change a language, the change is likely to be concentrated at the historical moment of the ethnogenesis or adoption of a new archaeological culture and to relate to the particular circumstances that are changing.  But, in practice, languages do tend to change significantly enough to cease to be mutually intelligible with their older versions every few centuries to a thousand years or so, because, outside idealized circumstances, there are language changes associated with language and dialect formation, and there are areal influences and cultural changes that call for linguistic innovation.

3. Sacred languages.  Languages used only by specialists for culturally important purposes within a population, like liturgical languages aka sacred languages, often persist as secondary languages even when there is a primary language shift, and sacred languages are conservative in the face of factors that would otherwise lead to language evolution over time.  Sacred languages in literate cultures can persist for thousands of years even when they are otherwise moribund.

4. Population genetic change without language shift is distinctive. All major population genetic shifts not involving language shift show genetic continuity exclusively from a subset of the prior population, accompanied by archaeological or paleoclimatic indications of a selective pressure such as aridity, reduced temperature, flooding, internal warfare, declining population density, long distance migration, or disease.

5. "Random" linguistic drift.  The linguistic drift that does occur in each generation of a population that is its own linguistic community, even if it is in limited contact with neighboring linguistic communities, is also a function of population size.  Linguistic drift is more prone to occur when a population slips below the critical mass necessary to reproduce itself fully (probably somewhere in the low hundreds of people in a particular regularly interacting language community at least), and has some relationship (although probably a weaker than linear relationship, perhaps a logarithmic relationship) to population size.  Linguistic drift is more likely to happen with rarely used words than with common words, in proportion to their rarity.  Phonetic evolution in languages follows well defined "natural" patterns.  Historically attested languages of people who have not experienced mass language shifts distinctively accumulate more features over time that are challenging for second language learners (e.g. many of the Caucasian languages).  Written languages drift less than oral languages, all other things being equal (which they rarely are in fact).

II.  Five Assumptions About Language Shifts

6. Archaeological indicators of language shift.  Language shift in a primary (i.e. non-liturgical everyday) language, as distinct from evolution of the language itself over time, is an all or nothing affair that usually takes place in a small number of generations.  Language shift takes place at a community wide level because language is social.  Therefore, language shift does not occur in the absence of a momentous adoption of a larger cultural package that can be discerned as a new archaeological culture in an area.  It always shifts to the language of a superstrate population that includes at least a critical mass of native speakers of the new language.

7. Linguistic shift takes place almost exclusively at the dawn of new archaeological cultures. Language shift only takes place at the beginning of a new archaeological culture in a particular place because it always involves a shift in a larger cultural package of which language is only a part. This typically means that there are only a handful of potential moments in the prehistory of a particular place when language shift could have taken place. Archaeological cultures, when properly classified, which is usually the case, represent periods in time and space that are in cultural continuity with each other. A particularly difficult to classify archaeological culture that presents inconsistent indicators regarding its origins may be a sign that it has been inaccurately defined.  An archaeological culture that is not genetically homogeneous by the time it concludes occupies territory that was previously inhabited by separate regional archaeological cultures derived from different ancestral gene pools, one or more of which underwent language shift at some point during an archaeological culture transition.

8. Factors from which language shift can be inferred. New archaeological cultures accompanied by shifts in population genetics and physical anthropology (particularly in the elite male population) and non-local cultural antecedents tend to involve language shift. Each of these instances should be individually examined in a fact-intensive way that reflects all of the available evidence regarding the connection of the new archaeological culture and its people to the cultural antecedents of the culture and the genetic antecedents of the people.  The genes, strontium levels in bones, and physical anthropology of people and in particular male elites, the genetic and strontium indicators of sources of domesticated flora and fauna, comparison of artifact styles (particularly if non-functional), and the sources of non-local trade goods are among the good indicators of cultural antecedents of an archaeological culture that is otherwise ambiguous.   Genes in human populations tend to be less diverse in populations of earlier eras of prehistory and history, and to have distinct haplogroup or autosomal component characteristics that tend to remain fixed absent admixture or major evolutionary selection events.  Genes are assumed to have originated at the ancestral home of the modern populations where there is the highest degree of basal genetic diversity in that genetic clade absent evidence to the contrary.

9.  The linguistic implications of genetic admixture.  All significant genetic admixture events between previously separated genetic populations cause a language shift by the substrate population in the admixture, or, in rare and distinct circumstances, a language shift by both of the admixing populations to a creole or third party lingua franca (that is effectively a superstrate population for both of the admixing populations).  One exception to this rule involves cases where genetically distinct populations already share a common language due to a historical common superstrate population that has given the populations a shared primary cultural source (e.g. if ethnically different Romans from different provinces who already both speak Latin give rise to an ethnic melting pot in the city of Rome).

10. Linguistic affiliation can be traced to all archaeological cultures which are in continuity with a historically attested language. A linguistic affiliation can be assigned to an archaeological culture and all subsequent archaeological cultures in evolutionary continuity with it by tracing it to a historically attested linguistic population. In other cases, the fact of language shift but not the content of the previous language is all that can be determined.  Even a relict language that became extinct shortly after it was first historically attested can sometimes be used to assign a linguistic affinity to a great many archaeological cultures that evolved from a common source.
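
For readers who find rules easier to follow as a procedure, here is a rough sketch in Python of how the shift assumptions above (together with assumption 4) might be applied to a sequence of archaeological cultures.  It is only an illustration under stated assumptions, not a substitute for the fact-intensive examination that each case requires.  The `Transition` record, its field names, the order in which the rules are checked, and the labels "continuity", "shift" and "likely shift" are all hypothetical choices made for this sketch, and the culture and language names are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transition:
    """Hypothetical record of the transition that produced one culture."""
    culture: str
    new_archaeological_culture: bool = False  # a new cultural package appears (6, 7)
    genetic_shift: bool = False               # major population genetic change
    subset_continuity_only: bool = False      # genes derive only from a subset of the prior population (4)
    selective_pressure: bool = False          # aridity, disease, warfare, etc. (4)
    nonlocal_antecedents: bool = False        # non-local cultural antecedents (8)
    admixture: bool = False                   # significant admixture of separated populations (9)
    shared_language_already: bool = False     # exception stated in assumption 9


def classify(t: Transition) -> str:
    """Apply the assumptions in one possible order (the ordering is an assumption)."""
    if not t.new_archaeological_culture:
        return "continuity"    # 7: no shift without a new archaeological culture
    if t.genetic_shift and t.subset_continuity_only and t.selective_pressure:
        return "continuity"    # 4: genetic change without language shift
    if t.admixture and not t.shared_language_already:
        return "shift"         # 9: admixture implies shift by the substrate
    if t.genetic_shift and t.nonlocal_antecedents:
        return "likely shift"  # 8: these factors "tend to" involve shift
    return "continuity"        # 1: linguistic inertia is the default


def trace_affiliation(chain: List[Transition], attested: str) -> List[Optional[str]]:
    """Walk backward from the attested language of the last culture (10).

    `chain` is ordered oldest to newest.  A culture gets the attested
    affiliation while continuity holds; before a shift only the fact of the
    shift is known, so earlier cultures are labeled None.
    """
    labels: List[Optional[str]] = [None] * len(chain)
    current: Optional[str] = attested
    for i in range(len(chain) - 1, -1, -1):
        labels[i] = current
        if current is not None and classify(chain[i]) != "continuity":
            current = None  # the language spoken before the shift is unknown
    return labels


# Placeholder chain: culture B arrives with a genetic shift and non-local
# antecedents; culture C descends from a subset of B under selective pressure.
chain = [
    Transition("Culture A", new_archaeological_culture=True),
    Transition("Culture B", new_archaeological_culture=True,
               genetic_shift=True, nonlocal_antecedents=True),
    Transition("Culture C", new_archaeological_culture=True,
               genetic_shift=True, subset_continuity_only=True,
               selective_pressure=True),
]
labels = trace_affiliation(chain, "Language X")
```

In this placeholder chain, cultures B and C inherit the attested affiliation while culture A is left unassigned, which mirrors the point that in some cases only the fact of a language shift, and not the content of the previous language, can be determined.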

III.  Two Assumptions About Special Cases

11.  Sustained bilingualism.  Sustained mass bilingualism (as opposed to transitional bilingualism during periods of ordinary language shift, or bilingualism by specialist individuals) is a rare phenomenon that appears only in highly particularized circumstances which were rare in prehistory.  Generally sustained mass bilingualism arises when there is a stable but weak superstrate v. substrate relationship that is real, so that there is benefit to be gained from learning the superstrate language, but involves a substrate population that has sufficient cultural pride, group identity and a sufficient linguistic community population to sustain its substrate language indefinitely.  For example, when a cohesive and culturally sophisticated population such as the Greeks is conquered by a militarily powerful but less culturally sophisticated population such as the Romans with only a numerically small superstrate population, sustained bilingualism can arise.  Other examples are the preservation of minority languages by the Roma and by the Jews in Europe, both of which are cases that demonstrate that intentional superstrate exclusion of substrate cultures from mainstream society, confining them to ethnic ghettos or the equivalent, can support a stable case of sustained bilingualism.

12. Creoles.  Creoles are phenomena involving language formation that appear only in highly particularized circumstances.  Creoles give rise to a linguistically distinctive profile in the language itself.  These circumstances were rare in prehistory (Swahili and Caribbean slave creoles are two of the more notable historical examples, distinguished by the lack of a common language between linguistically distinct people who must communicate with each other and relative equality of status).

