What follows is a collection of factual observations about the population genetics of Afro-Asiatic language family speakers, and of populations in some ways related to or distinct from them, along with some analysis of those facts. It is a work in progress towards making sense of the hard-to-fit-together puzzle pieces of a complex linguistic family's origins.
There are four Y-DNA haplogroups associated with Afro-Asiatic language speaking populations to some extent or another: E1b1b, the exclusive distinctive Y-DNA haplogroup in Berbers, which is found in all but Chadic speakers; R1b-V88, associated with Chadic speakers; J, associated most strongly with Semitic speakers; and T, associated most strongly with Cushitic, Ethiosemitic and Coptic speakers (including Arabic or Berber language speakers of Egyptian descent).
R1b-V88, J and T, when found in Africa, are back migrations. E1b1b is East African (probably Ethiopian) in origin. With a few exceptions, the combined set of Afro-Asiatic Y-DNA haplogroups accounts for 70% or more of the Y-DNA in each publicly available sample of an Afro-Asiatic population, and in some cases that percentage exceeds 90%. But, apart from one modest-sized sample of Algerian Arabs, every sample also contains Y-DNA haplogroups beyond those associated with Afro-Asiatic populations.
The overall Afro-Asiatic percentage and the nature of the "background" haplogroups vary by region. In the Levant and Egypt, the largest components of the background tend to be Y-DNA haplogroups G and I (suggestive of Anatolian and Caucasian influences). In Mesopotamia and Arabia, the largest component of the background tends to be Y-DNA haplogroup R1a (suggestive of West Asian influences). In Chadic, Berber and North African Arab populations, the largest component of the background tends to be Y-DNA haplogroup E1b1a (suggestive of West African influences). In East Africa (Ethiosemitic, Cushitic and Omotic as well as some Sudanese populations), the largest components of the background tend to be Y-DNA haplogroups A and B (suggestive of a Khoisan language macrofamily and Pygmy macrofamily affinity). The background for Jews tends to reflect their ancestral geographic neighbors (e.g. Europeans).
Of these backgrounds, I find the East African background, which I would have naively expected to be E1b1a dominated, to be the most interesting. It suggests that haplogroups A and B were predominant in the region (and locally, the background is mostly one or the other of them) prior to the later expansion of E1b1b, and provides inferential evidence against a hypothesis that Niger-Congo languages or something fairly closely related may have been spoken in East Africa before the rise of Afro-Asiatic languages.
The two populations of Afro-Asiatic descent showing the largest backgrounds in Africa and the Middle East are East Africans and Jews, each of which has some communities with background percentages as high as 40%-50% of the male population sampled. The large background in East Africa could either indicate the petering out of a wave of advance at the periphery in the face of indigenous resistance, or could indicate the limited ability of E1b1b bearers to surmount already well established local competition in their expansion.
The "signal" in the case of Berbers and Chadic populations is strongly from a single Y-DNA haplogroup (E1b1b and R1b-V88 respectively), but in other populations the Afro-Asiatic markers tend to include some mix of haplogroups. Isolated traces of haplogroup J in Berbers are probably attributable to Arab/Bedouin influences. Haplogroup T, for example, while strongly associated with Afro-Asiatic populations in Africa, is almost never the exclusive Y-DNA marker associated with an Afro-Asiatic population that is found in a sample.
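The "signal" and "background" bookkeeping used above can be made concrete with a small sketch. It assumes a sample is summarized as a table of haplogroup counts using exactly the four labels named in this post. The function name and the counts are hypothetical: the numbers are invented for illustration and are not data from any study.

```python
# The four haplogroups this post treats as the Afro-Asiatic "signal".
SIGNAL_HAPLOGROUPS = {"E1b1b", "R1b-V88", "J", "T"}


def split_signal_background(counts):
    """Return (signal share, background share, largest background haplogroup).

    `counts` maps a haplogroup label to the number of men carrying it.
    The largest background haplogroup is None if there is no background.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    signal = sum(n for hg, n in counts.items() if hg in SIGNAL_HAPLOGROUPS)
    background = {hg: n for hg, n in counts.items()
                  if hg not in SIGNAL_HAPLOGROUPS}
    largest = max(background, key=background.get) if background else None
    return signal / total, (total - signal) / total, largest


# Hypothetical sample of 100 men; invented numbers, not real data.
example = {"E1b1b": 45, "T": 10, "J": 5, "A": 25, "B": 15}
signal_share, background_share, largest = split_signal_background(example)
print(f"signal {signal_share:.0%}, background {background_share:.0%}, "
      f"largest background haplogroup {largest}")
```

With real data the labels would first need to be collapsed to these top-level haplogroups, since published samples usually report finer subclades.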
The great distinction in both signal and background between the populations on either side of the Gate of Tears at the Southern extent of the Red Sea suggests that the back migrations of Eurasian haplogroups that are associated with Afro-Asiatic populations took place via the Levant and across the Sinai or along its coast (and from there either down the Egyptian coast of the Red Sea, or via the Nile to its source, or parallel to the Nile during wet Sahara periods, or along the North African coast), and to a lesser extent via contacts between North Africans and Iberia (and to a lesser extent the entire Southern European coast during classical times), rather than at the Gate of Tears location often associated with the Out of Africa event.
It is a bit hard to tell if the Somali Y-DNA haplogroup T population has coastal origins, or if it followed the Blue Nile to its source and then made the short hop from the Blue Nile basin to the Indian Ocean drainage basin. Y-DNA haplogroup T could have gone Southwest to Egypt and Northwest to Europe, while Y-DNA haplogroup L could have split from T and K somewhere in the Fertile Crescent and migrated to Pakistan as part of the Harappan population (along with R2, which also has a well defined Indus River Basin distribution).
The relatively clear delineation between Chadic populations and their neighbors with religious and food production similarities (e.g. the Fulani and the Saharan subgroups of Nilo-Saharans), relative to other families of Afro-Asiatic populations, suggests to me that they are quite young among the main linguistic subgroups of Afro-Asiatic. Ethiosemitic is also known to have emerged, in an approximately Bronze Age time frame, from a single language of Southwest Asian origins upon a Cushitic/Coptic substrate. Arabic's expansion likewise took place in historic times, in the 1st and 2nd millennia CE.
We know that Semitic was thriving as the language of the Akkadian and subsequent Assyrian empires in the Bronze Age. We know from recovered and dated artifacts that there was trade from Arabia (probably via Yemen) as far as Zanzibar as far back as at least 2,400 BCE, that the Indian Ocean trade was operating in the 1st century CE connecting Zanzibar, Arabia and South Asia, at least, that the African groups that fused with Austronesians to settle Madagascar were very likely East African Bantus, and that Zanzibar probably acquired Iron Age Bantu settlement around the 9th century CE. An island off Kenya that was part of the same trading system as Zanzibar is called Lamu (a name which is suggestive of the legendary homeland of the Tamils called Lemuria and may have been named in the same place naming tradition, although the sleepy island is an unlikely homeland for anyone). We know that there was a healthy Atlantic maritime trade network in the pre-Celtic megalithic culture starting around the 4th millennium BCE and lasting until roughly the Bronze Age collapse ca. 1200 BCE, as a proof of existence of the technology at that time.
The subhaplogroup structure found in E1b1b suggests that this haplogroup has its origins in Ethiopia, where its diversity is greatest and which is closer to the origins of E1b1a and E2, while Berbers are at the periphery.
Berbers are also at the tail end of expansions of mtDNA M1 and U6, which are probably back migrations to Africa despite the fact that they are largely confined to Afro-Asiatic Africa. The mtDNA haplogroup L3* found in Berbers is old, rather than of recent origin in the Transafrican slave trade. About 10%-20% of Berber mtDNA is Sub-Saharan African in origin, while about 60% involves haplogroups usually associated with Caucasians of European and Middle Eastern origins. The mtDNA pool of Chadic, Cushitic and Omotic populations is composed to a great extent of mtDNA haplogroups L2 and L3, similarly to their neighbors who speak different languages, suggesting a male dominated arrival pattern of Afro-Asiatic language expansion.
Ancient DNA shows genetic continuity between Berbers and North African populations from 12,000 years ago.
The notion that Omotic is an admixture of Nilo-Saharan and Cushitic influences is plausible and could explain its outgroup status.
Also, genetically, some Nilo-Saharan populations look more like Afro-Asiatic populations than they do like Niger-Congo or prototypical Nilo-Saharan populations.
There are several key historical transitions that it would be nice to be able to link to linguistic expansions in Africa.
The arrival of Fertile Crescent food production technologies in Egypt and Egyptian domestication of the donkey (one or two thousand years after they were developed, ca. 6000-7000 BCE). The arrival of certain Fertile Crescent herd animals in North Africa, East Africa and the Sahel (a few centuries later). The Afro-Asiatic linguistic groups seem primarily connected to the Fertile Crescent rather than to the Sahel agricultural complex. Egyptian trade routes which can be corroborated to some extent not only by historical accounts but by the products that they exported to Egypt may have extended as far as modern Uganda and surely reached Ethiopia, at least.
The expansion of Sahel agriculture (by some accounts contemporaneous with the Fertile Crescent agricultural complex and by some accounts several thousand years later, perhaps as late as 4000 BCE). The domestication of select East African domesticates (e.g. coffee and certain Ethiopian grains). There is an argument that Sahel agriculture only came into its own when East African domesticates and Sahel domesticates merged.
It is probably fair to associate the expansion of E1b1a and the Niger-Congo languages and mtDNA haplogroup L2 with the development of Sahel agriculture, and to associate the later Bantu expansion starting 3,000 BCE with the development of tropical agriculture (with some crops of Austronesian sources) and Bantu iron metallurgy.
The proto-Nilo-Saharans, the proto-Chadic peoples, and the proto-Berbers all appear to have been nomadic pastoral cultures. The transition of the Sahara from its last wet phase in the Holocene, when Lake Chad was very large, to the current arid phase was probably formative for at least some of them. There was a major regionally disruptive drought around 2000 BCE, and another, not quite as bad but accompanied by other disasters such as volcanic eruptions, around 1200 BCE.
The origin stories of the Semitic peoples are also nomadic pastoral, a tradition that may have persisted at least prior to the Akkadian empire, Assyrian empire, and Phoenicians. Biblical accounts of the early Hebrews portray them as an initially nomadic pastoral culture that starts to transition to agriculture as they settled into the Levant after their exile in Egypt (an event that is suggestive, at least, of having some connection with the Hyksos era and the monotheistic Pharaoh of Egypt, although much of Genesis draws on Mesopotamian legends, something not necessarily inconsistent with the Hyksos).
The Coptic historical record is the oldest in existence apart from the Sumerian one, dating to about 3000 BCE, and is corroborated by the Sumerian historical record which is slightly earlier, ca. 3500 BCE.
The Coptic Egyptian record is less than clear about the origins of the Berber, Cushitic, Omotic, Chadic and Semitic peoples on its fringes, and archaeology helps only a little to fill the gap.
The Afro-Asiatic languages have considerable time depth relative to the Indo-European and at least some Asian and Altaic language family expansions. They were preceded in Mesopotamia by Sumerian, which was also very likely the source for the Harappan civilization in modern day Pakistan at about the same time as the Egyptian civilization, although the linguistic affinity of Harappan is highly disputed.
The Afro-Asiatic language families do not show a clear linear family tree relationship to each other; almost every combination of groupings has been suggested by legitimate professional linguists. The Northern tier of Afro-Asiatic languages (Berber, Coptic and Semitic) are non-tonal. Cushitic and Omotic and Chadic do have grammatical tone. This suggests either substrate influences or areal influences.
My intuition is that the Afro-Asiatic languages either arose from a proto-Afro-Asiatic language in and around Jericho that spread to Egypt and from Egypt to the other Afro-Asiatic language families, or radiated from Egypt in all directions. Thus, I am ambivalent about the direction of the Semitic-Coptic link. The close connection to the earliest food production centers makes a Levantine origin for Afro-Asiatic attractive, but the early adoption of a written language could have given Coptic an edge, and Coptic could have been an indigenous fishing population language (as fishing populations were the most culturally advanced societies prior to food production technologies). The lack of a monolithic Y-DNA or mtDNA signature suggests that some of the transitions were predominantly cultural transfers while others were demic.
My intuition is that the Berber E1b1b/mtDNA L3* combination, quite possibly as a unit, arrived in North Africa in pre-Neolithic times (ca. 12,000 BCE) as a hunter-gatherer population that in some way made a leap that set it apart from prior cultures in the area and caused it to expand from East Africa before the Afro-Asiatic languages arose, and that Berbers then transitioned culturally, with very little genetic impact, to a nomadic pastoralist society with an Afro-Asiatic language derived from Coptic when herd animals arrived from Egypt ca. 5000-6000 BCE. The mtDNA M1/U6 could be fellow travellers with Y-DNA E1b1b or with T, which would also have been a back migration.
My intuition is that Cushitic is the product of the expansion of Coptic society towards the source of the Blue Nile at about the time of agricultural technology and trade influences from Egypt, with quite heavy substrate influence, and that Omotic is basically Cushitic under heavy Nilo-Saharan influence on the border of the two language groups.
My intuition is that Y-DNA J in Africa is a late in time influence (Ethiosemitic and later) driven by Semitic peoples, and that Y-DNA haplogroup T is an ancient one (quite possibly the most ancient Afro-Asiatic marker given its presence in the Levant, Europe, Egypt and Cushitic areas). Of course, J is almost surely in Southwest Asia much earlier. I suspect that J1 is originally more closely associated with Semitic languages (possibly in connection with or after Y-DNA T), but that J2 introgresses to some extent into the mix. A specific time depth of J in Southwest Asia and West Asia is for another day.
R1b-V88 probably dates to about the same time as the expansion of R1b elsewhere, since it breaks off from it basally, but if it was present in a refugium in the Dead Sea area, for example, it could have formed a basis for the Chadic peoples later in time - I suspect that the Chadic peoples could have origins in the early period of Egyptian written history and gone almost unmentioned, or could have origins shortly before Egyptian written history. There is some indication from their distribution that Chadic peoples may have arrived via oasis hopping in a wetter Sahara Holocene period, parallel to but West of the Nile, rather than down the Nile, although tracing the White Nile to its source and then hopping into the Chad basin would also make sense.
Note: This post is light on links and actual data, because I want to get my analysis down and my data are in my favorites bar and a hand written journal, neither of which is prone to memory lapses.