A new paper predictably finds the three main ethnicities of Singapore (Chinese, Malay and Indian), with some admixture among them (the Chinese and Indian populations arrived in the historic era as immigrants). The Malay component of ancestry is absent from the 1000 Genomes Project data. There are also genetic signatures of two separate waves of migration from South China: a mainland-route Neolithic Austroasiatic wave (ca. 4000 years ago) and an island-route Austronesian wave (ca. 2000 years ago).
Asian populations are currently underrepresented in human genetics research. Here we present whole-genome sequencing data of 4,810 Singaporeans from three diverse ethnic groups: 2,780 Chinese, 903 Malays, and 1,127 Indians.
Despite a medium depth of 13.7X, we achieved essentially perfect (>99.8%) sensitivity and accuracy for detecting common variants and good sensitivity (>89%) for detecting extremely rare variants with <0.1% allele frequency. We found 89.2 million single-nucleotide polymorphisms (SNPs) and 9.1 million small insertions and deletions (INDELs), more than half of which have not been cataloged in dbSNP. In particular, we found 126 common deleterious mutations (MAF>0.01) that were absent in the existing public databases, highlighting the importance of local population reference for genetic diagnosis.
We describe fine-scale genetic structure of Singapore populations and their relationship to worldwide populations from the 1000 Genomes Project. In addition to revealing noticeable amounts of admixture among three Singapore populations and a Malay-related novel ancestry component that has not been captured by the 1000 Genomes Project, our analysis also identified some fine-scale features of genetic structure consistent with two waves of prehistoric migration from south China to Southeast Asia. Finally, we demonstrate that our data can substantially improve genotype imputation not only for Singapore populations, but also for populations across Asia and Oceania. These results highlight the genetic diversity in Singapore and the potential impacts of our data as a resource to empower human genetics discovery in a broad geographic region.
Degang Wu, et al., "Large-scale whole-genome sequencing of three diverse Asian populations in Singapore" bioRxiv (August 11, 2018).
The juicy bit of the discussion section reads as follows:
Malay represents indigenous people in Southeast Asia and contributes a novel ancestry component that was not captured by the 1000 Genomes Project. We observed a clear north-south clinal pattern of genetic variation in both South Asia and East/Southeast Asia, except for two recent migrant populations--the SG Chinese and SG Indian, which is consistent with previous studies that suggest a strong role of geography in producing human population structure.
Moreover, we found noticeable amounts of admixture among the three major populations in Singapore.
In addition, we identified two closely related ancestral components (components 4 and 5 in Figure 2E) that are prevalent in East and Southeast Asian populations, suggestive of their ancient origins. Based on the geographic distributions of these two components, we speculate that they might reflect two waves of prehistoric migration from south China to Southeast Asia through a mainland route (component 5) and an island route (component 4). This hypothesis is consistent with a complex peopling history of Southeast Asia depicted by a recent ancient DNA study. The study suggested that an expansion from East Asia into mainland Southeast Asia occurred about 4,000 years ago during the Neolithic transition to farming, and that an island route migration corresponding to the Austronesian expansion into Philippines and Indonesia took place about 2,000 years ago.
So, the new study adds a few basically unsurprising but important data points to the mix, without offering any big new insights.