Tuesday, October 10, 2017

Which Ancestries Does 23andMe Botch Most Badly?

There is a somewhat widely known problem with how 23andMe's ancestry estimation handles Koreans. It turns out that the company's own data confirms it.

The company publishes a chart of precision and recall for each ethnic category. Precision is the likelihood that a piece of DNA assigned to a particular ethnicity really comes from a person of that ethnicity. Recall is the likelihood that a piece of DNA from a person of a particular ethnicity is correctly classified as such. Here is the breakdown:

Population                      Precision (%)  Recall (%)
Sub-Saharan African             99             99
West African                    97             96
East African                    95             89
Central & South African         100            89
East Asian & Native American    99             99
Native American                 99             86
East Asian                      97             97
Japanese                        98             92
Korean                          86             62
Yakut                           96             78
Mongolian                       89             53
Chinese                         93             91
Southeast Asian                 95             70
European                        99             99
Northern European               95             85
British & Irish                 90             39
Finnish                         95             86
French & German                 78             8
Scandinavian                    86             34
Southern European               93             66
Balkan                          88             42
Iberian                         92             51
Italian                         88             50
Sardinian                       96             62
Eastern European                90             50
Ashkenazi Jewish                97             93
Middle Eastern & North African  95             83
Middle Eastern                  90             76
North African                   95             77
South Asian                     99             95
Oceanian                        100            95
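To make the precision and recall definitions above concrete, here is a minimal sketch that computes both from raw counts for a single population label. The counts are hypothetical, picked only so the result lands on the Korean row's figures; they are not 23andMe data, and the company's actual evaluation procedure is not described here.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of DNA labeled as population X that truly comes from X."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of DNA truly from population X that was labeled as X."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts for one label (illustrative only, not real data):
# 62 correct "Korean" calls, 10 wrong "Korean" calls,
# 38 truly Korean pieces that received some other label.
tp, fp, fn = 62, 10, 38

print(f"precision={precision(tp, fp):.0%} recall={recall(tp, fn):.0%}")
# → precision=86% recall=62%
```

The two numbers fail in different ways. A row like French & German (78% precision, 8% recall) means the label is usually right when it is assigned, but it is assigned so rarely that almost all genuinely French and German DNA ends up under some other label.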

Koreans are known to be misclassified often, and sure enough, the numbers are bad. Only French & German has lower precision than Korean (78% versus 86%). Several populations have lower recall than Korean, but 62% is nothing to write home about, and it is far below the figures for Japanese (92%) and Chinese (91%).

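There are also known problems with Southern and Eastern Europeans, all of whom have low recall rates: 42% for Balkan, 50% for Italian and Eastern European, 51% for Iberian, and 62% for Sardinian. Among Northern Europeans, only Finnish people (86% recall) really stand out as distinct; British & Irish (39%), Scandinavian (34%), and French & German (8%) all fare poorly.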
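The comparisons in the last two paragraphs can be checked mechanically. This is a minimal sketch, assuming the table above has been transcribed correctly into a Python dict; the helper names `below` and `recall_under` and the 60% cutoff are illustrative choices, not anything 23andMe publishes.

```python
from typing import Dict, List, Tuple

# (precision %, recall %) per population, transcribed from the table above.
TABLE: Dict[str, Tuple[int, int]] = {
    "Sub-Saharan African": (99, 99), "West African": (97, 96),
    "East African": (95, 89), "Central & South African": (100, 89),
    "East Asian & Native American": (99, 99), "Native American": (99, 86),
    "East Asian": (97, 97), "Japanese": (98, 92), "Korean": (86, 62),
    "Yakut": (96, 78), "Mongolian": (89, 53), "Chinese": (93, 91),
    "Southeast Asian": (95, 70), "European": (99, 99),
    "Northern European": (95, 85), "British & Irish": (90, 39),
    "Finnish": (95, 86), "French & German": (78, 8),
    "Scandinavian": (86, 34), "Southern European": (93, 66),
    "Balkan": (88, 42), "Iberian": (92, 51), "Italian": (88, 50),
    "Sardinian": (96, 62), "Eastern European": (90, 50),
    "Ashkenazi Jewish": (97, 93), "Middle Eastern & North African": (95, 83),
    "Middle Eastern": (90, 76), "North African": (95, 77),
    "South Asian": (99, 95), "Oceanian": (100, 95),
}

PRECISION, RECALL = 0, 1  # index into each (precision, recall) tuple


def below(reference: str, metric: int) -> List[str]:
    """Populations scoring strictly below `reference` on the chosen metric, A-Z."""
    cutoff = TABLE[reference][metric]
    return sorted(name for name, scores in TABLE.items() if scores[metric] < cutoff)


def recall_under(threshold: int) -> List[str]:
    """Populations whose recall is below `threshold` percent, worst first."""
    hits = [name for name, (_, rec) in TABLE.items() if rec < threshold]
    return sorted(hits, key=lambda name: TABLE[name][RECALL])


print(below("Korean", PRECISION))  # → ['French & German']
print(below("Korean", RECALL))
print(recall_under(60))
```

The comparison is strict, so Scandinavian (86% precision) does not count as "lower than Korean," and Sardinian (62% recall) ties Korean rather than falling below it.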
