Recently, I asked my Facebook followers to send me their questions about DNA testing. One person wanted to know how AncestryDNA determines ethnicity percentages. In particular, she was interested in what regions of the genome Ancestry uses to draw these conclusions.
First, it is important to understand that Ancestry is not actually sequencing a client’s entire genome. The vast majority of DNA would not be informative, because it is the same in all people. Instead, Ancestry determines the client’s DNA sequence only at specific positions that are known to vary among different ethnic groups. These differences, which are scattered all over the genome, are called single nucleotide polymorphisms (SNPs – pronounced “snips”). Determining an individual’s sequence at a variety of SNPs is called “genotyping”.
Ancestry uses SNPs that were originally identified by comparing genomes from individuals of European, East Asian (Han Chinese and Japanese), and West African (Yoruba) ancestry. Since Ancestry wants to be able to recognize other ethnicities too, they had to develop a reference panel of people from a variety of known ethnic backgrounds. To do this, they genotyped people whose ancestors all came from the same geographic region and thus were likely to descend from a single ethnic group. They also incorporated data from the public Human Genome Diversity Project (HGDP), which genotyped individuals from about 50 different populations around the world. When the SNP data from this reference panel was plotted on a graph, it formed clusters corresponding to 26 distinct geographic regions. Ancestry uses these 26 regions to define a client’s ethnicity.
When a client’s DNA is genotyped, the data is compared to the reference panel at 300,000 SNPs (the sites for which the HGDP and Ancestry’s technique both provide information). The most informative SNPs are then subjected to some high-powered statistical analysis. Basically, they calculate the predicted SNP results for all possible proportions of ethnicity and compare those predictions to the client’s actual SNP results to determine which ethnicity combination has the highest probability of producing the client’s results. The “winning” combination is reported to the client as their Ethnicity Estimate.
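For readers who like to see the idea in code, here is a minimal Python sketch of the "try every combination and keep the most probable one" approach described above. It is only an illustration, not Ancestry's actual algorithm: the three regions, the six SNPs, the allele frequencies, and the function names are all made up for the example, and I am assuming a simple model in which each SNP is scored independently and proportions are tried in 10% steps.

```python
import math
from itertools import product

# Hypothetical reference panel: for each made-up region, the fraction of
# chromosomes carrying the "alternate" allele at each of six made-up SNPs.
REFERENCE_FREQS = {
    "Region A": [0.90, 0.10, 0.80, 0.20, 0.70, 0.95],
    "Region B": [0.10, 0.90, 0.20, 0.80, 0.30, 0.05],
    "Region C": [0.50, 0.50, 0.95, 0.05, 0.90, 0.50],
}


def log_likelihood(genotypes, proportions):
    """Log-probability of the client's genotypes under one ethnicity mix.

    genotypes: count of alternate alleles (0, 1, or 2) at each SNP.
    proportions: dict mapping region name to its share (shares sum to 1).
    Assumes SNPs are independent (a simplification for illustration).
    """
    total = 0.0
    for i, g in enumerate(genotypes):
        # Predicted allele frequency for this mix: a weighted average
        # of the reference frequencies.
        f = sum(share * REFERENCE_FREQS[region][i]
                for region, share in proportions.items())
        # Probability of seeing g alternate alleles out of 2 chromosomes.
        total += math.log(math.comb(2, g) * f**g * (1 - f)**(2 - g))
    return total


def candidate_mixtures(regions, step_percent=10):
    """Yield every mix of the regions in step_percent increments."""
    steps = 100 // step_percent
    for counts in product(range(steps + 1), repeat=len(regions)):
        if sum(counts) == steps:
            yield {r: c / steps for r, c in zip(regions, counts)}


def estimate_ethnicity(genotypes, step_percent=10):
    """Return the candidate mix with the highest likelihood."""
    regions = list(REFERENCE_FREQS)
    return max(candidate_mixtures(regions, step_percent),
               key=lambda mix: log_likelihood(genotypes, mix))


if __name__ == "__main__":
    client_genotypes = [2, 0, 2, 0, 2, 2]  # made-up client results
    print(estimate_ethnicity(client_genotypes))
```

With only three regions and 10% steps there are just 66 combinations to check, so brute force works here. With 26 regions and hundreds of thousands of SNPs the number of combinations explodes, which is why a real analysis needs much more sophisticated statistics than this toy search.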
Obviously the results of this type of analysis are only as good as the reference panel. Ancestry has already upgraded the reference panel once (they are currently using the V2 panel) and additional improvements are in the works.
The quality of results may also vary depending on the ethnicity of the subject. Because of the SNPs that were chosen, the Ancestry ethnicity test works best for people of European ancestry. However, even some regions within Europe are difficult to distinguish due to migration and population mixing. For example, the regions defined as Great Britain and Europe West show a lot of overlap. Ancestry provides a brief history of each of the geographic regions, highlighting population movements that are likely to have affected the genetic makeup of its inhabitants.
In my personal experience, the geographical regions identified by the Ancestry Ethnicity Estimate match up fairly well with what would have been predicted based on standard genealogy. However, it is important to check the error bars on each region, since they are often quite large. For example, on one test that showed 6% Great Britain, the actual range was 0-21%. As Ancestry upgrades their reference panel and algorithms, these results are likely to improve.
If you are interested in reading about Ancestry’s Ethnicity Estimate in even more detail, check out the white paper describing their method.
If you have other questions about DNA testing for genealogy purposes, comment below or submit questions through the Contact form on this site. You can also send me a message through the KinSeeker Genealogy Services Facebook page. I will try to answer any questions in a future post.
Teresa is the owner of KinSeeker Genealogy Services. She has a Ph.D. in Biology and a lifelong fascination with genealogy. She has been researching her own family history for over 20 years and loves helping others "find their stories."
This blog is owned by Teresa Shippy. Content may not be copied without permission.
©2016, copyright Teresa Shippy