Ascertainment bias in HapMap data: Should we use HapMap data for population genetics studies?

Clark, A. G., M. J. Hubisz, et al. (2005). “Ascertainment bias in studies of human genome-wide polymorphism.” Genome Research 15(11): 1496-1502.

The HapMap project produced dense genome-wide genotype data on polymorphisms.  Phase 3 included 11 populations from around the world.  The project was carefully designed, but several issues remain, such as how representative the sampled populations are and the ascertainment bias of the SNPs chosen for genotyping.

In their article, Clark et al. analyzed the phase 1 HapMap data to examine how ascertainment bias affects observed within-population heterozygosity (genetic diversity) and FST (population differentiation).  The phase 1 data set includes Yoruba from Nigeria, Chinese from Beijing, Japanese from Tokyo, and individuals of European ancestry from Utah.
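For readers who want the two statistics in concrete form, here is a minimal sketch.  It assumes biallelic SNPs, known allele frequencies, and two equally weighted populations, and it uses the textbook definitions (expected heterozygosity 2p(1-p) and FST = (HT - HS) / HT).  The function names are mine, and this is not necessarily the estimator Clark et al. used.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """FST = (H_T - H_S) / H_T for two equally weighted populations.

    H_S is the mean within-population heterozygosity and H_T is the
    heterozygosity expected from the pooled allele frequency.
    """
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # the site is monomorphic in the pooled sample
    return (h_t - h_s) / h_t


# Identical frequencies give no differentiation; fixed differences give FST = 1.
print(fst_two_populations(0.5, 0.5))
print(fst_two_populations(0.0, 1.0))
print(fst_two_populations(0.2, 0.8))
```

Both quantities depend only on allele frequencies, which is why the choice of which SNPs get genotyped matters so much for them.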

In population genetics, ascertainment bias is a sampling bias that usually arises when SNPs or other genetic markers are selected for analysis.  Traditionally, SNPs and markers have been discovered in European individuals and then used to analyze genetic variation in other populations, such as Asians and Africans.  Statistical and population genetic analyses based on these markers (e.g., analyses of the pattern of population differentiation among the three geographical groups) can therefore be biased.
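A toy simulation can make the mechanism concrete.  The sketch below is my own illustration, not the analysis from the paper, and it rests on strong simplifying assumptions: allele frequencies in populations A and B are drawn independently from the same U-shaped Beta(0.3, 0.3) distribution, and a site counts as "discovered" only if both alleles show up in a panel of 4 chromosomes sampled from A.

```python
import random


def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def mean(values):
    return sum(values) / len(values)


def simulate(n_sites=20000, panel_size=4, seed=1):
    """Compare mean heterozygosity at all sites vs. sites discovered in A."""
    rng = random.Random(seed)
    all_a, all_b, asc_a, asc_b = [], [], [], []
    for _ in range(n_sites):
        # Assumption: both populations share the same U-shaped frequency
        # distribution, so at most sites one of the two alleles is rare.
        p_a = rng.betavariate(0.3, 0.3)
        p_b = rng.betavariate(0.3, 0.3)
        all_a.append(heterozygosity(p_a))
        all_b.append(heterozygosity(p_b))
        # Discovery step: sample panel_size chromosomes from A and keep the
        # site only if both alleles are seen in that small panel.
        copies = sum(rng.random() < p_a for _ in range(panel_size))
        if 0 < copies < panel_size:
            asc_a.append(heterozygosity(p_a))
            asc_b.append(heterozygosity(p_b))
    return {
        "all_A": mean(all_a),
        "all_B": mean(all_b),
        "ascertained_A": mean(asc_a),
        "ascertained_B": mean(asc_b),
    }


result = simulate()
for name, value in result.items():
    print(name, round(value, 3))
```

Under these assumptions the two populations are equally diverse by construction, yet at the discovered sites population A looks more diverse than B, simply because the discovery panel favors sites with intermediate frequencies in A.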

To assess the effects of ascertainment bias, Clark et al. compared the HapMap data to the Perlegen data.  The HapMap project was designed to find common SNPs, those with an allele frequency of > 5%, so that they can be used for disease association studies.  The Perlegen data, on the other hand, were produced by resequencing individuals from ethnically diverse populations.
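To see why a frequency threshold alone shifts diversity estimates, here is a small sketch with made-up allele frequencies (the list `freqs` is purely hypothetical, not HapMap or Perlegen data).  It keeps only SNPs whose minor allele frequency exceeds 5% and compares the mean expected heterozygosity before and after filtering.

```python
def minor_allele_frequency(p):
    """Frequency of the rarer of the two alleles at a biallelic SNP."""
    return min(p, 1.0 - p)


def filter_common(freqs, threshold=0.05):
    """Keep SNPs whose minor allele frequency is above the threshold."""
    return [p for p in freqs if minor_allele_frequency(p) > threshold]


def mean_heterozygosity(freqs):
    """Mean of 2p(1-p) over a list of allele frequencies (0.0 if empty)."""
    if not freqs:
        return 0.0
    return sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)


# Hypothetical allele frequencies, chosen only for illustration.
freqs = [0.01, 0.02, 0.03, 0.10, 0.30, 0.50, 0.97]
common = filter_common(freqs)

print(common)
print(mean_heterozygosity(freqs), mean_heterozygosity(common))
```

Dropping the rare variants removes the sites that contribute the least heterozygosity, so the per-SNP average goes up even though nothing about the population has changed.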

They found that observed within-population heterozygosity and FST between each pair of populations are inflated in the HapMap data set.  HapMap data are often used for population genetics studies, but they argue that we have to interpret the data with care.  For example, FST is often used to detect genetic evidence of positive selection (see here).  An increased FST may reflect ascertainment bias rather than localized positive selection.

They think that this ascertainment bias does not affect genetic association studies, but I wonder how it affects association tests that adjust for population stratification using STRUCTURE or PCA.

Also, we need to consider whether the CEPH-Human Genome Diversity Project data have ascertainment bias.  If so, the world-wide human population structure observed (see here) could be due in part to ascertainment bias.
