Here, I am reviewing a web-based tool called Haplotter for human population genetic analysis. The above reference is the paper that describes the analytical methods used in this web tool. Haplotter is very easy to use and may be useful for teaching upper-level anthropological genetics or human population genetics courses. It is also useful for generating hypotheses when you suspect that positive selection has acted on genes for a particular phenotypic trait but do not have sufficient genomic data to investigate.
Haplotter uses HapMap Phase 1 and 2 data and looks for signatures of positive selection in three continental populations (YRI, CEU, and ASN). Chinese and Japanese samples were combined into ASN. You can query by genomic region, gene, or SNP.
It provides four statistics:
iHS (integrated Haplotype Score) is a statistic for detecting long-range haplotypes. It is based on EHH (extended haplotype homozygosity), which measures the decay of haplotype identity as a function of distance from a core SNP. It was designed to detect signatures of selection when the selected variants have not yet reached fixation and are still at intermediate frequency.
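To make the EHH idea concrete, here is a minimal Python sketch. It is not Haplotter's code; the function name `ehh`, the string encoding of haplotypes, and the toy data are all my own assumptions for illustration. EHH is computed as the probability that two randomly chosen chromosomes carrying the same core allele are identical over the whole interval from the core SNP out to a given SNP.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto, allele):
    """EHH among chromosomes carrying `allele` at SNP index `core`,
    with haplotypes extended from `core` out to SNP index `upto`."""
    lo, hi = min(core, upto), max(core, upto)
    # Keep only chromosomes that carry the chosen core allele.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined for fewer than two chromosomes
    # Count pairs of carriers that are identical over the whole interval.
    same_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return same_pairs / comb(n, 2)


# Toy phased haplotypes (made-up data): one string per chromosome,
# one character per SNP, "1" = derived allele, "0" = ancestral allele.
toy = ["1100", "1100", "1101", "1110", "0011", "0000"]
for snp in range(4):
    print(snp, ehh(toy, core=0, upto=snp, allele="1"))
```

EHH starts at 1 at the core SNP and decays as the interval grows. iHS integrates this decay curve separately for the ancestral and derived core alleles and compares the two, so an allele sitting on an unusually long haplotype for its frequency stands out.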
Fay and Wu’s H and Tajima’s D are used to examine skew in the allele frequency spectrum, and large negative values indicate positive selection. Fay and Wu’s H is sensitive when the selected alleles are close to fixation in a population (an excess of high-frequency derived alleles), while Tajima’s D detects selection when there is an abundance of low-frequency polymorphisms.
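As a rough illustration of how these two statistics respond to the frequency spectrum, the sketch below computes Tajima’s D and the unnormalized Fay and Wu’s H from derived allele counts. The function names and the toy counts are my own assumptions, and I am not claiming this matches Haplotter's implementation (for example, how it handles missing data or ancestral state assignment).

```python
from math import sqrt


def pairwise_diversity(derived_counts, n):
    """Average number of pairwise differences (pi) for n chromosomes.
    Each entry is the derived allele count k (0 < k < n) at one SNP."""
    return sum(2 * k * (n - k) for k in derived_counts) / (n * (n - 1))


def tajimas_d(derived_counts, n):
    """Tajima's D: pi compared with Watterson's estimator S / a1."""
    s = len(derived_counts)  # number of segregating sites
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    diff = pairwise_diversity(derived_counts, n) - s / a1
    return diff / sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H: pi minus theta_H, where theta_H
    gives extra weight to high-frequency derived alleles."""
    theta_h = sum(2 * k * k for k in derived_counts) / (n * (n - 1))
    return pairwise_diversity(derived_counts, n) - theta_h


# Toy windows (made-up counts), 10 SNPs each, n = 10 chromosomes.
n = 10
print(tajimas_d([1] * 10, n))  # all singletons: negative D
print(tajimas_d([5] * 10, n))  # intermediate frequencies: positive D
print(fay_wu_h([9] * 10, n))   # nearly fixed derived alleles: negative H
```

The toy windows show the contrast described above: a window full of rare variants pulls Tajima’s D negative, while a window full of nearly fixed derived alleles pulls Fay and Wu’s H negative.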
Population pairwise FST (for the three pairs of populations) is used to detect large allele frequency differences between pairs of populations that result from selection acting on loci in one population but not the other.
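For a single biallelic SNP, pairwise FST can be sketched as the proportional reduction in heterozygosity within populations relative to the pooled total. The version below is the simple textbook form with equal population weights and no sample size correction. I do not know which estimator Haplotter actually uses, so treat the function `pairwise_fst` and its inputs as hypothetical.

```python
def pairwise_fst(p1, p2):
    """Simple FST = (H_T - H_S) / H_T for one biallelic SNP,
    given the allele frequency in each of two populations."""
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population (equal weights).
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_t - h_s) / h_t


print(pairwise_fst(0.5, 0.5))  # identical frequencies: 0.0
print(pairwise_fst(0.0, 1.0))  # fixed difference: 1.0
print(pairwise_fst(0.1, 0.9))  # large frequency difference: about 0.64
```

A SNP with very different frequencies in, say, CEU and YRI gives a value near 1, which is the kind of signal expected when selection acted in only one of the two populations.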
See also here for brief explanations of the methods. The P-value was obtained from the empirical distribution of both Tajima’s D and Fay and Wu’s H in 50-SNP windows, using the rank of the statistic in a window compared with the overall genome-wide distribution.
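The rank-based P-value described above can be sketched as follows. I am assuming a lower-tail test (since large negative values are the signal of interest) and counting ties as at least as extreme. The function name and the toy numbers are hypothetical, not taken from Haplotter.

```python
def empirical_p(observed, genome_values):
    """Fraction of genome-wide window values that are as low as,
    or lower than, the value observed in the window of interest."""
    n_extreme = sum(1 for v in genome_values if v <= observed)
    return n_extreme / len(genome_values)


# Toy genome-wide distribution (made-up values), one value per 50-SNP window.
genome_d = [-2.5, -2.0, -1.0, 0.0, 0.5, 1.0, 1.2, 1.5, 2.0, 3.0]
print(empirical_p(-2.0, genome_d))  # 2 of 10 windows are <= -2.0, so 0.2
```

Because the P-value is only a rank within the genome, it identifies outlier windows rather than testing against an explicit neutral model.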
But we have to remember that there are several important issues to consider, and I list three here. First, ascertainment bias, which I mentioned earlier (see here), may affect some of the analyses. Second, only three continental populations were used, and these three populations may not be representative samples of worldwide human populations. Third, many SNPs in the HapMap data set are common variants with minor allele frequency (MAF) greater than 0.05.