Genetic epidemiological studies in European and Asian populations support findings from genome-wide association studies showing that vitamin D-related gene variants are associated with serum vitamin D levels.

Previously, I reviewed whether the variants identified in the genome-wide association studies (GWAS) of serum vitamin D levels in people of European descent are replicated in African Americans (here).  In this post, I review replication and candidate gene studies conducted in other populations.

Two GWAS among people of European descent have found vitamin D metabolic pathway gene variants associated with serum 25(OH)D levels (Ahn et al. 2010; Wang et al. 2010; see here).  In the GWAS, the strongest signal of association was observed in the gene for vitamin D binding protein (GC), and the single nucleotide polymorphism (SNP) rs2282679 had the lowest P value.  Smaller-scale replication and candidate gene studies in European, Asian, and Hispanic populations also demonstrated the association of GC variants with serum 25(OH)D levels.  Although rs2282679 consistently shows a strong association in many studies, other GC SNPs show stronger associations than rs2282679 in some of these smaller studies (Bu et al. 2010; Lu et al. 2012; Zhang et al. 2012).

The region around DHCR7 and NADSYN1 showed the second strongest association in the GWAS.  Again, the GWAS-identified SNPs replicate well, but other SNPs also show strong signals of association (Zhang et al. 2012).  CYP2R1 variants also showed a strong signal of association in the GWAS.  Of all CYP2R1 SNPs, rs10741657 had the lowest P value in one of the GWAS (Wang et al. 2010), and the association of this SNP with serum vitamin D levels has been replicated in European and Chinese populations, where it again had the lowest P value in the gene (Cooper et al. 2011; Zhang et al. 2012).  In one study of European Americans, another SNP, rs12794714, had a lower P value than rs10741657 (Bu et al. 2010).  The other GWAS showed that two other CYP2R1 SNPs, rs2060793 and rs1993116, were associated with plasma vitamin D levels (Ahn et al. 2010), but an attempt to replicate these associations in a Chinese population was not successful (Lu et al. 2012).

The results of replication and candidate gene studies suggest the following.  1) The GWAS-identified SNPs are in strong linkage disequilibrium (LD) with nearby SNPs, and the extensive LD in European and Asian populations makes it difficult to pinpoint the causal SNPs; the causal variants that affect serum vitamin D levels have not yet been found.  2) In addition to genetic variants, biological (age, BMI, skin color, etc.), socio-cultural (smoking, dietary intake, supplement use, occupation, outdoor activities, sunscreen use, etc.), and environmental (latitude, climate, etc.) factors affect serum vitamin D.  We need to adjust properly for these factors in the analysis and examine their interactions with genetic variants.  3) Additional studies are necessary to determine which mutations alter the function of these genes and affect serum vitamin D levels.
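Point 1 is about linkage disequilibrium.  As a minimal Python sketch (the function name and the "0"/"1" allele coding are my own hypothetical choices, not taken from the cited studies), the LD coefficient D and its squared correlation r² can be computed from phased two-SNP haplotypes.  When r² between a GWAS-identified SNP and a neighboring SNP is close to 1, the two produce nearly identical association signals, which is why association data alone cannot tell us which one is causal.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic SNPs from phased haplotypes.

    haplotypes: one 2-character string per chromosome, e.g. "10" means
    allele 1 at SNP A and allele 0 at SNP B (hypothetical 0/1 coding).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n  # allele-1 frequency, SNP A
    p_b = sum(h[1] == "1" for h in haplotypes) / n  # allele-1 frequency, SNP B
    p_ab = sum(h == "11" for h in haplotypes) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                            # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0        # 0 if either SNP is monomorphic
    return d, r2


# Perfect LD: the allele at SNP A predicts the allele at SNP B.
print(ld_stats(["11", "11", "00", "00"]))  # (0.25, 1.0)
# No LD: all four haplotypes are equally common.
print(ld_stats(["11", "10", "01", "00"]))  # (0.0, 0.0)
```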


Ahn, J., K. Yu, et al. (2010). “Genome-wide association study of circulating vitamin D levels.” Human Molecular Genetics 19(13): 2739-2745.

Bu, F.-X., L. Armas, et al. (2010). “Comprehensive association analysis of nine candidate genes with serum 25-hydroxy vitamin D levels among healthy Caucasian subjects.” Human Genetics 128(5): 549-556.

Cooper, J. D., D. J. Smyth, et al. (2011). “Inherited Variation in Vitamin D Genes Is Associated With Predisposition to Autoimmune Disease Type 1 Diabetes.” Diabetes 60(5): 1624-1631.

Lu, L., H. Sheng, et al. (2012). “Associations between common variants in GC and DHCR7/NADSYN1 and vitamin D concentration in Chinese Hans.” Human Genetics 131(3): 505-512.

Wang, T. J., F. Zhang, et al. (2010). “Common genetic determinants of vitamin D insufficiency: a genome-wide association study.” The Lancet 376(9736): 180-188.

Zhang, Y., X. Wang, et al. (2012). “The GC, CYP2R1 and DHCR7 genes are associated with vitamin D levels in northern Han Chinese children.” Swiss Med Wkly 142: w13636.

Two genome-wide association studies in populations of European descent identify SNPs associated with serum vitamin D levels in three vitamin D metabolic pathway genes (GC, DHCR7, and CYP2R1)

Ahn, J., K. Yu, et al. (2010). “Genome-wide association study of circulating vitamin D levels.” Human Molecular Genetics 19(13): 2739-2745.

Wang, T. J., F. Zhang, et al. (2010). “Common genetic determinants of vitamin D insufficiency: a genome-wide association study.” The Lancet 376(9736): 180-188.

The vitamin D hypothesis of the evolution of skin color suggests that lighter skin among Europeans evolved as a response to lower UV exposure at northern latitudes, because low vitamin D levels can cause bone disorders as well as many other common diseases (see here).  However, Jablonski and Chaplin did not explain the evolutionary mechanisms very well, and they did not mention variation in vitamin D metabolic and signaling pathway genes, which may have contributed to the evolution of pigmentation traits.  I think that one step toward understanding the mechanisms of the evolution of pigmentation characteristics is identifying genetic variants associated with vitamin D levels and vitamin D deficiency in European populations.

There are two genome-wide association studies conducted among people of European descent (Ahn et al. 2010; Wang et al. 2010), and these studies identified single nucleotide polymorphisms (SNPs) in vitamin D pathway genes associated with serum 25-hydroxyvitamin D [25(OH)D].  25(OH)D is the primary form of vitamin D in serum, and it is converted to the biologically active form, 1,25-dihydroxyvitamin D [1,25(OH)2D], in the kidney.  In both studies, associations with transformed (log or square-root) serum 25(OH)D levels were tested using linear regression models adjusted for age, sex, BMI, and season.  In addition, Ahn et al. adjusted for supplement intake, dietary vitamin D intake, region/latitude, and vitamin D assay batch.
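The association model described above (transformed 25(OH)D regressed on genotype plus covariates) can be sketched in plain Python.  This is a minimal illustration under my own assumptions, not the code either study used: genotype is coded as an additive dosage (0, 1, or 2 copies of the tested allele), the transformation is the log, season is reduced to a single hypothetical winter indicator, and only the per-allele coefficient is returned, without the standard errors and P values a real GWAS reports.  The helper names are hypothetical.

```python
import math


def ols(X, y):
    """Least-squares coefficients b solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def snp_effect(dosage, covariates, serum_25ohd):
    """Per-allele effect of one SNP on log-transformed 25(OH)D.

    dosage: 0/1/2 copies of the tested allele per person
    covariates: per-person tuples, e.g. (age, sex, BMI, winter_indicator)
    serum_25ohd: untransformed 25(OH)D concentrations (must be > 0)
    """
    X = [[1.0, float(g)] + [float(c) for c in cov]
         for g, cov in zip(dosage, covariates)]
    y = [math.log(v) for v in serum_25ohd]
    return ols(X, y)[1]  # column 1 holds the genotype dosage
```

Covariates such as the extra ones Ahn et al. used (supplement intake, dietary intake, latitude, assay batch) would simply become additional columns in each covariate tuple.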

In both studies, they found SNPs in or near major vitamin D metabolic pathway genes, including GC (vitamin D binding protein), DHCR7, and CYP2R1.  Also, in both studies, the SNP with the strongest association was rs2282679, located in the GC gene.

When the skin is exposed to ultraviolet radiation (UVR), 7-dehydrocholesterol is converted to previtamin D3.  Also, in the skin, DHCR7 (7-dehydrocholesterol reductase) catalyzes the conversion of 7-dehydrocholesterol to cholesterol, so increased DHCR7 activity could lower the amount of 7-dehydrocholesterol available for vitamin D synthesis.  The vitamin D binding protein (GC) binds vitamin D and its plasma metabolites and transports them to target tissues.  CYP2R1 is one of the enzymes that convert vitamin D to 25-hydroxyvitamin D [25(OH)D] in the liver.

We are far from understanding how vitamin D affected the course of the evolution of skin color.  We do not know how these genes vary across the world, and we do not know whether they show evidence of selection.  These SNPs may not be the ones that change gene function or affect serum vitamin D levels; they may instead be linked to the causal variants.  However, the findings are very important.  Now we can examine variation in these genes in other populations and test whether these variants are associated with vitamin D levels in those populations.  Fine mapping of these genes will help identify the causal SNPs that affect gene function and vitamin D levels.


Genes associated with human pigmentation traits in Genome-Wide Association Studies (GWAS)

Although it has many issues, the genome-wide association study (GWAS) has been a powerful method for identifying genetic variants associated with phenotypic traits.  GWAS is generally used to find genetic variants associated with disease, but it has also found variants associated with anthropometric traits, such as height and BMI.  There have also been several GWAS, mainly among people of European descent, aimed at finding genetic variants associated with pigmentation characteristics (hair, eye, and skin color, freckles, and skin sensitivity to sun or tanning ability) (e.g., Eriksson et al. 2010; Han et al. 2008; Kayser et al. 2008; Liu et al. 2010; Nan et al. 2009; Sulem et al. 2008; Sulem et al. 2007).  These GWAS identified variants associated with pigmentation characteristics in SLC45A2 (Chr5), IRF4 (Chr6), TYRP1 (Chr9), TYR (Chr11), KITLG (Chr12), SLC24A4 (Chr14), OCA2/HERC2 (Chr15), MC1R (Chr16), and ASIP (Chr20).  These studies showed very strong associations of variants in these genes with hair color, eye color, freckles, sensitivity to the sun, and tanning ability.

However, because skin color does not vary much within European populations, these GWAS were not very successful in showing associations between genetic variants and skin pigmentation; only one of these studies, which included people of non-European descent, successfully showed an association between skin color and an IRF4 variant (Han et al. 2008).

Another GWAS, among South Asians, demonstrated the association of skin color with variants in two of these genes (SLC45A2 and TYR), and the study also found that another gene, SLC24A5 (Chr15), is associated with skin color (Stokowski et al. 2007).  The association of SLC24A5 variants with skin color in African Americans had been reported previously (Lamason et al. 2005).  More recently, Kenny et al. reported that an amino acid change in TYRP1 is associated with blond hair among Solomon Islanders (Kenny et al. 2012).

Identifying these genetic variants is important not only for understanding major human phenotypic variation and the mechanisms of the evolution of pigmentation traits, but also for finding variants that may be associated with skin cancer and for understanding the risk factors for vitamin D deficiency.  Because of admixture, African Americans exhibit a wide range of skin color, so they are well suited for genetic studies of skin pigmentation.

Eriksson, N., J. M. Macpherson, et al. (2010). “Web-Based, Participant-Driven Studies Yield Novel Genetic Associations for Common Traits.” PLoS Genet 6(6): e1000993.

Han, J., P. Kraft, et al. (2008). “A Genome-Wide Association Study Identifies Novel Alleles Associated with Hair Color and Skin Pigmentation.” PLoS Genet 4(5): e1000074.

Kayser, M., F. Liu, et al. (2008). “Three Genome-wide Association Studies and a Linkage Analysis Identify HERC2 as a Human Iris Color Gene.” American Journal of Human Genetics 82(2): 411-423.

Kenny, E. E., N. J. Timpson, et al. (2012). “Melanesian Blond Hair Is Caused by an Amino Acid Change in TYRP1.” Science (New York, N.Y.) 336(6081): 554.

Lamason, R. L., M. A. Mohideen, et al. (2005). “SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.” Science (New York, N.Y.) 310(5755): 1782-1786.

Liu, F., A. Wollstein, et al. (2010). “Digital Quantification of Human Eye Color Highlights Genetic Association of Three New Loci.” PLoS Genet 6(5): e1000934.

Nan, H., P. Kraft, et al. (2009). “Genome-Wide Association Study of Tanning Phenotype in a Population of European Ancestry.” J Invest Dermatol 129(9): 2250-2257.

Stokowski, R. P., P. V. K. Pant, et al. (2007). “A Genomewide Association Study of Skin Pigmentation in a South Asian Population.” American Journal of Human Genetics 81(6): 1119-1132.

Sulem, P., D. F. Gudbjartsson, et al. (2008). “Two newly identified genetic determinants of pigmentation in Europeans.” Nat Genet 40(7): 835-837.

Sulem, P., D. F. Gudbjartsson, et al. (2007). “Genetic determinants of hair, eye and skin pigmentation in Europeans.” Nat Genet 39(12): 1443-1452.

Imputing untyped SNPs with the program IMPUTE

IMPUTE is a program that estimates the genotypes of untyped SNPs, usually in disease-SNP association studies.  Currently, commercially available whole-genome genotyping arrays provide genotype data for 500K to 2M single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs), but there are many more SNPs than these arrays can capture.  Whole-genome sequencing is currently very costly and has problems with accurately calling the alleles of rare variants.  Therefore, in many cases, it is better to impute untyped SNPs.  IMPUTE and another similar program, MACH, allow imputation using HapMap and/or 1000 Genomes data as a reference.  Here, I review IMPUTE2 for imputation using 1000 Genomes data.

As of today (1/6/2012), the most recent version, IMPUTE v2.2 beta, is available for three platforms (Windows, Mac, and Linux).  To use 1000 Genomes data as a reference panel, you need the most recent version.  Previously, imputation was most accurate using combined reference data (HapMap and 1000 Genomes data together).  Now the 1000 Genomes Project has genotype data for enough individuals from various ethnic backgrounds that it is no longer necessary to use combined data.

A unique feature of IMPUTE2 is its use of multi-population reference panels, so you do not need to choose a population for the reference panel; the program chooses which reference haplotypes to use.  Population labels and information on the relatedness of individuals are not used by the program; instead, it looks for the reference haplotypes that best match the study samples.  Regardless of ancestry, the program looks for haplotypes shared between the reference panel and the study sample, while identifying and ignoring highly diverged haplotypes.  IMPUTE2 then uses that information to impute untyped or missing SNPs.  Therefore, this method is not sensitive to the ancestry composition of the reference panel.  According to the authors of the program, this process works well for both homogeneous and admixed populations.  They also argue that genotypes of low-frequency variants (MAF < 0.05) can be imputed more accurately.
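As a rough illustration of the "find the best-matching reference haplotype" idea, here is a toy Python sketch that fills in missing alleles by copying from the reference haplotype with the fewest mismatches at the typed sites.  This is a deliberate simplification and not IMPUTE2's algorithm: IMPUTE2 uses a hidden Markov model that can switch between reference haplotypes along the chromosome and outputs genotype probabilities rather than a single copied allele.  The function name and the 0/1/None allele coding are hypothetical.

```python
def impute_by_best_match(study_hap, reference_haps):
    """Fill missing alleles (None) by copying the best-matching reference haplotype.

    "Best" = fewest mismatches at the sites typed in the study haplotype;
    ties go to the first such reference haplotype.
    """
    typed = [i for i, allele in enumerate(study_hap) if allele is not None]

    def mismatches(ref):
        return sum(study_hap[i] != ref[i] for i in typed)

    best = min(reference_haps, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(study_hap)]


reference = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(impute_by_best_match([1, None, 0, None], reference))  # [1, 1, 0, 0]
```

Because the matching ignores population labels entirely, the sketch shares the property described above: any reference haplotype, whatever its ancestry, can be copied if it matches well.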

Imputation could be a useful method in anthropological genetics and genomics.  First, it lets us explore associations between SNPs untyped in a genome-wide association study and anthropologically interesting phenotypes, such as skin color, weight, and height.  Second, the untyped SNPs could themselves be targets of natural selection, and SNPs that show significant associations with phenotypes in genome-wide association studies may be linked to them.

TYR and OCA2: two genes associated with skin pigmentation in African Americans

Shriver, M. D., E. J. Parra, et al. (2003). “Skin pigmentation, biogeographical ancestry and admixture mapping.” Human Genetics 112(4): 387-399.

Previously, I wrote about the correlation between West African Ancestry (WAA) estimates and skin color among African Americans and African Caribbeans (here).  Shriver et al. (2003) used 33 ancestry informative markers (AIMs) that have large allele frequency differences between African and European populations.  Three of these markers are in skin pigmentation candidate genes (TYR, OCA2, and MC1R), so the authors examined whether these candidate genes are associated with skin color (melanin index, measured using the DermaSpectrometer).
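Shriver et al. describe how WAA was estimated from the AIMs.  As a generic sketch of the idea, not necessarily their exact estimator, the Python code below finds the maximum-likelihood ancestry proportion m by grid search.  It assumes known allele frequencies in the two parental populations, Hardy-Weinberg proportions, and independent markers; with population 1 taken as West African, m is the WAA estimate.  The function name and arguments are hypothetical.

```python
import math


def estimate_ancestry(genotypes, freq_pop1, freq_pop2, steps=1000):
    """Grid-search ML estimate of the proportion m of ancestry from population 1.

    genotypes: copies (0, 1, 2) of the reference allele at each AIM
    freq_pop1, freq_pop2: reference-allele frequencies in the parental populations
    """
    def log_likelihood(m):
        total = 0.0
        for g, p1, p2 in zip(genotypes, freq_pop1, freq_pop2):
            p = m * p1 + (1 - m) * p2          # expected allele frequency for this person
            p = min(max(p, 1e-9), 1 - 1e-9)    # guard against log(0)
            # Binomial log-likelihood with the constant term dropped
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
        return total

    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=log_likelihood)
```

AIMs are useful here precisely because of the large frequency differences: when p1 and p2 are far apart, each genotype shifts the likelihood strongly toward one end of the ancestry range.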

Two pigmentation candidate genes (TYR and OCA2) and many other AIMs were associated with the melanin (M) index when WAA was not taken into account.  After they adjusted for WAA, only TYR remained significant.  They then used ADMIXMAP, an admixture mapping program, to find genomic segments whose ancestry (African vs. European) is associated with skin pigmentation.  In this analysis, TYR and OCA2 were associated with skin color, but MC1R was not.
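Why can adjusting for WAA remove an association?  In an admixed sample, every AIM is correlated with overall ancestry, and ancestry is correlated with skin color.  The Python sketch below is my own illustration, not the authors' analysis: it uses the residual (Frisch-Waugh-Lovell) form of covariate adjustment, in which the WAA-adjusted genotype effect equals the slope obtained after regressing both the M index and the genotype on WAA and keeping the residuals.  The function names are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def genotype_effect(genotype, m_index, waa=None):
    """Genotype slope on M index; adjusted for ancestry if waa is given."""
    if waa is None:
        return slope(genotype, m_index)
    # Frisch-Waugh-Lovell: regress residual M index on residual genotype
    return slope(residuals(waa, genotype), residuals(waa, m_index))
```

If the M index depends only on WAA, a marker that merely tracks ancestry shows a clear unadjusted effect, but its WAA-adjusted effect is essentially zero; a gene like TYR that stays significant after adjustment is contributing something beyond overall ancestry.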

Their analyses suggested that two pigmentation candidate genes (TYR and OCA2) likely contribute to differences in skin color between African and European populations.  TYR encodes tyrosinase, an enzyme that catalyzes the first two reactions in the melanin synthesis pathway.  Mutations in OCA2, also known as the P gene, cause a common type of albinism.

I hope to review follow-up research projects later to further understand genes involved in production of dark skin in African and African American populations.

Evidence of ancient admixture between the Denisova and anatomically modern humans from Southeast Asia, Oceania, and New Guinea

Reich, D., R. E. Green, et al. (2010). “Genetic history of an archaic hominin group from Denisova Cave in Siberia.” Nature 468(7327): 1053-1060.

Reich, D., N. Patterson, et al. (2011). “Denisova Admixture and the First Modern Human Dispersals into Southeast Asia and Oceania.” American Journal of Human Genetics 89(4): 516-528.

I went to a session in which David Reich talked about his research on the Denisova at the American Society of Human Genetics annual meeting in Montreal last month, and I had a chance to talk with him briefly after the session.

These articles are the result of a collaboration among leading scientists: David Reich (Harvard University), Svante Paabo (Max Planck Institute), Mark Stoneking (Max Planck Institute), and Montgomery Slatkin (University of California, Berkeley).  It is really a dream team of scientists.

The most important finding from this project is evidence of ancient admixture between the Denisova and anatomically modern humans from Southeast Asia, New Guinea, Australia, and Oceania.  They estimated that the Denisova contributed up to 7% of the genetic material of modern people from these areas.

Considering that the Denisova was found in southern Siberia, how the Denisova and modern humans came to interact is difficult to understand.  From reading the articles and talking to David Reich, my impression is that they considered many scenarios of interaction, but based on the available data, they believe the interaction took place in Southeast Asia.

Another important result of this project is a better understanding of the relationships between the Denisova and Neanderthals and between the Denisova and anatomically modern humans.  The Denisova is more closely related to Neanderthals than to modern humans, and the two shared an ancestor about 640,000 years ago.  Modern humans shared an ancestor with the Denisova and Neanderthals about 804,000 years ago.  The phylogenetic tree constructed from whole-genome data was very different from the tree based on mtDNA genome data (see here).

Ancient genome data from archaic humans are still limited, but current data favor the Multiregional model and suggest that both the Denisova and Neanderthals (go here for the Neanderthal genome) contributed genetic material to the gene pool of anatomically modern humans.  If we had much more ancient genome data, we might find evidence of substantial genetic contributions from archaic humans, completely rejecting the simplistic view of the Out-of-Africa model.

HGDP Selection Browser: Web based tool to analyze Human Genome Diversity Project data to detect signature of positive selection

Pickrell, J. K., G. Coop, et al. (2009). “Signals of recent positive selection in a worldwide sample of human populations.” Genome Research 19(5): 826-837.

I am reviewing another web-based tool for human population genetic analysis, called the HGDP Selection Browser.  The above reference is the paper that describes the analytical methods used in this tool.  Like Haplotter, the HGDP Selection Browser is very easy to use, and it uses Human Genome Diversity Project SNP data generated by Li et al. (2008) on the Illumina 650K platform for 53 populations.

It provides four statistics:

FST is estimated using the AMOVA (Analysis of Molecular Variance) approach and the population groupings identified by Rosenberg et al. (2002).  The -log10 empirical P-values are plotted.

Heterozygosity, a measure of the genetic diversity of a population, is compared across populations.  When a genomic region has been under selection in a population, heterozygosity in that region is reduced in that population compared with other populations.

iHS (integrated Haplotype Score) is a statistic for detecting long-range haplotypes.  It is based on EHH (extended haplotype homozygosity), which measures the decay of haplotype identity as a function of distance from a core SNP.  It was designed to detect signatures of selection when the selected variants have not reached fixation and are at intermediate frequency.  However, it is not sensitive to selection when selected alleles are close to fixation.  It also loses power when the sample size is small.

XP-EHH (Cross Population Extended Haplotype Homozygosity) is another method for detecting long-range haplotypes; it is sensitive when a selective sweep is near fixation, and it has more power than iHS when the sample size is small.
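The four statistics above can be illustrated with a compact Python sketch.  It rests on simplifying assumptions of my own, so treat it as intuition rather than a re-implementation of the browser: FST uses a simple Wright-style (HT - HS)/HT estimator with equal population weights instead of AMOVA; the empirical P-value uses a (1 + count)/(1 + n) convention to avoid P = 0; heterozygosity is summarized as the ratio of mean expected heterozygosity 2p(1 - p) in a window; EHH is integrated over physical distance with a 0.05 cutoff; and iHS and XP-EHH are returned unstandardized, whereas published versions are standardized against genome-wide distributions.  Haplotypes are lists of 0/1 alleles with the ancestral allele coded as 0, and all function names are hypothetical.

```python
import math
from collections import Counter


def fst(freqs):
    """Wright-style FST = (HT - HS) / HT for one SNP (equal population weights)."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def neg_log10_empirical_p(value, genome_wide):
    """-log10 of the share of genome-wide values >= value, as (1 + count) / (1 + n)."""
    count = sum(v >= value for v in genome_wide)
    return -math.log10((1 + count) / (1 + len(genome_wide)))


def het_ratio(focal_freqs, other_pops_freqs):
    """Mean 2p(1 - p) in a window: focal population / average of other populations."""
    def mean_het(freqs):
        return sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    others = [mean_het(f) for f in other_pops_freqs]
    return mean_het(focal_freqs) / (sum(others) / len(others))


def ehh(haps, core, end, allele=None):
    """Probability that two random haplotypes are identical from core to end.

    With allele set, only haplotypes carrying it at the core SNP are used (iHS);
    with allele=None, all haplotypes are used (XP-EHH).
    """
    group = [h for h in haps if allele is None or h[core] == allele]
    n = len(group)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(tuple(h[lo:hi + 1]) for h in group)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def ihh(haps, positions, core, allele=None, cutoff=0.05):
    """Integrate EHH over distance (trapezoid rule) on both sides of the core,
    stopping on each side once EHH drops below cutoff."""
    total = 0.0
    for step in (1, -1):                      # right side, then left side
        prev, j = ehh(haps, core, core, allele), core + step
        while 0 <= j < len(positions):
            cur = ehh(haps, core, j, allele)
            if cur < cutoff:
                break
            total += (prev + cur) / 2 * abs(positions[j] - positions[j - step])
            prev, j = cur, j + step
    return total


def unstandardized_ihs(haps, positions, core):
    """ln(iHH ancestral / iHH derived); negative = long derived haplotypes."""
    return math.log(ihh(haps, positions, core, 0) / ihh(haps, positions, core, 1))


def xp_ehh(haps_a, haps_b, positions, core):
    """ln(iHH in population A / iHH in population B); positive suggests a sweep in A."""
    return math.log(ihh(haps_a, positions, core) / ihh(haps_b, positions, core))
```

For example, with SNPs 1,000 bp apart, three identical haplotypes carrying the derived allele and three diverse ancestral haplotypes, iHH for the derived allele is three times that for the ancestral allele, giving an unstandardized iHS of ln(1/3), about -1.1.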

There are several problems with using the HGDP samples.  One is the small sample sizes, so in some of the analyses, closely related populations are pooled together.  Another is the lower density of genotyped SNPs compared with the HapMap data set.  We also have to consider possible effects of ascertainment bias, nonrandom population sampling, etc.