Imputing untyped SNPs with the program IMPUTE

IMPUTE is a program that estimates the genotypes of untyped SNPs, usually in disease-SNP association studies.  Commercially available whole-genome genotyping arrays currently provide genotype data for 500K to 2M single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs), but there are more SNPs than these arrays can capture.  Whole-genome sequencing is currently very costly and has trouble accurately determining the alleles of rare variants.  Therefore, in many cases, it is better to impute untyped SNPs.  IMPUTE and a similar program, MACH, allow imputation using the HapMap and/or 1000 Genomes data as a reference.  Here, I review IMPUTE2 for imputation using 1000 Genomes data.

As of today (1/6/2012), the most recent version, IMPUTE v2.2 beta, is available for three platforms (Windows, Mac, and Linux).  To use 1000 Genomes data as the reference panel, you need the most recent version.  Previously, imputation was most accurate when performed with combined reference data (HapMap and 1000 Genomes data together).  Now, 1000 Genomes has genotype data for enough individuals from various ethnic backgrounds that it is no longer necessary to use combined data.

A unique feature of IMPUTE2 is its use of multi-population reference panels, so you do not need to choose which population to use as the reference panel.  The program chooses which reference haplotypes to use.  It does not use population labels or information on the relatedness of individuals.  Instead, it looks for the reference haplotype sequences that best match the study samples.  Regardless of ancestry, the program looks for haplotypes shared between the reference and the study sample, while identifying and ignoring highly diverged haplotypes.  IMPUTE2 then uses that information to impute untyped or missing SNPs.  Therefore, this method is not sensitive to the ancestry composition of the reference panel.  According to the authors of the program, this process works well with both homogeneous and admixed populations.  They also argue that genotypes of low-frequency alleles (MAF<0.05) can be imputed more accurately.
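
To make the idea of matching without population labels concrete, here is a toy sketch in Python.  It is not IMPUTE2's actual algorithm, which is a statistical model and far more sophisticated.  It only illustrates the principle described above: rank the reference haplotypes by how well they match the study haplotype at the typed SNPs, ignore the diverged ones, and fill in the untyped SNPs from the best matches.  The function names, the 0/1 allele coding, the use of None for untyped SNPs, and the parameter k are all assumptions made for this illustration.

```python
def mismatches(study, ref_hap, typed_positions):
    """Count allele mismatches between two haplotypes at the typed SNPs only."""
    return sum(study[i] != ref_hap[i] for i in typed_positions)


def impute_haplotype(study, reference, k=3):
    """Toy imputation by nearest reference haplotypes (illustration only).

    study     -- list of alleles coded 0/1, with None at untyped SNPs
    reference -- list of fully typed reference haplotypes (lists of 0/1)
    k         -- hypothetical number of best-matching haplotypes to keep
    """
    typed = [i for i, allele in enumerate(study) if allele is not None]

    # Rank reference haplotypes by similarity at typed SNPs.  No population
    # label is consulted; diverged haplotypes simply fall outside the top k.
    ranked = sorted(reference, key=lambda h: mismatches(study, h, typed))
    nearest = ranked[:k]

    imputed = []
    for i, allele in enumerate(study):
        if allele is not None:
            imputed.append(allele)  # keep observed alleles unchanged
        else:
            # Majority vote among the best-matching reference haplotypes.
            votes = sum(h[i] for h in nearest)
            imputed.append(1 if 2 * votes > len(nearest) else 0)
    return imputed


reference = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],  # highly diverged from the study haplotype below
]
study = [0, 1, None, 1, None]

print(impute_haplotype(study, reference, k=3))  # [0, 1, 0, 1, 1]
```

In this example the fourth reference haplotype mismatches the study haplotype at every typed SNP, so it never enters the vote, whatever population it came from.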

Imputation could be a useful method in anthropological genetics and genomics.  First, it lets us explore the association of untyped SNPs in genome-wide association studies with anthropologically interesting phenotypes, such as skin color, weight, and height.  Second, untyped SNPs may themselves have been subject to natural selection and, as a result, may show significant association with phenotypes in genome-wide association studies.
