Principal Component Analysis (PCA)

Price, A. L., N. J. Patterson, et al. (2006). “Principal components analysis corrects for stratification in genome-wide association studies.” Nat Genet 38(8): 904-909.

Principal Component Analysis (PCA) is another commonly used method to detect population structure and understand human population history. PCA summarizes high-dimensional genetic data in a small number of axes, which can be plotted with minimal loss of information. The resulting plots are believed to show the genetic relationships among the populations analyzed and to reflect past migration events, as Cavalli-Sforza and his colleagues argued over several decades. PCA was originally applied to human population genetic studies as an alternative to phylogenetic trees. The relationships between human populations cannot be analyzed properly with phylogenetic trees, because several populations can be derived from a single population and gene flow is very common. These problems can be avoided with PCA and related methods, such as correspondence analysis and multidimensional scaling.
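To make the idea of "summarizing high-dimensional data into a plot" concrete, here is a minimal sketch in Python with NumPy. The allele frequency table is made up purely for illustration, and the function name pca_project is my own; this is generic PCA via singular value decomposition, not the procedure of any particular published study.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of `data` onto the top principal components."""
    # Center each column (locus) so the PCs describe variation around the mean.
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each row (population) along each principal axis.
    coords = u[:, :n_components] * s[:n_components]
    # Fraction of the total variance captured by each axis.
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]

# Hypothetical allele frequencies: 4 populations (rows) x 5 loci (columns).
# These numbers are invented for the example, not taken from real data.
freqs = np.array([
    [0.10, 0.20, 0.80, 0.75, 0.50],
    [0.12, 0.22, 0.78, 0.70, 0.52],
    [0.60, 0.65, 0.30, 0.25, 0.48],
    [0.62, 0.70, 0.28, 0.20, 0.50],
])

coords, explained = pca_project(freqs)
print("PC coordinates per population:\n", coords)
print("Fraction of variance explained:", explained)
```

The two columns of coords are what would be drawn as the x and y axes of a PCA plot. The sign of each axis is arbitrary, so only the relative positions of the points carry meaning.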

Price et al. (2006) developed a new program, EIGENSTRAT. Traditional PCA uses populations as the sample units and analyzes population allele frequencies to plot the populations on a graph. EIGENSTRAT, by contrast, works on individual genotype data and plots individuals on the graph. Like STRUCTURE (Pritchard et al., 2000; Falush et al., 2003), EIGENSTRAT was developed to control for population structure in association studies; it tests the correlation between genotype and phenotype after adjusting for the genetic ancestry of the individuals.
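The general idea of individual-level PCA followed by ancestry adjustment can be sketched as below. This is only my rough illustration on simulated data, not the EIGENSTRAT code itself: the sample sizes, allele frequencies, effect size, and the helper names normalize_genotypes, top_pcs and residualize are all assumptions made for the example. The genotype scaling by sqrt(p(1-p)) follows the spirit of what Price et al. describe, without their exact frequency estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: two subpopulations of 200 individuals each.
n_per_pop, n_snps = 200, 500
labels = np.repeat([0, 1], n_per_pop)
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
freq_a[0], freq_b[0] = 0.2, 0.8  # candidate SNP: differentiated, but NOT causal
freqs = np.where(labels[:, None] == 0, freq_a, freq_b)
genotypes = rng.binomial(2, freqs).astype(float)  # allele counts 0/1/2

# The phenotype depends on subpopulation only, never on any SNP.
phenotype = 3.0 * labels + rng.normal(0.0, 1.0, labels.size)

def normalize_genotypes(g):
    """Center each SNP and scale by sqrt(p(1-p)); drop monomorphic SNPs."""
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    return (g - 2.0 * p) / np.sqrt(p * (1.0 - p))

def top_pcs(g, k):
    """Coordinates of each individual on the top k principal components."""
    u, s, _ = np.linalg.svd(normalize_genotypes(g), full_matrices=False)
    return u[:, :k] * s[:k]

def residualize(y, pcs):
    """Remove the part of y that is linearly predicted by the PCs."""
    design = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

pcs = top_pcs(genotypes, k=2)
snp = genotypes[:, 0]

# Naive correlation is inflated by stratification; the adjusted one is not.
r_naive = np.corrcoef(snp, phenotype)[0, 1]
r_adjusted = np.corrcoef(residualize(snp, pcs),
                         residualize(phenotype, pcs))[0, 1]
print("naive r:", r_naive)
print("ancestry-adjusted r:", r_adjusted)
```

In this simulation the candidate SNP has no effect on the phenotype, so any naive correlation comes from population structure alone. Regressing the top components out of both the genotype and the phenotype should bring the correlation close to zero.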

Again, I do not know much about biostatistics and population genetic theory, and I do not know what the underlying assumptions are.  My guess is there is no important assumption that seriously affects the method.  If anyone knows important assumptions that negatively affect the method and interpretations, please let me know.

However, interpretation of the data is very subjective. To judge whether the plots show evidence of ancient migration events, you have to look at archaeological and linguistic evidence, and there is no good way to assess whether the proposed correlations between PCA plot patterns and prehistoric migration events are real.
