What does Cavalli-Sforza say about the biological basis of human race?

Cavalli-Sforza, L. L. (2005). “The Human Genome Diversity Project: past, present and future.” Nature Reviews Genetics 6: 333-340.

Cavalli-Sforza, L. L., P. Menozzi, et al. (1994). The History and Geography of Human Genes. Princeton, NJ, Princeton University Press.

Cavalli-Sforza’s research projects and the Human Genome Diversity Project (HGDP) are often criticized by social scientists and bioethicists, who worry that they could feed scientific racism. Cavalli-Sforza (2005) responds to these critics: “Concern that HGDP data would feed ‘scientific racism’ was also expressed by naïve observers, despite the fact that half a century of research into human variation has supported the opposite point of view – that there is no scientific basis for racism.” The following quotes from the book “The History and Geography of Human Genes” by Cavalli-Sforza and his colleagues (1994) show Cavalli-Sforza’s position on human race.

“Human races are still extremely unstable entities in the hands of modern taxonomists…” He thinks the number of racial groups is subjective and depends on the personal preferences of researchers. Some like to lump many populations together, while others split them into many groups. He also recognizes that great variation exists within every human population: “As one goes down the scale of the taxonomic hierarchy toward the lower and lower partitions, the boundaries between clusters become even less clear…There is great genetic variation in all populations, even in small ones.”

“From a scientific point of view, the concept of race has failed to obtain any consensus…the major stereotypes, all based on skin color, hair color and form, and facial traits, reflect superficial differences that are not confirmed by deeper analysis with more reliable genetic traits and whose origin dates from recent evolution mostly under the effect of climate and perhaps sexual selection.”

To study human evolution, Cavalli-Sforza and his colleagues use a clustering approach based on phylogenetic trees. Phylogenetic trees offer “a simple graphic aid for visualizing those relationships [relationships between different populations] and a path to infer the possible evolutionary history behind them.” However, the clusters they identify in phylogenetic trees are not the same as racial groups, because the genetic differences among human groups are too small. He says “…we can identify ‘clusters’ of populations and order them in a hierarchy that we believe represents the history of fissions in the expansion to the whole world of anatomically modern humans. At no level can clusters be identified with races…there is no discontinuity that might tempt us to consider a certain level as a reasonable, though arbitrary, threshold for race distinction. Minor changes in the genes or methods used shift some populations from one cluster to the other.”
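The last point, that minor changes shift populations between clusters, can be illustrated with a toy example. The sketch below implements average-linkage (UPGMA) clustering, one common way to build population trees, in plain Python and stops when two clusters remain. The five populations and their genetic distances are hypothetical numbers chosen for illustration, not data from Cavalli-Sforza et al. (1994). An intermediate population X moves from one cluster to the other when its distances change only slightly.

```python
from itertools import combinations


def upgma_clusters(names, dist, k):
    """Average-linkage (UPGMA) clustering, stopped when k clusters remain.

    names: list of population labels
    dist:  dict mapping frozenset({a, b}) -> genetic distance between a and b
    k:     number of clusters to return
    """
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # UPGMA distance = mean of all pairwise distances between members
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > k:
        # Find the closest pair of clusters (ties broken by list order)
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


def toy_distances(x_to_p3_p4):
    """Hypothetical distances: two tight pairs plus an intermediate population X."""
    d = {
        frozenset(("P1", "P2")): 1.0,
        frozenset(("P3", "P4")): 1.0,
        frozenset(("X", "P1")): 5.0,
        frozenset(("X", "P2")): 5.0,
        frozenset(("X", "P3")): x_to_p3_p4,
        frozenset(("X", "P4")): x_to_p3_p4,
    }
    for a in ("P1", "P2"):
        for b in ("P3", "P4"):
            d[frozenset((a, b))] = 10.0
    return d


names = ["P1", "P2", "X", "P3", "P4"]
print(upgma_clusters(names, toy_distances(5.2), k=2))  # -> [['P3', 'P4'], ['P1', 'P2', 'X']]
print(upgma_clusters(names, toy_distances(4.8), k=2))  # -> [['P1', 'P2'], ['P3', 'P4', 'X']]
```

Changing two distances by 0.4 units is enough to move X, while the two tight pairs never change. An intermediate population has no stable "true" cluster.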

As shown above, despite the criticisms, Cavalli-Sforza believes that racial classification is arbitrary and not supported by genetic data. In his 2005 article, he emphasizes the importance of the HGDP for understanding human history and for biomedical research. Cavalli-Sforza is clearly against racial classification, so why is he criticized by so many people?

Mitochondrial DNA genome of an unknown hominin from southern Siberia

Krause, J., Q. Fu, et al. (2010). “The complete mitochondrial DNA genome of an unknown hominin from southern Siberia.” Nature advance online publication.

This article is very interesting and has also been covered by Anthropology.net and Prancing Papio. I believe its findings offer an interesting perspective on human evolution and on the genetic diversity that existed in the past.

The researchers analyzed the complete mitochondrial DNA genome of fossil remains from Denisova Cave in the Altai region of Russia, dated to 48 to 30 kyr ago. The analyses show that the Denisova individual was genetically very different from both Neanderthals and modern humans. On average, the Denisova individual and modern humans differed at 385 nucleotide positions. That is about twice the number of differences between Neanderthals and modern humans (202 positions) (Figure 2).
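Counting "nucleotide position differences" means comparing two aligned sequences site by site. Here is a minimal sketch. It assumes the sequences are already aligned to equal length and that gap or unknown sites ('-' or 'N') are skipped. The short sequences are made up for illustration, and the last line simply checks the ratio of the two averages reported above.

```python
def count_differences(seq_a, seq_b, skip="-N"):
    """Count positions where two aligned sequences differ.

    Sites where either sequence has a gap or an unknown base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in skip and b not in skip and a != b
    )


# Hypothetical aligned fragments, not real mtDNA data
print(count_differences("ACGTACGTNA", "ACCTACGAGA"))  # -> 2

# Ratio of the average differences reported by Krause et al. (2010)
print(round(385 / 202, 2))  # -> 1.91
```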

The phylogenetic trees of complete mtDNA show that the ancestors of the Denisova individual split from the ancestors of Neanderthals and modern humans before those two lineages diverged from each other (Figure 3). The TMRCA of all three lineages is about one million years ago (mean = 1,040,900 years, 95% C.I. 779,300-1,313,500).

So, who is this Denisova individual? Homo erectus left Africa around 1.9 myr ago and was in Asia by 1.7 myr ago. The Denisova individual was therefore probably not H. erectus, because the TMRCA of the three lineages (about one myr ago) falls after H. erectus had spread into East Asia. If the Denisova individual were H. erectus, a much older TMRCA would be expected (> 1.9 myr?). Homo heidelbergensis, the probable ancestor of Neanderthals, emerged after the three lineages diverged. However, the 95% C.I. of the TMRCA slightly overlaps with the time when H. heidelbergensis existed. We therefore cannot reject the hypothesis that the Denisova individual was a descendant of H. heidelbergensis. But if H. heidelbergensis was the ancestor of Neanderthals, then the ancestors of Neanderthals and of the Denisova individual were genetically quite different.

The findings from this project generally support Huff et al. (2010). Together, the two projects show that great genetic diversity existed in the past (> 30,000 years ago). It is very interesting that many species or subspecies of Homo may have coexisted in some parts of the world. Around the time the Denisova individual lived, Neanderthals and anatomically modern humans may also have been present in the area (and don’t forget that H. erectus existed in East Asia at about the same time). However, only anatomically modern humans survived. The others disappeared without leaving clear genetic evidence of ancient admixture.

Update (April 1, 2010)

I forgot about H. ergaster, which is another possibility in addition to H. heidelbergensis. Suppose Asian H. erectus was a different species from African H. ergaster, and H. ergaster was the direct ancestor of H. heidelbergensis, H. neanderthalensis, and H. sapiens. In that case, the Denisova individual could be a descendant of H. ergaster that took a very different evolutionary path from Neanderthals and anatomically modern humans. The 95% C.I. of the TMRCA (about 1.3-0.8 mya) also slightly overlaps with the time H. ergaster existed in East Africa (1.8-1.3 mya). Under this scenario, H. erectus first dispersed out of Africa into Asia, and H. ergaster later dispersed out of Africa into western Eurasia. However, the TMRCA is too recent for Asian H. erectus to share a common ancestor with all the other lineages, so the mtDNA of the Denisova individual is not that of H. erectus. Of course, we are talking only about the maternal side of evolutionary history.

Update (April 3, 2010)

I considered the possibility of an unsampled Neanderthal, but I thought the TMRCA was too old. The Neanderthals analyzed so far are not genetically diverse, and drift strongly affects mtDNA because of its small effective population size. If the Denisova individual was in fact a Neanderthal, then Neanderthals were genetically much more diverse than many genetic researchers thought. In addition, the phylogenetic tree would suggest that Neanderthals were ancestors of modern humans. Judging from the genetic evidence we have, this is an unlikely scenario. Of course, we should not conclude that the Denisova individual was not a Neanderthal, because we do not know enough about this individual or about human evolution.

Mobile elements reveal small population size in the ancient ancestors of Homo sapiens

Huff, C. D., J. Xing, et al. (2010). “Mobile elements reveal small population size in the ancient ancestors of Homo sapiens.” Proceedings of the National Academy of Sciences 107(5): 2147-2152

Huff et al. (2010) analyzed genome variation in two samples, focusing on SNPs around mobile element insertion sites. The reasoning behind this project is that mobile element insertions (Alu and LINE1) are much rarer than point mutations. As a result, the regions around polymorphic insertions tend to have deep genealogies (ancient coalescence times).

Their research basically supports this theoretical point. First, the TMRCA estimated from 9,609 SNPs within 10 kb of the insertions was 462,000 years, which is older than TMRCAs estimated from other genomic regions. Second, and more interestingly, they estimated an ancient effective population size significantly larger than the modern effective population size. They used a coalescent-based maximum likelihood method to estimate three demographic parameters:

Modern effective population size = 8,500

Ancient effective population size = 18,500 (C.I. 14,500-26,000)

Time of population size change = 1.2 M years ago

This means that the effective population size before 1.2 M years ago was 18,500. The small effective population size of modern humans agrees with many previous genetic studies. What is interesting is that modern human genomes carry evidence that our ancestors, such as Homo erectus, had a much larger effective population size and were much more genetically diverse than anatomically modern humans. Because the effective population size of modern humans is much smaller than that of chimpanzees, it has been suggested that our ancestors experienced a series of bottlenecks. However, these data show that the significant reduction in population size occurred after 1.2 M years ago. Jorde actually said in the NIH Genome Center Lecture series that our ancestors almost became extinct.
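Regions with deep genealogies are needed to detect the older, larger population. One way to see why is to calculate how many pairs of lineages even reach back past the size change. The sketch below uses a textbook two-epoch coalescent for a pair of autosomal gene copies in a diploid population, which coalesce at rate 1/(2N) per generation. It is a simplification, not the maximum likelihood method of Huff et al. (2010). Treating N as the number of diploid individuals and using a 25-year generation time are both assumptions, not values taken from the paper.

```python
import math

# Parameter values reported by Huff et al. (2010)
N_MODERN = 8_500        # modern effective population size
N_ANCIENT = 18_500      # ancient effective population size
T_CHANGE_YEARS = 1.2e6  # time of the size change, in years before present

# Assumption for illustration only: 25 years per generation
GENERATION_YEARS = 25


def two_epoch_pair_coalescence(n_recent, n_ancient, tau):
    """Two gene copies coalesce at rate 1/(2N) per generation (diploid N).

    Size is n_recent from the present back to tau generations ago, and
    n_ancient before that. Returns (P(coalescence within tau), E[T]), where
    E[T] is the integral of P(T > t) over t, in generations.
    """
    p_survive = math.exp(-tau / (2 * n_recent))  # no coalescence in the recent epoch
    expected_t = 2 * n_recent * (1 - p_survive) + p_survive * 2 * n_ancient
    return 1 - p_survive, expected_t


tau = T_CHANGE_YEARS / GENERATION_YEARS
p_recent, mean_gens = two_epoch_pair_coalescence(N_MODERN, N_ANCIENT, tau)
print(f"P(pair coalesces within the last 1.2 Myr): {p_recent:.2f}")  # -> 0.94
print(f"Expected pairwise TMRCA: {mean_gens * GENERATION_YEARS:,.0f} years")
```

Under these assumptions, roughly 94% of random pairs of gene copies coalesce within the last 1.2 million years. Such pairs carry almost no information about the earlier, larger population. This is consistent with the idea that regions enriched for old genealogies, such as those around polymorphic insertions, are the informative ones.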

Principal Component Analysis (PCA), Part 2: Do principal component analyses reveal human migration events?

Novembre, J. and M. Stephens (2008). “Interpreting principal component analyses of spatial population genetic variation.” Nat Genet 40(5): 646-649.

Reich, D., A. L. Price, et al. (2008). “Principal component analysis of genetic data.” Nat Genet 40(5): 491-492.

Principal component analysis (PCA), run with programs such as EIGENSTRAT (developed by Price et al., 2006), is often used to reconstruct human migration events. However, many researchers have warned against simplistic interpretation of PCA.

PCA is useful for detecting population structure and for controlling for it in association studies. However, Novembre and Stephens (2008) argue that the gradient and wave patterns Cavalli-Sforza and his colleagues observed on their PC maps can arise as mathematical artifacts. Cavalli-Sforza and his colleagues created PC maps from principal component values and hypothesized that the gradient and wave patterns in Europe were produced by the Neolithic demic expansion. Based on computer simulations, Novembre and Stephens argue that PC maps are affected not only by population structure but also by the geographical distribution of samples and the amount of data. They also show that an isolation-by-distance model can produce PC maps similar to those produced by ancient migration events.
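A small numerical sketch shows how smooth gradients and "waves" can appear without any migration event. Instead of simulating genotypes, it assumes (for illustration only) 20 populations evenly spaced along a line. Their genetic covariance decays exponentially with distance, a simple stand-in for isolation by distance. After centering, the top principal component changes sign once along the line (a gradient), and the second changes sign twice (a wave-like pattern). This is qualitatively the kind of pattern Novembre and Stephens (2008) describe. The number of populations and the decay scale are arbitrary choices.

```python
import numpy as np

N_POPS = 20        # hypothetical populations evenly spaced along a transect
DECAY_SCALE = 5.0  # hypothetical distance over which genetic similarity decays

positions = np.arange(N_POPS)
# Covariance between populations falls off exponentially with distance
cov = np.exp(-np.abs(positions[:, None] - positions[None, :]) / DECAY_SCALE)

# PCA centers each locus across populations; double-centering the
# population covariance matrix is the equivalent step
center = np.eye(N_POPS) - np.ones((N_POPS, N_POPS)) / N_POPS
centered_cov = center @ cov @ center

# eigh returns eigenvalues in ascending order, so sort descending for PC1, PC2
eigvals, eigvecs = np.linalg.eigh(centered_cov)
order = np.argsort(eigvals)[::-1]
pc1, pc2 = eigvecs[:, order[0]], eigvecs[:, order[1]]


def sign_changes(v):
    """Number of times the PC value changes sign along the transect."""
    return int(np.sum(np.diff(np.sign(v)) != 0))


print("PC1 sign changes:", sign_changes(pc1))  # gradient
print("PC2 sign changes:", sign_changes(pc2))  # wave-like
```

No population in this toy model has migrated anywhere, yet the first two PCs already look like a cline and a wave.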

Reich et al. (2008), who developed the EIGENSTRAT program, responded to Novembre and Stephens (2008). They argued that PCA is still useful for reconstructing human migration history, but that it requires integrating archaeological, anthropological, linguistic, and geographical data. Unfortunately, this may lead to circular arguments, with human geneticists, archaeologists, and linguists citing each other to support their own findings without testing hypotheses.

Principal Component Analysis (PCA)

Price, A. L., N. J. Patterson, et al. (2006). “Principal components analysis corrects for stratification in genome-wide association studies.” Nat Genet 38(8): 904-909.

Principal Component Analysis (PCA) is another commonly used method for detecting population structure and understanding human population history. PCA summarizes high-dimensional genetic data in low-dimensional plots with minimal loss of information. The plots are believed to show the genetic relationships among the populations analyzed and past migration events, as Cavalli-Sforza and his colleagues demonstrated over several decades. PCA was originally applied in human population genetics as an alternative to phylogenetic trees. Phylogenetic trees cannot properly represent the relationships among human populations, because several populations can be derived from a single population and gene flow is very common. PCA and related methods, such as correspondence analysis and multidimensional scaling, avoid these problems.

Price et al. (2006) developed a new program, EIGENSTRAT. Traditional PCA uses populations as sampling units and analyzes population allele frequencies to plot the populations on a graph. EIGENSTRAT, in contrast, uses individual genotype data and plots individuals. Like STRUCTURE (Pritchard et al., 2000; Falush et al., 2003), EIGENSTRAT was developed to control for population structure in association studies. It tests the correlation between genotype and phenotype after adjusting for each individual's genetic ancestry.
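The difference between population-level and individual-level PCA is easiest to see in code. The sketch below simulates genotypes (0, 1, or 2 copies of an allele) for individuals from two hypothetical populations. It normalizes each SNP by subtracting its mean and dividing by sqrt(p(1 - p)), with p estimated as half the SNP mean, similar to the normalization described by Price et al. (2006). It then takes principal components with an SVD. This is a simplified illustration, not the EIGENSTRAT program itself. The sample sizes, allele frequencies, and random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PER_POP = 50  # hypothetical individuals per population
N_SNPS = 500    # hypothetical number of SNPs

# Hypothetical allele frequencies: population B differs from A by a random shift
freq_a = rng.uniform(0.1, 0.9, N_SNPS)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, N_SNPS), 0.05, 0.95)

# Genotype = number of copies of the allele (binomial with 2 trials)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(N_PER_POP, N_SNPS)),
    rng.binomial(2, freq_b, size=(N_PER_POP, N_SNPS)),
]).astype(float)
labels = np.array([0] * N_PER_POP + [1] * N_PER_POP)

# Normalize each SNP: subtract its mean, divide by sqrt(p(1 - p))
mean = geno.mean(axis=0)
p = mean / 2
keep = (p > 0) & (p < 1)  # drop SNPs that are monomorphic in the sample
norm = (geno[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1 - p[keep]))

# Rows of u * s are the individuals' coordinates on the principal components
u, s, _ = np.linalg.svd(norm, full_matrices=False)
pcs = u * s

# Each individual gets its own PC coordinates; PC1 should separate the groups
print("PC1 mean, population A:", pcs[labels == 0, 0].mean().round(2))
print("PC1 mean, population B:", pcs[labels == 1, 0].mean().round(2))
```

The sign of a principal component is arbitrary. Only the fact that the two groups fall on opposite sides of zero is meaningful here.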

Again, I do not know much about biostatistics and population genetic theory, and I do not know what the underlying assumptions are. My guess is that no assumption seriously affects the method. If anyone knows of important assumptions that negatively affect the method and its interpretation, please let me know.

However, interpreting the data is very subjective. To decide whether the plots show evidence of ancient migration events, you have to look at archaeological and linguistic evidence. There is no good way to assess whether the proposed correlations between PCA plot patterns and prehistoric migration events are real.

Non-Existence of Human Races: The Apportionment of Human Diversity

Lewontin, R. C. (1973). “The apportionment of human diversity.” Evolutionary Biology 6: 381-397.

Lewontin, an evolutionary biologist, analyzed variation at 17 genes among humans to examine the biological basis of racial classification. Based on non-genetic factors such as linguistic, historical, cultural, and morphological data, he used 7 racial categories (Caucasians, Black Africans, Mongoloids, South Asian Aborigines, Amerinds, Oceanians, and Australian Aborigines). However, it is not clear exactly how he came up with these categories.

His analysis shows that humans are genetically very similar to each other, even when they come from different populations or races. He found that most genetic variation (85.4%) lies within populations. How human populations are grouped into races influences the estimated between-race variation. Even so, he estimated that only 6.3% of human genetic variation is explained by differences among racial groups, with the remaining 8.3% lying between populations within races. He concludes that “Since such racial classification is now seen to be of virtually no genetic or taxonomic significance either, no justification can be offered for its continuance.”
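Lewontin measured diversity with the Shannon information measure and split total diversity into three parts: within populations, among populations within races, and among races. Below is a minimal sketch of that kind of hierarchical apportionment for a single biallelic locus. It weights populations and races equally (a simplifying assumption) and uses made-up allele frequencies, not Lewontin's data.

```python
import math


def shannon(freqs):
    """Shannon diversity H = -sum(p * ln p) over allele frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def apportion(races):
    """Split diversity into within-population, between-population-within-race,
    and between-race fractions.

    races: dict race -> list of allele-frequency lists (one per population).
    Populations and races are weighted equally (a simplifying assumption).
    """
    def mean_freqs(freq_lists):
        return [sum(col) / len(col) for col in zip(*freq_lists)]

    h_pop_values, h_race_values, race_means = [], [], []
    for pops in races.values():
        h_pop_values.append(sum(shannon(p) for p in pops) / len(pops))
        race_mean = mean_freqs(pops)
        race_means.append(race_mean)
        h_race_values.append(shannon(race_mean))

    h_pop = sum(h_pop_values) / len(h_pop_values)     # mean within-population diversity
    h_race = sum(h_race_values) / len(h_race_values)  # mean within-race diversity
    h_total = shannon(mean_freqs(race_means))         # species-wide diversity

    return {
        "within populations": h_pop / h_total,
        "between populations within races": (h_race - h_pop) / h_total,
        "between races": (h_total - h_race) / h_total,
    }


# Hypothetical allele frequencies [p, 1 - p] for one locus
example = {
    "race_1": [[0.60, 0.40], [0.50, 0.50]],
    "race_2": [[0.40, 0.60], [0.45, 0.55]],
}
for component, share in apportion(example).items():
    print(f"{component}: {share:.1%}")
```

Because Shannon diversity is concave, pooling frequencies never lowers diversity. The three fractions are therefore non-negative and sum to one. Lewontin's actual figures come from averaging such components over his 17 loci.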

Many biological anthropologists and human geneticists use population genetic approaches similar to Lewontin’s to show two things: most genetic variation is found within populations, and genetic differences between populations and racial groups are very small.

It seems that he thinks human populations are not reproductively isolated and that gene flow was common between different populations and racial groups: “…we must conclude that there is no internal evidence that sparse aboriginal populations are more genetically isolated from their neighbors than are more continuously distributed large races.” Lewontin’s view supports Livingstone’s (Livingstone, 1962). Neil Risch, Marcus Feldman, and L.L. Cavalli-Sforza, by contrast, regard endogamy as more common (interestingly, all three were at Stanford University at some point).

What does Neil Risch think about race? Part 2

This is Part 2 of “What does Neil Risch think about race?”

Clusters or clines?

Risch’s racial categorization, based on the continental groups represented in the evolutionary tree (Figure 1), seems to be a phylogenetic or cladistic one. In biology and systematics, race (or subspecies) implies that a lack of gene flow between racial groups has made each of them significantly different from the others, with no intermediate populations. Risch et al. (2002) recognize this problem and state that “migrations have blurred the strict continental boundaries.” They also admit that populations at the boundaries of the continental divisions, such as Ethiopians and Somalis, are difficult to categorize. They add that Ethiopians and North Africans tend to be admixtures of sub-Saharan Africans and Caucasians. Still, they insist that “the existence of such intermediate groups should not, however, overshadow the fact that the greatest genetic structure that exists in the human population occurs at the racial level.” Perhaps he still thinks typologically.

Does he understand social and cultural aspects of racial and ethnic groups?

I believe he is a great geneticist, but maybe he does not understand the social and cultural aspects of race/ethnicity, probably because he was trained in mathematics and population genetics. For example, he seems to assume that human groups, including racial/ethnic groups, tend to be endogamous. Risch et al. say “From the genetic perspective, the important concept is mating pattern, and the degree to which racially or ethnically defined groups remains endogamous.” However, when they talk about genetic categorization in the U.S., they say:

“…mating patterns are far from random.  The tendency toward endogamy is reflected within the 2000 US Census…97.6% of subjects reported themselves to be of one race, while 2.4% reported themselves to be of more than one race…”

He is right about non-random mating in the U.S., but a census report reflects socially constructed racial/ethnic identity, not actual mating patterns. Mixed-race individuals are undercounted, because many of them pick only one race.

Summary of his arguments

  1. Racial/ethnic categories are important for medical research.
  2. Genetic data do not support genetic sameness argument.
  3. Although his racial categories seem typological, they are not typological but operational.