What does Cavalli-Sforza say about the biological basis of human race?

Cavalli-Sforza, L. L. (2005). “The Human Genome Diversity Project: past, present and future.” Nature Reviews: Genetics 6: 333-340.

Cavalli-Sforza, L. L., P. Menozzi, et al. (1994). The History and Geography of Human Genes. Princeton, NJ, Princeton University Press.

Cavalli-Sforza’s research projects and the Human Genome Diversity Project (HGDP) are often criticized by social scientists and bioethicists because of the danger of scientific racism.  Cavalli-Sforza (2005) responds to these criticisms: “Concern that HGDP data would feed ‘scientific racism’ was also expressed by naïve observers, despite the fact that half a century of research into human variation has supported the opposite point of view – that there is no scientific basis for racism.”  Here are quotes from the book “The History and Geography of Human Genes” by Cavalli-Sforza and his colleagues (1994) that show Cavalli-Sforza’s position on human race.

“Human races are still extremely unstable entities in the hands of modern taxonomists…”  He thinks that the number of racial groups one recognizes is subjective and depends on whether a researcher prefers to lump many populations together or split them into many groups.  He also recognizes that great variation exists within any human population.  “As one goes down the scale of the taxonomic hierarchy toward the lower and lower partitions, the boundaries between clusters become even less clear…There is great genetic variation in all populations, even in small ones.”

“From a scientific point of view, the concept of race has failed to obtain any consensus…the major stereotypes, all based on skin color, hair color and form, and facial traits, reflect superficial differences that are not confirmed by deeper analysis with more reliable genetic traits and whose origin dates from recent evolution mostly under the effect of climate and perhaps sexual selection.”

To study human evolution, Cavalli-Sforza and his colleagues use a clustering approach with phylogenetic trees.  The phylogenetic trees offer “a simple graphic aid for visualizing those relationships [relationship between different populations] and a path to infer the possible evolutionary history behind them.”  However, the clusters they identify in phylogenetic trees are not the same as racial groups, because there are not enough genetic differences among human groups.  He says “…we can identify ‘clusters’ of populations and order them in a hierarchy that we believe represents the history of fissions in the expansion to the whole world of anatomically modern humans.  At no level can clusters be identified with races…there is no discontinuity that might tempt us to consider a certain level as a reasonable, though arbitrary, threshold for race distinction.  Minor changes in the genes or methods used shift some populations from one cluster to the other.”

As shown above, despite the criticisms, Cavalli-Sforza believes that racial classification is arbitrary and is not supported by genetic data.  In his 2005 article, he emphasizes the importance of the HGDP for understanding human history and for biomedical research.  It is very clear that Cavalli-Sforza is against racial classification, so why is he so often criticized?

DNA genome of an unknown hominin from southern Siberia

Krause, J., Q. Fu, et al. (2010). “The complete mitochondrial DNA genome of an unknown hominin from southern Siberia.” Nature advance online publication.

This article is very interesting and is also covered by Anthropology.net and Prancing Papio.  I believe the research findings presented in it provide an interesting perspective on human evolution and the genetic diversity that existed in the past.

The complete mitochondrial DNA genome of a fossil from Denisova Cave in the Altai region of Russia, dated to 48-30 kyr ago, was analyzed.  The results show that the Denisova individual was genetically very different from both Neanderthals and modern humans.  On average, the Denisova individual differed from modern humans at 385 nucleotide positions, about twice as many as the differences between Neanderthals and modern humans (202 positions) (Figure 2).

The phylogenetic trees of complete mtDNA show that the ancestors of the Denisova individual split from the common ancestors of Neanderthals and modern humans before those two lineages began to diverge (Figure 3).  The TMRCA of all three lineages is about one million years ago (mean = 1,040,900 years, 95% C.I. 779,300-1,313,500).

So, who is this Denisova individual?  Homo erectus left Africa around 1.9 myr ago and was in Asia by 1.7 myr ago, so the Denisova individual was probably not H. erectus: the TMRCA of the three lineages is about one myr ago, which is after H. erectus spread into East Asia.  If the Denisova individual were H. erectus, a much older TMRCA would be expected (> 1.9 myr?).  Homo heidelbergensis, the probable ancestor of Neanderthals, emerged after the divergence of the three lineages.  However, the 95% C.I. of the TMRCA slightly overlaps with the time that H. heidelbergensis existed, so we cannot reject the hypothesis that the Denisova individual was a descendant of H. heidelbergensis.  But if H. heidelbergensis were the ancestors of Neanderthals, then the ancestors of Neanderthals and of the Denisova individual were genetically quite different.

The findings from this project generally support Huff et al. (2010), and the two projects together show that great genetic diversity existed in the past (> 30,000 years ago).  It is very interesting that many species or subspecies of Homo may have co-existed in some parts of the world.  Around the time the Denisova individual lived, Neanderthals and anatomically modern humans may also have been present in the area (don’t forget that H. erectus existed in East Asia at about the same time).  However, only anatomically modern humans survived, and the others disappeared without leaving clear genetic evidence of ancient admixture.

Update (April 1, 2010)

I forgot about H. ergaster, which is another possibility in addition to H. heidelbergensis.  If we believe that Asian H. erectus was a different species from African H. ergaster, which was the direct ancestor of H. heidelbergensis, H. neanderthalensis, and H. sapiens, the Denisova individual could be a descendant of H. ergaster that took a very different evolutionary path from Neanderthals and anatomically modern humans.  The 95% C.I. of the TMRCA (about 1.3-0.8 mya) also slightly overlaps with the time H. ergaster existed in East Africa (1.8-1.3 mya).  In this scenario, there was first an out-of-Africa event of H. erectus into Asia and then another out-of-Africa event of H. ergaster into Western Eurasia.  However, the TMRCA is too young for Asian H. erectus and all the others to share a common ancestor that recently, so the mtDNA of the Denisova individual is not that of H. erectus.  Of course, we are talking only about the maternal side of evolutionary history.
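
The "slightly overlaps" argument is just interval arithmetic, so here is a minimal sketch of it in Python.  The numbers are the ones quoted in this post (the 95% interval from Krause et al. and the H. ergaster and H. erectus dates I used above); the variable names and the `overlap` helper are my own, not anything from the paper.

```python
# Intervals in years before present, written as (younger bound, older bound).
TMRCA_CI = (779_300, 1_313_500)      # 95% interval quoted above from Krause et al. (2010)
H_ERGASTER = (1_300_000, 1_800_000)  # East African range used in this post
H_ERECTUS_EXIT = 1_900_000           # H. erectus leaving Africa, as used in this post


def overlap(a, b):
    """Return the shared part of two (younger, older) intervals, or None if disjoint."""
    younger = max(a[0], b[0])
    older = min(a[1], b[1])
    return (younger, older) if younger <= older else None


shared = overlap(TMRCA_CI, H_ERGASTER)
ci_width = TMRCA_CI[1] - TMRCA_CI[0]
shared_fraction = (shared[1] - shared[0]) / ci_width

print("overlap with H. ergaster:", shared)
print("share of the interval that overlaps:", round(shared_fraction, 3))
# The H. erectus exit date is older than the whole interval.
print("H. erectus exit predates the interval:", H_ERECTUS_EXIT > TMRCA_CI[1])
```

Only the oldest sliver of the interval touches the H. ergaster range, which is why I call the overlap slight, and the H. erectus date falls entirely outside it.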

Update (April 3, 2010)

I considered the possibility of an unsampled Neanderthal lineage, but I thought that the TMRCA is too old, considering that the Neanderthals analyzed so far are not genetically diverse and that drift acts strongly on mtDNA because of its small effective population size.  If the Denisova individual was in fact a Neanderthal, Neanderthals were genetically much more diverse than many genetic researchers thought, and the phylogenetic tree would suggest that Neanderthals were ancestors of modern humans.  Judging from the genetic evidence we have, this is an unlikely scenario.  Of course, we should not conclude that the Denisova individual was not a Neanderthal, because we do not know enough about this individual or about human evolution.

Mobile elements reveal small population size in the ancient ancestors of Homo sapiens

Huff, C. D., J. Xing, et al. (2010). “Mobile elements reveal small population size in the ancient ancestors of Homo sapiens.” Proceedings of the National Academy of Sciences 107(5): 2147-2152

Huff et al. (2010) analyzed genome variation in two samples, focusing on the SNPs around mobile element insertions.  The theory behind this project is that mobile element insertions (Alu and LINE1) are much rarer events, so the regions around them have deep genealogies (ancient coalescence times).

Their research basically supports this theoretical point.  First, the TMRCA estimated from 9,609 SNPs in the 10 kb around the insertions was 462 kyr, which is older than the TMRCA estimated from other genomic regions.  Second, and more interestingly, they estimated a significantly larger ancient effective population size than the modern one.  They used a coalescent-based maximum likelihood method to estimate three demographic parameters.

Modern effective population = 8,500

Ancient effective population = 18,500 (C.I. 14,500-26,000)

Time of population size change = 1.2 M years

This means that the effective population size before 1.2 M years ago was 18,500.  The small effective population size of modern humans agrees with many previous genetic studies, but it is interesting that modern human genomes carry evidence that our ancestors, such as Homo erectus, had a much larger effective population size and were much more genetically diverse than anatomically modern humans.  Since the effective population size of modern humans is much smaller than that of chimpanzees, it has been suggested that our ancestors experienced a series of bottlenecks, and these data show that a significant reduction in population size occurred after 1.2 M years ago.  Jorde actually said in the NIH Genome Center Lecture series that our ancestors almost became extinct.
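
To get a feel for what these effective sizes imply, here is a rough back-of-the-envelope sketch.  It uses the textbook result that, in a constant-size diploid population, two autosomal lineages are expected to coalesce about 2*Ne generations back.  The 25-year generation time is my own assumption, and the constant-size formula is a simplification: Huff et al. fitted a model with a change in size, so this is not their calculation.

```python
MODERN_NE = 8_500
ANCIENT_NE = 18_500
ANCIENT_NE_CI = (14_500, 26_000)
GENERATION_YEARS = 25  # assumption for illustration, not a value from Huff et al.


def expected_pairwise_coalescence_years(ne, generation_years=GENERATION_YEARS):
    """Expected coalescence time of two autosomal lineages: 2 * Ne generations."""
    return 2 * ne * generation_years


size_ratio = ANCIENT_NE / MODERN_NE
ratio_range = tuple(bound / MODERN_NE for bound in ANCIENT_NE_CI)

print("ancient / modern Ne:", round(size_ratio, 2))
print("ratio over the C.I.:", [round(r, 2) for r in ratio_range])
print("modern Ne, expected pairwise coalescence (years):",
      expected_pairwise_coalescence_years(MODERN_NE))
print("ancient Ne, expected pairwise coalescence (years):",
      expected_pairwise_coalescence_years(ANCIENT_NE))
```

Even at the lower end of the interval, the ancient effective size comes out well above the modern one, which is the point of the comparison.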

Principal Component Analysis (PCA), Part 2: Do principal component analyses reveal human migration events?

Novembre, J. and M. Stephens (2008). “Interpreting principal component analyses of spatial population genetic variation.” Nat Genet 40(5): 646-649.

Reich, D., A. L. Price, et al. (2008). “Principal component analysis of genetic data.” Nat Genet 40(5): 491-492.

Principal component analysis (PCA), using programs such as EIGENSTRAT developed by Price et al. (2006), is often employed to reconstruct human migration events.  However, many researchers have warned against simplistic interpretation of PCA.

Although PCA is useful for detecting population structure and controlling for it in association studies, Novembre and Stephens (2008) argue that the gradient and wave patterns that Cavalli-Sforza and his colleagues observed on their PC maps can arise as mathematical artifacts.  Cavalli-Sforza and his colleagues created PC maps from principal component values and hypothesized that the gradient and wave patterns in Europe were created by the Neolithic demic expansion.  Based on computer simulations, Novembre and Stephens argue that, in addition to population structure, the geographical distribution of samples and the amount of data affect PC maps, and that an isolation-by-distance model produces PC maps similar to those produced by ancient migration events.

Reich et al. (2008), who developed the EIGENSTRAT program, respond to Novembre and Stephens (2008) by saying that PCA is still useful for reconstructing human migration history, but that it requires integration of archaeological, anthropological, linguistic, and geographical data.  Unfortunately, this may lead to circular arguments, with human geneticists, archaeologists, and linguists citing each other in support of their own findings without testing hypotheses.
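
To convince myself of the artifact argument, I tried a toy version of it.  This is only a sketch under my own assumptions, not the simulation Novembre and Stephens ran: 20 populations sit in a line, and the covariance between two populations simply decays with the distance between them (isolation by distance, no migration event at all).  The decay rate `RHO`, the number of populations, and all function names are made up for illustration.

```python
import math
import random


def distance_decay_covariance(n, rho):
    """Covariance between n evenly spaced populations, decaying as rho ** distance."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def double_center(m):
    """Remove row and column means, as PCA does before extracting components."""
    n = len(m)
    row = [sum(r) / n for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def top_eigenvector(m, iterations=500, seed=0):
    """Power iteration: repeatedly multiply a start vector by m and renormalise."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in m]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def sign_changes(v):
    """Count how often consecutive entries switch sign along the transect."""
    signs = [x > 0 for x in v]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


N_POPS, RHO = 20, 0.95  # hypothetical values
pc1 = top_eigenvector(double_center(distance_decay_covariance(N_POPS, RHO)))
print([round(x, 2) for x in pc1])
print("sign changes along the transect:", sign_changes(pc1))
```

The first component takes one sign at one end of the line and the opposite sign at the other, so on a map it would look like a gradient even though nothing in the model expanded or migrated.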

Principal Component Analysis (PCA)

Price, A. L., N. J. Patterson, et al. (2006). “Principal components analysis corrects for stratification in genome-wide association studies.” Nat Genet 38(8): 904-909.

Principal Component Analysis (PCA) is another commonly used method to detect population structure and understand human population history.  PCA summarizes high-dimensional genetic data in a few dimensions that can be plotted with minimal loss of information.  The resulting plots are believed to show the genetic relationships of the populations analyzed and past migration events, as Cavalli-Sforza and his colleagues demonstrated over several decades.  PCA was originally applied to human population genetics as an alternative to phylogenetic trees.  The relationships between human populations cannot be analyzed properly with phylogenetic trees, because several populations can be derived from a single population and gene flow is very common.  These problems can be avoided with PCA and related methods, such as correspondence analysis and multidimensional scaling.

Price et al. (2006) developed a new program, EIGENSTRAT.  Traditional PCA uses populations as the sample units and analyzes population allele frequencies to plot the populations on a graph.  EIGENSTRAT, on the other hand, uses individual genotype data and plots individuals on the graph.  As with STRUCTURE (Pritchard et al., 2000; Falush et al., 2003), EIGENSTRAT was developed to control for population structure in association studies and to test the correlation between genotype and phenotype after adjusting for the genetic ancestry of the individuals.
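
Here is a small sketch of what individual-level PCA looks like, to make the difference from population-level PCA concrete.  This is not EIGENSTRAT itself: the genotypes are simulated, the two populations and their allele frequencies are hypothetical, and the per-SNP scaling (centering each SNP and dividing by sqrt(p(1-p)), with p estimated as (1 + sum of genotypes) / (2 + 2N)) is my reading of the normalization described for that program, so treat it as an assumption.

```python
import math
import random


def simulate_genotypes(n_per_pop, n_snps, freq_a, freq_b, seed=1):
    """Hypothetical data: 0/1/2 allele counts for two populations with different frequencies."""
    rng = random.Random(seed)

    def person(freq):
        return [sum(rng.random() < freq for _ in range(2)) for _ in range(n_snps)]

    return ([person(freq_a) for _ in range(n_per_pop)]
            + [person(freq_b) for _ in range(n_per_pop)])


def normalize(genotypes):
    """Center each SNP and scale it by sqrt(p * (1 - p)); p is smoothed so it is never 0 or 1."""
    n, n_snps = len(genotypes), len(genotypes[0])
    out = [[0.0] * n_snps for _ in range(n)]
    for s in range(n_snps):
        column = [g[s] for g in genotypes]
        mean = sum(column) / n
        p = (1 + sum(column)) / (2 + 2 * n)
        scale = math.sqrt(p * (1 - p))
        for i in range(n):
            out[i][s] = (column[i] - mean) / scale
    return out


def individual_covariance(x):
    """Individual-by-individual covariance, averaged over SNPs."""
    n_snps = len(x[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / n_snps for xj in x] for xi in x]


def top_eigenvector(m, iterations=200, seed=0):
    """Power iteration for the leading eigenvector (PC1 scores of the individuals)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in m]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


N_PER_POP = 10
genotypes = simulate_genotypes(N_PER_POP, n_snps=100, freq_a=0.2, freq_b=0.8)
pc1 = top_eigenvector(individual_covariance(normalize(genotypes)))
pop_a, pop_b = pc1[:N_PER_POP], pc1[N_PER_POP:]
print("population A scores:", [round(x, 2) for x in pop_a])
print("population B scores:", [round(x, 2) for x in pop_b])
```

Each individual gets its own score on PC1, and with frequencies this different the two simulated populations fall on opposite sides of the axis.  In an association study, scores like these would be used as ancestry covariates.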

Again, I do not know much about biostatistics and population genetic theory, and I do not know what the underlying assumptions are.  My guess is there is no important assumption that seriously affects the method.  If anyone knows important assumptions that negatively affect the method and interpretations, please let me know.

However, interpretation of the plots is very subjective.  To judge whether the plots show evidence of ancient migration events, you have to look at archaeological and linguistic evidence, and there is no good way to assess whether the proposed correlations between PCA plot patterns and prehistoric migration events are real.

Non-Existence of Human Races: The Apportionment of Human Diversity

Lewontin, R. C. (1972). “The apportionment of human diversity.” Evolutionary Biology 6: 381-398.

Lewontin, an evolutionary biologist, analyzed variation in 17 genes among humans to examine the biological basis of racial classification.  Based on non-genetic factors, such as linguistic, historical, cultural, and morphological data, he uses 7 racial categories (Caucasians, Black Africans, Mongoloids, South Asian Aborigines, Amerinds, Oceanians, and Australian Aborigines), but it is not clear exactly how he came up with these categories.

His analysis shows that humans are genetically very similar to each other, even when they come from different populations or races.  From his analysis, he found that most genetic variation is found within each population (85.4%).  How you categorize human populations into racial groups influences between-race genetic variation, but he estimated that only 6.3% of human genetic variation is explained by differences among human racial groups.  He concludes that “Since such racial classification is now seen to be of virtually no genetic or taxonomic significance either, no justification can be offered for its continuance.”
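
The apportionment itself is simple arithmetic once a diversity measure is chosen.  Below is a minimal sketch using the Shannon information measure, which is the measure Lewontin used.  The allele frequencies, group names, and equal weighting of populations are hypothetical choices of mine, not his data.

```python
import math


def shannon(freqs):
    """Shannon diversity H = -sum(p * log2(p)) for one locus."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def mean_freqs(populations):
    """Average allele frequencies over populations (equal weights, an assumption)."""
    n = len(populations)
    return [sum(column) / n for column in zip(*populations)]


def apportion(groups):
    """Split total diversity into within-population, between-population, between-group shares."""
    pops = [p for members in groups.values() for p in members]
    h_total = shannon(mean_freqs(pops))
    h_pop = sum(shannon(p) for p in pops) / len(pops)
    h_group = sum(len(m) * shannon(mean_freqs(m)) for m in groups.values()) / len(pops)
    return {
        "within_populations": h_pop / h_total,
        "between_populations_within_groups": (h_group - h_pop) / h_total,
        "between_groups": (h_total - h_group) / h_total,
    }


# Hypothetical two-allele frequencies for two populations in each of two groups.
example = {
    "group_1": [[0.60, 0.40], [0.50, 0.50]],
    "group_2": [[0.40, 0.60], [0.30, 0.70]],
}
shares = apportion(example)
for name, value in shares.items():
    print(name, round(value, 3))

# The two figures quoted above leave this remainder for differences
# between populations within the same racial group.
remainder = round(100 - 85.4 - 6.3, 1)
print("between populations within races (%):", remainder)
```

The three shares always add up to one, so the 85.4% and 6.3% quoted above leave 8.3% for differences between populations within the same racial group.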

Many biological anthropologists and human geneticists use a population genetic approach similar to Lewontin’s to show that the greatest genetic variation is found within each population and that genetic differences between populations and racial groups are very small.

It seems that he thinks human populations are not reproductively isolated and that gene flow was common between different populations and racial groups.  “…we must conclude that there is no internal evidence that sparse aboriginal populations are more genetically isolated from their neighbors than are more continuously distributed large races.”  Lewontin’s view supports Livingstone’s view (Livingstone, 1962), but Neil Risch, Marcus Feldman, and L.L. Cavalli-Sforza (interestingly, they were all at Stanford University at some point) hold that endogamy was more common.

What does Neil Risch think about race? Part 2

This is Part 2 of “What does Neil Risch think about race?”  Check Part 1 here.

Cluster or clines?

Risch’s racial categorization, based on the continental groups represented in the evolutionary tree (Figure 1), seems to be a phylogenetic or cladistic one.  Race, or subspecies, in biology and systematics implies that a lack of gene flow between racial groups causes them to become significantly different from each other, without intermediate populations.  Risch et al. (2002) recognize this problem and state “migrations have blurred the strict continental boundaries.”  They also admit that populations at the boundaries of the continental divisions, such as Ethiopians and Somali, are difficult to categorize, and that Ethiopians and North Africans tend to be admixtures of sub-Saharan Africans and Caucasians.  They insist that “the existence of such intermediate groups should not, however, overshadow the fact that the greatest genetic structure that exists in the human population occurs at the racial level.”  Maybe he still thinks typologically.

Does he understand social and cultural aspects of racial and ethnic groups?

I believe he is a great geneticist, but maybe he does not understand the social and cultural aspects of race/ethnicity, probably because he was trained in mathematics and population genetics.  For example, he seems to assume that racial/ethnic groups tend to be endogamous.  Risch et al. say “From the genetic perspective, the important concept is mating pattern, and the degree to which racially or ethnically defined groups remain endogamous.”  However, when they talk about genetic categorization in the U.S., they say

“…mating patterns are far from random.  The tendency toward endogamy is reflected within the 2000 US Census…97.6% of subjects reported themselves to be of one race, while 2.4% reported themselves to be of more than one race…”

He is right about non-random mating in the U.S., but a census report tells you about socially constructed racial/ethnic identity, not actual mating patterns.  Mixed-race individuals are undercounted, because they tend to pick only one category.

Summary of his arguments

  1. Racial/ethnic categories are important for medical research.
  2. Genetic data do not support genetic sameness argument.
  3. Although his racial categories seem typological, they are operational rather than typological.

What does Neil Risch think about race? Part 1

Gitschier, J. (2005). “The Whole Side of It—An Interview with Neil Risch.” PLoS Gen 1(1): e14.

Risch, N., E. Burchard, et al. (2002). “Categorization of humans in biomedical research: genes, race and disease.” Genome Biology 3(7): comment2007.2001-2007.2012.

Neil Risch is a genetic epidemiologist who has studied genetic differences among racial/ethnic groups to understand the confounding effects of differences in genetic ancestry on association studies.  Risch and his colleagues have shown that genetic differences among racial/ethnic groups are small but large enough to distinguish the groups (Tang et al., 2005).  To examine Neil Risch’s position on racial/ethnic classification and the existence of biological race, in addition to his 2005 article (Tang et al., 2005), I reviewed an article by Risch et al. (2002) (also covered by Matilda’s anthropology blog) and an interview with Neil Risch by Jane Gitschier, a human geneticist and PLoS editor (both Risch and Gitschier are currently at the University of California, San Francisco).

Goals of his research from studying genetic differences among racial/ethnic groups

As a genetic epidemiologist, he is interested in finding disease-causing genes and the underlying genetic differences that affect drug response.  The gene variants can be common among all racial groups or within a particular racial group, but these biomedical research projects are confounded by genetic differences among human groups and by the underrepresentation of racial minorities in biomedical research.  He is strongly against the idea of racial inequality (I understand his position, because he is an Ashkenazi Jew).  He says “What is not scientific is a value system attached to any such findings [genetic differences between races and ethnic groups].  Great abuse has occurred in the past with notions of ‘genetic superiority’ of one particular group over another” (Risch et al., 2002).  In the interview, he says “The problem is that others could use that information [his research data showing genetic differences between races and ethnic groups] to create division.”

Definition of race

Clearly, the problem is the definition of race.  Gitschier asks Risch about the genetic basis of race, and Risch responds with a question: “What is your definition of race?”  He continues, “Scientists always disagree!…In our own studies, to avoid coming up with our own definition of race, we tend to use the definition others have employed, for example, the US census definition of race.  There is also the concept of the major geographical structuring that exists in human populations - continental division - which has led to genetic differentiation.”  Note that ‘race’ as defined by the US census and ‘race’ as defined by continental division are very different.  Risch knows the problem and says “Any category you come up with is going to be imperfect.”

In the 2002 article, Risch tries to clarify how he uses the terms race, ethnicity, and ancestry.  Race is defined “on the basis of the primary continent of origin…”  Risch et al. list five groups: sub-Saharan Africans, Caucasians (Europeans, Middle Easterners, North Africans, and Indians), Asians, Pacific Islanders (Australian Aborigines, New Guineans, Melanesians, and Polynesians), and Native Americans.  Ethnicity is “a self-defined construct that may be based on geographic, social, cultural and religious grounds.”  Ancestry is “the race/ethnicity of an individual’s ancestors, whatever the individual’s current affiliation.”

I wonder if ‘race’ is the best term to describe continental groups, because it has many negative connotations and it implies typology.  Many other human geneticists avoid using the term.  I also wonder if the evolutionary tree of the five human geographic groups (Figure 1 in Risch et al., 2002) is a realistic representation of human genetic variation and evolutionary history, one that summarizes the history of migration, gene flow, genetic drift, selection, etc.  Is this classification system socially and culturally influenced?


Figure 1 (Risch et al., 2002)

Nonetheless, racial/ethnic categorization seems to be operational rather than typological: a sampling strategy that can be used in biomedical research projects.  By using racial/ethnic categories, the underlying genetic variation and socio-cultural factors affecting common diseases can be understood.

Non-Existence of Human Races

Livingstone, F. B. (1962). “On the non-existence of human races.” Current Anthropology 3: 279-281.

This is a classic article by Frank Livingstone, one of the founders of anthropological genetics in the US.  In this article, he argues that although there are genetic differences between human populations, there is no biological basis for racial classification, and that human biological variation is best explained by clines created through selection and gene flow.  He says “Variation is concordant if the geographic variation of the genetic characters is correlated, so that a classification based on one character would reflect the variability in any other.”  Human variation does not meet this condition, because different characters vary geographically in different patterns.  So, from a phylogenetic or cladistic point of view, human races do not exist.

There is growing evidence that there are genetic differences among human groups (e.g., Tang et al., 2005), but Livingstone’s basic point is still valid and many anthropological geneticists (e.g., Weiss and Long, 2009) still support his view.  Clines explain human genetic variation better than clusters, especially when intermediate populations are included or when individuals rather than populations are the focus of analysis.

Does human biological race exist? Part 2

Tang, H., T. Quertermous, et al. (2005). “Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies.” American Journal of Human Genetics 76: 268-275.

Tang et al. (2005) analyzed 326 microsatellite markers of 3,636 individuals from varying racial/ethnic groups in 15 different locations to examine the correlation between racial/ethnic identity and genetic ancestry.  The racial/ethnic groups included for analysis are white, African American, Hispanic, and Asians (Chinese, including Taiwanese from Taiwan, and Japanese). 

Their analyses show very high correspondence between racial/ethnic identity and genetic ancestry.  Using genetic ancestry, more than 99% of individuals were correctly assigned to their self-identified racial categories.  The racial/ethnic groups clustered both on multidimensional plots based on genetic distance and in the STRUCTURE analysis.

Their main concern is the application to association studies for finding disease-causing genes, not whether there is a biological basis for racial classification, though Neil Risch, one of the lead investigators of this project, expressed his thoughts on the problem of defining race in the interview with Jane Gitschier, a human geneticist and PLoS editor.

However, the clusters identified in this study could be statistical constructs resulting from their poor sampling strategy (Weiss and Long, 2009).  Why didn’t they include Native Americans, Asian Indians, Central Asians, and Middle Easterners?  And why did they include Taiwanese from Taiwan?  Also, I am not sure they chose the right model for cluster analysis.  They used STRUCTURE (Pritchard et al., 2000) with the NOADMIX option, “so that the entire genome of each individual was assumed to have been derived from a single homogeneous population.”  There should be some level of admixture in each racial/ethnic group, except for the Taiwanese, so is each individual in every racial/ethnic group really derived from a single homogeneous population?  Later they explain

“We note that this analysis was not based on determination of individuals’ ‘racial’ ancestry (e.g., estimating individual European, African, and Native American ancestry for the African American and Hispanic subjects).  To do so would require inclusion of the nonadmixed ancestral groups (such as Africans and Native Americans) and the use of the ‘ADMIX’ option of structure.  What our results do show is that the (admixed) groups included have approximated within-group random mating sufficiently long enough to give rise to distinct genetic clusters.”  Are they saying that individuals within the admixed groups (African Americans and Hispanics) are mating randomly?  I wish they had explained a little further how they used STRUCTURE and how they interpreted the data.

Moreover, they did not consider the social processes that maintain genetic differences between ethnic groups, which shows the lack of collaboration between biomedical geneticists and social scientists.