{"responseHeader":{"status":0,"QTime":10,"params":{"q":"{!q.op=AND}id:\"704878\"","hl":"true","hl.simple.post":"","hl.fragsize":"5000","fq":"!embargo_tdt:[NOW TO *]","hl.fl":"ocr_t","hl.method":"unified","wt":"json","hl.simple.pre":""}},"response":{"numFound":1,"start":0,"docs":[{"volume_t":"34","date_modified_t":"2008-11-25","ark_t":"ark:/87278/s6tm7vhw","date_digital_t":"2006-07-14","setname_s":"ir_uspace","subject_t":"Tree of descent; Mismatch distributions; Simulations; Findings; Intermatch distributions; Younger and older populations","restricted_i":0,"department_t":"Anthropology; Human Genetics; Biology","format_medium_t":"application/pdf","creator_t":"Rogers, Alan R.; Harpending, Henry C.","identifier_t":"ir-main,318","unid_t":"28366","date_t":"2001-09-15","bibliographic_citation_t":"Harpending, H.C., Sherry, S.T., Rogers, A.R. & Stoneking, M. (1993). The genetic structure of ancient human populations. Current Anthropology, 34(4), 483-496.","mass_i":1515011812,"publisher_t":"University of Chicago Press","description_t":"Discusses mitochondrial DNA (mtDNA) sequences as important source of data about the history of human species.","first_page_t":"483","rights_management_t":"(c) 1993 by University of Chicago Press http://www.journals.uchicago.edu/loi/ca","title_t":"Genetic structure of ancient human populations","id":704878,"publication_type_t":"Journal Article","parent_i":0,"type_t":"Text","subject_lcsh_t":"Mitochondrial DNA; Evolution","thumb_s":"/36/22/3622b7b9853b2f5df0f628d311d70e8b8945301d.jpg","last_page_t":"496","oldid_t":"uspace 2702","metadata_cataloger_t":"AM; KME; mfb","format_t":"application/pdf","modified_tdt":"2012-06-13T00:00:00Z","school_or_college_t":"College of Science; School of Medicine; College of Social & Behavioral Science","language_t":"eng","issue_t":"4","file_s":"/4b/57/4b571183a5ce20e170c3c9c1a1d8c9114479c2e4.pdf","other_author_t":"Sherry, Stephen T.; Stoneking, 
Mark","created_tdt":"2012-06-13T00:00:00Z","_version_":1679953479583399936,"ocr_t":"The Genetic Structure of Ancient Human Populations¹ HENRY C. HARPENDING, STEPHEN T. SHERRY, ALAN R. ROGERS, AND MARK STONEKING Department of Anthropology, Pennsylvania State University, University Park, Pa. 16802 (Harpending, Sherry, and Stoneking); Department of Anthropology, University of Utah, Salt Lake City, Utah 84112 (Rogers), U.S.A. Differences among human mitochondrial DNA (mtDNA) sequences are an important source of data about the history of our species. Since mtDNA sequences are not broken and reformed by recombination, they are the tips of a tree of descent. There are several approaches to using mtDNA sequences to infer properties of the tree of descent and to relate those properties to the history of the population in which the tree is embedded.
¹ © 1993 by The Wenner-Gren Foundation for Anthropological Research. All rights reserved 0011-3204/93/3404-0007/$1.00. We are grateful for comments and suggestions from Stan Ambrose, Adam Connor, James Crow, Richard Klein and Jeffrey Kurland, Ozzie Pearson, and Naoko Takezaki. Laboratory research was supported in part by NSF grant 90-20567 to Mark Stoneking.
484 | CURRENT ANTHROPOLOGY
The direct approach to inferring properties of the tree is to compute a reconstruction of it using one of a number of algorithms that build trees from differences among objects. Cann, Stoneking, and Wilson (1987) used a maximum-parsimony algorithm to reconstruct the tree of descent of a sample of mtDNA from many different human groups, and they were led to suggest that our mtDNAs all descend from an African who lived approximately 200,000 years ago. Although Vigilant et al. (1991) have supported this result, it has been subject to a number of criticisms, the most important of which is that current methods do not reliably reconstruct the tree (Hedges et al. 1992; Templeton 1992; Maddison, Ruvolo, and Swofford 1992).
The task of relating the properties of the tree to properties of the population has not been handled carefully in much of the literature, especially in the commentary on the original work that appeared in the popular press. Some authors have assumed that the coalescence at 200,000 years implied that a new population arose at that time, but the genetics suggests no such thing. The age of the coalescent (the common ancestor) reflects population size in the past, in this case suggesting that the effective number of females in the late Middle Pleistocene was on the order of 1,000 to 10,000. It is not related in any simple way to population origins. In fact, the concept of the origin of a population is not clear, but it seems to mean growth from a small, partially isolated subpopulation of the parent species. In these terms we can distinguish three models of the origin of modern humans. The strong Garden of Eden hypothesis posits that modern humans appeared in a subpopulation of Homo erectus, perhaps as a new species, and spread continuously over much of the Old World. The weak Garden of Eden hypothesis again posits that modern humans appeared in a subpopulation, but that they spread slowly over several tens of thousands of years and later expanded from separated daughter populations bearing modern technologies such as those of the African Late Stone Age or the European Upper Paleolithic. The multiregional hypothesis posits that the entire H. erectus gene pool contributed to the gene pool of modern humans. In this paper we use a new method of analyzing mtDNA sequences. It is based on a theory of how mismatch distributions (histograms of the number of pairwise differences in a sample of DNA sequences; Hartl and Clark 1989) should preserve a record of population expansions and separations in the remote past (Rogers and Harpending 1992).
We use \"mismatch distribution\" here to refer to differences among sequences within a population, and we call the corresponding distributions of differences between sequences from two different populations intermatch distributions. (Elsewhere in the literature mismatch distributions have been called distributions of pairwise differences.) Human mtDNA mismatch distributions preserve a record of past population dynamics. We show that they are incompatible with the strong Garden of Eden hypothesis, marginally compatible with the multiregional hypothesis, and easily compatible with the weak Garden of Eden hypothesis. Other genetic evidence and ecological evidence, however, argue against the multiregional hypothesis unless there was a marked worldwide population bottleneck within the past 200,000 years. Such a bottleneck would have to reduce the total size of the species to a few thousand without leading to the extinction of regional subpopulations. Mismatch distributions are also incompatible with the hypothesis that the relatively recent common ancestry of human mtDNA reflects replacement by a new, selectively advantageous variant. First, we consider mtDNA differences within and between major racial groups in the sample used by Cann, Stoneking, and Wilson (1987) and Stoneking et al. (1990). These mtDNAs were typed with restriction enzymes by high-resolution methods that surveyed mutations at approximately 1,500 nucleotides. The widely cited divergence rate (twice the mutation rate) is 2-4% per million years (Wilson et al. 1985). Rogers and Harpending (1992) show that this corresponds to mutation rates (u) for the whole array of assayed sites of 3 × 10^-5 to 6 × 10^-5 per year; we use the average of these two figures, u = 4.5 × 10^-5 per year. We measure time in mutational time units (τ): τ = 2ut, where t is time in years. Therefore one unit of mutational time corresponds to 1/(2u) ≈ 11,000 years. We also consider samples for which sequences from the hypervariable segment (HVS) of noncoding mtDNA are available.
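Converting between mutational time and years is simple arithmetic: since τ = 2ut, one unit of τ spans 1/(2u) years. The Python sketch below uses hypothetical helper names. For the restriction-enzyme data it takes the average rate u = 4.5 × 10^-5 per year directly. For the hypervariable segment it assumes that u equals half the per-site divergence rate multiplied by the number of sites. That assumption reproduces the HVS rates quoted for 360 and 751 sites.

```python
# Illustrative helpers; names are hypothetical, not from the original study.

def years_per_unit(u_per_year):
    # tau = 2 * u * t, so one unit of tau spans 1 / (2u) years
    return 1.0 / (2.0 * u_per_year)

def rate_from_divergence(divergence_per_myr, n_sites):
    # Assumption: divergence is twice the per-site mutation rate, and u is
    # the per-site rate summed over all sequenced sites.
    per_site_per_year = divergence_per_myr / 1e6 / 2.0
    return per_site_per_year * n_sites

def tau_to_years(tau, u_per_year):
    return tau * years_per_unit(u_per_year)

u_rflp = 4.5e-5                            # average of 3e-5 and 6e-5 per year
u_hvs1 = rate_from_divergence(0.30, 360)    # 30% per million years, 360 sites
u_hvs12 = rate_from_divergence(0.23, 751)   # 23% per million years, 751 sites

for label, u in [('RFLP', u_rflp), ('HVS 1', u_hvs1), ('HVS 1/2', u_hvs12)]:
    print(label, f'{u:.2e}', round(years_per_unit(u)))
```

For example, tau_to_years(4.40, u_rflp) gives about 48,900 years. That matches the 49,000 years listed for the Australian sample in Table 1.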
For one set of populations, sequences are available from region 1, which contains 360 nucleotide sites. Parts of this region were not typed in many samples, so any particular comparison of populations is based on fewer than 360 sites. When we present computed estimates of time, these have all been normalized to the same scale. For this region, HVS 1, the divergence rate is estimated to be 30% per million years (Ward et al. 1991), corresponding to u = 5.4 × 10^-5 per year; thus one unit of mutational time corresponds to 9,300 years. Finally, sequences are available from HVS 1 and HVS 2, with a total of 751 nucleotide sites. The divergence rate estimate for these regions pooled is 23% per million years (Stoneking, Sherry, and Vigilant 1992), corresponding to u = 8.6 × 10^-5 per year; thus a unit of mutational time is 5,800 years. Details of these samples, methods of typing, sources, etc., are given in Sherry et al. (1993) and summarized in table 1. Sherry et al. also discuss the bases of the mutation rate estimates. Here we use simulation to assess how far overall patterns of leading intermatch distributions and heterogeneity in expansion times provide reliable information about ancient population structure.
Copyright © 1993. All rights reserved. Volume 34, Number 4, August-October 1993 | 485
TABLE 1 Samples and Expansion Times
Data Type(a) and Population | Sample Size | τ ± S.E.(b) | Normalized Expansion Time (yrs.) | Reporting Study
RFLP
Australian | 21 | 4.40 ± 3.41 | 49,000 | Cann, Stoneking, and Wilson (1987)
African-1 | 20 | 8.95 ± 4.90 | 99,800 | Cann, Stoneking, and Wilson (1987)
Asian-2 | 34 | 7.88 ± 3.02 | 87,800 | Cann, Stoneking, and Wilson (1987)
European-2 | 47 | 4.99 ± 2.44 | 55,600 | Cann, Stoneking, and Wilson (1987)
Papua New Guinea-2 | 119 | 4.26 ± 2.58 | 47,500 | Stoneking et al. (1990)
HVS 1
Asian | 34 | 9.03 ± 0.91 | 83,600 | Horai and Hayasaka (1990), Vigilant et al. (1989, 1991)
Bantu-speakers | 41 | 7.41 ± 1.25 | 68,600 | unpublished data
Middle East | 42 | 5.78 ± 0.95 | 53,500 | DiRienzo and Wilson (1991)
Herero | 39 | 0.80 ± 0.29 | 7,400 | Vigilant et al. (1989, 1991), unpublished data
!Kung | 64 | 5.65 ± 1.27 | 52,300 | Vigilant et al. (1989, 1991), unpublished data
Nuu Chah Nulth | 63 | 3.70 ± 1.07 | 34,200 | Ward et al. (1992)
HVS 1/2
Bantu-speakers | 41 | 15.46 ± 2.41 | 89,200 | unpublished data
Papua New Guinea-1 | 32 | 9.56 ± 4.67 | 55,200 | Vigilant et al. (1991), Stoneking, Sherry, and Vigilant (1992)
Asian-1 | 24 | 8.19 ± 2.27 | 47,300 | Vigilant et al. (1989, 1991)
European-1 | 20 | 7.13 ± 2.18 | 41,700 | Vigilant et al. (1989, 1991)
!Kung | 64 | 6.87 ± 2.23 | 39,600 | Vigilant et al. (1989, 1991), unpublished data
(a) RFLP, restriction fragment length polymorphism assayed by high-resolution restriction enzymes; HVS 1, sequences from region 1 of the hypervariable segment of mtDNA; HVS 1/2, sequences from regions 1 and 2 of the hypervariable segment.
(b) Standard errors are derived from simulations by a method described by Rogers (1993) and assume that mutation rates are estimated without error.
SIMULATION Although there is an explicit theory of pairwise comparisons (Rogers and Harpending 1992), it is less useful than it seems, because it predicts the number of differences between members of a single random pair of DNA sequences. In practice we consider all possible pairwise differences within a sample or between samples, and these pairwise differences are not independent of each other. There is only one history of human mtDNA, and all the mtDNAs that we can observe are part of the same tree of human descent. Simulations show that the pairwise difference distributions to be expected from a single locus such as mtDNA are much more erratic than the smooth distributions predicted by the theory.
The theoretical distributions emerge clearly when many simulations are averaged, but so far mtDNA is the only part of our genome for which suitable data are available. Without parallel universes we have no prospect of testing the theory with independent samples of mtDNA. Our simulations use the coalescent algorithm (Hudson 1983, Watterson 1984) described simply and elegantly by Hudson (1990). We simulate two populations that can independently undergo drastic size changes. They exchange migrants, and the number of migrants that they exchange may also vary over time. The coalescent algorithm allows us to simulate molecular evolution in large populations without having to represent large populations in the computer. It uses properties of the tree of descent of the n alleles in a sample rather than properties of all N alleles in a population. Simulation of a single population is particularly simple. If we have a sample of two mtDNAs, we know that sometime in the past they shared an ancestor. This common ancestor is the coalescence of the two. The probability that they coalesced in the previous generation is 1/N, the probability that they coalesced two generations ago is (1 − 1/N)(1/N), and so forth. The distribution of coalescence times is thus a geometric distribution with mean N, and in practice this is approximated by an exponential distribution with mean N. If instead of two mtDNAs we have a sample of n mtDNAs, Hudson shows that the time until some pair coalesces is approximately exponential with mean N/[n(n − 1)/2], since there are n(n − 1)/2 pairs that could coalesce. To simulate the history of n mtDNAs, we generate a random exponential deviate t with this mean. We then pick two mtDNAs at random and replace them with a single mtDNA, their common ancestor. The simulated history now extends t generations into the past, and there are n − 1 mtDNAs in the sample. The process is repeated, with the mean recomputed for the smaller sample, until only one mtDNA remains. Match distributions summarize mutations that occur along the branches of the tree. To simulate the effects of mutation, we assume that every mutation occurs at a site that has not mutated previously. If the mutation rate is u, then the number of mutations that occur along a branch of length t is a Poisson random variable with mean ut. Match distributions are computed by adding up the mutations that have occurred over all segments of the tree that link each pair of mitochondria. It is often appropriate in studies of molecular evolution to correct for the possibility of repeat mutations at a single site. Rogers (1993) examined the effect of multiple hits on match distributions generated from high-resolution restriction-enzyme analysis of mtDNA as reported by Cann, Stoneking, and Wilson (1987). He found that multiple hits could seriously bias the results if just a few hot spots were generating the differences we observe. Reviewing published and unpublished estimates of the distribution of mutation rates among sites, however, he concluded that the error introduced was less than 3%. For this reason we neither correct for multiple hits in our data nor incorporate them into our simulations. Simulating a pair of populations that exchange migrants is only slightly more complex. We start with samples of size n_A and n_B from populations A and B, whose sizes are N_A and N_B. The number of migrants exchanged between them each generation is M. As before, we generate a simulated history going backward in time. At each stage of the iteration four events are possible: (1) a coalescence in population A, (2) a coalescence in population B, (3) a migration from A to B, and (4) a migration from B to A. The time until coalescence is computed for each population as above. The time until a past migration event is also exponential, with a mean that depends on the number of migrants M, the sample size n, and the population size N. The time until a past migration from A to B has mean N_A/(n_A M), and the time until a past migration from B to A has mean N_B/(n_B M).
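The single-population procedure and the infinite-sites mutation step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the program behind the figures, and the helper names are hypothetical. Time is measured in generations, N is the number of females, and u is the mutation rate per generation for the whole sequence. Poisson deviates are drawn with Knuth's multiplication method because Python's standard library has no Poisson sampler.

```python
import math
import random
from itertools import combinations

def poisson(rng, lam):
    # Knuth's multiplication method; adequate for the small means used here
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_mismatch(n, N, u, seed=None):
    # n >= 2 sampled mtDNAs from a constant-size population of N females;
    # u = mutation rate per generation for the whole sequence.
    rng = random.Random(seed)
    lineages = [(0.0, {i}) for i in range(n)]  # (time lineage began, sampled leaves below it)
    carried = [set() for _ in range(n)]        # mutation labels carried by each sampled mtDNA
    next_site = 0
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # waiting time to the next coalescence: exponential with mean N / [k(k - 1)/2]
        t += rng.expovariate(k * (k - 1) / 2 / N)
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        merged = set()
        for idx in (i, j):
            start, leaves = lineages.pop(idx)
            # infinite sites: each mutation on this branch hits a new site
            for _ in range(poisson(rng, u * (t - start))):
                for leaf in leaves:
                    carried[leaf].add(next_site)
                next_site += 1
            merged |= leaves
        lineages.append((t, merged))
    # mismatch distribution: number of pairs differing at 0, 1, 2, ... sites
    counts = {}
    for a, b in combinations(range(n), 2):
        d = len(carried[a] ^ carried[b])
        counts[d] = counts.get(d, 0) + 1
    return [counts.get(d, 0) for d in range(max(counts) + 1)]
```

For example, simulate_mismatch(40, 1000, 0.001, seed=1) gives the counts for a sample of 40 from a constant-size population with θ = 2Nu = 2. Repeating the call with other seeds shows how erratic single-locus distributions can be.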
An exponential random variable is generated to represent the time until each of these events, and the event with the shortest time is taken to be what happened (Strobeck 1987). If it is a coalescence, the sample size of the population in which it occurred is decreased by one. If it is a migration event, the sample size of the source population is decreased by one and the sample size of the recipient population is increased by one. The process is then repeated until all the mitochondria have coalesced. Changes in population size and migration rate complicate these simulations. Both have effects that are equivalent to rescaling time, so the simulated times are transformed to accommodate the changes. For example, suppose the population shrank by a factor of ten 50,000 years ago, and we generate a random exponential deviate to represent a coalescence time. If it is, say, 30,000 years, then coalescence occurs 30,000 years ago. If it is 60,000 years, then it is rescaled to 50,000 + (10,000 × 10) = 150,000 years, since coalescence times before 50,000 years ago are ten times as long in a population ten times as large. The simulations in this paper are of samples of 40. In Sherry et al. (1993) we show that the standard deviation of estimates of time decreases slowly with sample size and that there is hardly any improvement in samples larger than 25 or so. This is because no matter how many sequences we have, we are still sampling from the same unique human tree of descent, and a larger sample does not provide much more information about that history. We need sequences from other parts of the genome for more accurate reconstruction of population history. The parameters of the simulation are population sizes before and after expansion, migration rates that may vary over time, and the mutation rate. The last two enter the simulation as the number of migrants per generation and twice the number of mutations per generation.
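The two-population step and the time rescaling described above can be sketched as follows. Like the previous sketch, this is a hypothetical Python illustration rather than the original program. Rates are per generation. The direction convention is an assumption based on the description above: a migration from A to B removes a lineage from A and adds it to B. The rescale function handles only the single tenfold size change in the worked example, with the drawn waiting time measured from the present.

```python
import random

def rescale(t_drawn, t_change, factor):
    # t_drawn: waiting time (years before present) drawn on the recent scale.
    # Before t_change the population was factor times larger, so each year
    # of drawn time beyond t_change stretches to factor years.
    if t_drawn <= t_change:
        return t_drawn
    return t_change + (t_drawn - t_change) * factor

def two_population_tmrca(n_a, n_b, N_a, N_b, M, seed=None):
    # Backward-in-time competing exponentials for two populations exchanging
    # M migrants per generation; returns generations until one lineage is left.
    if M <= 0 and n_a > 0 and n_b > 0:
        raise ValueError('without migration the two samples never coalesce')
    rng = random.Random(seed)
    t = 0.0
    while n_a + n_b > 1:
        rates = {
            'coal_A': n_a * (n_a - 1) / 2 / N_a,
            'coal_B': n_b * (n_b - 1) / 2 / N_b,
            'A_to_B': n_a * M / N_a,  # mean waiting time N_a / (n_a M)
            'B_to_A': n_b * M / N_b,  # mean waiting time N_b / (n_b M)
        }
        # draw a waiting time for each possible event; the shortest one happens
        waits = {e: rng.expovariate(r) for e, r in rates.items() if r > 0}
        event = min(waits, key=waits.get)
        t += waits[event]
        if event == 'coal_A':
            n_a -= 1
        elif event == 'coal_B':
            n_b -= 1
        elif event == 'A_to_B':  # lineage leaves A (source) for B (recipient)
            n_a, n_b = n_a - 1, n_b + 1
        else:
            n_a, n_b = n_a + 1, n_b - 1
    return t
```

rescale(60000, 50000, 10) returns 150,000, matching the worked example. Running two_population_tmrca repeatedly with different seeds shows how widely coalescence times vary for a single history of parameters.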
The number of migrants, M = Nm, is the product of the size N of the female population (since mitochondria are maternally transmitted) and the migration rate per generation m. Twice the number of mutations per generation is conventionally represented by θ = 2Nu. (For a nuclear, hence diploid, genetic locus the corresponding parameter would be 4Nu, and N would be the effective size of the total population rather than the effective number of women.) Each of these parameters is proportional to the mutation rate u, and this rate is the probability that a mutation occurs anywhere in the sequence. Sequencing a larger part of the mtDNA therefore has the same effect as increasing the mutation rate. We simulate populations that expand from θ = 1 to θ = 1,000, but our results do not depend very much on these assumptions. We use an initial θ of 1 because our estimates of the initial population size and those given by Rogers and Harpending (1992) all suggest that the initial θ was between 0 and 4. The estimated mutation rates are about 10^-3 per generation. The suggestion is therefore that the effective size, for a number of generations at any rate, was θ/2u ≈ 10^3, or 1,000 women. The estimate of ancestral effective size of 1,000 to 10,000 is generally accepted from studies of polymorphisms in nuclear genes (Takahata 1993) as well as from mtDNA (Stoneking et al. 1990). Our results suggest that the lower end of this range is the more likely. If ancestral effective size had been as high as 10,000, we would expect more mitochondrial diversity and more pairwise differences than have been observed in human mtDNAs. Takahata (1993), considering HLA diversity, suggests that the effective size of our lineage was about 10^3 for most of the Cenozoic but may have been smaller (but never less than 100) during the Pleistocene. Our results are not incompatible with his suggestion. PROPERTIES OF MISMATCH DISTRIBUTIONS The main findings from our mismatch distributions are shown in tables 2-4.
The estimated times are inversely proportional to the estimated mutation rates, which are not known with much certainty. These rates could easily be in error by a factor of two in either direction, which would halve or double the estimates in these tables. Furthermore, even if the mutation rates were known, there are large biases and standard errors associated with time estimates from match distributions. The numbers in these tables should be viewed as very rough estimates.
TABLE 2 Estimated Separation Times (in Thousands of Years) and Expansion Times from High-Resolution Restriction-Enzyme Mapping for Five Populations
Rows and columns: African, Papua New Guinea, Asian, Australian, European
93 48 113 87 94 68 88 49 93 73 86 67 56
Note: Separation times above the diagonal, expansion times on the diagonal.
TABLE 3 Estimated Separation Times (in Thousands of Years), Expansion Times, and Number of Sites Sequenced in HVS 1 for Six Populations
 | Asian | Bantu | Herero | !Kung | Middle East | Nuu Chah Nulth
Asian | 84 | 115 | 78 | 124 | 81 | 88
Bantu | 177 | 69 | 73 | 88 | 85 | 80
Herero | 210 | 229 | 7 | 76 | 74 | 69
!Kung | 212 | 238 | 276 | 52 | 100 | 96
Middle East | 234 | 244 | 292 | 286 | 54 | 72
Nuu Chah Nulth | 234 | 244 | 292 | 286 | 360 | 34
Note: Separation times above the diagonal, expansion times on the diagonal, sites below.
TABLE 4 Estimated Separation Times (in Thousands of Years), Expansion Times, and Number of Sites Sequenced in HVS 1/2 for Five Populations
 | Bantu | Papua New Guinea | Asian | European | !Kung
Bantu | 89 | 141 | 114 | 104 | 90
Papua New Guinea | 373 | 55 | 119 | 91 | 137
Asian | 423 | 400 | 47 | 57 | 92
European | 421 | 409 | 509 | 42 | 98
!Kung | 423 | 400 | 552 | 490 | 40
Note: Separation times above the diagonal, expansion times on the diagonal, sites below.
Figure 1 shows mismatch distributions from hypervariable segment region 1 in samples from six populations. Except for the Herero, these are relatively smooth distributions with peaks at around five differences.
They suggest that our species underwent a population expansion approximately 60,000 years ago (Rogers and Harpending 1992, Sherry et al. 1993, Rogers 1993). The simulated data in figure 2 distinguish populations that underwent expansion from stationary populations with the same average number of pairwise differences, and the empirical distributions look more like the former.
Fig. 1. Empirical mismatch distributions from sequencing HVS 1. A, Asian, 234 sites, 84,000 years; B, Southern Bantu, 244 sites, 69,000 years; C, Middle East, 360 sites, 54,000 years; D, Herero, 292 sites, 7,000 years; E, !Kung Bushmen, 286 sites, 52,000 years; F, Nuu Chah Nulth, 360 sites, 34,000 years.
Fig. 2. Mismatch distributions generated by simulation. The populations in the left two columns grew from θ = 1 to θ = 1,000 at τ = 5 units of mutational time in the past. The populations in the right two columns have been of constant size θ = 5 for a very long time.
It is possible to construct an ad hoc statistical measure of raggedness that separates distributions from constant-size populations from distributions that reflect expansions. The visual impression of raggedness in distributions from stationary populations corresponds to rapid fluctuations in height from one number of differences to the next. The amount of this rapid fluctuation can be estimated by differencing the distribution and summing the squares of the differenced series. That is, from the mismatch distribution m0, m1, m2, ... we construct the differenced series m1 − m0, m2 − m1, ..., and then form the sum of squares of the differenced series.
This crude estimate of \"high-frequency\" variation in the series readily discriminates distributions that result from expansion from distributions from long-term stationary populations. There is almost no overlap between the distributions in the left panel of figure 2 and those in the right panel. The left-panel distributions have a mean raggedness of 0.012, a standard deviation of 0.006, and a maximum in 100 replicates of 0.03. The right-panel distributions have a mean raggedness of 0.26, a standard deviation of 0.20, and a minimum in 100 replicates of 0.03. All of our data look like distributions generated by population growth except for the Herero, who are thought to have undergone a recent bottleneck (Pennington and Harpending n.d.). Among the populations reported by Sherry et al., the Eastern and Western Pygmy samples and the Hadza sample also look somewhat like the Herero, displaying a high level of mtDNA identity together with pairs that differ at many sites. This is the pattern expected following a bottleneck in a previously large population (Rogers and Harpending 1992). The raggedness statistics for the Herero, Hadza, and Eastern and Western Pygmy distributions are 0.10, 0.28, 0.04, and 0.01 respectively for HVS 1/2. By this measure the Pygmy data, even though they come from small numbers of mtDNAs, look like distributions from populations that have undergone expansion (contra DiRienzo and Wilson 1991). Raggedness statistics for our data indicate that the observed distributions are the kinds of mismatch distributions produced by ancient population expansions. For the populations for which we have both HVS 1 and HVS 2 sequences, the maximum raggedness is among Europeans, 0.010. For the populations with only HVS 1, the maximum (apart from the Herero, who clearly have a different population history) is from the !Kung, 0.018. For populations for which we have high-resolution restriction-enzyme typing, the maximum raggedness is again among Europeans, 0.015.
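The raggedness statistic described above takes only a few lines of code. The Python sketch below is illustrative, and the function name is hypothetical. It assumes the distribution is first converted to proportions, since the reported values such as 0.012 appear to be on that scale. It sums only over the observed classes m0, m1, ..., following the description.

```python
def raggedness(counts):
    # counts[i]: number of pairs differing at i sites, i.e. m0, m1, m2, ...
    total = sum(counts)
    freqs = [c / total for c in counts]  # assumption: rescale to proportions first
    # differenced series m1 - m0, m2 - m1, ..., then its sum of squares
    return sum((b - a) ** 2 for a, b in zip(freqs, freqs[1:]))

# Hypothetical distributions, not data from this study
smooth = [1, 3, 6, 9, 11, 10, 8, 5, 3, 2, 1]
ragged = [12, 2, 9, 1, 8, 0, 7, 3, 6, 1, 5]
print(round(raggedness(smooth), 4), round(raggedness(ragged), 4))
```

The smooth, wave-like example scores roughly ten times lower than the jagged one (about 0.013 versus 0.15). That is the same direction of contrast as between the expansion and stationary simulations.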
Other, more familiar population genetic statistics can be computed from match distributions. For example, the mean of a mismatch distribution is the mean sequence divergence (MSD). MSD is a common statistic for describing sequence diversity, but simply looking at the mean discards information that is contained in the form of the distribution. There are other ways of summarizing mismatch distributions. The maximum sequence difference between any pair of mtDNAs is a useful statistic, because the theory of coalescence predicts that its expected value approaches a finite limit as the sample size becomes large. The maximum can provide an estimate of the time since separation of the most divergent pair of sequences, that is, the coalescence time, the age of the common ancestor of all the sequences. The leading edge of the mismatch distribution after an expansion is determined by the mismatch distribution of the initial population. The coalescence time is therefore greatly influenced by the size of the initial population and is a poor indicator of demographic history. While coalescence times are very important in contemporary population genetics theory, they do not provide useful information for anthropologists interested in demographic history and prehistory. The often quoted 200,000-year estimate of human mtDNA coalescence leads to an estimate of effective size in the remote past, but it does not correspond to any population event. The Rogers and Harpending theory shows that an episode of population growth generates a wave that moves to the right. The \"old\" distribution of pairwise differences moves right because the number of differences between descendants of pairs can only accumulate through time. New pairs with no differences appear because of their recent common descent, and their rate of appearance is inversely proportional to the \"new\" population size.
Hence the left or trailing edge of the distribution reflects the new population size after expansion, the right or leading edge preserves the form of the old distribution, and the peak reflects the age of the expansion in mutational time units. These contributions to sequence divergence are confounded in the mean and in the coalescence time. For example, sequence diversity and coalescence time are both greater within populations from Africa than within populations from the other continents. Does this greater diversity mean that African populations are older, or that the effective size of the population of Africa before the expansion was larger? In the former case, the theory suggests, the peak of the mismatch distribution from Africa should be farther to the right. In the latter case the peak should be as far along as the peak from other populations, but the leading edge should be broader. To estimate the time since separation of populations or since expansion, we use Rogers's (1993) method-of-moments estimator. The ith moment of a random variable X is the expectation of X^i, and the method is to equate theoretical moments with observed moments from the data. Our theory provides formulae for the theoretical moments of the mismatch distribution. For the full three-parameter model, the equations must be solved numerically using an algorithm that frequently fails to converge. Here we use a simpler two-parameter model obtained from the three-parameter one by letting the population size after expansion approach infinity. Rogers shows that this two-parameter model provides an excellent approximation when the population size after expansion is large. In addition, it is exact for mismatch distributions comparing pairs of individuals from separate populations that do not exchange genes. The two-parameter estimates are θ = √(v − m) and τ = m − θ, where m and v are the mean and the variance respectively of the observed mismatch distribution.
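The two-parameter estimator can be sketched in Python, together with a check that it recovers known parameters. The check assumes the sudden-expansion wave of Rogers and Harpending (1992) in the form F_i(τ) = F_i(∞) + exp[−τ(θ1 + 1)/θ1] Σ_{j=0}^{i} (τ^j/j!)[F_{i−j}(0) − F_{i−j}(∞)], with equilibrium terms F_i = θ^i/(θ + 1)^{i+1}. Here θ0 and θ1 are the values before and after expansion. That form is an assumption of this sketch and should be checked against Rogers and Harpending (1992). Function names are hypothetical, and the variance is taken over the distribution itself, dividing by the total weight.

```python
import math

def equilibrium(theta, i):
    # Constant-size mismatch probability of i differences: theta^i / (theta + 1)^(i + 1)
    return (theta / (theta + 1)) ** i / (theta + 1)

def expected_mismatch(theta0, theta1, tau, max_diff=60):
    # Assumed sudden-expansion wave (see lead-in), truncated at max_diff.
    decay = math.exp(-tau * (theta1 + 1) / theta1)
    dist = []
    for i in range(max_diff + 1):
        total, term = 0.0, 1.0  # term holds tau**j / j!
        for j in range(i + 1):
            if j > 0:
                term *= tau / j
            total += term * (equilibrium(theta0, i - j) - equilibrium(theta1, i - j))
        dist.append(equilibrium(theta1, i) + decay * total)
    return dist

def moment_estimates(weights):
    # weights[i]: count (or proportion) of pairs differing at i sites
    total = sum(weights)
    m = sum(i * w for i, w in enumerate(weights)) / total
    v = sum((i - m) ** 2 * w for i, w in enumerate(weights)) / total
    theta = math.sqrt(max(v - m, 0.0))  # assumption: clamp to 0 when v < m
    return theta, m - theta             # (theta, tau)

wave = expected_mismatch(theta0=2.0, theta1=1e9, tau=5.0, max_diff=200)
print(max(range(len(wave)), key=wave.__getitem__), moment_estimates(wave))
```

The printed estimates should be close to (2.0, 5.0). For a very large θ1, the assumed wave is the initial equilibrium distribution shifted by a Poisson number of differences with mean τ. Its mean is θ0 + τ and its variance is θ0 + θ0² + τ, so v − m = θ0².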
The estimate τ is in effect the mean sequence divergence less a \"correction\" for the size of the initial population. Under many circumstances this estimator behaves much like the mean sequence divergence, but under others it has a significantly smaller variance over many simulations of the same history. The estimates of expansion and separation times that we present were computed as Rogers's τ statistic. In waves resulting from population growth (see figure 2), the peak provides only a poor estimate of the time since the growth started. The peaks in the left two columns of figure 2 do not reliably occur at five differences. This suggests that it might be possible to distinguish growth that occurred 100,000 years ago from growth that occurred 50,000 years ago, but that no finer resolution is possible from a single locus (mtDNA). We are in somewhat the same position with mtDNA data as a paleontologist with a collection of fragments informative about a single trait such as browridges. We can discuss what they look like and use them to generate estimates of relationships among populations, but we have no information about how this trait is related to others. Several of the expansion populations in figure 2 have more than one mode. Multiple modes correspond to more or less distinct lineages in the data, with one mode corresponding to within-lineage differences and the other to between-lineage differences. Distinct lineages often appear in simulations of single random mating populations because of the way in which the deep branching points of the tree happen to be arranged. Several researchers have found distinct mitochondrial lineages and suggested that they correspond to colonization episodes or the like (Torroni et al. 1992, Ward et al. 1991, Horai et al. 1993). Such lineages, however, can also appear in samples from single populations.
Reconstructions of colonization episodes from mtDNA lineages should be regarded with caution. RESULTS We have used simulation to investigate two characteristics of pairwise comparisons of populations. The first is the common occurrence of leading intermatch distributions, in which the wave of differences between populations lies smoothly ahead of the wave within populations. The second is heterogeneity in age estimates of populations. Our simulations suggest that both patterns are signatures of real historical phenomena and not meaningless statistical accidents. Intermatch distributions. The right panel of figure 3 shows the match distributions of the !Kung and the Nuu Chah Nulth. These are based on 286 sites in HVS 1. The mismatch distributions have modes at five and six differences, while the intermatch distribution has a mode at nine differences. We used Rogers's moment estimators and corrected for the availability of only 286 shared base pairs in the available sequences. On that basis, these distributions suggest that !Kung ancestors underwent an expansion 48,000 years ago and Nuu Chah Nulth ancestors 36,000 years ago. They also suggest that both are derived from an ancestral population that split 96,000 years ago. Simulations suggest that none of these numbers should be taken as a very precise estimate of anything. There is nevertheless a clear pattern here of the intermatch distribution's leading the mismatch distributions: there are many more sequence differences between populations than there are within populations. The left panel of figure 3 shows match distributions in a comparison of Asian and Middle Eastern populations at 234 positions in HVS 1. All three distributions are very similar to the eye, and Rogers's estimator suggests that their common ancestor underwent an expansion 80,000 years ago. The intermatch distribution does not lead the mismatch distributions. Indeed, comparisons between the two populations have nearly the same distribution as comparisons within either one, except that shared types occur only within populations.
Do these pictures provide reliable information about population history? The comparison of !Kung and Nuu Chah Nulth shows in extreme form a pattern that is common to all our data sets: the intermatch distribution "leads" the mismatch distributions (see tables 2-4). The theory developed by Rogers and Harpending (1992) immediately suggests why this should be so. If a population splits into two isolated daughter populations, gene identity cannot accumulate between the populations; as differences accumulate between them, a rightward-moving wave should be generated that is much like the wave from a population expansion, but simpler. Consider pairwise comparisons of samples from populations that have a common origin but then exchange no migrants. Immediately after they separate, a fraction of these pairwise comparisons will be between identical sequences. After one unit of mutational time, the descendants of these will differ, on average, by one mutation; after two units of mutational time they will differ by two mutations; and so on. The theoretical intermatch wave does not disperse and leave behind a mass near the origin that corresponds to new identity by descent within a population. It simply travels forever to the right, although it becomes flatter and flatter.
Fig. 3. Mismatch distributions from (left) Asian (dashes) and Middle Eastern (dots) populations along with the intermatch distribution (solid line) and (right) the same for !Kung Bushmen of southern Africa (dots) and the Nuu Chah Nulth of North America (dashes). The intermatch distribution leads the mismatch distributions by several units of mutational time in the right-panel comparison, while in the left all three distributions are nearly coincident.
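The travelling intermatch wave described above can be sketched numerically. Assuming infinite sites and no migration after the split, a pair that differed at j sites at the moment of separation differs at j + K sites τ units of mutational time later, where K is Poisson with mean τ; no mass ever returns to zero because no new identity by descent can arise between the populations. The Python sketch below (function names are hypothetical, and this is an illustration of that reasoning rather than the authors' code) shifts an initial distribution by this Poisson kernel.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def advance_intermatch(f0: list[float], tau: float, max_k: int) -> list[float]:
    """Intermatch distribution tau units of mutational time after a split.

    f0[j] is the fraction of between-population pairs differing at j sites
    when the populations separate. With no migrants, each pair only gains a
    Poisson(tau) number of new differences (infinite-sites assumption).
    Probabilities beyond max_k are truncated.
    """
    f = [0.0] * (max_k + 1)
    for j, p in enumerate(f0):
        for i in range(j, max_k + 1):
            f[i] += p * poisson_pmf(i - j, tau)
    return f


def mean_of(f: list[float]) -> float:
    """Mean number of differences under distribution f."""
    return sum(i * p for i, p in enumerate(f))
```

Starting from `f0 = [1.0]` (every between-population pair identical at the split), the result after τ units is simply Poisson(τ): its mean moves right by one difference per unit of mutational time, and its peak drops (about 0.37 at τ = 1 versus about 0.20 at τ = 4), which is the "flatter and flatter" behaviour described in the text.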
Applying this theory to the mismatch distributions in figure 3 suggests that the ancestral population from which !Kung and Nuu Chah Nulth were derived split several tens of thousands of years before the expansion of the daughter populations, while the same logic suggests that Asians and Middle Easterners expanded from a single ancestral population. In tables 2-4 the elements above the diagonal (Rogers's estimates of time from the intermatch distributions) are on average greater than the diagonal entries. This pattern suggests that these populations are derived from ancestral populations that were isolated from each other for many millennia. For our high-resolution restriction-enzyme typing, the average estimate of expansion time is 68,000 years, while the average estimate of separation time is 86,000 years. For HVS 1 the estimates are 50,000 and 87,000 years respectively (the mean expansion-time estimate being artifactually low because it includes the Herero at 7,000 years), and for HVS 1/2 they are 55,000 and 104,000 years. Figure 4 shows the results of simulating a pair of populations that exchanged ten migrants per generation (i.e., were essentially equivalent to one randomly mating population) until ten units of mutational time in the past, after which they exchanged no migrants, and that underwent expansion five units of mutational time ago. The intermatch distributions regularly lead the mismatch distributions; in other words, we can generate simulated data that look like the !Kung and Nuu Chah Nulth comparison in figure 3 by placing the separation several units of mutational time before the expansion. At the same time, figure 5 and table 5 show that populations that separate at the same time as they expand from a single population almost never exhibit this leading intermatch distribution. What we have called the strong Garden of Eden hypothesis, that ancestral anatomically modern humans grew and dispersed rapidly from a single origin, is incompatible with the mtDNA data.
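The comparison across tables 2-4 amounts to averaging the diagonal entries (expansion-time estimates from mismatch distributions) and the above-diagonal entries (separation-time estimates from intermatch distributions) of each table. The Python sketch below does that, and adds two hypothetical helpers for the simulation summaries discussed with table 5: one reports the mean, sample standard deviation, and fraction of replicate runs whose intermatch lead is at least 3 units, and one converts mutational time to years using the figure of about 12,000 years per unit given for the simulations. How each run's lead was measured is not specified here, so this sketch simply takes per-run leads as input; any table values passed in are placeholders, not the paper's estimates.

```python
import statistics

# Approximate conversion given for the simulations: one unit of mutational
# time corresponds to about 12,000 years.
YEARS_PER_UNIT = 12_000


def diagonal_vs_upper(table: list[list[float]]) -> tuple[float, float]:
    """Mean of the diagonal and mean of the entries above the diagonal
    of a square table (e.g., expansion vs. separation time estimates)."""
    n = len(table)
    diag = [table[i][i] for i in range(n)]
    upper = [table[i][j] for i in range(n) for j in range(i + 1, n)]
    return statistics.mean(diag), statistics.mean(upper)


def summarize_leads(leads: list[float], threshold: float = 3.0) -> tuple[float, float, float]:
    """Mean, sample standard deviation, and fraction of runs with
    lead >= threshold (units of mutational time). Needs at least two runs."""
    mean = statistics.mean(leads)
    sd = statistics.stdev(leads)
    frac = sum(lead >= threshold for lead in leads) / len(leads)
    return mean, sd, frac


def units_to_years(units: float) -> float:
    """Convert units of mutational time to years at ~12,000 years per unit."""
    return units * YEARS_PER_UNIT
```

If the same conversion is applied to the scenario of figure 4, isolation beginning 10 units ago and expansion 5 units ago correspond to roughly 120,000 and 60,000 years.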
The simulations in figure 5 show distributions from pairs of populations that expanded five units of mutational time in the past from ancestral populations that were partially isolated from each other, exchanging one migrant per generation. Partial isolation at this level does not generate leading intermatch distributions. We must postulate migration rates on the order of one migrant every ten generations between ancestral populations before leading intermatch distributions appear in our simulations. Table 5 shows the results of large numbers of simulations of these conditions. Each row summarizes the results of 100 simulations with various migration rates before and after a population expansion five units of mutational time ago. (Each unit of mutational time in the simulation corresponds to about 12,000 years.) If the migration rate were as low as 1 woman per 10 generations in the remote past, then leading intermatch distributions could occur. But if the rate were as high as 1 woman per generation, an intermatch leading the mismatches by 3 units of mutational time would be very rare, and if the ancestral populations were exchanging as many as 10 women per generation, it would almost certainly never occur. If, for example, the migration rate were 10 per generation before 5 units of time in the past and only 1 per 10 generations since the expansion, the average lead of the intermatch would be 0.09 units of time with a standard
Fig. 4. Mismatch distributions and intermatches from six simulations of a pair of populations that grew from θ = 1 to θ = 1,000 at t = 5 units of mutational time in the past, exchanging 10 migrants per generation before 10 units of mutational time ago and no migrants since then.
expanded from θ = 1 to θ = 1,000