
Nature article on giant panda sequencing


letters

Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation
Shancen Zhao1,2,10, Pingping Zheng1,3,10, Shanshan Dong2,10, Xiangjiang Zhan1,10, Qi Wu1,10, Xiaosen Guo2, Yibo Hu1, Weiming He2, Shanning Zhang4, Wei Fan2, Lifeng Zhu1, Dong Li2, Xuemei Zhang2, Quan Chen2, Hemin Zhang5, Zhihe Zhang6, Xuelin Jin7, Jinguo Zhang8, Huanming Yang2, Jian Wang2, Jun Wang2,9 & Fuwen Wei1
The panda lineage dates back to the late Miocene1 and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography2,3, their contribution to panda population dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome4 to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two population expansions, two bottlenecks and two divergences. Evidence indicated that, whereas global changes in climate were the primary drivers of population fluctuation for millions of years, human activities likely underlie recent population divergence and serious decline. We identified three distinct panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years.

We carried out whole-genome resequencing of 34 wild giant pandas (Fig. 1a and Supplementary Table 1). This sample constitutes ~2% of the current estimates of the entire wild panda population5, the highest percentage of individuals assessed for existing animal population genomics studies. Genome alignment indicated an average of 91.5% sequencing coverage and 4.7-fold depth for each individual relative to the panda’s 2.25-Gb genome4. To improve SNP inference quality, we estimated the probabilities of individual genotypes and population allele frequencies for each site6 and identified a total of 13,020,055 SNPs with ≥99% probability of being variable over the panda population. We inferred three distinct genetic clusters, Qinling (QIN), Minshan (MIN) and Qionglai-Daxiangling-Xiaoxiangling-Liangshan
(QXL), among the current panda population using frappe7, Admixture8 and an allele-shared matrix (Online Methods) (Fig. 1 and Supplementary Fig. 1). Previous studies only showed a distinct QIN cluster9; our larger study revealed that the MIN and QXL populations were also genetically distinct. We found no population substructure present in the QIN or MIN population but detected two subpopulations within the QXL population (K = 4; Fig. 1b and Supplementary Fig. 1): one comprising Xiaoxiangling and some Qionglai individuals and the other comprising Daxiangling, Liangshan and the remaining Qionglai individuals. The fixation index (FST)10 strongly supported this three-population stratification (Supplementary Table 2). Principal-components analysis (PCA)11 provided additional corroborative evidence. The first eigenvector separated these three genetic populations (P < 0.05) (Fig. 1c and Supplementary Fig. 2). The second eigenvector indicated that the Liangshan population was separate from the other populations, but this assignment was ambiguous because of limited sampling in the Liangshan population (n = 2 individuals). Overall, the three populations showed similar genetic diversity (1.04–1.30 × 10⁻³ for Watterson’s estimator (θw) and 1.13–1.37 × 10⁻³ for the average pairwise diversity within populations (θπ); Supplementary Table 3) as humans12, confirming the results from a study using ten microsatellite loci that indicated that the panda has substantial genetic variability9.

To reconstruct the demographic history of the giant panda, we used the pairwise sequentially Markovian coalescent (PSMC) model13 to examine changes in the local density of heterozygotes across the panda genome4. PSMC analysis showed a well-defined demographic history from 8 million to 20,000 years ago (Fig. 2a), a period covering the chronological distribution of three fossil panda species or subspecies (primal panda Ailurarctos lufengensis, pygmy panda Ailuropoda microta and baconi panda Ailuropoda melanoleuca baconi)1,14. Considering the time since the origin of the panda, demography showed population peaks at ~1 million years ago and

© 2013 Nature America, Inc. All rights reserved.

1Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China. 2Shenzhen Key Laboratory of Transomics Biotechnologies, BGI-Shenzhen, Shenzhen, China. 3College of Life Sciences, University of the Chinese Academy of Sciences, Beijing, China. 4China Wildlife Conservation Society, Beijing, China. 5China Conservation and Research Center for the Giant Panda, Wolong, China. 6Chengdu Research Base of Giant Panda Breeding, Chengdu, China. 7Shaanxi Wild Animal Rescue and Research Center, Louguantai, Xi’an, China. 8Beijing Zoo, Beijing, China. 9Department of Biology, University of Copenhagen, Copenhagen, Denmark. 10These authors contributed equally to this work. Correspondence should be addressed to F.W. (weifw@ioz.ac.cn) or Jun Wang (wangj@genomics.org.cn). Received 13 March; accepted 14 November; published online 16 December 2012; doi:10.1038/ng.2494

Nature Genetics VOLUME 45 | NUMBER 1 | JANUARY 2013

[Figure 1, panels a to d: (a) map of sampling sites in the Qinling, Minshan, Qionglai, Daxiangling, Xiaoxiangling and Liangshan Mountains; (b) frappe ancestry plots for K = 2 to K = 7; (c) PCA plot of principal component 1 against principal component 2; (d) neighbor-joining tree of the QIN, MIN and QXL pandas with the polar bear as outgroup.]

Figure 1 Current geographic populations of the giant panda and inferred genetic populations. (a) Sampling sites and genetic structure detected by frappe analysis (K = 3 populations) were mapped using ArcGIS v9.2 on the basis of the proportion of an individual’s ancestry attributed to a given population. The genetic QIN population is shown in red, the MIN population is shown in yellow, and the QXL population is shown in green. Inset, the shaded area represents current panda habitats. (b) Genetic populations of the studied pandas inferred by frappe analysis. The number of populations (K) was predefined from 2 to 7. Symbols following each panda ID indicate where sampling occurred. (c) Results obtained from PCA using autosomal SNPs. Principal components 1 and 2 are shown. (d) A rooted neighbor-joining tree constructed from the allele-shared matrix of SNPs among the wild pandas, with the polar bear as an outgroup. The scale bar represents the p distance.

~40,000 years ago and population bottlenecks at ~0.2 million years ago and ~20,000 years ago (Fig. 2a). Notably, we found that these fluctuations in effective population size (Ne) were significantly negatively correlated with changes in the amount of atmospheric dust, as inferred by the mass accumulation rate (MAR) of Chinese loess15 (Pearson’s correlation R = −0.30, P < 0.05), an index indicating cold and dry or warm and wet climatic periods in China. The first population expansion coincided with a dietary switch to bamboo ~3 million years ago when pygmy pandas emerged16. Fossil evidence indicates that the earliest (primal) pandas were omnivores or carnivores, living in swamp habitats lacking bamboo1, whereas pygmy pandas mainly ate bamboo, as indicated by their specialized cranial and dental adaptations16,17. This hypothesis is supported at the molecular level by the concurrent pseudogenization of the umami taste gene Tas1r1 associated with the pandas’ decreased reliance on meat18. The low levels of MAR during that time (Fig. 2a) indicate warm and wet weather conditions, which were ideal for the spread of bamboo forests. The panda population declined around 0.7 million years ago, and the first bottleneck occurred about 0.2 million years ago (Fig. 2a), around the same time as the two largest Pleistocene glaciations in China, the Naynayxungla Glaciation (0.78–0.50 million years ago) and the Penultimate Glaciation (0.30–0.13 million years ago)19. Additionally, fossil evidence indicated that, from ~0.75 million years ago, the pygmy panda had been replaced by the subspecies A. melanoleuca baconi, which has the largest body size of all the panda species14.
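The Ne and MAR relationship reported earlier in this passage (Pearson’s R = −0.30) is an ordinary Pearson coefficient computed over matched time intervals. A minimal sketch of that statistic follows; the function name and the toy values are illustrative assumptions and are not the paper’s data (the study itself used SPSS on 46 intervals).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values only: Ne per time interval and the MAR averaged
# over the same intervals. These numbers are made up for illustration.
ne_toy = [6.0, 5.2, 4.1, 3.0, 2.2]
mar_toy = [4.0, 6.5, 5.0, 9.0, 11.0]

r = pearson_r(ne_toy, mar_toy)
print(f"toy Pearson r = {r:.2f}")  # negative: Ne falls as MAR rises
```

A negative coefficient here would correspond to smaller effective population sizes in dustier (colder, drier) intervals, which is the direction of the association the text describes.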

Figure 2 Demographic history of the giant panda reconstructed from the reference and population resequencing genomes. (a) PSMC result showing demographic history from the panda’s origin to 10,000 years ago. The red line represents the estimated effective population size (Ne), and the 100 thin blue curves represent the PSMC estimates for 100 sequences randomly resampled from the original sequence. The brown line shows the MAR of Chinese loess15. Generation time (g) = 12 years, and neutral mutation rate per generation (μ) = 1.29 × 10⁻⁸. The approximate chronological ranges of three fossil panda species or subspecies (primal, pygmy and baconi panda) are shaded in pink, orange and blue, respectively. Note that PSMC simulation cannot detect population changes more recent than 20,000 years ago. (b) ∂a∂i result showing the demographic history of the panda from ~300,000 years ago to the present. The density of the heatmap shows fluctuations in effective population size. Two population divergences (304,664 and 2,777 years ago) and a population peak (38,879 years ago) are indicated by dashed lines. The average number of migrants per year between any two populations in each time interval is shown beside each arrow.

[Figure 2, panels a and b: (a) effective population size (×10⁴) and MAR plotted against years before the present, with the ranges of A. lufengensis, A. microta and A. melanoleuca baconi marked; (b) heatmap of Ne for the QIN, MIN and QXL populations against years before the present, with migration rates shown beside arrows.]
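The Figure 2 caption quotes g = 12 years and μ = 1.29 × 10⁻⁸. The short check below reproduces that value from the inputs stated in Online Methods (3.53% panda and polar bear sequence divergence, a 16.4-million-year split). The function and argument names are illustrative assumptions, not part of any published pipeline.

```python
def mutation_rate_per_generation(divergence, generation_years, split_years):
    """mu = (d * g) / (2 * T).

    The per-site divergence d accumulates along both lineages for T years
    (hence the factor 2), and g converts the per-year rate to per-generation.
    """
    return (divergence * generation_years) / (2.0 * split_years)

mu = mutation_rate_per_generation(
    divergence=0.0353,     # panda vs polar bear sequence divergence (3.53%)
    generation_years=12,   # mean panda generation time g
    split_years=16.4e6,    # estimated divergence time of the two species
)
print(f"mu = {mu:.2e} mutations per site per generation")
```

With these inputs the expression evaluates to about 1.29 × 10⁻⁸, matching the caption. Any change to the assumed generation time or divergence date rescales both axes of the PSMC curve proportionally.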

A cold climate, as evidenced by high MAR (Fig. 2a), might have contributed to the extinction of pygmy pandas and facilitated the origin of baconi pandas, or, possibly, the larger baconi pandas evolved from pygmy pandas as they adapted to the extreme weather. The second population expansion occurred after the retreat of the Penultimate Glaciation19 (Fig. 2a, MAR decline), and the panda population reached its pinnacle between 30,000 and 50,000 years ago. The warm weather during the Greatest Lake Period (30,000–40,000 years ago) could have contributed to the population expansion, as would the alpine conifer forests, the primary habitat for pandas20, having reached their greatest extent at this time (Supplementary Fig. 3)21. The second population bottleneck occurred during the last glacial maximum (~20,000 years ago), when substantial alpine glaciations (for example, Gongga glacial II; ref. 19) would likely have resulted in extensive loss of panda habitats.

Reconstruction of more recent panda demographic history could not be carried out using the PSMC approach because the power of this approach is greatly reduced for events occurring more recently than 20,000 years ago (Fig. 2a), owing to the limited number of recombination events in a single genome in this relatively short time interval13. We therefore used diffusion approximations for demographic inference (∂a∂i)22 to simulate recent demographic fluctuations on the basis of the SNPs we identified in our panda populations. The results of ∂a∂i analysis overlapped with and supported the PSMC findings of the second population expansion and its subsequent decline (Supplementary Fig. 4) and provided information on panda population history up to the present. This simulation showed that the QIN and non-QIN populations diverged ~0.3 million years ago (95% confidence interval (CI) of 0.1–0.7 million years ago; Fig. 2b), corresponding with the onset of the Penultimate Glaciation19. About 40,000 years ago (CI = 4,900–58,900 years ago), the non-QIN population expanded by 300%, while the QIN population lost ~80% of its initial effective size; this occurred at a time when there was marked concurrent habitat expansion in the regions inhabited by non-QIN pandas (Supplementary Fig. 3). After this event, the non-QIN population began to decline, while the QIN population remained stable. Our data showed that about 2,800 years ago (CI = 400–4,100 years ago), the non-QIN cluster diverged into the MIN and QXL populations, which gave rise to today’s pattern of three genetically distinct panda populations. These three populations further fluctuated but in different ways: the QIN population decreased, the MIN population increased slightly, and the QXL population increased more substantially (Fig. 2b and Supplementary Table 4). The QIN population’s decline correlated with the most extensive linkage disequilibrium (LD; Supplementary Fig. 5).

Probable causes of the QIN population decline include habitat loss and human activities. Arboreal pollen studies have indicated that there was a continuing and extensive decline in forest habitats in northern China, including the area populated by QIN pandas, around 4,000 years ago23; however, paleobiological studies have indicated that this was unlikely to have been caused by concurrent changes in climate, as there was no differential impact on the populations of wet habitat–adapted species (for example, Pinus species) and dry habitat–adapted species (for example, Quercus species)24. Instead, there is evidence indicating that deforestation in the QIN-populated area was associated with anthropogenic disturbances. At the beginning of the Spring and Autumn Period in China (770–486 BC), revolutionary improvements in farming technology greatly advanced local agricultural development and increased the capacity to reclaim woodland (Supplementary Note). For the next 2,500 years (~500 BC–present), the northern Qinling region remained one of the most prosperous areas in central China, giving rise to centralized geopolitical power and extensive human settlements24, but resulting in depletion of the surrounding forest (Supplementary Note). Additionally, humans hunted and raised giant pandas for entertainment, sacrifice (during the Han Dynasty), and rewards (during the Tang Dynasty) (Supplementary Note).
[Figure 3: bar chart of gene number by KEGG category, grouped under Organismal systems, Environmental information processing, Cellular processes, Metabolism and Genetic information processing. Series: Directional (QIN versus non-QIN), Balancing (QIN versus non-QIN) and Directional (MIN versus QXL).]

Figure 3 Annotation of genes containing selected SNPs on the basis of the KEGG database. Red bars, the number of genes under directional selection between the QIN and non-QIN panda populations; yellow bars, the number of genes under balancing selection between the QIN and non-QIN populations; blue bars, the number of genes under directional selection between the MIN and QXL populations.

Together, along with lack of evidence for a climatic effect, these human activities seem to have had an important role in the decline of the QIN population. It does remain possible that stochastic ecological events (for example, large-scale bamboo flowering and die-off or infectious disease) or panda-relevant physiological problems (for example, degenerative reproductive ability) contributed to this decline. However, there is no solid evidence to indicate that panda reproductive ability was compromised9 nor any support for stochastic events, leaving anthropogenic disturbance as the most plausible explanation for the decline.

The non-QIN (MIN and QXL) clusters were geographically divided by the Min River valley, along which the ancient Shu people established a kingdom (6,700–2,300 years ago) and built the most important road connecting their kingdom with the outside world (Supplementary Fig. 6)25. Such a geographic barrier, accompanied by regional deforestation and human activity, might have established the initial separation of the two populations ~2,800 years ago (Fig. 2b). The increase in the MIN and QXL populations (Fig. 2b) coincided with the retreat of the Shu people from panda habitats to the lowlands and abandonment of the road25. Regional reduction in human activities should have allowed habitat recovery in these regions. Alternatively, colonization of new habitats would have enabled population expansion. About 2,400 years ago, a new north-south road was established in the Qinling Mountains (Supplementary Fig. 6), which would have placed more anthropogenic pressure on the QIN pandas, resulting in the decline of this population.

To examine local population adaptation, we used coalescence-based simulation methods26,27, a Bayesian test28 and a ‘model-free’ global FST test29 to detect selection signals in coding DNA sequences (CDS) across the whole genome (Online Methods). Between QIN and non-QIN populations, a total of 111 (134 SNPs) and 152 (212 SNPs) genes were found to be under directional and balancing selection, respectively (Supplementary Table 5). KEGG annotation showed that the largest groups of selected genes were involved in the sensory system (Fig. 3). Of these, two genes, Tas2r49 and Tas2r3, were directionally selected across the two panda populations (Supplementary Table 6). Studies in humans have indicated that these genes are functionally relevant to bitter taste30, and they may have a similar role in pandas. A derived allele frequency (DAF) test31 indicated that Tas2r49 was positively selected in the QIN population. Consistent with this finding, field observations showed that QIN pandas, compared to non-QIN (for example, QXL32) pandas, consume more bamboo leaves33, which are higher in alkaloids (a major bitter component) than other parts of the plant. We also found 8 olfactory receptor genes under directional selection and 24 under balancing selection (Supplementary Tables 6 and 7). Odor perception as a form of olfactory communication is crucial for panda reproduction and survival, given their solitary existence in dense forest34. However, only ligands of the gene OR52R1 have been identified in giant panda scent marks35 (Supplementary Table 7). More detailed analysis of the function of the other olfactory receptor genes might therefore be worthwhile.

The MIN and QXL populations, compared to the QIN and non-QIN populations, have fewer directionally selected genes (n = 44; Supplementary Table 8), indicating less variation in the selection processes between them, which is consistent with their lower interpopulation habitat heterogeneity5 and genetic differentiation (Supplementary Table 2). The largest group of selected genes in these populations is related to the sensory system (Fig. 3), but we saw no directional selection signals for the two receptor genes involved in bitter taste, Tas2r3 and Tas2r49. We also identified eight olfactory receptor genes under directional selection in the MIN and QXL populations, but only OR51L1 overlapped with those identified in the QIN and non-QIN populations (Supplementary Tables 6 and 8).

Giant pandas once inhabited most of China and neighboring countries in southeast Asia, but today they are confined to six relatively isolated mountain habitats in western China36,37. In this study, integration of genomic and population genomics approaches provided a continuous outline of the history of the panda population and demonstrated that recent anthropogenic disturbances are likely a major reason for the panda’s current endangered status. Although the presence of substantial genetic diversity in panda populations improves our chances of saving this iconic species, human activities have already fragmented some populations (for example, the QXL population) into small geographically isolated populations (for example, Xiaoxiangling), putting them at greater risk of extinction in the long term37. For such small populations, translocation of wild-caught individuals or release of captive-bred individuals might be a useful means for genetic rescue by reestablishing gene flow. However, our data indicate that it will be important to monitor the evidence for selection and local adaptation in these fragmented panda populations, as reintroduction candidates ill-suited to a particular environment will be unlikely to promote the development of a robust population. This study may also serve as a model for other endangered species in assessing and establishing the most effective long-term conservation solutions.

URLs. Giant panda genome, http://gigadb.org/giant-panda/; polar bear (Ursus maritimus) genome, http://gigadb.org/polar-bear/; LASTZ (at the Miller Lab website), http://www.bx.psu.edu/miller_lab/; TreeBeST, http://treesoft.sourceforge.net/treebest.shtml; frappe, http://med.stanford.edu/tanglab/software/frappe.html; Ensembl, http://asia.ensembl.org/index.html; KEGG, http://www.genome.jp/kegg/; GenBank, http://www.ncbi.nlm.nih.gov/genbank/.
METHODS
Methods and any associated references are available in the online version of the paper.

Accession codes. Panda resequencing reads have been deposited in the NCBI Short Read Archive (SRA) under accession SRA053353.

Note: Supplementary information is available in the online version of the paper.

ACKNOWLEDGMENTS
This study was supported by grants from the National Natural Science Foundation of China (31230011), the Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-Z-4) and the State Forestry Administration of China. We thank the Chongqing Zoo, the Fuzhou Research Center of the Giant Panda, the Shanghai Zoo, the Shanghai Wildlife Park and the Zhengzhou Zoo for assistance during sample collection. We acknowledge T. Meng for generation of the panda distribution map, R.N. Gutenkunst for suggestions on analysis with ∂a∂i, H. Li for suggestions on PSMC simulations and L. Goodman, J. Elser, M. Holyoak, S. Kumar and R.R. Swaisgood for comments and revisions of this manuscript. We also thank G. Tian, M. Jian, H. Jiang, M. Zhao, Q. Zhang, B. Wang, Y. Huang, G. Wang, C. Lin and F. Xi for laboratory assistance and B. Li for assistance on polar bear data analysis.
10. Weir, B.S. & Cockerham, C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358 (1984). 11. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006). 12. The Bovine HapMap Consortium. Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science 324, 528–532 (2009). 13. Li, H. & Durbin, R. Inference of human population history from individual wholegenome sequences. Nature 475, 493–496 (2011). 14. Wang, J. On the taxonomic status of species, geological distribution and evolutionary history of Ailuropoda. Acta Zool. Sinica 20, 191–201 (1974). 15. Sun, Y.B. & An, Z.S. Late Pliocene-Pleistocene changes in mass accumulation rates of eolian deposits on the central Chinese Loess Plateau. J. Geophys. Res. 110, D23101 (2005). 16. Pei, W. Evolutionary history of giant pandas. Acta Zool. Sinica 20, 188–190 (1974). 17. Jin, C. et al. The first skull of the earliest giant panda. Proc. Natl. Acad. Sci. USA 104, 10932–10937 (2007). 18. Zhao, H., Yang, J., Xu, H. & Zhang, J. Pseudogenization of the umami taste receptor gene Tas1r1 in the giant panda coincided with its dietary switch to bamboo. Mol. Biol. Evol. 27, 2669–2673 (2010). 19. Zheng, B., Xu, Q. & Shen, Y. The relationship between climate change and Quaternary glacial cycles on the Qinghai-Tibetan Plateau: review and speculation. Quat. Int. 97–98, 93–101 (2002). 20. Hu, J. & Wei, F. Comparative ecology of giant pandas in the five mountain ranges of their distribution in China. in Giant Pandas: Biology and Conservation (eds. Lindburg, D. & Baragona, K.) 137–148 (University of California Press, London, 2004). 21. Zhan, X., Zheng, Y., Wei, F., Bruford, M.W. & Jia, C. Molecular evidence for Pleistocene refugia at the eastern edge of the Tibetan Plateau. Mol. Ecol. 20, 3014–3026 (2011). 22. Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H. & Bustamante, C.D. 
Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009). 23. Ren, G. & Beug, H.J. Mapping Holocene pollen data and vegetation of China. Quat. Sci. Rev. 21, 1395–1422 (2002). 24. Ren, G. Decline of the mid-to-late Holocene forests in China: climatic change or human impact? J. Quaternary Sci. 15, 273–281 (2000). 25. Ren, N. Illustrations and Annotations of Huayang Guo Zhi (Shanghai Ancient Books Publishing House, Shanghai, 1987). 26. Excoffier, L. & Lischer, H.E.L. Arlequin suite ver3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010). 27. Beaumont, M.A. & Nichols, R.A. Evaluating loci for use in the genetic analysis of population structure. P. Roy. Soc. Lond. B. Bio. 263, 1619–1626 (1996). 28. Foll, M. & Gaggiotti, O. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977–993 (2008). 29. Cockerham, C.C. & Weir, B. Estimation of gene flow from F-statistics. Evolution 47, 855–863 (1993). 30. Meyerhof, W. et al. The molecular receptive ranges of human TAS2R bitter taste receptors. Chem. Senses 35, 157–170 (2010). 31. Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007). 32. Schaller, G.B., Hu, J., Pan, W. & Zhu, J. The Giant Panda of Wolong (University of Chicago Press, Chicago, 1985). 33. Pan, W. et al. A Chance for Lasting Survival (Beijing University Press, Beijing, 2001). 34. Swaisgood, R.R. et al. Chemical communication in giant pandas. in Giant Pandas: Biology and Conservation (eds. Lindburg, D.G. & Baragona, K.) 106–120 (University of California Press, Berkeley, CA, 2004). 35. Hagey, L. & MacDonald, E. Chemical cues identify gender and individuality in giant pandas (Ailuropoda melanoleuca). J. Chem. Ecol. 29, 1479–1488 (2003). 36. 
Zhan, X. et al. Molecular censusing doubles giant panda population estimate in a key nature reserve. Curr. Biol. 16, R451–R452 (2006). 37. Zhu, L. et al. Drastic reduction of the smallest and most isolated giant panda population: implications for conservation. Conserv. Biol. 24, 1299–1306 (2010).

AUTHOR CONTRIBUTIONS
F.W. designed the research and interpreted data. Jun Wang led the genome sequencing and supervised the analysis. P.Z., S. Zhang, L.Z., H.Z., Z.Z., X.J. and J.Z. prepared the samples. S. Zhao, P.Z., X. Zhan, Y.H., Jian Wang and H.Y. performed research. S. Zhao, Q.W., S.D., X. Zhan, P.Z., X.G., W.H., W.F., D.L., X. Zhang and Q.C. analyzed the data. X. Zhan, F.W., P.Z., S. Zhao, Q.W. and S.D. wrote and revised the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/doifinder/10.1038/ng.2494. Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
1. Qiu, Z. & Qi, G. Ailuropoda found from the late Miocene deposits in Lufeng, Yunnan. Vertebrata Palasiatica 27, 153–169 (1989). 2. Root, T.L. & Schneider, S.H. Ecology and climate: research strategies and implications. Science 269, 334–341 (1995). 3. Hewitt, G. The genetic legacy of the Quaternary ice ages. Nature 405, 907–913 (2000). 4. Li, R. et al. The complete genome sequence of the giant panda. Nature 463, 311–317 (2010). 5. State Forestry Administration. The 3rd National Survey Report on Giant Panda in China (Science Press, Beijing, 2006). 6. Yi, X. et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329, 75–78 (2010). 7. Tang, H., Peng, J., Wang, P. & Risch, N.J. Estimation of individual admixture: analytical and study design considerations. Genet. Epidemiol. 28, 289–301 (2005). 8. Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009). 9. Zhang, B. et al. Genetic viability and population history of the giant panda, putting an end to the ‘‘Evolutionary Dead End’’? Mol. Biol. Evol. 24, 1801–1810 (2007).

ONLINE METHODS

Sampling information. Blood and tissue samples were obtained from 34 wild giant pandas. Sampling covered the 6 main geographic distributions, with 8 individuals from the Qinling Mountains, 7 from the Minshan Mountains, 15 from the Qionglai Mountains, 2 from the Liangshan Mountains, 1 from the Daxiangling Mountains and 1 from the Xiaoxiangling Mountains (Supplementary Table 1).

Library construction and sequencing. Genomic DNA was extracted from blood or muscle samples. For each individual, 1–3 μg of DNA was sheared into fragments of 200–800 bp with the Covaris system. DNA fragments were then treated according to the Illumina DNA sample preparation protocol: fragments were end repaired, A-tailed, ligated to paired-end adaptors and PCR amplified with 500-bp inserts for library construction. Sequencing was performed on the Illumina HiSeq 2000 platform, and 100-bp paired-end reads were generated.

Read alignment. We used the Burrows-Wheeler Aligner38 to map paired-end reads to the reference genome4. First, the reference was indexed. Second, the command ‘aln -t 3 -e 10’ was used to find the suffix array coordinates of good matches for each read. Third, the command ‘sampe -a 500 -o 1000’ converted suffix array coordinates to pseudochromosomal coordinates and paired reads. Other parameters were set to the default.

Population SNP detection. We adopted an algorithm using a Bayesian approach to detect population SNPs6. Details of SNP calling are provided in the Supplementary Note.

SNP validation. A total of 366 SNPs from 4 pandas were validated by PCR and Sanger sequencing. In the test, 355 SNPs were confirmed to be polymorphic, 8 of which were homozygous; 3 SNPs were erroneously inferred. The error rate for population SNP calling was estimated to be 0.82–3.01%.

Principal-components analysis. We conducted PCA on autosomal biallelic SNPs using EIGENSOFT 3.0 software11. Eigenvectors from the covariance matrix were generated with the R function reigen, and significance levels were determined using the Tracy-Widom test (Supplementary Table 9).

Phylogenetic tree inference. We identified homologous regions between the panda and polar bear genomes using LASTZ (see URLs) and extracted SNPs within syntenic regions. Genotypes at the polar bear SNPs were considered to be the outgroup at corresponding positions. A neighbor-joining rooted tree was generated by TreeBeST (see URLs).

Population structure analyses. Genetic structure was inferred using the programs frappe7 and Admixture8, which implement an expectation-maximization algorithm and a block-relaxation algorithm, respectively. To explore the convergence of individuals, we predefined the number of genetic clusters K from 2 to 7 and ran both programs 5 times. The maximum iteration of the expectation-maximization algorithm was set to 10,000 in the frappe analysis. Default methods and settings were used in the Admixture analysis.
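The three steps described under Read alignment can be sketched as command lines. This is a minimal sketch and not the authors' pipeline: it only builds the argument lists without running anything, the file names and the `.sai` naming scheme are hypothetical placeholders, and the use of the `-f` output option is an assumption about the installed BWA version. The flags `-t 3 -e 10` and `-a 500 -o 1000` are the ones quoted in the text.

```python
from typing import List


def bwa_commands(reference: str, fq1: str, fq2: str, out_sam: str) -> List[List[str]]:
    """Build the index, aln and sampe command lines for one paired-end library."""
    # Hypothetical naming scheme for the intermediate suffix-array files.
    sai1 = fq1 + ".sai"
    sai2 = fq2 + ".sai"
    return [
        # Step 1: index the reference genome.
        ["bwa", "index", reference],
        # Step 2: find suffix array coordinates for each read file ('aln -t 3 -e 10').
        ["bwa", "aln", "-t", "3", "-e", "10", "-f", sai1, reference, fq1],
        ["bwa", "aln", "-t", "3", "-e", "10", "-f", sai2, reference, fq2],
        # Step 3: pair the reads and convert to reference coordinates
        # ('sampe -a 500 -o 1000').
        ["bwa", "sampe", "-a", "500", "-o", "1000", "-f", out_sam,
         reference, sai1, sai2, fq1, fq2],
    ]


if __name__ == "__main__":
    # Print the commands for one hypothetical sample instead of executing them.
    for cmd in bwa_commands("panda_ref.fa", "sample_1.fq", "sample_2.fq", "sample.sam"):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` to execute the step; keeping the commands as lists avoids shell quoting problems with file names.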

θπ, θw and FST calculations. The average pairwise diversity within a population (θπ)39 and Watterson’s estimator (θw)40 were calculated with sliding windows of different sizes (10, 100 and 500 kb) that had 90% overlap between adjacent windows. Population differentiation was measured by pairwise FST among three panda populations10.

Linkage disequilibrium. To evaluate LD decay, the correlation coefficient (r²) between any two loci was calculated using Haploview41. Parameters were set as follows: -maxdistance 100, -dprime, -minMAF 0.01, -hwcutoff 0.0001 and -minGeno 0.6. Average r² was calculated for pairwise markers with the same distance, and LD decay was drawn using an R script.

Demographic history reconstruction using the PSMC approach. For the autosomal sequences4, scaffolds shorter than 50 kb (~2.6% of all scaffolds) were excluded to improve the accuracy of inferring historical recombination events, and a total of 1,680,757 heterozygous loci were used to reconstruct demographic history with the PSMC model15. Parameters were set as follows: -N30, -t15, -r5 and -p ‘4+25*2+4+6’. The estimated time to the most recent common ancestor (TMRCA) is given in units of 2N0 time, and the relative population size (Ne) at state t was scaled to N0 (the present effective population size). The neutral mutation rate μ was used to infer N0 and scale the TMRCA and Ne values into chronological time. The sequence divergence between the panda and polar bear was estimated to be 3.53%. Divergence between the two species was estimated to have occurred 16.4 million years ago (Supplementary Note) and a mean generation time (g) for pandas was set at 12 years42. Therefore, we calculated μ = (0.0353 × 12)/(2 × 16.4 × 10⁶) = 1.29 × 10⁻⁸ mutations per generation for the giant panda. Following Li’s procedure13, we applied a bootstrapping approach, repeating sampling 100 times to estimate the variance of simulated results.

Correlation statistics for Ne inferred by PSMC results and MAR. We extracted Ne values for every time interval from the PSMC simulation results. We then averaged the corresponding MAR for the same intervals. We used Pearson’s correlation in SPSS 16.0 software to estimate the correlation of the two factors (n = 46).

Recent demographic history inference using ∂a∂i. Of the SNPs identified in the 34 resequenced pandas, we only considered those from intergenic regions in autosomal sequences to ensure their neutrality. To minimize the effect of low-coverage sequencing, SNPs with more than 40-fold sequencing coverage at the population level were retained for the ∂a∂i22 simulations. The polar bear genome sequence was used to infer ancestral alleles, and a statistical procedure was performed to correct ascertainment bias of the ancestral state, in which the trinucleotide substitution matrix specific for carnivores was kindly provided by the authors43 (D.G. Hwang, University of Washington). Four divergence models among three genetic populations of pandas were considered (Supplementary Note). The model with the maximum log-likelihood value was chosen as the optimal one (Supplementary Table 10). The ancestral population size (Na) was estimated on the basis of the calculated θ value and the mutation rate. Population size and chronological split time were derived from parameters scaled by Na. Nonparametric bootstrapping was performed 50 times to determine the variance of each parameter using the 50 new data sets with equal numbers of loci (111,161) sampled with replacement from the original data set.

Detecting SNPs under selection for pairwise populations. We performed FST-based approaches to investigate the selection signals across the whole genome. On the basis of the population structure detected, we defined two pairs of populations to detect the selection signals: (i) the QIN and non-QIN (MIN and QXL) pair and (ii) the MIN and QXL pair. We chose SNPs from CDS regions and excluded those with minor allele frequency of <0.05. A total of 37,999 and 37,405 loci were used for the analyses of the two comparisons, respectively. First, the two pairs of populations were tested using the finite island model (with the FDIST approach27) in Arlequin26 to detect outliers. Considering the hierarchical genetic structure within the QIN and non-QIN populations (Fig. 2b), this pair was also analyzed using the hierarchical island model implemented in the same software26 to estimate loci that were outliers with respect to FST between the two populations as well as outliers with respect to pairwise FCT values (the proportion of total genetic variance due to differences among groups of populations). Parameters were set as default values, with the exception of our setting 200,000 simulations and allowing 10% missing data in each test. The P value for each locus was estimated using a kernel density approach. After completing the analysis, we performed false discovery rate (FDR) correction of P values, and each locus received a q value44. We considered the loci with q value < 0.05 as possible outliers (Supplementary Figs. 7a–c and 8a). Then, a Bayesian test was performed using the program BayeScan28. We ran 20 pilot runs of 50,000 iterations with an additional burn-in of 500,000 iterations and a thinning interval of 20. Other parameters were set at the default values. Because recent studies suggested that BayeScan is a conservative estimate45,46, loci with FDR q < 0.1 were considered to be outliers in this analysis (Supplementary Figs. 7d and 8b). For each pair of populations, we also measured the pairwise global FST values29 for every locus across the whole genome to detect the highly differential
SNPs between populations. The top 1% of SNPs ranked with raw FST values were considered to be potential outliers (Supplementary Figs. 7e and 8c). To minimize the detection of false positives, we considered those loci identified by two or more methods to be outliers (Supplementary Tables 11 and 12) as true selected loci. In addition, for the QIN and non-QIN populations, we generated a derived allele frequency distribution for (i) the alleles under balancing selection, (ii) the alleles under directional selection and (iii) the unselected alleles. As expected, compared with directionally or unselected alleles, there was an enrichment in intermediate frequency for the derived allele frequency of alleles subjected to balancing selection (Supplementary Fig. 9), indicating that the false positives in the SNP set of balancing selection should be limited.

Derived allele frequency test. We used the DAF test31 to localize the signal of selection to populations. First, we inferred ancestral alleles using the polar bear genome. We then calculated and compared the derived allele frequency of each locus under directional selection between two populations to detect which population harbored a higher frequency of derived alleles. Populations with higher frequencies of derived alleles were assumed to be under positive selection.

Annotation of loci under selection. We annotated genes with selected SNPs using Ensembl (see URLs) and KEGG (see URLs) and then classified each gene according to the KEGG pathways and KEGG Brite function databases (Supplementary Tables 13 and 14). Olfactory and taste receptor genes were examined using Ensembl, KEGG and GenBank annotations.
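The per-generation mutation rate quoted in the PSMC section follows directly from three numbers stated there: the 3.53% panda–polar bear sequence divergence, the 16.4-million-year split time and the 12-year generation time. The sketch below simply reproduces that arithmetic; the function and argument names are illustrative and not taken from the authors' scripts.

```python
def per_generation_mutation_rate(divergence: float,
                                 generation_time_years: float,
                                 split_time_years: float) -> float:
    """Return the mutation rate per site per generation.

    Divergence accumulates along both lineages since the split, hence the
    factor of 2 in the denominator.
    """
    return divergence * generation_time_years / (2.0 * split_time_years)


# Values stated in the text: 3.53% divergence, g = 12 years, split 16.4 Myr ago.
mu = per_generation_mutation_rate(0.0353, 12.0, 16.4e6)
print(f"mu = {mu:.2e} mutations per generation")
```

Rounded to three significant figures, the result matches the 1.29 × 10⁻⁸ given in the text. Changing the assumed generation time or split time rescales every PSMC date and population size by the same factor.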

38. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
39. Tajima, F. Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 437–460 (1983).
40. Watterson, G.A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256–276 (1975).
41. Barrett, J.C. et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
42. Wei, F. et al. A study on the life table of wild giant pandas. Acta Theriol. Sinica 9, 81–86 (1989).
43. Hwang, D.G. & Green, P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl. Acad. Sci. USA 101, 13994–14001 (2004).
44. Storey, J.D. A direct approach to false discovery rates. J. R. Stat. Soc., B 64, 479–498 (2002).
45. Huang, K., Whitlock, R., Press, M.C. & Scholes, J.D. Variation for host range within and among populations of the parasitic plant Striga hermonthica. Heredity 108, 96–104 (2012).
46. Buckley, J., Butlin, R.K. & Bridle, J.R. Evidence for evolutionary change associated with the recent range expansion of the British butterfly, Aricia agestis, in response to climate change. Mol. Ecol. 21, 267–280 (2012).

doi:10.1038/ng.2494
© 2013 Nature America, Inc. All rights reserved.

