Bootstrap.ppt

Uploader: ga****84; Document ID: 378383; Upload date: 2018-09-29; Format: PPT; Pages: 109; Size: 4.33MB

Human Population Genetics: Basic Principles and Analytical Methods
CAS-MPG Partner Institute for Computational Biology
Graduate course, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences: Human Population Genetics
Xu Shuhua (徐书华), Jin Li (金力)

Lecture 3: Construction and Application of Phylogenetic Trees

Outline
- Concepts and terminology of phylogenetic trees
- Types of phylogenetic trees
- Common methods for constructing phylogenetic trees
- Methods for testing phylogenetic trees
- Applications of phylogenetic trees
- Which method is most appropriate in which situation?
- Common software for constructing phylogenetic trees
- Exercises

Concepts and terminology of phylogenetic trees

The purpose of a phylogenetic tree is to illustrate how a group of objects (usually genes or organisms) are related to one another.

Phylogeny (phylo = tribe + genesis)
[Figure: phylogeny of orangutan, gorilla, chimpanzee and human, from the Tree of Life Web Project, University of Arizona]

Phylogenetic trees are about visualising evolutionary relationships: they diagram the evolutionary relationships between the taxa.

((A,(B,C)),(D,E)) = the phylogeny in the figure, written as nested parentheses.
This says that B and C are more closely related to each other than either is to A, and that A, B and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.

Clades
Evolutionary trees depict clades. A clade is a group of organisms that includes an ancestor and all descendants of that ancestor. You can think of a clade as a branch on the tree of life.

Terminology (after Li, Molecular Evolution)
- External nodes: the things under comparison; operational taxonomic units (OTUs).
- Internal nodes: ancestral units; hypothetical; the goal is to group current-day units.
- Root: the common ancestor of all OTUs under study. The path from the root to a node defines an evolutionary path.
- Unrooted tree: specifies relationships but not the evolutionary path. If we have an outgroup (an external reason to believe that a certain OTU branched off first), then we can root the tree.
- Topology: the branching pattern of a tree.
- Branch length: the amount of difference that occurred along a branch.

[Figure: common phylogenetic tree terminology, labelling the ancestral node or root of the tree; internal nodes or divergence points (hypothetical ancestors of the taxa); branches or lineages; and terminal nodes]
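The nested-parentheses notation used above is the Newick format. As a minimal sketch, assuming a restricted subset of Newick (leaf names only, with no branch lengths, quoted labels or comments), the Python below parses such a string into nested tuples and lists the clades it encodes. `parse_newick` and `clades` are hypothetical helper names, not functions from a phylogenetics library.

```python
def parse_newick(text):
    """Parse a leaf-labelled Newick string such as "((A,(B,C)),(D,E));"
    into nested tuples of leaf names. Sketch only: branch lengths,
    quoted labels and comments are NOT supported."""
    s = text.strip().rstrip(";")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def node():
        nonlocal pos
        if peek() == "(":
            pos += 1                      # consume "("
            children = [node()]
            while peek() == ",":
                pos += 1                  # consume ","
                children.append(node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos                       # leaf: read up to the next delimiter
        while peek() not in ("", "(", ")", ","):
            pos += 1
        name = s[start:pos].strip()
        if not name:
            raise ValueError(f"empty leaf name at position {start}")
        return name

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def clades(tree):
    """Collect every clade: the set of leaves below each internal node."""
    found = set()

    def leaf_set(t):
        if isinstance(t, str):
            return frozenset([t])
        group = frozenset().union(*(leaf_set(child) for child in t))
        found.add(group)
        return group

    leaf_set(tree)
    return found


tree = parse_newick("((A,(B,C)),(D,E));")
print(tree)  # (('A', ('B', 'C')), ('D', 'E'))
for clade in sorted(clades(tree), key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
# ['B', 'C']
# ['D', 'E']
# ['A', 'B', 'C']
# ['A', 'B', 'C', 'D', 'E']
```

The clade {B, C} nested inside {A, B, C} is exactly the statement that B and C are closer to each other than either is to A.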

In the tree-terminology diagram, the terminal nodes (A, B, C, D, E) represent the taxa (genes, populations, species, etc.) used to infer the phylogeny.

Terminology: homologue, orthologue, paralogue
Homologs are commonly defined as orthologs, paralogs, or xenologs.
- Orthologs are homologs resulting from speciation. They are genes that stem from a common ancestor. Orthologs often have similar functions. Example: SPO11 (Baudat et al., Mol Cell 2000).
- Paralogs are homologs resulting from gene duplication. They are genes derived from a common ancestral locus that was duplicated within the genome of an organism. Paralogs tend to have different functions. Example: CLB1/CLB2 (Brachat et al., Genome Biology 2003).
- Xenologs are homologs resulting from the horizontal transfer of a gene between two organisms. The function of xenologs can be variable. Example: VDE (Okuda et al., Yeast 2003).

Character-based methods can tease apart types of similarity and, in theory, find the true evolutionary tree. Similarity indicates relationship only if certain conditions are met (if the distances are ultrametric).

Types of Similarity
Observed similarity between two entities can be due to:
- Evolutionary relationship:
  - shared ancestral characters (plesiomorphies)
  - shared derived characters (synapomorphies)
- Homoplasy (independent evolution of the same character):
  - convergent events (in either related or unrelated entities)
  - parallel events (in related entities)
  - reversals (in related entities)

Homology and Homoplasy
[Figure: homology and homoplasy illustrated with nucleotide states (C, G, T) and with shared traits such as "no hair" and "no wings"]
- Homology: identity due to shared ancestry (evolutionary signal).
- Homoplasy: identity despite separate ancestry (evolutionary noise).

[Figure: gene tree marking paralogs and orthologs, from E. L. L. Sonnhammer, "Orthology, paralogy and proposed classification for paralog subtypes", Trends in Genetics 18(12), December 2002]
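The ultrametric condition can be checked directly on a distance matrix with the three-point condition: for every triple of taxa, the two largest of the three pairwise distances must be equal. A minimal Python sketch, assuming a symmetric matrix given as a list of lists; `is_ultrametric` and both example matrices are hypothetical illustrations, not data from the lecture.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: for every triple i, j, k the two largest of
    d[i][j], d[i][k], d[j][k] must be equal (within tol)."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical distances for taxa A, B, C, D from a clock-like tree
clock = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
# Hypothetical distances for the same taxa with A's lineage evolving faster
no_clock = [
    [0, 5, 9, 9],
    [5, 0, 6, 6],
    [9, 6, 0, 4],
    [9, 6, 4, 0],
]
print(is_ultrametric(clock))     # True
print(is_ultrametric(no_clock))  # False
```

The second matrix still fits a tree exactly (A on a longer terminal branch), but not a clock-like one, so the triple (A, B, C) violates the condition.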

The Molecular Clock
For a given protein, the rate of sequence evolution is approximately constant across lineages (Zuckerkandl and Pauling, 1965). This would allow speciation and duplication events to be dated accurately from molecular data. Local and approximate molecular clocks are more reasonable.

Relative Rate Test
Tests whether sets of sequences are evolving at equal rates (local molecular clock hypothesis), e.g. RRTree (Robinson-Rechavi), http://pbil.univ-lyon1.fr/software/rrtree.html

Types of phylogenetic trees

Trees
A tree is a diagram consisting of branches and nodes.
- Species tree (how are my species related?): contains only one representative from each species; all nodes indicate speciation events.
- Gene tree (how are my genes related?): can contain several genes from a single species; nodes relate either to speciation or to gene duplication events.

Gene tree, species tree
We often assume that gene trees give us species trees.
[Figure: a gene tree (genes a, b, c) compared with a species tree (species A, B, D)]

Gene tree vs species tree
The two events, mutation and speciation, are not expected to occur at the same time, so a gene tree cannot simply be assumed to represent the species tree.

Three types of trees: cladogram, phylogram, ultrametric tree
[Figure: the same four taxa (A to D) drawn three ways; labels include branch lengths representing genetic change (1, 1, 1, 6, 3, 5) and a version whose branch lengths have no meaning]
All three show the same evolutionary relationships, or branching orders, between the taxa.

Tree Properties
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.

Cladograms vs Phylograms
[Figure: the same bacteria (1 to 3) and eukaryotes (1 to 4) drawn as a cladogram and as a phylogram]
- Phylograms show branch order and branch lengths.
- Cladograms show branching order only; branch lengths are meaningless.

Phenetics
Phenetics, when first introduced (Michener and Sokal, 1957), challenged the prevailing view that classifications should be based on comparisons between a limited number of characters that taxonomists believed to be important for one reason or another. Pheneticists argued that classifications should encompass as many variable characters as possible, these characters being scored numerically and analyzed by rigorous mathematical methods.

Cladistics
Cladistics (Hennig, 1966) also emphasizes the need for large datasets but differs from phenetics in that it does not give equal weight to all characters. The argument is that, in order to infer the branching order in a phylogeny, it is necessary to distinguish those characters that provide a good indication of evolutionary relationships from other characters that might be misleading. This might appear to take us back to the pre-phenetic approach, but cladistics is much less subjective: rather than making assumptions about which characters are important, cladistics demands that the evolutionary relevance of individual characters be defined. In particular, errors in the branching pattern within a phylogeny are minimized by recognizing two types of anomalous data.

Why Cladistics? Convergent evolution and derived character states
[Figure: illustrations of convergent evolution and of a derived character state]

Phenetics versus Cladistics
- Phenetics is the study of relationships among a group of organisms on the basis of the degree of similarity between them, be that similarity molecular, phenotypic, or anatomical. A tree-like network expressing phenetic relationships is called a phenogram.
- Cladistics can be defined as the study of the pathways of evolution. In other words, cladists are interested in such questions as: how many branches there are among a group of organisms; which branch connects to which other branch; and what the branching sequence is. A tree-like network that expresses such ancestor-descendant relationships is called a cladogram. Thus, a cladogram refers to the topology of a rooted phylogenetic tree.
- While a phenogram may serve as an indicator of cladistic relationships, it is not necessarily identical to the cladogram. If there is a linear relationship between the time of divergence and the degree of genetic (or morphological) divergence, the two types of trees may become identical to each other.

Cladistics and Phenetics
- Cladistics: trees are drawn based on conserved characters.
- Phenetics: trees are based on some measure of distance between the leaves.
Molecular phylogenies are inferred from molecular (usually sequence) data, either cladistically (e.g. gene order) or phenetically.
The maximum parsimony method is a typical representative of the cladistic approach, whereas the UPGMA method is a typical phenetic method. The other methods, however, cannot be classified easily according to these criteria.

Unrooted vs Rooted tree
[Figure: an unrooted tree of archaea and eukaryotes, and the same tree rooted using bacteria as the outgroup; in the rooted tree the archaea and the eukaryotes each form a monophyletic group]

Rooting the Tree
In an unrooted tree the direction of evolution is unknown. The root is the hypothesized ancestor of the sequences in the tree. The root can be placed either on a branch or at a node.

Inferring evolutionary relationships between the taxa requires rooting the tree. To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: an unrooted tree of taxa A to D and the rooted tree obtained from it]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
Now, try it again with the root at another position:
[Figure: the same unrooted tree, rooted at a different position]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

An unrooted four-taxon tree can, in theory, be rooted in five different places to produce five different rooted trees.
[Figure: unrooted tree 1 (taxa A, C, B, D) and the five rooted trees derived from it]
These trees show five different evolutionary relationships among the taxa!
[Figure: several redrawings of rooted tree 1a]
All of these rearrangements show the same evolutionary relationships between the taxa.

There are two major ways to root trees:
- By outgroup: uses taxa (the "outgroup") that are known to fall outside the group of interest (the "ingroup"). This requires some prior knowledge about the relationships among the taxa. The outgroup can be either species (e.g. birds to root a mammalian tree) or previous gene duplicates (e.g. α-globins to root β-globins).
- By midpoint or distance: roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. This assumes that the taxa are evolving in a clock-like manner, an assumption that is built into some of the distance-based tree-building methods.
[Figure: rooting by outgroup, and a tree of taxa A to D with branch lengths 10, 2, 3, 5 and 2 used for midpoint rooting]
Midpoint example: the two most distant taxa are A and D, with d(A,D) = 10 + 3 + 5 = 18, so the root is placed 18 / 2 = 9 units from each of them along the path that joins them.

Rooting and Tree Interpretation
[Figure: rooting and tree interpretation]

How Many Trees? (assuming bifurcation only)
[Figure: number of possible trees as the number of taxa grows]

Common methods for constructing phylogenetic trees
The basic methods for building phylogenetic trees are:
- maximum parsimony (MP)
- distance methods
- maximum likelihood (ML)

Maximum Parsimony
1. Check each topology.
2. Count the minimum number of changes required to explain the data.
3. Choose the tree with the smallest number of changes.
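Step 2, counting the minimum number of changes on a fixed topology, is commonly done with Fitch's algorithm, which the slides do not name. The sketch below assumes rooted binary trees written as nested tuples whose leaves are the aligned sequences themselves; `fitch` and `parsimony_score` are illustrative names. It scores the three possible unrooted topologies for four short sequences, ACT, GTT, GTA and ACA (rooting each on its central branch does not change the count).

```python
def fitch(tree, site):
    """Fitch's small-parsimony pass for one alignment column.
    Returns (candidate state set, minimum number of changes in the subtree)."""
    if isinstance(tree, str):                  # leaf: the aligned sequence itself
        return {tree[site]}, 0
    left, right = tree                         # binary trees only
    l_set, l_cost = fitch(left, site)
    r_set, r_cost = fitch(right, site)
    shared = l_set & r_set
    if shared:                                 # children can agree: no new change
        return shared, l_cost + r_cost
    return l_set | r_set, l_cost + r_cost + 1  # disagreement costs one change


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def parsimony_score(tree):
    """Minimum number of changes, summed over all sites of the alignment."""
    n_sites = len(leaves(tree)[0])
    return sum(fitch(tree, i)[1] for i in range(n_sites))


# Step 1: the three unrooted topologies for four taxa
topologies = [
    (("ACT", "GTT"), ("GTA", "ACA")),
    (("ACT", "GTA"), ("GTT", "ACA")),
    (("ACT", "ACA"), ("GTT", "GTA")),
]
for tree in topologies:
    print(tree, parsimony_score(tree))
# (('ACT', 'GTT'), ('GTA', 'ACA')) 5
# (('ACT', 'GTA'), ('GTT', 'ACA')) 6
# (('ACT', 'ACA'), ('GTT', 'GTA')) 4
best = min(topologies, key=parsimony_score)    # step 3: fewest changes
print(best)  # (('ACT', 'ACA'), ('GTT', 'GTA'))
```

Because Fitch's pass minimizes over all ancestral assignments, the middle topology scores 6; a particular hand-chosen labelling of its internal nodes (for example ACA and ACT) can require 7.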

Maximum parsimony example
[Figure: the three unrooted topologies for the sequences ACT, GTT, GTA and ACA, with inferred ancestral sequences and the number of changes on each branch]
- ((ACT, GTT), (ACA, GTA)), ancestral sequences GTT and GTA: MP score = 5.
- ((ACA, GTT), (ACT, GTA)): the ancestral sequences drawn on the slide (ACA and ACT) require 3 + 1 + 3 = 7 changes, but assigning GTA to both ancestors needs only 6, so the MP score of this topology is 6.
- ((ACT, ACA), (GTT, GTA)), ancestral sequences ACA and GTA: MP score = 4. This is the optimal MP tree.

Maximum Parsimony: Limitations
Beyond a small number of sequences, an exhaustive search becomes computationally intractable (finding the most parsimonious tree is NP-hard). For n sequences, the numbers of possible bifurcating trees are

$$N_{\text{rooted}}(n) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}, \qquad N_{\text{unrooted}}(n) = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

Number of possible trees (Felsenstein 1978)
Species    Rooted trees       Unrooted trees
2          1                  1
3          3                  1
4          15                 3
5          105                15
10         3.44 × 10^7        2.03 × 10^6
15         2.13 × 10^14       7.91 × 10^12
20         8.20 × 10^21       2.21 × 10^20

Maximum Parsimony: Limitations (continued)
Long-branch attraction: in a set of sequences evolving at different rates, the sequences evolving rapidly have been observed to be drawn together.
[Figure: long-branch attraction in an NJ tree based on CNVs]

Distance Methods
[Figures: distance methods; distance-method criteria]
Distance methods are normally fast and simple, e.g. UPGMA, neighbour joining, minimum evolution.

UPGMA
[Figures: UPGMA visually; UPGMA worked example; UPGMA weaknesses]

Neighbor Joining (NJ)
Start with a star tree (in the figure, of taxa 1 to 8) and pull out one pair of taxa at a time.
[Figures: NJ algorithm; NJ performance]

Minimum Evolution
The total length of all branches in the tree should be a minimum. Neighbour joining is an approximation to minimum evolution. It has been shown that the minimum evolution tree is expected to be the true tree, provided branch lengths are corrected for multiple hits.

Maximum Likelihood
[Figures: maximum likelihood]
The maximum likelihood method is a character-based method that is statistically well founded. It often has lower variance than other methods (i.e. it is frequently the estimation method least affected by sampling error) and tends to be robust to many violations of the assumptions in the evolutionary model. Even with very short sequences, maximum likelihood tends to outperform alternative methods such as parsimony or distance methods. Different tree topologies are evaluated. An important disadvantage is that it is very CPU-intensive, and thus time-consuming and not appropriate for large datasets.

[Figures: phylogeny flowchart; differences between methods]

Comparison of Methods
- Neighbour joining (NJ) is very fast but depends on accurate estimates of distance, which are more difficult to obtain with very divergent data.
- Parsimony suffers from long-branch attraction. This may be a particular problem for very divergent data.
- NJ can also suffer from long-branch attraction.
- Parsimony is also computationally intensive.
- Codon usage bias can be a problem for MP and NJ.
- Maximum likelihood is generally the most reliable, but it depends on the choice of model and is very slow.
- Methods may be combined.

Methods for testing phylogenetic trees

Bootstrapping: how dependent is the tree on the dataset?
1. Randomly choose n objects (e.g. alignment columns) from your dataset of n, with replacement.
2. Rebuild the tree based on the resampled data.
3. Repeat 1,000 to 10,000 times.
4. How often are the same children joined?

Jackknifing: how dependent is the tree on the dataset?
1. Randomly choose k objects from your dataset of n (k < n), without replacement.
2. Rebuild the tree based on the subset of the data.
3. Repeat 1,000 to 10,000 times.
4. How often are the same children joined?

How confident am I that my tree is correct?
[Figures: assessing reliability with the bootstrap]
- Bootstrapping is a very valuable and widely used technique (it is demanded by some journals).
- Bootstrap proportions (BPs) give an idea of how likely a given branch would be to remain unaffected if additional data, with the same distribution, became available.
- BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals, and there is no agreement about what constitutes a good bootstrap value (70%? 80%? 85%?).
- Some theoretical work indicates that BPs can be a conservative estimate of confidence intervals.
- If the method used to estimate the tree is inconsistent, all the bootstraps in the world won't help you.
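The bootstrap and jackknife procedures above can be sketched in Python. All of the following are illustrative assumptions rather than the lecture's own method: the "objects" are alignment columns, the tree is rebuilt with UPGMA on uncorrected p-distances (any tree-building method could be substituted), and "how often are the same children joined" is measured as the fraction of replicates that reproduce each clade of the original tree. With `replace=True` and `k` equal to the alignment length this is the bootstrap; with `replace=False` and `k < n` it is the jackknife. Function names and the toy alignment are hypothetical.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites (no multiple-hit correction)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clades(names, seqs):
    """Sketch of UPGMA clustering; returns only the clades (as frozensets
    of taxon names) created by the successive merges, not branch lengths."""
    active = [frozenset([n]) for n in names]
    dist = {frozenset([active[i], active[j]]): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(names)), 2)}
    clades = set()
    while len(active) > 1:
        a, b = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        merged = a | b
        rest = [c for c in active if c != a and c != b]
        for c in rest:
            # average linkage: weight each old distance by cluster size
            dist[frozenset([merged, c])] = (
                len(a) * dist[frozenset([a, c])] + len(b) * dist[frozenset([b, c])]
            ) / len(merged)
        active = rest + [merged]
        clades.add(merged)
    return clades


def resample_support(names, seqs, reps=1000, replace=True, k=None, seed=0):
    """Fraction of replicates in which each non-trivial clade of the
    original tree reappears (bootstrap or jackknife)."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    k = n_sites if k is None else k
    reference = {c for c in upgma_clades(names, seqs) if len(c) < len(names)}
    hits = dict.fromkeys(reference, 0)
    for _ in range(reps):
        if replace:   # bootstrap: draw k columns with replacement
            cols = [rng.randrange(n_sites) for _ in range(k)]
        else:         # jackknife: draw k distinct columns
            cols = rng.sample(range(n_sites), k)
        pseudo = ["".join(s[i] for i in cols) for s in seqs]
        for clade in upgma_clades(names, pseudo) & reference:
            hits[clade] += 1
    return {clade: hits[clade] / reps for clade in reference}


# Hypothetical toy alignment (illustrative only, not from the lecture)
names = ["A", "B", "C", "D"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "TTTTAAAAAA", "TTTTAAAAAC"]
boot = resample_support(names, seqs, reps=500)
jack = resample_support(names, seqs, reps=500, replace=False, k=5)
for clade in sorted(boot, key=sorted):
    print(sorted(clade), "bootstrap", boot[clade], "jackknife", jack[clade])
```

With only ten sites the support values from this toy alignment mean little; the point is the mechanics of resampling columns, rebuilding the tree, and counting recurring clades.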

