TraitRateProp: Detecting trait-dependent evolutionary rate shifts in sequence sites

Prof. Itay Mayrose Lab - Plant Evolution, bioinformatics, & comparative genomics



Abstract

TraitRateProp is a probabilistic method that allows testing whether the rate of sequence evolution of an examined protein or genomic region is associated with a binary phenotypic character trait. The method further allows the detection of specific sequence sites whose evolutionary rate is most noticeably affected following the character transition, suggesting a shift in functional/structural constraints.



Introduction

TraitRateProp detects cases in which some or all sequence positions in a given gene (protein) exhibit evolutionary rate shifts that are associated with the state of a binary phenotypic trait. The trait can be related to a genomic attribute (e.g., the presence/absence of a certain gene family) or to an organismal trait (e.g., an environmental or ecological preference, life history attribute, or morphological feature). Given a rooted ultrametric species tree, a multiple sequence alignment (MSA), and the trait states of the extant species (coded as either '0' or '1'), TraitRateProp allows for: (1) testing whether the evolutionary rate of the input sequence data is associated with the given trait data; and (2) if an association is detected, inferring the sequence positions whose evolutionary rate is most likely to be associated with the trait. TraitRateProp is based on the maximum-likelihood paradigm and provides two maximum likelihood estimators (MLEs) regarding the co-evolution of sequence and trait data: the relative rate parameter, r, which is the ratio between the sequence evolutionary rates under states '1' and '0', and the parameter p, which is the proportion of positions in the sequence whose evolutionary rate is associated with the phenotypic state. The model, the likelihood estimation procedures, and the associated statistical tests are described in full in (Levy Karin et al.; Mayrose and Otto).
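
One way to picture how r and p enter the likelihood is as a two-component mixture over sites. The expression below is only a sketch under that assumption, not the authors' exact formulation, which is given in Levy Karin et al. The symbols D_i (alignment column i), T (the tree with a mapped trait history), theta (all remaining model parameters), and n (the number of sites) are introduced here purely for illustration.

```latex
L_i(r, p) = p \cdot P(D_i \mid T, \theta, r) + (1 - p) \cdot P(D_i \mid T, \theta, r = 1)
\qquad
\log L(r, p) = \sum_{i=1}^{n} \log L_i(r, p)
```

Under this reading, setting p = 1 makes every site trait-associated, and r = 1 corresponds to no rate difference between the two trait states.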



Methodology

TraitRateProp combines models of sequence evolution and of phenotypic trait evolution in a single likelihood framework by first reconstructing a large number of possible evolutionary histories of the phenotypic trait along the phylogeny. Each such history is inferred using the stochastic mapping approach (Nielsen) and is consistent with the observed phenotypic states of the extant species. The method compares a null model, in which a single sequence rate matrix is fit to the data, with an alternative model, in which two sequence rate matrices, each corresponding to one of the phenotypic states, are fit to the sequence data.
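
To give a feel for what a single mapped history looks like, the following minimal Python sketch samples the history of a binary trait along one branch, conditional on the states at both ends, by rejection sampling from a two-state continuous-time Markov chain. It illustrates the general idea behind stochastic mapping and is not the TraitRateProp implementation. The function names, the transition rates `q01` and `q10`, and the branch length used in the example are hypothetical. A full mapping over a tree would also sample the states at internal nodes, which is omitted here.

```python
import random


def simulate_branch_history(start, end, length, q01, q10, rng, max_tries=100_000):
    """Sample a binary trait history on one branch, given both end states.

    Returns a list of (state, duration) segments whose durations sum to
    `length`. Histories that do not finish in `end` are rejected and redrawn.
    q01 and q10 are assumed (hypothetical) rates of 0->1 and 1->0 transitions.
    """
    leave_rate = {0: q01, 1: q10}  # rate of leaving each state
    for _ in range(max_tries):
        state, elapsed, segments = start, 0.0, []
        while True:
            wait = rng.expovariate(leave_rate[state])
            if elapsed + wait >= length:
                # No further transition before the end of the branch.
                segments.append((state, length - elapsed))
                break
            segments.append((state, wait))
            elapsed += wait
            state = 1 - state  # binary trait: flip to the other state
        if state == end:
            return segments
    raise RuntimeError("no sampled history matched the required end state")


def time_in_state(segments, state):
    """Total branch length spent in `state` for one sampled history."""
    return sum(duration for s, duration in segments if s == state)


# Example: a branch of length 1.0 that starts in state 0 and ends in state 1.
rng = random.Random(1)
history = simulate_branch_history(0, 1, 1.0, q01=0.8, q10=0.5, rng=rng)
fraction_in_1 = time_in_state(history, 1) / 1.0
```

Averaging a quantity such as `fraction_in_1` over many sampled histories approximates its expectation given the observed end states, which is why a large number of histories is reconstructed.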



Input

  1. A rooted ultrametric phylogenetic tree with branch lengths (Newick format).

  2. A multiple sequence alignment (MSA) of the sequence data of the extant species (Fasta format).

  3. The character states of the extant species coded as either '0' or '1' (Fasta format).

  4. In addition, the user should indicate the type of sequence input (DNA or protein). The user can also control the search range of the r parameter and whether the p parameter should be optimized. Fixing the p parameter to 1 runs the program in TraitRate mode, which assumes that the evolutionary rate of all sequence sites is associated with the examined trait. Finally, for protein data, the user can provide a 3D structural model in PDB file format. In this case, the site-specific predictions of TraitRateProp are projected onto the provided 3D protein structure.
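
Before submitting, it can help to check that the alignment and the trait file agree with each other. The Python sketch below is a hypothetical pre-submission check, not part of TraitRateProp. It assumes that the trait file holds one FASTA record per species whose "sequence" is the single character '0' or '1', and that species names match exactly between the two files. The function names and the species names in the example are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence string."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_inputs(msa_text, trait_text):
    """Return a list of problems found; an empty list means no problem was seen."""
    msa = parse_fasta(msa_text)
    traits = parse_fasta(trait_text)
    problems = []
    # All aligned sequences must have the same length.
    if len({len(seq) for seq in msa.values()}) > 1:
        problems.append("MSA sequences differ in length")
    # Each trait state must be exactly '0' or '1'.
    for name, state in traits.items():
        if state not in ("0", "1"):
            problems.append(f"trait state for {name} is not '0' or '1'")
    # Every species should appear in both files.
    unmatched = sorted(set(msa) ^ set(traits))
    if unmatched:
        problems.append("species not present in both files: " + ", ".join(unmatched))
    return problems


# Hypothetical example input with two species.
example_msa = ">speciesA\nACGT\n>speciesB\nACGA\n"
example_traits = ">speciesA\n0\n>speciesB\n1\n"
issues = check_inputs(example_msa, example_traits)
```

The same species names would also need to appear as leaf labels in the Newick tree; checking that requires a tree parser and is left out of this sketch.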



Output

TraitRateProp directs you to a web page called "TraitRateProp Job Status Page". This web page is automatically updated every 30 seconds, showing messages regarding the different stages of the server activity.

A runtime estimate is computed from the size of the provided input. A simple linear regression model through the origin was pre-computed from simulated datasets that contained 1,000 sequence positions (MSA length) and a varying number of species:


As the runtime is expected to increase linearly with the number of species (N) and the number of sequence positions (L), the TraitRateProp web server uses the following formula to estimate the runtime in seconds:
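
The server's fitted coefficient is not reproduced in this text. Purely as an illustration of the approach, the Python sketch below fits a regression line through the origin and scales the result by alignment length. The function names, the benchmark numbers, and the assumed form of the estimate (slope times N times L/1,000) are hypothetical and should not be read as the server's actual formula.

```python
def fit_slope_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def estimate_runtime_seconds(slope_per_species, n_species, n_positions,
                             ref_positions=1000):
    """Assumed form: runtime grows linearly in both N and L.

    `slope_per_species` is seconds per species at `ref_positions` sites.
    """
    return slope_per_species * n_species * (n_positions / ref_positions)


# Hypothetical benchmark: species counts and measured runtimes in seconds
# for simulated alignments of 1,000 positions each.
species_counts = [10, 20, 40]
runtimes = [120.0, 250.0, 470.0]

slope = fit_slope_through_origin(species_counts, runtimes)
estimate = estimate_runtime_seconds(slope, n_species=30, n_positions=500)
```

Forcing the line through the origin encodes the assumption that an empty input takes no time, so a single coefficient is enough to describe the fit.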



When the calculation finishes, results are printed to this page and provided in several links. For an example output page click here. These results include:

In case the user provided PDB information:


References

  1. Levy Karin E., Wicke S., Pupko T., and Mayrose I. 2017. An integrated model of phenotypic trait changes and site-specific sequence evolution. Syst. Biol. In press.

  2. Mayrose I. and Otto S.P. 2011. A likelihood method for detecting trait-dependent shifts in the rate of molecular evolution. Mol. Biol. Evol. 28:759-770.

  3. Nielsen R. 2002. Mapping mutations on phylogenies. Syst. Biol. 51:729-739.

  4. Uzzell T. and Corbin K.W. 1971. Fitting discrete probability distributions to evolutionary events. Science 172:1089-1096.

  5. Wakeley J. 1993. Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA. J. Mol. Evol. 37:613-623.

  6. Jones D.T., Taylor W.R., and Thornton J.M. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.

  7. Yang Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314.