# Controlling for phylogenetic signal - what is statistically appropriate?


I am currently collaborating with a fellow PhD student. We are both in the same Biology department, but my collaborator is more of a natural historian, so I am handling the statistical side of things.

He has published a few papers using only Brownian models and Pagel's $\lambda$ (with a different collaborator). However, there seems to be a plurality of methods available: Grafen, Blomberg, and Martins, to name a few. OU models seem inappropriate for our data, since we have relatively small phylogenies (Cooper 2016). The literature seems to indicate that Pagel's $\lambda$ is more robust than Blomberg's $K$ and, in general, an acceptable way to check for phylogenetic signal. I've also found that some people throw everything and the kitchen sink at their data and then compare log-likelihoods, AIC and BIC, using likelihood-ratio tests for any nested comparisons.

My first question, then: should you have any a priori assumptions about which method will be appropriate for your data?

I am still new to phylogenetic comparative methods (PCM), so I used old R scripts and an AmNat paper (from 2019) as my reference. I am also using the same phylogenies as that AmNat paper. In that paper and those scripts, only two models were used: a Brownian model (which is essentially $\lambda = 1$ anyway) and a model with $\lambda$ estimated. They compared the two models, chose the more appropriate one with a likelihood-ratio test, and that was it.

My second question: shouldn't you always compare your models to a model with $\lambda$ fixed at 0?

For example, I have the following output in R:

```r
library(ape)   # corPagel()
library(nlme)  # gls()

# Brownian model (Pagel's lambda fixed at 1)
pglsModel_BM <- gls(sum_dep ~ ContGroup,
                    correlation = corPagel(1, phy = UltTree, fixed = TRUE),
                    data = temp, method = "ML")

# Estimated-lambda model (0.50 is only the optimizer's starting value)
pglsModel_E <- gls(sum_dep ~ ContGroup,
                   correlation = corPagel(0.50, phy = UltTree, fixed = FALSE),
                   data = temp, method = "ML")

anova(pglsModel_BM, pglsModel_E)
##              Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## pglsModel_BM     1  3 528.4584 534.5344 -261.2292
## pglsModel_E      2  4 528.0454 536.1468 -260.0227 1 vs 2 2.412992  0.1203

# ANOVA output of the preferred model
anova(pglsModel_BM)
## Denom. DF: 54
##             numDF  F-value p-value
## (Intercept)     1 7.039370  0.0104
## ContGroup       1 6.480427  0.0138
```
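As a quick check, the likelihood-ratio statistic and p-value in the comparison table can be reproduced from the two log-likelihoods alone. The Python sketch below assumes the two models are nested and differ by one free parameter ($\lambda$), so the statistic is referred to a chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x / 2)); the function name is illustrative.

```python
import math


def lrt_one_df(loglik_restricted: float, loglik_full: float) -> tuple[float, float]:
    """Likelihood-ratio test for nested models that differ by one free parameter.

    Returns (LR statistic, p-value); the chi-square(1) upper tail is erfc(sqrt(x / 2)).
    """
    lr = 2.0 * (loglik_full - loglik_restricted)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


# Log-likelihoods from the anova() table: lambda fixed at 1 vs. lambda estimated.
lr, p_value = lrt_one_df(loglik_restricted=-261.2292, loglik_full=-260.0227)
print(f"L.Ratio = {lr:.4f}, p-value = {p_value:.4f}")
```

The result matches the reported L.Ratio (2.412992) and p-value (0.1203) up to the rounding of the printed log-likelihoods.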

The Brownian model got the go-ahead. It seems that Brownian motion is treated as a null model, but I can't wrap my head around why $\lambda = 0$ isn't also a null model, or even the null model. Moving forward with that assumption:

```r
# Adding a lambda = 0 model
pglsModel_0 <- gls(sum_dep ~ ContGroup,
                   correlation = corPagel(0, phy = UltTree, fixed = TRUE),
                   data = temp, method = "ML")

# Using anova() for model comparison
anova(pglsModel_0, pglsModel_BM)
##              Model df      AIC      BIC    logLik
## pglsModel_0      1  3 524.0962 530.1723 -259.0481
## pglsModel_BM     2  3 528.4584 534.5344 -261.2292

# Checking out the lambda = 0 model
anova(pglsModel_0)
## Denom. DF: 54
##             numDF   F-value p-value
## (Intercept)     1 289.67228  <.0001
## ContGroup       1   0.21234  0.6468
```

I realize the differences are rather small, but AIC, BIC and the log-likelihood all point to $\lambda = 0$ as the more likely model.
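The information criteria in these tables follow directly from each model's log-likelihood and number of parameters, which is why they all agree here: $\lambda = 0$ and the Brownian model both have 3 parameters, so AIC and BIC rank them exactly as their log-likelihoods do. The Python sketch below recomputes them, assuming n = 56 observations (the Denom. DF of 54 plus the two fixed-effect coefficients), an assumption that reproduces the reported BIC values. It also adds one sanity check: when $\lambda$ is freely estimated, $\lambda = 0$ is among the values the optimizer could reach, so a converged estimate should never have a lower log-likelihood than the $\lambda = 0$ fit. The reported values (-260.0227 vs. -259.0481) fail that check, which may reflect the starting-value sensitivity of corPagel discussed below.

```python
import math

N_OBS = 56  # assumed: Denom. DF (54) + 2 fixed-effect coefficients


def aic(loglik: float, k: int) -> float:
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int = N_OBS) -> float:
    return -2.0 * loglik + k * math.log(n)


# (log-likelihood, number of parameters) as reported by anova()
models = {
    "lambda = 0": (-259.0481, 3),
    "lambda = 1 (Brownian)": (-261.2292, 3),
    "lambda estimated": (-260.0227, 4),
}

for name, (ll, k) in models.items():
    print(f"{name:22s} AIC = {aic(ll, k):.4f}  BIC = {bic(ll, k):.4f}")

# A converged lambda estimate should do at least as well as any fixed lambda it could have reached.
best_fixed = max(models["lambda = 0"][0], models["lambda = 1 (Brownian)"][0])
converged_ok = models["lambda estimated"][0] >= best_fixed
if not converged_ok:
    print("Estimated-lambda fit is worse than a fixed-lambda fit: check convergence.")
```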

My third question (although it is possibly answered by the second): do we assume that there must be some amount of phylogenetic signal due to shared history, such that it is sufficient to estimate $\hat{\lambda}$ and compare it only to $\lambda = 1$?

Note: just to be sure, I compared all Brownian models fitted with corBrownian to their $\lambda = 1$ equivalents and got exactly the same output.

I have a tentative answer to my question.

In the publication for the R package phylosignal (Keck, 2016), they state:

> To test the presence of phylogenetic signal, the null hypothesis is that trait values are randomly distributed in the phylogeny. Another null hypothesis might be that trait values follow a Brownian motion model but it is less often used and implemented.

So it would seem there are indeed two null hypotheses, and it seems disingenuous not to test both, especially since doing so is relatively simple. If for some reason you are limited to one, testing against $\lambda = 0$ (or the log-likelihood of a general linear model) should be your first choice.

Another issue was with corPagel from the ape package. It requires an initial value from which to estimate Pagel's $\lambda$ (unless you fix the value, of course). Convergence is not guaranteed and sometimes requires fine-tuning of the initial value. This raised a red flag, so I collected the median $\lambda$ estimate from 5000 subsets of my data (here I am using a different family/phylogeny/dataset, where the $\hat{\lambda}$ model is preferred over a Brownian model, but not over $\lambda = 0$).

I think this is a stronger argument for phylogenetic signal, albeit a weak one.

```
## Comparing lambda = 0, lambda = 1 and the median lambda
                 Model df      AIC      BIC    logLik
pglsModel_0          1  3 636.9876 643.6899 -315.4938
pglsModel_1          2  3 681.9784 688.6807 -337.9892
pglsModel_MEDIAN     3  3 636.4812 643.1836 -315.2406

## Comparing the median lambda to the lambda estimated from the full dataset
                 Model df      AIC      BIC    logLik   Test   L.Ratio p-value
pglsModel_MEDIAN     1  3 636.4812 643.1836 -315.2406
pglsModel_FULL       2  4 637.7024 646.6388 -314.8512 1 vs 2 0.7788265  0.3775
```

Given the distribution of $\lambda$ estimates and the median model's AIC, BIC and log-likelihood, we can argue in favor of the median model. The residuals also look normal and random. Confidence intervals should also be obtained. In reality, though, the difference here between a weak phylogenetic signal and none at all is vanishingly small.
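One way to sidestep corPagel's dependence on the starting value, and to obtain the confidence interval mentioned above, is to profile the log-likelihood over a grid of $\lambda$ values instead of relying on a single optimizer run. The Python/NumPy sketch below is a minimal illustration, not ape's or nlme's implementation: it assumes the tree's variance-covariance matrix is already available (in R, ape's `vcv()` returns one), applies Pagel's transformation by scaling the off-diagonal elements, fits an ML GLS at each grid value, and takes as an approximate 95% interval all $\lambda$ within 1.92 log-likelihood units of the maximum. The four-taxon tree and trait values are hypothetical.

```python
import numpy as np


def pagel_lambda(vcv: np.ndarray, lam: float) -> np.ndarray:
    """Pagel's lambda transform: multiply off-diagonal (shared-history) covariances by lam."""
    out = lam * vcv
    np.fill_diagonal(out, np.diag(vcv))
    return out


def gls_loglik(y: np.ndarray, X: np.ndarray, V: np.ndarray) -> float:
    """Maximised ML log-likelihood of y = X b + e with e ~ N(0, s2 * V)."""
    n = len(y)
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    resid = y - X @ beta
    s2 = float(resid @ Vinv @ resid) / n  # ML (not REML) variance estimate
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2.0 * np.pi * s2) + logdet + n)


def profile_lambda(y, X, vcv, grid=None):
    """Profile log-likelihood over a lambda grid; returns (best, (lo, hi), grid, loglik)."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else np.asarray(grid)
    loglik = np.array([gls_loglik(y, X, pagel_lambda(vcv, lam)) for lam in grid])
    best = grid[np.argmax(loglik)]
    # Approximate 95% interval: lambdas within 1.92 (= 3.84 / 2) log-lik units of the maximum.
    inside = grid[loglik >= loglik.max() - 1.92]
    return best, (inside.min(), inside.max()), grid, loglik


# Hypothetical 4-taxon ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) and trait values.
vcv = np.array([[2.0, 1.0, 0.0, 0.0],
                [1.0, 2.0, 0.0, 0.0],
                [0.0, 0.0, 2.0, 1.0],
                [0.0, 0.0, 1.0, 2.0]])
y = np.array([1.0, 1.2, 3.0, 3.1])
X = np.ones((4, 1))  # intercept-only model
lam_hat, (ci_lo, ci_hi), grid, loglik = profile_lambda(y, X, vcv)
print(f"lambda-hat = {lam_hat:.2f}, approx. 95% interval [{ci_lo:.2f}, {ci_hi:.2f}]")
```

On an ultrametric tree the $\lambda = 0$ point of the profile is the ordinary least-squares fit, so the profile directly shows how much the data favor any phylogenetic signal over none. With only four taxa the interval will be very wide; the example only shows the mechanics.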

## Identifying environmental versus phylogenetic correlates of behavioural ecology in gibbons: implications for conservation management of the world’s rarest ape

For conservation of highly threatened species to be effective, it is crucial to differentiate natural population parameters from atypical behavioural, ecological and demographic characteristics associated with human disturbance and habitat degradation, which can constrain population growth and recovery. Unfortunately, these parameters can be very hard to determine for species of extreme rarity. The Hainan gibbon (Nomascus hainanus), the world’s rarest ape, consists of a single population of c. 25 individuals, but intensive management is constrained by a limited understanding of the species’ expected population characteristics and environmental requirements. In order to generate a more robust evidence-base for Hainan gibbon conservation, we employed a comparative approach to identify intrinsic and extrinsic drivers of variation in key ecological and behavioural traits (home range size, social group size, mating system) across the Hylobatidae while controlling for phylogenetic non-independence.

### Results

All three studied traits show strong phylogenetic signals across the Hylobatidae. Although the Hainan gibbon and some closely related species have large reported group sizes, no observed gibbon group size is significantly different from the values expected on the basis of phylogenetic relationship alone. However, the Hainan gibbon and two other Nomascus species (N. concolor, N. nasutus) show home range values that are higher than expected relative to all other gibbon species. Predictive models incorporating intraspecific trait variation but controlling for covariance between population samples due to phylogenetic relatedness reveal additional environmental and biological determinants of variation in gibbon ranging requirements and social structure, but not those immediately associated with recent habitat degradation.

### Conclusions

Our study represents the first systematic assessment of behavioural and ecological trait patterns across the Hylobatidae using recent approaches in comparative analysis. By formally contextualising the Hainan gibbon’s observed behavioural and ecological characteristics within family-wide variation in gibbons, we are able to determine natural population parameters expected for this Critically Endangered species, as well as wider correlates of variation for key population characteristics across the Hylobatidae. This approach reveals key insights with a direct impact on future Hainan gibbon conservation planning, and demonstrates the usefulness of the comparative approach for informing management of species of conservation concern.

## Abstract

Phylogenetic signal is the tendency for closely related species to display similar trait values as a consequence of their phylogenetic proximity. Ecologists and evolutionary biologists are becoming increasingly interested in studying phylogenetic signal and the processes that drive patterns of trait values in the phylogeny. Here, we present a new R package, phylosignal, which provides a collection of tools to explore the phylogenetic signal for continuous biological traits. These tools are mainly based on the concept of autocorrelation and were first developed in the field of spatial statistics. To illustrate the use of the package, we analyze the phylogenetic signal in pollution sensitivity for 17 species of diatoms.
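As a rough illustration of the autocorrelation idea, and of the null hypothesis that trait values are randomly distributed in the phylogeny, the Python sketch below computes Moran's I with phylogenetic weights and assesses it by shuffling trait values among tips. It does not reproduce phylosignal's own functions; the inverse squared patristic-distance weights, the four-taxon tree and the trait values are assumptions chosen for the example.

```python
import random


def morans_i(values, weights):
    """Moran's I for trait values with a symmetric weight matrix (diagonal ignored)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_total = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (n / w_total) * cross / sum(d * d for d in dev)


def permutation_test(values, weights, n_perm=999, seed=1):
    """Upper-tail p-value: share of tip shuffles with Moran's I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1): patristic distance 2 within pairs, 4 across.
dist = [[0, 2, 4, 4],
        [2, 0, 4, 4],
        [4, 4, 0, 2],
        [4, 4, 2, 0]]
weights = [[0.0 if d == 0 else 1.0 / d ** 2 for d in row] for row in dist]
clustered = [1.0, 1.2, 3.0, 3.1]  # sister taxa resemble each other
observed, p_value = permutation_test(clustered, weights)
print(f"Moran's I = {observed:.3f}, permutation p = {p_value:.3f}")
```

With only four tips there are just 24 possible orderings, so the test has almost no power here; real analyses would use many more species.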

## Methods

### Data collection and processing

Reference genomes were downloaded along with their corresponding General Feature Format (GFF3) files from the National Center for Biotechnology Information (NCBI) database 18,19,20,21 in August 2018 using the NCBI FTP site: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/. We used the most recent reference assembly version for each of 247 vertebrate species (see Supplementary Notes S1 and S2 for the list of species used in this study). The mammalian taxonomic group was analyzed (114 mammalian species), as well as its non-mammalian vertebrate outgroup (133 non-mammalian species). Our analyses include only vertebrate species because insufficient orthologous ramp sequences were identified in other taxonomic groups. Of archaea, bacteria, fungi, invertebrates, mammalian vertebrates, other vertebrates, plants, protozoa, and viruses, only vertebrates passed our filtering criteria, which required that an ortholog contain a ramp sequence in at least 5% of the available species and lack one in at least 5% of the available species. At least 5% of all annotated orthologs needed to pass these filtering criteria for a taxonomic group to be included in our analyses.

We then assessed the congruence of the phylogenetic signal of ramp sequences within mammalian species and their vertebrate outgroup. All coding sequences (CDS) data were extracted from the reference genomes using a GFF3 parser included in JustOrthologs 22 . Any sequences with annotated exceptions, such as translational exceptions, unclassified transcription discrepancies, and suspected errors, were removed from the dataset. Our analyses included all NCBI gene annotations. NCBI gene annotations are calculated by NCBI's Eukaryotic Genome Annotation pipeline for the NCBI Gene dataset. They use a combination of protein sequence similarity and local synteny information to establish orthology. A manual curator may additionally assign orthologous gene relationships. The NCBI database includes 34,202 orthologs for Mammalia and 41,337 orthologs for non-mammalian vertebrates.

### Identifying ramp sequences

Ramp sequences were identified using ExtRamp (Fig. 1). The relative codon adaptiveness was calculated for each codon using its frequency in the genome. The translation rate at each codon in the gene was then estimated using the mean translational efficiency of a window of codons. A nine-codon sliding window was used to approximate the span of a ribosome, as recommended in the ExtRamp documentation 9. Ramp sequences were identified when low outlier regions of codon translational efficiency (i.e., translational bottlenecks) occurred at the beginning of gene sequences. ExtRamp was run on each species' FASTA file (.fasta) containing all genes, using the options to output the ramp sequence and the portion after the ramp sequence, as described in the ExtRamp README file (https://github.com/ridgelab/ExtRamp). The exact command used is included in Supplementary Note S3.

Fig. 1: Identifying ramp sequences using ExtRamp. Flowchart for finding ramp sequences using ExtRamp.
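The steps above can be sketched in a much-simplified Python form: relative adaptiveness from genome-wide codon counts, a nine-codon sliding window, and a check for low-outlier windows at the start of a gene. ExtRamp's actual implementation may differ; in this sketch the window mean is a geometric mean, a "low outlier" is a window score below Q1 - 1.5 * IQR of the gene's own window scores (Tukey's rule), and the two-codon table and toy genes are hypothetical. Use ExtRamp itself for real analyses.

```python
import math
import statistics
from collections import Counter


def codons(seq):
    """Split a coding sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(genes, codon_to_aa):
    """w(codon) = genome-wide count / count of the most used synonymous codon."""
    counts = Counter(c for g in genes for c in codons(g) if c in codon_to_aa)
    top = {}
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        top[aa] = max(top.get(aa, 0), n)
    return {codon: n / top[codon_to_aa[codon]] for codon, n in counts.items()}


def window_scores(seq, w, window=9):
    """Geometric mean of w over each run of `window` consecutive codons."""
    vals = [w[c] for c in codons(seq) if c in w]
    return [math.exp(sum(math.log(v) for v in vals[s:s + window]) / window)
            for s in range(len(vals) - window + 1)]


def ramp_length(seq, w, window=9):
    """Codons spanned by leading low-outlier windows (0 means no ramp detected)."""
    scores = window_scores(seq, w, window)
    if len(scores) < 4:
        return 0
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q1 - 1.5 * (q3 - q1)
    run = 0
    while run < len(scores) and scores[run] < cutoff:
        run += 1
    return run + window - 1 if run else 0


# Hypothetical two-codon table and toy genome: GCC is common, GCA is rare.
table = {"GCC": "Ala", "GCA": "Ala"}
ramp_gene = "GCA" * 10 + "GCC" * 60
plain_gene = "GCC" * 70
w = relative_adaptiveness([ramp_gene, plain_gene], table)
print(ramp_length(ramp_gene, w), ramp_length(plain_gene, w))
```

Because every window overlapping a rare codon scores low, the reported span (here the codons covered by the outlier windows) can extend past the rare-codon stretch itself; how ExtRamp places the ramp boundary is not reproduced.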

### Recovering phylogenies using the presence and absence of ramps

The presence or absence of a ramp sequence in each annotated ortholog was encoded in a binary matrix. If a ramp sequence was present in an ortholog, it was encoded in the matrix as a '1', and if it was absent, it was encoded as a '0'. Species that did not contain the ortholog were assigned a '?' for a missing value, similar to other methods that have found phylogenetic signals in codon usage biases 23,24,25. The effect of missing data was limited by applying an additional filter to the data. An orthologous gene was included in the analyses only if a ramp sequence in that gene was found in at least 5% of the species. Additionally, all species were required to contain ortholog annotations for at least 5% of the orthologs passing that initial filter. After applying this filter, mammalian species had a mean of 16.31% ± 7.81% missing data, and non-mammalian vertebrates had a mean of 28.50% ± 13.11% missing data.
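A minimal Python sketch of this encoding and of the two 5% filters, assuming ramp calls are stored per ortholog as a species-to-boolean mapping (a hypothetical data layout). The ortholog filter also applies the "absent in at least 5% of species" condition described under data collection and processing.

```python
def build_ramp_matrix(calls, min_frac=0.05):
    """Encode ramp presence/absence as '1'/'0', with '?' where a species lacks the ortholog.

    calls: {ortholog: {species: True if a ramp was found, False otherwise}};
    species without an annotation for an ortholog are simply absent from its dict.
    Returns ({species: character string}, {species: fraction of '?' characters}).
    """
    species = sorted({sp for per_sp in calls.values() for sp in per_sp})
    n = len(species)
    # Ortholog filter: ramp present in >= min_frac and absent in >= min_frac of species.
    kept = [o for o in sorted(calls)
            if sum(calls[o].values()) >= min_frac * n
            and sum(not v for v in calls[o].values()) >= min_frac * n]
    rows = {sp: "".join("?" if sp not in calls[o] else ("1" if calls[o][sp] else "0")
                        for o in kept)
            for sp in species}
    # Species filter: annotations for >= min_frac of the kept orthologs.
    rows = {sp: r for sp, r in rows.items()
            if len(r) - r.count("?") >= min_frac * len(kept)}
    missing = {sp: r.count("?") / len(r) for sp, r in rows.items() if r}
    return rows, missing


calls = {
    "o1": {"A": True, "B": False, "C": True, "D": False},
    "o2": {"A": True, "B": True, "C": True, "D": True},   # never absent: filtered out
    "o3": {"A": False, "B": True},                        # C and D lack this ortholog
}
rows, missing = build_ramp_matrix(calls)
print(rows, missing)
```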

Parsimony phylogenetic trees were recovered using Tree Analysis using New Technology (TNT) 26. The most parsimonious trees were found by saving multiple trees using tree bisection and reconnection (TBR) branch swapping 27. Maximum likelihood trees were recovered using IQTREE 28.

### Retrieving reference phylogenies

In order to determine the congruence of the phylogenetic signal of ramp sequences, each of the recovered phylogenies (i.e., parsimony and maximum likelihood trees) was compared to the synthetic phylogeny from the Open Tree of Life (OTL) 29. Although this phylogeny cannot be considered the "true" tree, it is created from a conglomeration of many phylogenetic studies and provides a useful resource for benchmarking ramp sequences as a new character state. The synthetic phylogeny was retrieved from the OTL using a previously published parser, getOTLtree.py 30, which queries the OTL application programming interface (API) to obtain OTL taxonomy identifiers for each species and retrieves the phylogeny from the OTL database. The exact command is included in Supplementary Note S4.

### Comparisons with the OTL synthetic tree

The accuracy of recovered phylogenies based on ramp sequence presence or absence was assessed by comparing each tree to the OTL synthetic phylogeny. The difference was quantified using branch percent comparisons, as implemented by the compare module of the Environment for Tree Exploration toolkit, ete3 31,32. This metric computes the percentage of branch similarity between two trees, where a higher percentage corresponds to more similar trees. This metric was selected because of its ability to compare large trees, including unrooted trees and trees with polytomies. The baseline performance of the ete3 branch percent identity metric was determined by comparing 1000 random permutations of the mammalian and other vertebrate topologies to the OTL.
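The idea behind such a comparison can be illustrated with a simple split-based measure: the percentage of the reference tree's internal branches (bipartitions of the taxa) that also appear in the recovered tree, plus a baseline from randomly relabelled trees. This Python sketch is a stand-in closely related to the Robinson-Foulds distance; it is not ete3's implementation, and ete3's branch percent metric may be defined differently (for example, in how it normalizes or handles polytomies). Trees are written as nested tuples, and the six-taxon example is hypothetical.

```python
import random


def leaf_names(tree):
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names.

    Each split is recorded as the side that excludes a fixed reference leaf, so the
    result does not depend on where the tuple encoding places the root.
    """
    leaves = frozenset(leaf_names(tree))
    anchor = min(leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = leaves - below if anchor in below else below
        if 1 < len(side) < len(leaves) - 1:
            found.add(side)
        return below

    walk(tree)
    return found


def percent_shared(tree, reference):
    """Percentage of the reference tree's splits that also occur in `tree`."""
    ref = splits(reference)
    return 100.0 * len(splits(tree) & ref) / len(ref) if ref else 100.0


def relabel(tree, mapping):
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def random_baseline(tree, reference, n_perm=1000, seed=0):
    """Mean percent_shared after randomly permuting the tip labels of `tree`."""
    rng = random.Random(seed)
    names = leaf_names(tree)
    total = 0.0
    for _ in range(n_perm):
        shuffled = names[:]
        rng.shuffle(shuffled)
        total += percent_shared(relabel(tree, dict(zip(names, shuffled))), reference)
    return total / n_perm


reference = ((("A", "B"), "C"), (("D", "E"), "F"))
recovered = ((("A", "C"), "B"), (("D", "E"), "F"))
print(percent_shared(recovered, reference), random_baseline(recovered, reference))
```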

### Scoring ramp sequences

Using the binary matrix of ramp sequences within each ortholog, the extent to which ramp sequences are homoplasious was quantified by mapping each ramp sequence to the OTL. For each ramp sequence, the species were divided into two partitions based on presence or absence of the ramp sequence. Since autapomorphies do not provide phylogenetic information, an orthologous ramp sequence was required to be present in at least two species and absent in at least two species, assuming a fully-resolved tree. For each ramp sequence, the number of parallelisms and reversals that occurred was quantified. Parallelisms occur when a character arises independently multiple times due to convergent evolution. Reversals occur when a derived character is lost or when the character reverts back to its ancestral state. A ramp sequence was determined to be orthologous if it correctly separated species according to their relationships reported in the OTL, and if the total number of gain/loss events equaled one, as previously computed for other codon usage biases 23,24 . The number of origin and loss events was then used to calculate the retention index for each ramp sequence 33 , where a retention index of zero represents a fully homoplasious character, and a retention index of one represents a character in which none of the states are homoplasious.
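Standard Fitch parsimony and Farris's retention index can be sketched in Python as below, assuming a fully bifurcating reference tree written as nested 2-tuples and ramp presence/absence coded '1'/'0', with '?' for missing data. It illustrates the quantities named above and is not the code used in the study.

```python
def fitch_steps(tree, states):
    """Minimum number of changes (Fitch parsimony) for one character on a binary tree.

    tree: nested 2-tuples of leaf names; states: {leaf: '0', '1' or '?'}.
    A '?' (missing) leaf is compatible with either state.
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            s = states[node]
            return {"0", "1"} if s == "?" else {s}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # no shared state: one change is required on this node's branches
        return left | right

    visit(tree)
    return steps


def retention_index(tree, states):
    """Farris's retention index (g - s) / (g - m) for a binary presence/absence character.

    m = minimum possible changes (1 when both states occur), g = maximum possible
    (the count of the rarer state), s = observed changes on `tree`.
    Returns None when g == m, e.g. for autapomorphies.
    """
    observed = [s for s in states.values() if s != "?"]
    ones, zeros = observed.count("1"), observed.count("0")
    m = 1 if ones and zeros else 0
    g = min(ones, zeros)
    if g == m:
        return None
    return (g - fitch_steps(tree, states)) / (g - m)


tree = ((("A", "B"), "C"), ("D", "E"))
clean = {"A": "1", "B": "1", "C": "0", "D": "0", "E": "0"}        # a single gain
homoplastic = {"A": "1", "B": "0", "C": "0", "D": "1", "E": "0"}  # two independent gains
print(retention_index(tree, clean), retention_index(tree, homoplastic))
```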

### Statistical calculations using random permutation test

Random permutations were performed in order to determine how the observed mean retention index of ramp sequences compares to random chance. Permutation tests (also called randomization tests) are non-parametric statistical tests that determine statistical significance by randomly rearranging the labels of a dataset 34. The taxa in the OTL were shuffled 1000 times to generate random trees. The tree topology of the OTL was maintained to prevent any biases due to tree topology. The retention indices of the ramp sequences were calculated for each random tree to create a null distribution of retention indices due to random chance. The actual mean retention index of the ramp sequences was compared to this distribution, and an empirical p-value was calculated as the proportion of permuted retention indices less than or equal to the observed retention index from the OTL.
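The permutation scheme can be sketched generically in Python: keep the topology, shuffle the tip labels, recompute the statistic, and compare. In the study the statistic is the mean retention index; to keep the example short, a simpler hypothetical statistic (the number of sister-leaf pairs whose two members share a ramp state) stands in for it. The sketch assumes the question is whether ramps fit the reference tree better than chance, so it counts permuted values at least as large as the observed one, and it uses the (hits + 1) / (n + 1) convention so the estimate is never exactly zero.

```python
import random


def leaf_names(tree):
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def relabel(tree, mapping):
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def shuffled_tips_p_value(tree, statistic, n_perm=1000, seed=0):
    """Upper-tail empirical p-value of statistic(tree) against tip-shuffled copies of tree."""
    rng = random.Random(seed)
    names = leaf_names(tree)
    observed = statistic(tree)
    hits = 0
    for _ in range(n_perm):
        shuffled = names[:]
        rng.shuffle(shuffled)
        if statistic(relabel(tree, dict(zip(names, shuffled)))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def concordant_cherries(tree, states):
    """Hypothetical stand-in statistic: sister-leaf pairs sharing the same state."""
    if isinstance(tree, str):
        return 0
    here = 0
    if len(tree) == 2 and all(isinstance(child, str) for child in tree):
        here = int(states[tree[0]] == states[tree[1]])
    return here + sum(concordant_cherries(child, states) for child in tree)


reference = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
ramp = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1, "F": 1, "G": 0, "H": 0}
observed, p_value = shuffled_tips_p_value(reference, lambda t: concordant_cherries(t, ramp))
print(observed, round(p_value, 3))
```

In this toy case all four cherries are concordant, which happens by chance in 6 of the 70 equally likely placements of the four ramp-bearing tips, so the estimated p-value should land near 0.09.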

### Statistical calculation of completely orthologous ramps

A ramp sequence was considered orthologous if all species that either have or do not have the ramp sequence form a monophyletic group. For each orthologous ramp sequence, the probability that it would form a monophyletic group in agreement with the OTL topology due to random chance was calculated. The species were divided into two groups: species with ramp sequences and species without ramp sequences. The conditional probability that a group of species would randomly divide into a monophyletic group concordant with the OTL was then calculated using the method previously described in Miller, et al. 23, which gives, via Eq. (1), the probability that, among $t$ total species, the $s$ species in the smaller of the two groups (i.e., species with ramps or species without ramps for a given gene) will track a proposed phylogeny.

For example, if three species contain a ramp sequence in an orthologous gene and there are seven total species, then the probability that the three species containing a ramp sequence in the orthologous gene would form a monophyletic group in agreement with the OTL topology by random chance is as follows:

For each orthologous ramp sequence, the expected number of ramp sequences was calculated by multiplying the conditional probability by the total number of ramp sequences with that same taxonomic distribution (e.g., if the dataset contained 15 orthologous genes with ramp sequences where there were three species in the smaller group and seven total species, then the expected number of orthologous ramps across that distribution would be $P \times 15 = \frac{1}{15} \times 15 = 1$). A chi-square analysis was performed using the expected versus the observed numbers of orthologous ramp sequences to calculate a p-value for the dataset.
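For a single taxonomic distribution, the comparison of observed and expected orthologous ramps can be sketched as a two-category Pearson chi-square test (orthologous vs. not), which has 1 degree of freedom. Framing it this way, and the observed count of 6, are assumptions for illustration; how the study pooled distributions across the dataset is not reproduced. With an expected count as small as 1, the chi-square approximation is rough, and an exact binomial test would be a reasonable cross-check.

```python
import math


def chi_square_two_categories(observed, expected):
    """Pearson chi-square statistic and p-value for two categories (1 degree of freedom)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail


n_genes = 15
p_random = 1 / 15                           # conditional probability from the worked example
expected_orthologous = p_random * n_genes   # = 1 expected orthologous ramp
expected = [expected_orthologous, n_genes - expected_orthologous]
observed = [6, n_genes - 6]                 # hypothetical observed counts
stat, p_value = chi_square_two_categories(observed, expected)
print(f"chi-square = {stat:.3f}, p = {p_value:.2e}")
```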

### Control comparisons with shortened sequences

We performed an additional control analysis to ensure that ExtRamp identified ramp sequences that likely affected translational efficiency rather than genomic artifacts, by removing the first 50 codons in all genes and rerunning our analysis pipeline. Since the ramp sequence generally occurs within the first 50 codons of a gene, we expected this control analysis to identify significantly fewer ramp sequences than the original dataset. We assessed this difference using a chi-square statistic and p-value.

### Recovering phylogenies using aligned sequence data

In order to investigate the hypothesis that nucleotides in ramp sequences provide a different phylogenetic signal than other portions of the gene, the aligned sequences were analyzed using maximum likelihood and parsimony. Ramp sequences for each orthologous group were aligned using Clustal Omega 35 (see Supplementary Note S5 for the command). Sequences were aligned using nucleotide sequence alignment as opposed to amino acid sequence alignment to accommodate potential differences in splice site reading frames between species. Nucleotide sequence alignments allow homologous genes to be aligned that may contain dual-coding exons, which occur when one portion of a sequence can be encoded using different reading frames.

The character matrix was encoded by first concatenating the aligned ramp sequences from each ortholog. Then, if an ortholog was not present in a species, each nucleotide character for that sequence was encoded as a '?' for missing data. The matrix was then used in IQ-TREE 28 to select the best model 36 and perform a maximum likelihood estimation of the phylogeny. The matrix was also used in TNT to recover phylogenies using parsimony.
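The concatenation step can be sketched in Python as follows, assuming each ortholog's alignment is held as a mapping from species name to aligned sequence (a hypothetical layout); a species missing from an ortholog receives a run of '?' as long as that alignment.

```python
def concatenate(alignments, species):
    """Concatenate per-ortholog alignments, padding absent species with '?'."""
    rows = {sp: [] for sp in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one alignment must have the same length")
        width = widths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, "?" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


alignments = [{"A": "ATG-", "B": "ATGC"}, {"A": "GG", "C": "GA"}]
matrix = concatenate(alignments, ["A", "B", "C"])
for sp, seq in matrix.items():
    print(sp, seq)
```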

Phylogenies were similarly recovered using the aligned sequence after the ramp and the complete gene sequence for each orthologous gene. For the maximum likelihood analysis, the size of the dataset for the portion after the ramp sequence and the complete sequence rendered the automatic model selection impractical due to computational demands. Therefore, we selected the same models that were used on the ramp sequence to evaluate the gene sequence after the ramp sequence and the complete gene sequence, which were GTR + F + R5 for Mammalia and GTR + F + R8 for non-mammalian vertebrates.

## Discussion

ESTs and other partial gene sequences are the predominant source of sequence data for a large and taxonomically diverse set of species. These sequences are tremendously valuable for gene discovery, genome annotation, comparative genomics, marker development, and a variety of other uses [11, 33]. However, for studies of gene family evolution or for large-scale analyses of gene families, one must contend with the large amount of missing data in alignments derived from partial sequences. For example, of the ≈27,000 families in the Phytome database [30] for which there are three or more sequences, the average proportion of alignment gaps is 37%.

Are these missing data really a problem? We found that it was possible to recover accurate trees from alignments in which the missing residues were clustered into columns. Even though half of the simulated alignments had between 50% and 60% missing data, the median stQD for the NJ and ML trees were 0, and the median stQD for the MP trees was 0.004. These results confirm that the presence of missing data itself does not lead to an incorrect phylogeny as long as sufficient data is available for the analysis [20, 22, 24, 25].

However, EST-like gappy alignments appear to be qualitatively different. When the same amount of missing data was distributed in a pattern typical of EST unigenes, phylogenies were much less accurate: mean stQD for trees computed from these alignments ranged from 0.17 for ML to 0.34 for MP. When using NJ, the phylogenetic accuracy was even lower for the EST-like gappy alignments than for alignments in which the same number of residues were randomly deleted. One explanation for these results is that for the random-deleted alignments, there is at least some overlap between all the pairs of sequences. For the EST-like gappy alignments, on the other hand, it is common for some pairs of sequences to share no columns in which data are present (e.g. see gap patterns 5, 6, 9 and 10), and thus no distance can be computed. This poses particular problems for distance methods. For example, PHYLIP reports a distance of "-1.0" for any two sequences that do not overlap in the input alignment. This is taken at face value during execution of the NJ algorithm, leading to a systematic bias toward overly close relationships between sequences in the tree as a result of the lack of overlap between them. The importance of the distribution, and not just the amount, of missing data, was shown earlier in a different context by Wiens [24]. In that study, lower accuracy was obtained when missing genes were randomly distributed among the sampled taxa, compared to data sets in which the missing genes were restricted to monophyletic subsets of taxa.
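The overlap problem can be made explicit in a distance calculation. The Python sketch below computes an uncorrected p-distance only over columns where both sequences have data and returns None when there is no shared column, so a downstream step must handle the pair (for example, impute or exclude it) rather than treating a sentinel such as -1.0 as a real distance. Treating both '-' and '?' as missing is an assumption.

```python
from typing import Optional


def p_distance(a: str, b: str, missing: str = "-?") -> Optional[float]:
    """Proportion of differing sites among columns where both sequences have data."""
    shared = [(x, y) for x, y in zip(a, b) if x not in missing and y not in missing]
    if not shared:
        return None  # no overlap: the distance is undefined, not -1.0
    return sum(x != y for x, y in shared) / len(shared)


print(p_distance("ACGT", "ACGA"))   # one difference in four shared columns
print(p_distance("AC--", "--GT"))   # no shared columns
```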

We have shown that one can improve phylogenetic accuracy by taking either one of two diametrically opposed approaches. In the first approach, one excludes gappy columns and sequences from the analysis through alignment masking. In our implementation of masking (REAP), we mimic the way it would be performed on real data by also excluding columns and rows that show evidence of misalignment, even though there is no alignment error in our simulation. Most of the trees computed from masked alignments using either NJ or ML methods were comparable to those computed from alignments without any missing data (mean stQD of 0.0022 and 0.0026 vs. 0.0 for full alignments). Even for MP trees, alignment masking was able to improve the trees approximately to the level of unmasked NJ trees (stQD of 0.2286). While this may be due solely to the removal of gaps, it may also reflect the removal of alignment positions that have undergone multiple substitutions, thus making the phylogenetic signal clearer in those that remain. Either way, one cannot escape the paradox that the phylogeny is made more accurate by ignoring error-free alignment input. Another important point is that alignment masking comes at the necessary expense of failing to retain all the sequences. On average, 27% of the sequences within an EST-like alignment were excluded by masking in our experiments.
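A minimal Python sketch of gap-based masking in the spirit described above: drop columns whose gap fraction exceeds a threshold, then drop sequences left with too little data. The thresholds are hypothetical, and unlike REAP this sketch does not look for evidence of misalignment.

```python
def mask_alignment(aln, max_col_gap=0.5, min_row_data=0.5, gap="-"):
    """Drop gappy columns, then sequences with too little remaining data.

    aln: {name: aligned sequence}. Thresholds are illustrative, not REAP's settings.
    """
    names = list(aln)
    width = len(aln[names[0]])
    keep_cols = [j for j in range(width)
                 if sum(aln[n][j] == gap for n in names) / len(names) <= max_col_gap]
    masked = {n: "".join(aln[n][j] for j in keep_cols) for n in names}
    return {n: s for n, s in masked.items()
            if s and sum(c != gap for c in s) / len(s) >= min_row_data}


aln = {"a": "ACGT--", "b": "ACGTAC", "c": "--GTAC", "d": "----AC"}
print(mask_alignment(aln))  # the sparsely covered sequence "d" is excluded
```

The trade-off noted in the text is visible even in this toy case: masking keeps the remaining data clean, but at the cost of discarding a sequence entirely.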

A very different approach is to attempt to model the missing data, which we have done through a technique we call alignment subdivision. Relative to masking, we found that our implementation of alignment subdivision (SIA) was able to retain a much higher proportion of the sequences: the median proportion of retained sequences using SIA was 100%. SIA generally, though not universally, led to more accurate trees than those computed directly from the gappy alignment. The greatest improvements in accuracy under SIA were seen in those families that had many subalignments. Where incomplete alignments were divided into 12 or more subalignments, SIA resulted in a more accurate phylogeny in almost all cases. On the other hand, when there were only two subalignments, the phylogeny computed directly from the original alignment was more accurate two-thirds of the time. Perhaps not surprisingly, the number of subalignments was closely associated with the gap pattern used in the simulation. Gap patterns 1, 2, 3, and 8 typically resulted in only one to four subalignments, while gap patterns 6, 9, and 10 typically resulted in a much larger number; thus, certain gap patterns are intrinsically more likely to see an improvement under SIA than others.

The improvement in phylogenetic accuracy was generally much higher with masking than with subdivision. NJ trees computed from EST-like alignments were over 100-fold more accurate with alignment masking (stQD = 0.002) than when directly computed (stQD = 0.246). The same differential was only about two-fold when using SIA (stQD = 0.127). The phylogenetic accuracy using SIA was thus comparable to the masked MP trees and the unmasked ML trees. Furthermore, the SIA approach is computationally laborious. Taken together, our results suggest that alignment masking is the preferred approach when the distribution of missing data is EST-like in nature.

While it appears from our results that alignment masking is not necessary when ML is used to infer the phylogeny, this may reflect the lack of alignment error in the simulated data. Although under some circumstances, the choice of phylogenetic inference method is known to have a major effect on phylogenetic accuracy [34], previous studies have shown that both alignment accuracy [35–37] and the ratio of phylogenetic signal to noise in the alignment [38] can be even more important than the choice of phylogenetic method. While we have not studied the effects of misalignment due to using partial gene sequences as input, we suggest that alignment error is likely to improve the relative performance of masking.

In modeling the missing alignment data, we have estimated the distance matrix that we would expect to see in the absence of missing observations. To further develop and optimize the SIA method, other approaches for combining subalignments can be tested in future studies. For example, we have imputed pairwise distances that could not be computed from the submatrices using a four-point metric [18, 39]. Future implementations of SIA could be improved by incorporating a three-point metric or a weighted least-squares imputation [18, 23, 40]. However, because only 17.5% of the cells in the combined matrices were missing, we expect the difference in imputation quality to have only minor effects on the results. Alternative approaches that model the missing alignment data probabilistically or by imputation would allow more accurate (likelihood or Bayesian) phylogenetic techniques to be applied while still retaining all the input sequences. Another interesting approach would be to infer phylogenies separately for each subalignment and then calculate a supertree for the full dataset [41].

## Key words

Plant pathogenic and endophytic Botryosphaeriales known from culture

You are free to share – to copy, distribute and transmit the work, under the following conditions:

Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Non-commercial: You may not use this work for commercial purposes.

No derivative works: You may not alter, transform, or build upon this work.

## Univariate Case: Phylogenetic Signal

The univariate case corresponds to phylogenetic logistic regression applied in the absence of independent variables, so there is only a single parameter that determines the mean. For this case, we construct a model of phylogenetic change of a binary trait by assuming the trait evolves up a phylogenetic tree. During each small increment of time, there is some probability $\alpha_1$ that the trait switches to 1 if it is currently 0, and some other probability $\alpha_0$ that it switches to 0 if it is currently 1; thus, evolution up the phylogenetic tree takes the form of a Markov process, as has been used in previous models of the evolution of binary traits (e.g., Pagel 1994). This process of evolution leads to a probability distribution for the trait values at the tips of the phylogenetic tree. The absolute magnitudes of $\alpha_0$ and $\alpha_1$ set the rates of transitions between 0 and 1 and hence affect the strength of phylogenetic correlations observed among tip species. For example, if $\alpha_0$ and $\alpha_1$ are large, transitions between 0 and 1 occur rapidly, which breaks down the tendency for closely related species to resemble each other.
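The two-state process described above has simple closed forms, sketched below in Python. Treating $\alpha_0$ and $\alpha_1$ as continuous-time rates, the stationary probability of state 1 is $\alpha_1/(\alpha_0+\alpha_1)$, and for two tips whose lineages have evolved independently for times $t_i$ and $t_j$ since a common ancestor drawn from the stationary distribution, the correlation between tip states is $e^{-(\alpha_0+\alpha_1)(t_i+t_j)}$. This exponential decay with the distance separating the tips has the same form as the stationary OU correlation, and it shrinks toward zero when the rates are large. The sketch does not reproduce the paper's $\mu$, $\alpha$ parameterization of $C(\alpha)$.

```python
import math


def stationary_mu(alpha0: float, alpha1: float) -> float:
    """Long-run probability of state 1 (alpha1: rate 0 -> 1, alpha0: rate 1 -> 0)."""
    return alpha1 / (alpha0 + alpha1)


def prob_state1(start: int, t: float, alpha0: float, alpha1: float) -> float:
    """P(state 1 after time t | starting state), two-state continuous-time Markov chain."""
    mu = stationary_mu(alpha0, alpha1)
    return mu + (start - mu) * math.exp(-(alpha0 + alpha1) * t)


def tip_correlation(t_i: float, t_j: float, alpha0: float, alpha1: float) -> float:
    """Correlation of two tip states, summing over the ancestor's state (drawn at stationarity)."""
    mu = stationary_mu(alpha0, alpha1)
    both_one = sum(
        p_anc * prob_state1(anc, t_i, alpha0, alpha1) * prob_state1(anc, t_j, alpha0, alpha1)
        for anc, p_anc in ((0, 1.0 - mu), (1, mu))
    )
    return (both_one - mu * mu) / (mu * (1.0 - mu))


# Enumerated correlation agrees with the closed form exp(-(alpha0 + alpha1) * (t_i + t_j)).
print(tip_correlation(0.3, 0.5, alpha0=1.0, alpha1=2.0), math.exp(-3.0 * 0.8))
# Fast switching erases the resemblance between relatives.
print(tip_correlation(1.0, 1.0, alpha0=50.0, alpha1=50.0))
```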

Although we make these specific assumptions about the evolutionary process to produce a statistical model, we recognize that the evolution of a real trait through time is unlikely to follow this process precisely. For example, the transition probability might vary among branches of the phylogenetic tree. Nonetheless, basing our analyses around a specific (and rather simple) model of evolutionary change makes it possible to derive an explicit statistical distribution for the values of a binary trait among species.

We have chosen to use $\mu$ and $\alpha$ as parameters in our statistical model because they have intuitive interpretations and increase the correspondence between the model and standard logistic regression. Nonetheless, the model could also be formulated in other parameters, for example $\alpha_0$ and $\alpha_1$ (i.e., the transition rates). An important statistical limitation, however, is that only 2 pieces of information are available from data sets: the mean value of $Y$ and the correlation in $Y$ among species. Therefore, it is only possible to estimate 2 parameters. This limitation explains some strategic decisions we made in model formulation. For example, in deriving the correlation matrix $C(\alpha)$ we assumed that the process is at stationarity, so the probability that the trait at the base of the phylogenetic tree is in State 1 equals $\mu$, the same as at the tree tips. If we were to assume that the process was not at stationarity, then the correlation matrix would depend on $m$, $m_0$ and $\alpha$, where $m$ is the expected trait value at the tips and $m_0$ is the expected trait value at the base of the phylogenetic tree. However, this model has 3 parameters ($m$, $m_0$, and $\alpha$) and only 2 can be estimated, so nothing is gained by this formulation. Therefore, because it leads to no loss of generality, we have assumed that the process is at stationarity.

The correlation matrix C(α) has a different structure from the correlation matrix that is used for phylogenetic regression of continuous-valued traits (Martins and Hansen 1997; Garland and Ives 2000; Lavin et al. 2008). For continuous-valued traits under Brownian motion evolution, the correlations in trait values between species are proportional to the lengths of shared branches (off-diagonals) given in the matrix W, whereas for our evolutionary model of a binary process the correlations are given by C(α). The structure of C(α) is identical to that produced for continuous-valued traits following an OU model of evolution under the assumption that the process is at stationarity (Hansen and Martins 1996; Martins and Hansen 1997; Butler and King 2004). The derivation of the OU process given in Blomberg et al. (2003) differs from that given in these citations by assuming that the trait value at the base of the phylogenetic tree is known with zero variance; this has the advantage of producing a transform that returns the original tree (i.e., W) when the parameter giving phylogenetic signal d = 1. This assumption is not an option for the case of binary variables because the variance is determined strictly by the mean. Although the matrix C(α) is never identical to W, when α = 1 the strengths of phylogenetic correlations (off-diagonal elements) are of similar overall magnitude for C(α) and W, and therefore α = 1 serves as a rough reference point to gauge the strength of phylogenetic signal. In other words, when α = 1 the phylogenetic correlations among tip values of the trait are approximately of the same magnitude as those expected for continuous-valued traits evolving by Brownian motion up the same tree.
The relationship between C(α) and W, however, depends on the structure of W and therefore should be considered on a case-by-case basis; the program PLogReg.m outputs the matrix C(α) so that it can be examined directly (see Supplementary Material).
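The sketch below gives a rough feel for how C(α) and W compare on a small tree. It rests on an assumption made for illustration only: that the correlation between tips i and j is exp(−α·d_ij), where d_ij is the corresponding off-diagonal element of 2(1 − W). Equation (1) and the C(α) matrix output by PLogReg.m should be checked before relying on this form. The three-species tree and the function name are hypothetical.

```python
import math

def ou_style_correlation(W, alpha):
    """Correlation matrix exp(-alpha * d_ij) with d_ij = 2 * (1 - W_ij).

    ASSUMPTION for illustration: W is the Brownian covariance matrix of an
    ultrametric tree scaled to height 1 (so W_ii = 1)."""
    n = len(W)
    return [[math.exp(-alpha * 2.0 * (1.0 - W[i][j])) for j in range(n)]
            for i in range(n)]

# Hypothetical tree ((A:0.2, B:0.2):0.8, C:1.0): A and B share 0.8 of their history.
W = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

for alpha in (0.5, 1.0, 4.0):
    C = ou_style_correlation(W, alpha)
    print(f"alpha={alpha}: C_AB={C[0][1]:.3f} (W_AB=0.8), "
          f"C_AC={C[0][2]:.3f} (W_AC=0.0)")
```

Under this assumed form, the correlation between lineages that share no history never reaches exactly zero, whereas in W it is zero. This is one concrete way in which the two matrices can be similar in overall magnitude without being identical.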

Because the statistical model requires input of the matrix W that gives expected phylogenetic correlations among species, special consideration needs to be made when a phylogenetic tree has noncontemporaneous tips. The phylogenetic correlations in our model (equation (1)) depend on the branch-length (patristic) distances between tips on the phylogenetic tree given by off-diagonal elements of 2(1 − W). To preserve the relative distances for a tree with noncontemporaneous tips, let $\tilde{W}$ be the matrix whose elements $\tilde{W}_{ij}$ give the shared branch lengths between tips i and j (measured on any scale, e.g., estimates of time or DNA divergence). If T is the matrix whose elements $T_{ij}$ equal the average length from the base to tips i and j, and if $\max(T - \tilde{W})$ is the maximum value of the elements in $T - \tilde{W}$, then $2(T - \tilde{W})/\max(T - \tilde{W})$ gives the tip-to-tip distances on the phylogenetic tree standardized so that the maximum distance between tips is 2. Thus, in equation (1) we let $2(1 - W) = 2(T - \tilde{W})/\max(T - \tilde{W})$ to give a standardized way to incorporate phylogenetic trees with noncontemporaneous tips.

As an explicit example, consider the case in which species A and species B have base-to-tip length 2, species C has base-to-tip length 8, and species B and species C share branch length 1, thereby giving $\tilde{W} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 8 \end{pmatrix}$. The species-to-species branch lengths are then $2(T - \tilde{W}) = \begin{pmatrix} 0 & 4 & 10 \\ 4 & 0 & 8 \\ 10 & 8 & 0 \end{pmatrix}$, and hence the standardized distance matrix is $2(T - \tilde{W})/\max(T - \tilde{W}) = \begin{pmatrix} 0 & 0.8 & 2 \\ 0.8 & 0 & 1.6 \\ 2 & 1.6 & 0 \end{pmatrix}$. Here, species A and species B are nearest and therefore have the lowest corresponding element of 2(1 − W), even though in the initial tree $\tilde{W}$, species B and species C are the phylogenetically related species.
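The standardization for noncontemporaneous tips can be checked numerically. This is a minimal sketch. It takes the matrix of shared branch lengths (called `W_tilde` here; its diagonal holds each tip's base-to-tip length) and returns 2(T − W̃)/max(T − W̃). It then reproduces the three-species example. The function name is hypothetical and is not taken from PLogReg.m.

```python
def standardized_distances(shared):
    """Tip-to-tip distances 2 * (T - W~) / max(T - W~).

    shared[i][j] is the branch length shared by tips i and j, so shared[i][i]
    is the base-to-tip length of tip i (tips may be noncontemporaneous)."""
    n = len(shared)
    # T_ij: average base-to-tip length of tips i and j
    T = [[(shared[i][i] + shared[j][j]) / 2.0 for j in range(n)] for i in range(n)]
    half = [[T[i][j] - shared[i][j] for j in range(n)] for i in range(n)]
    largest = max(max(row) for row in half)
    # Rescale so that the largest tip-to-tip distance equals 2
    return [[2.0 * half[i][j] / largest for j in range(n)] for i in range(n)]

# The worked example: A and B have length 2, C has length 8, B and C share 1.
W_tilde = [[2, 0, 0],
           [0, 2, 1],
           [0, 1, 8]]
D = standardized_distances(W_tilde)
for name, row in zip("ABC", D):
    print(name, [round(x, 2) for x in row])
```

The output shows A and B as the closest pair. This happens because the long terminal branch of C dominates the standardized distances.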

### Parameter Estimation

Although it is possible to derive the likelihood function for the evolutionary process we described above (e.g., Pagel 1994) and hence to estimate parameters μ and α using ML estimation, instead we use a procedure that is more flexible and numerically more efficient. Specifically, we estimate μ given α using the quasi-likelihood function and then estimate α given μ using least-squares estimation, repeatedly alternating between estimating μ and α until both values converge. Both quasi-likelihood and least-squares estimation require knowing only the first 2 statistical moments of the probability distribution of trait values among tip species; however, for a binomial process the first 2 moments fully specify the distribution, and therefore the estimation procedure uses all information provided by the data.
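To make the alternating scheme concrete, here is a simplified sketch. It is not PLogReg.m, and it rests on several stated assumptions:
- C_ij(α) = exp(−α·D_ij), where D is a tip-distance matrix.
- μ given α is the quasi-likelihood (GLS-type) estimate of a common mean.
- α given μ minimizes the squared differences between the residual cross-products (y_i − μ)(y_j − μ) and their expectation μ(1 − μ)C_ij(α).
- A grid search replaces simplex minimization.

All function names and the example data are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def corr_matrix(D, alpha):
    # ASSUMPTION: C_ij(alpha) = exp(-alpha * D_ij), with D the tip-to-tip distances.
    return [[math.exp(-alpha * d) for d in row] for row in D]

def mu_given_alpha(y, D, alpha):
    # Quasi-likelihood estimate of a common mean: (1' C^-1 y) / (1' C^-1 1).
    w = solve(corr_matrix(D, alpha), [1.0] * len(y))  # w = C^-1 1 (C is symmetric)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def alpha_given_mu(y, D, mu, grid):
    # ASSUMPTION: least squares matching residual cross-products to
    # mu * (1 - mu) * C_ij(alpha); a grid search stands in for simplex minimization.
    r = [yi - mu for yi in y]
    n = len(y)
    def sse(alpha):
        return sum((r[i] * r[j] - mu * (1 - mu) * math.exp(-alpha * D[i][j])) ** 2
                   for i in range(n) for j in range(i + 1, n))
    return min(grid, key=sse)

def fit_alternating(y, D, alpha=1.0, tol=1e-8, max_iter=100):
    grid = [0.05 * k for k in range(1, 201)]  # candidate alpha values in (0, 10]
    mu = mu_given_alpha(y, D, alpha)
    for _ in range(max_iter):
        new_alpha = alpha_given_mu(y, D, mu, grid)
        new_mu = mu_given_alpha(y, D, new_alpha)
        converged = abs(new_mu - mu) < tol and new_alpha == alpha
        mu, alpha = new_mu, new_alpha
        if converged:
            break
    return mu, alpha

# Hypothetical tree ((A, B), (C, D)): sister tips 0.4 apart, other pairs 2.0 apart.
D = [[0.0, 0.4, 2.0, 2.0],
     [0.4, 0.0, 2.0, 2.0],
     [2.0, 2.0, 0.0, 0.4],
     [2.0, 2.0, 0.4, 0.0]]
y = [1, 1, 0, 0]
mu_hat, alpha_hat = fit_alternating(y, D)
print(f"mu = {mu_hat:.3f}, alpha = {alpha_hat:.2f}")
```

Because this toy design is symmetric, μ converges to the sample mean. The fitted α is only meaningful under the assumed correlation form and should not be compared directly with the α of the text.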

The quasi-likelihood function is derived from the expectation and variance of the distribution of Y. Although for any distribution the quasi-likelihood function only approximates the likelihood function, quasi-likelihood estimates are the same as ML estimates, and the asymptotic properties of the estimators, which are used to derive, for example, approximate confidence intervals, are the same (McCullagh and Nelder 1989). In the evolutionary process described above, the expectation of all elements of Y is simply μ, and the correlation structure of the distribution of Y is given by C(α) (equation (1)), which together define the quasi-likelihood function for a given value of α. Quasi-likelihood estimation underlies GEE (Liang and Zeger 1986; Zeger and Liang 1986; Zeger et al. 1988). The GEEs proposed for phylogenetic analyses of comparative data (Paradis and Claude 2002; Forsyth et al. 2004) have been first-order approximations (GEE1), whereas it is also possible to use second-order approximations (GEE2) that incorporate both the mean components of the models (regression coefficients) and the variance components (those that affect the covariance matrix, such as the parameter α) (Prentice 1988; Zhao and Prentice 1990; Liang et al. 1992). However, for our application the second-order GEE2 is prohibitively complex and the first-order GEE1 often had poor convergence properties (results not presented). We therefore used quasi-likelihood functions directly, employing simplex minimization to find the ML parameter values rather than Newton–Raphson minimization that is typically used in the GEE approach.
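For the intercept-only case, the standard quasi-score equation for a single mean parameter (McCullagh and Nelder 1989) gives a closed form for μ at fixed α. The derivation below is a worked illustration that uses only the mean and correlation structure stated above. It is not reproduced from the authors' own derivation.

$$
\mathrm{E}[Y] = \mu\mathbf{1}, \qquad \operatorname{Var}(Y) = V = \mu(1-\mu)\,C(\alpha)
$$

$$
U(\mu) = \mathbf{1}^{\top} V^{-1}\,(Y - \mu\mathbf{1}) = \frac{\mathbf{1}^{\top} C(\alpha)^{-1}(Y - \mu\mathbf{1})}{\mu(1-\mu)} = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{\mathbf{1}^{\top} C(\alpha)^{-1} Y}{\mathbf{1}^{\top} C(\alpha)^{-1}\mathbf{1}}
$$

Here $\mathbf{1}$ is the derivative of the mean vector with respect to μ. When $C(\alpha) = I$ (no phylogenetic correlation), $\hat{\mu}$ reduces to the ordinary sample mean of Y.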

## An introduction to phylosymbiosis

Phylosymbiosis was recently formulated to support a hypothesis-driven framework for the characterization of a new, cross-system trend in host-associated microbiomes. Defining phylosymbiosis as ‘microbial community relationships that recapitulate the phylogeny of their host’, we review the relevant literature and data in the last decade, emphasizing frequently used methods and regular patterns observed in analyses. Quantitative support for phylosymbiosis is provided by statistical methods evaluating higher microbiome variation between host species than within host species, topological similarities between the host phylogeny and microbiome dendrogram, and a positive association between host genetic relationships and microbiome beta diversity. Significant degrees of phylosymbiosis are prevalent, but not universal, in microbiomes of plants and animals from terrestrial and aquatic habitats. Consistent with natural selection shaping phylosymbiosis, microbiome transplant experiments demonstrate reduced host performance and/or fitness upon host–microbiome mismatches. Hybridization can also disrupt phylosymbiotic microbiomes and cause hybrid pathologies. The pervasiveness of phylosymbiosis carries several important implications for advancing knowledge of eco-evolutionary processes that impact host–microbiome interactions and future applications of precision microbiology. Important future steps will be to examine phylosymbiosis beyond bacterial communities, apply evolutionary modelling for an increasingly sophisticated understanding of phylosymbiosis, and unravel the host and microbial mechanisms that contribute to the pattern. This review serves as a gateway to experimental, conceptual and quantitative themes of phylosymbiosis and outlines opportunities ripe for investigation from a diversity of disciplines.

### 1. Introduction

The last decade has brought renewed interest in the complexity of microorganisms living in association with hosts, yielding a number of new empirical results, philosophical concepts and research opportunities [1,2]. Any discussion on the study of host–microbiome interactions must begin with clear definitions. Here, we use the term symbiosis (sym—‘together’, bios—‘life’ in Greek) to encompass associations between two or more organisms of different species and without restriction to the length of time of the association or phenotypes produced by the interacting species. Since temporal and functional variation in symbiosis is context-dependent, symbiotic interactions can include a range of obligatory, facultative, transient and permanent associations with varying degrees of specificity and functional costs and benefits.

The last two decades of research and technological advances have placed microbial symbiosis as a nexus of many subdisciplines within and beyond biology. Scholars now have a suite of tools and increased awareness of the major questions to be answered. These include holistic approaches for the identification of ecological [3] and host [4–7] drivers of microbial taxonomic and functional diversity, as well as reductionist approaches that provide evolutionary and mechanistic insights into transmission processes [8] and phenotypic outcomes of symbiosis [1]. The abundance of empirical and theoretical investigations on the ecology and evolution of simple symbioses also comprises fertile ground to build a foundation for the microbiome field that studies frequently complex associations between hosts and their multiple microbial associates. One rapidly growing research area across diverse systems is the recently defined pattern of phylosymbiosis [9]. This review aims to synthesize the topic to provide: (i) a long-lasting definition of the term; (ii) a practical guide to testing phylosymbiosis; (iii) an overview of the prevalence of phylosymbiosis; (iv) a discourse on the biological significance of phylosymbiosis; and (v) future directions in phylosymbiosis research.

### 2. What is and what is not phylosymbiosis?

We use the following quote to describe our initial and basic definition of phylosymbiosis, namely ‘microbial community relationships that recapitulate the phylogeny of their host’ [9]. Phylosymbiosis is first and foremost a significant association between host phylogenetic relationships and host-associated microbial community relationships wherein ‘phylo’ refers to the host clade and ‘symbiosis’ refers to the microbial community in or on the host.

Prior to the introduction of the term phylosymbiosis in a study of Nasonia parasitoid wasp species [9], early investigations described relationships between host phylogenies or genetic distances and microbial beta diversity in maize [10], insects [5,11] and mammals [4,12]. These studies used bacterial 16S rRNA gene sequencing across multiple host species to demonstrate that closely related species harbour more similar microbiomes than distantly related species. For example, the sister species N. giraulti and N. longicornis diverged approximately 0.4 Ma and harbour more similar 2nd instar larval, pupal and adult microbiomes compared with the microbiome in their outgroup species N. vitripennis [9,11], which diverged approximately 1.0 Ma from the two sister species [13].

Phylosymbiosis may arise from stochastic and/or deterministic evolutionary and ecological forces. For example, stochastic effects include dispersal fluctuations in microbial communities (ecological drift) or shifts in host geographical ranges [14]. Phylosymbiosis can also be shaped by ecological [15–17] and dietary [4] niche variation across host lineages. Deterministic effects include microbial colonization preferences for certain host backgrounds or host regulation in which microbial community composition is influenced by host trait(s) [18]. The first study linking phylosymbiotic patterns to the function of specific host genes found that knockdown of the Hydra arminin antimicrobial peptide disrupted the phylosymbiosis [6] commonly observed in several freshwater and laboratory Hydra species [19]. Although phylosymbiosis can potentially arise from long-term, intimate host–microbe associations over evolutionary time, such as through host–microbe coevolution, codiversification [20] and cospeciation [21], importantly it may also be driven by relatively short-term changes in microbiome composition. Indeed, a recent Drosophila melanogaster study revealed the effects of gut microbiome changes on host genomic divergence in as little as five generations [22]. This suggests that rather than being passive agents of phylosymbiosis, microbial communities have the potential to induce host genomic changes that could, in turn, impact the establishment, maintenance or breakdown of phylosymbiosis.

While phylosymbiosis distinguishes itself from non-phylosymbiosis by a significant degree of association between host phylogenetic and microbiome community relationships, it is not universal (§5) and therefore provides a testable hypothesis. Determining the presence of phylosymbiosis is a first step preceding further investigations into eco-evolutionary mechanisms, such as the nature of species–species associations, selective or neutral forces driving phylosymbiosis, and the consequences (or lack thereof) of the pattern for host and microbial phenotypes. If phylosymbiosis results from an evolutionary selective pressure, then decreases in host or microbial fitness are expected upon host exposure to microbiomes from different host lineages in an evolutionarily informed manner. Evolutionary selective pressures that result in phylosymbiosis could drive the spread of host traits that regulate microbiome composition or microbial traits that enhance host colonization. In this general light, we refer to ‘functional phylosymbiosis' when the host and/or microbial phenotypes impact or are impacted by phylosymbiotic associations.

Interspecific microbiome transplant experiments are useful in elucidating functional phylosymbiosis. A large-scale phylosymbiosis investigation spanning 24 species across four laboratory-reared host clades (Nasonia wasps, Drosophila flies, mosquitoes and Peromyscus deer mice) demonstrated that interspecific transplants of gut microbial communities between Peromyscus species decreased dry matter digestibility and increased food intake, while transplants between Nasonia species markedly lowered survival to adulthood by nearly half [23]. In addition, interspecific microbiomes are more costly to Nasonia larval growth and pupation than intraspecific microbiomes [24]. Similarly, reciprocal maternal symbiont transplants between two wild, sympatric Onthophagus dung beetle species caused developmental delay and elevated mortality in non-native hosts that persisted to the next generation [25]. Collectively, phylosymbiotic associations that impact host fitness support the premise that hosts are adapted to their native microbiomes rather than non-native microbiomes, although more studies are needed to confirm these associations and effects in captive and wild host populations.

Hybridization between host species causes host–microbiome mismatches since combining independently evolved host genotypes in a hybrid may cause a breakdown in either microbial colonization preferences for certain hosts or host control of the microbiome. As demonstrated in Nasonia [9], house mice [26] and whitefish [27], hybrids have an altered microbiome relative to the parental microbiome, suggesting a reduced capacity for hosts to regulate their microbiomes and an increased capacity for pathogenic microbes to bloom. These breakdowns in host–microbiome interactions can associate with maladaptive phenotypes in hybrids including immune dysfunction, pathology, inviability and sterility [9,26] that can reduce interbreeding between species or populations. In Nasonia, the lethality of hybrids between the older species pair was rescued by germ-free rearing and restored by feeding an inoculum of select, resident gut bacterial species from parents to germ-free hybrids [9]. By contrast, hybrids between a younger Nasonia species pair did not have an altered microbiome nor suffer functional costs. Collectively, the results from interspecific microbiome transplant experiments and host hybridization studies illustrate that host–microbiome interactions across host species can have important functional consequences that impact evolutionary events within and between species, including wedging host populations into species.

Having now summarized phylosymbiosis, we briefly accentuate what phylosymbiosis is not, for clarity. Phylosymbiosis does not necessarily imply vertical transmission, mutualistic interactions or evolutionary splitting from a common ancestor via coevolution, cospeciation, co-diversification or cocladogenesis. Although these processes may lead to phylosymbiosis, the pattern may alternatively arise by antagonistic interactions and/or horizontal microbial transmission whereby interactions between hosts and environmental microbes establish phylosymbiosis anew each generation. As such, phylosymbiosis has varied underpinnings subject to empirical investigation, and it may appear at certain points of time and space rather than be stable throughout a host's entire lifespan.

### 3. A practical guide to studying phylosymbiosis

Investigations of phylosymbiosis vary in approach (qualitative versus quantitative), methodology and statistical power [18]. Thus, a clear, consistent and robust workflow to detect phylosymbiosis is desirable for newcomers and experts alike. Here, we suggest a comprehensive workflow for examining phylosymbiosis (figure 1).

Figure 1. Sequential overview of bioinformatic methods commonly used for phylosymbiosis analyses.

#### (a) Host taxa and input data

Because phylosymbiosis detection involves the collection of replicated samples across multiple taxa, both optimization of statistical sensitivity [28] and specificity [18], as well as minimization of sequencing batch effects, are crucial for differentiating between noise and signal. Although our 2016 study showed that rooted trees with four Nasonia species are sufficient to detect phylosymbiosis within the clade [23], we suggest the use of appropriate power and effect size analyses (reviewed in [29] for microbiome data) to determine sufficient replicates and taxa for the optimization of statistical power [28]. Sampling multiple individuals per species will help resolve noise from signal in microbial community relationships, but further study is required on how replicates of inter- and intraspecies samples are best used in studying phylosymbiosis across host clades that can vary in divergence times. If available, experimental designs of successful phylosymbiosis studies with similar sample types can also be adapted accordingly [30]. Previous studies have successfully detected phylosymbiosis in host taxa spanning approximately 0.3–100 Myr of evolutionary history [21,23], and whether longer times since a last common ancestor impact phylosymbiosis detection requires further study. Nucleotide or amino acid sequence(s) from host species can be used to generate a phylogenetic or phylogenomic tree that is confidently supported at branching nodes with bootstrap [31] or other measures [32] and across several phylogenetic inference methods (e.g. maximum likelihood [33] and Bayesian inference [34]). Because an accurate host phylogenetic topology is essential for evaluating phylosymbiosis, the tree should be free from systematic artefacts such as long-branch attraction, and polytomies should be resolved in the host phylogeny when possible.
As methods used to reconstruct a host phylogeny from a sequence alignment have been extensively reviewed [35], we will not discuss them further here. With a host evolutionary tree, pairwise host distances can also be represented as cophenetic distances, computed as the sum of branch lengths connecting a pair of terminal nodes on a phylogenetic tree [36].
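Cophenetic distances follow directly from the tree. In the minimal sketch below, the tree is stored as parent pointers with branch lengths; this representation is an assumption of the sketch, not the format of any particular package. The example uses a hypothetical three-species Nasonia tree, with branch lengths in Myr loosely based on the divergence times quoted in section 2.

```python
def cophenetic(parent, length, tips):
    """Cophenetic distances: summed branch lengths on the path joining two tips.

    parent[node] gives the parent of each non-root node; length[node] gives the
    length of the branch joining node to its parent."""
    def ancestors_with_depth(node):
        # Map the node and each of its ancestors to the path length from the node.
        depth, out = 0.0, {node: 0.0}
        while node in parent:
            depth += length[node]
            node = parent[node]
            out[node] = depth
        return out

    paths = {t: ancestors_with_depth(t) for t in tips}
    dist = {}
    for a in tips:
        for b in tips:
            pa, pb = paths[a], paths[b]
            # The minimum over shared ancestors is reached at the most recent common ancestor.
            dist[a, b] = min(pa[n] + pb[n] for n in pa if n in pb)
    return dist

# Hypothetical ultrametric tree ((giraulti:0.4, longicornis:0.4):0.6, vitripennis:1.0).
parent = {"giraulti": "x", "longicornis": "x", "x": "root", "vitripennis": "root"}
length = {"giraulti": 0.4, "longicornis": 0.4, "x": 0.6, "vitripennis": 1.0}
tips = ["giraulti", "longicornis", "vitripennis"]
d = cophenetic(parent, length, tips)
print(d["giraulti", "longicornis"], d["giraulti", "vitripennis"])
```

In R, `cophenetic()` on an `ape` phylo object returns the same kind of matrix. The hand-rolled version here only makes the definition explicit.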

#### (b) Microbiome input data

Phylosymbiosis analysis requires microbial diversity data from each host lineage. Short-read sequencing of microbial phylogenetic marker genes (e.g. 16S rRNA gene) is common and economical for microbial profiling. Processed sequenced reads can be analysed by one of two current methods. First, they can be clustered into operational taxonomic units (OTUs) at different sequence cutoffs (e.g. 97% and 99%) with and/or without a reference sequence database [37,38]. OTU clustering cutoffs reflect genetic distances between taxa over evolutionary time and may affect phylosymbiosis detection [39]; such variability has also been observed in practice (reviewed in [18]). Second, reads can be resolved into amplicon sequence variants (ASVs) without clustering, which may offer single-nucleotide resolution, though sequencing error rates should be accounted for [40]. For the greatest sensitivity in phylosymbiosis assessment, meta-omics datasets are advantageous because finer-scale taxonomic and functional profiling can be achieved [41]. Metagenomic sequence data were used to demonstrate viral phylosymbiosis in Nasonia [42] as well as the varying effects of host phylogeny and ecology on the composition and functions of non-human, primate gut microbiomes [43,44].
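The effect of the identity cutoff can be seen in a toy example. This is a deliberately simplified greedy centroid clustering of pre-aligned, equal-length reads. Real OTU pickers use sequence alignment, abundance sorting and other refinements, so this is not the algorithm of any specific tool. The reads and the `mutate` helper are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, cutoff):
    """Assign each read to the first centroid it matches at >= cutoff identity;
    otherwise the read becomes a new centroid (a new OTU)."""
    centroids, assignment = [], []
    for read in reads:
        for k, centroid in enumerate(centroids):
            if identity(read, centroid) >= cutoff:
                assignment.append(k)
                break
        else:
            centroids.append(read)
            assignment.append(len(centroids) - 1)
    return centroids, assignment

def mutate(seq, positions):
    """Hypothetical helper: substitute a different base at the given positions."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] if i in positions else b for i, b in enumerate(seq))

base = "ACGT" * 25                                    # hypothetical 100-bp read
reads = [base,
         mutate(base, {3, 50}),                       # 98% identical to base
         mutate(base, {1, 20, 40, 60, 80})]           # 95% identical to base
for cutoff in (0.97, 0.99):
    centroids, _ = greedy_otus(reads, cutoff)
    print(cutoff, len(centroids))
```

In this example, the same three reads yield two OTUs at 97% and three at 99%. This is the kind of cutoff dependence that can propagate into phylosymbiosis tests.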

#### (c) Microbial beta diversity measures

Microbial beta diversity, which measures dissimilarities in microbial composition and structure across host samples, is conventionally used to measure phylosymbiosis. Binary measures, such as Jaccard distance and Sørensen–Dice distance [45,46], are calculated with OTU presence/absence data. Quantitative descriptors of OTU abundances can also compute beta diversity, including the Bray–Curtis dissimilarity [47] derived from Motyka et al.'s coefficient [48]. Phylogeny-based metrics, such as weighted and unweighted unique fraction (UniFrac), use phylogenetic distances between communities (samples) to calculate microbial community differences, necessitating the use of a microbial phylogenetic tree as input to calculate the total community distance [49].
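The three non-phylogenetic measures above can be written in a few lines each. This is a minimal sketch on hypothetical OTU count tables stored as dictionaries. UniFrac is omitted because it requires a microbial phylogenetic tree.

```python
def present(sample):
    """Set of OTUs with a positive count in a sample (dict OTU -> count)."""
    return {otu for otu, count in sample.items() if count > 0}

def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |A & B| / |A | B|."""
    A, B = present(a), present(b)
    return 1 - len(A & B) / len(A | B)

def sorensen_dice(a, b):
    """Sørensen–Dice distance on presence/absence: 1 - 2|A & B| / (|A| + |B|)."""
    A, B = present(a), present(b)
    return 1 - 2 * len(A & B) / (len(A) + len(B))

def bray_curtis(a, b):
    """Bray–Curtis dissimilarity on counts: sum |x_k - y_k| / sum (x_k + y_k)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in otus)
    den = sum(a.get(k, 0) + b.get(k, 0) for k in otus)
    return num / den

# Hypothetical OTU count tables for two host samples.
s1 = {"OTU1": 50, "OTU2": 30, "OTU3": 20}
s2 = {"OTU1": 10, "OTU2": 30, "OTU4": 5}
print(jaccard(s1, s2), sorensen_dice(s1, s2), bray_curtis(s1, s2))
```

Changing the abundance of a shared OTU alters Bray–Curtis but leaves both binary distances unchanged. This is the difference in sensitivity discussed in the next paragraph.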

Because beta diversity metrics reflect different aspects of dissimilarity, the choice of metric is study specific and depends partly on the microbial composition and evolutionary history of the lineages studied. Binary metrics based on presence/absence are more sensitive to variations in rare taxa and were implemented to study host specificity of sponge microbiomes, where rare taxa comprised more than 90% of distinct OTUs [50]. Binary metrics may also be sensitive to recent microbial diversification because recently diverged OTUs/ASVs will exert the same effect as OTUs/ASVs with a longer divergence history [39]. By contrast, quantitative metrics are more sensitive to variations in abundant taxa. Besides taxonomy-based phylosymbiosis studies [23,51–53], quantitative metrics have also been applied to metagenomics data [42,43]. Metrics that consider phylogenetic relationships between OTUs, such as UniFrac distances [54], are applied in many other phylosymbiosis studies, including bats [55], corals [20] and mammals [4,43].

Microbiome distinguishability, or the characteristic of being able to significantly differentiate microbial communities of host lineages under evaluation, is a prerequisite for phylosymbiosis and should be tested before evaluating the phylosymbiosis prediction that more similar host species harbour more similar microbiomes [20,23,51–53]. Microbiome distinguishability can be visualized from beta diversity data and categorical sample grouping data using ordination plots, such as principal coordinate analysis (PCoA) and non-metric multidimensional scaling (NMDS) plots [56]. In addition, microbiome distinguishability can be further evaluated using typically non-parametric multivariable analyses, such as analysis of similarities (ANOSIM) [57] and variants of permutational multivariate analysis of variance (PERMANOVA) [58]. Specific pairwise comparisons of intra- and interspecific microbial beta diversity distances can also be performed with an appropriate non-parametric two-sample test [23].
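To show what a PERMANOVA-style test of distinguishability computes, the sketch below implements the usual pseudo-F statistic from a distance matrix and assesses it by permuting host labels. It is a minimal one-factor version for illustration only. Packaged implementations (for example, `adonis2` in the R package vegan) handle more complex designs. The samples and distances are hypothetical.

```python
import random

def pseudo_f(D, groups):
    """PERMANOVA pseudo-F from a distance matrix D and one group label per sample."""
    N = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(D[i][j] ** 2 for i in range(N) for j in range(i + 1, N)) / N
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(N) if groups[i] == g]
        pairs = sum(D[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (N - a))

def permanova(D, groups, n_perm=999, seed=1):
    """Observed pseudo-F and a permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(D, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(D, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical: three samples from each of two host species, with distances
# taken from positions along a single ordination axis.
pos = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
D = [[abs(p - q) for q in pos] for p in pos]
groups = ["hostA"] * 3 + ["hostB"] * 3
F, p = permanova(D, groups)
print(F, p)
```

With six samples split 3/3, only 10 distinct partitions exist. The p-value therefore cannot fall much below 0.1 here, however clearly separated the hosts are. This is a reminder that the number of replicates limits what such tests can show.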

#### (d) Quantifying phylosymbiosis

The determination of phylosymbiosis relies on evaluating a significant association between host phylogenetic relationships and host-associated microbial community distances. To this end, topological congruency tests directly compare topologies of a host phylogenetic tree and a microbiome dendrogram [23,42,51–53,59]. To generate a hierarchical dendrogram, several agglomerative hierarchical clustering methods (reviewed in [56]) can cluster microbial beta diversity distances. The most commonly used method, unweighted pair group method with arithmetic mean (UPGMA), performs pairwise sample clustering from their average dissimilarity values and gives all samples equal weights [60]. Compared with linkage clustering approaches, UPGMA prioritizes relationships among groups over individual samples [56]. By assigning equal weights to all samples, UPGMA assumes that samples in each group are representative of groups in the larger reference population [56]. As such, it may be sensitive to sample sizes and may generate unstable topologies with imbalanced data where some groups are oversampled while some are undersampled. Newer clustering methods, such as the phylogenetically aware squash clustering method, directly compute distances between samples (rather than differences between beta diversity distances) based on their positions on a microbial phylogenetic tree [61]. In general, the effects of clustering methods on phylosymbiosis detection require further study.
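UPGMA itself is short to implement. In the minimal sketch below, the distance from a merged cluster to every other cluster is the size-weighted mean of the two merged clusters' distances, so each original sample carries equal weight. The dendrogram is returned as nested tuples. The four samples and their beta diversity distances are hypothetical.

```python
def upgma(D, names):
    """UPGMA (average-linkage) clustering; returns the dendrogram as nested tuples."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, number of samples)
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), _ = min(dist.items(), key=lambda item: item[1])  # closest pair
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[i, j]
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted mean, so every original sample counts equally.
            dist[k, next_id] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical beta diversity distances: two samples from each of two host species.
names = ["A1", "A2", "B1", "B2"]
D = [[0.00, 0.20, 0.70, 0.80],
     [0.20, 0.00, 0.75, 0.70],
     [0.70, 0.75, 0.00, 0.30],
     [0.80, 0.70, 0.30, 0.00]]
print(upgma(D, names))
```

Because merged distances are weighted by cluster size, heavily oversampled groups pull these averages toward themselves. This is one way in which imbalanced sampling can destabilize the topology.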

Topological comparison metrics, such as the Robinson–Foulds metric and the more robust and sensitive matching cluster metric, are frequently used to detect phylosymbiosis [23,42,51,52,59,62]. Robinson–Foulds analyses the distance between two trees as the smallest number of operations required to convert one topology to the other [63], while matching cluster considers congruency at the subtree level and is, therefore, a more refined evaluation of small topological changes that affect incongruence [64]. Statistical significance (p-values) has been evaluated by determining the probability of 100 000 randomized bifurcating dendrogram topologies yielding equivalent or more congruent phylosymbiotic patterns than the microbiome dendrogram [23]. Moving forward, improved randomization techniques that preserve conspecific relationships will be useful in reducing false positives. Normalized Robinson–Foulds and matching cluster scores can be calculated as the number of differences between the two topologies divided by the total possible congruency scores for the two trees, with normalized distances ranging from 0 (complete congruence) to 1 (complete incongruence) [23].
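The normalized score described above can be illustrated with the rooted, cluster-based form of Robinson–Foulds. The sketch assumes both trees have the same tips and are written as nested tuples. Collapsing samples to one tip per host species, for instance, is an assumption of this sketch. Neither the matching cluster metric nor the randomization test is implemented.

```python
def clusters(tree):
    """All clusters (sets of tip names) of a rooted tree written as nested tuples."""
    found = set()
    def walk(node):
        if isinstance(node, tuple):
            tips = frozenset().union(*(walk(child) for child in node))
        else:
            tips = frozenset([node])
        found.add(tips)
        return tips
    walk(tree)
    return found

def nontrivial(tree):
    """Clusters other than single tips and the full tip set."""
    cs = clusters(tree)
    n_tips = max(len(c) for c in cs)
    return {c for c in cs if 1 < len(c) < n_tips}

def robinson_foulds(t1, t2):
    """Rooted Robinson-Foulds distance and its normalized form in [0, 1]."""
    c1, c2 = nontrivial(t1), nontrivial(t2)
    rf = len(c1 ^ c2)              # clusters present in one tree but not the other
    total = len(c1) + len(c2)      # maximum possible number of differences
    return rf, (rf / total if total else 0.0)

# Hypothetical host phylogeny and microbiome dendrogram for four host species.
host = ((("A", "B"), "C"), "D")
microbiome = ((("A", "C"), "B"), "D")
print(robinson_foulds(host, microbiome))
```

Here one misplaced species changes two of the four nontrivial clusters, which gives a normalized score of 0.5.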

Matrix correlation methods identify phylosymbiosis by comparing the similarities between host-derived and microbial-derived distance matrices. Methods implemented in phylosymbiosis studies [20,21,39,50,65–72] include variations of the Mantel test, which statistically evaluates the linear correlation between all corresponding elements from two independent matrices by permutation [73], and the more powerful Procrustean superimposition approach, which rotates and fits two matrices to minimize their differences [74]. Partial Mantel tests [75], which measure correlations between two matrices while controlling for the effects of a third variable described in another matrix, are also used to evaluate associations between microbial communities and multiple aspects of host characteristics, such as phylogeny, identity, genetic distances and geographical distances [39,66,67,69].
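A basic Mantel test can be sketched as follows. It takes the Pearson correlation between the upper triangles of the host and microbiome matrices and compares it with correlations obtained after permuting the rows and columns of one matrix together. The test is one-sided, because phylosymbiosis predicts a positive association. The five-species matrices are hypothetical. Packaged versions (for example, `mantel` in the R package vegan) offer more options.

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def upper_triangle(D, order):
    """Upper-triangle entries of D after reordering its rows and columns by `order`."""
    n = len(order)
    return [D[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(D_host, D_micro, n_perm=999, seed=1):
    """One-sided Mantel test for a positive host-microbiome distance correlation."""
    identity = list(range(len(D_host)))
    x = upper_triangle(D_host, identity)
    r_obs = pearson(x, upper_triangle(D_micro, identity))
    rng = random.Random(seed)
    perm, hits = identity[:], 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel species in one matrix, keeping rows and columns paired
        if pearson(x, upper_triangle(D_micro, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical host cophenetic distances and microbiome beta diversity for five species.
D_host = [[0, 2, 4, 8, 8],
          [2, 0, 4, 8, 8],
          [4, 4, 0, 8, 8],
          [8, 8, 8, 0, 6],
          [8, 8, 8, 6, 0]]
D_micro = [[0.00, 0.30, 0.45, 0.80, 0.75],
           [0.30, 0.00, 0.50, 0.70, 0.85],
           [0.45, 0.50, 0.00, 0.65, 0.80],
           [0.80, 0.70, 0.65, 0.00, 0.60],
           [0.75, 0.85, 0.80, 0.60, 0.00]]
print(mantel(D_host, D_micro))
```

Note that the 10 distances among five species are not independent observations. That is why significance comes from permuting species labels rather than from the usual test of a correlation coefficient.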

Although both topology-based and matrix-based tests are specific and sensitive enough to detect phylosymbiosis in a variety of empirical cases, there are several differences between them. Topological comparison metrics do not use branch length information as there is no a priori reason to assume rates of host evolution in each lineage should equal rates of ecological community change in the microbiome. Indeed, rates of microbiome change may be expected to be far more rapid than the gradual evolution of host genetic changes. As such, tests of topology without relative branch lengths are conservative relative to matrix correlation methods that directly rely on comparisons of host genetic divergence with microbial community dissimilarity. A simulation analysis suggested that the Mantel test has higher sensitivity and power than the Robinson–Foulds metric when phylosymbiosis is based on the assumption of microbial preferences for a host trait [19]. The practical relevance of this conclusion is not clear because phylosymbiosis can also arise for reasons other than microbial colonization preferences, such as host preferences, neutral processes and microbe–microbe interactions. Moreover, the relative performance of the Mantel test and the more sensitive topology-based matching cluster distance was not evaluated in this simulation, and such comparisons are likely to yield different insights. Systematic benchmarking of type I and II error rates of phylosymbiosis measurement methods across various possible scenarios will aid experimental design and result interpretation. As such, research opportunities for the development and implementation of improved phylosymbiosis detection methods are ample.

#### (e) Parameter selection

Phylosymbiosis detection involves the selection of various parameters, such as OTU identity cutoff, beta diversity metric, clustering method and congruency test, each with its own strengths and limitations that will vary with study design and questions. Although various parameter combinations can be tested and compared simultaneously [39], in the case when only a few of all possible parameter combinations detect phylosymbiosis, we recommend cautious interpretation of results with respect to the chosen parameters. If available, results should also be compared to those from previous phylosymbiosis studies with similar sample types using the same parameter combinations. Experimental replication is also necessary to confirm phylosymbiosis, especially when it is not consistently detected.

#### (f) Phylogenetic comparative methods

The effects of phylogenetic signal, defined as ‘a tendency for related species to resemble each other more than they resemble species drawn at random from the tree’ [76], on univariate traits (e.g. microbial alpha diversity) have been examined in parallel with phylosymbiosis studies [66,67]. Phylogenetic signal indices like Pagel's λ [77] and Blomberg's K [78] are based on a random Brownian model of trait evolution [79], but can also be used with and compared to more complex models that take into account natural selection. Although these methods are less commonly used on multivariable data and have not yet been applied to evaluate phylosymbiosis explicitly, they are promising alternatives for not only examining host phylogenetic signal on microbial beta diversity, but also testing evolutionary models relevant to phylosymbiosis.
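For a univariate trait such as alpha diversity, Pagel's λ multiplies the off-diagonal (shared-history) elements of the Brownian covariance matrix by λ. At λ = 0 the species are independent, and at λ = 1 the Brownian model is unchanged. The sketch below fits λ by a grid search over the profile Gaussian log-likelihood and compares the best fit with the fixed λ = 0 and λ = 1 models. The grid search and the four-species data are assumptions of this sketch. Packaged tools (for example, `phylosig` in phytools, or `gls` with `corPagel` from ape) use proper optimizers. Because λ = 0 lies on the boundary of the parameter range, the usual chi-square reference for the likelihood ratio is only approximate.

```python
import math

def pagel_lambda(V, lam):
    """Scale the off-diagonal (shared-history) elements of a BM covariance by lambda."""
    n = len(V)
    return [[V[i][j] if i == j else lam * V[i][j] for j in range(n)] for i in range(n)]

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def loglik(y, V):
    """Log-likelihood of y ~ N(mu 1, sigma2 V) with mu and sigma2 at their ML values."""
    n = len(y)
    L = cholesky(V)
    zy, z1 = forward(L, y), forward(L, [1.0] * n)   # whitened data and intercept
    mu = sum(a * b for a, b in zip(z1, zy)) / sum(a * a for a in z1)  # GLS mean
    sigma2 = sum((a - mu * b) ** 2 for a, b in zip(zy, z1)) / n
    logdet = 2 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi * sigma2) + logdet + n)

# Hypothetical tree ((A, B), (C, D)) of height 1, with sister pairs sharing 0.8.
V = [[1.0, 0.8, 0.0, 0.0],
     [0.8, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.8],
     [0.0, 0.0, 0.8, 1.0]]
y = [1.0, 1.2, 3.0, 3.3]
grid = [k / 100 for k in range(101)]
ll = {lam: loglik(y, pagel_lambda(V, lam)) for lam in grid}
best = max(grid, key=ll.get)
print(f"lambda_hat = {best:.2f}")
print(f"LR vs lambda = 0: {2 * (ll[best] - ll[0.0]):.3f}")
print(f"LR vs lambda = 1: {2 * (ll[best] - ll[1.0]):.3f}")
```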

Phylogenetic comparative methods, such as phylogenetic independent contrasts [79] and phylogenetic generalized linear mixed models (pGLMMs) [80], predict the evolutionary correlation between two or more discrete or continuous traits given a known phylogeny and an evolutionary model. These can also be integrated into phylosymbiosis studies. pGLMMs were recently implemented in coral microbiome [20] and passerine feather microbiome studies [71] to examine the effects of latitude and colony size on coral alpha diversity, cophylogenetic coral–bacteria relationships, and relationships between alpha diversity and relative abundances of bacteriocin-producing bacteria and keratinolytic feather damaging bacteria. Because phylosymbiosis may arise from ecological (among other) forces, these methods can be useful in understanding the various ecological interactions that possibly underlie phylosymbiosis.
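Phylogenetic independent contrasts can be sketched with Felsenstein's pruning recursion. Each internal node yields one standardized contrast between its two children. The node's estimated value is the inverse-variance-weighted mean of the children, and its branch is lengthened to carry the estimation error. The evolutionary regression of one trait on another is then fitted through the origin on the contrasts. The nested-tuple tree format, the tree itself and the trait values are hypothetical, and pGLMMs are not shown.

```python
import math

def pic(node, traits, contrasts):
    """Felsenstein's independent contrasts on a bifurcating tree.

    node is (tip_name, branch_length) or ((left, right), branch_length).
    Appends one standardized contrast per internal node to `contrasts` and
    returns (estimated node value, branch length adjusted for estimation error)."""
    content, length = node
    if not isinstance(content, tuple):
        return traits[content], length
    x1, v1 = pic(content[0], traits, contrasts)
    x2, v2 = pic(content[1], traits, contrasts)
    contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
    x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)  # weighted mean of the children
    return x, length + v1 * v2 / (v1 + v2)           # lengthen the parent branch

def contrasts_for(tree, traits):
    out = []
    pic(tree, traits, out)
    return out

# Hypothetical tree ((A:1, B:1):1, (C:1, D:1):1) with two traits per species.
left = ((("A", 1.0), ("B", 1.0)), 1.0)
right = ((("C", 1.0), ("D", 1.0)), 1.0)
tree = ((left, right), 0.0)
traits_x = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 5.0}
traits_y = {k: 2.0 * v + 1.0 for k, v in traits_x.items()}

cx, cy = contrasts_for(tree, traits_x), contrasts_for(tree, traits_y)
slope = sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)  # regression through the origin
print(len(cx), slope)
```

A tree with n tips yields n − 1 contrasts. The intercept of the trait relationship cancels in the differences, so the slope is recovered exactly in this noise-free example.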

Overall, as meta-omics and trait evolution analyses become more widely applicable to phylosymbiosis, one compelling direction of future phylosymbiosis investigations in silico is to venture beyond host phylogenetic effects on microbial diversity to resolve linkages between host phylogeny, host functions, microbial diversity, microbial functions, selective forces and environmental factors.

### 4. The prevalence of phylosymbiosis

A major goal of microbiome science is to find general paradigms and rules, if any, that are comparable across varied systems. In this light, phylosymbiosis is emerging as a bona fide trend because of its frequent recurrence across eukaryotic host systems (figure 2). Examples in insects include the viromes of Nasonia parasitoid jewel wasps [42] and the gut microbiomes of cockroaches, termites [81], lab-reared [23] and wild mosquitoes [59], Cephalotes turtle ants [39] and Apis social corbiculate bees [69]. In Drosophila flies, phylosymbiosis is either weakly supported [23] or not detected [82] in laboratory strains and wild populations.

Figure 2. Representative diversity of phylosymbiosis across host species, tissues, habitats and functions. Asterisks denote taxa with mixed evidence of phylosymbiosis.

The first phylosymbiosis study on mammalian gut microbiomes [4] demonstrated the effects of animal phylogeny and diet on gut microbial community dissimilarity [12,21,23,39,70,83]. Studies focusing on the gut microbiomes of specific animal groups detected phylosymbiosis in American pikas [51] and Peromyscus deer mice [23,52], no phylosymbiosis in western chipmunks [84], and mixed evidence of phylosymbiosis in primates [17,43,44,70], bats [55,85] and birds [62,68,86,87]. A recent large-scale study revealed much stronger effects of host phylogeny and diet on the gut microbiomes of non-flying mammals than on those of bats and birds [72]. Besides gut or faecal microbiomes, animal surface microbiomes have also been analysed for phylosymbiotic associations [88], which occur, for example, on mammalian skin [53] and passerine feathers [71], but not on amphibian skin [3]. A meta-analysis of the phylosymbiosis literature found that the trend is more prevalent in microbiomes inhabiting internal host compartments than in those inhabiting external compartments [18]. However, this finding may be biased by the larger number of studies investigating phylosymbiosis in the gut than in external host compartments.

Beyond terrestrial and associated habitats, research interest in phylosymbiotic associations in aquatic habitats is steadily growing (figure 2), spanning global sponge microbiome surveys [67,89,90] and taxon-specific sponge surveys [50,65,66] with mixed results. Two previous studies in sponges showed significant correlations between host phylogeny and microbial beta diversity [66,67]. In Australian scleractinian corals, phylosymbiosis was generally observed in tissue and skeleton compartments, but not in mucus specimens, which are predominantly influenced by the environment [20], suggesting that the pattern depends on the anatomical compartment sampled. Phylosymbiosis and host dietary effects also occur in the skin microbiomes of 44 fish species from the western Indian Ocean [91], but phylosymbiosis is absent from the surface microbiomes of sympatric kelp species [92].

Phylosymbiosis has been assessed in plants, mainly to distinguish the effects of host phylogeny and soil determinants on microbial beta diversity. A comparative analysis of lycopods, ferns, gymnosperms and angiosperms across a coastal tropical soil chronosequence indicated that host phylogeny is a secondary but statistically significant factor shaping root-associated bacterial community structure, after soil age [15]. More taxonomically and/or spatially restricted surveys have also revealed phylosymbiosis between rhizobacterial communities and Poaceae crop plants [93], endosphere bacterial communities and 30 plant species [94], rhizosphere-associated fungal communities and willows from hydrocarbon-contaminated soils [95], root-associated eumycotan fungal communities and Asteraceae flowering plants in a dry grassland [96], ectomycorrhizal fungal communities and conifer–broadleaf forest trees [97], and ectomycorrhizal fungal communities and Estonian Salicaceae willows [98]. In contrast, qualitative incongruence between Brassicaceae host phylogeny and root microbiomes has been observed [99], and phylosymbiotic correlations that were not statistically significant have been reported in other plant microbiome studies [16,100].

### 5. Significance and future directions of phylosymbiosis

Microbiome research will continue to be revolutionized by the multi-omics era, in which a deluge of data has enabled unprecedented insights into the extensive taxonomic, genetic and functional composition of microbial communities and their associated hosts. Such large-scale accumulation of empirical and theoretical findings can potentiate the development of new hypotheses, unifying concepts and frameworks across diverse host–microbiome systems. Indeed, the recurrence of phylosymbiosis across host systems lends itself to large comparative surveys across kingdoms of life that may uncover taxonomic range restrictions of phylosymbiosis as well as the environmental parameters (e.g. soil and water properties) and ecological interactions (e.g. diet and predator–prey relationships) that determine where and when phylosymbiosis occurs. If the microbiome field is to have general trends to test in new systems, phylosymbiosis is well poised to be one of them.

Phylosymbiosis distinguishes itself from non-phylosymbiosis by characterizing a significant degree of association between host phylogenetic and microbiome community relationships. It provides a testable hypothesis, reflects the variation likely to be seen in nature and is amenable to explanation by mechanisms that require further investigation. The determination of whether phylosymbiosis is present or not is a first step preceding further investigations into mechanistic details, such as the nature of species–species associations and the type(s) of ecological and evolutionary genetic processes underpinning phylosymbiosis.

Phylosymbiosis also engenders a holistic view of ecology and evolution in which hosts are communities or holobionts whose microbial members can contribute to genetic and phenotypic variation subject to natural selection. Several questions have been conventionally overlooked. For example, what are the microbial effects on host allele frequencies? Does host gene flow in natural populations impact microbiome variation and phylosymbiosis? Is phylosymbiosis associated with the acceleration or deceleration of host speciation? What are the genetic and mechanistic factors that regulate phylosymbiosis and how do these factors vary across populations or species? Collectively, studies determining the magnitude of ecological, evolutionary and genetic forces in structuring phylosymbiosis represent an important area of future research.

### 6. Conclusion

Phylosymbiosis defines a link between host evolutionary relationships and microbial diversity that is quantifiable and applicable across living systems. As research in this area proliferates, a definition, conceptual framework and workflow for assessing phylosymbiosis will facilitate identification of phylosymbiotic host–microbe interactions. Future cause-and-effect studies of phylosymbiosis will bring a further mechanistic understanding of the evolutionary, genetic and molecular bases. Just as no mature theory of evolutionary genetics was possible until we understood the mode of inheritance, no mature principle of evolutionary ecology for host-associated microbiomes seems possible until we understand the general mechanisms establishing host–microbiome associations.

## Case Study I: Felsenstein’s Worst-Case Scenario

More than anything else, it was the famous series of figures depicting the “worst-case scenario” (Figs. 5, 6, and 7 in the original; our Fig. 2) from Felsenstein’s iconic 1985 article “Phylogenies and the comparative method” that awakened biologists to the need for tree-thinking and started a revolution in modern comparative biology. The idea is simple: as a result of shared ancestry, measurements taken on one species will not be independent of those collected on another, especially if the two species are closely related. This nonindependence can create apparent correlations between traits that are, in truth, evolving independently. To illustrate the effect of nonindependence, Felsenstein generated a scenario in which two clades are separated by long branches (our Fig. 2). He then evolved traits independently according to a Brownian motion (BM) process along the phylogeny and recovered a significant regression slope using Ordinary Least Squares (OLS), despite there being no evolutionary covariance between the traits.
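The following small simulation is in the spirit of Felsenstein's scenario. It is our own illustration, not a reproduction of his figures. It estimates how often OLS reports a significant slope between two independently evolving BM traits when two star-shaped clades sit on long stems. It then compares that rate with a control in which the stems have zero length. Clade sizes, branch lengths, number of replicates, seed, function names and the approximate critical value 2.02 (two-sided 5% for 38 df) are all assumptions.

```python
import math
import random


def two_clade_bm(rng, n_per_clade, stem, tip):
    """One BM trait (rate 1) on two star clades joined at the root.

    Each clade shares a stem of length `stem`; every tip then adds an
    independent branch of length `tip`.
    """
    values = []
    for _ in range(2):
        clade_value = rng.gauss(0.0, math.sqrt(stem))
        values += [clade_value + rng.gauss(0.0, math.sqrt(tip))
                   for _ in range(n_per_clade)]
    return values


def ols_t(x, y):
    """t statistic of the OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def ols_false_positive_rate(stem, tip=1.0, n_per_clade=20, reps=500,
                            crit=2.02, seed=1985):
    """Share of replicates in which two independent traits give |t| > crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = two_clade_bm(rng, n_per_clade, stem, tip)
        y = two_clade_bm(rng, n_per_clade, stem, tip)  # independent of x
        if abs(ols_t(x, y)) > crit:
            hits += 1
    return hits / reps


worst_case = ols_false_positive_rate(stem=10.0)  # long stems, short tips
control = ols_false_positive_rate(stem=0.0)      # no shared history
print(f"OLS false-positive rate: worst case {worst_case:.2f}, control {control:.2f}")
```

The control should stay near the nominal 5%. The worst case is far higher, because the 40 tips effectively carry only one independent between-clade comparison.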

Felsenstein’s worst-case scenario (Felsenstein 1985) illustrates a problem quite like that identified by Maddison and FitzJohn. Here we modify Felsenstein’s original generating process from simple BM, to A) BM with a single burst occurring on the stem branch of one of the two clades (indicated by vertical dash). B) The distribution of trait values produces a figure very similar to Felsenstein’s original scenario, but results in C) a single contrast (black) that is not well-described by the estimated BM process, and thereby generates a significant regression of PIC Y and PIC X (dotted line) despite both X and Y in the shift and BM distributions being uncorrelated. D) As the ratio of the shift variance to the BM variance increases, the proportion of contrast regressions that return a significant result increases dramatically (each point represents 200 simulations for a fixed phylogeny, with both the BM process and the random draw from the shift distribution being uncorrelated with equal variance for both traits). While IC corrects for singular events consistent with BM, it does not correct for the more general phenomenon of dramatic singular events driving significant results in comparative analyses. Note that the nonindependence of species is not the issue.


While other researchers had hit upon similar notions throughout the early 1980s (e.g., Clutton-Brock and Harvey 1980; Mace et al. 1981; Ridley 1983; Stearns 1983; Cheverud et al. 1985), none of these had the pervasive impact that Felsenstein’s presentation did (see, e.g., Losos 2011, who reproduces the figures and the accompanying reasoning in his presidential address for the American Society of Naturalists). The problem is just so obvious: data from different clades cluster in different parts of the bivariate plot, and all you have to do is look. And while his proposed solution, “independent contrasts” (IC), was of course widely adopted, we suspect it is the clarity with which Felsenstein articulated the problem that has kept his article a hallmark of biological education and a testament to the importance of tree-thinking, even as his method has largely been superseded by generalized least squares (Grafen 1989), which is identical to IC if BM is used to model the covariance of errors (Rohlf 2001; Blomberg et al. 2012), and by mixed model approaches (Lynch 1991; Housworth et al. 2004; Hadfield and Nakagawa 2010).
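The equivalence between IC and generalized least squares under BM can be checked numerically. The sketch below uses a hypothetical four-tip tree with made-up trait values. It computes the slope from a regression of Y contrasts on X contrasts through the origin, and the slope from GLS with the BM covariance matrix of the same tree (intercept included). The two agree to rounding error. The matrix inversion is a plain Gauss–Jordan routine written only for this example, and all function names are hypothetical.

```python
import math


def pic(node, traits):
    """Independent contrasts; tip = (name, length), node = ((left, right), length)."""
    children, length = node
    if isinstance(children, str):
        return traits[children], length, []
    x1, v1, c1 = pic(children[0], traits)
    x2, v2, c2 = pic(children[1], traits)
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    return value, length + v1 * v2 / (v1 + v2), c1 + c2 + [contrast]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small examples)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def gls_slope(x, y, cov):
    """Slope of y on x by generalized least squares (intercept included)."""
    ci = invert(cov)

    def quad(u, v):  # u' C^-1 v
        return sum(u[i] * ci[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

    ones = [1.0] * len(x)
    a11, a12, a22 = quad(ones, ones), quad(ones, x), quad(x, x)
    b1, b2 = quad(ones, y), quad(x, y)
    return (a11 * b2 - a12 * b1) / (a11 * a22 - a12 * a12)


# Hypothetical tree ((A:1,B:1):1,(C:0.5,D:0.5):1.5) and trait values.
ab = ((("A", 1.0), ("B", 1.0)), 1.0)
cd = ((("C", 0.5), ("D", 0.5)), 1.5)
tree = ((ab, cd), 0.0)
tips = ["A", "B", "C", "D"]
x = {"A": 1.0, "B": 2.5, "C": 4.0, "D": 3.0}
y = {"A": 0.5, "B": 2.0, "C": 5.0, "D": 3.5}

# BM covariance: length of the root-to-ancestor path shared by each pair of tips.
cov = [[2.0, 1.0, 0.0, 0.0],
       [1.0, 2.0, 0.0, 0.0],
       [0.0, 0.0, 2.0, 1.5],
       [0.0, 0.0, 1.5, 2.0]]

cx = pic(tree, x)[2]
cy = pic(tree, y)[2]
ic_slope = sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)
pgls_slope = gls_slope([x[t] for t in tips], [y[t] for t in tips], cov)
print(ic_slope, pgls_slope)  # both 441/319, about 1.3824
```

The agreement depends on the covariance matrix being exactly the one implied by the tree under BM. With any other error structure, the two estimators generally differ.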

However, an important part of this story is often missed: Felsenstein also noted that the problem of nonindependence does not occur if “characters respond essentially instantaneously to natural selection in the current environment, so that phylogenetic inertia is essentially absent” (p. 6). Despite this comment, a common misunderstanding of his argument is that the problem inherent in a nonphylogenetic regression of phylogenetically structured data is that species are not independent. In fact, independence of data is not an assumption of standard (nonphylogenetic) linear regression at all. Rather, standard linear regression assumes that the errors of the fitted model are independent and identically distributed (i.i.d.). As a result, many applications of a “phylogenetic correction” seem to miss the point (Revell 2010; Hansen and Bartoszek 2012): if all of the phylogenetic signal in a data set is present in the predictor trait and the errors are i.i.d., then there is no need for any phylogenetic correction (Rohlf 2001, 2006). (However, phylogenetic analyses are nearly always needed to establish this condition in the first place.)
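Rohlf's point can be checked with a quick simulation, which is our illustration and not taken from the cited works. Give the predictor X strong two-clade structure, then generate Y with a true slope of zero and i.i.d. normal errors. The ordinary OLS test keeps roughly its nominal 5% false-positive rate, because the OLS t test is exact conditional on X. Clade sizes, branch lengths, seed, function names and the approximate critical value 2.02 (38 df) are assumptions.

```python
import math
import random


def ols_t(x, y):
    """t statistic of the OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def structured_predictor(rng, n_per_clade=20, stem=10.0, tip=1.0):
    """Predictor with strong two-clade structure (BM on two star clades)."""
    x = []
    for _ in range(2):
        node = rng.gauss(0.0, math.sqrt(stem))
        x += [node + rng.gauss(0.0, math.sqrt(tip)) for _ in range(n_per_clade)]
    return x


def type_one_error(reps=500, crit=2.02, seed=2001):
    """False-positive rate of OLS when all phylogenetic signal is in X."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = structured_predictor(rng)
        y = [rng.gauss(0.0, 1.0) for _ in x]  # true slope 0, i.i.d. errors
        hits += abs(ols_t(x, y)) > crit
    return hits / reps


rate = type_one_error()
print(f"OLS false-positive rate with structured X, i.i.d. errors: {rate:.3f}")
```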

We suggest that what made Felsenstein’s prima facie argument so compelling was that it appealed to biologists’ intuition that many large clades of organisms are just different in many potentially idiosyncratic ways (Vermeij 2006). If the apparent association between traits found in a nonphylogenetic regression analysis is simply a result of these idiosyncratic differences between clades, then we would be inferring a relationship from unreplicated data (Nee et al. 1996), irrespective of the purely statistical consideration of whether errors are i.i.d.

Here, we revisit Felsenstein’s worst-case scenario in order to demonstrate that IC and Phylogenetic Generalized Least Squares (PGLS) do not completely address the problem that we tend to think they do: these methods are still susceptible to singular evolutionary events. To demonstrate this, we add a slight twist to Felsenstein’s original example. First, we used a phylogeny with two clades, each of which is internally unresolved, similar to that of the 1985 article. We emphasize that the only phylogenetic structure is that stemming from the deepest split. We then simulated two traits under independent BM processes, each with an evolutionary rate ($\sigma^2$) of 1. However, at some point on a stem branch of one of the two clades, we introduce a singular evolutionary “event” (i.e., a dramatic shift in a lineage’s phenotype) drawn from a multivariate normal distribution with uncorrelated divergences and equal variances that are a scalar multiple of $\sigma^2$. The resulting distribution of the data suggests a situation very similar to Felsenstein’s worst-case scenario, and what we suspect is the type of problem envisioned by most biologists when they warn their students of the dangers of ignoring phylogeny.
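Below is a compact, hedged re-creation of this experiment. It uses PGLS under BM, which gives the same slope as IC on bifurcating trees, so the internally unresolved clades need no special polytomy handling. Each trait receives its own independent shift, which is equivalent to drawing the pair of shifts from a multivariate normal with uncorrelated components and equal variance. Clade size, stem and tip lengths, number of replicates, seed, function names and the critical value 2.10 (≈ two-sided 5% for 18 df) are assumptions for illustration. The sketch is not intended to reproduce the published figure.

```python
import math
import random


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small examples)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def two_clade_cov(n_per_clade, stem, tip):
    """BM covariance (rate 1) for two internally unresolved (star) clades."""
    n = 2 * n_per_clade
    return [[stem + tip if i == j else
             (stem if i // n_per_clade == j // n_per_clade else 0.0)
             for j in range(n)] for i in range(n)]


def simulate_trait(rng, n_per_clade, stem, tip, shift_var):
    """BM (rate 1) plus one Gaussian 'singular event' on clade 0's stem."""
    values = []
    for clade in range(2):
        node = rng.gauss(0.0, math.sqrt(stem))
        if clade == 0:
            node += rng.gauss(0.0, math.sqrt(shift_var))
        values += [node + rng.gauss(0.0, math.sqrt(tip)) for _ in range(n_per_clade)]
    return values


def quad(u, ci, v):
    """u' C^-1 v for a precomputed inverse covariance matrix ci."""
    return sum(ui * sum(c * vj for c, vj in zip(row, v)) for ui, row in zip(u, ci))


def pgls_t(x, y, ci):
    """t statistic of the PGLS slope of y on x (intercept included)."""
    n = len(x)
    ones = [1.0] * n
    a11, a12, a22 = quad(ones, ci, ones), quad(ones, ci, x), quad(x, ci, x)
    b1, b2 = quad(ones, ci, y), quad(x, ci, y)
    det = a11 * a22 - a12 * a12
    intercept = (a22 * b1 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = quad(resid, ci, resid) / (n - 2)
    return slope / math.sqrt(sigma2 * a11 / det)


def significant_fraction(shift_ratio, n_per_clade=10, stem=1.0, tip=1.0,
                         reps=200, crit=2.10, seed=2018):
    """Share of replicates with |t| > crit when X and Y are uncorrelated.

    shift_ratio is the variance of the singular event divided by the BM
    rate (sigma^2 = 1); X and Y receive independent shifts.
    """
    ci = invert(two_clade_cov(n_per_clade, stem, tip))
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = simulate_trait(rng, n_per_clade, stem, tip, shift_ratio)
        y = simulate_trait(rng, n_per_clade, stem, tip, shift_ratio)
        if abs(pgls_t(x, y, ci)) > crit:
            hits += 1
    return hits / reps


for ratio in (0.0, 10.0, 100.0, 1000.0):
    print(f"shift/BM variance {ratio:>6}: {significant_fraction(ratio):.2f} significant")
```

With no shift, the correctly specified PGLS test holds its nominal rate. As the shift variance grows, the single deep comparison increasingly dominates the fit, and the share of spuriously significant slopes rises.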

One would hope that our tools for “correcting for phylogeny” would recognize that the apparently strong relationship between the two traits in our example was driven by only a single contrast. However, this is not the case. That single contrast results in a very high-leverage statistical outlier that drives significance as the size of the shift increases (Fig. 2). We can repeat the same exercise with more phylogenetically structured data (where the two clades of interest are fully bifurcating following a Yule process) and obtain identical results (Fig. 2; see Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.p8066hd). This is disconcerting, since our intuition suggests that we do not have compelling evidence for a causal relationship between these two traits (i.e., there is very little reason for us to believe from this correlation alone that one trait is an adaptation to the other).

How can we formulate a better set of models that can account for what our intuition tells us is a dangerous situation for causal inference? We can do so by including another phylogenetically plausible model: that trait correlations result from a single random shift, drawn from a different distribution than the one used to model trait evolution across the rest of the branches.

Let us consider a situation quite distinct from Felsenstein’s multivariate BM (mvBM) scenario. Here traits do not evolve by mvBM, but rather undergo a shift at a single point (perhaps an ancient dispersal event in which one clade invaded a new environment). In such a scenario, we only need to consider the phylogeny insofar as a given species lies on one side or the other of the event in question. We can then erect two statistical models: a linear regression model and a singular event model.

Alternatively, $X$ and $Y$ may not be related to one another at all. Rather, they may be the products of singular random evolutionary events, denoted by $E_1$ and $E_2$, that happened to occur on the branch separating the two clades.

The linear regression and singular event models lead to potentially very different distributions of trait data at the tips. For example, under the singular event model, the distribution of $Y$ is conditionally independent of $X$ after accounting for $L$, $\beta_Y$, and $\beta_X$. This is a testable empirical prediction that will often make the two models easily distinguishable with model selection. But failing to consider the singular event model as a possibility is a problem: even for the simple case of two continuous traits, we have shown how easily data simulated under the singular event model can result in highly significant OLS, PGLS, and IC regressions, regardless of whether the errors are simulated as independent or as phylogenetically correlated with respect to the model and phylogeny. We also note that estimating a $\lambda$ transformation for the errors (Pagel 1999; Freckleton et al. 2002) will not rescue the analysis: the estimated value of $\lambda$ will lie between 0 and 1, and we have found both of these extreme cases (OLS and IC, respectively) to be susceptible.
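The conditional-independence prediction can be illustrated with a deliberately simple version of the singular event model. For this sketch only, assume $X = \beta_X L + \varepsilon_X$ and $Y = \beta_Y L + \varepsilon_Y$, with i.i.d. errors and intercepts set to zero, where $L$ is 0 or 1 depending on which side of the event a tip lies. X and Y are then correlated across tips, but the association disappears once $L$ is conditioned on. Adding a 0/1 indicator as a covariate gives the same X slope as regressing within-group-centred Y on within-group-centred X (the Frisch–Waugh–Lovell result), which the code uses to avoid matrix algebra. Effect sizes, sample sizes, seed and function names are hypothetical.

```python
import random


def slope(x, y):
    """OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def center_within(values, groups):
    """Subtract each group's mean (same X slope as adding the group dummy)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(42)
L = [0] * 200 + [1] * 200       # hypothetical: side of the event for each tip
beta_x, beta_y = 5.0, -4.0      # hypothetical effects of the event on X and Y
x = [beta_x * l + rng.gauss(0.0, 1.0) for l in L]
y = [beta_y * l + rng.gauss(0.0, 1.0) for l in L]

naive = slope(x, y)                                         # ignores L
adjusted = slope(center_within(x, L), center_within(y, L))  # conditions on L
print(f"slope ignoring L: {naive:.2f}; slope given L: {adjusted:.2f}")
```

In this toy version, the naive slope is strongly negative (about -0.7 in expectation). The slope given $L$ is close to zero, which is the pattern that model selection between the two models can exploit.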

One might argue that the situation we describe is a violation of the assumption of a BM model of evolution, and this would, of course, be correct (see also Maddison and FitzJohn 2015). Indeed, for decades it has been common practice (but unfortunately, not universally so) to test whether contrasts are i.i.d. after conducting an analysis using IC (Garland et al. 1992; Purvis and Rambaut 1995; Slater and Pennell 2013; Pennell et al. 2015), and many researchers have followed Jones and Purvis (1997) in dropping outlying contrasts from regressions. Felsenstein recognized this particular vulnerability in his method and correctly predicted that the underlying model was an “obvious point for future development” (p. 14). While today we have a much wider range of comparative models to choose from, including some that allow for adaptive shifts, most continuous trait models are Gaussian (e.g., Pagel 1999; Blomberg et al. 2003; Butler and King 2004; O’Meara et al. 2006; Eastman et al. 2011; Beaulieu et al. 2012; Uyeda and Harmon 2014) and do not accommodate abrupt, discontinuous shifts in phenotypes. Only recently have alternative classes of models been considered (Landis et al. 2012; Elliot and Mooers 2014; Schraiber and Landis 2015; Blomberg 2017; Boucher et al. 2017; Duchen et al. 2017). Whether these other types of models can sufficiently account for rare, singular events will be examined in the next section.
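One way to see why checking contrasts matters is to measure how much a single contrast controls the fit. The sketch below uses generic regression diagnostics and is not a reproduction of Jones and Purvis's exact procedure. It computes the leverage of each contrast in a regression through the origin, then the t statistic after dropping each contrast in turn. The contrast values are made up, with the last one mimicking a single deep, shift-inflated contrast, and the function names are hypothetical.

```python
import math


def origin_t(cx, cy):
    """Slope and t statistic for a regression of contrasts through the origin."""
    sxx = sum(a * a for a in cx)
    sxy = sum(a * b for a, b in zip(cx, cy))
    syy = sum(b * b for b in cy)
    slope = sxy / sxx
    s2 = (syy - sxy * sxy / sxx) / (len(cx) - 1)
    return slope, slope / math.sqrt(s2 / sxx)


def leverages(cx):
    """Hat values for regression through the origin: cx_i^2 / sum(cx^2)."""
    sxx = sum(a * a for a in cx)
    return [a * a / sxx for a in cx]


def drop_one_t(cx, cy):
    """t statistic after removing each contrast in turn."""
    return [origin_t(cx[:i] + cx[i + 1:], cy[:i] + cy[i + 1:])[1]
            for i in range(len(cx))]


# Hypothetical standardized contrasts; the last mimics one deep, shifted contrast.
cx = [0.5, -1.0, 0.3, 1.2, -0.7, 0.9, -0.4, 0.2, -1.1, 8.0]
cy = [-0.6, 0.4, 1.0, -0.3, 0.8, -1.2, 0.5, -0.9, 0.2, 7.0]

slope_all, t_all = origin_t(cx, cy)
h = leverages(cx)
t_drop = drop_one_t(cx, cy)
worst = max(range(len(cx)), key=lambda i: h[i])
print(f"all contrasts: slope {slope_all:.2f}, t {t_all:.2f}")
print(f"contrast {worst}: leverage {h[worst]:.2f}, t without it {t_drop[worst]:.2f}")
```

In this made-up example, one contrast carries over 90% of the leverage. With it included, the slope is strongly positive (t ≈ 5.2). Without it, the remaining nine contrasts give a weak, non-significant negative slope (t ≈ -2.0 on 8 df).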

Nevertheless, our primary point here is to suggest that the phenomenon that made Felsenstein’s argument so intuitive is not the violation of i.i.d. errors but rather the biologically intuitive realization that unreplicated differences colocalized on a single branch provide only weak evidence of a causal relationship between traits. Furthermore, models that actually describe such scenarios, like our “singular events” model, are rarely considered in comparative analyses. Admittedly, fitting such models to biologically realistic cases more complex than Felsenstein’s scenario will require estimating the location and number of events, and we therefore view our “singular events” model primarily as an illustrative alternative solution to Felsenstein’s thought experiment. Still, the example illustrates that the phylogeny imposes a challenge to the inference of meaningful associations between traits not because it renders errors nonindependent, but because the structure of the phylogeny allows ancient, potentially unknowable causal factors (which may be few or even singular) to drive widespread associations between traits. Evaluating the validity of these associations as evidence for a meaningful relationship, even in the case of continuous traits, is precisely the unresolved challenge identified by Maddison and FitzJohn (2015) in the case of discrete character correlations (as we will further elaborate in Case Study III).

## The comparative method in conservation biology

The phylogenetic comparative approach is a statistical method for analyzing correlations between traits across species. Whilst it has revolutionized evolutionary biology, can it work for conservation biology? Although it is correlative, advocates of the comparative method hope that it will reveal general mechanisms in conservation, provide shortcuts for prioritizing conservation research, and enable us to predict which species will experience (or create) problems in the future. Here, we ask whether these stated management goals are being achieved. We conclude that comparative methods are stimulating research into the ecological mechanisms underlying conservation, and are providing information for preemptive screening of problem species. But comparative analyses of extinction risk to date have tended to be too broad in scope to provide shortcuts to conserving particular endangered species. Correlates of vulnerability to conservation problems are often taxon, region and threat specific, so models must be narrowly focused to be of maximum practical use.