credibleShiftSet {BAMMtools} | R Documentation |
Credible set of macroevolutionary rate shift configurations from
BAMM
results
Description
Computes the 95% (or any other %) credible set of
macroevolutionary rate shift configurations from a bammdata
object. These results can be analyzed further and/or plotted.
Usage
credibleShiftSet(
ephy,
expectedNumberOfShifts,
threshold = 5,
set.limit = 0.95,
...
)
Arguments
ephy |
An object of class |
expectedNumberOfShifts |
The expected number of rate shifts under the prior. |
threshold |
The marginal posterior-to-prior odds ratio for a rate shift on a specific branch, used to distinguish core and non-core shifts. |
set.limit |
The desired limit to the credible set. A value of 0.95 will return the 95% credible set of shift configurations. |
... |
Other arguments to |
Details
Computes the 95% credible set (or XX% credible set, depending
on set.limit
) of diversification shift configurations sampled
using BAMM
. This is analogous to a credible set of phylogenetic
tree topologies from a Bayesian phylogenetic analysis.
To understand how this calculation is performed, one must first
distinguish between "core" and "non-core" rate shifts. A "core shift"
is a rate shift with a marginal probability that is substantially
elevated above the probability expected on the basis of the prior
alone. With BAMM
, every branch in a phylogenetic tree is
associated with some non-zero prior probability of a rate shift.
Typically this is a very low per-branch shift probability (this prior
is determined by the value of the "poissonRatePrior" parameter in a
BAMM
analysis).
If we compute distinct shift configurations with every sampled shift (including those shifts with very low marginal probabilities), the number of distinct shift configurations will be overwhelmingly high. However, most of these configurations include shifts with marginal probabilities that are expected even under the prior alone. Hence, using these shifts to identify distinct shift configurations simply generates noise and isn't particularly useful.
The solution adopted in BAMMtools
is, for each branch in the
phylogeny, to compute both the posterior and prior probabilities of a
rate shift occurring. The ratio of these probabilities is a
branch-specific marginal odds ratio: it is the marginal posterior
frequency of one or more rate shifts normalized by the corresponding
prior probability. Hence, any branch with a marginal odds ratio of
1.0 is one where the observed (posterior) odds of a rate shift are no
different from the prior odds. A value of 10 implies that the
posterior probability is 10 times the prior probability.
The user of credibleShiftSet
must specify a threshold
argument. This is simply a cutoff value for identifying "important"
shifts for the purposes of identifying distinct shift configurations.
This does not imply that it is identifying "significant" shifts. See
the online documentation on this topic available at
http://bamm-project.org for more information. If you specify
threshold = 5
as an argument to credibleShiftSet
, the
function will ignore all branches with marginal odds ratios less than
5 during the enumeration of topologically distinct shift
configurations. Only shifts with marginal odds ratios greater than or
equal to threshold
will be treated as core shifts for the
purposes of identifying distinct shift configurations.
For each shift configuration in the credible set, this function will
compute the average diversification parameters. For example, the most
frequent shift configuration (the maximum a posteriori shift
configuration) might have 3 shifts, and 150 samples from your
posterior (within the bammdata
object) might show this shift
configuration. However, the parameters associated with each of these
shift configurations (the actual evolutionary rate parameters) might
be different for every sample. This function returns the mean set of
rate parameters for each shift configuration, averaging over all
samples from the posterior that can be assigned to a particular shift
configuration.
Value
A class credibleshiftset
object with many components. Most
components are an ordered list of length L, where L is the number of
distinct shift configurations in the credible set. The first list
element in each case corresponds to the shift configuration with the
maximum a posteriori probability.
frequency |
A vector of frequencies of shift configurations,
including those that account for |
shiftnodes |
A list of the "core" rate shifts (marginal
probability > threshold) that occurred in each distinct shift
configuration in the credible set. The i'th vector from this list
gives the core shift nodes for the i'th shift configuration. They
are sorted by frequency, so |
indices |
A list of vectors containing the indices of samples in
the |
cumulative |
Like |
threshold |
The marginal posterior-to-prior odds for rate shifts on branches used during enumeration of distinct shift configurations. |
number.distinct |
Number of distinct shift configurations in the credible set. |
set.limit |
which credible set is this (0.9, 0.95, etc)? |
coreshifts |
A vector of node numbers corresponding to the core
shifts. All of these nodes have a Bayes factor of at least
|
In addition, a number of components that are defined similarly in
class phylo
or class bammdata
objects:
edge |
See documentation for class |
Nnode |
See documentation for class |
tip.label |
See documentation for class |
edge.length |
See documentation for class |
begin |
The beginning time of each branch in absolute time (the root is set to time zero) |
end |
The ending time of each branch in absolute time. |
numberEvents |
An integer vector with the number of core events
contained in the |
eventData |
A list of dataframes. Each element holds the average
rate and location parameters for all samples from the posterior
that were assigned to a particular distinct shift configuration.
Each row in a dataframe holds the data for a single event. Data
associated with an event are: |
eventVectors |
A list of integer vectors. Each element is for a
single shift configuration in the posterior. For each branch in
the |
eventBranchSegs |
A list of matrices. Each element of the list is
a single distinct shift configuration. Each matrix has four
columns: |
tipStates |
A list of integer vectors. Each element is a single distinct shift configuration. For each tip the index of the event that occurs along the branch subtending the tip. Tips are ordered increasing here and elsewhere. |
tipLambda |
A list of numeric vectors. Each element is a single distinct shift configuration. For each tip is the average rate of speciation or trait evolution at the end of the terminal branch subtending that tip (averaged over all samples that are assignable to this particular distinct shift configuration). |
tipMu |
A list of numeric vectors. Each element is a single
distinct shift configuration. For each tip the rate of extinction
at the end of the terminal branch subtending that tip. Meaningless
if working with |
type |
A character string. Either "diversification" or "trait"
depending on your |
downseq |
An integer vector holding the nodes of |
lastvisit |
An integer vector giving the index of the last node
visited by the node in the corresponding position in
|
Author(s)
Dan Rabosky
References
See Also
distinctShiftConfigurations
,
plot.bammshifts
, summary.credibleshiftset
,
plot.credibleshiftset
,
getBranchShiftPriors
Examples
data(events.whales, whales)
ed <- getEventData(whales, events.whales, burnin=0.1, nsamples=500)
cset <- credibleShiftSet(ed, expectedNumberOfShifts = 1, threshold = 5)
# Here is the total number of samples in the posterior:
length(ed$eventData)
# And here is the number of distinct shift configurations:
cset$number.distinct
# here is the summary statistics:
summary(cset)
# Accessing the raw frequency vector for the credible set:
cset$frequency
#The cumulative frequencies:
cset$cumulative
# The first element is the shift configuration with the maximum
# a posteriori probability. We can identify all the samples from
# posterior that show this shift configuration:
cset$indices[[1]]
# Now we can plot the credible set:
plot(cset, plotmax=4)