simFossilRecord {paleotree} | R Documentation |
Full-Scale Simulations of the Fossil Record with Birth, Death and Sampling of Morphotaxa
Description
A complete birth-death-sampling branching simulator that captures morphological-taxon identity of lineages, as is typically discussed in models of paleontological data. This function allows for the use of precise point constraints to condition simulation run acceptance and can interpret complex character strings given as rate values for use in modeling complex processes of diversification and sampling.
Usage
simFossilRecord(
p,
q,
r = 0,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 0,
modern.samp.prob = 1,
startTaxa = 1,
nruns = 1,
maxAttempts = Inf,
totalTime = c(0, 1000),
nTotalTaxa = c(1, 1000),
nExtant = c(0, 1000),
nSamp = c(0, 1000),
returnAllRuns = FALSE,
tolerance = 10^-6,
maxStepTime = 0.01,
shiftRoot4TimeSlice = "withExtantOnly",
count.cryptic = FALSE,
negRatesAsZero = TRUE,
print.runs = FALSE,
sortNames = FALSE,
plot = FALSE
)
Arguments
p , q , r , anag.rate |
These parameters control the instantaneous
('per-capita') rates of branching, extinction,
sampling and anagenesis, respectively. These can be given as a number
equal to or greater than zero, or as a
character string which will be interpreted as an algebraic equation.
These equations can make use of three
quantities which will/may change throughout the simulation:
the standing richness is By default, the rates |
prop.cryptic , prop.bifurc |
These parameters control
(respectively) the proportion of branching events that have
morphological differentiation, versus those that are cryptic
( |
modern.samp.prob |
The probability that a taxon is sampled at the modern time
(or, for |
startTaxa |
Number of initial taxa to begin a simulation with. All will have the simulation start date listed as their time of origination. |
nruns |
Number of simulation datasets to accept, save and output.
If |
maxAttempts |
Number of simulation attempts allowed before the simulation process
is halted with an error message. Default is |
totalTime , nTotalTaxa , nExtant , nSamp |
These arguments represent
stopping and acceptance conditions for simulation runs. They are
respectively |
returnAllRuns |
If |
tolerance |
A small number which defines a tiny interval for
the sake of placing run-sampling dates before events and
for use in determining whether a taxon is extant in |
maxStepTime |
When rates are time-dependent (i.e. when
parameters 'D' or 'T' are used in equations input for
one of the four rate arguments), then protocol used by
|
shiftRoot4TimeSlice |
Should the dating of events be shifted, so that the
date given for |
count.cryptic |
If |
negRatesAsZero |
A logical. Should rates calculated as a
negative number cause the simulation to fail
with an error message ( |
print.runs |
If |
sortNames |
If |
plot |
If |
Details
simFossilRecord
simulates a birth-death-sampling branching process
(ala Foote, 1997, 2000; Stadler, 2009) in which lineages of organisms may branch,
go extinct or be sampled at discrete points within a continuous time-interval.
The occurrence of these discrete events are modeled as stochastic
Poisson process, described by some set of instantaneous rates.
This model is ultimately based on the birth-death model (Kendall, 1948; Nee, 2006),
which is widely implemented in many R packages. Unlike other such typical branching
simulators, this function enmeshes the lineage units within explicit models of how
lineages are morphologically differentiated (Bapst, 2013).
This is key to allow comparison to datasets from the fossil record,
as morphotaxa are the basic units of paleontological diversity estimates
and phylogenetic analyses.
Models of Morphological Differentiation and Branching (Cladogenesis and Anagenesis)
These models of morphological differentiation do not involve the direct simulation of morphological traits. Instead, morphotaxon identity is used as a proxy of the distinctiveness of lineages on morphological grounds, as if there was some hypothetical paleontologist attempting to taxonomically sort collections of specimens of these simulated lineages. Two lineages are either identical, and thus share the same morphotaxon identity, or they are distinct, and thus have separate morphotaxon identities. Morphological differentiation is assumed to be an instantaneous process for the purposes of this model, such that no intermediate could be uncovered.
Specifically, simFossilRecord
allows for three types of
binary branching events, referred to here as under the umbrella
term of 'cladogenesis': 'budding cladogenesis',
'bifurcating cladogenesis', and 'cryptic cladogenesis',
as well as for a fourth non-branching event-type, 'anagenesis'.
See Wagner and Erwin, 1995; Foote, 1996; and Bapst, 2013, for further details.
Budding, bifurcation and cryptic cladogenetic events all
share in common that a single geneological
lineage splits into two descendant lineages, but
differ in the morphological differentiation
of these child lineages relative to their parent.
Under budding cladogenesis, only one of the
child lineages becomes morphologically distinguishable
from the parent, and thus the ancestral
morphotaxon persists through the branching event
as the child lineage that does not
differentiate. Under bifurcating cladogenesis, both child lineages
become immediately distinct from the ancestor,
and thus two new morphotaxa appear while the
ancestor terminates in an event known as 'pseudoextinction'.
Cryptic cladogenesis has no morphological differentiation:
both child lineages are presumed to be indistinct from
the ancestor and from each other, which means a hypothetical paleontologist
would not observe that branching had occurred at all.
Anagenesis is morphological differentiation independent of
any branching, such that a morphotaxon instantaneously transitions to a
new morphotaxon identity, resulting in the pseudoextinction of
the ancestral morphotaxon and the immediate 'pseudospeciation'
of the child morphotaxon.
In anagenesis, the ancestral morphotaxon and descendant morphotaxon
do not overlap in time at all, as modeled here
(contra to the models described by Ezard et al., 2012).
For ease of following these cryptic lineages, cryptic cladogenetic
events are treated in terms of data structure similarly to
budding cladogenetic events, with one child lineage treated
as a persistence of the ancestral lineage, and the other
as a new morphologically indistinguishable lineage.
This model of cryptic cladogenesis is ultimately
based on the hierarchical birth-death model used by
many authors for modeling patterns across paraphyletic
higher taxa and the lower taxon units within them
(e.g. Patzkowsky, 1995; Foote, 2012).
The occurrence of the various models is controlled by
multiple arguments of simFossilRecord
.
The overall instantaneous rate of branching (cladogenesis) is
controlled by argument p
, and the proportion of
each type of cladogenesis controlled by arguments
prop.bifurc
and prop.cryptic
.
prop.cryptic
controls the overall probability that
any branching event will be cryptic versus
involving any morphological differentiation (budding or bifurcating).
If prop.cryptic = 1
, all branching events will be cryptic cladogenesis,
and if prop.cryptic = 0
, all branching events will
involve morphological differentiation and none will be cryptic.
prop.bifurc
controls how many branching events that
involve morphological differentiation (i.e. the inverse of prop.cryptic
)
are bifurcating, as opposed to budding cladogenesis.
If prop.bifurc = 1
, all morphologically-differentiating branching events will
be bifurcating cladogenesis, and if prop.bifurc = 0
,
all morphologically-differentiating branching events will be budding cladogenesis.
Thus, for example, the probability of a given cladogenesis event
being budding is given by:
Prob(budding cladogenesis at a branching event) = (1 - prop.cryptic) * (1 - prop.bifurc)
By default, prop.cryptic = 0
and prop.bifurc = 0
,
so all branching events will be instances of budding cladogenesis
in analyses that use default setting.
Anagenesis is completely independent of these, controlled as its
own Poisson process with an instantaneous rated
defined by the argument anag.rate
.
By default, this rate is set to zero and thus there is no
anagenetic events without user intervention.
Stopping Conditions and Acceptance Criteria for Simulations
How forward-time simulations are generated, halted and whether they are accepted
or not for output is a critical component of simulation design.
Most uses of simFossilRecord
will involve iteratively
generating and analyzing multiple simulation runs. Runs are only
accepted for output if they meet the conditioning criteria defined
in the arguments, either matching point constraints or falling
within range constraints. However, this requires separating the processes of
halting simulation runs and accepting a run for output, particularly to avoid bias
related to statistical sampling issues.
Hartmann et al. (2011) recently discovered a potential statistical artifact
when branching simulations are conditioned on some number of taxa.
Previously within paleotree
, this was accounted for in
the deprecated function simFossilTaxa
by a
complex arrangement of minimum and maximum constraints,
and an (incorrect) presumption that allowing simulations to
continue for a short distance after constraints were reached
would solve this statistical artifact. This strategy is not applied here.
Instead, simFossilRecord
applies the General Sampling Algorithm presented
by Hartmann et al. (or at least, a close variant). A simulation continues until
extinction or some maximum time-constraint is reached, evaluated for intervals
that match the set run conditions (e.g. nExtant
, nTotalTime
) and, if some
interval or set of intervals matches the run conditions, a date is randomly sampled
from within this interval/intervals. The simulation is then cut at this date using
the timeSliceFossilRecord
function, and saved as an accepted run.
The simulation data is otherwise discarded and then a new simulation initiated
(therefore, at most, only one simulated dataset is accepted from one simulation run).
Thus, accepted simulations runs should reflect unbiased samples of evolutionary
histories that precisely match the input constraints, which can be very precise,
unlike how stopping and acceptance conditions were handled in the previous (deprecated)
simFossilTaxa
function. Of course, selecting very precise constraints that
are very unlikely or impossible given other model parameters may take considerable
computation time to find acceptable simulation runs, or effectively never find any
acceptable simulation runs.
On Time-Scale Used in Output
Dates given in the output are on an reversed absolute time-scale; i.e. time
decreases going from the past to the future, as is typical in paleontological
uses of time (as time before present) and as for most function in package
paleotree
. The endpoints of the time-scale are decided by details of the
simulation and can be modified by several arguments.
By default (with shiftRoot4TimeSlice =
"withExtantOnly"
),
any simulation run that is accepted with extant taxa will have zero as the
end-time (i.e. when those taxa are extant),
as zero is the typical time assigned to the modern day in empirical studies.
If a simulation ends with all taxa extinct, however, then instead the start-time
of a run (i.e. when the run initiates with starting taxa) will be maximum value
assigned to the conditioning argument totalTime
.
If shiftRoot4TimeSlice =
FALSE
, then the
start-time of the run will always be this maximum value for
totalTime
, and any extant taxa will stop at some time greater than zero.
Value
simFossilRecord
returns either a single object of class fossilRecordSimulation
or a list of multiple such objects, depending on whether nruns
was 1 or more.
If argument returnAllRuns
= TRUE
, a list composed of two sublists,
each of which contains 0 or more fossilRecordSimulation
objects. The
first sublist containing all the accepted simulations (i.e. all the simulations that
would have been returned if returnAllRuns
was FALSE
), and the second
sublist containing the final iteration of all rejected runs before they hit an
irreversible out-of-bounds condition (to wit, reaching the maximum totalTime
,
exceeding the maximum number of total taxa (nTotalTaxa
),
exceeding the maximum number of sampled taxa (nSamp
),
or total extinction of all lineages in the simulation).
An object of class fossilRecordSimulation
consists
of a list object composed of multiple
elements, each of which is data for 'one taxon'.
Each data element for each taxon is itself
a list, composed of two elements: the first describes
vital information about the taxon unit,
and the second describes the sampling times of each taxon.
The first element of the list (named $taxa.data
)
is a distinctive six-element
vector composed of numbers (some are nominally integers,
but not all, so all are stored
as double-precision integers) with the following field names:
taxon.id
The ID number of this particular taxon-unit.
ancestor.id
The ID number of the ancestral taxon-unit. The initial taxa in a simulation will be listed with
NA
as their ancestor.orig.time
True time of origination for a taxon-unit in absolute time.
ext.time
True time of extinction for a taxon-unit in absolute time. Extant taxa will be listed with an
ext.time
of the run-end time of the simulation run, which for simulations with extant taxa is 0 by default (but this may be modified using argumentshiftRoot4TimeSlice
).still.alive
Indicates whether a taxon-unit is 'still alive' or not: '1' indicates the taxon-unit is extant, '0' indicates the taxon-unit is extinct
looks.like
The ID number of the first morphotaxon in a dataset that 'looks like' this taxon-unit; i.e. belongs to the same multi-lineage cryptic complex. Taxa that are morphologically distinct from any previous lineage will have their
taxon.id
match theirlooks.like
. Thus, this column is rather uninformative unless cryptic cladogenesis occurred in a simulation.
The second element for each taxon-unit is a vector of sampling times, creatively
named $sampling.times
, with each value representing a data in absolute time
when that taxon was sampled in the simulated fossil record. If a taxon was never
sampled, this vector is an empty numeric vector of length = 0
.
As is typical for paleontological uses of absolute time, absolute time in these
simulations is always decreasing toward the modern; i.e. an absolute date of 50
means a point in time which is 50 time-units before the present-day, if the
present-day is zero (the default, but see argument shiftRoot4TimeSlice
).
Each individual element of a fossilRecordSimulation
list object
is named, generally of the form "t1"
and "t2"
,
where the number is the taxon.id
.
Cryptic taxa are instead named in the form of "t1.2"
and "t5.3"
,
where the first number is the taxon which they are a
cryptic descendant of (looks.like
) and the second number, after the period, is
the order of appearance of lineage units in that cryptic complex.
For example, for "t5.3"
, the first number is the taxon.id
and the second number communicates that this is the third lineage
to appear in this cryptic complex.
Author(s)
David W. Bapst, inspired by code written by Peter Smits.
References
Bapst, D. W. 2013. When Can Clades Be Potentially Resolved with Morphology? PLoS ONE 8(4):e62312.
Ezard, T. H. G., P. N. Pearson, T. Aze, and A. Purvis. 2012. The meaning of birth and death (in macroevolutionary birth-death models). Biology Letters 8(1):139-142.
Foote, M. 1996 On the Probability of Ancestors in the Fossil Record. Paleobiology 22(2):141–151.
Foote, M. 1997. Estimating Taxonomic Durations and Preservation Probability. Paleobiology 23(3):278-300.
Foote, M. 2000. Origination and extinction components of taxonomic diversity: general problems. Pp. 74-102. In D. H. Erwin, and S. L. Wing, eds. Deep Time: Paleobiology's Perspective. The Paleontological Society, Lawrence, Kansas.
Foote, M. 2012. Evolutionary dynamics of taxonomic structure. Biology Letters 8(1):135-138.
Gavryushkina, A., D. Welch, T. Stadler, and A. J. Drummond. 2014. Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration. PLoS Computational Biology 10(12):e1003919.
Hartmann, K., D. Wong, and T. Stadler. 2010 Sampling Trees from Evolutionary Models. Systematic Biology 59(4):465–476.
Heath, T. A., J. P. Huelsenbeck, and T. Stadler. 2014. The fossilized birth-death process for coherent calibration of divergence-time estimates. Proceedings of the National Academy of Sciences 111(29):E2957-E2966.
Kendall, D. G. 1948 On the Generalized "Birth-and-Death" Process. The Annals of Mathematical Statistics 19(1):1–15.
Nee, S. 2006 Birth-Death Models in Macroevolution. Annual Review of Ecology, Evolution, and Systematics 37(1):1–17.
Patzkowsky, M. E. 1995. A Hierarchical Branching Model of Evolutionary Radiations. Paleobiology 21(4):440-460.
Solow, A. R., and W. Smith. 1997 On Fossil Preservation and the Stratigraphic Ranges of Taxa. Paleobiology 23(3):271–277.
Stadler, T. 2009. On incomplete sampling under birth-death models and connections to the sampling-based coalescent. Journal of Theoretical Biology 261(1):58-66.
Wagner, P. J., and D. H. Erwin. 1995. Phylogenetic patterns as tests of speciation models. Pp. 87-122. In D. H. Erwin, and R. L. Anstey, eds. New approaches to speciation in the fossil record. Columbia University Press, New York.
See Also
This function essentially replaces and adds to all functionality of the
deprecated paleotree
functions simFossilTaxa
, simFossilTaxaSRCond
,
simPaleoTrees
, as well as the combined used of simFossilTaxa
and sampleRanges
for most models of sampling.
Examples
set.seed(2)
# quick birth-death-sampling run
# with 1 run, 50 taxa
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 1,
nTotalTaxa = 50,
plot = TRUE
)
################
# Now let's examine with multiple runs of simulations
# example of repeated pure birth simulations over 50 time-units
records <- simFossilRecord(
p = 0.1,
q = 0,
nruns = 10,
totalTime = 50,
plot = TRUE
)
# plot multiple diversity curves on a log scale
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE,
plotLogRich = TRUE
)
# histogram of total number of taxa
hist(sapply(records, nrow))
##############################################
# example of repeated birth-death-sampling
# simulations over 50 time-units
records <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 10,
totalTime = 50,
plot = TRUE)
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE)
# like above...
# but conditioned instead on having 10 extant taxa
# between 1 and 100 time-units
set.seed(4)
records <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 10,
totalTime = c(1,300),
nExtant = 10,
plot = TRUE
)
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE
)
################################################
# How probable were the runs I accepted?
# The effect of conditions
set.seed(1)
# Let's look at an example of a birth-death process
# with high extinction relative to branching
# notes:
# a) use default run conditions (barely any conditioning)
# b) use print.runs to look at acceptance probability
records <- simFossilRecord(
p = 0.1,
q = 0.8,
nruns = 10,
print.runs = TRUE,
plot = TRUE
)
# 10 runs accepted from a total of 10 !
# now let's give much more stringent run conditions
# require 3 extant taxa at minimum, 5 taxa total minimum
records <- simFossilRecord(
p = 0.1,
q = 0.8,
nruns = 10,
nExtant = c(3,100),
nTotalTaxa = c(5,100),
print.runs = TRUE,
plot = TRUE
)
# thousands of simulations to just obtail 10 accepable runs!
# most ended in extinction before minimums were hit
# beware analysis of simulated where acceptance conditions
# are too stringent: your data will be a 'special case'
# of the simulation parameters
# it will also take you a long time to generate reasonable
# numbers of replicates for whatever analysis you are doing
# TLDR: You should look at print.runs = TRUE
##################################################################
# Using the rate equation-input for complex diversification models
# First up... Diversity Dependent Models!
# Let's try Diversity-Dependent Branching over 50 time-units
# first, let's write the rate equation
# We'll use the diversity dependent rate equation model
# from Ettienne et al. 2012 as an example here
# Under this equation, p = q at carrying capacity K
# Many others are possible!
# Note that we don't need to use max(0,rate) as negative rates
# are converted to zero by default, as controlled by
# the argument negRatesAsZero
# From Ettiene et al.
# lambda = lambda0 - (lambda0 - mu)*(n/K)
# lambda and mu are branching rate and extinction rate
# lambda and mu == p and q in paleotree (i.e. Foote convention)
# lambda0 is the branching rate at richness = 0
# K is the carrying capacity
# n is the richness
# 'N' is the algebra symbol for standing taxonomic richness
# for simFossilRecord's simulation capabilities
# also branching rate cannot reference extinction rate
# we'll have to set lambda0, mu and K in the rate equation directly
lambda0 <- 0.3 # branching rate at 0 richness in Ltu
K <- 40 # carrying capacity
mu <- 0.1 # extinction rate will 0.1 Ltu ( = 1/3 of lambda0 )
# technically, mu here represents the lambda at richness = K
# i.e. lambdaK
# Ettienne et al. are just implicitly saying that the carrying capacity
# is the richness at which lambda == mu
# construct the equation programmatically using paste0
branchingRateEq <- paste0(lambda0, "-(", lambda0, "-", mu, ")*(N/", K, ")")
# and take a look at it...
branchingRateEq
# its a thing of beauty, folks!
# now let's try it
records <- simFossilRecord(
p = branchingRateEq,
q = mu,
nruns = 3,
totalTime = 100,
plot = TRUE,
print.runs = TRUE
)
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE)
# those are some happy little diversity plateaus!
# now let's do diversity-dependent extinction
# let's slightly modify the model from Ettiene et al.
# mu = mu0 + (mu0 - muK)*(n/K)
mu0 <- 0.001 # mu at n = 0
muK <- 0.1 # mu at n = K (should be equal to lambda at K)
K <- 40 # carrying capacity (like above)
lambda <- muK # equal to muK
# construct the equation programmatically using paste0
extRateEq <- paste0(mu0, "-(", mu0, "-", muK, ")*(N/" ,K, ")")
extRateEq
# now let's try it
records <- simFossilRecord(
p = lambda,
q = extRateEq,
nruns = 3,
totalTime = 100,
plot = TRUE,
print.runs = TRUE)
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE)
# these plateaus looks a little more spiky
#( maybe there is more turnover at K? )
# also, it took a longer for the rapid rise to occur
##########################################################
# Now let's try an example with time-dependent origination
# and extinction constrained to equal origination
# Note! Use of time-dependent parameters "D" and "T" may
# result in slower than normal simulation run times
# as the time-scale has to be discretized; see
# info for argument maxTimeStep above
# First, let's define a time-dependent rate equation
# "T" is the symbol for time passed
timeEquation <- "0.4-(0.007*T)"
#in this equation, 0.4 is the rate at time = 0
# and it will decrease by 0.007 with every time-unit
# at time = 50, the final rate will be 0.05
# We can easily make it so extinction
# is always equal to branching rate
# "P" is the algebraic equivalent for
# "branching rate" in simFossilRecord
# now let's try it
records <- simFossilRecord(
p = timeEquation,
q = "P",
nruns = 3,
totalTime = 50,
plot = TRUE,
print.runs = TRUE
)
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE)
# high variability that seems to then smooth out as turnover decreases
# And duration what about duration-dependent processes?
# let's do a duration-dep extinction equation:
durDepExt <- "0.01+(0.01*D)"
# okay, let's take it for a spin
records <- simFossilRecord(
p = 0.1,
q = durDepExt,
nruns = 3,
totalTime = 50,
plot = TRUE,
print.runs = TRUE
)
records <- lapply(records,
fossilRecord2fossilTaxa)
multiDiv(records,
plotMultCurves = TRUE)
# creates runs full of short lived taxa
# Some more stuff to do with rate formulae!
# The formulae input method for rates allows
# for the rate to be a random variable
# For example, we could constantly redraw
# the branching rate from an exponential
record <- simFossilRecord(
p = "rexp(n = 1,rate = 10)",
q = 0.1, r = 0.1, nruns = 1,
nTotalTaxa = 50, plot = TRUE)
# Setting up specific time-variable rates can be laborious though
# e.g. one rate during this 10 unit interval,
# another during this interval, etc
# The problem is setting this up within a fixed function
#############################################################
# Worked Example
# What if we want to draw a new rate from a
# lognormal distribution every 10 time units?
# Need to randomly draw these rates *before* running simFossilTaxa
# This means also that we will need to individually do each simFossilTaxa run
# since the rates are drawn outside of simFossilTaxa
# Get some reasonable log normal rates:
rates <- 0.1+rlnorm(100,meanlog = 1,sdlog = 1)/100
# Now paste it into a formulae that describes a function that
# will change the rate output every 10 time units
rateEquation <- paste0(
"c(",
paste0(rates,collapse = ","),
")[1+(T%/%10)]"
)
# and let's run it
record <- simFossilRecord(
p = rateEquation,
q = 0.1,
r = 0.1,
nruns = 1,
totalTime = c(30,40),
plot = TRUE
)
#####################################################################
# Speciation Modes
# Some examples of varying the 'speciation modes' in simFossilRecord
# The default is pure budding cladogenesis
# anag.rate = prop.bifurc = prop.cryptic = 0
# let's just set those for the moment anyway
record <- simFossilRecord(p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0, prop.bifurc = 0, prop.cryptic = 0,
nruns = 1, nTotalTaxa = c(20,30) ,nExtant = 0, plot = TRUE)
#convert and plot phylogeny
# note this will not reflect the 'budding' pattern
# branching events will just appear like bifurcation
# its a typical convention for phylogeny plotting
converted <- fossilRecord2fossilTaxa(record)
tree <- taxa2phylo(converted,plot = TRUE)
#now, an example of pure bifurcation
record <- simFossilRecord(p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0, prop.bifurc = 1, prop.cryptic = 0,
nruns = 1, nTotalTaxa = c(20,30) ,nExtant = 0)
tree <- taxa2phylo(fossilRecord2fossilTaxa(record),plot = TRUE)
# all the short branches are due to ancestors that terminate
# via pseudoextinction at bifurcation events
# an example with anagenesis = branching
record <- simFossilRecord(
p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0.1,
prop.bifurc = 0,
prop.cryptic = 0,
nruns = 1,
nTotalTaxa = c(20,30),
nExtant = 0
)
tree <- taxa2phylo(fossilRecord2fossilTaxa(record),
plot = TRUE)
# lots of pseudoextinction
# an example with anagenesis, pure bifurcation
record <- simFossilRecord(
p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0.1,
prop.bifurc = 1,
prop.cryptic = 0,
nruns = 1,
nTotalTaxa = c(20,30) ,
nExtant = 0
)
tree <- taxa2phylo(
fossilRecord2fossilTaxa(record),
plot = TRUE
)
# lots and lots of pseudoextinction
# an example with half cryptic speciation
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 0.5,
nruns = 1,
nTotalTaxa = c(20,30),
nExtant = 0
)
tree <- taxa2phylo(
fossilRecord2fossilTaxa(record),
plot = TRUE)
# notice that the tree has many more than the maximum of 30 tips:
# that's because the cryptic taxa are not counted as
# separate taxa by default, as controlled by count.cryptic
# an example with anagenesis, bifurcation, cryptic speciation
record <- simFossilRecord(
p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0.1,
prop.bifurc = 0.5,
prop.cryptic = 0.5,
nruns = 1,
nTotalTaxa = c(20,30),
nExtant = 0
)
tree <- taxa2phylo(
fossilRecord2fossilTaxa(record),
plot = TRUE)
# note in this case, 50% of branching is cryptic
# 25% is bifurcation, 25% is budding
# an example with anagenesis, pure cryptic speciation
# morphotaxon identity will thus be entirely indep of branching!
# I wonder if this is what is really going on, sometimes...
record <- simFossilRecord(
p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0.1,
prop.bifurc = 0,
prop.cryptic = 1,
nruns = 1,
nTotalTaxa = c(20,30),
nExtant = 0
)
tree <- taxa2phylo(fossilRecord2fossilTaxa(record),
plot = TRUE)
# merging cryptic taxa when all speciation is cryptic
set.seed(1)
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
prop.crypt = 1,
totalTime = 50,
plot = TRUE
)
# there looks like there is only a single taxon, but...
length(record)
#the above is the *actual* number of cryptic lineages
#########################################################################
# playing with count.cryptic with simulations of pure cryptic speciation
# what if we had fossil records with NO morphological differentiation?
# We can choose to condition on total morphologically-distinguishable taxa
# or total taxa including cryptic taxa with count.cryptic = FALSE
# an example with pure cryptic speciation with count.cryptic = TRUE
record <- simFossilRecord(
p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 1,
nruns = 1,
totalTime = 50,
nTotalTaxa = c(10,100),
count.cryptic = TRUE
)
tree <- taxa2phylo(fossilRecord2fossilTaxa(record))
# plot the tree
plot(tree)
axisPhylo()
# notice how the tip labels indicate all are the same morphotaxon?
#################
# an example with pure cryptic speciation with count.cryptic = FALSE
# Need to be careful with this!
# We'll have to replace the # of taxa constraints with a time constraint
# or else the count.cryptic = FALSE simulation will never end!
record <- simFossilRecord(
p = 0.1, q = 0.1, r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 1,
nruns = 1,
totalTime = 50,
count.cryptic = FALSE
)
tree <- taxa2phylo(fossilRecord2fossilTaxa(record))
# plot it
plot(tree)
axisPhylo()
###########################################
# let's look at numbers of taxa returned when varying count.cryptic
# with prop.cryptic = 0.5
# Count Cryptic Example Number One
# simple simulation going for 50 total taxa
# first, count.cryptic = FALSE (default)
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 0.5,
nruns = 1,
nTotalTaxa = 50,
count.cryptic = FALSE
)
taxa <- fossilRecord2fossilTaxa(record)
#### Count the taxa/lineages !
# number of lineages (inc. cryptic)
nrow(taxa)
# number of morph-distinguishable taxa
length(unique(taxa[,6]))
###################
# Count Cryptic Example Number Two
# Now let's try with count.cryptic = TRUE
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 0.5,
nruns = 1,
nTotalTaxa = 50,
count.cryptic = TRUE
)
taxa <- fossilRecord2fossilTaxa(record)
### Count the taxa/lineages !
# number of lineages (inc. cryptic)
nrow(taxa)
# number of morph-distinguishable taxa
length(unique(taxa[,6]))
# okay...
###########
# Count Cryptic Example Number Three
# now let's try cryptic speciation *with* 50 extant taxa!
# first, count.cryptic = FALSE (default)
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 0.5,
nruns = 1,
nExtant = 10,
totalTime = c(1,100),
count.cryptic = FALSE
)
taxa <- fossilRecord2fossilTaxa(record)
### Count the taxa/lineages !
# number of still-living lineages (inc. cryptic)
sum(taxa[,5])
# number of still-living morph-dist. taxa
length(unique(taxa[taxa[,5] == 1,6]))
##############
# Count Cryptic Example Number Four
# like above with count.cryptic = TRUE
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
anag.rate = 0,
prop.bifurc = 0,
prop.cryptic = 0.5,
nruns = 1,
nExtant = 10,
totalTime = c(1,100),
count.cryptic = TRUE
)
taxa <- fossilRecord2fossilTaxa(record)
### Count the taxa/lineages !
# number of still-living lineages (inc. cryptic)
sum(taxa[,5])
# number of still-living morph-dist. taxa
length(unique(taxa[taxa[,5] == 1,6]))
#################################################
# Specifying Number of Initial Taxa
# Example using startTaxa to have more initial taxa
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 1,
nTotalTaxa = 100,
startTaxa = 20,
plot = TRUE
)
######################################################
# Specifying Combinations of Simulation Conditions
# Users can generate datasets that meet multiple conditions:
# such as time, number of total taxa, extant taxa, sampled taxa
# These can be set as point conditions or ranges
# let's set time = 10-100 units, total taxa = 30-40, extant = 10
#and look at acceptance rates with print.run
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 1,
totalTime = c(10,100),
nTotalTaxa = c(30,40),
nExtant = 10,
print.runs = TRUE,
plot = TRUE
)
# let's make the constraints on totaltaxa a little tighter
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 1,
totalTime = c(50,100),
nTotalTaxa = 30,
nExtant = 10,
print.runs = TRUE,
plot = TRUE
)
# still okay acceptance rates
# alright, now let's add a constraint on sampled taxa
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 1,
totalTime = c(50,100),
nTotalTaxa = 30,
nExtant = 10,
nSamp = 15,
print.runs = TRUE,
plot = TRUE
)
# still okay acceptance rates
# we can be really odd and instead condition on having a single taxon
set.seed(1)
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nTotalTaxa = 1,
totalTime = c(10,20),
plot = TRUE
)
########################################################
# Simulations of Entirely Extinct Taxa
# Typically, a user may want to condition on a precise
# number of sampled taxa in an all-extinct simulation
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 1,
nTotalTaxa = c(1,100),
nExtant = 0,
nSamp = 20,
print.runs = TRUE,
plot = TRUE
)
# Note that when simulations don't include
# sampling or extant taxa, the plot
# functionality changes
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0,
nruns = 1,
nExtant = 0,
print.runs = TRUE,
plot = TRUE
)
# Something similar happens when there is no sampling
# and there are extant taxa but they aren't sampled
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0,
nruns = 1,
nExtant = 10,
nTotalTaxa = 100,
modern.samp.prob = 0,
print.runs = TRUE,
plot = TRUE
)
########################################################
# Retaining Rejected Simulations
# sometimes we might want to look at all the simulations
# that don't meet acceptability criteria
# In particular, look at simulated clades that go extinct
# rather than surviving long enough to satisfy
# conditioning on temporal duration.
# Let's look for 10 simulations with following conditioning:
# that are exactly 10 time-units in duration
# that have between 10 and 30 total taxa
# and have 1 to 30 extant taxa after 10 time-units
set.seed(4)
record <- simFossilRecord(
p = 0.1,
q = 0.1,
r = 0.1,
nruns = 10,
totalTime = 10,
nTotalTaxa = c(10,30),
nExtant = c(1,30),
returnAllRuns = TRUE,
print.runs = TRUE,
plot = TRUE
)
# when returnAllRuns = TRUE, the length of record is 2
# named 'accepted' and 'rejected'
# all the accepted runs (all 10) are in 'accepted'
length(record$accepted)
# all the rejected runs are in 'rejected'
length(record$rejected)
# probably many more than 10!
# (I got 1770!)
# how many taxa are in each rejected simulation run?
totalTaxa_rej <- sapply(record$rejected, length)
# plot as a histogram
hist(totalTaxa_rej)
# a very nice exponential distribution...
# plot the rejected simulation with the most taxa
divCurveFossilRecordSim(
fossilRecord = record$rejected[[
which(max(totalTaxa_rej) == totalTaxa_rej)[1]
]]
)
# we can plot all of these too...
result <- sapply(record$rejected,
divCurveFossilRecordSim)
# let's look at the temporal duration of rejected clades
# need to write a function
getDuration <- function(record){
taxa <- fossilRecord2fossilTaxa(record)
maxAge <- max(taxa[,"orig.time"], na.rm = TRUE)
minAge <- min(taxa[,"ext.time"], na.rm = TRUE)
cladeDuration <- maxAge - minAge
return(cladeDuration)
}
# all the accepted simulations should have
# identical durations (10 time-units)
sapply(record$accepted, getDuration)
# now the rejected set
durations_rej <- sapply(record$rejected, getDuration)
# plot as a histogram
hist(durations_rej)
# Most simulations hit the max time without
# satisfying the other specified constraints
# (probably they didn't have the min of 10 taxa total)