ou_haltlost {glinvci} | R Documentation |
Handling missing data and lost traits in Ornstein-Uhlenbeck processes
Description
ou_haltlost
and ou_zaplost
handles lost traits and missing data.
Each of them wraps the function oupar
and returns
a new function that accepts the same arguments and output the same form of result,
but takes into account lost traits and missing data. dou_haltlost
and
dou_zaplost
wraps the Jacobian function oujac
, and
hou_haltlost
and hou_zaplost
wraps the Hessian function
ouhess
.
Usage
ou_haltlost(parfn)
dou_haltlost(jacfn)
hou_haltlost(hessfn)
ou_zaplost(parfn)
dou_zaplost(jacfn)
hou_zaplost(hessfn)
Arguments
parfn |
A function that maps from the user-parametrisation to the underlying Gaussian parameters.
Each of them returns a vector of concatenated |
jacfn |
A function that accepts the same arguments as |
hessfn |
A function that accepts the same arguments as |
Details
What is missing traits and lost traits
A ‘missing’ trait refers to a trait value whose data is missing due to data
collection problems. Fundamentally, they evolves in the same manner as other
traits. An NA
entry in the data is deemed ‘missing’. On the other hand,
a lost trait is a trait dimension which had ceased to exists during the
evolutionary process. An NaN
entry in the data indicates a ‘lost’ trait.
Each nodes has their own missing-ness tags
Each trait dimension of each nodes, either internal or tip, are tagged with
one of the three labels: MISSING
, LOST
, and OK
.
If the data contains an NA
in the p
-th dimension of the i
-th tip
then X_pi
is tagged MISSING
. No other tags of any other nodes and dimensions
are changed in the case of missing-ness. On the other hands, the p
-th dimension of
any node j
, regardless of whether or not it is an internal node or a tips, is
tagged LOST
if and only if the p
-th dimension of all tips inside
the clade started at j
are NaN
. Any entry that is neither tagged
LOST
nor MISSING
are tagged OK
.
This corresponds to the biological intuition that, if a value is missing only due to data collection problems, the missingness should not influence the random walk process way up the phylogenetic tree; and this is obviously not true if the trait had ceased to exists instead.
Handling of missing data and lost traits
ou_haltlost
and ou_zaplost
handles missing data in the same way: they
simply marginalises the unobserved dimensions in the joint Gaussian distributions of
tip data.
For lost traits, ou_haltlost
assumes the followings:
In the entire branch leading to the earliest node
j
whosep
-th dimension is taggedLOST
, the lost trait dimension does not evolve at all.In the entire same branch, the magnitude of the
p
-th dimension atj
's mother node has no influence on other dimensions, in any instantaneous moments during the evolution in the branch, neither through the linear combination with the drift matrix nor the Wiener process covariance; in other words, the SDE governing the non-lost dimensions' random walk is invariant ofj
's mother nodes'p
-th dimension.
Therefore, ou_haltlost
first set the p
-th row and column of both of H_j
and the p
-th row of Sigma_x
to zero and marginalise out the degenerate Gaussian
dimension.
On the other hands, ou_zaplost
does not assume the lost trait to stop evolving
immediately at moment when the branch leading to j
starts, but, instead, simply
marginalise out the lost, non-degenerate Gaussian dimensions. This method is the same as
the one that is used in the PCMBase
package.
Usage in combination with parameter restrictions
Without paramter restriction, the following is an example usage in a call to the
glinv
function. It constructs a glinv
model object
which is capable of handling missing data and lost traits.
mod.full = glinv(tree, x0, my_data, parfns = haltlost(oupar), pardims = nparams_ou(k), parjacs = dhaltlost(oujac), parhess = hhaltlost(ouhess))
Note that we have the same naming convention that functions wrappers whose
nams have prefix d
wraps the Jacobians, while prefix d
wraps
the Hessians.
If parameter restriction is needed, then *ou_*lost
should called
before any reparameterisation/restriction functions because it
expects the passed-in function parfn
to accept the full H
matrix, rather than only the diagonal or lower-triangular part of it.
Example:
f = haltlost(oupar) g = dhaltlost(oujac) h = hhaltlost(oujac) mod.full = glinv(tree, x0, my_data, parfns = ou_spdH(f), pardims = nparams_ou_spdH(k), parjacs = dou_spdH(g), parhess = ou_spdH(h,g))
Value
ou_haltlost
and ou_zaplost
returns a wrapped versions of 'parfn', which accepts the same arguments
and outputs in the same format. dou_haltlost
and dou_zaplost
, analogously, wraps jacfn
.
hou_zaplost
and hou_zaplost
wraps hessfn
.