glinv {glinvci}R Documentation

Construct an GLInv model with respect to user-specified parametrisation

Description

The glinv function construct an object of class glinv, which represents a GLInv model with respect to a user-specified parametrisation.

The lik.glinv function returns a function which accepts a parameter vector, which is of length mod$nparams, and returns the log-likelihood.

The grad.glinv function returns a function which accepts a parameter vector, which is of length mod$nparams, and returns the gradient of log-likelihood with respect to this parametrisation.

The hess.glinv function returns a function which accepts a parameter vector, which is of length mod$nparams, and returns the Hessian matrix of log-likelihood with respect to this parametrisation.

Usage

glinv(
  tree,
  x0,
  X,
  parfns = NULL,
  pardims = NULL,
  regimes = NULL,
  parjacs = NULL,
  parhess = NULL,
  repar = NULL
)

## S3 method for class 'glinv'
print(x, ...)

## S3 method for class 'glinv'
lik(mod, ...)

## S3 method for class 'glinv'
grad(
  mod,
  num_threads = 2L,
  numDerivArgs = list(method = "Richardson", method.args = list(d = 0.5, r = 3)),
  ...
)

## S3 method for class 'glinv'
hess(
  mod,
  num_threads = 2L,
  numDerivArgs = list(method = "Richardson", method.args = list(d = 0.5, r = 3)),
  store_gaussian_hessian = FALSE,
  ...
)

## S3 method for class 'glinv'
plot(x, internal_nodes = TRUE, ...)

Arguments

tree

A tree of class phylo.

x0

A vector representing the root's trait vector. Must not contain NA and NaN.

X

Optional. A matrix of trait values, in which X[p,n] stores the p-th dimension of the multivariate trait of the n-th tip of the phylogeny. NA and NaN has special meanings (See Details).

parfns

A list of functions that maps from the user-parametrisation to the underlying Gaussian parameters. Each of them returns a vector of concatenated (\Phi, w, V'), where V' is the lower triangular part of V, and accepts four arguments: a vector of parameters whose length is specified by the pardims argument to the glinv_gauss function, the branch length leading to the currently processing node, a vector of factors with three levels indicating which dimensions are missing or lost in the mother of the current node, and a vector of factors with the same three levels indicating missingness of the current node.

pardims

A vector of integers, which has the same amount elements as the length of parfns. pardims[i] indicates the length of the parameter vector that parfns[i] accepts.

regimes

A list of length-two integer vectors. Each of these length-two vectors specifies an evolutionary regime and consists of a named element start, which specifies the node ID at which an evolutionary regime starts, and another named element fn, which is an index of parfns, indicating which parametrisation function this evolutionary regime should use.

parjacs

A list of functions, which has the same amount elements as that of parfns. parjacs[i] accepts the same arguments as parfns[i] and returns the Jacobian of parfns[i].

parhess

A list of functions, which has the same amount elements as that of parfn[i]. parhess[i] accepts the same arguments as parfns[i] and returns a list of three 3D arrays, named Phi, w, V respectively inside the list. ((parhess[[i]])(...))$Phi[m,i,j] contains the cross second-order partial derivative of \Phi_m (here we treat the matrix \Phi as a column-major-flattened vector) with respect to the i-th andj-th user parameters; while ((parhess[[i]])(...))$w[m,i,j] and ((parhess[[i]])(...))$V[m,i,j] analogously contains second-order derivative of w_m and V'_m.

repar

Optional. One or a list of object returned by get_restricted_ou. This is a convenient short-cut alternative to supplying pardims, parfns, parjacs, and parhess one-by-one.

x

An object of class glinv.

...

Not used.

mod

An object of class glinv.

num_threads

Number of threads to use.

numDerivArgs

Arguments to pass to numDeriv::jacobian. Only used the user did not specify the parjacs arguments when creating mod.

store_gaussian_hessian

If TRUE and method is not mc, the returned list will contain a (usually huge) Hessian matrix gaussian_hessian with respect to the Gaussian parameters \Phi, w, V'. This option significantly increases the amount of memory the function uses, in order to store the matrix.

internal_nodes

Boolean, whether to plot the internal nodes's numbers

Details

Models for glinv assumes one or more evolutionary regimes exists in the phylogeny. The regimes parameters defines how many regimes there are, where do the regimes start, and what parameterisation function it has. If regimes were NULL then a single regime starting from the root node is assumed. Multiple regimes could share the same parametrisation function (and thus the same parameters) by specifying the same index; therefore the number of regimes may differs from the number of parametrisation functions. One and only one regime must start from the root of the phylogeny.

If X contains NA in the p-th dimension of the i-th tip (whose node ID is also i) then X_pi is tagged MISSING. No other tags of any other nodes are changed. The p-th dimension of any node j, regardless of whether or not it is an internal node or a tips, is tagged LOST if and only if the p-th dimension of all tips inside the clade started at j are NaN. Any entry that is neither LOST nor MISSING are tagged OK. These tags are then passed into the user-defined functions parfns etc. as arguments; therefore the user is free to specify how these tags are handled. x0 cannot contain missing values, and the vectors of missingness tags passed to parfns, for any nodes, are always of the same length as x0.

Before this package calls the functions in parhess, it adds, into the function's environment, a variable named INFO__ which contains some extra information.

Passing a single function to parfns is equivalent to passing a singleton list; and the same is true for parjacs, parhess, and pardims.

Value

The glinv function returns a model object of S3 class glinv. Elements are:

rawmod

An object of class glinv_gauss.

regimes

Identical to the regimes argument.

regtags

An integer vector of the same length as the number of nodes. The i-th element is the regime ID (corresponding to the index in the regimes argument to the glinv_gauss function) of node i. NA at the root.

misstags

A factor matrix with three ordered levels, LOST, OK, and MISSING. Each column corresponds to a node and row to a trait dimension.

nparams

The sum of the pardims argument, an integer.

pardims

Identical to the pardims arguemnt.

parfntags

An integer vector of the same length as the number of nodes. The i-th element is the index of parfns that corresponds to node i. NA at the root.

parfns

Identical to the parfns argument.

parjacs

Identical to the parjacs argument.

parhess

Identical to the parhess argument.

parsegments

A K-by-2 matrix of integer indicies, where K is the length of parfns. If v is a vector that lik.glinv accepts, then v[parsegments[k,1]:parsegments[k,2]] is the parameter vector should parfns[[k]] accept.

gausssegments

A N-by-2 matrix of integer indicies, where N is the number of nodes. If w is a vector that lik.glinv_gauss accepts, then w[gausssegments[i,1]:gausssegments[i,2]] is the concatenated (\Phi, w, V') corresponding to node i.

gaussparams_fn

A function that accepts a parameter vector of length nparams and returns a parameter vector of length rawmod$nparams. When called, this function traverses the tree, calls the functions in parfns on each node, and assemble the results into a format that lik.glinv_gauss accepts.

gaussparams_jac

A function that accepts a parameter vector of length nparams and returns a p-by-q Jacobian matrix, where p is rawmod$nparams and q is nparams in this object. When called, this function traverses the tree, calls the functions in parjacs on each node, and row-concatenates the result in an order consistent with what lik.glinv_gauss accepts.

X

The original data (trait) matrix in a "normalized" format.

References

Mitov V, Bartoszek K, Asimomitis G, Stadler T (2019). “Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts.” Theor. Popul. Biol.. https://doi.org/10.1016/j.tpb.2019.11.005.

Examples


library(glinvci)
### --- STEP 1: Make an example tree
set.seed(0x5EEDL, kind='Super-Duper')
ntips = 200
k     = 2                 # No. of trait dimensions
tr    = ape::rtree(ntips) 
x0    = rnorm(k)

### --- STEP 2: Make a model object. We use OU as an example.
###             Assume H is a positively definite diagonal matrix and in
###             log scale.
mod = glinv(tr, x0, X=NULL,
            parfns  = list(ou_logdiagH(ou_haltlost(oupar))),
            pardims = list(nparams_ou_diagH(k)),
            parjacs = list(dou_logdiagH(dou_haltlost(oujac))),
            parhess = list(hou_logdiagH(hou_haltlost(ouhess))))

### --- STEP 3: Set up parameters; H, theta, and sig_x needs to be concatenated
H     = matrix(c(2,0,0,1/2), k) #Diagonals, 
theta = c(0,0)
sig   = matrix(c(0.5,0.1,0.1,0.2), k)
sig_x = t(chol(sig))
## glinvci ALWAYS assumes diagonals of sig_x is in log scale.
diag(sig_x) = log(diag(sig_x))
par_truth = c(logdiagH=log(diag(H)),theta=theta,sig_x=sig_x[lower.tri(sig_x,diag=TRUE)])
## Now par_truth the vector of parameters in the right format that the model
## can consume. Notice about we use log(diag(H)) because we specified ou
## logdiagH earlier.

### --- STEP 4: Simulate a data set from the model and the true parameters,
### then set this data into the model.
X = rglinv(mod, par=par_truth)
set_tips(mod, X)

### --- STEP 5: Try computing the likelihood, gradient and Hessian justifying
### for illustration purpose.
print(par_truth)
print(lik(mod)(par_truth))
print(grad(mod)(par_truth))
print(hess(mod)(par_truth))

### --- STEP 6: Fit a model; here we use the truth as initialisation
### only for illustration purpose to reduce load on CRAN's server. In reality
### you usually want to initialise with either some best guess or random
### values.
fitted = fit(mod, parinit = par_truth)
print(fitted)

### --- STEP 7: Estimate the variance-covariance matrix of the MLE
v_estimate = varest(mod, fitted)

### --- STEP 8: Get marginal confidence intervals; and compare with the truth.
print(marginal_ci(v_estimate, lvl=0.95)) 
print(par_truth)


[Package glinvci version 1.2.4 Index]