R: Construct an GLInv model with respect to user-specified...

glinv {glinvci}

R Documentation

Construct an GLInv model with respect to user-specified parametrisation

Description

The glinv function construct an object of class glinv, which represents a GLInv model with respect to a user-specified parametrisation.

The lik.glinv function returns a function which accepts a parameter vector, which is of length mod$nparams, and returns the log-likelihood.

The grad.glinv function returns a function which accepts a parameter vector, which is of length mod$nparams, and returns the gradient of log-likelihood with respect to this parametrisation.

The hess.glinv function returns a function which accepts a parameter vector, which is of length mod$nparams, and returns the Hessian matrix of log-likelihood with respect to this parametrisation.

Usage

glinv(
  tree,
  x0,
  X,
  parfns = NULL,
  pardims = NULL,
  regimes = NULL,
  parjacs = NULL,
  parhess = NULL,
  repar = NULL
)

## S3 method for class 'glinv'
print(x, ...)

## S3 method for class 'glinv'
lik(mod, ...)

## S3 method for class 'glinv'
grad(
  mod,
  num_threads = 2L,
  numDerivArgs = list(method = "Richardson", method.args = list(d = 0.5, r = 3)),
  ...
)

## S3 method for class 'glinv'
hess(
  mod,
  num_threads = 2L,
  numDerivArgs = list(method = "Richardson", method.args = list(d = 0.5, r = 3)),
  store_gaussian_hessian = FALSE,
  ...
)

## S3 method for class 'glinv'
plot(x, internal_nodes = TRUE, ...)

Arguments

`tree`	A tree of class `phylo`.
`x0`	A vector representing the root's trait vector. Must not contain `NA` and `NaN`.
`X`	Optional. A matrix of trait values, in which `X[p,n]` stores the p-th dimension of the multivariate trait of the n-th tip of the phylogeny. `NA` and `NaN` has special meanings (See Details).
`parfns`	A list of functions that maps from the user-parametrisation to the underlying Gaussian parameters. Each of them returns a vector of concatenated `(\Phi, w, V')`, where `V'` is the lower triangular part of `V`, and accepts four arguments: a vector of parameters whose length is specified by the `pardims` argument to the `glinv_gauss` function, the branch length leading to the currently processing node, a vector of factors with three levels indicating which dimensions are missing or lost in the mother of the current node, and a vector of factors with the same three levels indicating missingness of the current node.
`pardims`	A vector of integers, which has the same amount elements as the length of parfns. `pardims[i]` indicates the length of the parameter vector that `parfns[i]` accepts.
`regimes`	A list of length-two integer vectors. Each of these length-two vectors specifies an evolutionary regime and consists of a named element `start`, which specifies the node ID at which an evolutionary regime starts, and another named element `fn`, which is an index of `parfns`, indicating which parametrisation function this evolutionary regime should use.
`parjacs`	A list of functions, which has the same amount elements as that of `parfns`. `parjacs[i]` accepts the same arguments as `parfns[i]` and returns the Jacobian of `parfns[i]`.
`parhess`	A list of functions, which has the same amount elements as that of `parfn[i]`. `parhess[i]` accepts the same arguments as `parfns[i]` and returns a list of three 3D arrays, named `Phi`, `w`, `V` respectively inside the list. `((parhess[[i]])(...))$Phi[m,i,j]` contains the cross second-order partial derivative of `\Phi_m` (here we treat the matrix `\Phi` as a column-major-flattened vector) with respect to the `i`-th and`j`-th user parameters; while `((parhess[[i]])(...))$w[m,i,j]` and `((parhess[[i]])(...))$V[m,i,j]` analogously contains second-order derivative of `w_m` and `V'_m`.
`repar`	Optional. One or a list of object returned by `get_restricted_ou`. This is a convenient short-cut alternative to supplying `pardims`, `parfns`, `parjacs`, and `parhess` one-by-one.
`x`	An object of class `glinv`.
`...`	Not used.
`mod`	An object of class `glinv`.
`num_threads`	Number of threads to use.
`numDerivArgs`	Arguments to pass to `numDeriv::jacobian`. Only used the user did not specify the `parjacs` arguments when creating `mod`.
`store_gaussian_hessian`	If `TRUE` and `method` is not `mc`, the returned list will contain a (usually huge) Hessian matrix `gaussian_hessian` with respect to the Gaussian parameters `\Phi, w, V'`. This option significantly increases the amount of memory the function uses, in order to store the matrix.
`internal_nodes`	Boolean, whether to plot the internal nodes's numbers

Details

Models for glinv assumes one or more evolutionary regimes exists in the phylogeny. The regimes parameters defines how many regimes there are, where do the regimes start, and what parameterisation function it has. If regimes were NULL then a single regime starting from the root node is assumed. Multiple regimes could share the same parametrisation function (and thus the same parameters) by specifying the same index; therefore the number of regimes may differs from the number of parametrisation functions. One and only one regime must start from the root of the phylogeny.

If X contains NA in the p-th dimension of the i-th tip (whose node ID is also i) then X_pi is tagged MISSING. No other tags of any other nodes are changed. The p-th dimension of any node j, regardless of whether or not it is an internal node or a tips, is tagged LOST if and only if the p-th dimension of all tips inside the clade started at j are NaN. Any entry that is neither LOST nor MISSING are tagged OK. These tags are then passed into the user-defined functions parfns etc. as arguments; therefore the user is free to specify how these tags are handled. x0 cannot contain missing values, and the vectors of missingness tags passed to parfns, for any nodes, are always of the same length as x0.

Before this package calls the functions in parhess, it adds, into the function's environment, a variable named INFO__ which contains some extra information.

Passing a single function to parfns is equivalent to passing a singleton list; and the same is true for parjacs, parhess, and pardims.

Value

The glinv function returns a model object of S3 class glinv. Elements are:

`rawmod`	An object of class `glinv_gauss`.
`regimes`	Identical to the `regimes` argument.
`regtags`	An integer vector of the same length as the number of nodes. The `i`-th element is the regime ID (corresponding to the index in the `regimes` argument to the `glinv_gauss` function) of node `i`. `NA` at the root.
`misstags`	A factor matrix with three ordered levels, `LOST`, `OK`, and `MISSING`. Each column corresponds to a node and row to a trait dimension.
`nparams`	The sum of the `pardims` argument, an integer.
`pardims`	Identical to the `pardims` arguemnt.
`parfntags`	An integer vector of the same length as the number of nodes. The `i`-th element is the index of `parfns` that corresponds to node `i`. `NA` at the root.
`parfns`	Identical to the `parfns` argument.
`parjacs`	Identical to the `parjacs` argument.
`parhess`	Identical to the `parhess` argument.
`parsegments`	A `K`-by-2 matrix of integer indicies, where `K` is the length of `parfns`. If `v` is a vector that `lik.glinv` accepts, then `v[parsegments[k,1]:parsegments[k,2]]` is the parameter vector should `parfns[[k]]` accept.
`gausssegments`	A `N`-by-2 matrix of integer indicies, where `N` is the number of nodes. If `w` is a vector that `lik.glinv_gauss` accepts, then `w[gausssegments[i,1]:gausssegments[i,2]]` is the concatenated `(\Phi, w, V')` corresponding to node `i`.
`gaussparams_fn`	A function that accepts a parameter vector of length `nparams` and returns a parameter vector of length `rawmod$nparams`. When called, this function traverses the tree, calls the functions in parfns on each node, and assemble the results into a format that `lik.glinv_gauss` accepts.
`gaussparams_jac`	A function that accepts a parameter vector of length `nparams` and returns a `p`-by-`q` Jacobian matrix, where `p` is `rawmod$nparams` and `q` is `nparams` in this object. When called, this function traverses the tree, calls the functions in `parjacs` on each node, and row-concatenates the result in an order consistent with what `lik.glinv_gauss` accepts.
`X`	The original data (trait) matrix in a "normalized" format.

References

Mitov V, Bartoszek K, Asimomitis G, Stadler T (2019). “Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts.” Theor. Popul. Biol.. https://doi.org/10.1016/j.tpb.2019.11.005.

Examples


library(glinvci)
### --- STEP 1: Make an example tree
set.seed(0x5EEDL, kind='Super-Duper')
ntips = 200
k     = 2                 # No. of trait dimensions
tr    = ape::rtree(ntips) 
x0    = rnorm(k)

### --- STEP 2: Make a model object. We use OU as an example.
###             Assume H is a positively definite diagonal matrix and in
###             log scale.
mod = glinv(tr, x0, X=NULL,
            parfns  = list(ou_logdiagH(ou_haltlost(oupar))),
            pardims = list(nparams_ou_diagH(k)),
            parjacs = list(dou_logdiagH(dou_haltlost(oujac))),
            parhess = list(hou_logdiagH(hou_haltlost(ouhess))))

### --- STEP 3: Set up parameters; H, theta, and sig_x needs to be concatenated
H     = matrix(c(2,0,0,1/2), k) #Diagonals, 
theta = c(0,0)
sig   = matrix(c(0.5,0.1,0.1,0.2), k)
sig_x = t(chol(sig))
## glinvci ALWAYS assumes diagonals of sig_x is in log scale.
diag(sig_x) = log(diag(sig_x))
par_truth = c(logdiagH=log(diag(H)),theta=theta,sig_x=sig_x[lower.tri(sig_x,diag=TRUE)])
## Now par_truth the vector of parameters in the right format that the model
## can consume. Notice about we use log(diag(H)) because we specified ou
## logdiagH earlier.

### --- STEP 4: Simulate a data set from the model and the true parameters,
### then set this data into the model.
X = rglinv(mod, par=par_truth)
set_tips(mod, X)

### --- STEP 5: Try computing the likelihood, gradient and Hessian justifying
### for illustration purpose.
print(par_truth)
print(lik(mod)(par_truth))
print(grad(mod)(par_truth))
print(hess(mod)(par_truth))

### --- STEP 6: Fit a model; here we use the truth as initialisation
### only for illustration purpose to reduce load on CRAN's server. In reality
### you usually want to initialise with either some best guess or random
### values.
fitted = fit(mod, parinit = par_truth)
print(fitted)

### --- STEP 7: Estimate the variance-covariance matrix of the MLE
v_estimate = varest(mod, fitted)

### --- STEP 8: Get marginal confidence intervals; and compare with the truth.
print(marginal_ci(v_estimate, lvl=0.95)) 
print(par_truth)

[Package glinvci version 1.2.4 Index]