MERF {LongituRF}

R Documentation

(S)MERF algorithm

Description

(S)MERF is an adaptation of the random forest regression method to longitudinal data, introduced by Hajjem et al. (2014) <doi:10.1080/00949655.2012.741599>. Capitaine et al. (2020) <doi:10.1177/0962280220946080> extended the model by adding a stochastic process. The algorithm estimates the parameters of the following semi-parametric stochastic mixed-effects model:

Y_i(t) = f(X_i(t)) + Z_i(t)\beta_i + \omega_i(t) + \epsilon_i(t)

with Y_i(t) the output at time t for the ith individual; X_i(t) the input predictors (fixed effects) at time t for the ith individual; Z_i(t) the predictors of the random effects \beta_i at time t for the ith individual; \omega_i(t) the stochastic process at time t for the ith individual, which models the serial correlation of the output measurements; and \epsilon_i(t) the residual error.
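To make the roles of the four terms concrete, the following base-R sketch simulates one trajectory from the model above, with a Brownian motion as \omega_i(t). The function `f`, the dimensions and all variance values are arbitrary assumptions chosen for illustration; the sketch is not meant to reproduce `DataLongGenerator`.

```r
# One simulated trajectory from
#   Y_i(t) = f(X_i(t)) + Z_i(t) beta_i + omega_i(t) + epsilon_i(t).
# f(), the dimensions and every variance below are illustrative assumptions.
set.seed(1)

n_obs <- 10                                  # number of measurements for individual i
t_obs <- sort(runif(n_obs, 0, 5))            # measurement times (strictly positive)
X <- matrix(rnorm(n_obs * 2), ncol = 2)      # p = 2 fixed-effect predictors
Z <- cbind(1, t_obs)                         # q = 2: random intercept and random slope

f <- function(X) 2 * sin(X[, 1]) + X[, 2]^2  # hypothetical nonlinear fixed-effect function

D <- diag(c(0.5, 0.1))                       # assumed covariance matrix of beta_i
beta_i <- drop(t(chol(D)) %*% rnorm(2))      # beta_i ~ N(0, D), since t(chol(D)) %*% t(t(chol(D)))... = D

# Brownian motion with volatility gamma_sto, started at 0 at time 0:
# independent Gaussian increments with variance gamma_sto^2 * (time step).
gamma_sto <- 0.3
omega_i <- cumsum(rnorm(n_obs, sd = gamma_sto * sqrt(diff(c(0, t_obs)))))

sigma_eps <- 0.2                             # assumed residual standard deviation
epsilon_i <- rnorm(n_obs, sd = sigma_eps)

Y <- f(X) + drop(Z %*% beta_i) + omega_i + epsilon_i
```

Repeating this for several individuals and stacking the rows gives inputs of the shape that `MERF` expects for `X`, `Y`, `Z`, `id` and `time`.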

Usage

MERF(
  X,
  Y,
  id,
  Z,
  iter = 100,
  mtry = ceiling(ncol(X)/3),
  ntree = 500,
  time,
  sto,
  delta = 0.001
)

Arguments

`X`	[matrix]: A `N`x`p` matrix containing the `p` predictors of the fixed effects; each column corresponds to one predictor.
`Y`	[vector]: A vector containing the output trajectories.
`id`	[vector]: The vector of identifiers of the different trajectories.
`Z`	[matrix]: A `N`x`q` matrix containing the `q` predictors of the random effects.
`iter`	[numeric]: Maximal number of iterations of the algorithm. The default is `iter=100`.
`mtry`	[numeric]: Number of variables randomly sampled as candidates at each split. The default value is `ceiling(p/3)`.
`ntree`	[numeric]: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. The default value is `ntree=500`.
`time`	[vector]: The vector of the measurement times associated with the trajectories in `Y`, `Z` and `X`.
`sto`	[character]: Defines the covariance function of the stochastic process. It can be `"none"` for no stochastic process, `"BM"` for Brownian motion, `"OrnUhl"` for the standard Ornstein-Uhlenbeck process, `"BBridge"` for the Brownian bridge or `"fbm"` for fractional Brownian motion; it can also be a user-defined function.
`delta`	[numeric]: The algorithm stops when the difference in log-likelihood between two consecutive iterations is smaller than `delta`. The default value is `delta=0.001`.
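The built-in `sto` options correspond to well-known covariance functions K(s, t). As a hedged sketch, the base-R helpers below write them in their usual textbook forms and build the covariance matrix of one trajectory at its measurement times. The helper names and the parameters `theta`, `T_end` and `H` (with their defaults) are hypothetical; the package's own parameterisation, and the signature it expects for a user-defined `sto` function, may differ.

```r
# Covariance functions K(s, t) for the built-in `sto` options, in their usual
# textbook forms. Names, parameters and defaults are illustrative assumptions.
cov_bm <- function(s, t) pmin(s, t)                     # "BM": Brownian motion

cov_ornuhl <- function(s, t, theta = 1) {               # "OrnUhl": stationary OU,
  exp(-theta * abs(s - t))                              # unit marginal variance
}

cov_bbridge <- function(s, t, T_end = 1) {              # "BBridge": Brownian bridge
  pmin(s, t) - s * t / T_end                            # pinned to 0 at 0 and T_end
}

cov_fbm <- function(s, t, H = 0.5) {                    # "fbm": Hurst index H in (0, 1)
  0.5 * (abs(s)^(2 * H) + abs(t)^(2 * H) - abs(s - t)^(2 * H))
}

# n_i x n_i covariance matrix K_i of one trajectory's process at its times.
# Extra arguments (theta, T_end, H) are forwarded to the covariance function.
cov_matrix <- function(times, K, ...) outer(times, times, K, ...)

t_obs <- c(0.25, 0.5, 0.75)
K_bm  <- cov_matrix(t_obs, cov_bm)
K_fbm <- cov_matrix(t_obs, cov_fbm, H = 0.7)
```

With `H = 0.5` the fractional Brownian motion covariance reduces to min(s, t), i.e. ordinary Brownian motion, which is a convenient sanity check.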

Value

A fitted (S)MERF model which is a list of the following elements:

forest: Random forest obtained at the last iteration.
random_effects: Predictions of the random effects for the different trajectories.
id_btilde: Identifiers of the individuals associated with the predictions in random_effects.
omega: Predictions of the stochastic processes for the different trajectories.
var_random_effects: Estimation of the variance covariance matrix of random effects.
sigma_sto: Estimation of the volatility parameter of the stochastic process.
sigma: Estimation of the residual variance parameter.
time: The vector of the measurement times associated with the trajectories in Y, Z and X.
sto: Stochastic process used in the model.
Vraisemblance: Log-likelihood at each iteration.
id: Vector of the identifiers for the different trajectories.
OOB: OOB error of the fitted random forest at each iteration.
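The prediction step behind `random_effects` and `omega` can be illustrated with the usual Gaussian conditional expectations for one trajectory, given the forest predictions f(X_i) and the variance components. This is a hedged illustration, not the package's code: it assumes \beta_i ~ N(0, D), a process \omega_i with covariance gamma2 * K_i, and independent errors with variance sigma2, and the mapping of D, gamma2 and sigma2 to `var_random_effects`, `sigma_sto` and `sigma` is an assumption. It also includes a literal reading of the `delta` stopping rule applied to a log-likelihood sequence such as `Vraisemblance`. The helper names `predict_effects` and `first_converged_iteration` are hypothetical.

```r
# Gaussian conditional expectations of beta_i and omega_i for one trajectory.
# Assumed mapping: D ~ var_random_effects, gamma2 ~ sigma_sto (as a variance),
# sigma2 ~ sigma (residual variance), K ~ covariance matrix of the `sto` process.
predict_effects <- function(y, f_hat, Z, K, D, gamma2, sigma2) {
  r <- y - f_hat                                       # residual after the forest part
  V <- Z %*% D %*% t(Z) + gamma2 * K + sigma2 * diag(length(y))  # Cov(y_i)
  Vinv_r <- solve(V, r)                                # V^{-1} r without inverting V
  list(
    beta  = drop(D %*% t(Z) %*% Vinv_r),               # E[beta_i  | y_i]
    omega = drop(gamma2 * (K %*% Vinv_r))              # E[omega_i | y_i]
  )
}

# Stopping rule taken literally from the `delta` description: the first
# iteration whose log-likelihood differs from the previous one by less than
# delta (absolute difference; an assumption about how the check is computed).
first_converged_iteration <- function(loglik, delta = 0.001) {
  hit <- which(abs(diff(loglik)) < delta)
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

# Usage with made-up numbers and a Brownian-motion covariance.
t_obs <- c(0.5, 1, 2, 3)
Z <- cbind(1, t_obs)
K <- outer(t_obs, t_obs, pmin)
pred <- predict_effects(y = c(1.2, 0.8, 1.5, 2.1), f_hat = rep(1, 4),
                        Z = Z, K = K, D = diag(c(0.5, 0.1)),
                        gamma2 = 0.09, sigma2 = 0.04)
it <- first_converged_iteration(c(-120, -110, -109.5, -109.4995))  # 4L
```

Because both predictions are linear in the residual r, they are exactly zero when the forest already reproduces the trajectory.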

Examples

library(LongituRF)
set.seed(123)
data <- DataLongGenerator(n=20) # Generate data composed of n=20 individuals.
# Train an SMERF model on the generated data. This should take about 50 seconds.
# The data are generated with a Brownian motion,
# so we use sto="BM" to specify a Brownian motion as the stochastic process.
smerf <- MERF(X=data$X, Y=data$Y, Z=data$Z, id=data$id, time=data$time, mtry=2, ntree=500, sto="BM")
smerf$forest # is the fitted random forest (obtained at the last iteration).
smerf$random_effects # are the predicted random effects for each individual.
smerf$omega # are the predicted stochastic processes.
plot(smerf$Vraisemblance) # evolution of the log-likelihood.
smerf$OOB # OOB error at each iteration.

[Package LongituRF version 0.9 Index]