REEMforest {LongituRF}    R Documentation

(S)REEMforest algorithm

Description

Fits a (S)REEMforest model, an adaptation of the random forest regression method to longitudinal data.

Usage

REEMforest(
  X,
  Y,
  id,
  Z,
  iter = 100,
  mtry,
  ntree = 500,
  time,
  sto,
  delta = 0.001
)

Arguments

X

[matrix]: An N x p matrix containing the p predictors of the fixed effects; each column codes for one predictor.

Y

[vector]: A vector containing the output trajectories.

id

[vector]: Vector of identifiers for the different trajectories.

Z

[matrix]: An N x q matrix containing the q predictors of the random effects.

iter

[numeric]: Maximum number of iterations of the algorithm. The default value is iter=100.

mtry

[numeric]: Number of variables randomly sampled as candidates at each split. The default value is p/3.

ntree

[numeric]: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. The default value is ntree=500.

time

[vector]: Vector of the measurement times associated with the trajectories in Y, Z and X.

sto

[character]: Defines the covariance function of the stochastic process. Can be "none" for no stochastic process, "BM" for Brownian motion, "OrnUhl" for the standard Ornstein-Uhlenbeck process, "BBridge" for the Brownian bridge, or "fbm" for fractional Brownian motion; can also be a function defined by the user.

delta

[numeric]: The algorithm stops when the difference in log-likelihood between two iterations is smaller than delta. The default value is delta=0.001.
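The shapes expected by the arguments above can be illustrated with a small base-R sketch. All values here are hypothetical toy data (3 individuals with 4 visits each), and the choice of a random intercept plus a random slope for Z is only an assumption made for illustration; the package does not require it.

```r
# Hypothetical toy data: 3 individuals, 4 visits each (N = 12 rows, long format).
set.seed(1)
n_ind <- 3
n_obs <- 4
id   <- rep(seq_len(n_ind), each = n_obs)     # trajectory identifier for each row
time <- rep(seq_len(n_obs), times = n_ind)    # measurement time for each row
N    <- length(id)

# X: N x p matrix of fixed-effects predictors (here p = 2).
X <- cbind(x1 = rnorm(N), x2 = runif(N))

# Z: N x q matrix of random-effects predictors (here q = 2:
# a random intercept and a random slope on time; an illustrative choice).
Z <- cbind(intercept = 1, slope = time)

# Y: vector of N outputs, one per row of X and Z (arbitrary values here).
Y <- rnorm(N)

# All inputs must describe the same N rows.
stopifnot(nrow(X) == N, nrow(Z) == N, length(Y) == N, length(time) == N)
```

Each row of X, Z, Y, id and time refers to the same measurement, so the objects stay aligned row by row.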

Details

(S)REEMforest is an adaptation of the random forest regression method to longitudinal data introduced by Capitaine et al. (2020) <doi:10.1177/0962280220946080>. The algorithm estimates the parameters of the following semi-parametric stochastic mixed-effects model:

Y_i(t)=f(X_i(t))+Z_i(t)\beta_i + \omega_i(t)+\epsilon_i

with Y_i(t) the output at time t for the ith individual; X_i(t) the input predictors (fixed effects) at time t for the ith individual; Z_i(t) the random-effects predictors at time t for the ith individual, with \beta_i the associated individual-specific coefficients; \omega_i(t) the stochastic process at time t for the ith individual, which models the serial correlations of the output measurements; and \epsilon_i the residual error.
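To make the model concrete, the following base-R sketch simulates one individual's trajectory term by term. It is only an illustration: the function f, the variances and the measurement times are hypothetical, and the Brownian motion is drawn from its standard covariance K(s, t) = min(s, t), which is not necessarily how the package computes it internally.

```r
set.seed(42)
times <- seq(0.5, 5, by = 0.5)        # measurement times for one individual
n <- length(times)

# Fixed effects: two predictors and an arbitrary nonlinear f (illustration only).
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
f <- function(X) 2 * X[, "x1"] + sin(X[, "x2"])

# Random effects: random intercept and random slope on time.
Z <- cbind(1, times)
beta_i <- rnorm(2, sd = 0.5)          # individual-specific coefficients

# Stochastic process: Brownian motion, with covariance K(s, t) = min(s, t).
K <- outer(times, times, pmin)
omega_i <- drop(t(chol(K)) %*% rnorm(n))   # one draw from N(0, K)

# Residual error.
eps_i <- rnorm(n, sd = 0.3)

# Y_i(t) = f(X_i(t)) + Z_i(t) beta_i + omega_i(t) + eps_i
Y_i <- f(X) + drop(Z %*% beta_i) + omega_i + eps_i
```

In the model, f is unknown and is estimated by the random forest; here it is fixed only so that the simulated trajectory can be computed.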

Value

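A fitted (S)REEMforest model, which is a list. Its elements include forest (the random forest fitted at the last iteration), random_effects (the predicted random effects for each individual), omega (the predicted stochastic processes), Vraisemblance (the log-likelihood at each iteration) and OOB (the OOB error at each iteration).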

Examples


library(LongituRF)
set.seed(123)
data <- DataLongGenerator(n = 20) # Generate data for n = 20 individuals.
# Train a SREEMforest model on the generated data. Should take about 50 seconds.
# The data are generated with a Brownian motion,
# so we use sto = "BM" to specify a Brownian motion as the stochastic process.
SREEMF <- REEMforest(X = data$X, Y = data$Y, Z = data$Z, id = data$id,
                     time = data$time, mtry = 2, ntree = 500, sto = "BM")
SREEMF$forest # The fitted random forest (obtained at the last iteration).
SREEMF$random_effects # The predicted random effects for each individual.
SREEMF$omega # The predicted stochastic processes.
plot(SREEMF$Vraisemblance) # Evolution of the log-likelihood.
SREEMF$OOB # OOB error at each iteration.
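A log-likelihood trace can also be checked numerically against delta. The sketch below uses a hypothetical vector of values standing in for SREEMF$Vraisemblance, and assumes the stopping rule compares the absolute difference between consecutive iterations, as the description of delta suggests; the criterion implemented in the package may differ (for example, it could be relative).

```r
# Hypothetical log-likelihood trace (made-up values, not output of the package).
loglik <- c(-250.0, -180.0, -150.0, -149.2, -149.1995)
delta <- 0.001

# Absolute change in log-likelihood between consecutive iterations.
change <- abs(diff(loglik))

# First iteration at which the change falls below delta (NA if it never does).
converged_at <- which(change < delta)[1] + 1
converged_at
```

If the result is NA, the trace never met the tolerance, which in a real fit would suggest the run stopped at the iter limit instead.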



[Package LongituRF version 0.9 Index]