R: Fit a high-dimensional PLSMM

plsmm_lasso {plsmmLasso}

R Documentation

Fit a high-dimensional PLSMM

Description

Fits a partial linear semiparametric mixed effects model (PLSMM) via penalized maximum likelihood.

Usage

plsmm_lasso(
  x,
  y,
  series,
  t,
  name_group_var = NULL,
  bases,
  gamma,
  lambda,
  timexgroup,
  criterion,
  nonpara = FALSE,
  cvg_tol = 0.001,
  max_iter = 100,
  verbose = FALSE
)

Arguments

`x`	A matrix of predictor variables.
`y`	A numeric vector containing the continuous response variable.
`series`	A variable identifying the series each observation belongs to. A random intercept is fitted for each series.
`t`	A numeric vector indicating the timepoints.
`name_group_var`	A character string specifying the name of the grouping variable in the `x` matrix.
`bases`	A matrix of basis functions.
`gamma`	The regularization parameter for the nonlinear effect of time.
`lambda`	The regularization parameter for the fixed effects.
`timexgroup`	Logical indicating whether to use a time-by-group interaction. If `TRUE`, each group in `name_group_var` will have its own estimate of the time effect.
`criterion`	The information criterion to be computed. Options are "BIC", "BICC", or "EBIC".
`nonpara`	Logical. If `TRUE`, the `criterion` is computed using both the coefficients of the fixed effects and the coefficients of the nonlinear function. If `FALSE`, only the coefficients of the fixed effects are used.
`cvg_tol`	Convergence tolerance for the algorithm.
`max_iter`	Maximum number of iterations allowed for convergence.
`verbose`	Logical indicating whether to print convergence details at each iteration. Default is `FALSE`.

Details

This function fits a PLSMM with a lasso penalty on the fixed effects and on the coefficients associated with the basis functions. Estimation uses the Expectation-Maximization (EM) algorithm. The basis functions represent a nonlinear effect of time.
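To illustrate what a lasso penalty does to coefficients, here is a minimal base R sketch of the soft-thresholding operator, the standard building block of lasso solvers. The function name `soft_threshold` and the toy coefficient values are assumptions made for this illustration; nothing here is taken from the package's internal code, and the solver used by `plsmm_lasso` may work differently.

```r
# Soft-thresholding: shrink each value toward zero by `penalty`
# and set it to exactly zero when its magnitude is below `penalty`.
# `soft_threshold` is a hypothetical helper, not part of plsmmLasso.
soft_threshold <- function(z, penalty) {
  sign(z) * pmax(abs(z) - penalty, 0)
}

# Toy unpenalized coefficients (made up for illustration).
coefs <- c(-0.8, -0.05, 0.02, 0.4)

# Small coefficients are dropped, large ones are shrunk.
shrunk <- soft_threshold(coefs, penalty = 0.1)
print(shrunk)

# Indices of the coefficients that survive the penalty.
print(which(shrunk != 0))
```

A larger penalty zeroes out more coefficients, which is why `lambda` and `gamma` control how many fixed effects and basis functions remain in the fitted model.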

The model includes a random intercept for each level of the variable specified by `series`. Additionally, if `timexgroup` is set to `TRUE`, the model includes a time-by-group interaction, so that each group of `name_group_var` has its own estimate of the nonlinear function and group-specific nonlinearities over time can be captured. If `name_group_var` is set to `NULL`, a single nonlinear function is used for the whole data set.

The algorithm iteratively updates the estimates until convergence or until the maximum number of iterations is reached.
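The roles of `cvg_tol` and `max_iter` in a loop of this kind can be sketched in base R as follows. This is a generic illustration under stated assumptions, not the package's implementation: the helper `iterate_until_converged`, the toy update rule, and the choice of the largest absolute change as the convergence measure are all hypothetical.

```r
# Generic iterate-until-converged loop (hypothetical helper).
# `update` maps the current estimate to the next one.
iterate_until_converged <- function(update, start,
                                    cvg_tol = 0.001, max_iter = 100) {
  current <- start
  converged <- FALSE
  n_iter <- 0
  while (n_iter < max_iter) {
    n_iter <- n_iter + 1
    proposal <- update(current)
    # Assumed convergence measure: largest absolute change.
    change <- max(abs(proposal - current))
    current <- proposal
    if (change < cvg_tol) {
      converged <- TRUE
      break
    }
  }
  list(estimate = current, converged = converged, n_iter = n_iter)
}

# Toy update rule: a Babylonian step toward sqrt(2).
babylonian_step <- function(v) (v + 2 / v) / 2

res <- iterate_until_converged(babylonian_step, start = 1)
print(res$converged)

# With too few iterations allowed, the loop stops unconverged.
res_short <- iterate_until_converged(babylonian_step, start = 1, max_iter = 2)
print(res_short$converged)
```

The second call shows why the `converged` flag is worth checking: hitting `max_iter` still returns an estimate, but one that has not yet stabilized within `cvg_tol`.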

Value

A list containing the following components:

`lasso_output`	A list with the fitted values for the fixed effects and the nonlinear effect, the estimated coefficients for the fixed effects and the nonlinear effect, and the indices of the basis functions that were used.
`se`	Estimated standard deviation of the residuals.
`su`	Estimated standard deviation of the random intercept.
`out_phi`	Data frame containing the estimated random intercept of each series.
`ni`	Number of timepoints per series.
`hyperparameters`	Data frame with lambda and gamma values.
`converged`	Logical indicating if the algorithm converged.
`crit`	Value of the selected information criterion.
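Because every fit returns `crit` and `converged`, several penalty values can be compared by refitting over a small grid. The sketch below reuses the simulated data from the Examples section; the grid values are arbitrary choices for illustration, and it assumes that `crit` is a single number and, as is conventional for BIC-type criteria, that smaller values are better.

```r
library(plsmmLasso)

# Same simulated data as in the Examples section.
set.seed(123)
data_sim <- simulate_group_inter(
  N = 50, n_mvnorm = 3, grouped = TRUE,
  timepoints = 3:5, nonpara_inter = TRUE,
  sample_from = seq(0, 52, 13),
  cos = FALSE, A_vec = c(1, 1.5)
)
sim <- data_sim$sim
x <- as.matrix(sim[, -1:-3])
bases <- create_bases(sim$t)

# Arbitrary illustrative grid of penalty values.
grid <- expand.grid(lambda = c(0.0046, 0.01), gamma = c(1e-8, 1e-6))
grid$crit <- NA_real_
grid$converged <- NA

for (i in seq_len(nrow(grid))) {
  fit <- plsmm_lasso(x, sim$y, sim$series, sim$t,
    name_group_var = "group", bases = bases$bases,
    gamma = grid$gamma[i], lambda = grid$lambda[i],
    timexgroup = TRUE, criterion = "BIC"
  )
  grid$crit[i] <- fit$crit
  grid$converged[i] <- fit$converged
}

# Assumption: the smallest criterion value marks the preferred fit.
best <- grid[which.min(grid$crit), ]
print(best)
```

Rows with `converged` equal to `FALSE` are best treated with caution, since their criterion values come from estimates that had not stabilized.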

Examples


set.seed(123)
data_sim <- simulate_group_inter(
  N = 50, n_mvnorm = 3, grouped = TRUE,
  timepoints = 3:5, nonpara_inter = TRUE,
  sample_from = seq(0, 52, 13), 
  cos = FALSE, A_vec = c(1, 1.5)
)
sim <- data_sim$sim
x <- as.matrix(sim[, -1:-3])
y <- sim$y
series <- sim$series
t <- sim$t
bases <- create_bases(t)
lambda <- 0.0046
gamma <- 1e-8
plsmm_output <- plsmm_lasso(x, y, series, t,
  name_group_var = "group", bases = bases$bases,
  gamma = gamma, lambda = lambda, timexgroup = TRUE,
  criterion = "BIC"
)
# fixed effect coefficients
plsmm_output$lasso_output$theta

# fixed effect fitted values
plsmm_output$lasso_output$x_fit

# nonlinear functions coefficients
plsmm_output$lasso_output$alpha

# nonlinear functions fitted values
plsmm_output$lasso_output$out_f

# standard deviation of residuals
plsmm_output$se

# standard deviation of random intercept
plsmm_output$su

# series specific random intercept
plsmm_output$out_phi

[Package plsmmLasso version 1.1.0 Index]