R: Penalization in Large Scale Generalized Linear Array Models

glamlasso {glamlasso}

R Documentation

Penalization in Large Scale Generalized Linear Array Models

Description

Efficient design matrix free procedure for fitting large scale penalized 2 or 3-dimensional generalized linear array models (GLAM). It is also possible to fit an additional non-tensor structured component - e.g an intercept - however this can reduce the computational efficiency of the procedure substantially. Currently the LASSO penalty and the SCAD penalty are both implemented. Furthermore, the Gaussian model with identity link, the Binomial model with logit link, the Poisson model with log link and the Gamma model with log link is currently implemented. The underlying algorithm combines gradient descent and proximal gradient (gdpg algorithm), see Lund et al., 2017.

Usage

glamlasso(X, 
          Y, 
          Z = NULL,
          family = "gaussian",
          penalty = "lasso",
          intercept = FALSE,
          weights = NULL,
          betainit = NULL,
          alphainit = NULL,
          nlambda = 100,
          lambdaminratio = 1e-04,
          lambda = NULL,
          penaltyfactor = NULL,
          penaltyfactoralpha = NULL,
          reltolinner = 1e-07,
          reltolouter = 1e-04,
          maxiter = 15000,
          steps = 1,
          maxiterinner = 3000,
          maxiterouter = 25,
          btinnermax = 100,
          btoutermax = 100,
          iwls = "exact",
          nu = 1)

Arguments

`X`	A list containing the tensor components (2 or 3) of the tensor design matrix. These are matrices of sizes `n_i \times p_i`.
`Y`	The response values, an array of size `n_1 \times\cdots\times n_d`. For option `family = "binomial"` this array must contain the proportion of successes and the number of trials is then specified as `weights` (see below).
`Z`	The non tensor structrured part of the design matrix. A matrix of size `n_1 \cdots n_d\times q`. Is set to `NULL` as default.
`family`	A string specifying the model family (essentially the response distribution). Possible values are `"gaussian", "binomial", "poisson", "gamma"`.
`penalty`	A string specifying the penalty. Possible values are `"lasso", "scad"`.
`intercept`	Logical variable indicating if the model includes an intercept. When `intercept = TRUE` the first coulmn in the non-tensor design component `Z` is all 1s. Default is `FALSE`.
`weights`	Observation weights, an array of size `n_1 \times \cdots \times n_d`. For option `family = "binomial"` this array must contain the number of trials and must be provided.
`betainit`	The initial parameter values. Default is NULL in which case all parameters are initialized at zero.
`alphainit`	A `q\times 1` vector containing the initial parameter values for the non-tensor parameter. Default is NULL in which case all parameters are initialized at 0.
`nlambda`	The number of `lambda` values.
`lambdaminratio`	The smallest value for `lambda`, given as a fraction of `\lambda_{max}`; the (data derived) smallest value for which all coefficients are zero.
`lambda`	The sequence of penalty parameters for the regularization path.
`penaltyfactor`	An array of size `p_1 \times \cdots \times p_d`. Is multiplied with each element in `lambda` to allow differential shrinkage on the coefficients.
`penaltyfactoralpha`	A `q \times 1` vector multiplied with each element in `lambda` to allow differential shrinkage on the non-tensor coefficients.
`reltolinner`	The convergence tolerance for the inner loop
`reltolouter`	The convergence tolerance for the outer loop.
`maxiter`	The maximum number of inner iterations allowed for each `lambda` value, when summing over all outer iterations for said `lambda`.
`steps`	The number of steps used in the multi-step adaptive lasso algorithm for non-convex penalties. Automatically set to 1 when `penalty = "lasso"`.
`maxiterinner`	The maximum number of inner iterations allowed for each outer iteration.
`maxiterouter`	The maximum number of outer iterations allowed for each lambda.
`btinnermax`	Maximum number of backtracking steps allowed in each inner iteration. Default is `btinnermax = 100`.
`btoutermax`	Maximum number of backtracking steps allowed in each outer iteration. Default is `btoutermax = 100`.
`iwls`	A string indicating whether to use the exact iwls weight matrix or use a kronecker structured approximation to it.
`nu`	A number between 0 and 1 that controls the step size `\delta` in the proximal algorithm (inner loop) by scaling the upper bound `\hat{L}_h` on the Lipschitz constant `L_h` (see Lund et al., 2017). For `nu = 1` backtracking never occurs and the proximal step size is always `\delta = 1 / \hat{L}_h`. For `nu = 0` backtracking always occurs and the proximal step size is initially `\delta = 1`. For `0 < nu < 1` the proximal step size is initially `\delta = 1/(\nu\hat{L}_h)` and backtracking is only employed if the objective function does not decrease. A `nu` close to 0 gives large step sizes and presumably more backtracking in the inner loop. The default is `nu = 1` and the option is only used if `iwls = "exact"`.

Details

Consider a (two component) generalized linear model (GLM)

g(\mu) = X\beta + Z\alpha =: \eta.

Here g is a link function, \mu is a n\times 1 vector containing the mean of the response variable Y, Z is a n\times q matrix and X a n\times p matrix with tensor structure

X = X_d\otimes\cdots\otimes X_1,

where X_1,\ldots,X_d are the marginal n_i\times p_i design matrices (tensor factors) such that p = p_1\cdots p_d and n=n_1\cdots n_d. Then \beta is the p\times 1 parameter associated with the tensor component X and \alpha the q\times 1 parameter associated with the non-tensor component Z, e.g. the intercept.

Using the generalized linear array model (GLAM) framework the model equation is

g(\mu) = \textrm{vec}(\rho(X_d,\rho(X_{d-1},\ldots,\rho(X_1,B)))) + Z\alpha ,

where \rho is the so called rotated H-transform and B is the array version of \beta. See Currie et al., 2006 for more details.

The log-likelihood is a function of \theta :=(\beta,\alpha) through the linear predictor \eta i.e. \theta \mapsto l(\eta(\theta)). In the usual exponential family framework this can be expressed as

l(\eta(\theta )) = \sum_{i = 1}^n a_i \frac{y_i \vartheta(\eta_i(\theta)) - b(\vartheta(\eta_i(\theta )))}{\psi}+c(y_i,\psi)

where \vartheta, the canonical parameter map, is linked to the linear predictor via the identity \eta(\theta) = g(b'(\vartheta)) with b the cumulant function. Here a_i \ge 0, i = 1,\ldots,n are observation weights and \psi is the dispersion parameter.

For d = 3 or d = 2, using only the marginal matrices X_1,X_2,\ldots, the function glamlasso solves the penalized estimation problem

\min_{\theta} -l(\eta(\theta)) + \lambda J (\theta),

for J either the LASSO or SCAD penalty function, in the GLAM setup for a sequence of penalty parameters \lambda>0. The underlying algorithm is based on an outer gradient descent loop and an inner proximal gradient based loop. We note that if J is not convex, as with the SCAD penalty, we use the multiple step adaptive lasso procedure to loop over the inner proximal algorithm, see Lund et al., 2017 for more details.

Note that the package is optimized towards solving the estimation problem, for \alpha = 0. For \alpha \neq 0 the user incurs a potentially substantial computational cost. Especially it is not advisable to inlcude a very large non-tensor component in the model (large q) and even adding an intecept to the model (q=1) will result in a reduction of computational efficiency.

Value

An object with S3 Class 'glamlasso'.

`spec`	A string indicating the GLAM dimension (`d = 2, 3`), the model family and the penalty.
`beta`	A `p_1\cdots p_d \times` `nlambda` matrix containing the estimates of the parameters for the tensor structured part of the model (`beta`) for each `lambda`-value.
`alpha`	A `q \times` `nlambda` matrix containing the estimates of the parameters for the non tensor structured part of the model `alpha` for each `lambda`-value. If `intercept = TRUE` the first row contains the intercept estimate for each `lambda`-value.
`lambda`	A vector containing the sequence of penalty values used in the estimation procedure.
`df`	The number of nonzero coefficients for each value of `lambda`.
`dimcoef`	A vector giving the dimension of the model coefficient array `\beta`.
`dimobs`	A vector giving the dimension of the observation (response) array `Y`.
`Iter`	A list with 4 items: `bt_iter_inner` is total number of backtracking steps performed in the inner loop, `bt_enter_inner` is the number of times the backtracking is initiated in the inner loop, `bt_iter_outer` is total number of backtracking steps performed in the outer loop, and `iter_mat` is a `nlambda` `\times` `maxiterouter` matrix containing the number of inner iterations for each `lambda` value and each outer iteration and `iter` is total number of iterations i.e. `sum(Iter)`.

Author(s)

Adam Lund

Maintainer: Adam Lund, adam.lund@math.ku.dk

References

Lund, A., M. Vincent, and N. R. Hansen (2017). Penalized estimation in large-scale generalized linear array models. Journal of Computational and Graphical Statistics, 26, 3, 709-724. url = https://doi.org/10.1080/10618600.2017.1279548.

Currie, I. D., M. Durban, and P. H. C. Eilers (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society. Series B. 68, 259-280. url = http://dx.doi.org/10.1111/j.1467-9868.2006.00543.x.

Examples


##size of example 
n1 <- 65; n2 <- 26; n3 <- 13; p1 <- 12; p2 <- 6; p3 <- 4

##marginal design matrices (tensor components)
X1 <- matrix(rnorm(n1 * p1), n1, p1) 
X2 <- matrix(rnorm(n2 * p2), n2, p2) 
X3 <- matrix(rnorm(n3 * p3), n3, p3) 
X <- list(X1, X2, X3)

##############gaussian example 
Beta <- array(rnorm(p1 * p2 * p3) * rbinom(p1 * p2 * p3, 1, 0.1), c(p1 , p2, p3))
Mu <- RH(X3, RH(X2, RH(X1, Beta)))
Y <- array(rnorm(n1 * n2 * n3, Mu), c(n1, n2, n3))

system.time(fit <- glamlasso(X, Y))

modelno <- length(fit$lambda)
plot(c(Beta), type = "h", ylim = range(Beta, fit$coef[, modelno]))
points(c(Beta))
lines(fit$coef[ , modelno], col = "red", type = "h")

###with non tensor design component Z
q <- 5
alpha <- matrix(rnorm(q)) * rbinom(q, 1, 0.5)
Z <- matrix(rnorm(n1 * n2 * n3 * q), n1 * n2 *n3, q) 
Y <- array(rnorm(n1 * n2 * n3, Mu + array(Z %*% alpha, c(n1, n2, n3))), c(n1, n2, n3))
system.time(fit <- glamlasso(X, Y, Z))

modelno <- length(fit$lambda)
oldmfrow <- par()$mfrow
par(mfrow = c(1, 2))
plot(c(Beta), type = "l", ylim = range(Beta, fit$coef[, modelno]))
points(c(Beta))
lines(fit$coef[ , modelno], col = "red")
plot(c(alpha), type = "h", ylim = range(Beta, fit$alpha[, modelno]))
points(c(alpha))
lines(fit$alpha[ , modelno], col = "red", type = "h")
par(mfrow = oldmfrow)

################ poisson example
Beta <- array(rnorm(p1 * p2 * p3, 0, 0.1) * rbinom(p1 * p2 * p3, 1, 0.1), c(p1 , p2, p3))
Mu <- RH(X3, RH(X2, RH(X1, Beta)))
Y <- array(rpois(n1 * n2 * n3, exp(Mu)), dim = c(n1, n2, n3))
system.time(fit <- glamlasso(X, Y, family = "poisson", nu = 0.1))

modelno <- length(fit$lambda)
plot(c(Beta), type = "h", ylim = range(Beta, fit$coef[, modelno]))
points(c(Beta))
lines(fit$coef[ , modelno], col = "red", type = "h")

###with non tensor design component Z
q <- 5
alpha <- matrix(rnorm(q)) * rbinom(q, 1, 0.5)
Z <- matrix(rnorm(n1 * n2 * n3 * q), n1 * n2 *n3, q) 
Y <- array(rpois(n1 * n2 * n3, exp(Mu + array(Z %*% alpha, c(n1, n2, n3)))), dim = c(n1, n2, n3))
system.time(fit <- glamlasso(X, Y, Z, family = "poisson", nu = 0.1))

modelno <- length(fit$lambda)
oldmfrow <- par()$mfrow
par(mfrow = c(1, 2))
plot(c(Beta), type = "l", ylim = range(Beta, fit$coef[, modelno]))
points(c(Beta))
lines(fit$coef[ , modelno], col = "red")
plot(c(alpha), type = "h", ylim = range(Beta, fit$alpha[, modelno]))
points(c(alpha))
lines(fit$alpha[ , modelno], col = "red", type = "h")
par(mfrow = oldmfrow)

[Package glamlasso version 3.0.1 Index]