SMME {SMME} | R Documentation |
Soft Maximin Estimation for Large Scale Heterogenous Data
Description
Efficient procedure for solving the Lasso or SCAD penalized soft
maximin problem for large scale_y data. This software implements two proximal
gradient based algorithms (NPG and FISTA) to solve different forms of the soft
maximin problem from Lund et al., 2022. 1) For general group specific
design the soft maximin problem is solved using the NPG algorithm.
2) For fixed identical d-array-tensor design across groups, where d = 1, 2, 3
, the
estimation procedure uses either the FISTA algorithm or the NPG algorithm and
is implemented for the following two cases; i) For a tensor design matrix the
algorithms use array arithmetic to speed up design matrix multiplications
using only the tensor components ii) For a wavelet design matrix the algorithms use
the pyramid algorithm to completely avoid the design matrix and speed up
design matrix multiplications.
Multi-threading is possible when openMP is available for R.
Note this package SMME replaces the SMMA package.
Usage
softmaximin(x,
y,
zeta,
penalty = c("lasso", "scad"),
alg = c("npg", "fista"),
nlambda = 30,
lambda.min.ratio = 1e-04,
lambda = NULL,
scale_y = 1,
penalty.factor = NULL,
reltol = 1e-05,
maxiter = 1000,
steps = 1,
btmax = 100,
c = 0.0001,
tau = 2,
M = 4,
nu = 1,
Lmin = 0,
lse = TRUE,
nthreads = 2)
Arguments
x |
Either a list containing the G group specific design matrices of sizes
|
y |
list containing the G group specific response vectors of sizes
|
zeta |
vector of strictly positive floats controlling the softmaximin
approximation accuracy. When |
penalty |
string specifying the penalty type. Possible values are
|
alg |
string specifying the optimization algorithm. Possible values are
|
nlambda |
positive integer giving the number of |
lambda.min.ratio |
strictly positive float giving the smallest value for
|
lambda |
A sequence of strictly positive floats used as penalty parameters. |
scale_y |
strictly positive number that the response |
penalty.factor |
a length |
reltol |
strictly positive float giving the convergence tolerance. |
maxiter |
positive integer giving the maximum number of iterations
allowed for each |
steps |
strictly positive integer giving the number of steps used in the
multi-step adaptive lasso algorithm for non-convex penalties. Automatically
set to 1 when |
btmax |
strictly positive integer giving the maximum number of backtracking
steps allowed in each iteration. Default is |
c |
strictly positive float used in the NPG algorithm. Default is
|
tau |
strictly positive float used to control the stepsize for NPG.
Default is |
M |
positive integer giving the look back for the NPG. Default is |
nu |
strictly positive float used to control the stepsize. A value less
that 1 will decrease the stepsize and a value larger than one will increase it.
Default is |
Lmin |
non-negative float used by the NPG algorithm to control the
stepsize. For the default |
lse |
logical variable indicating whether to use the log-sum-exp-loss. TRUE is default and yields the loss below and FALSE yields the exponential of this. |
nthreads |
integer giving the number of threads to use when openMP is available. Default is 2. |
Details
Consider modeling heterogeneous data y_1,\ldots, y_n
by dividing
it into G
groups \mathbf{y}_g = (y_1, \ldots, y_{n_g})
,
g \in \{ 1,\ldots, G\}
and then using a linear model
\mathbf{y}_g = \mathbf{X}_gb_g + \epsilon_g, \quad g \in \{1,\ldots, G\},
to model the group response. Then b_g
is a group specific p\times 1
coefficient, \mathbf{X}_g
an n_g\times p
group design matrix and
\epsilon_g
an n_g\times 1
error term. The objective is to estimate
a common coefficient \beta
such that \mathbf{X}_g\beta
is a robust
and good approximation to \mathbf{X}_gb_g
across groups.
Following Lund et al., 2022, this objective may be accomplished by solving the soft maximin estimation problem
\min_{\beta}\frac{1}{\zeta}\log\bigg(\sum_{g = 1}^G \exp(-\zeta \hat V_g(\beta))\bigg)
+ \lambda \Vert\beta\Vert_1, \quad \zeta > 0,\lambda \geq 0.
Here \zeta
essentially controls the amount of pooling across groups
(\zeta \sim 0
effectively ignores grouping and pools observations) and
\hat V_g(\beta):=\frac{1}{n_g}(2\beta^\top \mathbf{X}_g^\top
\mathbf{y}_g-\beta^\top \mathbf{X}_g^\top \mathbf{X}_g\beta),
is the empirical explained variance, see Lund et al., 2022 for more details and references.
The function softmaximin
solves the soft maximin estimation problem in
large scale settings for a sequence of penalty parameters
\lambda_{max}>\ldots >\lambda_{min}>0
and a sequence of strictly positive
softmaximin parameters \zeta_1, \zeta_2,\ldots
.
The implementation also solves the problem above with the penalty given by the SCAD penalty, using the multiple step adaptive lasso procedure to loop over the inner proximal algorithm.
Two optimization algorithms are implemented in the SMME packages; a non-monotone proximal gradient (NPG) algorithm and a fast iterative soft thresholding algorithm (FISTA).
The implementation is particularly efficient for models where the design is
identical across groups i.e. \mathbf{X}_g = \mathbf{X}
\forall g \in \{1, \ldots, G\}
in the following two cases:
i) first if \mathbf{X}
has tensor structure i.e.
\mathbf{X} = \bigotimes_{i=1}^d \mathbf{M}_i
for marginal n_i\times p_i
design matrices \mathbf{M}_1,\ldots, \mathbf{M}_d
, d \in \{ 1, 2, 3\}
, y
is a d + 1
dimensional response array
and x
is a list containing the d
marginal matrices
\mathbf{M}_1,\ldots, \mathbf{M}_d
. In this case softmaximin
solves
the soft maximin problem using minimal memory by way of tensor optimized
arithmetic, see also RH
.
ii) second, if the design matrix \mathbf{X}
is the inverse matrix of an
orthogonal wavelet transform softmaximin
solves the soft maximin problem
given the d + 1
dimensional response array y
and
x
the name of the wavelet family wt
, using the
pyramid algorithm to compute multiplications
involving \mathbf{X}
.
Note that when multiple values for \zeta
is provided it is possible to
distribute the computations across CPUs if openMP is available.
Value
An object with S3 Class "SMME".
spec |
A string indicating the array dimension (1, 2 or 3) and the penalty. |
coef |
A |
lambda |
A |
df |
A |
dimcoef |
An integer giving the number |
dimobs |
An integer giving the number of observations. For array data a
vector giving the dimension of the observation (response) array |
dim |
Integer indicating the dimension of of the array model. Equal to 1 for non array. |
wf |
A string indicating the wavelet name if used. |
diagnostics |
A list of length 3. Item |
endmod |
Vector of length |
Stops |
Convergence indicators. |
Author(s)
Adam Lund
Maintainer: Adam Lund, adam.lund@math.ku.dk
References
Lund, A., S. W. Mogensen and N. R. Hansen (2022). Soft Maximin Estimation for Heterogeneous Data. Scandinavian Journal of Statistics, vol. 49, no. 4, pp. 1761-1790. url = https://doi.org/10.1111/sjos.12580
Examples
#Non-array data
##size of example
set.seed(42)
G <- 10; n <- sample(100:500, G); p <- 60
x <- y <- list()
##group design matrices
for(g in 1:G){x[[g]] <- matrix(rnorm(n[g] * p), n[g], p)}
##common features and effects
common_features <- rbinom(p, 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(p) * common_features
##group response
for(g in 1:G){
bg <- rnorm(p, 0, 0.5) * (1 - common_features) + common_effects
mu <- x[[g]] %*% bg
y[[g]] <- rnorm(n[g]) + mu
}
##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1), penalty = "lasso", alg = "npg"))
betahat <- fit$coef
##estimated common effects for specific lambda and zeta
zetano <- 2
modelno <- dim(betahat[[zetano]])[2]
m <- min(betahat[[zetano]][ , modelno], common_effects)
M <- max(betahat[[zetano]][ , modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")
#Array data
##size of example
set.seed(42)
G <- 50; n <- c(30, 20, 10); p <- c(7, 5, 4)
##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i]), n[i], p[i])}
##common features and effects
common_features <- rbinom(prod(p), 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(prod(p),0,0.1) * common_features
##group response
y <- array(NA, c(n, G))
for(g in 1:G){
bg <- rnorm(prod(p), 0, .1) * (1 - common_features) + common_effects
Bg <- array(bg, p)
mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
y[,,, g] <- array(rnorm(prod(n)), dim = n) + mu
}
##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(1, 10, 100), penalty = "lasso",
alg = "npg"))
betahat <- fit$coef
##estimated common effects for specific lambda and zeta
zetano <- 1
modelno <- dim(betahat[[zetano]])[2]
m <- min(betahat[[zetano]][, modelno], common_effects)
M <- max(betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")
#Array data and wavelets
##size of example
set.seed(42)
G <- 50; p <- n <- c(2^3, 2^4, 2^5);
##common features and effects
common_features <- rbinom(prod(p), 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(prod(p), 0, 1) * common_features
##group response
y <- array(NA, c(n, G))
for(g in 1:G){
bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
Bg <- array(bg, p)
mu <- iwt(Bg)
y[,,, g] <- array(rnorm(prod(n), 0, 0.5), dim = n) + mu
}
##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x = "la8", y, zeta = c(0.1, 1, 10),
penalty = "lasso", alg = "fista"))
betahat <- fit$coef
##estimated common effects for specific lambda and zeta
zetano <- 3
modelno <- dim(betahat[[zetano]])[2]
m <- min(betahat[[zetano]][, modelno], common_effects)
M <- max(betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")