SMME {SMME}R Documentation

Soft Maximin Estimation for Large Scale Heterogenous Data

Description

Efficient procedure for solving the Lasso or SCAD penalized soft maximin problem for large scale_y data. This software implements two proximal gradient based algorithms (NPG and FISTA) to solve different forms of the soft maximin problem from Lund et al., 2022. 1) For general group specific design the soft maximin problem is solved using the NPG algorithm. 2) For fixed identical d-array-tensor design across groups, where d = 1, 2, 3, the estimation procedure uses either the FISTA algorithm or the NPG algorithm and is implemented for the following two cases; i) For a tensor design matrix the algorithms use array arithmetic to speed up design matrix multiplications using only the tensor components ii) For a wavelet design matrix the algorithms use the pyramid algorithm to completely avoid the design matrix and speed up design matrix multiplications. Multi-threading is possible when openMP is available for R.

Note this package SMME replaces the SMMA package.

Usage

softmaximin(x,
            y,
            zeta,
            penalty = c("lasso", "scad"),
            alg = c("npg", "fista"),
            nlambda = 30,
            lambda.min.ratio = 1e-04,
            lambda = NULL,
            scale_y = 1,
            penalty.factor = NULL,
            reltol = 1e-05,
            maxiter = 1000,
            steps = 1,
            btmax = 100,
            c = 0.0001,
            tau = 2,
            M = 4,
            nu = 1,
            Lmin = 0,
            lse = TRUE,
            nthreads = 2)

Arguments

x

Either a list containing the G group specific design matrices of sizes n_i \times p_i (general model), a list containing the d (d \in \{ 1, 2, 3\}) tensor components (tensor array model) or a string indicating which wavelet design to use (wavelet array model), see wt for options.

y

list containing the G group specific response vectors of sizes n_i \times 1. Alternatively for a model with identical tensor design across G groups, yis an array of size n_1 \times\cdots\times n_d \times G (d \in \{ 1, 2, 3\}) containing the response values.

zeta

vector of strictly positive floats controlling the softmaximin approximation accuracy. When length(zeta) > 1 the procedure will distribute the computations using the nthreads parameter below when openMP is available.

penalty

string specifying the penalty type. Possible values are "lasso", "scad".

alg

string specifying the optimization algorithm. Possible values are "npg", "fista".

nlambda

positive integer giving the number of lambda values. Used when lambda is not specified.

lambda.min.ratio

strictly positive float giving the smallest value for lambda, as a fraction of \lambda_{max}; the (data dependent) smallest value for which all coefficients are zero. Used when lambda is not specified.

lambda

A sequence of strictly positive floats used as penalty parameters.

scale_y

strictly positive number that the response y is multiplied with.

penalty.factor

a length p vector of positive floats that are multiplied with each element in lambda to allow differential penalization on the coefficients. For tensor models an array of size p_1 \times \cdots \times p_d.

reltol

strictly positive float giving the convergence tolerance.

maxiter

positive integer giving the maximum number of iterations allowed for each lambda value.

steps

strictly positive integer giving the number of steps used in the multi-step adaptive lasso algorithm for non-convex penalties. Automatically set to 1 when penalty = "lasso".

btmax

strictly positive integer giving the maximum number of backtracking steps allowed in each iteration. Default is btmax = 100.

c

strictly positive float used in the NPG algorithm. Default is c = 0.0001.

tau

strictly positive float used to control the stepsize for NPG. Default is tau = 2.

M

positive integer giving the look back for the NPG. Default is M = 4.

nu

strictly positive float used to control the stepsize. A value less that 1 will decrease the stepsize and a value larger than one will increase it. Default is nu = 1.

Lmin

non-negative float used by the NPG algorithm to control the stepsize. For the default Lmin = 0 the maximum step size is the same as for the FISTA algorithm.

lse

logical variable indicating whether to use the log-sum-exp-loss. TRUE is default and yields the loss below and FALSE yields the exponential of this.

nthreads

integer giving the number of threads to use when openMP is available. Default is 2.

Details

Consider modeling heterogeneous data y_1,\ldots, y_n by dividing it into G groups \mathbf{y}_g = (y_1, \ldots, y_{n_g}), g \in \{ 1,\ldots, G\} and then using a linear model

\mathbf{y}_g = \mathbf{X}_gb_g + \epsilon_g, \quad g \in \{1,\ldots, G\},

to model the group response. Then b_g is a group specific p\times 1 coefficient, \mathbf{X}_g an n_g\times p group design matrix and \epsilon_g an n_g\times 1 error term. The objective is to estimate a common coefficient \beta such that \mathbf{X}_g\beta is a robust and good approximation to \mathbf{X}_gb_g across groups.

Following Lund et al., 2022, this objective may be accomplished by solving the soft maximin estimation problem

\min_{\beta}\frac{1}{\zeta}\log\bigg(\sum_{g = 1}^G \exp(-\zeta \hat V_g(\beta))\bigg) + \lambda \Vert\beta\Vert_1, \quad \zeta > 0,\lambda \geq 0.

Here \zeta essentially controls the amount of pooling across groups (\zeta \sim 0 effectively ignores grouping and pools observations) and

\hat V_g(\beta):=\frac{1}{n_g}(2\beta^\top \mathbf{X}_g^\top \mathbf{y}_g-\beta^\top \mathbf{X}_g^\top \mathbf{X}_g\beta),

is the empirical explained variance, see Lund et al., 2022 for more details and references.

The function softmaximin solves the soft maximin estimation problem in large scale settings for a sequence of penalty parameters \lambda_{max}>\ldots >\lambda_{min}>0 and a sequence of strictly positive softmaximin parameters \zeta_1, \zeta_2,\ldots.

The implementation also solves the problem above with the penalty given by the SCAD penalty, using the multiple step adaptive lasso procedure to loop over the inner proximal algorithm.

Two optimization algorithms are implemented in the SMME packages; a non-monotone proximal gradient (NPG) algorithm and a fast iterative soft thresholding algorithm (FISTA).

The implementation is particularly efficient for models where the design is identical across groups i.e. \mathbf{X}_g = \mathbf{X} \forall g \in \{1, \ldots, G\} in the following two cases: i) first if \mathbf{X} has tensor structure i.e.

\mathbf{X} = \bigotimes_{i=1}^d \mathbf{M}_i

for marginal n_i\times p_i design matrices \mathbf{M}_1,\ldots, \mathbf{M}_d , d \in \{ 1, 2, 3\}, y is a d + 1 dimensional response array and x is a list containing the d marginal matrices \mathbf{M}_1,\ldots, \mathbf{M}_d. In this case softmaximin solves the soft maximin problem using minimal memory by way of tensor optimized arithmetic, see also RH. ii) second, if the design matrix \mathbf{X} is the inverse matrix of an orthogonal wavelet transform softmaximin solves the soft maximin problem given the d + 1 dimensional response array y and x the name of the wavelet family wt, using the pyramid algorithm to compute multiplications involving \mathbf{X}.

Note that when multiple values for \zeta is provided it is possible to distribute the computations across CPUs if openMP is available.

Value

An object with S3 Class "SMME".

spec

A string indicating the array dimension (1, 2 or 3) and the penalty.

coef

A length(zeta)-list of p \times nlambda matrices containing the estimates of the model coefficients (\beta) for each lambda-value for which the procedure converged.

lambda

A length(zeta)-list vectors containing the sequence of penalty values used in the estimation procedure for which the procedure converged.

df

A length(zeta)-list of vectors indicating the nonzero model coefficients for each value of lambda for which the procedure converged.

dimcoef

An integer giving the number p of model parameters. For array data a vector giving the dimension of the model coefficient array \beta.

dimobs

An integer giving the number of observations. For array data a vector giving the dimension of the observation (response) array Y.

dim

Integer indicating the dimension of of the array model. Equal to 1 for non array.

wf

A string indicating the wavelet name if used.

diagnostics

A list of length 3. Item iter is a length(zeta)-list of vectors containing the number of iterations for each lambda value for which the algorithm converged. Item bt_iter is a length(zeta) vector with total number of backtracking steps performed across all (converged) lambda values for given zeta value. Key bt_enter is a length(zeta) vector with total number of times backtracking is initiated across all (converged) lambda values for given zeta value.

endmod

Vector of length length(zeta) with the number of models fitted for each zeta.

Stops

Convergence indicators.

Author(s)

Adam Lund

Maintainer: Adam Lund, adam.lund@math.ku.dk

References

Lund, A., S. W. Mogensen and N. R. Hansen (2022). Soft Maximin Estimation for Heterogeneous Data. Scandinavian Journal of Statistics, vol. 49, no. 4, pp. 1761-1790. url = https://doi.org/10.1111/sjos.12580

Examples

#Non-array data

##size of example
set.seed(42)
G <- 10; n <- sample(100:500, G); p <- 60
x <- y <- list()

##group design matrices
for(g in 1:G){x[[g]] <- matrix(rnorm(n[g] * p), n[g], p)}

##common features and effects
common_features <- rbinom(p, 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(p) * common_features

##group response
for(g in 1:G){
bg <- rnorm(p, 0, 0.5) * (1 - common_features) + common_effects
mu <- x[[g]] %*% bg
y[[g]] <- rnorm(n[g]) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1), penalty = "lasso", alg = "npg"))
betahat <- fit$coef

##estimated common effects for specific lambda and zeta
zetano <- 2
modelno <- dim(betahat[[zetano]])[2]
m <- min(betahat[[zetano]][ , modelno], common_effects)
M <- max(betahat[[zetano]][ , modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")

#Array data
##size of example
set.seed(42)
G <- 50; n <- c(30, 20, 10); p <- c(7, 5, 4)

##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i]), n[i], p[i])}

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(prod(p),0,0.1) * common_features

##group response
 y <- array(NA, c(n, G))
for(g in 1:G){
bg <- rnorm(prod(p), 0, .1) * (1 - common_features) + common_effects
Bg <- array(bg, p)
mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
y[,,, g] <- array(rnorm(prod(n)), dim = n) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(1, 10, 100), penalty = "lasso",
            alg = "npg"))
betahat <- fit$coef

##estimated common effects for specific lambda and zeta
zetano <- 1
modelno <- dim(betahat[[zetano]])[2]
m <- min(betahat[[zetano]][, modelno], common_effects)
M <- max(betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")

#Array data and wavelets
##size of example
set.seed(42)
G <- 50; p <- n <- c(2^3, 2^4, 2^5);

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(prod(p), 0, 1) * common_features

##group response
y <- array(NA, c(n, G))
for(g in 1:G){
bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
Bg <- array(bg, p)
mu <- iwt(Bg)
y[,,, g] <- array(rnorm(prod(n), 0, 0.5), dim = n) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x = "la8", y, zeta = c(0.1, 1, 10),
                                penalty = "lasso", alg = "fista"))
betahat <- fit$coef

##estimated common effects for specific lambda and zeta
zetano <- 3
modelno <- dim(betahat[[zetano]])[2]
m <- min(betahat[[zetano]][, modelno], common_effects)
M <- max(betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")

[Package SMME version 1.1.1 Index]