maximin {FRESHD}    R Documentation

Maximin signal estimation

Description

Efficient procedure for solving the maximin estimation problem with identical design across groups; see Lund (2022).

Usage

maximin(y,
        x,
        penalty = "lasso", 
        alg ="aradmm",
        kappa = 0.99,
        nlambda = 30,
        lambda_min_ratio = 1e-04,
        lambda = NULL,
        penalty_factor = NULL,
        standardize = TRUE,
        tol = 1e-05,
        maxiter = 1000,
        delta = 1,
        gamma = 1,
        eta = 0.1,
        aux_par = NULL)

Arguments

y

Array of size n_1 \times\cdots\times n_d \times G containing the response values.

x

Either i) the design matrix, ii) a list containing the Kronecker components (2 or 3) if the design matrix has Kronecker (tensor) structure, or iii) a string giving the name of the wavelet to use (see wt for options).

penalty

String specifying the penalty type. Currently the only possible value is "lasso".

alg

String specifying the optimization algorithm. Possible values are "admm", "aradmm", "tos" and "tosacc".

kappa

Float strictly between 0 and 1 controlling the maximum sparsity allowed in the solution. Only used with ADMM-type algorithms.

nlambda

Positive integer giving the number of lambda values. Used when lambda is not specified.

lambda_min_ratio

Strictly positive float giving the smallest value for lambda as a fraction of \lambda_{max}, the (data dependent) smallest value of lambda for which all coefficients are zero. Used when lambda is not specified.

lambda

Sequence of strictly positive floats used as penalty parameters.

penalty_factor

A vector of length p containing positive floats that are multiplied with each element in lambda to allow differential penalization of the coefficients. For tensor models an array of size p_1 \times \cdots \times p_d. See the sketch after this argument list for an example.

standardize

Boolean indicating if response y should be scaled. Default is TRUE to avoid numerical problems.

tol

Strictly positive float controlling the convergence tolerance.

maxiter

Positive integer giving the maximum number of iterations allowed for each lambda value.

delta

Positive float controlling the step size in the algorithm.

gamma

Positive float controlling the relaxation parameter in the algorithm. Should be between 0 and 2.

eta

Scaling parameter for the step size in the accelerated TOS algorithm. Should be between 0 and 1.

aux_par

Auxiliary parameters for the algorithms.
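
As a minimal, hedged sketch of how these arguments can be combined (toy simulated data; the specific values, the one-dimensional matrix design and the "tos" solver are illustrative assumptions, not package defaults):

library(FRESHD)
set.seed(1)
G <- 10; n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)                      ## i) plain design matrix
b <- rnorm(p, 0, 0.1) * rbinom(p, 1, 0.2)            ## sparse common effects
y <- sapply(1:G, function(g) X %*% (b + rnorm(p, 0, 0.05)) + rnorm(n))

## default lambda sequence of length nlambda
fit <- maximin(y, X, penalty = "lasso", alg = "tos", nlambda = 10)

## user supplied lambda sequence and per-coefficient penalty weights
fit2 <- maximin(y, X, alg = "tos",
                lambda = exp(seq(log(1), log(0.01), length.out = 5)),
                penalty_factor = rep(1, p))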

Details

For n heterogeneous data points divided into G equal-sized groups with m<n data points in each, let y_g=(y_{g,1},\ldots,y_{g,m}) denote the vector of observations in group g. For an m\times p design matrix X, consider the model

y_g=Xb_g+\epsilon_g

where b_g is a random group-specific coefficient vector and \epsilon_g is an error term; see Meinshausen and Bühlmann (2015). For this model, following Lund (2022), the package solves the maximin estimation problem

\min_{\beta} -\min_g \hat V_g(\beta) + \lambda\Vert\beta\Vert_1, \quad \lambda \ge 0,

where

\hat V_g(\beta):=\frac{1}{n}(2\beta^\top X^\top y_g - \beta^\top X^\top X\beta),

is the empirical explained variance in group g. See Lund (2022) for more details and references.
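
As a toy illustration of this criterion (a hedged sketch in base R; a constant rescaling of \hat V_g does not affect the maximin solution), \hat V_g and its minimum over groups can be computed directly for a candidate \beta:

set.seed(1)
m <- 50; p <- 5; G <- 4                                   ## m observations per group
X <- matrix(rnorm(m * p), m, p)
Y <- sapply(1:G, function(g) X %*% rnorm(p) + rnorm(m))   ## columns are y_g
beta <- rnorm(p)

## empirical explained variance in group g (scaled by the group sample size;
## a constant scaling does not change the maximin solution)
Vhat <- function(beta, X, yg) {
  drop(2 * crossprod(beta, crossprod(X, yg)) -
       crossprod(beta, crossprod(X) %*% beta)) / nrow(X)
}

Vg <- sapply(1:G, function(g) Vhat(beta, X, Y[, g]))
min(Vg)   ## the maximin estimator seeks beta making this minimum large,
          ## balanced against the l1 penalty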

The package solves the problem using different algorithms depending on X:

i) If X is orthogonal (e.g. an inverse wavelet transform), either an ADMM algorithm (standard or relaxed) or an adaptive relaxed ADMM (ARADMM) with auto-tuned step size is used; see Xu et al. (2017).

ii) For general X, a three-operator splitting (TOS) algorithm is implemented; see Davis and Yin (2017). Note that if the design is tensor structured, X = \bigotimes_{i=1}^d X_i for d\in\{1, 2, 3\}, the procedure accepts a list containing the tensor components (matrices).
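
The following hedged sketch (toy simulated data; the dimensions, the "la8" wavelet and the solver choices are illustrative assumptions) pairs each design type with a matching algorithm:

library(FRESHD)
set.seed(2)
G <- 5

## ii) tensor (Kronecker) design: pass the components as a list and use TOS
n <- c(16, 12); p <- c(8, 6)
X1 <- matrix(rnorm(n[1] * p[1]), n[1], p[1])
X2 <- matrix(rnorm(n[2] * p[2]), n[2], p[2])
B <- matrix(rnorm(prod(p), 0, 0.1) * rbinom(prod(p), 1, 0.2), p[1], p[2])
y <- array(NA, c(n, G))
for(g in 1:G){
  y[, , g] <- X1 %*% B %*% t(X2) + matrix(rnorm(prod(n)), n[1], n[2])
}
fit_tensor <- maximin(y, list(X1, X2), alg = "tosacc")

## i) orthogonal wavelet design: pass the wavelet name and use (AR)ADMM
N <- 2^8
yw <- matrix(rnorm(N * G), N, G) + sin(2 * pi * seq(0, 1, length.out = N))
fit_wavelet <- maximin(yw, "la8", alg = "aradmm")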

Value

An object with S3 Class "FRESHD".

spec

A string indicating the array dimension (1, 2 or 3) and the penalty.

coef

A p_1\cdots p_d \times nlambda matrix containing the estimates of the model coefficients (beta) for each lambda-value.

lambda

A vector containing the sequence of penalty values used in the estimation procedure.

df

The number of nonzero coefficients for each value of lambda.

dimcoef

A vector giving the dimension of the model coefficient array \beta.

dimobs

A vector giving the dimension of the observation (response) array Y.

dim

Integer indicating the dimension of the array model. Equal to 1 for non-array models.

wf

A string indicating the wavelet name if used.

diagnostics

A list where item iter is a vector containing the number of iterations for each lambda value for which the algorithm converged. Item stop_maxiter is 1 if the maximum number of iterations was reached, otherwise 0. Item stop_sparse is 1 if the maximum sparsity was reached, otherwise 0.
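
As a hedged sketch of how the returned components can be inspected (toy simulated data; the fit itself is only illustrative):

library(FRESHD)
set.seed(3)
G <- 5; n <- 60; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- sapply(1:G, function(g) X %*% rnorm(p, 0, 0.1) + rnorm(n))
fit <- maximin(y, X, alg = "tos", nlambda = 10)

fit$spec                 ## array dimension and penalty used
fit$lambda               ## penalty sequence actually used
fit$df                   ## number of nonzero coefficients per lambda value
dim(fit$coef)            ## p x (number of lambda values)
fit$dimcoef; fit$dimobs  ## coefficient and response dimensions
fit$diagnostics$iter     ## iterations per converged lambda value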

Author(s)

Adam Lund

Maintainer: Adam Lund, adam.lund@math.ku.dk

References

Lund, Adam (2022). Fast Robust Signal Estimation for Heterogeneous Data. In preparation.

Meinshausen, N. and P. Bühlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics, 43(4), 1801-1830.

Davis, Damek and Yin, Wotao (2017). A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25(4), 829-858.

Xu, Zheng, Figueiredo, Mario A. T., Yuan, Xiaoming, Studer, Christoph and Goldstein, Tom (2017). Adaptive relaxed ADMM: Convergence theory and practical implementation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7389-7398.

Examples

## general 3d tensor design matrix
set.seed(42)
G <- 20; n <- c(65, 26, 13)*3; p <- c(13, 5, 4)*3
sigma <- 1

##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i], 0, sigma), n[i], p[i])}

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1)
common_effects <- rnorm(prod(p), 0, 0.1) * common_features

##simulate group response
y <- array(NA, c(n, G))
for(g in 1:G){
  bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
  Bg <- array(bg, p)
  mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
  y[,,, g] <- array(rnorm(prod(n), 0, var(mu)), dim = n) + mu
}

##fit model for range of lambda
system.time(fit <- maximin(y, x, penalty = "lasso", alg = "tosacc"))

##estimated common effects for specific lambda
modelno <- 10
betafit <- fit$coef[, modelno]
plot(common_effects, type = "h", ylim = range(betafit, common_effects), col = "red")
lines(betafit, type = "h")

##orthogonal wavelet design with 3d data (large example)
set.seed(42)
G <- 50; p <- n <- c(2^6, 2^5, 2^6);

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(prod(p), 0, 1) * common_features

##group response
y <- array(NA, c(n, G))
for(g in 1:G){
  bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
  Bg <- array(bg, p)
  mu <- iwt(Bg)
  y[,,, g] <- array(rnorm(prod(n), 0, 0.5), dim = n) + mu
}
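
##fit model for range of lambda (a hedged sketch: the "la8" wavelet and the
##ARADMM solver are assumptions here, mirroring the 1d wavelet example below)
system.time(fit <- maximin(y, "la8", alg = "aradmm"))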

##orthogonal wavelet design with 1d data
G = 50; N1 = 2^10; p = 101; J = 2; amp = 20; sigma2 = 10
y <- matrix(0, N1, G)
z <- seq(0, 2, length.out = N1)
sig <- cos(10 * pi * z) + 1.5 * sin(5 * pi * z)

for (i in 1:G){
  freqs <- sample(1:100, size = J, replace = TRUE)
  y[, i] <- sig * 2 + rnorm(N1, sd = sqrt(sigma2))
  for (j in 1:J){
    y[, i] <- y[, i] + amp * sin(freqs[j] * pi * z + runif(1, -pi, pi))
  }
}

system.time(fit <- maximin(y, "la8", alg = "aradmm", kappa = 0.9))
mmy <- predict(fit, "la8")
plot(mmy[,1], type = "l")
lines(sig, col = "red")

