maximin {FRESHD}    R Documentation

Maximin signal estimation

Description

Efficient procedure for solving the maximin estimation problem with identical design across groups; see Lund (2022).

Usage

maximin(y,
        x,
        penalty = "lasso", 
        alg ="aradmm",
        kappa = 0.99,
        nlambda = 30,
        lambda_min_ratio = 1e-04,
        lambda = NULL,
        penalty_factor = NULL,
        standardize = TRUE,
        tol = 1e-05,
        maxiter = 1000,
        delta = 1,
        gamma = 1,
        eta = 0.1,
        aux_par = NULL)

Arguments

y

Array of size n_1 \times\cdots\times n_d \times G containing the response values.

x

Either i) the design matrix, ii) a list containing the Kronecker components (2 or 3) if the design matrix has Kronecker (tensor) structure, or iii) a string giving the name of the wavelet to use (see wt for options).

penalty

String specifying the penalty type. Currently the only possible value is "lasso".

alg

String specifying the optimization algorithm. Possible values are "admm", "aradmm", "tos" and "tosacc".

kappa

Float strictly between 0 and 1 controlling the maximum sparsity allowed in the solution. Only used with ADMM-type algorithms.

nlambda

Positive integer giving the number of lambda values. Used when lambda is not specified.

lambda_min_ratio

Strictly positive float giving the smallest value for lambda as a fraction of \lambda_{max}, the (data dependent) smallest value of lambda for which all coefficients are zero. Used when lambda is not specified.

lambda

Sequence of strictly positive floats used as penalty parameters.

penalty_factor

A vector of length p containing positive floats that are multiplied with each element in lambda to allow differential penalization of the coefficients. For tensor models an array of size p_1 \times \cdots \times p_d. See the sketch after this argument list for an example.

standardize

Boolean indicating if response y should be scaled. Default is TRUE to avoid numerical problems.

tol

Strictly positive float controlling the convergence tolerance.

maxiter

Positive integer giving the maximum number of iterations allowed for each lambda value.

delta

Positive float controlling the step size in the algorithm.

gamma

Positive float controlling the relaxation parameter in the algorithm. Should be between 0 and 2.

eta

Scaling parameter for the step size in the accelerated TOS algorithm. Should be between 0 and 1.

aux_par

Auxiliary parameters for the algorithms.
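
As a minimal, hedged sketch of how these arguments can be combined (toy simulated data; the specific values, the one-dimensional matrix design and the "tos" solver are illustrative assumptions, not package defaults):

library(FRESHD)
set.seed(1)
G <- 10; n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)                      ## i) plain design matrix
b <- rnorm(p, 0, 0.1) * rbinom(p, 1, 0.2)            ## sparse common effects
y <- sapply(1:G, function(g) X %*% (b + rnorm(p, 0, 0.05)) + rnorm(n))

## default lambda sequence of length nlambda
fit <- maximin(y, X, penalty = "lasso", alg = "tos", nlambda = 10)

## user supplied lambda sequence and per-coefficient penalty weights
fit2 <- maximin(y, X, alg = "tos",
                lambda = exp(seq(log(1), log(0.01), length.out = 5)),
                penalty_factor = rep(1, p))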

Details

For n heterogeneous data points divided into G equal-sized groups with m<n data points in each, let y_g=(y_{g,1},\ldots,y_{g,m}) denote the vector of observations in group g. For an m\times p design matrix X, consider the model

y_g=Xb_g+\epsilon_g

where b_g is a random group-specific coefficient vector and \epsilon_g is an error term; see Meinshausen and Bühlmann (2015). For this model, following Lund (2022), the package solves the maximin estimation problem

\min_{\beta} -\min_g \hat V_g(\beta) + \lambda\Vert\beta\Vert_1, \quad \lambda \ge 0,

where

\hat V_g(\beta):=\frac{1}{n}(2\beta^\top X^\top y_g - \beta^\top X^\top X\beta),

is the empirical explained variance in group g. See Lund (2022) for more details and references.
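
As a toy illustration of this criterion (a hedged sketch in base R; a constant rescaling of \hat V_g does not affect the maximin solution), \hat V_g and its minimum over groups can be computed directly for a candidate \beta:

set.seed(1)
m <- 50; p <- 5; G <- 4                                   ## m observations per group
X <- matrix(rnorm(m * p), m, p)
Y <- sapply(1:G, function(g) X %*% rnorm(p) + rnorm(m))   ## columns are y_g
beta <- rnorm(p)

## empirical explained variance in group g (scaled by the group sample size;
## a constant scaling does not change the maximin solution)
Vhat <- function(beta, X, yg) {
  drop(2 * crossprod(beta, crossprod(X, yg)) -
       crossprod(beta, crossprod(X) %*% beta)) / nrow(X)
}

Vg <- sapply(1:G, function(g) Vhat(beta, X, Y[, g]))
min(Vg)   ## the maximin estimator seeks beta making this minimum large,
          ## balanced against the l1 penalty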

The package solves the problem using different algorithms depending on X:

i) If X is orthogonal (e.g. an inverse wavelet transform), either an ADMM algorithm (standard or relaxed) or an adaptive relaxed ADMM (ARADMM) with auto-tuned step size is used; see Xu et al. (2017).

ii) For general X, a three-operator splitting (TOS) algorithm is implemented; see Davis and Yin (2017). Note that if the design is tensor structured, X = \bigotimes_{i=1}^d X_i for d\in\{1, 2, 3\}, the procedure accepts a list containing the tensor components (matrices).
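
The following hedged sketch (toy simulated data; the dimensions, the "la8" wavelet and the solver choices are illustrative assumptions) pairs each design type with a matching algorithm:

library(FRESHD)
set.seed(2)
G <- 5

## ii) tensor (Kronecker) design: pass the components as a list and use TOS
n <- c(16, 12); p <- c(8, 6)
X1 <- matrix(rnorm(n[1] * p[1]), n[1], p[1])
X2 <- matrix(rnorm(n[2] * p[2]), n[2], p[2])
B <- matrix(rnorm(prod(p), 0, 0.1) * rbinom(prod(p), 1, 0.2), p[1], p[2])
y <- array(NA, c(n, G))
for(g in 1:G){
  y[, , g] <- X1 %*% B %*% t(X2) + matrix(rnorm(prod(n)), n[1], n[2])
}
fit_tensor <- maximin(y, list(X1, X2), alg = "tosacc")

## i) orthogonal wavelet design: pass the wavelet name and use (AR)ADMM
N <- 2^8
yw <- matrix(rnorm(N * G), N, G) + sin(2 * pi * seq(0, 1, length.out = N))
fit_wavelet <- maximin(yw, "la8", alg = "aradmm")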

Value

An object with S3 Class "FRESHD".

spec

A string indicating the array dimension (1, 2 or 3) and the penalty.

coef

A p_1\cdots p_d \times nlambda matrix containing the estimates of the model coefficients (beta) for each lambda-value.

lambda

A vector containing the sequence of penalty values used in the estimation procedure.

df

The number of nonzero coefficients for each value of lambda.

dimcoef

A vector giving the dimension of the model coefficient array \beta.

dimobs

A vector giving the dimension of the observation (response) array Y.

dim

Integer indicating the dimension of the array model. Equal to 1 for non-array models.

wf

A string indicating the wavelet name if used.

diagnostics

A list where item iter is a vector containing the number of iterations for each lambda value for which the algorithm converged. Item stop_maxiter is 1 if the maximum number of iterations was reached, otherwise 0. Item stop_sparse is 1 if the maximum sparsity was reached, otherwise 0.
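
As a hedged sketch of how the returned components can be inspected (toy simulated data; the fit itself is only illustrative):

library(FRESHD)
set.seed(3)
G <- 5; n <- 60; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- sapply(1:G, function(g) X %*% rnorm(p, 0, 0.1) + rnorm(n))
fit <- maximin(y, X, alg = "tos", nlambda = 10)

fit$spec                 ## array dimension and penalty used
fit$lambda               ## penalty sequence actually used
fit$df                   ## number of nonzero coefficients per lambda value
dim(fit$coef)            ## p x (number of lambda values)
fit$dimcoef; fit$dimobs  ## coefficient and response dimensions
fit$diagnostics$iter     ## iterations per converged lambda value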

Author(s)

Adam Lund

Maintainer: Adam Lund, adam.lund@math.ku.dk

References

Lund, Adam (2022). Fast Robust Signal Estimation for Heterogeneous Data. In preparation.

Meinshausen, N. and P. Bühlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics, 43(4), 1801-1830.

Davis, Damek and Yin, Wotao (2017). A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25(4), 829-858.

Xu, Zheng, Figueiredo, Mario A. T., Yuan, Xiaoming, Studer, Christoph and Goldstein, Tom (2017). Adaptive relaxed ADMM: Convergence theory and practical implementation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7389-7398.

Examples

## general 3d tensor design matrix
set.seed(42)
G <- 20; n <- c(65, 26, 13)*3; p <- c(13, 5, 4)*3
sigma <- 1

##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i], 0, sigma), n[i], p[i])}

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1)
common_effects <- rnorm(prod(p), 0, 0.1) * common_features

##simulate group response
y <- array(NA, c(n, G))
for(g in 1:G){
  bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
  Bg <- array(bg, p)
  mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
  y[,,, g] <- array(rnorm(prod(n), 0, var(mu)), dim = n) + mu
}

##fit model for range of lambda
system.time(fit <- maximin(y, x, penalty = "lasso", alg = "tosacc"))

##estimated common effects for specific lambda
modelno <- 10
betafit <- fit$coef[, modelno]
plot(common_effects, type = "h", ylim = range(betafit, common_effects), col = "red")
lines(betafit, type = "h")

##orthogonal wavelet design with 3d data (large example)
set.seed(42)
G <- 50; p <- n <- c(2^6, 2^5, 2^6);

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1) #sparsity of comm. feat.
common_effects <- rnorm(prod(p), 0, 1) * common_features

##group response
y <- array(NA, c(n, G))
for(g in 1:G){
  bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
  Bg <- array(bg, p)
  mu <- iwt(Bg)
  y[,,, g] <- array(rnorm(prod(n), 0, 0.5), dim = n) + mu
}
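
##fit model for range of lambda (a hedged sketch: the "la8" wavelet and the
##ARADMM solver are assumptions here, mirroring the 1d wavelet example below)
system.time(fit <- maximin(y, "la8", alg = "aradmm"))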

##orthogonal wavelet design with 1d data
G = 50; N1 = 2^10; p = 101; J = 2; amp = 20; sigma2 = 10
y <- matrix(0, N1, G)
z <- seq(0, 2, length.out = N1)
sig <- cos(10 * pi * z) + 1.5 * sin(5 * pi * z)

for (i in 1:G){
  freqs <- sample(1:100, size = J, replace = TRUE)
  y[, i] <- sig * 2 + rnorm(N1, sd = sqrt(sigma2))
  for (j in 1:J){
    y[, i] <- y[, i] + amp * sin(freqs[j] * pi * z + runif(1, -pi, pi))
  }
}

system.time(fit <- maximin(y, "la8", alg = "aradmm", kappa = 0.9))
mmy <- predict(fit, "la8")
plot(mmy[,1], type = "l")
lines(sig, col = "red")

