rXb {hdi}    R Documentation

Generate Data Design Matrix X and Coefficient Vector \beta

Description

Generate a random design matrix X and coefficient vector \beta for simulations of (high-dimensional) linear models. In particular, rXb() can exactly recreate the reference linear model datasets of the hdi paper.

Usage

rXb(n, p, s0,
    xtype = c("toeplitz", "exp.decay", "equi.corr"),
    btype = "U[-2,2]",
    permuted = FALSE, iteration = NA, do2S = TRUE,
    x.par = switch(xtype,
                   "toeplitz"  = 0.9,
                   "equi.corr" = 0.8,
                   "exp.decay" = c(0.4, 5)),
    verbose = TRUE)

rX(n, p, xtype, permuted, do2S = TRUE,
   par = switch(xtype,
                "toeplitz"  = 0.9,
                "equi.corr" = 0.8,
                "exp.decay" = c(0.4, 5)))

Arguments

n

integer; the sample size n (the hdi paper always used n = 100).

p

integer; the number of coefficients p in the linear model (the hdi paper always used p = 500).

s0

integer; the number of nonzero coefficients desired in the model, hence at most p.

xtype

a character string specifying the type of design matrix one wants to generate. Must be one of "toeplitz", "equi.corr" or "exp.decay".

btype

a character string specifying the type of nonzero coefficients (“beta”) one wants to generate. In the hdi paper, this has been one of "U[-2,2]", "U[0,2]", "U[0,4]", "bfix1", "bfix2" and "bfix10". In general, any string of the form "U[a,b]" or "bfix<c>" is allowed, where a, b, and <c> must be numbers (with a \le b).
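To make the btype grammar concrete, here is a minimal sketch of how a string such as "U[a,b]" or "bfix<c>" could be turned into a coefficient vector. The helper make_beta_sketch() is hypothetical and is not hdi's implementation; in particular, placing the s0 nonzero coefficients in the first s0 positions is an assumption made only for illustration.

```r
## Hypothetical helper (NOT part of hdi): draw s0 nonzero coefficients
## from a 'btype' string of the form "U[a,b]" or "bfix<c>".
## Assumption: the active set is the first s0 positions of beta.
make_beta_sketch <- function(p, s0, btype) {
  stopifnot(s0 >= 0, s0 <= p)
  beta <- numeric(p)
  if (s0 == 0) return(beta)
  b <- gsub(" ", "", btype)               # tolerate "U[0.1, 5]"
  if (grepl("^U\\[", b)) {
    ## extract the two bounds between "U[" and "]"
    ab <- as.numeric(strsplit(sub("^U\\[(.*)\\]$", "\\1", b), ",")[[1]])
    stopifnot(length(ab) == 2, !anyNA(ab), ab[1] <= ab[2])
    beta[seq_len(s0)] <- runif(s0, min = ab[1], max = ab[2])
  } else if (grepl("^bfix", b)) {
    ## the same fixed value <c> for every active coefficient
    val <- as.numeric(sub("^bfix", "", b))
    stopifnot(!is.na(val))
    beta[seq_len(s0)] <- val
  } else {
    stop("unsupported btype: ", btype)
  }
  beta
}

make_beta_sketch(p = 8, s0 = 3, btype = "bfix2")
```

The regular-expression approach keeps both families in one function; any other prefix is rejected with an error rather than silently ignored.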

permuted

logical specifying whether the columns of the design matrix should be permuted.

iteration

integer or NA specifying whether seeds should be set to generate reproducible realizations of the design type and coefficient type. NA means that no seeds are set, and the current .Random.seed is used as usual in R. Iteration numbers 1 to 50 correspond to the setups from the paper. If a seed is set, the value of .Random.seed on entering the function is saved and restored once the data have been generated.
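The save-and-restore behaviour described for iteration can be illustrated with a small sketch. with_seed_sketch() is a hypothetical helper, not hdi's code; it only shows the general R idiom of saving .Random.seed from the global environment and putting it back with on.exit(), so that the caller's random stream is unaffected.

```r
## Hypothetical helper (NOT hdi's code): evaluate 'expr' under a fixed seed
## and restore the caller's RNG state afterwards.
with_seed_sketch <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())   # restore state
    } else {
      rm(list = ".Random.seed", envir = globalenv())         # no prior state
    }
  })
  set.seed(seed)
  expr            # lazily evaluated here, i.e. after set.seed()
}

set.seed(42)
a <- with_seed_sketch(1, runif(3))   # reproducible draw under seed 1
runif(1)                             # continues the seed-42 stream
```

Because R evaluates arguments lazily, expr is only computed after set.seed() has run, which is what makes the fixed-seed draw reproducible.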

do2S

logical indicating whether, for xtype "toeplitz" or "equi.corr", the p \times p covariance matrix should be inverted twice. Must be TRUE to regenerate the X matrices of the hdi paper exactly, “to the last bit”.
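Why inverting twice matters for bit-exact reproduction: mathematically solve(solve(C)) equals C, but in floating point it agrees only up to rounding error. The snippet below is an illustration under the assumption that "inverted twice" means solve(solve(C)); it is not taken from hdi's source.

```r
## Illustration (assumed reading of 'do2S'): double inversion returns the
## covariance matrix only up to floating-point rounding.
p  <- 10
C  <- 0.9 ^ abs(toeplitz(0:(p - 1)))   # Toeplitz covariance, as in Details
C2 <- solve(solve(C))                  # "inverted twice"
max(abs(C2 - C))                       # tiny, but typically not exactly 0
isTRUE(all.equal(C2, C))               # equal up to numerical tolerance
```

Tiny differences like these propagate into the generated X, which is why the exact setting matters when matching the paper's matrices bit for bit.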

x.par, par

the parameters to be used for the design matrix: a numeric vector of length one or two. The defaults are the parameters used in the hdi paper.

verbose

should the function give a message if seeds are being set? (logical).

Details

Generation of the design matrix X:
For all xtypes, the rows of X are multivariate normal with mean zero and (co)variance matrix \Sigma = C, determined from xtype, x.par and p as follows:

xtype = "toeplitz":

C <- par ^ abs(toeplitz(0:(p-1)))

xtype = "equi.corr":

\Sigma_{i,j} = par for i \ne j, and \Sigma_{i,j} = 1 for i = j, i.e., on the diagonal.

xtype = "exp.decay":

C <- solve(par[1] ^ abs(toeplitz(0:(p-1)) / par[2]))
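The three covariance structures above can be turned into a simple generator, sketched below. make_sigma_sketch() and rX_sketch() are hypothetical names, and sampling through a Cholesky factor is one standard way to draw N(0, \Sigma) rows; hdi may use a different sampler, so this sketch will not reproduce the package's matrices bit for bit.

```r
## Sketch (NOT hdi's internal code): build Sigma for each 'xtype' following
## the formulas above, then draw n rows from N(0, Sigma).
make_sigma_sketch <- function(p, xtype, par) {
  switch(xtype,
    "toeplitz"  = par ^ abs(toeplitz(0:(p - 1))),
    "equi.corr" = { S <- matrix(par, p, p); diag(S) <- 1; S },
    "exp.decay" = solve(par[1] ^ abs(toeplitz(0:(p - 1)) / par[2])),
    stop("unknown xtype: ", xtype))
}

rX_sketch <- function(n, p, xtype, par) {
  Sigma <- make_sigma_sketch(p, xtype, par)
  R <- chol(Sigma)                       # upper triangular, Sigma = t(R) %*% R
  Z <- matrix(rnorm(n * p), nrow = n)    # i.i.d. N(0, 1) entries
  Z %*% R                                # each row has covariance Sigma
}

set.seed(1)
X <- rX_sketch(n = 100, p = 5, xtype = "equi.corr", par = 0.8)
dim(X)
```

If a row z of Z is N(0, I), then z R has covariance t(R) %*% R = \Sigma, which is why multiplying by the Cholesky factor gives the desired design distribution.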

Value

For rXb():

A list with components

x

the generated n \times p design matrix X.

beta

the generated coefficient vector \beta (‘beta’).

For rX():

the generated n \times p design matrix X.

Author(s)

Ruben Dezeure dezeure@stat.math.ethz.ch

References

Dezeure, R., Bühlmann, P., Meier, L. and Meinshausen, N. (2015) High-dimensional inference: confidence intervals, p-values and R-software hdi. Statistical Science 30, 533–558.

Examples

## Generate a realization of the linear model with design matrix
## type Toeplitz and coefficients uniform between -2 and 2

dset <- rXb(n = 80, p = 20, s0 = 3,
            xtype = "toeplitz", btype = "U[-2,2]")
x <- dset$x
beta <- dset$beta

## generate 100 response vectors of this linear model
y <- as.vector( x %*% beta ) + replicate(100, rnorm(nrow(x)))

## Use a non-standard 'btype' so that the nonzero beta's fulfil a 'beta_min' condition (all >= 0.1):
str(ds2 <- rXb(n = 50, p = 12, s0 = 3,
               xtype = "exp.decay", btype = "U[0.1, 5]"))

## Generate a design matrix of type "toeplitz"
set.seed(3) # making it reproducible
X3 <- rX(n = 800, p = 500, xtype = "toeplitz", permuted = FALSE)

## permute the columns
set.seed(3)
Xp <- rX(n = 800, p = 500, xtype = "toeplitz", permuted = TRUE)

[Package hdi version 0.1-9 Index]