prototest.multivariate {prototest} | R Documentation |
Perform Prototype or F tests for Significance of Groups of Predictors in the Multivariate Model
Description
Perform prototype or F tests for significance of groups of predictors in the multivariate model. Choose from the exact (ELR) or approximate (ALR) likelihood ratio prototype tests, the F test, or the marginal screening (MS) prototype test. Tests can be selective or non-selective. For selective tests, the reference distribution can be either non-sampling or based on hit-and-run sampling.
Usage
prototest.multivariate(x, y, groups, test.group, type = c("ELR", "ALR", "F", "MS"),
selected.col = NULL, lambda, mu = NULL, sigma = 1,
hr.iter = 50000, hr.burn.in = 5000, verbose = FALSE, tol = 10^-8)
Arguments
x |
input matrix of dimension n-by-p, where p is the number of predictors over all predictor groups of interest. Will be mean centered and standardised before tests are performed. |
y |
response variable. Vector of length n, assumed to be quantitative. |
groups |
group membership of the columns of |
test.group |
group label for which we test nullity. Should be one of the values seen in |
type |
type of test to be performed. Only one type can be selected at a time. Options are the exact and approximate likelihood ratio prototype tests of Reid et al (2015) (ELR, ALR), the F test (F) and the marginal screening prototype test of Reid and Tibshirani (2015) (MS). Default is ELR. |
selected.col |
preselected columns selected by the user. Vector of indices in the set {1, 2, ... p}. Used in conjunction with |
lambda |
regularisation parameter for the lasso fit. Same for each group. Must be supplied when at least one group has unspecified columns in |
mu |
mean parameter for the response. See Details below. If supplied, it is first subtracted from the response to yield a zero-mean (at the population level) vector for which we proceed with testing. If |
sigma |
error standard deviation for the response. See Details below. If not supplied, it is assumed to be 1. Required for computation of some of the test statistics. |
hr.iter |
number of hit-and-run samples required in the reference distribution of a selective test. Applies only if |
hr.burn.in |
number of burn-in hit-and-run samples. These are generated first so as to make subsequent hit-and-run realisations less dependent on the observed response. Samples are then discarded and do not inform the null reference distribution. |
verbose |
should progress be printed? |
tol |
convergence threshold for iterative optimisation procedures. |
Details
The model underpinning each of the tests is
y = \mu + \sum_{k = 1}^K \theta_k\cdot\hat{y}_k + \epsilon
where \epsilon \sim N(0, \sigma^2I) and K is the number of predictor groups. The form of \hat{y}_k depends on the particular test considered.
In particular, for the ELR, ALR and F tests, we have \hat{y}_k = P_{M_k}\left(y-\mu\right), where P_{M_k} = X_{M_k}\left(X_{M_k}^\top X_{M_k}\right)^{-1}X_{M_k}^\top. Here X_M is the input matrix reduced to the columns with indices in the set M, and M_k is the set of indices selected from considering group k of predictors in isolation. This set is either provided by the user (via selected.col) or is selected automatically (if selected.col is NULL). If the former, a non-selective test is performed; if the latter, a selective test is performed, with the restrictions Ay \leq b, as set out in Lee et al (2015) and stacked as in Reid and Tibshirani (2015).
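As a rough illustration of the projection prototype \hat{y}_k = P_{M_k}(y - \mu) for a user-specified column set, the following base R sketch may help. The helper name projection_prototype, the simulated data and the choice of mu = 3 are assumptions made purely for illustration; this is not the package's internal code, and it covers only the non-selective case where M_k is given.

```r
# Hypothetical helper (not part of prototest): projection prototype for one group.
# x: n-by-p matrix, y: response, cols: the index set M_k, mu: mean of the response.
projection_prototype <- function(x, y, cols, mu) {
  x_m <- x[, cols, drop = FALSE]
  # P_M (y - mu), computed through a QR decomposition instead of an explicit inverse
  as.vector(qr.fitted(qr(x_m), y - mu))
}

set.seed(1)
n <- 50
p <- 8
x <- scale(matrix(rnorm(n * p), ncol = p))  # centred and standardised columns
y <- 3 + 0.5 * x[, 1] + rnorm(n)            # assumed toy model with mean 3

y_hat_1 <- projection_prototype(x, y, cols = 1:3, mu = 3)

# Compare with the explicit formula X_M (X_M^T X_M)^{-1} X_M^T (y - mu)
x_m <- x[, 1:3]
p_m <- x_m %*% solve(crossprod(x_m)) %*% t(x_m)
all.equal(y_hat_1, as.vector(p_m %*% (y - 3)))
```

The QR route is used here only because it avoids forming the inverse explicitly; both computations describe the same projection.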
For the marginal screening prototype (MS) test, \hat{y}_k = x_{j^*}, where x_j is the j^{th} column of x and j^* = \mathrm{argmax}_{j \in C_k} |x_j^\top y|, with C_k the set of indices in the overall predictor set corresponding to predictors in the k^{th} group.
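The marginal screening rule above can be sketched in base R as follows. The helper name ms_prototype and the simulated data are hypothetical and are not taken from the package; the sketch only shows how j^* is picked within one group.

```r
# Hypothetical helper (not part of prototest): marginal screening prototype.
# groups: group label for each column of x, k: the group of interest.
ms_prototype <- function(x, y, groups, k) {
  c_k <- which(groups == k)                            # C_k: columns in group k
  scores <- abs(crossprod(x[, c_k, drop = FALSE], y))  # |x_j^T y| for j in C_k
  j_star <- c_k[which.max(scores)]                     # index in the full predictor set
  list(j.star = j_star, prototype = x[, j_star])
}

set.seed(1)
n <- 50
p <- 8
x <- scale(matrix(rnorm(n * p), ncol = p))  # centred and standardised columns
groups <- rep(1:2, each = p / 2)            # two groups of four columns
y <- 5 * x[, 6] + rnorm(n)                  # strong signal planted in column 6 (group 2)

ms <- ms_prototype(x, y, groups, k = 2)
ms$j.star
```

Note that which.max returns a position within C_k, so it is mapped back through c_k to obtain an index in the overall predictor set.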
All tests test the null hypothesis H_0: \theta_{k^*} = 0, where k^* is supplied by the user via test.group. Details of each are described in Reid et al (2015).
Value
A list with the following four components:
ts |
The value of the test statistic on the observed data. |
p.val |
Valid p-value of the test. |
selected.col |
Vector with columns selected for prototype formation in the test. If initially |
y.hr |
Matrix with hit-and-run replications of the response. If sampled selective test was not performed, this will be |
Author(s)
Stephen Reid
References
Reid, S. and Tibshirani, R. (2015) Sparse regression and marginal testing using cluster prototypes. http://arxiv.org/pdf/1503.00334v2.pdf. Biostatistics doi: 10.1093/biostatistics/kxv049
Reid, S., Taylor, J. and Tibshirani, R. (2015) A general framework for estimation and inference from clusters of features. Available online: http://arxiv.org/abs/1511.07839.
See Also
Examples
library(prototest)
### generate data
set.seed (12345)
n = 100
p = 80
X = matrix (rnorm(n*p, 0, 1), ncol=p)
beta = rep(0, p)
beta[1:3] = 0.1 # three signal variables: number 1, 2, 3
signal = apply(X, 1, function(row){sum(beta*row)}) # linear predictor for each observation (row of X)
intercept = 3
y = intercept + signal + rnorm (n, 0, 1)
### treat all columns as if in same group and test for signal
# non-selective ELR test with nuisance intercept
elr = prototest.univariate (X, y, "ELR", selected.col=1:5)
# selective F test with nuisance intercept; non-sampling
f.test = prototest.univariate (X, y, "F", lambda=0.01, hr.iter=0)
print (elr)
print (f.test)
### assume variables occur in 4 equally sized groups
num.groups = 4
groups = rep (1:num.groups, each=p/num.groups)
# selective ALR test -- select columns 21-25 in 2nd group; test for signal in 1st; hit-and-run
alr = prototest.multivariate(X, y, groups, 1, "ALR", 21:25, lambda=0.005, hr.iter=20000)
# non-selective MS test -- specify first column in each group; test for signal in 1st
ms = prototest.multivariate(X, y, groups, 1, "MS", c(1,21,41,61))
print (alr)
print (ms)