test {FWDselect}    R Documentation

Bootstrap based test for covariate selection

Description

Function that applies a bootstrap based test for covariate selection. It helps to determine the number of variables to be included in the model.

Usage

test(x, y, method = "lm", family = "gaussian", nboot = 50,
  speedup = TRUE, qmin = NULL, unique = FALSE, q = NULL,
  bootseed = NULL, cluster = TRUE, ncores = NULL)

Arguments

x

A data frame containing all the covariates.

y

A vector with the response values.

method

A character string specifying which regression method is used: linear models ("lm"), generalized linear models ("glm") or generalized additive models ("gam").

family

A description of the error distribution and link function to be used in the model: "gaussian", "binomial" or "poisson".

nboot

Number of bootstrap repeats.

speedup

A logical value. If TRUE (default), the testing procedure is computationally efficient because the alternative model is fitted with just one more variable than the null model. If FALSE, the alternative model is fitted with the best subset of variables of size greater than q, that is, the subset that minimizes an information criterion. The size of this subset must be supplied by the user through the argument qmin.

qmin

By default NULL. If speedup is FALSE, qmin is an integer chosen by the user. To help select this argument, it is recommended to inspect the graphical output of the plot function (applied to a qselection object, see Examples) and choose the number q which minimizes the curve.

unique

A logical value. By default FALSE. If TRUE, the test is performed only for one null hypothesis, given by the argument q.

q

By default NULL. If unique is TRUE, q is the size of the subset of variables to be tested.

bootseed

Seed to be used in the bootstrap procedure.

cluster

A logical value. If TRUE (default), the testing procedure is parallelized.

ncores

An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1.
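
As a rough illustration of the ncores default described above, the following sketch computes "number of cores minus one" with parallel::detectCores(). The guard for an undetected core count or a single-core machine is an assumption added here and is not necessarily what test() does internally.

```r
library(parallel)

# Number of cores reported by the machine (can be NA on some platforms)
n_detected <- detectCores()

# Assumed fallback: always keep at least one core available
ncores_default <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
ncores_default
```

A value obtained this way could be passed explicitly, e.g. ncores = ncores_default, to make the degree of parallelism reproducible across machines.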

Details

In a regression framework, let X_1, X_2, \ldots, X_p be a set of p initial variables and Y the response variable. We propose a procedure to test the null hypothesis of q significant variables in the model (q effects not equal to zero) versus the alternative in which the model contains more than q variables. Based on the general model

Y=m(\textbf{X})+\varepsilon \quad {\rm{where}} \quad m(\textbf{X})= m_{1}(X_{1})+m_{2}(X_{2})+\ldots+m_{p}(X_{p})

the following strategy is considered: for a subset of size q, we test the null hypothesis

H_{0} (q): \sum_{j=1}^p I_{\{m_j \ne 0\}} \le q

versus the alternative hypothesis

H_{1} : \sum_{j=1}^p I_{\{m_j \ne 0\}} > q
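
To make the hypotheses above concrete, here is a small base R sketch for the linear special case m_j(X_j) = beta_j X_j, where the indicator sum reduces to the number of nonzero coefficients. The coefficient vector beta, the sample size and the helper h0_holds are hypothetical and chosen only for illustration.

```r
set.seed(1)
n <- 200
p <- 5

# Simulated covariates and a hypothetical coefficient vector:
# only X1 and X2 have a nonzero effect
x <- as.data.frame(matrix(rnorm(n * p), n, p,
                          dimnames = list(NULL, paste0("X", 1:p))))
beta <- c(2, -1.5, 0, 0, 0)
y <- drop(as.matrix(x) %*% beta) + rnorm(n)

# Left-hand side of H0(q) and H1: number of effects different from zero
n_nonzero <- sum(beta != 0)

# H0(q) is true exactly when that number does not exceed q
h0_holds <- function(q) n_nonzero <= q
h0_true <- sapply(0:p, h0_holds)
data.frame(q = 0:p, H0_true = h0_true)
```

In this simulated setting H0(q) is false for q = 0 and q = 1 and true from q = 2 onwards, so a sequential test would ideally stop at two variables.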

Value

A list with two objects. The first one is a table containing

Hypothesis

Number of the null hypothesis tested

Statistic

Value of the T statistic

pvalue

pvalue obtained in the testing procedure

Decision

Result of the test for a significance level of 0.05

The second object, nvar, indicates the number of variables that have to be included in the model.
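
The Statistic and pvalue entries of the table come from a bootstrap procedure. The base R sketch below shows one generic way such a p-value can be obtained for a linear model; it is not the package's implementation. The function names boot_test and stat_one_more, the Rademacher wild bootstrap and the form of the statistic are all assumptions made for illustration.

```r
set.seed(123)
n <- 150
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 1 + 2 * X$x1 + rnorm(n)   # simulated data: only x1 has an effect

# Assumed statistic: regress the null residuals on each remaining
# covariate in turn and keep the largest sum of absolute fitted values
stat_one_more <- function(res, extra) {
  max(sapply(extra, function(v) sum(abs(fitted(lm(res ~ v))))))
}

# Hypothetical helper; null_vars must be non-empty and must leave
# at least one covariate out of the null model
boot_test <- function(X, y, null_vars, nboot = 200) {
  d0 <- data.frame(y = y, X[, null_vars, drop = FALSE])
  null_fit <- lm(y ~ ., data = d0)
  mu0 <- fitted(null_fit)
  res0 <- residuals(null_fit)
  extra <- X[, setdiff(names(X), null_vars), drop = FALSE]
  t_obs <- stat_one_more(res0, extra)

  t_boot <- replicate(nboot, {
    # Wild bootstrap: flip the sign of each null residual at random
    w <- sample(c(-1, 1), length(y), replace = TRUE)
    d_star <- d0
    d_star$y <- mu0 + w * res0
    res_star <- residuals(lm(y ~ ., data = d_star))
    stat_one_more(res_star, extra)
  })

  # Share of bootstrap statistics at least as large as the observed one
  list(statistic = t_obs, pvalue = mean(t_boot >= t_obs))
}

r_true  <- boot_test(X, y, null_vars = "x1", nboot = 100)  # correct null
r_wrong <- boot_test(X, y, null_vars = "x2", nboot = 100)  # x1 is missing
c(true_null = r_true$pvalue, wrong_null = r_wrong$pvalue)
```

With a small nboot the p-value is coarse (it can only take multiples of 1/nboot), which is why more bootstrap repeats give a more stable Decision at the 0.05 level.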

Note

The detailed expressions of the formulas are given in the HTML help: http://cran.r-project.org/web/packages/FWDselect/FWDselect.pdf

Author(s)

Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.

References

Sestelo, M., Villanueva, N. M. and Roca-Pardinas, J. (2013). FWDselect: an R package for selecting variables in regression models. Discussion Papers in Statistics and Operation Research, University of Vigo, 13/01.

See Also

selection

Examples

library(FWDselect)
data(diabetes)
x = diabetes[ ,2:11]
y = diabetes[ ,1]
test(x, y, method = "lm", cluster = FALSE, nboot = 5)

## for speedup = FALSE
# obj2 = qselection(x, y, qvector = c(1:9), method = "lm",
# cluster = FALSE)
# plot(obj2) # we choose q = 7 for the argument qmin
# test(x, y, method = "lm", cluster = FALSE, nboot = 5,
# speedup = FALSE, qmin = 7)


[Package FWDselect version 2.1.0 Index]