test {FWDselect} | R Documentation |
Bootstrap based test for covariate selection
Description
Function that applies a bootstrap based test for covariate selection. It helps to determine the number of variables to be included in the model.
Usage
test(x, y, method = "lm", family = "gaussian", nboot = 50,
speedup = TRUE, qmin = NULL, unique = FALSE, q = NULL,
bootseed = NULL, cluster = TRUE, ncores = NULL)
Arguments
x |
A data frame containing all the covariates. |
y |
A vector with the response values. |
method |
A character string specifying which regression method is used,
i.e., linear models ( |
family |
A description of the error distribution and link function to be
used in the model: ( |
nboot |
Number of bootstrap repeats. |
speedup |
A logical value. If |
qmin |
By default |
unique |
A logical value. By default |
q |
By default |
bootseed |
Seed to be used in the bootstrap procedure. |
cluster |
A logical value. If |
ncores |
An integer value specifying the number of cores to be used
in the parallelized procedure. If |
Details
In a regression framework, let X_1, X_2, \ldots, X_p be a set of p initial variables and Y the response variable. We propose a procedure to test the null hypothesis of q significant variables in the model (q effects not equal to zero) versus the alternative in which the model contains more than q variables. Based on the general model
Y = m(\textbf{X}) + \varepsilon \quad {\rm{where}} \quad m(\textbf{X}) = m_{1}(X_{1}) + m_{2}(X_{2}) + \ldots + m_{p}(X_{p})
the following strategy is considered: for a subset of size q, consideration is given to a test for the null hypothesis
H_{0}(q): \sum_{j=1}^p I_{\{m_j \ne 0\}} \le q
vs. the general hypothesis
H_{1}: \sum_{j=1}^p I_{\{m_j \ne 0\}} > q
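To make the hypotheses concrete, the following base R sketch counts the non-zero effects for a made-up set of coefficients and checks whether H_0(q) holds for a few values of q. It is only an illustration of the indicator sum, not the procedure implemented in FWDselect: the vector beta, the linear form m_j(x) = beta_j * x, and the helper h0_holds are assumptions introduced for this example.

```r
# Hypothetical true effects for p = 5 covariates (illustrative values only).
# Each m_j is assumed linear, m_j(x) = beta_j * x, so m_j != 0 iff beta_j != 0.
beta <- c(1.5, 0, -2, 0, 0)

# Number of non-zero effects: sum over j of I{m_j != 0}
n_effects <- sum(beta != 0)

# H0(q) states that at most q effects are non-zero
h0_holds <- function(q) n_effects <= q

n_effects                       # 2
h0_true <- sapply(1:3, h0_holds)
h0_true                         # FALSE TRUE TRUE
```

Here H_0(1) is false because two effects are non-zero, while H_0(2) and H_0(3) are true. In practice the effects are unknown, which is why a test is needed.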
Value
A list with two objects. The first one is a table containing:
Hypothesis |
Number of the null hypothesis tested |
Statistic |
Value of the T statistic |
pvalue |
pvalue obtained in the testing procedure |
Decision |
Result of the test for a significance level of 0.05 |
The second object, nvar, indicates the number of variables that have to be included in the model.
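As a minimal sketch of how the returned list might be inspected, the code below stores the result and reads its two objects by position. The name res is chosen for this illustration, and positional access is used because only the order of the two objects is stated above; the small nboot value is taken from the Examples section to keep the run short.

```r
library(FWDselect)
data(diabetes)

# Covariates and response, as in the Examples section
x <- diabetes[, 2:11]
y <- diabetes[, 1]

# Store the result instead of only printing it
res <- test(x, y, method = "lm", cluster = FALSE, nboot = 5)

res[[1]]  # first object: table with Hypothesis, Statistic, pvalue, Decision
res[[2]]  # second object: number of variables to include (nvar)
```

With so few bootstrap repeats the p-values are rough, so the output is only useful for seeing the structure of the result.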
Note
The detailed expressions of the formulas are described in the HTML help: http://cran.r-project.org/web/packages/FWDselect/FWDselect.pdf
Author(s)
Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.
References
Sestelo, M., Villanueva, N. M. and Roca-Pardinas, J. (2013). FWDselect: an R package for selecting variables in regression models. Discussion Papers in Statistics and Operation Research, University of Vigo, 13/01.
See Also
Examples
library(FWDselect)
data(diabetes)
# Columns 2 to 11 hold the covariates; column 1 is the response
x <- diabetes[, 2:11]
y <- diabetes[, 1]
# A small nboot keeps the example fast
test(x, y, method = "lm", cluster = FALSE, nboot = 5)

## for speedup = FALSE
# obj2 <- qselection(x, y, qvector = c(1:9), method = "lm",
#                    cluster = FALSE)
# plot(obj2) # we choose q = 7 for the argument qmin
# test(x, y, method = "lm", cluster = FALSE, nboot = 5,
#      speedup = FALSE, qmin = 7)