scglrThemeBackward {SCGLR} | R Documentation |
Theme Backward selection
Description
Perform component selection using a cross-validated backward approach.
Usage
scglrThemeBackward(formula, data, H, family, size = NULL,
weights = NULL, offset = NULL, na.action = na.omit,
crit = list(), method = methodSR(), kfolds = 10, type = "mspe",
st = FALSE)
Arguments
formula |
an object of class " |
data |
data frame. |
H |
vector of R integers: the number of components to keep for each theme. |
family |
a character vector of the same length as the number of dependent variables. Allowed values are "bernoulli", "binomial", "poisson" and "gaussian". |
size |
describes the number of trials for the binomial dependent variables. A (number of statistical units * number of binomial dependent variables) matrix is expected. |
weights |
weights on individuals (not available for now) |
offset |
used for the poisson dependent variables. A vector or a matrix of size: number of observations * number of Poisson dependent variables is expected. |
na.action |
a function which indicates what should happen when the data contain NAs. The default is set to na.omit. |
crit |
a list of two elements, maxit and tol, giving respectively the maximum number of iterations and the convergence tolerance for the Fisher scoring algorithm. Defaults are 50 and 10e-6 respectively. |
method |
structural relevance criterion. Object of class "method.SCGLR" built by methodSR(). |
kfolds |
number of folds (default is 10). Although kfolds can be as large as the sample size (leave-one-out CV),
this is not recommended for large datasets. The smallest allowable value is kfolds = 2.
Models for theme are specified symbolically. A model has the form |
type |
loss function to use for cross-validation. Six options are currently available, depending on whether the responses all belong to the same distribution family. If the responses are all Bernoulli distributed, prediction performance may be measured by the area under the ROC curve: type = "auc". In any other case, choose one of the following five options: "likelihood", "aic", "aicc", "bic" or "mspe". |
st |
logical (default FALSE). Order in which themes are built and fitted: TRUE means random, FALSE means sequential (T1, ..., Tr). |
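To illustrate what kfolds and type = "mspe" control, the following base R sketch computes a k-fold mean squared prediction error. It is not SCGLR's internal code: the simulated data, the lm() stand-in model and the names dat, fold, mspe and cv_mspe are assumptions made only for this example.

```r
# Illustrative k-fold cross-validation with an MSPE loss (base R only).
# Assumption: lm() stands in for the model that SCGLR would actually fit.
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

kfolds <- 5
# Randomly assign each observation to one of kfolds folds of equal size
fold <- sample(rep(seq_len(kfolds), length.out = n))

# For each fold: fit on the other folds, predict the held-out fold,
# and record the mean squared prediction error
mspe <- vapply(seq_len(kfolds), function(k) {
  train <- dat[fold != k, ]
  test  <- dat[fold == k, ]
  fit   <- lm(y ~ x, data = train)
  mean((test$y - predict(fit, newdata = test))^2)
}, numeric(1))

# Cross-validated criterion: average of the per-fold errors
cv_mspe <- mean(mspe)
```

With kfolds equal to the sample size, each fold holds a single observation (leave-one-out CV) and the model is refitted once per observation, which is why large values are costly on large datasets.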
Value
a list containing the path followed along the selection process (H_path), the associated mean squared prediction error (cv_path) and the best configuration (H_best).
Examples
## Not run:
library(SCGLR)
# load sample data
data(genus)
# get variable names from dataset
n <- names(genus)
n <- n[!n %in% c("geology","surface","lon","lat","forest","altitude")]
ny <- n[grep("^gen",n)] # Y <- names that begin with "gen"
nx1 <- n[grep("^evi",n)] # X1 <- names that begin with "evi"
nx2 <- n[-c(grep("^evi",n),grep("^gen",n))] # X2 <- remaining names
form <- multivariateFormula(ny,nx1,nx2,A=c("geology"))
# one family per dependent variable
fam <- rep("poisson",length(ny))
testcv <- scglrThemeBackward(form,data=genus,H=c(2,2),family=fam,offset = genus$surface,kfolds=2)
# Cross-validation pathway
testcv$H_path
# Plot criterion
plot(testcv$cv_path)
# Best combination
testcv$H_best
## End(Not run)