nested.fs {nestfs}    R Documentation

Nested cross-validated forward selection

Description

Run nested forward selection starting from a set of variables or a model.

Usage

nested.fs(formula, data, family, folds, ...)

nested.forward.selection(x, y, init.model, family, folds, ...)

Arguments

formula

An object of class formula (or one that can be coerced to that class) that describes the baseline model to be fitted.

data

Data frame or matrix containing outcome variable and predictors.

family

Type of model fitted: either gaussian() for linear regression or binomial() for logistic regression. This can also be specified as a function name (gaussian) or as a string ("gaussian").

folds

List of cross-validation folds, where each element contains the indices of the observations to be withdrawn in that fold.

...

Further arguments passed on to fs().

x

Data frame of predictors: this should include all variables in the initial set as well as the variables that are allowed to enter the selected panel.

y

Outcome variable. If family=binomial, it must contain exactly two classes of values that can be coerced to 0 and 1.

init.model

Either a formula or a vector of names of the initial set of variables that define the model from which the forward selection should start.
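As an illustration of the shapes the folds and family arguments describe, here is a minimal base R sketch. The helpers make.folds and resolve.family are hypothetical names introduced only for this example and are not part of nestfs; the package supplies its own create.folds(), and how it resolves family internally is not stated here, so the second helper simply mirrors the convention used by glm().

```r
# Hypothetical helper (not part of nestfs): split the indices 1..n into
# k folds, each holding the observations withdrawn in that fold.
make.folds <- function(k, n, seed = 1) {
  set.seed(seed)
  idx <- sample(n)                                   # shuffle the row indices
  unname(split(idx, rep(seq_len(k), length.out = n)))
}

# Hypothetical helper (not part of nestfs): accept a family object,
# a family function or a string, following the convention of glm().
resolve.family <- function(family) {
  if (is.character(family)) family <- get(family, mode = "function")
  if (is.function(family)) family <- family()
  family
}

folds <- make.folds(5, 103)
fam.obj <- resolve.family(gaussian())
fam.fun <- resolve.family(gaussian)
fam.str <- resolve.family("gaussian")
```

Each element of folds is a vector of row indices, and every observation appears in exactly one fold, which is the structure the folds argument expects.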

Details

This function provides an unbiased estimate of the performance of the selected panels on withdrawn data by running forward selection on a predetermined set of folds.

nested.forward.selection provides the legacy interface used up to version 0.9.2. It is considered discontinued, and in the future it will be deprecated and eventually removed.
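The idea of running the selection separately on each fold can be sketched in base R as follows. This is not the nestfs implementation: AIC-based step() stands in for the package's forward selection, the mtcars data set and the candidate variables are arbitrary choices for illustration, and the field panel is a name made up for this sketch. The point it shows is that the withdrawn observations play no part in choosing the variables, so predictions on them are not biased by the selection.

```r
# Sketch only: step() with AIC replaces the forward selection used by nestfs.
data(mtcars)
set.seed(1)
n <- nrow(mtcars)
outer.folds <- unname(split(sample(n), rep(1:4, length.out = n)))

res <- lapply(outer.folds, function(test.idx) {
  train <- mtcars[-test.idx, ]   # observations used for selection and fitting
  test <- mtcars[test.idx, ]     # withdrawn observations, never seen by selection
  base <- lm(mpg ~ wt, data = train)              # baseline model
  sel <- step(base,
              scope = list(lower = ~ wt, upper = ~ wt + hp + qsec + drat),
              direction = "forward", steps = 2, trace = 0)
  list(fit = unname(predict(sel, newdata = test)),  # predictions on withdrawn data
       obs = test$mpg,
       test.idx = test.idx,
       panel = attr(terms(sel), "term.labels"))     # variables in the final model
})
```

The selected variables may differ from fold to fold, which is expected: the estimate describes the performance of the selection procedure rather than of one fixed panel.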

Value

An object of class nestfs of length equal to length(folds), where each element is an object of class fs containing the following additional fields:

fit

Predicted values for the withdrawn observations.

obs

Observed values for the withdrawn observations.

test.idx

Indices of the withdrawn observations for this fold.

model

Summary of the model built using the selected panel.
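To show how the fit, obs and test.idx fields relate to one another, the sketch below builds a mock list with the same field names and summarises it by hand. The values are invented for illustration and the object is a plain list, not a real nestfs object; for actual results, nested.performance() is the function listed for this purpose.

```r
# Mock per-fold results with the documented field names (values are made up).
mock.res <- list(
  list(fit = c(1.0, 2.1, 2.9), obs = c(1, 2, 3), test.idx = c(1, 4, 5)),
  list(fit = c(4.2, 4.8, 6.1), obs = c(4, 5, 6), test.idx = c(2, 3, 6))
)

# Correlation between predicted and observed values within each fold.
per.fold.cor <- sapply(mock.res, function(fold) cor(fold$fit, fold$obs))

# Pool the withdrawn observations across folds and restore the original order.
pooled <- do.call(rbind, lapply(mock.res, function(fold) {
  data.frame(idx = fold$test.idx, fit = fold$fit, obs = fold$obs)
}))
pooled <- pooled[order(pooled$idx), ]

# Root mean squared error over all withdrawn observations.
rmse <- sqrt(mean((pooled$fit - pooled$obs)^2))
```

Because each observation is withdrawn in exactly one fold, pooling by test.idx gives one out-of-sample prediction per observation.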

See Also

fs(), summary.nestfs() and nested.performance().

Examples


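library(nestfs)
data(diabetes)
# two outer cross-validation folds over the rows of the dataset
folds <- create.folds(2, nrow(diabetes), seed=1)
# forward selection starting from the baseline model Y ~ age + sex;
# choose.from, num.inner.folds and max.iters are passed on to fs()
nestfs.res <- nested.fs(Y ~ age + sex, diabetes, gaussian(), folds,
                        choose.from=1:10, num.inner.folds=5, max.iters=3)
summary(nestfs.res)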



[Package nestfs version 1.0.3 Index]