sisal {sisal}    R Documentation
Sequential Input Selection Algorithm (SISAL)
Description
Identifies relevant inputs using a backward selection type algorithm
with optional branching. Choices are made by assessing linear models
estimated with ordinary least squares or ridge regression in a
cross-validation setting.
Usage
sisal(X, y, Mtimes = 100, kfold = 10, hbranches = 1,
max.width = hbranches^2, q = 0.165, standardize = TRUE,
pruning.criterion = c("round robin", "random nodes",
"random edges", "greedy"),
pruning.keep.best = TRUE, pruning.reverse = FALSE,
verbose = 1, use.ridge = FALSE,
max.warn = getOption("nwarnings"), sp = -1, ...)
Arguments
X |
a numeric matrix where each column is a
predictor (independent variable) and each row is an observation
(data point)
|
y |
a numeric vector containing a sample of the response
(dependent) variable, in the same order as the rows of
X
|
Mtimes |
the number of times the cross-validation is repeated,
i.e. the number of predictions made for each data point. An
integral value (numeric or integer).
|
kfold |
the number of approximately equally sized parts used for
partitioning the data on each cross-validation round. An integral
value (numeric or integer).
|
hbranches |
the number of branches to take when removing a
variable from the model. In Tikka and Hollmén
(2008), the algorithm always removes the “weakest” variable
(hbranches equals 1, also the default here). By
using a value larger than 1, the algorithm creates branches
in the search graph by removing each of the hbranches
“weakest” variables, one at a time. The number of branches
created is naturally limited by the number of variables remaining in
the model at that point. See also max.width .
|
max.width |
the maximum number of nodes with a given number of
variables allowed in the search graph. The same limit is used for
all search levels. An integral value (numeric or integer). See
pruning.criterion and pruning.keep.best.
|
q |
a numeric value between 0 and 0.5
(endpoints excluded) defining the quantiles
1-q and q. The difference of these sample
quantiles is used as the width of the sampling distribution (a
measure of uncertainty) of each coefficient in a linear model. The
default value 0.165 is the same as that used by Tikka and
Hollmén (2008). In the case of a normally distributed
parameter, the width is approximately twice the standard deviation
(one standard deviation on both sides of the mean); a small numeric
check of this appears after the argument list.
|
standardize |
a logical flag. If TRUE,
standardizes the data to zero mean and unit variance. If
FALSE, uses the original data. This affects the scale of the
results. If use.ridge is TRUE, this should also be
set to TRUE; otherwise the search graph and the sets of selected
variables may be affected.
|
pruning.criterion |
a character string. Options are
"round robin", "random nodes", "random edges"
and "greedy". Abbreviations are allowed. This affects how
the search tree is pruned if the number of nodes to explore is about
to exceed max.width. One of the following methods is
used to select max.width nodes for the next level of
search.
If "round robin", the nodes of the current level
(i variables) take turns selecting nodes for the next
level (i-1 variables). The turns are taken in order of
increasing validation error. Each parent node chooses children
according to the order described in ‘Details’. If a duplicate
choice would be made, the turn is skipped.
If "random nodes", random nodes are selected with uniform
probability.
If "random edges", random nodes are selected, with the
probability of a node directly proportional to the number of edges
leading to it.
If "greedy", a method similar to "round robin" is
used, but with the (virtual) looping order of parents and children
swapped. Whereas the outer loop in "round robin" operates
over children and the inner loop over parents, the outer loop in
"greedy" operates over parents and the inner loop over
children. That is, a "greedy" parent node selects all its
children before passing on the turn to the next parent. An
illustrative sketch of the two looping orders follows the argument
list.
|
pruning.keep.best |
a logical flag. If TRUE, the
nodes that would also be present in the hbranches = 1
case are immune to pruning. If FALSE, the result may
underperform the original Tikka and Hollmén (2008)
solution in terms of (the lowest) validation error as a function of
the number of inputs.
|
pruning.reverse |
a logical flag. If TRUE, all
the methods described in pruning.criterion except
"random nodes" use reverse orders or inverse probabilities.
The default is FALSE.
|
verbose |
a numeric or integer verbosity level
from 0 (no output) to 5 (all possible diagnostics).
|
use.ridge |
a logical flag. If TRUE , the function
uses ridge regression with automatic selection of the regularization
(smoothing) parameter.
|
max.warn |
a numeric value giving the maximum number of
warnings to store in the returned object. If more warnings are
given, their total number is still recorded in the object.
|
sp |
a numeric value passed to magic if
use.ridge is TRUE. Initial value of the
regularization parameter. If negative (the default), initialization
is automatic.
|
... |
additional arguments passed to magic if
use.ridge is TRUE. It is an error to supply
arguments named "S" or "off".
|
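Two of the arguments above can be illustrated with a few lines of R.
First, a quick numeric check of the claim made for q (illustration
only, not part of the function's interface): with the default
q = 0.165, the distance between the q and 1-q quantiles of a normal
distribution is roughly two standard deviations.

q <- 0.165
diff(qnorm(c(q, 1 - q)))  # about 1.95, i.e. roughly 2 standard deviations

Second, an illustrative sketch (not the package internals) of the
difference between the "round robin" and "greedy" pruning orders. Here
kids is a hypothetical list in which kids[[p]] holds the candidate
children of parent p, with parents and children both already ordered
by increasing validation error; at most max.width nodes are kept.

pick.round.robin <- function(kids, max.width) {
  picked <- character(0)
  for (k in seq_len(max(lengths(kids)))) {   # outer loop over children
    for (p in seq_along(kids)) {             # inner loop over parents
      if (k <= length(kids[[p]])) {
        picked <- union(picked, kids[[p]][k])  # a duplicate just skips the turn
      }
      if (length(picked) >= max.width) return(picked)
    }
  }
  picked
}

pick.greedy <- function(kids, max.width) {
  picked <- character(0)
  for (p in seq_along(kids)) {               # outer loop over parents
    for (child in kids[[p]]) {               # inner loop over children
      picked <- union(picked, child)
      if (length(picked) >= max.width) return(picked)
    }
  }
  picked
}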
Details
When choosing which variable to drop from the model, the importance of
a variable is measured by looking at two quantities derived from the
sampling distribution of its coefficient in the linear models of the
repeated cross-validation runs:
- the absolute value of the median, and
- the width of the distribution (see q).
The importance of an input variable is the ratio of the absolute value
of the median to the width: the hbranches variables with the smallest
ratios are dropped, one variable in each branch. See max.width and
pruning.criterion.
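As an illustration only (not the package internals), the importance
measure described above can be written in a few lines of R. Here
coef.samples is a hypothetical matrix with one column per input
variable and one row per coefficient estimate from the repeated
cross-validation models:

importance.sketch <- function(coef.samples, q = 0.165) {
  med <- apply(coef.samples, 2, median, na.rm = TRUE)
  hi  <- apply(coef.samples, 2, quantile, probs = 1 - q, na.rm = TRUE)
  lo  <- apply(coef.samples, 2, quantile, probs = q,     na.rm = TRUE)
  abs(med) / (hi - lo)  # small ratio => "weak" variable => dropped first
}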
The main results of the function are described here. More details are
available in ‘Value’.
The function returns two sets of input variables:
- L.v
set corresponding to the smallest validation error.
- L.f
smallest set where validation error is close to the
smallest error. The margin is the standard deviation of the training
error measured in the node of the smallest validation error.
The means of the mean squared errors (MSEs) in the training and
validation sets are also returned (E.tr, E.v). For the training set,
the standard deviation of the MSEs (s.tr) is also returned. The length
of these vectors is the number of variables in X plus one. The i:th
element in each vector corresponds to the best model with i - 1 input
variables, where goodness is measured by the mean MSE in the
validation set.
Linear models fitted to the whole data set are also returned. Both
ordinary least squares regression (lm.L.f, lm.L.v, lm.full) and ridge
regression models (magic.L.f, magic.L.v, magic.full) are computed,
irrespective of the use.ridge setting. Both fitting methods are used
for the L.f set of variables, the L.v set and the full set (all
variables).
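Assuming fit is an object returned by sisal() (for example with the X
and y constructed in ‘Examples’ below), these components can be read
off the returned list directly; the names follow ‘Value’:

fit <- sisal(X, y, Mtimes = 10, kfold = 5)
fit$L.v          # input set with the smallest validation error
fit$L.f          # smallest input set with near-smallest validation error
fit$E.v          # mean validation MSE for each model size
coef(fit$lm.L.v) # OLS model refitted on the L.v variables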
Value
A list with class "sisal". The items are:
L.f |
a numeric vector containing indices to columns of
X. See ‘Details’.
|
L.v |
a numeric index vector like L.f. See
‘Details’.
|
E.tr |
a numeric vector of length d + 1.
See ‘Details’.
|
s.tr |
a numeric vector of length d + 1.
See ‘Details’.
|
E.v |
a numeric vector of length d + 1. See
‘Details’.
|
L.f.nobranch |
a numeric vector or NULL. Like
L.f but for the “no branching” solution.
NULL if branching is not used or if some elements of
branching.useful are missing.
|
L.v.nobranch |
like L.f.nobranch but related to
L.v.
|
E.tr.nobranch |
a numeric vector or NULL. Like
E.tr but for the “no branching” solution.
NULL when branching.useful is NULL. An
element is missing when the corresponding element of
branching.useful is missing.
|
s.tr.nobranch |
like E.tr.nobranch but related to
s.tr.
|
E.v.nobranch |
like E.tr.nobranch but related to
E.v.
|
n.evaluated |
a numeric vector of length d + 1.
The number of nodes evaluated for each model size, indexed by
the number of variables used plus one.
|
edges |
a list of directed edges between nodes in the
search graph. There is an edge from node A to node B if
and only if B was a candidate for a new node to be evaluated,
resulting from removing one variable in A. The i:th
element of the list contains edges directed away from the node
represented by the i:th element of vertices.
Each element is a list with one element, "edges", which is a
numeric vector of indices to vertices, pointing
to the nodes towards which the edges are directed. There are no
edges directed away from pruned nodes or nodes representing a single
variable. A small sketch of navigating the graph follows the list of
return values.
|
vertices |
a character vector the same size as
edges. Contains the names of the nodes in the search
graph. Each name contains the indices of the variables included in
the set in question, separated by dots.
|
vertices.logical |
a logical matrix containing an
alternative representation of vertices. Number of rows
is the length of vertices and number of columns is
d. The i:th column indicates whether the
i:th input variable is present in a given node. The row index
and the index to vertices are equivalent.
|
vertex.data |
A data.frame with information about each
node in the search graph (missing information means pruned node).
The rows correspond to items in vertices. The columns
are:
- E.tr
mean of MSEs, training.
- s.tr
standard deviation (n-1) of
MSEs, training.
- E.v
mean of MSEs, validation.
- E.v.level.rank
rank of the node among all the evaluated
(non-pruned) nodes with the same number of variables, in terms
of validation error. Smallest error is rank 1.
- n.rank.deficient
number of rank deficient linear models.
This problem arises when the number of input variables is large
compared to the number of observations and
use.ridge is FALSE.
- n.NA.models
number of models that could not be estimated
due to the lack of any samples.
- n.inputs
number of input variables used in the model
represented by the node.
- min.branches
the smallest branching factor large enough
for producing the node. This is a number k between
1 and hbranches. The value for the root
node (all input variables) is 1. The value for other
nodes is the minimum of the set of values suggested by its
parents. The value suggested by an individual parent is the
min.branches value of the parent itself or the
ranking of the child in terms of increasing importance of the
removed variable (see ‘Details’), whichever is larger.
For example, when pruning.keep.best is TRUE,
the hbranches = 1 search path can be followed by
looking for nodes where min.branches is 1.
|
var.names |
names of the variables (column names of
X).
|
n |
number of observations in the (X,
y) data.
|
d |
number of variables (columns) in X.
|
n.missing |
number of samples where either y or all
variables of X are missing.
|
n.clean |
number of complete samples in the data set
X, y.
|
lm.L.f |
lm model fitted to L.f
variables.
|
lm.L.v |
lm model fitted to L.v
variables.
|
lm.full |
lm model fitted to all variables.
|
magic.L.f |
magic model fitted to L.f
variables.
|
magic.L.v |
magic model fitted to L.v
variables.
|
magic.full |
magic model fitted to all
variables.
|
mean.y |
mean of y.
|
sd.y |
standard deviation (denominator n - 1) of
y.
|
zeroRange.y |
a logical value indicating whether all
non-missing elements of y are equal, with some numeric
tolerance.
|
mean.X |
column means of X.
|
sd.X |
standard deviation (denominator n - 1) of
each column in X.
|
zeroRange.X |
a logical vector. Like
zeroRange.y but for each column of X.
|
constant.X |
a logical vector where the i:th value
indicates whether the i:th column of X has a (nearly)
constant, non-zero value (NA values allowed).
|
params |
a named list containing the values used for most
of the parameter-like formal arguments of the
function, and also anything in ‘...’. The names are the
names of the parameters.
|
pairwise.points |
a numeric square matrix with
d rows and columns. The count in row i, column
j indicates the number of times that variable i was
better than variable j. See ‘Details’ in
plotSelected.sisal.
|
pairwise.wins |
a logical square matrix with
d rows and columns. A TRUE value in row
i, column j indicates that variable i is more important
than variable j. Derived from pairwise.points.
|
pairwise.preferences |
a numeric vector with
d elements. Number of wins minus number of losses
(when another variable wins) per variable. Derived from
pairwise.wins.
|
pairwise.rank |
an integer vector of ranks according to
Copeland's pairwise aggregation method. Element number i is
the rank of variable (column) number i in X.
Derived from pairwise.preferences. See
‘Details’ in plotSelected.sisal.
|
path.length |
a numeric vector of path lengths. Consider
a path starting from the full model and continuing through
incrementally smaller models, each with the smallest validation
error among the nodes with that number of variables. However, the
path is broken at each point where the model with one less variable
cannot be constructed by removing one variable from the bigger model
(is not nested). The vector contains the lengths of the pieces.
Its length is the number of breaks plus one.
|
nested.path |
a numeric vector containing the indices
(column numbers) of the input variables in their removal order on
the “nested path”. The first element is the index of the
variable that was removed first. The remaining variable is the last
element. If the path does not exist, this is NULL. See
‘Details’ in plotSelected.sisal.
|
nested.rank |
an integer vector of ranks determined by
nested.path. Element number i is the rank of
variable (column) number i in X. NULL if
nested.path is NULL. See ‘Details’ in
plotSelected.sisal.
|
branching.useful |
If branching is enabled
(hbranches > 1), this is a logical vector of
length d. If the i:th element is TRUE,
branching improved the best model with i variables in terms of
validation error. The result is NA if a comparison is not
possible (may happen if pruning.keep.best is
FALSE). If branching is not used, this is NULL.
|
warnings |
warnings stored. A list of objects that
evaluate to a character string.
|
n.warn |
number of warnings produced. May be higher than the number of
warnings stored.
|
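As a small navigation sketch (assuming fit is an object returned by
sisal(), as in ‘Details’ above), the edges, vertices and vertex.data
components can be combined to inspect the children of a node in the
search graph:

i <- 1                                       # pick a node, e.g. the first one
kids <- fit$edges[[i]][["edges"]]            # indices into fit$vertices
fit$vertices[kids]                           # names of the child nodes
fit$vertex.data[kids, c("E.v", "n.inputs")]  # their validation error and size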
Author(s)
Mikko Korpela
References
Tikka, J. and Hollmén, J. (2008) Sequential input
selection algorithm for long-term prediction of time series.
Neurocomputing, 71(13–15):2604–2615.
See Also
See magic for information about the algorithm used for
estimating the regularization parameter and the corresponding linear
model when use.ridge is TRUE.
See summary.sisal
for how to extract information from
the returned object.
Examples
library(stats)
set.seed(123)
X <- cbind(sine=sin((1:100)/5),
           linear=seq(from=-1, to=1, length.out=100),
           matrix(rnorm(800), 100, 8,
                  dimnames=list(NULL, paste("random", 1:8, sep="."))))
y <- drop(X %*% c(3, 10, 1, rep(0, 7)) + rnorm(100))
foo <- sisal(X, y, Mtimes=10, kfold=5)
print(foo)           # the selected inputs "L.v" are the same as the
summary(foo$lm.full) # significant coefficients of the full model
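## Optional further inspection (not part of the shipped example):
## a compact view of the selection results, see 'summary.sisal'.
summary(foo)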