sfcb {SISIR}    R Documentation

sfcb

Description

sfcb performs interval selection based on random forests.

Usage

sfcb(
  X,
  Y,
  group.method = c("adjclust", "cclustofvar"),
  summary.method = c("pls", "basics", "cclustofvar"),
  selection.method = c("none", "boruta", "relief"),
  at = round(0.15 * ncol(X)),
  range.at = NULL,
  seed = NULL,
  repeats = 5,
  keep.time = TRUE,
  verbose = TRUE,
  parallel = FALSE
)

Arguments

X

input predictors (matrix or data.frame)

Y

target variable (vector whose length is equal to the number of rows in X)

group.method

grouping method, either "adjclust" or "cclustofvar". Defaults to "adjclust"

summary.method

summary method, one of "pls", "basics", or "cclustofvar". Defaults to "pls"

selection.method

selection method, one of "none", "boruta", or "relief". Defaults to "none" (no selection performed)

at

target number of groups for the output results (integer). Defaults to round(0.15 * ncol(X)). Ignored when range.at is not NULL

range.at

sequence of the numbers of groups for the output results (integer vector). Defaults to NULL, in which case at is used

seed

random seed (integer)

repeats

number of repeats for the final random forest computation

keep.time

keep computational times for each step of the method? (logical; defaults to TRUE)

verbose

print messages? (logical; defaults to TRUE)

parallel

not implemented yet
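
The default for at and the role of range.at can be illustrated with a small base R sketch. The matrix X below is simulated, hypothetical data, and the object names at_default and range_at are chosen only for illustration; no SISIR function is called here.

```r
# Hypothetical predictor matrix with 100 columns (simulated, for illustration only)
X <- matrix(rnorm(20 * 100), nrow = 20, ncol = 100)

# Same expression as the default value of 'at' shown in Usage
at_default <- round(0.15 * ncol(X))
at_default

# A sequence of group numbers, as could be passed to 'range.at' instead of 'at'
range_at <- seq(5, 15, by = 5)

# According to the Value section, $groups would then hold one clustering
# per entry of 'range.at'
length(range_at)
```

With 100 predictors, the default targets 15 groups; supplying range.at instead requests one result per listed number of groups.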

Value

an object of class "SFCB" with elements:

dendro

a dendrogram corresponding to the method chosen in group.method

groups

a list of length length(range.at) (or of length 1 if range.at is NULL) that contains the clusterings of the input variables for the selected numbers of groups

summaries

a list of the same length as $groups that contains the summarized predictors according to the method chosen in summary.method

selected

a list of the same length as $groups that contains the names of the variables selected by selection.method if it is not equal to "none"

mse

a data.frame with repeats × length($groups) rows that contains the Mean Squared Errors of the repeats random forests fitted for each number of groups

importance

a list of the same length as $groups that contains a data.frame providing variable importances for the variables in the selected groups, in repeats columns (one for each iteration of the random forest method). When summary.method == "basics", importances for the mean and the sd are provided in separate columns, in which case the number of columns is equal to 2 × repeats

computational.times

a vector with 4 values corresponding to the computational times of (respectively) the group, summary, selection, and RF steps. Returned only if keep.time is TRUE

call

function call

Author(s)

Remi Servien, remi.servien@inrae.fr
Nathalie Vialaneix, nathalie.vialaneix@inrae.fr

References

Servien, R. and Vialaneix, N. (2023) A random forest approach for interval selection in functional regression. Preprint.

Examples

data(truffles)
out1 <- sfcb(rainfall, truffles, group.method = "adjclust", 
             summary.method = "pls", selection.method = "relief")
out2 <- sfcb(rainfall, truffles, group.method = "adjclust", 
             summary.method = "basics", selection.method = "none",
             range.at = c(5, 7))
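
As a hedged sketch of how the returned object might be inspected, the block below reruns the second call and looks only at elements listed in the Value section. The requireNamespace guard, the seed value, and the use of verbose = FALSE are choices made for this illustration, and the shapes noted in the comments are what the Value section describes rather than verified output.

```r
# Run only if the SISIR package is installed
if (requireNamespace("SISIR", quietly = TRUE)) {
  library(SISIR)
  data(truffles)

  # Same settings as out2 above, with a fixed seed and messages turned off
  out2 <- sfcb(rainfall, truffles, group.method = "adjclust",
               summary.method = "basics", selection.method = "none",
               range.at = c(5, 7), seed = 1, verbose = FALSE)

  # One clustering per entry of range.at, per the Value section
  length(out2$groups)

  # Per the Value section: repeats x length($groups) rows of MSE values
  str(out2$mse)

  # Importance for the first number of groups; with summary.method = "basics",
  # the Value section describes 2 x repeats columns (mean and sd)
  dim(out2$importance[[1]])

  # Timings of the group, summary, selection, and RF steps (keep.time = TRUE)
  out2$computational.times
}
```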

[Package SISIR version 0.2.2 Index]