R: Default Function of SP-FSR for Feature Selection and Ranking

spFSR.default {spFSR}

R Documentation

Default Function of SP-FSR for Feature Selection and Ranking

Description

This is the default function of spFeatureSelection. See spFeatureSelection for an example.

Usage

spFSR.default(
  task,
  wrapper = NULL,
  scoring = NULL,
  perturb.amount = 0.05,
  gain.min = 0.01,
  gain.max = 2,
  change.min = 0,
  change.max = 0.2,
  bb.bottom.threshold = 10^(-8),
  mon.gain.A = 100,
  mon.gain.a = 0.75,
  mon.gain.alpha = 0.6,
  hot.start.num.ft.factor = 15,
  hot.start.max.auto.num.ft = 150,
  use.hot.start = TRUE,
  hot.start.range = 0.2,
  rf.n.estimators = 50,
  gain.type = "bb",
  num.features.selected = 0L,
  iters.max = 100L,
  stall.limit = 35L,
  n.samples.max = 5000,
  ft.weighting = FALSE,
  encoding.type = "encode",
  is.debug = FALSE,
  stall.tolerance = 10^(-8),
  random.state = 1,
  rounding = 3,
  run.parallel = TRUE,
  n.jobs = NULL,
  show.info = TRUE,
  print.freq = 10L,
  num.cv.folds = 5L,
  num.cv.reps.eval = 3L,
  num.cv.reps.grad = 1L,
  num.grad.avg = 4L,
  perf.eval.method = "cv"
)

Arguments

`task`	A task object created with `tsk` from the mlr3 package. It must be either a `TaskClassif` or a `TaskRegr` object.
`wrapper`	A learner object created with `lrn` from the mlr3 package, or a `GraphLearner` object created with the mlr3pipelines package. Multiple learners are not supported. If left as `NULL`, a random forest is used by default.
`scoring`	A performance measure created with `msr` from the mlr3 package that is supported by the `task`. If left as `NULL`, accuracy is used for classification and R-squared for regression.
`perturb.amount`	Perturbation amount for feature importances during gradient approximation. It must be a value between 0.01 and 0.1. Default value is 0.05.
`gain.min`	The minimum gain value. It must be greater than or equal to 0.001. Default value is 0.01.
`gain.max`	The maximum gain value. It must be greater than or equal to `gain.min`. Default value is 2.
`change.min`	The minimum change value. It must be non-negative. Default value is 0.0.
`change.max`	The maximum change value. It must be greater than `change.min`. Default is 0.2.
`bb.bottom.threshold`	The threshold value of denominator for the Barzilai-Borwein gain sequence. It must be positive. Default is 1/10^8.
`mon.gain.A`	Parameter for the monotone gain sequence. It must be a positive integer. Default is 100.
`mon.gain.a`	Parameter for the monotone gain sequence. It must be positive. Default is 0.75.
`mon.gain.alpha`	Parameter for the monotone gain sequence. It must lie in the open interval (0, 1). Default is 0.6.
`hot.start.num.ft.factor`	The factor of features to select for hot start. Must be an integer greater than 1. Default is 15.
`hot.start.max.auto.num.ft`	The maximum initial number of features for automatic hot start. Must be an integer greater than 1. Default is 150.
`use.hot.start`	Logical argument. Whether hot start should be used. Default is TRUE.
`hot.start.range`	Float. The initial range of imputations carried over from hot start. It must lie in the open interval (0, 1). Default is 0.2.
`rf.n.estimators`	Integer. The number of trees to use in the random forest hot start. Default is 50.
`gain.type`	The gain sequence to use. Accepted values are 'bb' for the Barzilai-Borwein gain sequence or 'mon' for a monotone gain sequence. Default is 'bb'.
`num.features.selected`	Number of features selected. It must be a nonnegative integer and must not exceed the total number of features in the task. A value of 0 results in automatic feature selection. Default value is 0L.
`iters.max`	Maximum number of iterations to execute. The minimum value is 2L. Default value is 100L.
`stall.limit`	Number of iterations to stall, that is, to continue without at least `stall.tolerance` improvement to the measure value. The minimum value is 2L. Default value is 35L.
`n.samples.max`	The maximum number of samples to select from sampling. It must be a non-negative integer. Default is 5000.
`ft.weighting`	Logical argument. Include simultaneous feature weighting and selection? Default is FALSE.
`encoding.type`	Encoding method for factor features for feature weighting. Default is 'encode'.
`is.debug`	Logical argument. Print additional debug messages? Default value is FALSE.
`stall.tolerance`	Value of stall tolerance. It must be strictly positive. Default value is 1/10^8.
`random.state`	The random state used. Default is 1.
`rounding`	The number of digits to round results. It must be a positive integer. Default value is 3.
`run.parallel`	Logical argument. Perform cross-validations in parallel? Default value is TRUE.
`n.jobs`	Number of cores to use in case of a parallel run. It must be less than or equal to the total number of cores on the host machine. If set to `NULL` when `run.parallel` is `TRUE`, it is taken as one less than the total number of cores.
`show.info`	If set to `TRUE`, iteration information is displayed at print frequency.
`print.freq`	Iteration information printing frequency. It must be a positive integer. Default value is 10L.
`num.cv.folds`	The number of cross-validation folds when 'cv' is selected as `perf.eval.method`. The minimum value is 3L. Default value is 5L.
`num.cv.reps.eval`	The number of cross-validation repetitions for feature subset evaluation. It must be a positive integer. Default value is 3L.
`num.cv.reps.grad`	The number of cross-validation repetitions for gradient averaging. It must be a positive integer. Default value is 1L.
`num.grad.avg`	Number of gradients to average for gradient approximation. It must be a positive integer. Default value is 4L.
`perf.eval.method`	Performance evaluation method. It must be either 'cv' for cross-validation or 'resub' for resubstitution. Default is 'cv'.
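
The three `mon.gain.*` parameters can be read as the constants of a decaying step-size schedule. The sketch below assumes the standard stochastic-approximation form a / (A + k)^alpha, which this page does not state explicitly. The helper `mon_gain` is hypothetical and not part of spFSR, and the package may index iterations differently or additionally bound the result by `gain.min` and `gain.max`.

```r
# Hypothetical helper (not an spFSR function): a monotone gain sequence of the
# assumed form a / (A + k)^alpha, using the documented defaults for
# mon.gain.A, mon.gain.a and mon.gain.alpha.
mon_gain <- function(k, A = 100, a = 0.75, alpha = 0.6) {
  stopifnot(A > 0, a > 0, alpha > 0, alpha < 1)
  a / (A + k)^alpha
}

# Gains for the first five iterations (k = 0, ..., 4); each is smaller than the last.
gains <- mon_gain(0:4)
print(gains)
```

Under this assumed form, a larger `mon.gain.A` flattens the early part of the schedule, while a larger `mon.gain.alpha` makes the gain decay faster.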

Value

spFSR returns an object of class "spFSR". An object of class "spFSR" consists of the following:

`task.spfs`	An mlr3 package `tsk` object defined on the best performing features.
`wrapper`	An mlr3 package `lrn` object or a mlr3pipelines package `GraphLearner` object as specified by the user.
`scoring`	An mlr3 package `msr` as specified by the user.
`best.model`	An mlr3 package `model` object trained by the `wrapper` using `task.spfs`.
`iter.results`	A `data.frame` object containing detailed information on each iteration.
`features`	Names of the best performing features.
`num.features`	The number of best performing features.
`importance`	A vector of importance ranks of the best performing features.
`total.iters`	The total number of iterations executed.
`best.iter`	The iteration where the best performing feature subset was encountered.
`best.value`	The best measure value encountered during execution.
`best.std`	The standard deviation corresponding to the best measure value encountered.
`run.time`	Total run time in minutes.
`results`	A data frame indicating, for each feature, whether it was selected, together with the feature names and the measure value.
`call`	Call.
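
A minimal sketch of a call and of reading the returned components might look as follows. It assumes that the spFSR and mlr3 packages are installed and that `spFSR.default` is exported from spFSR. The iris task, the rpart learner and the small iteration budget are illustrative choices, not recommendations. The run is skipped when either package is missing.

```r
# Tuning arguments kept in a list so they can be inspected before the run.
# Values respect the documented minimums (iters.max >= 2, stall.limit >= 2,
# num.cv.folds >= 3).
args <- list(
  num.features.selected = 2L,   # select exactly two features
  iters.max             = 10L,
  stall.limit           = 5L,
  num.cv.folds          = 3L,
  run.parallel          = FALSE, # keep the example single-threaded
  show.info             = FALSE
)

if (requireNamespace("spFSR", quietly = TRUE) &&
    requireNamespace("mlr3", quietly = TRUE)) {
  args$task    <- mlr3::tsk("iris")            # classification task
  args$wrapper <- mlr3::lrn("classif.rpart")   # single learner
  args$scoring <- mlr3::msr("classif.acc")     # accuracy

  # Assumption: spFSR.default is exported by the package.
  fit <- do.call(spFSR::spFSR.default, args)

  # Components documented in the Value section.
  print(fit$features)      # names of the best performing features
  print(fit$best.value)    # best measure value encountered
  print(fit$total.iters)   # number of iterations executed
}
```

Setting `num.features.selected` to 0L instead would request automatic selection of the number of features, as described in the Arguments section.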

References

David V. Akman et al. (2022) k-best feature selection and ranking via stochastic approximation, Expert Systems with Applications, Vol. 213. See doi:10.1016/j.eswa.2022.118864