run_rafs {RAFS}    R Documentation

Robust Aggregative Feature Selection (RAFS)

Description

This is the main function of the RAFS package; call it to run the full analysis.

Usage

run_rafs(
  data,
  decision,
  k = 5,
  seeds = sample.int(32767, 10),
  fs_fun = default_fs_fun,
  dist_funs = default_dist_funs,
  hclust_methods = default_hclust_methods
)

Arguments

data

input data where columns are variables and rows are observations (all numeric)

decision

decision variable as a binary sequence of length equal to the number of observations

k

number of folds for internal cross validation

seeds

a vector of seeds used for fold generation for internal cross validation

fs_fun

function to compute feature selection p-values; it must have the same signature as default_fs_fun (which is the default; see its help to learn more)

dist_funs

a list of feature dissimilarity functions computed over the relevant portion of the training dataset (see the example default_dist_funs and builtin_dist_funs to learn more)

hclust_methods

a vector of hclust methods to use
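
Because the default for seeds draws ten random integers on every call, repeated calls will generally use different folds unless seeds is supplied explicitly. The following is a minimal base R sketch of fixing the seeds once and sanity-checking method names. The names my_seeds, my_methods and known_methods are illustrative and not part of RAFS, and the sketch assumes that hclust_methods takes the method names accepted by stats::hclust.

```r
# Draw the seeds once, reproducibly, so they can be reused across calls.
set.seed(2024)
my_seeds <- sample.int(32767, 10)

# Method names accepted by stats::hclust (see ?hclust).
# Assumption: RAFS expects names from this same set.
known_methods <- c("ward.D", "ward.D2", "single", "complete",
                   "average", "mcquitty", "median", "centroid")
my_methods <- c("average", "complete")
stopifnot(all(my_methods %in% known_methods))

# These could then be passed on, for example:
# run_rafs(data, decision, k = 5, seeds = my_seeds, hclust_methods = my_methods)
```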

Details

Depending on your pipeline, you may also want to check out run_rafs_with_fs_results and compute_fs_results, which this function simply wraps.

The results from this function can be fed into one of the helper functions to analyse them further: get_rafs_reps_popcnts, get_rafs_rep_tuples_popcnts, get_rafs_rep_tuples_matrix and get_rafs_occurrence_matrix.

Value

A nested list of hclust results. The first level is indexed by cross validation run, the second level by feature dissimilarity function, and the third (and last) level by hclust method.
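
To illustrate how a three-level list of this shape can be traversed, here is a sketch that builds a mock structure with base R only and then cuts every tree into clusters. It is not RAFS output: the data, the level names (run1, cor, average) and the helper cor_dissim are hypothetical, every mock run reuses the same data, and whether the levels of the real result are named is an assumption.

```r
# Mock data: 20 observations of 6 numeric features (NOT RAFS output).
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(NULL, paste0("f", 1:6)))

# Hypothetical feature dissimilarity: 1 - |Pearson correlation|.
cor_dissim <- function(m) as.dist(1 - abs(cor(m)))

runs <- c("run1", "run2")            # stand-ins for cross validation runs
dist_names <- c("cor")               # stand-in for dissimilarity functions
methods <- c("average", "complete")  # hclust methods

# Build the three levels: run -> dissimilarity -> hclust method.
mock <- lapply(setNames(runs, runs), function(r) {
  lapply(setNames(dist_names, dist_names), function(d) {
    lapply(setNames(methods, methods), function(m) {
      hclust(cor_dissim(x), method = m)
    })
  })
})

# Walk the structure and cut every tree into 3 clusters of features.
clusters <- lapply(mock, function(per_run) {
  lapply(per_run, function(per_dist) {
    lapply(per_dist, function(hc) cutree(hc, k = 3))
  })
})

# Cluster membership of each feature for one combination.
clusters$run1$cor$average
```

For real results, the helper functions listed under Details are the intended way to summarise the output; the manual traversal above only shows the nesting order.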

Examples

library(MDFS)
mdfs_omp_set_num_threads(1)  # only to pass CRAN checks
data(madelon)
# two folds and a single fixed seed, passed by name for clarity
run_rafs(madelon$data, madelon$decision, k = 2, seeds = c(12345))

[Package RAFS version 0.2.4 Index]