wff {fuzzyforest}

R Documentation

WGCNA based fuzzy forest algorithm

Description

Fits fuzzy forests, using WGCNA to cluster features into distinct modules. Requires the WGCNA package to be installed. A formula interface for WGCNA-based fuzzy forests is also available: see wff.formula.

Usage

## Default S3 method:
wff(X, y, Z = NULL,
  WGCNA_params = WGCNA_control(power = 6),
  screen_params = screen_control(min_ntree = 500),
  select_params = select_control(min_ntree = 500), final_ntree = 5000,
  num_processors = 1, nodesize, test_features = NULL, test_y = NULL,
  ...)

wff(X, ...)

Arguments

`X`	A data.frame. Each column corresponds to a feature vector. WGCNA will be used to cluster the features in X, so all features in X should be numeric. Non-numeric features may be supplied via Z.
`y`	Response vector. For classification, y should be a factor. For regression, y should be numeric.
`Z`	Additional features that are not to be screened out at the screening step. WGCNA is not carried out on features in Z.
`WGCNA_params`	Parameters for WGCNA. See the `blockwiseModules` function from WGCNA and `WGCNA_control` for details. `WGCNA_params` is an object of type `WGCNA_control`.
`screen_params`	Parameters for screening step of fuzzy forests. See `screen_control` for details. `screen_params` is an object of type `screen_control`.
`select_params`	Parameters for selection step of fuzzy forests. See `select_control` for details. `select_params` is an object of type `select_control`.
`final_ntree`	Number of trees grown in the final random forest. This random forest contains all selected features.
`num_processors`	Number of processors used to fit random forests.
`nodesize`	Minimum size of terminal nodes. Defaults to 1 for classification and 5 for regression. If the sample size is very large, the trees will be grown extremely deep, which may cause memory problems and substantially increase the run time of the algorithm. In this case, it may be useful to increase `nodesize`.
`test_features`	A data.frame containing features from a test set. The data.frame should contain the features in both X and Z.
`test_y`	The responses for the test set.
`...`	Additional arguments currently not used.
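
Because WGCNA clusters only the numeric features in `X`, a data.frame with mixed column types has to be split before calling `wff`. The following is a minimal sketch of one way to do that in base R; the data frame `dat` and its column names are hypothetical and serve only as an illustration.

```r
# Hypothetical data: two numeric features and one categorical feature.
dat <- data.frame(
  x1 = c(0.5, 1.2, 3.3, 2.1),
  x2 = c(10, 20, 15, 12),
  site = factor(c("a", "b", "a", "b"))
)

# Numeric columns go to X, where WGCNA clusters them into modules.
is_num <- vapply(dat, is.numeric, logical(1))
X <- dat[, is_num, drop = FALSE]

# All remaining columns go to Z, which is not clustered by WGCNA
# and is not screened out at the screening step.
Z <- dat[, !is_num, drop = FALSE]
```

The two pieces could then be supplied as `wff(X, y, Z = Z, ...)`.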

Value

An object of type fuzzy_forest. This object is a list containing the main output of the fuzzy forests algorithm. In particular, it contains a data.frame listing the selected features, as well as the random forest fit using those selected features.
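
As a rough illustration of working with the returned list, the sketch below pulls the top rows of the selected-feature table. The element name `feature_list` is the one used in the Examples section; the helper `top_features`, the stand-in object `fit`, and its column names and values are made up so that the snippet runs without fitting a model.

```r
# Hypothetical helper: return the first n rows of the selected-feature table.
top_features <- function(fit, n = 5) {
  stopifnot(is.data.frame(fit$feature_list))
  head(fit$feature_list, n)
}

# Stand-in for a fitted fuzzy_forest object (column names and values
# are invented for illustration; a real object comes from wff()).
fit <- list(
  feature_list = data.frame(
    feature_name = c("f1", "f2", "f3"),
    variable_importance = c(0.30, 0.20, 0.10)
  )
)

top2 <- top_features(fit, n = 2)
```

With a real fit, the same helper would be called as `top_features(wff_fit)`.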

Note

This work was partially funded by NSF IIS 1251151 and AMFAR 8721SC.

References

Conn, D., Ngun, T., Ramirez, C.M., Li, G. (2019). "Fuzzy Forests: Extending Random Forest Feature Selection for Correlated, High-Dimensional Data." Journal of Statistical Software, 91(9). doi: 10.18637/jss.v091.i09

Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32. doi: 10.1023/A:1010933404324

Zhang, B. and Horvath, S. (2005). "A General Framework for Weighted Gene Co-Expression Network Analysis." Statistical Applications in Genetics and Molecular Biology, 4(1). doi: 10.2202/1544-6115.1128

Examples

library(fuzzyforest)
data(ctg)
y <- ctg$NSP
X <- ctg[, 2:22]
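# WGCNA settings: soft-thresholding power, minimum module size, thread count
WGCNA_params <- WGCNA_control(power = 6, minModuleSize = 1, nThreads = 1)
# Tuning values shared by the screening and selection steps
mtry_factor <- 1
min_ntree <- 500
drop_fraction <- .5
ntree_factor <- 1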
screen_params <- screen_control(drop_fraction = drop_fraction,
                                keep_fraction = .25, min_ntree = min_ntree,
                                ntree_factor = ntree_factor,
                                mtry_factor = mtry_factor)
select_params <- select_control(drop_fraction = drop_fraction,
                                number_selected = 5,
                                min_ntree = min_ntree,
                                ntree_factor = ntree_factor,
                                mtry_factor = mtry_factor)

library(WGCNA)
wff_fit <- wff(X, y, WGCNA_params = WGCNA_params,
                screen_params = screen_params,
                select_params = select_params,
                final_ntree = 500)

# Extract variable importance rankings
vims <- wff_fit$feature_list

# Plot results
modplot(wff_fit)
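
The `test_features` and `test_y` arguments expect a held-out set with the same feature columns as the training data. The sketch below shows one simple way to prepare such a split in base R. The simulated data, the object names, and the 80/20 ratio are assumptions chosen for illustration, and the `wff` call is left as a comment rather than run.

```r
set.seed(42)

# Hypothetical data: 50 observations, 4 numeric features, binary response.
n <- 50
X_all <- as.data.frame(matrix(rnorm(n * 4), nrow = n))
names(X_all) <- paste0("f", 1:4)
y_all <- factor(sample(c("neg", "pos"), n, replace = TRUE))

# Hold out 20% of the rows as a test set.
test_idx <- sample(seq_len(n), size = round(0.2 * n))
X_train <- X_all[-test_idx, , drop = FALSE]
y_train <- y_all[-test_idx]
X_test  <- X_all[test_idx, , drop = FALSE]
y_test  <- y_all[test_idx]

# The pieces would then be passed along these lines (not run here):
# wff_fit <- wff(X_train, y_train,
#                test_features = X_test, test_y = y_test)
```

Keeping `drop = FALSE` ensures the subsets stay data.frames with the original column names, which matters because the test set has to contain the same features as the training set.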

[Package fuzzyforest version 1.0.8 Index]