cuda_ml_rand_forest {cuda.ml} | R Documentation |

## Train a random forest model.

### Description

Train a random forest model for classification or regression tasks.
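As a rough, CPU-only illustration of what a random forest randomizes, the base-R sketch below draws one tree's bootstrap sample of rows (the behaviour controlled by `bootstrap`) and one split's random subset of predictors (controlled by `mtry`). It does not call cuda.ml and is not how cuML trains trees on the GPU. Rounding the square-root default down with `floor()` is an assumption made only for this illustration.

```r
# CPU-only sketch of the two sources of randomness in a random forest.
# Illustrative only: this is not how cuda.ml / cuML trains trees.
set.seed(42)

n_rows <- nrow(iris)
predictors <- setdiff(names(iris), "Species")

# bootstrap = TRUE: each tree sees n_rows rows drawn with replacement,
# so some rows repeat and others are left out ("out-of-bag").
boot_rows <- sample(n_rows, size = n_rows, replace = TRUE)
oob_rows <- setdiff(seq_len(n_rows), boot_rows)

# mtry: at each split only a random subset of predictors is considered.
# floor(sqrt(p)) is an assumed rounding of the documented sqrt default.
mtry <- max(1L, floor(sqrt(length(predictors))))
split_candidates <- sample(predictors, size = mtry)

cat("distinct rows in bootstrap sample:", length(unique(boot_rows)), "\n")
cat("out-of-bag rows:", length(oob_rows), "\n")
cat("predictors considered at this split:", split_candidates, "\n")
```

With `replace = TRUE`, on average only about 63% (roughly 1 - 1/e) of the rows appear in a given bootstrap sample; the rest are out-of-bag for that tree.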

### Usage

```
cuda_ml_rand_forest(x, ...)

## Default S3 method:
cuda_ml_rand_forest(x, ...)

## S3 method for class 'data.frame'
cuda_ml_rand_forest(
  x,
  y,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

## S3 method for class 'matrix'
cuda_ml_rand_forest(
  x,
  y,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

## S3 method for class 'formula'
cuda_ml_rand_forest(
  formula,
  data,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

## S3 method for class 'recipe'
cuda_ml_rand_forest(
  x,
  data,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)
```
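The formula, data frame, and matrix methods accept the same training data in different shapes. The base-R sketch below splits `Species ~ .` with `data = iris` into the predictor data frame `x`, a numeric matrix, and the factor outcome `y` that the non-formula methods take directly. That cuda.ml's formula method preprocesses its input exactly this way is an assumption; the `cuda_ml_rand_forest()` calls are left commented out because they need a supported CUDA GPU.

```r
# Base-R sketch: turn a formula + data frame into x / y pieces.
# Assumption: cuda.ml's formula method yields equivalent inputs.
mf <- model.frame(Species ~ ., data = iris)

y <- model.response(mf)                                    # factor outcome
x_df <- mf[, setdiff(names(mf), "Species"), drop = FALSE]  # predictor data frame
x_mat <- as.matrix(x_df)                                   # numeric predictor matrix

str(y)
dim(x_mat)

# On a machine with a supported GPU, these calls should describe the same
# classification task (not run here):
# cuda_ml_rand_forest(formula = Species ~ ., data = iris, trees = 100L)
# cuda_ml_rand_forest(x_df, y, trees = 100L)
# cuda_ml_rand_forest(x_mat, y, trees = 100L)
```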

### Arguments

`x` |
Depending on the context: a __data frame__ of predictors; a __matrix__ of predictors; a __recipe__ specifying a set of preprocessing steps, created from `recipes::recipe()`; or a __formula__ specifying the predictors and the outcome. |

`...` |
Optional arguments; currently unused. |

`y` |
A numeric vector (for regression) or factor (for classification) of desired responses. |

`mtry` |
The number of predictors that will be randomly sampled at each split when creating the tree models. Default: the square root of the total number of predictors. |

`trees` |
An integer for the number of trees contained in the ensemble. Default: 100L. |

`min_n` |
An integer for the minimum number of data points in a node that are required for the node to be split further. Default: 2L. |

`bootstrap` |
Whether to perform bootstrap sampling. If TRUE, each tree in the forest is built on a bootstrap sample drawn with replacement. If FALSE, the whole dataset is used to build each tree. Default: TRUE. |

`max_depth` |
Maximum tree depth. Default: 16L. |

`max_leaves` |
Maximum number of leaf nodes per tree. This is a soft constraint. Default: Inf (unlimited). |

`max_predictors_per_note_split` |
Number of predictors to consider per node split. Default: the square root of the total number of predictors. |

`n_bins` |
Number of bins used by the split algorithm. Default: 128L. |

`min_samples_leaf` |
The minimum number of data points in each leaf node. Default: 1L. |

`split_criterion` |
The criterion used to split nodes: "gini" or "entropy" for classification, and "mse" or "mae" for regression. Default: "gini" for classification; "mse" for regression. |

`min_impurity_decrease` |
Minimum decrease in impurity required for a node to be split. Default: 0. |

`max_batch_size` |
Maximum number of nodes that can be processed in a given batch. Default: 128L. |

`n_streams` |
Number of CUDA streams to use for building trees. Default: 8L. |

`cuML_log_level` |
Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", or "trace". Default: "off". |

`formula` |
A formula specifying the outcome terms on the left-hand side, and the predictor terms on the right-hand side. |

`data` |
When a __recipe__ or __formula__ is used, `data` is a __data frame__ containing both the predictors and the outcome. |
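To make the `split_criterion` and `min_impurity_decrease` arguments concrete, the base-R sketch below computes Gini impurity, entropy, and mean squared error, plus the textbook weighted impurity decrease of a candidate split. The helper names (`gini()`, `entropy()`, `mse()`, `impurity_decrease()`) are hypothetical, "mae" is left out, and cuML may scale or compare the decrease differently; this is not the library's implementation.

```r
# Hypothetical CPU helpers illustrating split criteria; not cuML's code.

# Gini impurity: 1 - sum of squared class proportions.
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Entropy in bits; empty classes are dropped (0 * log 0 is taken as 0).
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Mean squared deviation from the node mean (regression).
mse <- function(y) mean((y - mean(y))^2)

# Parent impurity minus the size-weighted impurity of the two children.
impurity_decrease <- function(y, go_left, impurity) {
  n <- length(y)
  n_left <- sum(go_left)
  n_right <- n - n_left
  impurity(y) -
    (n_left / n) * impurity(y[go_left]) -
    (n_right / n) * impurity(y[!go_left])
}

# Classification: this split isolates all 50 setosa flowers.
petal_split <- iris$Petal.Length < 2.5
impurity_decrease(iris$Species, petal_split, gini)     # 1/3
impurity_decrease(iris$Species, petal_split, entropy)  # log2(3) - 2/3, about 0.918

# Regression: split mtcars on car weight.
impurity_decrease(mtcars$mpg, mtcars$wt < 3, mse)
```

Here the Gini impurity falls from 2/3 at the parent to a weighted 1/3 across the children. Conceptually, a candidate split whose decrease falls below `min_impurity_decrease` is not made.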

### Value

A random forest classifier / regressor object that can be used with the 'predict' S3 generic to make predictions on new data points.
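As a generic illustration of how a forest turns per-tree outputs into one prediction, the base-R sketch below applies a majority vote for classification and a mean for regression to hypothetical per-tree predictions (made-up values, not cuda.ml output). These are the textbook aggregation rules; cuML's exact rule may differ, for example by averaging per-tree class probabilities instead of counting hard votes.

```r
# Hypothetical per-tree class predictions: 3 new observations (rows)
# from a 5-tree forest (columns). Made-up values for illustration.
class_votes <- matrix(
  c("setosa",     "setosa",     "versicolor", "setosa",    "setosa",
    "virginica",  "versicolor", "virginica",  "virginica", "versicolor",
    "versicolor", "versicolor", "versicolor", "virginica", "versicolor"),
  nrow = 3, byrow = TRUE
)

# Classification: the most frequent class across trees, row by row.
majority_vote <- function(votes) {
  apply(votes, 1, function(v) names(which.max(table(v))))
}

# Hypothetical per-tree regression predictions: 2 observations, 5 trees.
reg_preds <- matrix(
  c(21.0, 22.4, 20.8, 21.9, 22.1,
    15.2, 14.8, 16.0, 15.5, 15.0),
  nrow = 2, byrow = TRUE
)

majority_vote(class_votes)  # "setosa" "virginica" "versicolor"
rowMeans(reg_preds)         # 21.64 and 15.3
```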

### Examples

```
library(cuda.ml)

# Classification: predict iris species from the four flower measurements
model <- cuda_ml_rand_forest(
  formula = Species ~ .,
  data = iris,
  trees = 100
)
predictions <- predict(model, iris[names(iris) != "Species"])

# Regression: predict fuel efficiency (mpg) from the other mtcars columns
model <- cuda_ml_rand_forest(
  formula = mpg ~ .,
  data = mtcars,
  trees = 100
)
predictions <- predict(model, mtcars[names(mtcars) != "mpg"])
```

[Package *cuda.ml* version 0.3.2 Index]