preprocess_data {mikropml}    R Documentation
Preprocess data prior to running machine learning
Description
Function to preprocess your data for input into run_ml().
Usage
preprocess_data(
dataset,
outcome_colname,
method = c("center", "scale"),
remove_var = "nzv",
collapse_corr_feats = TRUE,
to_numeric = TRUE,
group_neg_corr = TRUE,
prefilter_threshold = 1
)
Arguments
dataset: Data frame with an outcome variable and other columns as features.
outcome_colname: Column name as a string of the outcome variable (default
method: Methods to preprocess the data, described in
remove_var: Whether to remove variables with near-zero variance (default: "nzv").
collapse_corr_feats: Whether to keep only one of a set of perfectly correlated features.
to_numeric: Whether to change features to numeric where possible.
group_neg_corr: Whether to group negatively correlated features together (e.g. c(0,1) and c(1,0)).
prefilter_threshold: Remove features that have non-zero, non-NA values in N or fewer rows (default: 1). Set this to -1 to keep all columns at this step. This step will also be skipped if
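The prefilter_threshold rule can be illustrated with a minimal base-R sketch. The helper prefilter_features() is hypothetical and not part of mikropml; it assumes every feature is already numeric, and mikropml's own prefiltering may differ in its details. A column is kept only if it has more than threshold rows whose value is neither zero nor NA, so a threshold of -1 keeps every column.

```r
# Hypothetical helper, not part of mikropml: illustrates the rule described
# for `prefilter_threshold` on a data frame of numeric features.
prefilter_features <- function(features, threshold = 1) {
  # Count, per column, the rows whose value is neither zero nor NA
  n_informative <- vapply(
    features,
    function(col) sum(!is.na(col) & col != 0),
    integer(1)
  )
  # Keep only columns with more than `threshold` such rows;
  # threshold = -1 keeps every column because all counts are >= 0
  features[, n_informative > threshold, drop = FALSE]
}

feats <- data.frame(
  a = c(0, 0, 5, 0),   # one non-zero value: removed at threshold 1
  b = c(1, 2, 0, 3),   # three non-zero values: kept
  c = c(NA, 0, 0, 0)   # no non-zero, non-NA values: removed
)
names(prefilter_features(feats))      # "b"
names(prefilter_features(feats, -1))  # "a" "b" "c"
```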
Value
Named list including:
- dat_transformed: Preprocessed data.
- grp_feats: If features were grouped together, a named list of the features corresponding to each group.
- removed_feats: Any features that were removed during preprocessing (e.g. because there was zero variance or near-zero variance for those features).
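The kind of grouping reported in grp_feats (controlled by collapse_corr_feats and group_neg_corr) can be illustrated with a minimal base-R sketch. The function group_perfect_corr() and the grp1, grp2 naming are hypothetical choices for illustration; mikropml's actual implementation and group names may differ. The sketch assumes numeric features and treats correlations within 1e-8 of 1 (or of -1, when negative correlations are grouped) as perfect. Because perfect correlation is transitive, comparing the remaining features with the first unassigned one is enough to form each group.

```r
# Hypothetical helper, not mikropml's implementation: collect features whose
# pairwise correlation is perfect (1, or -1 when group_neg_corr is TRUE).
group_perfect_corr <- function(features, group_neg_corr = TRUE) {
  # Zero-variance columns give NA correlations (with a warning); treat as ungrouped
  r <- suppressWarnings(cor(features))
  if (group_neg_corr) r <- abs(r)
  perfect <- !is.na(r) & abs(r - 1) < 1e-8

  groups <- list()
  unassigned <- colnames(features)
  while (length(unassigned) > 0) {
    first <- unassigned[1]
    # A feature always belongs to its own group, even if its self-correlation is NA
    members <- unique(c(first, unassigned[perfect[first, unassigned]]))
    groups[[length(groups) + 1]] <- members
    unassigned <- setdiff(unassigned, members)
  }

  # Only groups with more than one feature need to be reported
  groups <- groups[lengths(groups) > 1]
  if (length(groups) > 0) names(groups) <- paste0("grp", seq_along(groups))
  groups
}

feats <- data.frame(
  a = c(0, 1, 0, 1),
  b = c(1, 0, 1, 0),  # perfectly negatively correlated with a
  c = c(0, 2, 0, 2),  # perfectly positively correlated with a
  d = c(3, 1, 4, 1)   # not perfectly correlated with any other column
)
group_perfect_corr(feats)                          # grp1: "a" "b" "c"
group_perfect_corr(feats, group_neg_corr = FALSE)  # grp1: "a" "c"
```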
If the progressr package is installed, a progress bar with time elapsed and estimated time to completion can be displayed.
More details
See the preprocessing vignette for more details.
Note that if any values in outcome_colname contain spaces, they will be converted to underscores for compatibility with caret.
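The space-to-underscore conversion can be pictured with a short base-R sketch. The data frame, its outcome column and the class labels are made up for illustration, and this is not mikropml's own code. The last line checks that the converted labels are syntactically valid R names; to our understanding, caret expects that of class levels when it computes class probabilities.

```r
# Hypothetical outcome column whose values contain spaces
dat <- data.frame(
  outcome = c("class A", "class B", "class A"),
  feat1 = c(0.2, 1.5, 0.7)
)
# Replace every space with an underscore
dat$outcome <- gsub(" ", "_", dat$outcome, fixed = TRUE)
dat$outcome
# "class_A" "class_B" "class_A"

# make.names() leaves syntactically valid names unchanged
all(make.names(dat$outcome) == dat$outcome)
# TRUE
```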
Author(s)
Zena Lapp, zenalapp@umich.edu
Kelly Sovacool, sovacool@umich.edu
Examples
preprocess_data(mikropml::otu_small, "dx")
# the function can show a progress bar if you have the progressr package installed
## optionally, specify the progress bar format
progressr::handlers(progressr::handler_progress(
format = ":message :bar :percent | elapsed: :elapsed | eta: :eta",
clear = FALSE,
show_after = 0
))
## tell progressr to always report progress
## Not run:
progressr::handlers(global = TRUE)
## run the function and watch the live progress updates
dat_preproc <- preprocess_data(mikropml::otu_small, "dx")
## End(Not run)