jous {JOUSBoost}    R Documentation
Jittering with Over/Under Sampling
Description
Perform probability estimation using jittering with over- or under-sampling.
Usage
jous(X, y, class_func, pred_func, type = c("under", "over"), delta = 10,
nu = 1, X_pred = NULL, keep_models = FALSE, verbose = FALSE,
parallel = FALSE, packages = NULL)
Arguments
X
A matrix of continuous predictors.

y
A vector of responses with entries in {-1, 1}.

class_func
Function to perform classification. This function definition must be
exactly of the form class_func(X, y), where X is a matrix of predictors
and y is a vector of responses.

pred_func
Function to create predictions. This function definition must be
exactly of the form pred_func(fit_obj, X), where fit_obj is a fitted
classifier returned by class_func and X is a matrix of predictors.

type
Type of sampling: "over" for oversampling, or "under" for undersampling.

delta
An integer (greater than 3) controlling the number of quantiles to
estimate: the target quantiles are 1/delta, 2/delta, ..., (delta - 1)/delta.

nu
The amount of noise to apply to predictors when oversampling data.
The noise level is controlled by nu * sd(X[,j]) for each predictor;
the default of nu = 1 generally works well.

X_pred
A matrix of predictors for which to form probability estimates.

keep_models
Whether to store all of the models used to create the probability
estimates. This must be TRUE in order to call predict on new data.

verbose
If TRUE, print the function's progress to the terminal.

parallel
If TRUE, fit the models in parallel using a registered parallel
backend (see the Examples).

packages
If parallel = TRUE, a character vector naming the packages that must be
loaded on each worker node in order to fit the models.
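As a quick illustration of the role of delta, the following sketch shows the quantile grid that delta = 10 implies. The exact grid used internally by jous is an assumption here, based on the description above:

```r
# Sketch (assumed form): the grid of target quantiles implied by delta.
delta <- 10
q <- seq_len(delta - 1) / delta
print(q)  # 0.1, 0.2, ..., 0.9
```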
Value
Returns a list containing information about the
parameters used in the jous
function call, as well as the following
additional components:
q
The vector of target quantiles estimated by jous.

phat_train
The in-sample probability estimates.

phat_test
Probability estimates for the optional test data in X_pred.

models
If keep_models = TRUE, a list of the fitted models.

confusion_matrix
A confusion matrix for the in-sample fits.
Note
The jous function runs the classifier class_func a total of delta times
on the data, which can be computationally expensive. Also, jous cannot
yet be applied to categorical predictors - in the oversampling case, it
is not clear how to "jitter" a categorical variable.
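To make the mechanics concrete, here is a conceptual sketch (not the package's implementation) of how the {-1, 1} votes of quantile-shifted classifiers for a single observation could be converted into a probability estimate. It assumes classifier j has been tilted by sampling so that it predicts +1 exactly when p(x) exceeds the j-th quantile:

```r
# Conceptual sketch only: turn the votes of quantile-shifted classifiers
# into a probability estimate. Assumes preds[j] == 1 means the j-th
# classifier believes p(x) exceeds the quantile q[j] (q ascending).
votes_to_phat <- function(preds, q) {
  k <- sum(preds == 1)  # number of quantiles that p(x) appears to exceed
  if (k == 0) return(q[1] / 2)                        # below the smallest quantile
  if (k == length(q)) return((q[length(q)] + 1) / 2)  # above the largest
  (q[k] + q[k + 1]) / 2                               # midpoint of the bracketing interval
}

q <- seq_len(9) / 10  # the grid for delta = 10
preds <- c(1, 1, 1, -1, -1, -1, -1, -1, -1)
votes_to_phat(preds, q)  # 0.35: p(x) is bracketed by 0.3 and 0.4
```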
References
Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. Journal of Machine Learning Research, 8, 409-439.
Examples
## Not run:
# Generate data from Friedman model #
set.seed(111)
dat = friedman_data(n = 500, gamma = 0.5)
train_index = sample(1:500, 400)
# Apply jous to adaboost classifier
class_func = function(X, y) adaboost(X, y, tree_depth = 2, n_rounds = 200)
pred_func = function(fit_obj, X_test) predict(fit_obj, X_test)
jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
pred_func, keep_models = TRUE)
# get probability
phat_jous = predict(jous_fit, dat$X[-train_index, ], type = "prob")
# compare with probability from AdaBoost
ada = adaboost(dat$X[train_index,], dat$y[train_index], tree_depth = 2,
n_rounds = 200)
phat_ada = predict(ada, dat$X[-train_index,], type = "prob")
mean((phat_jous - dat$p[-train_index])^2)
mean((phat_ada - dat$p[-train_index])^2)
## Example using parallel option
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
# n.b. packages = 'rpart' is not strictly needed here, since JOUSBoost
# exports it automatically; it is included for illustration
jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
pred_func, keep_models = TRUE, parallel = TRUE,
packages = 'rpart')
phat = predict(jous_fit, dat$X[-train_index,], type = 'prob')
stopCluster(cl)
## Example using SVM
library(kernlab)
class_func = function(X, y) ksvm(X, as.factor(y), kernel = 'rbfdot')
pred_func = function(obj, X) as.numeric(as.character(predict(obj, X)))
jous_obj = jous(dat$X[train_index,], dat$y[train_index], class_func = class_func,
pred_func = pred_func, keep_models = TRUE)
jous_pred = predict(jous_obj, dat$X[-train_index,], type = 'prob')
## End(Not run)