jous {JOUSBoost}    R Documentation

Jittering with Over/Under Sampling


Perform probability estimation using jittering with over- or undersampling.


jous(X, y, class_func, pred_func, type = c("under", "over"), delta = 10,
  nu = 1, X_pred = NULL, keep_models = FALSE, verbose = FALSE,
  parallel = FALSE, packages = NULL)



X: A matrix of continuous predictors.


y: A vector of responses with entries in c(-1, 1).


class_func: Function to perform classification. It must have exactly the form class_func(X, y), where X is a matrix and y is a vector with entries in c(-1, 1), and it must return an object from which pred_func can create predictions. See examples.


pred_func: Function to create predictions. It must have exactly the form pred_func(fit_obj, X), where fit_obj is an object returned by class_func and X is a matrix of new data values, and it must return a vector with entries in c(-1, 1). See examples.


type: Type of sampling, either "over" for oversampling or "under" for undersampling.


delta: An integer (greater than 3) that controls the number of quantiles to estimate.


nu: The amount of noise to apply to predictors when oversampling data. The noise level for predictor j is controlled by nu * sd(X[,j]); the default of nu = 1 works well. Such "jittering" of the predictors is essential when applying jous to boosting-type methods.


X_pred: A matrix of predictors for which to form probability estimates.


keep_models: Whether to store all of the models used to create the probability estimates. If keep_models = FALSE, the user will need to re-run jous when creating probability estimates for test data.


verbose: If TRUE, print the function's progress to the terminal.


parallel: If TRUE, use parallel foreach to fit models. A parallel backend, such as doParallel, must be registered beforehand. See examples below.


packages: If parallel = TRUE, a vector of strings containing the names of any packages used in class_func or pred_func. See examples below.
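
The contract for class_func and pred_func described above can be sketched with a base R classifier. The following is a minimal illustration and not part of JOUSBoost: it wraps logistic regression (stats::glm), recoding the c(-1, 1) responses to 0/1 for fitting and mapping fitted probabilities back to c(-1, 1) for prediction. The names glm_class_func and glm_pred_func, the simulated data, and the 0.5 threshold are assumptions made for this example.

```r
# Hypothetical wrappers (not part of JOUSBoost) that follow the
# class_func(X, y) / pred_func(fit_obj, X) contract using base R only.
glm_class_func <- function(X, y) {
  # glm() with a binomial family expects a 0/1 response, so recode -1/1
  df <- data.frame(y01 = as.integer(y == 1), X)
  glm(y01 ~ ., data = df, family = binomial())
}

glm_pred_func <- function(fit_obj, X) {
  # Fitted probability of y = 1, thresholded at 0.5 (an assumption)
  p <- predict(fit_obj, newdata = data.frame(X), type = "response")
  unname(ifelse(p > 0.5, 1, -1))
}

# Simulated data for illustration only
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)
y <- ifelse(X[, 1] + rnorm(200) > 0, 1, -1)

fit <- glm_class_func(X, y)
yhat <- glm_pred_func(fit, X)
table(y, yhat)
```

Under these assumptions, the pair could be supplied as jous(X, y, glm_class_func, glm_pred_func); any classifier can be wrapped the same way as long as the predictions come back as a vector in c(-1, 1).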


Returns a list containing information about the parameters used in the jous function call, as well as the following additional components:


The vector of target quantiles estimated by jous. Note that the estimated probabilities will be located at the midpoints of the values in q.


The in-sample probability estimates p(y = 1 | x).


Probability estimates for the optional test data in X_pred.


If keep_models=TRUE, a list of models fitted to the resampled data sets.


A confusion matrix for the in-sample fits.
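
To illustrate the remark that the estimated probabilities sit at the midpoints of the values in q, the sketch below builds an evenly spaced quantile grid for a given delta and computes the midpoints between consecutive grid values. The formula q = (1:(delta - 1)) / delta is an assumption made for illustration; the grid actually used is the one returned by jous, and how the two tails (below the smallest and above the largest quantile) are handled is not covered here.

```r
# Assumed evenly spaced grid of target quantiles (illustrative only)
delta <- 10
q <- (1:(delta - 1)) / delta

# Midpoints between consecutive values in q
midpoints <- (head(q, -1) + tail(q, -1)) / 2

print(q)
print(midpoints)
```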


The jous function runs the classifier class_func a total of delta times on the data, which can be computationally expensive. Also, jous cannot yet be applied to categorical predictors: in the oversampling case, it is not clear how to "jitter" a categorical variable.
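
As a rough picture of what jittering continuous predictors involves, the sketch below perturbs each column of a numeric matrix with noise scaled by nu * sd(X[,j]). It is not the package's internal code: drawing the noise from a uniform distribution on (-nu * sd, nu * sd) and jittering every row are assumptions made for this illustration, and the variable names are hypothetical.

```r
# Illustrative jitter of continuous predictors (not JOUSBoost internals)
set.seed(1)
X <- cbind(x1 = rnorm(200, sd = 1), x2 = rnorm(200, sd = 10))
nu <- 1

# Per-column standard deviations set the noise scale
col_sd <- apply(X, 2, sd)

# Assumption: noise for column j is Uniform(-nu * sd_j, nu * sd_j)
noise <- sapply(col_sd, function(s) runif(nrow(X), -nu * s, nu * s))
X_jittered <- X + noise

# The noise range grows with each predictor's spread
apply(noise, 2, range)
```

Scaling by the column standard deviation keeps the perturbation comparable across predictors measured on different scales, which has no obvious analogue for a categorical variable.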


Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. Journal of Machine Learning Research, 8, 409-439.


## Not run: 
# Generate data from the Friedman model
library(JOUSBoost)
dat = friedman_data(n = 500, gamma = 0.5)
train_index = sample(1:500, 400)

# Apply jous to adaboost classifier
class_func = function(X, y) adaboost(X, y, tree_depth = 2, n_rounds = 200)
pred_func = function(fit_obj, X_test) predict(fit_obj, X_test)

jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
                pred_func, keep_models = TRUE)
# Get probability estimates for the held-out rows
phat_jous = predict(jous_fit, dat$X[-train_index, ], type = "prob")

# Compare with probability estimates from AdaBoost on the same held-out rows
ada = adaboost(dat$X[train_index,], dat$y[train_index], tree_depth = 2,
               n_rounds = 200)
phat_ada = predict(ada, dat$X[-train_index,], type = "prob")

# Mean squared error of each estimate against dat$p
mean((phat_jous - dat$p[-train_index])^2)
mean((phat_ada - dat$p[-train_index])^2)

## Example using parallel option
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)  # register the backend before calling jous

# n.b. packages = 'rpart' is not strictly needed here because JOUSBoost
# exports it automatically; it is included for illustration
jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
                pred_func, keep_models = TRUE, parallel = TRUE,
                packages = 'rpart')
phat = predict(jous_fit, dat$X[-train_index,], type = 'prob')
stopCluster(cl)

## Example using SVM
library(kernlab)  # provides ksvm

class_func = function(X, y) ksvm(X, as.factor(y), kernel = 'rbfdot')
pred_func = function(obj, X) as.numeric(as.character(predict(obj, X)))
jous_obj = jous(dat$X[train_index,], dat$y[train_index], class_func = class_func,
           pred_func = pred_func, keep_models = TRUE)
jous_pred = predict(jous_obj, dat$X[-train_index,], type = 'prob')

## End(Not run)

[Package JOUSBoost version 2.1.0 Index]