R: Compute Semi-Supervised Learning Curve

LearningCurveSSL {RSSL}

R Documentation

Compute Semi-Supervised Learning Curve

Description

Evaluate semi-supervised classifiers for different amounts of unlabeled training examples or different fractions of unlabeled vs. labeled examples.

Usage

LearningCurveSSL(X, y, ...)

## S3 method for class 'matrix'
LearningCurveSSL(X, y, classifiers, measures = list(Accuracy
  = measure_accuracy), type = "unlabeled", n_l = NULL,
  with_replacement = FALSE, sizes = 2^(1:8), n_test = 1000,
  repeats = 100, verbose = FALSE, n_min = 1, dataset_name = NULL,
  test_fraction = NULL, fracs = seq(0.1, 0.9, 0.1), time = TRUE,
  pre_scale = FALSE, pre_pca = FALSE, low_level_cores = 1, ...)

Arguments

`X`	design matrix
`y`	vector of labels
`...`	arguments passed to underlying function
`classifiers`	list; Classifiers to crossvalidate
`measures`	named list of functions giving the measures to be used
`type`	Type of learning curve, either "unlabeled" or "fraction"
`n_l`	Number of labeled objects to be used in the experiments (see details)
`with_replacement`	Indicated whether the subsampling is done with replacement or not (default: FALSE)
`sizes`	vector with number of unlabeled objects for which to evaluate performance
`n_test`	Number of test points if with_replacement is TRUE
`repeats`	Number of learning curves to draw
`verbose`	Print progressbar during execution (default: FALSE)
`n_min`	Minimum number of labeled objects per class in
`dataset_name`	character; Name of the dataset
`test_fraction`	numeric; If not NULL a fraction of the object will be left out to serve as the test set
`fracs`	list; fractions of labeled data to use
`time`	logical; Whether execution time should be saved.
`pre_scale`	logical; Whether the features should be scaled before the dataset is used
`pre_pca`	logical; Whether the features should be preprocessed using a PCA step
`low_level_cores`	integer; Number of cores to use compute repeats of the learning curve

Details

classifiers is a named list of classifiers, where each classifier should be a function that accepts 4 arguments: a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects and a factor of labels for the unlabeled objects.

measures is a named list of performance measures. These are functions that accept seven arguments: a trained classifier, a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects and a factor of labels for the unlabeled objects, a numeric design matrix of the test objects and a factor of labels of the test objects. See measure_accuracy for an example.

This function allows for two different types of learning curves to be generated. If type="unlabeled", the number of labeled objects remains fixed at the value of n_l, where sizes controls the number of unlabeled objects. n_test controls the number of objects used for the test set, while all remaining objects are used if with_replacement=FALSE in which case objects are drawn without replacement from the input dataset. We make sure each class is represented by at least n_min labeled objects of each class. For n_l, additional options include: "enough" which takes the max of the number of features and 20, max(ncol(X)+5,20), "d" which takes the number of features or "2d" which takes 2 times the number of features.

If type="fraction" the total number of objects remains fixed, while the fraction of labeled objects is changed. frac sets the fractions of labeled objects that should be considered, while test_fraction determines the fraction of the total number of objects left out to serve as the test set.

Value

LearningCurve object

Examples

set.seed(1)
df <- generate2ClassGaussian(2000,d=2,var=0.6)

classifiers <- list("LS"=function(X,y,X_u,y_u) {
 LeastSquaresClassifier(X,y,lambda=0)}, 
  "Self"=function(X,y,X_u,y_u) {
    SelfLearning(X,y,X_u,LeastSquaresClassifier)}
)

measures <- list("Accuracy" =  measure_accuracy,
                 "Loss Test" = measure_losstest,
                 "Loss labeled" = measure_losslab,
                 "Loss Lab+Unlab" = measure_losstrain
)

# These take a couple of seconds to run
## Not run: 
# Increase the number of unlabeled objects
lc1 <- LearningCurveSSL(as.matrix(df[,1:2]),df$Class,
                        classifiers=classifiers,
                        measures=measures, n_test=1800,
                        n_l=10,repeats=3)

plot(lc1)

# Increase the fraction of labeled objects, example with 2 datasets
lc2 <- LearningCurveSSL(X=list("Dataset 1"=as.matrix(df[,1:2]),
                               "Dataset 2"=as.matrix(df[,1:2])),
                        y=list("Dataset 1"=df$Class,
                               "Dataset 2"=df$Class),
                        classifiers=classifiers,
                        measures=measures,
                        type = "fraction",repeats=3,
                        test_fraction=0.9)

plot(lc2)

## End(Not run)

[Package RSSL version 0.9.7 Index]