LearningCurveSSL {RSSL} | R Documentation |
Compute Semi-Supervised Learning Curve
Description
Evaluate semi-supervised classifiers for different amounts of unlabeled training examples or different fractions of unlabeled vs. labeled examples.
Usage
LearningCurveSSL(X, y, ...)
## S3 method for class 'matrix'
LearningCurveSSL(X, y, classifiers,
  measures = list(Accuracy = measure_accuracy), type = "unlabeled",
  n_l = NULL, with_replacement = FALSE, sizes = 2^(1:8), n_test = 1000,
  repeats = 100, verbose = FALSE, n_min = 1, dataset_name = NULL,
  test_fraction = NULL, fracs = seq(0.1, 0.9, 0.1), time = TRUE,
  pre_scale = FALSE, pre_pca = FALSE, low_level_cores = 1, ...)
Arguments
X | design matrix
y | vector of labels
... | arguments passed to underlying function
classifiers | list; Classifiers to evaluate
measures | named list of functions giving the measures to be used
type | Type of learning curve, either "unlabeled" or "fraction"
n_l | Number of labeled objects to be used in the experiments (see details)
with_replacement | Indicates whether the subsampling is done with replacement or not (default: FALSE)
sizes | vector with number of unlabeled objects for which to evaluate performance
n_test | Number of test points if with_replacement is TRUE
repeats | Number of learning curves to draw
verbose | Print progressbar during execution (default: FALSE)
n_min | Minimum number of labeled objects per class
dataset_name | character; Name of the dataset
test_fraction | numeric; If not NULL, a fraction of the objects will be left out to serve as the test set
fracs | list; fractions of labeled data to use
time | logical; Whether execution time should be saved
pre_scale | logical; Whether the features should be scaled before the dataset is used
pre_pca | logical; Whether the features should be preprocessed using a PCA step
low_level_cores | integer; Number of cores to use to compute repeats of the learning curve
Details
classifiers is a named list of classifiers, where each classifier should be a function that accepts 4 arguments: a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects and a factor of labels for the unlabeled objects.
measures is a named list of performance measures. These are functions that accept seven arguments: a trained classifier, a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects, a factor of labels for the unlabeled objects, a numeric design matrix of the test objects and a factor of labels of the test objects. See measure_accuracy for an example.
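As an illustration of the seven-argument signature, the sketch below defines an error-rate measure. The names measure_error and majority_classifier are hypothetical and not part of RSSL; the sketch assumes only that predict() on the trained classifier returns labels that can be compared with y_test.

```r
# Hypothetical measure (not part of RSSL): misclassification rate on the
# test set. It follows the seven-argument convention described above, but
# only uses the trained classifier and the test data.
measure_error <- function(trained_classifier,
                          X_l = NULL, y_l = NULL,
                          X_u = NULL, y_u = NULL,
                          X_test = NULL, y_test = NULL) {
  predictions <- predict(trained_classifier, X_test)
  mean(as.character(predictions) != as.character(y_test))
}

# Toy stand-in for a trained classifier (an assumption for illustration):
# it always predicts the majority class seen during training.
majority_classifier <- function(y) {
  structure(list(label = names(which.max(table(y)))),
            class = "majority_classifier")
}
predict.majority_classifier <- function(object, newdata, ...) {
  rep(object$label, nrow(newdata))
}

y_train <- factor(c("a", "a", "b"))
X_test <- matrix(0, nrow = 4, ncol = 2)
y_test <- factor(c("a", "b", "a", "a"))

fit <- majority_classifier(y_train)
err <- measure_error(fit, X_test = X_test, y_test = y_test)
```

A function of this shape could then be added to the measures list under a name of your choice, alongside measure_accuracy.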
This function allows for two different types of learning curves to be generated. If type="unlabeled", the number of labeled objects remains fixed at the value of n_l, while sizes controls the number of unlabeled objects. n_test controls the number of objects used for the test set if with_replacement=TRUE; if with_replacement=FALSE, objects are drawn without replacement from the input dataset and all remaining objects are used as the test set. Each class is represented by at least n_min labeled objects. For n_l, additional options include: "enough", which takes the maximum of the number of features plus 5 and 20, i.e. max(ncol(X)+5,20); "d", which takes the number of features; or "2d", which takes 2 times the number of features.
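The character options for n_l can be read as the following mapping. This is a minimal sketch based only on the description above; resolve_n_l is a hypothetical helper, not an RSSL function, and the package's internal handling may differ.

```r
# Hypothetical helper (not part of RSSL) mirroring the documented
# shortcuts for n_l. X is the design matrix.
resolve_n_l <- function(n_l, X) {
  d <- ncol(X)
  if (is.numeric(n_l)) return(n_l)       # explicit number: use as given
  switch(n_l,
         enough = max(d + 5, 20),        # "enough": max(ncol(X)+5, 20)
         d      = d,                     # "d": number of features
         "2d"   = 2 * d,                 # "2d": twice the number of features
         stop("Unknown n_l option: ", n_l))
}

X_small <- matrix(0, nrow = 100, ncol = 2)
X_wide  <- matrix(0, nrow = 100, ncol = 30)

n_enough_small <- resolve_n_l("enough", X_small)
n_enough_wide  <- resolve_n_l("enough", X_wide)
n_d            <- resolve_n_l("d", X_wide)
n_2d           <- resolve_n_l("2d", X_wide)
```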
If type="fraction", the total number of objects remains fixed, while the fraction of labeled objects is changed. fracs sets the fractions of labeled objects that should be considered, while test_fraction determines the fraction of the total number of objects left out to serve as the test set.
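To make the split concrete, the arithmetic below works through a dataset of 2000 objects with test_fraction = 0.9 and the default fracs. It is an illustrative sketch only: the variable names are hypothetical, and the use of round() is an assumption, since the exact rounding applied inside LearningCurveSSL is not stated here.

```r
# Illustrative arithmetic only; variable names are hypothetical and the
# rounding rule is an assumption, not taken from the RSSL source.
n_total <- 2000
test_fraction <- 0.9
fracs <- seq(0.1, 0.9, 0.1)

n_test_objects <- round(n_total * test_fraction)  # objects held out for testing
n_train <- n_total - n_test_objects               # objects left for training

n_labeled <- round(fracs * n_train)               # labeled objects per fraction
n_unlabeled <- n_train - n_labeled                # the rest stay unlabeled

split_table <- data.frame(frac = fracs,
                          labeled = n_labeled,
                          unlabeled = n_unlabeled)
```

Under these assumptions, each point on the curve uses the same 200 training objects and only the labeled share changes.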
Value
LearningCurve object
See Also
Other RSSL utilities: SSLDataFrameToMatrices(), add_missinglabels_mar(), df_to_matrices(), measure_accuracy(), missing_labels(), split_dataset_ssl(), split_random(), true_labels()
Examples
set.seed(1)
df <- generate2ClassGaussian(2000,d=2,var=0.6)
classifiers <- list("LS"=function(X,y,X_u,y_u) {
    LeastSquaresClassifier(X,y,lambda=0)},
  "Self"=function(X,y,X_u,y_u) {
    SelfLearning(X,y,X_u,LeastSquaresClassifier)}
)
measures <- list("Accuracy" = measure_accuracy,
                 "Loss Test" = measure_losstest,
                 "Loss labeled" = measure_losslab,
                 "Loss Lab+Unlab" = measure_losstrain
)
# These take a couple of seconds to run
## Not run:
# Increase the number of unlabeled objects
lc1 <- LearningCurveSSL(as.matrix(df[,1:2]),df$Class,
                        classifiers=classifiers,
                        measures=measures, n_test=1800,
                        n_l=10,repeats=3)
plot(lc1)
# Increase the fraction of labeled objects, example with 2 datasets
lc2 <- LearningCurveSSL(X=list("Dataset 1"=as.matrix(df[,1:2]),
                               "Dataset 2"=as.matrix(df[,1:2])),
                        y=list("Dataset 1"=df$Class,
                               "Dataset 2"=df$Class),
                        classifiers=classifiers,
                        measures=measures,
                        type = "fraction",repeats=3,
                        test_fraction=0.9)
plot(lc2)
## End(Not run)