EMLeastSquaresClassifier {RSSL}
An Expectation Maximization like approach to Semi-Supervised Least Squares Classification
Description
As studied in Krijthe & Loog (2016), this method minimizes the total loss over the labeled and unlabeled objects by finding the weight vector and the labels of the unlabeled objects that jointly minimize this loss. The algorithm proceeds similarly to EM, alternating between a weight update and a soft labeling of the unlabeled objects, and repeats this until convergence.
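A minimal sketch of this alternation for the two-class case is given below. It is an illustrative re-implementation under simplifying assumptions (0/1 target encoding, no regularization, clipped predictions as the soft labels), not the package's internal code:

em_ls_sketch <- function(X, y01, X_u, eps = 1e-9, max_iter = 1000) {
  # y01: numeric 0/1 encoding of the two classes for the labeled objects
  Xl <- cbind(1, X)         # labeled design matrix with intercept
  Xu <- cbind(1, X_u)       # unlabeled design matrix with intercept
  Xa <- rbind(Xl, Xu)
  r <- rep(0.5, nrow(X_u))  # initial soft labels for the unlabeled objects
  w <- rep(0, ncol(Xa))
  for (i in seq_len(max_iter)) {
    # Weight update: least squares fit to labeled targets and current soft labels
    w_new <- solve(crossprod(Xa), crossprod(Xa, c(y01, r)))
    # Soft labeling: the loss-minimizing label is the prediction clipped to [0,1]
    r <- pmin(pmax(Xu %*% w_new, 0), 1)
    if (max(abs(w_new - w)) < eps) break
    w <- w_new
  }
  w
}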
Usage
EMLeastSquaresClassifier(X, y, X_u, x_center = FALSE, scale = FALSE,
verbose = FALSE, intercept = TRUE, lambda = 0, eps = 1e-09,
y_scale = FALSE, alpha = 1, beta = 1, init = "supervised",
method = "block", objective = "label", save_all = FALSE,
max_iter = 1000)
Arguments
X: matrix; Design matrix for labeled data
y: factor or integer vector; Label vector
X_u: matrix; Design matrix for unlabeled data
x_center: logical; Should the features be centered?
scale: logical; Should the features be normalized? (default: FALSE)
verbose: logical; Controls the verbosity of the output
intercept: logical; Whether an intercept should be included
lambda: numeric; L2 regularization parameter
eps: numeric; Stopping criterion for the minimization
y_scale: logical; Whether the target vector should be centered
alpha: numeric; The mixture of the new responsibilities and the old in each iteration of the algorithm (default: 1)
beta: numeric; Value between 0 and 1 that determines how far to move towards the new solution from the old solution at each step of the block gradient descent
init: character or numeric vector; "random" for random initialization of labels, "supervised" to use the supervised solution as the initialization, or a numeric coefficient vector from which to calculate the initialization
method: character; One of "block", for block gradient descent, or "simple", for LBFGS optimization (default: "block")
objective: character; "responsibility" for hard-label self-learning or "label" for soft-label self-learning
save_all: logical; Whether to save all classifiers trained during block gradient descent
max_iter: integer; Maximum number of iterations
Details
By default (method="block"), the weights of the classifier are updated, after which the unknown labels are updated. method="simple" instead uses LBFGS to update both simultaneously. objective="responsibility" corresponds to the responsibility-based, rather than the label-based, objective function in Krijthe & Loog (2016), which is equivalent to hard-label self-learning.
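As a hedged illustration of these options (the variable names, seed, and prob value below are chosen for this sketch; the data are generated as in the Examples section):

library(RSSL)
set.seed(42)
df <- add_missinglabels_mar(generate2ClassGaussian(200, d = 2, var = 0.2),
                            Class ~ ., prob = 0.8)
# Default: block gradient descent, alternating weight and label updates
g_block <- EMLeastSquaresClassifier(Class ~ ., df, method = "block")
# Joint update of weights and labels via LBFGS
g_simple <- EMLeastSquaresClassifier(Class ~ ., df, method = "simple")
# Responsibility-based objective, equivalent to hard-label self-learning
g_hard <- EMLeastSquaresClassifier(Class ~ ., df, objective = "responsibility")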
References
Krijthe, J.H. & Loog, M., 2016. Optimistic Semi-supervised Least Squares Classification. In International Conference on Pattern Recognition (To Appear).
See Also
Other RSSL classifiers: EMLinearDiscriminantClassifier, GRFClassifier, ICLeastSquaresClassifier, ICLinearDiscriminantClassifier, KernelLeastSquaresClassifier, LaplacianKernelLeastSquaresClassifier(), LaplacianSVM, LeastSquaresClassifier, LinearDiscriminantClassifier, LinearSVM, LinearTSVM(), LogisticLossClassifier, LogisticRegression, MCLinearDiscriminantClassifier, MCNearestMeanClassifier, MCPLDA, MajorityClassClassifier, NearestMeanClassifier, QuadraticDiscriminantClassifier, S4VM, SVM, SelfLearning, TSVM, USMLeastSquaresClassifier, WellSVM, svmlin()
Examples
library(RSSL)
library(dplyr)
library(ggplot2)
set.seed(1)
df <- generate2ClassGaussian(200,d=2,var=0.2) %>%
add_missinglabels_mar(Class~.,prob = 0.96)
# Soft-label vs. hard-label self-learning
classifiers <- list(
"Supervised"=LeastSquaresClassifier(Class~.,df),
"EM-Soft"=EMLeastSquaresClassifier(Class~.,df,objective="label"),
"EM-Hard"=EMLeastSquaresClassifier(Class~.,df,objective="responsibility")
)
df %>%
ggplot(aes(x=X1,y=X2,color=Class)) +
geom_point() +
coord_equal() +
scale_y_continuous(limits=c(-2,2)) +
stat_classifier(aes(linetype = after_stat(classifier)),
                classifiers = classifiers)
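The example above uses the formula interface. Below is a minimal sketch of the matrix interface from the Usage section, reusing df from the example; it assumes the X1/X2 column names produced by generate2ClassGaussian and that predict accepts the unlabeled design matrix as new data:

labeled <- !is.na(df$Class)
X   <- as.matrix(df[labeled, c("X1", "X2")])   # labeled design matrix
y   <- df$Class[labeled]                       # label vector
X_u <- as.matrix(df[!labeled, c("X1", "X2")])  # unlabeled design matrix
g_matrix <- EMLeastSquaresClassifier(X, y, X_u)
predict(g_matrix, X_u)  # predicted labels for the unlabeled objects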