update.unsupervised {randomUniformForest}R Documentation

Update Unsupervised Learning object

Description

Update unsupervised learning object with new data in order to achieve incremental learning. New MDS (or spectral) points are predicted with new data and learning of MDS (spectral) points of the former unsupervised object. Note that new data are expected to have the same distribution than previous ones.

Usage

## S3 method for class 'unsupervised'
update(object, X, 
	oldData = NULL, 
	mapAndReduce = FALSE, 
	updateModel = FALSE, 
	...)

Arguments

object

(vector of) objects of class unsupervised, coming from unsupervised.randomUniformForest, that needs to be updated.

X

new data frame or matrix with same variables than the ones used to build 'object'.

oldData

former data frame or matrix needed, if 'object' was built from a small dataset (less than 10000 rows).

mapAndReduce

if TRUE, 'X' will be learned by chunks that will be combined to build the forest classifier for the MDS points.

updateModel

if TRUE, the resulted object will also embed a learned model of all the MDS points (hence learning its own predictions). Note that this option is recommended to be enabled since it will allow to assess the resulted model against, for example, any unsupervised algorithm, or itself, based on a subsample of 'X'.

...

all hyper parameters of randomUniformForest.

Value

An object of class unsupervised, which is a list with the following components:

proximityMatrix

the resulted dissimilarity matrix.

MDSModel

the resulted Multidimensional scaling model.

unsupervisedModel

the resulted unsupervised model with clustered observations in unsupervisedModel$cluster.

largeDataLearningModel

if the dataset is large, the resulted model that learned a sample of the MDS points, and predicted others points.

gapStatistics

if K-means algorithm has been called, the results of the gap statistic. Otherwise NULL.

rUFObject

Random Uniform Forests object.

nbClusters

Number of clusters found.

params

options of the model.

Author(s)

Saip Ciss saip.ciss@wanadoo.fr

See Also

combineUnsupervised, modifyClusters, mergeClusters, clusteringObservations, as.supervised

Examples

## not run
## Water Treatment Plant Data Set
## Data can be download at https://archive.ics.uci.edu/ml/datasets/Water+Treatment+Plant

# URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/water-treatment/"
# dataset = "water-treatment.data"

# X = read.table(paste(URL, dataset, sep= ""), sep = ",")

## 1- Preprocessing
## first, look at the first column and format date
#  Dates = rm.string(as.character(X[,1]), "D-")
#  DatesAsStringTable = do.call(rbind, strsplit(Dates, "/"))
#  DatesasNumericTable = t(apply(DatesAsStringTable, 1, as.numeric))

##  Then, transform data as a pure R matrix and add new dates
#  XX = as.true.matrix(X)[,-1]
#  XX = cbind(DatesasNumericTable, XX)
#  colnames(XX)[1:3] = c("day", "month", "year")

# Look the new data
# head(XX)
# str(XX)

## and fill missing values,
# X.imputed = fillNA2.randomUniformForest(XX)

## 2 - run unsupervised analysis on the first half of dataset 
# subset.1 = 1:floor(nrow(X.imputed)/2)
# WaterTreatment.model.1 = unsupervised.randomUniformForest(X.imputed, subset = subset.1, 
# baseModel = "proximityThenDistance",  seed = 2014)

## assess roughly the model and visualize
#  WaterTreatment.model.1

## 3 - update model with the second half of dataset
# WaterTreatment.updated = update.unsupervised(WaterTreatment.model.1, 
# X.imputed[-subset.1,], oldData = X.imputed[subset.1,])

# WaterTreatment.updated

## view how MDS points have been learned :
## first component
# WaterTreatment.updated$largeDataLearningModel[[1]]

## second component
# WaterTreatment.updated$largeDataLearningModel[[2]]

[Package randomUniformForest version 1.1.6 Index]