cleanData {semiArtificial} | R Documentation |
Rejection of new instances based on their distance to existing instances
Description
The function contains three data cleaning methods.
The first two reject instances whose distance to their nearest neighbors in the existing data is too small
or too large: the first checks distances between instances regardless of class,
the second takes only instances from the same class into account.
The third method reassigns the response variable using the prediction model stored in the generator teObject.
Usage
cleanData(teObject, newdat, similarDropP=NA, dissimilarDropP=NA,
similarDropPclass=NA, dissimilarDropPclass=NA,
nearestInstK=1, reassignResponse=FALSE, cleaningObject=NULL)
Arguments
teObject |
An object of class |
newdat |
A |
similarDropP |
With numeric parameters |
dissimilarDropP |
See |
similarDropPclass |
For classification problems only and similarly to the |
dissimilarDropPclass |
See |
nearestInstK |
An integer with a default value of 1 that controls how many of the generator's training instances are taken into account when computing the distance distribution of nearest instances. |
reassignResponse |
is a |
cleaningObject |
A list object with precomputed distance distributions and a predictor from previous runs of the same function. If provided, this saves computation time. |
Details
The function uses the training instances stored in the generator teObject to compute the distribution of distances from instances to their nearestInstK nearest instances. For classification problems the distributions can also be computed only for instances from the same class.
Using these near distance distributions, the function rejects all instances that are too close to or too far away from existing instances.
The default value of similarDropP, dissimilarDropP, similarDropPclass, and dissimilarDropPclass is NA, which means that near/far instances are not rejected. The value 0 for similarDropP and similarDropPclass, and the value 1 for dissimilarDropP and dissimilarDropPclass, have the same effect.
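The rejection rule described above can be sketched in base R. This is an illustration only, not the package's implementation: it assumes plain Euclidean distance on the numeric iris columns, assumes that similarDropP and dissimilarDropP act as quantiles of the training distance distribution, and uses a hypothetical helper nearest_dists and jittered training rows as a stand-in for generated data.

```r
# Hypothetical helper: the k smallest Euclidean distances from vector x to the rows of ref.
nearest_dists <- function(x, ref, k = 1) {
  d <- sqrt(rowSums(sweep(ref, 2, x)^2))  # subtract x from every row, then take row norms
  sort(d)[seq_len(k)]
}

set.seed(1)
train <- as.matrix(iris[, 1:4])

# Stand-in for generated data: jittered copies of training rows (not produced by newdata()).
newdat <- train[sample(nrow(train), 100, replace = TRUE), ] +
  matrix(rnorm(100 * 4, sd = 0.3), ncol = 4)

nearestInstK <- 1
similarDropP <- 0.05     # assumed to be a lower quantile of the distance distribution
dissimilarDropP <- 0.95  # assumed to be an upper quantile of the distance distribution

# Distance distribution: each training row to its nearestInstK nearest other training rows.
trainDist <- unlist(lapply(seq_len(nrow(train)), function(i) {
  nearest_dists(train[i, ], train[-i, , drop = FALSE], nearestInstK)
}))
lower <- quantile(trainDist, similarDropP)
upper <- quantile(trainDist, dissimilarDropP)

# Distance from each new instance to its nearest training instance.
newDist <- vapply(seq_len(nrow(newdat)), function(i) {
  nearest_dists(newdat[i, ], train, 1)
}, numeric(1))

# Keep only instances that are neither too close nor too far.
keep <- newDist >= lower & newDist <= upper
cleaned <- newdat[keep, , drop = FALSE]
nrow(cleaned)
```

Under the same assumptions, the class-specific variant would compute trainDist and newDist separately within each class, using only training rows that share the instance's class.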
Value
The method returns a list object with two components:
cleanData |
is a |
cleaningObject |
is a |
Author(s)
Marko Robnik-Sikonja
See Also
treeEnsemble, newdata.TreeEnsemble.
Examples
library(semiArtificial)

# inspect properties of the iris data set
plot(iris, col=iris$Species)
summary(iris)

# build a generator from the iris data
irisEnsemble <- treeEnsemble(Species~., iris, noTrees=10)

# use the generator to create new data
irisNewEns <- newdata(irisEnsemble, size=150)

# inspect properties of the new data
plot(irisNewEns, col=irisNewEns$Species) # plot generated data
summary(irisNewEns)

# reject generated instances that are too close to or too far from the training instances
clObj <- cleanData(irisEnsemble, irisNewEns, similarDropP=0.05, dissimilarDropP=0.95,
                   similarDropPclass=0.05, dissimilarDropPclass=0.95,
                   nearestInstK=1, reassignResponse=FALSE, cleaningObject=NULL)
head(clObj$cleanData)
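
The cleaningObject argument is described as holding precomputed distance distributions from previous runs. A minimal sketch of reusing it for a second batch of generated data follows. It assumes that the cleaningObject component of the returned list can be passed back unchanged as the cleaningObject argument and that the semiArtificial package is installed. The names batch1, batch2, first and second are illustrative.

```r
# Sketch only: runs when the semiArtificial package is available.
if (requireNamespace("semiArtificial", quietly = TRUE)) {
  library(semiArtificial)
  set.seed(42)

  irisEnsemble <- treeEnsemble(Species ~ ., iris, noTrees = 10)
  batch1 <- newdata(irisEnsemble, size = 150)
  batch2 <- newdata(irisEnsemble, size = 150)

  # First call computes the distance distributions from the generator's training instances.
  first <- cleanData(irisEnsemble, batch1,
                     similarDropP = 0.05, dissimilarDropP = 0.95)

  # Assumption: the returned cleaningObject component is what the argument expects,
  # so the second call can skip recomputing the distributions.
  second <- cleanData(irisEnsemble, batch2,
                      similarDropP = 0.05, dissimilarDropP = 0.95,
                      cleaningObject = first$cleaningObject)

  print(nrow(second$cleanData))
}
```

The thresholds are kept identical in both calls so that the stored distributions correspond to the same cleaning settings.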