newdata {semiArtificial}    R Documentation

Generate semi-artificial data using a generator

Description

Using a generator built with rbfDataGen or treeEnsemble, the method generates size new instances.

Usage

## S3 method for class 'RBFgenerator'
newdata(object, size, var=c("estimated","Silverman"), 
                               classProb=NULL, defaultSpread=0.05, ... )
## S3 method for class 'TreeEnsemble'
newdata(object, fillData=NULL, 
                               size=ifelse(is.null(fillData),1,nrow(fillData)), 
                               onlyPath=FALSE, classProb=NULL, 
                               predictClass=FALSE, ...) 

Arguments

object

An object of class RBFgenerator or TreeEnsemble containing a generator structure as returned by rbfDataGen or treeEnsemble, respectively.

fillData

A data frame with some of the values already specified. All missing values (i.e., NA values) are filled in by the generator.

size

The number of instances to generate. By default this is one instance or, if fillData is supplied, the number of rows in that data frame.

var

For the generator of type RBFgenerator the parameter var determines the method of kernel width (variance) estimation. Supported options are "estimated" and "Silverman".

classProb

For classification problems, a vector giving the desired probability distribution of class values. The default classProb=NULL uses the class distribution of the generator's training instances.

defaultSpread

For a generator of type RBFgenerator, a numeric value that replaces zero spread when var="estimated" is used. Setting defaultSpread=NULL keeps zero spread values.

onlyPath

For a generator of type TreeEnsemble with attribute density data stored in the leaves (densityData="leaf"), a logical value indicating whether only attributes on the path from the root to the leaf are generated in that leaf. If onlyPath=FALSE, all values are generated in the first randomly chosen leaf of a tree; otherwise only the attributes on the path are generated and then the next random tree is selected.

predictClass

For classification problems and a generator of type TreeEnsemble, the parameter determines whether the class value is set by prediction with the forest (the constructed generator serves as a predictor) or according to the class value distribution of the selected leaf.

...

Additional parameters passed to density estimation functions kde, logspline, and quantile.
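
The following sketch shows how the RBF-specific arguments described above might be combined. It assumes the semiArtificial package is installed and that the entries of classProb follow the order of levels(iris$Species); that ordering is an assumption here, not something stated on this page. The object names rbfGen and rbfNew are illustrative only.

```r
library(semiArtificial)

# build an RBF generator on iris (same call as in the Examples section)
rbfGen <- rbfDataGen(Species ~ ., iris)

# request 60 instances, kernel widths from Silverman's rule, and a
# non-default class distribution
# (assumption: classProb is ordered like levels(iris$Species))
rbfNew <- newdata(rbfGen, size = 60, var = "Silverman",
                  classProb = c(0.5, 0.25, 0.25))

# compare the requested proportions with what was actually generated
print(prop.table(table(rbfNew$Species)))
```

Because instances are sampled, the observed class proportions should only approximate the requested ones.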

Details

The function uses the object structure returned by rbfDataGen or treeEnsemble. For an RBFgenerator, the object contains descriptions of the Gaussian kernels that model the original data. The kernels are used to generate the required number of new instances. The kernel width can be set in two ways. With var="estimated", the width is the estimated spread of the training instances that have the maximal activation value for the particular kernel. With var="Silverman", the width is set by the generalization of Silverman's rule of thumb to the multivariate case (unreliable for larger dimensions).
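
For orientation, the multivariate form of Silverman's rule of thumb is commonly written as h_j = sigma_j * (4 / ((d + 2) * n))^(1 / (d + 4)), where n is the number of instances, d the number of dimensions, and sigma_j the standard deviation of attribute j. The base R sketch below computes these widths for the numeric iris attributes. The helper silverman_width is hypothetical and not part of semiArtificial; the package's own computation may differ in detail.

```r
# Hypothetical helper (not part of semiArtificial): per-attribute kernel
# widths from the multivariate Silverman rule of thumb.
silverman_width <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)                 # number of instances
  d <- ncol(x)                 # number of dimensions
  sigma <- apply(x, 2, sd)     # per-attribute standard deviation
  sigma * (4 / ((d + 2) * n))^(1 / (d + 4))
}

# widths for the four numeric iris attributes
widths <- silverman_width(iris[, 1:4])
print(widths)
```

The exponent 1 / (d + 4) moves toward zero as d grows, so the sample size has less and less influence on the width, which is one way to see why the rule becomes unreliable in higher dimensions.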

In the case of a TreeEnsemble generator, no additional parameters are needed except the number of instances to generate.
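
As a sketch of the TreeEnsemble case, the code below first generates instances by size alone and then passes partially specified rows through fillData. It assumes the semiArtificial package is installed and that fillData should have the same columns as the training data; the object names ensGen, ensNew, partial, and filled are illustrative only.

```r
library(semiArtificial)

# build a tree ensemble generator on iris (same call as in the Examples section)
ensGen <- treeEnsemble(Species ~ ., iris, noTrees = 10)

# only the number of instances is required
ensNew <- newdata(ensGen, size = 20)

# partially specified rows: blank out two attributes and let the
# generator fill in the NA values
# (assumption: fillData uses the same columns as the training data)
partial <- iris[1:5, ]
partial$Petal.Length <- NA
partial$Petal.Width <- NA
filled <- newdata(ensGen, fillData = partial)

print(filled)
```

Since size defaults to nrow(fillData), the second call is expected to return one row per row of partial.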

Value

The method returns a data.frame object with the required number of instances.

Author(s)

Marko Robnik-Sikonja

See Also

rbfDataGen, treeEnsemble.

Examples

# inspect properties of the iris data set
plot(iris, col=iris$Species)
summary(iris)

# create RBF generator
irisRBF <- rbfDataGen(Species ~ ., iris)
# create tree ensemble generator
irisEnsemble <- treeEnsemble(Species ~ ., iris, noTrees = 10)


# create new data with both generators
irisNewRBF <- newdata(irisRBF, size=150)
irisNewEns <- newdata(irisEnsemble, size=150)

# inspect properties of the new data
plot(irisNewRBF, col = irisNewRBF$Species) # plot generated data
summary(irisNewRBF)
plot(irisNewEns, col = irisNewEns$Species) # plot generated data
summary(irisNewEns)

[Package semiArtificial version 2.4.1 Index]