R: A data generator based on RBF network

rbfDataGen {semiArtificial}

R Documentation

A data generator based on RBF network

Description

Using the given formula and data, the method builds an RBF network and extracts its properties, thereby preparing a data generator that can be used with the newdata.RBFgenerator method to generate semi-artificial data.

Usage

 rbfDataGen(formula, data, eps=1e-4, minSupport=1, 
            nominal=c("encodeBinary","asInteger"))

Arguments

`formula`	A formula specifying the response and variables to be modeled.
`data`	A data frame with training data.
`eps`	The minimal probability that the data generator considers to be larger than 0.
`minSupport`	The minimal number of instances that must define a Gaussian kernel for the kernel to be copied to the data generator.
`nominal`	How to treat nominal features. The option `"asInteger"` converts factors into integers and treats them as numeric features. The option `"encodeBinary"` converts each nominal attribute into a set of binary features that encode the nominal value; e.g., for a three-valued attribute, three binary attributes are constructed, each encoding the presence of one nominal value with 0 or 1.
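The two treatments of nominal features can be illustrated in base R. This is only a sketch of the idea, not the package's internal code; the factor `colour` and the variable names are hypothetical.

```r
# Hypothetical three-valued nominal attribute (illustrative, not from the package).
colour <- factor(c("red", "green", "blue", "green"),
                 levels = c("red", "green", "blue"))

# "asInteger"-style treatment: replace each level by its integer code.
as_integer <- as.integer(colour)

# "encodeBinary"-style treatment: one 0/1 column per nominal value.
encode_binary <- sapply(levels(colour), function(lv) as.integer(colour == lv))

print(as_integer)
print(encode_binary)
```

Each row of `encode_binary` contains exactly one 1, which marks the nominal value of that instance.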

Details

The formula parameter is used to select the features (attributes) and the prediction variable (response, class). Only simple terms can be used; interaction terms are not supported. The simplest way is to specify just the response variable, e.g., class ~ . (see the examples below).
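How a simple formula selects the response and the features can be seen with base R's model.frame. Whether rbfDataGen processes the formula in exactly this way internally is an assumption; the sketch only shows which columns each formula names.

```r
# Two simple formulas on the built-in iris data.
f_all <- Species ~ .                            # response plus all other columns
f_two <- Species ~ Sepal.Length + Petal.Width   # response plus two named features

mf_all <- model.frame(f_all, data = iris)
mf_two <- model.frame(f_two, data = iris)

names(mf_all)  # response first, then every remaining column
names(mf_two)  # response first, then the two selected features
```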

An RBF network is built using rbfDDA from the RSNNS package. The learned Gaussian kernels are extracted and used in data generation with the newdata.RBFgenerator method.
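The general idea of generating data from extracted Gaussian kernels can be sketched in base R as follows. This is a simplified illustration under stated assumptions (hypothetical kernel values, attributes drawn independently within a kernel); the actual newdata.RBFgenerator method may differ in its details.

```r
set.seed(1)

# Hypothetical kernel parameters for two attributes (illustrative values only).
centers <- rbind(c(0.2, 0.3), c(0.7, 0.8))      # one row per kernel
spread  <- rbind(c(0.01, 0.02), c(0.02, 0.01))  # per-attribute variances
probs   <- c(0.4, 0.6)                          # kernel probabilities

n <- 1000
# Choose a kernel for each new instance according to the kernel probabilities.
k <- sample(seq_along(probs), size = n, replace = TRUE, prob = probs)

# Assumption: attributes are independent within a kernel, so each value is
# drawn from a normal distribution around the chosen kernel's center.
noise <- matrix(rnorm(n * ncol(centers)), nrow = n)
newX  <- centers[k, , drop = FALSE] + noise * sqrt(spread[k, , drop = FALSE])

head(newX)
```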

Value

The created model is returned as a structure of class RBFgenerator, containing the following items:

`noGaussians`	The number of extracted Gaussian kernels.
`centers`	A matrix of Gaussian kernels' centers, with one row for each Gaussian kernel.
`probs`	A vector of kernel probabilities. Probabilities are defined as relative frequencies of training set instances with maximal activation in the given kernel.
`unitClass`	A vector of class values, one for each kernel.
`bias`	A vector of kernels' biases, one for each kernel. The bias is multiplied by the kernel activation to produce the output value of the given RBF network unit.
`spread`	A matrix of estimated variances for the kernels, one row for each kernel. The j-th value in the i-th row is the variance of the j-th attribute over the training instances with maximal activation in the i-th Gaussian.
`gNoActivated`	A vector containing numbers of training instances with maximal activation in each kernel.
`noAttr`	The number of attributes in training data.
`datNames`	A vector of attributes' names.
`originalNames`	A vector of original attribute names.
`attrClasses`	A vector of attributes' classes (i.e., data types like `numeric` or `factor`).
`attrLevels`	A list of levels for discrete attributes (with class `factor`).
`attrOrdered`	A vector of type logical indicating whether the attribute is `ordered` (only possible for attributes of type `factor`).
`normParameters`	A list of parameters for normalization of attributes to [0,1].
`noCol`	The number of columns in the internally generated data set.
`isDiscrete`	A vector of type logical, each value indicating whether a respective attribute is discrete.
`noAttrGen`	The number of attributes to generate.
`nominal`	The value of parameter `nominal`.
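The relation between gNoActivated, probs and spread can be illustrated with a small base R sketch. The toy data and the kernel assignments in `maxKernel` are hypothetical, and the use of the sample variance (var) is an assumption about how the variances are estimated.

```r
# Toy training data (two attributes) and, for each instance, the index of the
# kernel with maximal activation (hypothetical assignments).
X <- rbind(c(0.1, 0.2), c(0.3, 0.2), c(0.8, 0.9), c(0.6, 0.7), c(0.7, 0.8))
maxKernel <- c(1, 1, 2, 2, 2)

# Number of training instances with maximal activation in each kernel.
gNoActivated <- as.vector(table(factor(maxKernel, levels = 1:2)))  # 2 3

# Kernel probabilities as relative frequencies.
probs <- gNoActivated / sum(gNoActivated)                          # 0.4 0.6

# Per-kernel, per-attribute variances: one row per kernel.
spread <- t(sapply(1:2, function(i) {
  apply(X[maxKernel == i, , drop = FALSE], 2, var)
}))

print(spread)
```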

Author(s)

Marko Robnik-Sikonja

References

Marko Robnik-Sikonja: Not enough data? Generate it! Technical Report, University of Ljubljana, Faculty of Computer and Information Science, 2014.

Other references are available from http://lkm.fri.uni-lj.si/rmarko/papers/

Examples

# use iris data set, split into training and testing, inspect the data
set.seed(12345)
train <- sample(1:nrow(iris),size=nrow(iris)*0.5)
irisTrain <- iris[train,]
irisTest <- iris[-train,]

# inspect properties of the original data
plot(irisTrain, col=irisTrain$Species)
summary(irisTrain)

# create rbf generator
irisGenerator <- rbfDataGen(Species ~ ., irisTrain)

# use the generator to create new data
irisNew <- newdata(irisGenerator, size=200)

# inspect properties of the new data
plot(irisNew, col = irisNew$Species) # plot generated data
summary(irisNew)

[Package semiArtificial version 2.4.1 Index]