rbfDataGen {semiArtificial} | R Documentation |
A data generator based on an RBF network
Description
Using the given formula and data, the method builds an RBF network and extracts its properties, thereby preparing a data generator that can be used with the newdata.RBFgenerator method to generate semi-artificial data.
Usage
rbfDataGen(formula, data, eps=1e-4, minSupport=1,
nominal=c("encodeBinary","asInteger"))
Arguments
formula |
A formula specifying the response and variables to be modeled. |
data |
A data frame with training data. |
eps |
The smallest probability that the data generator treats as greater than 0. |
minSupport |
The minimal number of instances that must define a Gaussian kernel for the kernel to be copied to the data generator. |
nominal |
The way how to treat nominal features. The option |
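As a rough illustration of the minSupport argument, the following base R sketch filters kernels by the number of training instances assigned to them. The counts are made-up values, the inclusive threshold and the renormalization of probabilities are assumptions, and none of this is taken from the package's source code.

```r
# Hypothetical counts: training instances with maximal activation in each kernel.
gNoActivated <- c(12, 0, 3, 1)
minSupport <- 2

# Assumption: a kernel is kept when its support is at least minSupport.
keep <- gNoActivated >= minSupport
keptKernels <- which(keep)

# Assumption: probabilities are relative frequencies over the kept kernels only.
probs <- gNoActivated[keep] / sum(gNoActivated[keep])

print(keptKernels)
print(probs)
```

With the default minSupport=1, this kind of filter would only discard kernels that attract no training instances at all.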
Details
The formula parameter is used to select the features (attributes) and the prediction variable (response, class). Only simple terms can be used; interaction terms are not supported. The simplest way is to specify just the response variable, e.g. class ~ . See the examples below.
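To see which columns a given formula selects, base R's model.frame can be applied to the same formula and data. This is only a way to preview the selection; whether rbfDataGen uses model.frame internally is not stated here and is not assumed.

```r
# Base R only: preview the columns selected by two simple formulas on iris.
fAll  <- Species ~ .                           # response plus all other columns
fSome <- Species ~ Sepal.Length + Petal.Width  # response plus two chosen features

mfAll  <- model.frame(fAll, data = iris)
mfSome <- model.frame(fSome, data = iris)

print(names(mfAll))
print(names(mfSome))
```

The response appears as the first column of the resulting frame, followed by the selected features.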
An RBF network is built using rbfDDA from the RSNNS package.
The learned Gaussian kernels are extracted and used in data generation with the newdata.RBFgenerator method.
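The general idea of generating data from extracted Gaussian kernels can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the algorithm of newdata.RBFgenerator: the toy centers, variances, and probabilities are invented, attributes are sampled independently, and the function sampleFromKernels is a hypothetical helper.

```r
set.seed(1)

# Hypothetical toy generator: two kernels over two numeric attributes.
centers <- matrix(c(0.2, 0.3,
                    0.7, 0.8), nrow = 2, byrow = TRUE)
spread <- matrix(c(0.01, 0.02,
                   0.02, 0.01), nrow = 2, byrow = TRUE)  # per-attribute variances
probs <- c(0.4, 0.6)       # kernel probabilities
unitClass <- c("a", "b")   # class value of each kernel

# Hypothetical helper: pick a kernel by probability, then draw each attribute
# from a normal distribution around that kernel's center.
sampleFromKernels <- function(size) {
  k <- sample(seq_along(probs), size = size, replace = TRUE, prob = probs)
  noise <- matrix(rnorm(size * ncol(centers)), nrow = size)
  x <- centers[k, , drop = FALSE] + noise * sqrt(spread[k, , drop = FALSE])
  colnames(x) <- c("attr1", "attr2")
  data.frame(x, class = factor(unitClass[k], levels = unitClass))
}

toy <- sampleFromKernels(500)
summary(toy)
```

The sketch ignores discrete attributes and the normalization of attributes to [0,1], both of which the real generator has to handle.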
Value
The created model is returned as a structure of class RBFgenerator, containing the following items:
noGaussians |
The number of extracted Gaussian kernels. |
centers |
A matrix of Gaussian kernels' centers, with one row for each Gaussian kernel. |
probs |
A vector of kernel probabilities. Probabilities are defined as relative frequencies of training set instances with maximal activation in the given kernel. |
unitClass |
A vector of class values, one for each kernel. |
bias |
A vector of kernels' biases, one for each kernel. The bias is multiplied by the kernel activation to produce the output value of the given RBF network unit. |
spread |
A matrix of estimated variances for the kernels, one row for each kernel. The j-th value in the i-th row is the variance of the j-th attribute over the training instances with maximal activation in the i-th Gaussian. |
gNoActivated |
A vector containing numbers of training instances with maximal activation in each kernel. |
noAttr |
The number of attributes in training data. |
datNames |
A vector of attributes' names. |
originalNames |
A vector of original attribute names. |
attrClasses |
A vector of attributes' classes (i.e., data types like |
attrLevels |
A list of levels for discrete attributes (with class |
attrOrdered |
A vector of type logical indicating whether the attribute is |
normParameters |
A list of parameters for normalization of attributes to [0,1]. |
noCol |
The number of columns in the internally generated data set. |
isDiscrete |
A vector of type logical, each value indicating whether a respective attribute is discrete. |
noAttrGen |
The number of attributes to generate. |
nominal |
The value of parameter |
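The relationship between gNoActivated, probs, and spread described above can be illustrated with a small base R sketch. The assignment of instances to kernels is a stand-in (the species index of iris is used in place of real kernel activations), so the numbers only demonstrate the definitions and do not reproduce what rbfDataGen computes.

```r
# Numeric attributes of iris serve as toy training data.
x <- as.matrix(iris[, 1:4])

# Hypothetical stand-in: index of the kernel with maximal activation for each
# instance. Real values would come from the trained RBF network.
winner <- as.integer(iris$Species)
noGaussians <- 3

# Number of training instances with maximal activation in each kernel.
gNoActivated <- tabulate(winner, nbins = noGaussians)

# Kernel probabilities as relative frequencies.
probs <- gNoActivated / sum(gNoActivated)

# One row per kernel, one column per attribute: variance of the attribute
# over the instances assigned to that kernel.
spread <- t(sapply(seq_len(noGaussians), function(i) {
  apply(x[winner == i, , drop = FALSE], 2, var)
}))

print(gNoActivated)
print(probs)
print(spread)
```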
Author(s)
Marko Robnik-Sikonja
References
Marko Robnik-Sikonja: Not enough data? Generate it! Technical Report, University of Ljubljana, Faculty of Computer and Information Science, 2014.
Other references are available from http://lkm.fri.uni-lj.si/rmarko/papers/
See Also
Examples
# load the package providing rbfDataGen and newdata
library(semiArtificial)
# use iris data set, split into training and testing, inspect the data
set.seed(12345)
train <- sample(1:nrow(iris), size = nrow(iris) * 0.5)
irisTrain <- iris[train, ]
irisTest <- iris[-train, ]
# inspect properties of the original data
plot(irisTrain, col = irisTrain$Species)
summary(irisTrain)
# create rbf generator
irisGenerator <- rbfDataGen(Species ~ ., irisTrain)
# use the generator to create new data
irisNew <- newdata(irisGenerator, size = 200)
# inspect properties of the new data
plot(irisNew, col = irisNew$Species) # plot generated data
summary(irisNew)
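Beyond plot and summary, one simple way to compare original and generated data is to look at per-class means of the numeric attributes. The helper classMeans below is hypothetical and not part of semiArtificial. So that the sketch runs without the package, the held-out half of iris stands in for generated data; in practice the data frame returned by newdata would be passed instead.

```r
# Hypothetical helper: per-class means of all numeric columns of a data frame.
classMeans <- function(df, classCol) {
  numericCols <- names(df)[sapply(df, is.numeric)]
  aggregate(df[numericCols], by = list(class = df[[classCol]]), FUN = mean)
}

set.seed(12345)
train <- sample(1:nrow(iris), size = nrow(iris) * 0.5)

origMeans <- classMeans(iris[train, ], "Species")
# Stand-in for generated data (assumption): the other half of iris.
otherMeans <- classMeans(iris[-train, ], "Species")

# Differences in per-class means; values near 0 indicate similar distributions.
meanDiff <- origMeans[, -1] - otherMeans[, -1]
print(meanDiff)
```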