artificial.data {rmcfs}    R Documentation
Creates artificial dataset
Description
Creates a data.frame with artificial data. The data set consists of objects from 3 classes, A, B and C, that contain 40, 20 and 10 objects, respectively (70 objects altogether). For each object, 6 binary nominal features (A1, A2, B1, B2, C1 and C2) are created. They form the last six columns and are 'ideally' or 'almost ideally' correlated with the 'class' feature.

If an object's 'class' equals 'A', its features A1 and A2 are set to the class value 'A'; otherwise A1 = A2 = '0'. Classes 'B' and 'C' are processed analogously, but with some random corruption. With the default settings, for 2 observations from class 'B' the value 'B' in attributes B1/B2 is replaced by '0', and for 4 observations from class 'C' the value 'C' in attributes C1/C2 is replaced by '0'. The number of corrupted values for each class is set by the corruption parameter.

The data also contain rnd_features (500 by default) additional random numerical features whose values are uniformly distributed on [0, 1].
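The construction described above can be sketched in base R. This is only an illustrative re-implementation, not the rmcfs source: the function name make_artificial, the column order, the V1, V2, ... names of the random columns, and the choice to draw the corrupted rows independently for each column of a pair are all assumptions.

```r
# Hypothetical sketch of the construction described above (not the rmcfs code).
make_artificial <- function(rnd_features = 500, size = c(40, 20, 10),
                            corruption = c(0, 2, 4), seed = NA) {
  if (!is.na(seed)) set.seed(seed)
  labels <- c("A", "B", "C")
  class <- rep(labels, times = size)
  n <- length(class)

  # Random numerical features, uniformly distributed on [0, 1].
  rnd <- as.data.frame(matrix(runif(n * rnd_features), nrow = n))

  # One informative column: the class label where it matches, "0" elsewhere,
  # with k randomly chosen matching rows reset to "0".
  # Assumption: rows are drawn independently for each column.
  informative <- function(label, k) {
    x <- ifelse(class == label, label, "0")
    idx <- which(class == label)
    x[idx[sample.int(length(idx), k)]] <- "0"
    factor(x)
  }

  # Build A1, A2, B1, B2, C1, C2.
  cols <- list()
  for (i in seq_along(labels)) {
    for (j in 1:2) {
      cols[[paste0(labels[i], j)]] <- informative(labels[i], corruption[i])
    }
  }

  # Assumed column order: random features, informative features, class.
  data.frame(rnd, as.data.frame(cols), class = factor(class))
}

d_sketch <- make_artificial(rnd_features = 5, seed = 1)
print(table(d_sketch$class, d_sketch$B1))
```

Cross-tabulating 'class' against B1 makes the corruption visible: the class 'B' rows whose B1 value is '0' are the corrupted ones.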
Usage
artificial.data(rnd_features = 500, size = c(40, 20, 10),
corruption = c(0, 2, 4), seed = NA)
Arguments
rnd_features    number of random numerical features.
size            sizes of classes A, B and C.
corruption      number of corrupted values for each pair of columns A1/A2, B1/B2 and C1/C2.
seed            seed for the random number generator.
Value
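A data.frame containing the 'class' feature, the six informative nominal features (A1, A2, B1, B2, C1 and C2) and rnd_features random numerical features.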
Examples
d <- artificial.data(rnd_features = 500)
showme(d)
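
A slightly longer usage sketch follows. It spells out every argument from the Usage section and inspects the result with base R only. The column names 'class' and 'B1' are taken from the Description; passing a numeric value as seed is assumed to make the result reproducible. The call is guarded so the snippet does nothing if rmcfs is not installed.

```r
# Usage sketch: runs only when the rmcfs package is available.
if (requireNamespace("rmcfs", quietly = TRUE)) {
  d <- rmcfs::artificial.data(rnd_features = 10, size = c(40, 20, 10),
                              corruption = c(0, 2, 4), seed = 1)
  print(dim(d))                 # number of rows and columns
  print(table(d$class))         # class sizes
  print(table(d$class, d$B1))   # class 'B' rows with B1 == '0' are corrupted
}
```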