data.gen.blobs {synthesis} | R Documentation |
Gaussian Blobs
Description
Generates Gaussian blobs: clusters of normally distributed points, each labelled with its class.
Usage
data.gen.blobs(
  nobs = 100,
  features = 2,
  centers = 3,
  sd = 1,
  bbox = c(-10, 10),
  do.plot = TRUE
)
Arguments
nobs |
Number of observations (the data length) to generate. |
features |
Number of features in the dataset. |
centers |
Either the number of centers, or a matrix of the chosen centers. |
sd |
Standard deviation of the Gaussian noise around each center. Default is 1. |
bbox |
The bounding box: the range within which each cluster center is generated when only the number of centers is given. |
do.plot |
Logical. If TRUE (the default), a plot of the generated blobs is shown. |
Details
This function generates a matrix of features for a multiclass dataset by allocating each class one or more normally distributed clusters of points. Both the centers and the standard deviation of the clusters can be controlled. For example, suppose we want to generate the weight and height (two features) of 500 people (the data length) in three groups: babies, children, and adults. The centers are the average weight and height of each group, assuming that both weight and height are normally distributed (i.e. follow a Gaussian distribution). The standard deviation (sd) is that of the Gaussian distribution, while the bounding box (bbox) is the range within which each cluster center is generated when only the number of centers is given.
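The mechanism described above can be sketched in base R. This is a minimal illustration and not the package's actual implementation: the helper name sketch_blobs, the layout of the centers matrix (one row per center, one column per feature), and the weight and height values are all assumptions made for this example.

```r
# Hypothetical helper that mimics the behaviour described in Details.
# It is NOT the implementation used by data.gen.blobs.
sketch_blobs <- function(nobs = 100, features = 2, centers = 3, sd = 1,
                         bbox = c(-10, 10)) {
  # If only a count is given, draw that many centers uniformly inside bbox.
  if (!is.matrix(centers)) {
    centers <- matrix(runif(centers * features, min = bbox[1], max = bbox[2]),
                      ncol = features)
  }
  # Assumed layout: one row per center, one column per feature.
  stopifnot(ncol(centers) == features)
  # Assign every observation to one of the centers at random.
  classes <- sample.int(nrow(centers), nobs, replace = TRUE)
  # Each point is its center plus independent Gaussian noise.
  noise <- matrix(rnorm(nobs * features, mean = 0, sd = sd), ncol = features)
  x <- centers[classes, , drop = FALSE] + noise
  list(x = x, classes = classes)
}

set.seed(1)
# Made-up group centers for illustration: weight in kg, height in cm.
group_centers <- rbind(baby = c(8, 70), child = c(30, 130), adult = c(70, 170))
people <- sketch_blobs(nobs = 500, features = 2, centers = group_centers, sd = 1)
dim(people$x)          # dimensions of the generated feature matrix
table(people$classes)  # number of people drawn from each group
```

In this sketch bbox is only consulted when centers is a single number, which mirrors the role the bounding box is given in the text.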
Value
A list with two elements: x, the generated features, and classes, the class assigned to each observation.
References
Amos Elberg (2018). clusteringdatasets: Datasets useful for testing clustering algorithms. R package version 0.1.1. https://github.com/elbamos/clusteringdatasets
Examples
Blobs <- data.gen.blobs(nobs = 1000, features = 2, centers = 3, sd = 1,
                        bbox = c(-10, 10), do.plot = TRUE)