data.gen.blobs {synthesis}    R Documentation

Gaussian Blobs

Description

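Generates Gaussian blobs: a multiclass dataset in which each class consists of one or more normally distributed clusters of points.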

Usage

data.gen.blobs(
  nobs = 100,
  features = 2,
  centers = 3,
  sd = 1,
  bbox = c(-10, 10),
  do.plot = TRUE
)

Arguments

nobs

The number of observations (data length) to generate.

features

The number of features in the dataset.

centers

Either the number of centers, or a matrix of the chosen centers.

sd

The standard deviation of the Gaussian noise around each center; default 1.

bbox

The bounding box: the range within which each cluster center is generated when only the number of centers is given.

do.plot

Logical value. If TRUE (the default), a plot of the generated blobs is shown.

Details

This function generates a matrix of features for a multiclass dataset by allocating each class one or more normally distributed clusters of points. Both the center and the standard deviation of each cluster can be controlled.

For example, suppose we want to generate a dataset of the weight and height (two features) of 500 people (the data length) in three groups: babies, children, and adults. The centers are the average weight and height of each group, assuming that both weight and height are normally distributed (i.e. follow a Gaussian distribution). The standard deviation (sd) is that of the Gaussian distribution, while the bounding box (bbox) is the range for each generated cluster center when only the number of centers is given.
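The generative process described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name sketch_blobs, the uniform draw of centers inside bbox, and the round-robin assignment of observations to clusters are all assumptions made for the example.

```r
# Hypothetical helper for illustration only; not the synthesis source code.
sketch_blobs <- function(nobs = 100, features = 2, centers = 3, sd = 1,
                         bbox = c(-10, 10)) {
  # If only a count is given, draw each center uniformly inside the
  # bounding box (assumed behaviour): one row per center, one column per feature.
  if (!is.matrix(centers)) {
    centers <- matrix(runif(centers * features, min = bbox[1], max = bbox[2]),
                      ncol = features)
  }
  k <- nrow(centers)
  # Assign observations to clusters in round-robin order (an assumption).
  classes <- rep_len(seq_len(k), nobs)
  # Gaussian noise with standard deviation sd around each observation's center.
  noise <- matrix(rnorm(nobs * ncol(centers), mean = 0, sd = sd), nrow = nobs)
  x <- centers[classes, , drop = FALSE] + noise
  list(x = x, classes = classes)
}

set.seed(1)
blobs <- sketch_blobs(nobs = 500, features = 2, centers = 3, sd = 1)
dim(blobs$x)          # 500 rows (observations), 2 columns (features)
table(blobs$classes)  # number of observations per cluster
```

Increasing sd in this sketch spreads each cluster more widely around its center, while widening bbox tends to move randomly drawn centers further apart.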

Value

A list with two elements: x, the generated matrix of features, and classes, the class of each observation.

References

Amos Elberg (2018). clusteringdatasets: Datasets useful for testing clustering algorithms. R package version 0.1.1. https://github.com/elbamos/clusteringdatasets

Examples

# Generate 1000 observations with 2 features around 3 random centers
Blobs <- data.gen.blobs(nobs = 1000, features = 2, centers = 3, sd = 1,
                        bbox = c(-10, 10), do.plot = TRUE)
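The weight-and-height scenario from the Details section might be set up by passing a matrix of chosen centers. This is a hedged sketch: the numeric values are illustrative placeholders rather than measured averages, and the layout of the matrix (one row per center, one column per feature) is an assumption that is not stated on this page.

```r
# Illustrative (not measured) group averages.
# Assumed layout: one row per group, columns are weight (kg) and height (cm).
centers <- matrix(c( 7,  70,   # babies
                    30, 130,   # children
                    70, 170),  # adults
                  ncol = 2, byrow = TRUE)

# Only call the package if it is installed.
if (requireNamespace("synthesis", quietly = TRUE)) {
  people <- synthesis::data.gen.blobs(nobs = 500, features = 2,
                                      centers = centers, sd = 1,
                                      do.plot = FALSE)
  str(people)  # inspect the returned list elements x and classes
}
```

Because the centers are supplied explicitly here, bbox is left at its default; according to the Details section it applies only when the number of centers is given.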

[Package synthesis version 1.2.5 Index]