gendata {ider}                                                R Documentation

Data generator for intrinsic dimension estimation.

Description

gendata generates various artificial datasets for intrinsic dimension estimation experiments.

Usage

gendata(
  DataName = "SwissRoll",
  n = 300,
  p = NULL,
  noise = NULL,
  ol = NULL,
  curv = 1,
  seed = 123,
  sorted = FALSE
)

Arguments

DataName

Name of the dataset, one of the following:

  • SwissRoll: Swiss roll, a 2D manifold in 3D space.

  • NDSwissRoll: non-deformable Swiss roll, a 2D manifold in 3D space.

  • Moebius: Moebius strip, a 2D manifold in 3D space.

  • SphericalShell: spherical shell, a (p-1)-dimensional manifold in p-dimensional space.

  • Sinusoidal: sinusoidal curve, a 1D manifold in 3D space.

  • Spiral: spiral-shaped 1D manifold in 2D space.

  • Cylinder: cylinder-shaped 2D manifold in 3D space.

  • SShape: S-shaped 2D manifold in 3D space.

  • ldbl: LDBL (line - disc - filled ball - line), a union of pieces with intrinsic dimensions 1, 2, 3 and 1, embedded in 3D space (a dataset original to this package).
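To make "2D manifold in 3D space" concrete, the sketch below uses the textbook Swiss roll parametrisation: two latent coordinates (an angle and a height) are rolled up into three ambient coordinates. The function swiss_roll_sketch is hypothetical and is not part of ider; the angle range 1.5*pi to 4.5*pi and height range 0 to 21 are common choices in the manifold learning literature, assumed here rather than taken from gendata's source.

```r
# Hypothetical helper, NOT part of ider: textbook Swiss roll, i.e. a 2D sheet
# with latent coordinates (theta, h) rolled up in 3D space.
swiss_roll_sketch <- function(n = 300, seed = 123) {
  set.seed(seed)
  theta <- runif(n, min = 1.5 * pi, max = 4.5 * pi)  # angle along the roll
  h <- runif(n, min = 0, max = 21)                   # position along the roll axis
  # n x 3 data matrix: two intrinsic degrees of freedom, three ambient coordinates
  cbind(x = theta * cos(theta), y = h, z = theta * sin(theta))
}

sr <- swiss_roll_sketch(n = 300)
dim(sr)  # 300 3
```

Although each row has three coordinates, every point is determined by only two numbers (theta and h), which is why its intrinsic dimension is 2.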

n

number of data points to be generated.

p

ambient dimension of the dataset.
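For a dataset such as SphericalShell, p sets the ambient dimension while the intrinsic dimension is p - 1. The sketch below shows one standard way to sample such data: normalise isotropic Gaussian vectors onto the unit sphere. The function sphere_sketch is hypothetical and not part of ider; whether gendata uses this exact sampling scheme, radius, or shell thickness is an assumption not confirmed here.

```r
# Hypothetical helper, NOT part of ider: n points uniformly distributed on the
# unit sphere S^(p-1), a (p-1)-dimensional manifold in p-dimensional space.
sphere_sketch <- function(n = 300, p = 3, seed = 123) {
  set.seed(seed)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)  # isotropic Gaussian draws
  # Divide each row by its Euclidean norm; the length-n vector recycles
  # down the columns, so element [i, j] is divided by the norm of row i.
  z / sqrt(rowSums(z^2))
}

sph <- sphere_sketch(n = 500, p = 5)
range(sqrt(rowSums(sph^2)))  # both values equal 1 up to floating-point rounding
```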

noise

parameter controlling the noise level of the dataset. For most datasets it is used as the standard deviation (sd) of the Gaussian noise generated with rnorm inside the function.
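Since noise is described as the sd passed to rnorm, its effect can be illustrated by perturbing a clean data matrix coordinate-wise with i.i.d. N(0, noise^2) values. The function add_noise_sketch is hypothetical and not part of ider; treating noise = NULL as "no noise" and perturbing every coordinate are assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of ider: add i.i.d. N(0, noise^2) noise,
# generated with rnorm(), to every coordinate of a clean data matrix.
add_noise_sketch <- function(x, noise = NULL) {
  if (is.null(noise)) return(x)  # assumption: NULL leaves the data noise-free
  x + matrix(rnorm(length(x), mean = 0, sd = noise), nrow = nrow(x))
}

set.seed(1)
clean <- matrix(0, nrow = 1000, ncol = 3)
noisy <- add_noise_sketch(clean, noise = 0.1)
sd(as.vector(noisy))  # close to 0.1
```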

ol

proportion of outliers, i.e., n * ol outliers are added to the generated dataset (for example, ol = 0.05 adds 5% of n).
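The sketch below illustrates the n * ol rule by appending round(n * ol) extra rows to a data matrix. The function add_outliers_sketch is hypothetical and not part of ider, and drawing the outliers uniformly from the bounding box of the data is an assumed choice; gendata may generate outliers differently.

```r
# Hypothetical helper, NOT part of ider: append round(n * ol) outliers drawn
# uniformly from the bounding box of the data (an assumed outlier model).
add_outliers_sketch <- function(x, ol = NULL) {
  if (is.null(ol)) return(x)                 # NULL: no outliers
  m <- round(nrow(x) * ol)                   # number of outliers, n * ol
  if (m == 0) return(x)
  lo <- apply(x, 2, min)                     # per-column minimum
  hi <- apply(x, 2, max)                     # per-column maximum
  u <- matrix(runif(m * ncol(x)), nrow = m)  # U(0, 1) draws, m x ncol(x)
  # Rescale each column j to [lo[j], hi[j]]; rep(..., each = m) matches
  # R's column-major storage so column j uses lo[j] and hi[j].
  out <- u * rep(hi - lo, each = m) + rep(lo, each = m)
  rbind(x, out)
}

set.seed(2)
base <- matrix(runif(300 * 3), ncol = 3)
with_ol <- add_outliers_sketch(base, ol = 0.05)
nrow(with_ol)  # 315
```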

curv

parameter controlling the complexity (curvature) of the embedded manifold; it is only used by some datasets.

seed

random number seed.

sorted

logical. If TRUE, the rows of the generated dataset are sorted by their x-axis (first) coordinate, for ease of visualization.
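Sorting rows by the first coordinate is a one-line reordering in base R. The function sort_by_x_sketch is hypothetical and not part of ider; it only illustrates what "sorted with respect to the x-axis" means for a data matrix.

```r
# Hypothetical helper, NOT part of ider: reorder rows by the first (x-axis)
# coordinate; drop = FALSE keeps a matrix even for a single row.
sort_by_x_sketch <- function(x) x[order(x[, 1]), , drop = FALSE]

set.seed(3)
unsorted <- matrix(rnorm(20), ncol = 2)
xs <- sort_by_x_sketch(unsorted)
is.unsorted(xs[, 1])  # FALSE
```

With rows in x order, a line plot such as plot(xs, type = "l") connects neighbouring points along the x-axis instead of jumping back and forth.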

Details

This function generates various artificial datasets commonly used in manifold learning and intrinsic dimension estimation research. For some datasets, the complexity of the shape is controlled by the parameter curv. The parameters noise and ol add noise and outliers, respectively, to the dataset.

Value

A data matrix with one row per point. For the ldbl dataset, a list with components x (the data matrix) and tDim (the true intrinsic dimension of each point).

Author(s)

Hideitsu Hino hideitsu.hino@gmail.com

Examples

library(ider)

## global intrinsic dimension estimate
x <- gendata(DataName = "SwissRoll")
estmle <- lbmle(x = x, k1 = 3, k2 = 5)
print(estmle)

## local intrinsic dimension estimate
tmp <- gendata(DataName = "ldbl", n = 1000)
x <- tmp$x
estmada <- mada(x = x, local = TRUE)
head(estmada)   ## estimated local intrinsic dimensions
head(tmp$tDim)  ## true local intrinsic dimensions

[Package ider version 0.1.1 Index]