dimensionality {ECoL}    R Documentation

Measures of dimensionality

Description

These measures give an indication of data sparsity. They capture how sparse a dataset is; sparse datasets tend to have regions of low density. Such regions are known to make it more difficult to extract good classification and regression models.

Usage

dimensionality(...)

## Default S3 method:
dimensionality(x, y, measures = "all", ...)

## S3 method for class 'formula'
dimensionality(formula, data, measures = "all", ...)

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A response vector with one value for each row/component of x.

measures

A list of measure names, or "all" to include all of them.

formula

A formula to define the output column.

data

A data.frame containing the input and output attributes.

Details

The following measures are allowed for this method:

"T2"

Average number of points per dimension (T2) is given by the ratio between the number of examples and the dimensionality (number of attributes) of the dataset.

"T3"

Average number of points per PCA (T3) is similar to T2, but uses the number of PCA components needed to represent 95% of the data variability as the basis for the data sparsity assessment.

"T4"

Ratio of the PCA Dimension to the Original (T4) estimates the proportion of relevant dimensions relative to the original dimensions of a dataset.
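
The three measures described above can be sketched in base R as follows. This is a minimal illustration based only on the descriptions in this section, not the package's actual code: the helper name dimensionality_sketch and its var_threshold parameter are hypothetical, the input is assumed to be all numeric, and the PCA is run without scaling. ECoL itself may preprocess the data differently or orient the ratios differently, so its values need not match.

```r
# Hypothetical helper (not part of ECoL): computes T2, T3 and T4
# as described in the Details section, for numeric input only.
dimensionality_sketch <- function(x, var_threshold = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)  # number of examples
  m <- ncol(x)  # original dimensionality

  # Number of principal components needed to reach the variance threshold
  pca <- stats::prcomp(x)
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  m_pca <- which(cum_var >= var_threshold)[1]

  list(
    T2 = n / m,      # examples per original dimension
    T3 = n / m_pca,  # examples per retained PCA component
    T4 = m_pca / m   # retained PCA components over original dimensions
  )
}

data(iris)
res <- dimensionality_sketch(iris[, 1:4])
print(res)
```

Under these assumptions, T3 is never smaller than T2, because the number of retained PCA components cannot exceed the original number of attributes, and T4 always lies in (0, 1].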

Value

A list named by the requested dimensionality measures.
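
As an illustration of how the returned object might be used, the sketch below calls the default S3 method from the Usage section with a subset of measures and looks up one result by name. It assumes the ECoL package is installed (hence the requireNamespace guard) and that measures accepts a character vector of names; element access uses [[ ]], which works for any named result.

```r
# Runs only if ECoL is available; otherwise the block is skipped.
if (requireNamespace("ECoL", quietly = TRUE)) {
  data(iris)

  # Default method: input attributes and response passed separately
  res <- ECoL::dimensionality(iris[, 1:4], iris$Species,
                              measures = c("T2", "T4"))

  print(names(res))   # names of the requested measures
  print(res[["T2"]])  # look up a single measure by name
}
```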

References

Ana C Lorena, Ivan G Costa, Newton Spolaor and Marcilio C P Souto. (2012). Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing 75, 1, 33–42.

See Also

Other complexity-measures: balance, correlation, linearity, neighborhood, network, overlapping, smoothness

Examples

## Extract all dimensionality measures for a classification task
data(iris)
dimensionality(Species ~ ., iris)

## Extract all dimensionality measures for a regression task
data(cars)
dimensionality(speed ~ ., cars)

[Package ECoL version 0.3.0 Index]