dimensionality {ECoL} | R Documentation |
Measures of dimensionality
Description
These measures give an indication of data sparsity. They capture how sparse a dataset is. Sparse datasets tend to have regions of low density, and such regions are known to make it more difficult to extract good classification and regression models.
Usage
dimensionality(...)
## Default S3 method:
dimensionality(x, y, measures = "all", ...)
## S3 method for class 'formula'
dimensionality(formula, data, measures = "all", ...)
Arguments
... |
Not used. |
x |
A data.frame containing only the input attributes. |
y |
A response vector with one value for each row/component of x. |
measures |
A list of measure names or "all" to include all of them. |
formula |
A formula to define the output column. |
data |
A data.frame dataset containing the input and output attributes. |
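As a hedged illustration of how the arguments fit together, the sketch below calls both the default and the formula method on the same data and restricts the computation to two measures. It assumes the ECoL package is installed and that `measures` accepts a character vector of names; the object names `res_default` and `res_formula` are made up for this example.

```r
data(iris)

# Run only if ECoL is available, so the sketch does not fail without it.
if (requireNamespace("ECoL", quietly = TRUE)) {
  # Default method: input attributes and response passed separately.
  res_default <- ECoL::dimensionality(iris[, 1:4], iris$Species,
                                      measures = c("T2", "T4"))

  # Formula method: the formula names the output column.
  res_formula <- ECoL::dimensionality(Species ~ ., iris,
                                      measures = c("T2", "T4"))

  # Results are named by measure; [[ ]] works for lists and named vectors.
  print(names(res_default))
  print(res_formula[["T2"]])
}
```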
Details
The following measures are allowed for this method:
- "T2"
Average number of points per dimension (T2) is given by the ratio between the number of examples and the dimensionality of the dataset.
- "T3"
Average number of points per PCA (T3) is similar to T2, but uses the number of PCA components needed to represent 95% of the data variability as the base of the data sparsity assessment.
- "T4"
Ratio of the PCA Dimension to the Original (T4) estimates the proportion of relevant dimensions relative to the original dimensions of a dataset.
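The three definitions above can be sketched by hand in base R. This is an illustration of the textual descriptions only, not ECoL's source: it assumes numeric input attributes, PCA via `stats::prcomp` on centered and unscaled data, and a cumulative variance threshold of at least 0.95. The package may preprocess or normalize the data differently, so its output need not match these values.

```r
# Illustrative only: follows the textual definitions, not ECoL's code.
x <- iris[, 1:4]   # numeric input attributes only
n <- nrow(x)       # number of examples
d <- ncol(x)       # original dimensionality

# PCA on centered, unscaled data (an assumption of this sketch).
pca <- stats::prcomp(x)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Number of components needed to reach 95% of the variability.
d_pca <- which(cum_var >= 0.95)[1]

sketch <- list(
  T2 = n / d,      # examples per original dimension
  T3 = n / d_pca,  # examples per retained PCA component
  T4 = d_pca / d   # retained PCA components per original dimension
)
sketch
```

Under these definitions T2 equals T3 multiplied by T4, which gives a quick consistency check on any hand computation.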
Value
A list named by the requested dimensionality measure.
References
Ana C Lorena, Ivan G Costa, Newton Spolaor and Marcilio C P Souto. (2012). Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing 75, 1, 33–42.
See Also
Other complexity-measures: balance, correlation, linearity, neighborhood, network, overlapping, smoothness
Examples
## Extract all dimensionality measures for a classification task
data(iris)
dimensionality(Species ~ ., iris)
## Extract all dimensionality measures for a regression task
data(cars)
dimensionality(speed ~ ., cars)