correlation {ECoL}    R Documentation
Measures of feature correlation
Description
Regression task. These measures quantify the correlation between the values of each feature and the output. If at least one feature is highly correlated with the output, this indicates that simpler functions can be fitted to the data.
Usage
correlation(...)
## Default S3 method:
correlation(x, y, measures = "all",
summary = c("mean", "sd"), ...)
## S3 method for class 'formula'
correlation(formula, data, measures = "all",
summary = c("mean", "sd"), ...)
Arguments
...
Not used.
x
A data.frame containing only the input attributes.
y
A response vector with one value for each row/component of x.
measures
A list of measure names, or "all" to include all of them.
summary
A list of summarization functions, or empty for all values. See the summarization method for more information. (Default: c("mean", "sd"))
formula
A formula to define the output column.
data
A data.frame dataset containing the input and output attributes.
Details
The following measures are allowed for this method:
- "C1"
Maximum feature correlation to the output (C1) calculates the maximum absolute value of the Spearman correlation between each feature and the output.
- "C2"
Average feature correlation to the output (C2) computes the average of the absolute Spearman correlations of all features to the output.
- "C3"
Individual feature efficiency (C3) calculates, for each feature, the number of examples that must be removed from the dataset until a high Spearman correlation with the output is achieved.
- "C4"
Collective feature efficiency (C4) computes the ratio of examples removed from the dataset based on an iterative process of linear fitting between the features and the target attribute.
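For the two simplest measures, a minimal base-R sketch may help make the definitions above concrete. It is not ECoL's implementation: the helper names feature_output_rho(), c1_sketch() and c2_sketch() are hypothetical, the code assumes numeric features without missing values, it skips any preprocessing ECoL may apply, and it assumes that C2 averages absolute correlations. Since Spearman correlation depends only on ranks, any monotone rescaling of the features would not change these values.

```r
# Hypothetical helpers illustrating C1 and C2; not ECoL's internal code.
# x: data.frame of numeric input features; y: numeric response vector.
feature_output_rho <- function(x, y) {
  # Spearman correlation of each feature column with the output
  vapply(x, function(col) cor(col, y, method = "spearman"), numeric(1))
}

c1_sketch <- function(x, y) {
  # C1: maximum absolute Spearman correlation over all features
  max(abs(feature_output_rho(x, y)))
}

c2_sketch <- function(x, y) {
  # C2: mean of the absolute Spearman correlations (absolute values assumed)
  mean(abs(feature_output_rho(x, y)))
}

# Toy data: feature a is monotone in y, feature b is mostly decreasing in y
x <- data.frame(a = c(1, 2, 3, 4, 5), b = c(5, 3, 4, 1, 2))
y <- c(10, 20, 30, 40, 50)
c1_sketch(x, y)  # [1] 1
c2_sketch(x, y)  # [1] 0.9
```

On the toy data, feature a has a Spearman correlation of 1 with y and feature b has -0.8, so C1 is 1 and C2 is the mean of 1 and 0.8.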
Value
A list named by the requested correlation measures.
References
Ana C. Lorena, Aron I. Maciel, Pericles B. C. Miranda, Ivan G. Costa, and Ricardo B. C. Prudencio (2018). Data complexity meta-features for regression problems. Machine Learning, 107(1), 209–246.
See Also
Other complexity-measures: balance, dimensionality, linearity, neighborhood, network, overlapping, smoothness
Examples
## Extract all correlation measures for a regression task
library(ECoL)
data(cars)
correlation(speed ~ ., cars)
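The formula call can also be written with the default method by passing the input attributes and the response separately. The sketch below assumes the ECoL package is installed and restricts the output to two of the measure names listed under Details; per the Value section, the result should be a list named by those measures.

```r
## Same data via the default method, requesting only C1 and C2
library(ECoL)
data(cars)
x <- cars[, "dist", drop = FALSE]  # input attributes as a data.frame
y <- cars$speed                    # response vector, one value per row of x
res <- correlation(x, y, measures = c("C1", "C2"))
names(res)
```

Keeping drop = FALSE matters here: x must remain a data.frame even though it holds a single column.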