neighborhood {ECoL}    R Documentation

Measures of neighborhood

Description

Classification task. The Neighborhood measures analyze the neighborhoods of the data items and try to capture class overlapping and the shape of the decision boundary. They work over a distance matrix storing the distances between all pairs of data points in the dataset.

Usage

neighborhood(...)

## Default S3 method:
neighborhood(x, y, measures = "all",
  summary = c("mean", "sd"), ...)

## S3 method for class 'formula'
neighborhood(formula, data, measures = "all",
  summary = c("mean", "sd"), ...)

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

measures

A list of measure names, or "all" to include all of them.

summary

A list of summarization functions, or empty to return all values. See the summarization method for more information. (Default: c("mean", "sd"))

formula

A formula to define the class column.

data

A data.frame dataset containing the input attributes and the class.

Details

The following measures are allowed for this method:

"N1"

Fraction of borderline points (N1) computes the percentage of vertices incident to edges connecting examples of opposite classes in a Minimum Spanning Tree (MST).

"N2"

Ratio of intra/extra class nearest neighbor distance (N2) computes the ratio of two sums: intra-class and inter-class. The former is the sum of the distances between each example and its closest neighbor from the same class. The latter is the sum of the distances between each example and its closest neighbor from another class (nearest enemy).

"N3"

Error rate of the nearest neighbor classifier (N3) corresponds to the error rate of a one Nearest Neighbor (1NN) classifier, estimated using a leave-one-out procedure on the dataset.

"N4"

Non-linearity of the nearest neighbor classifier (N4) creates a new dataset by randomly interpolating pairs of training examples of the same class, then induces a 1NN classifier on the original data and measures its error rate on the new data points.

"T1"

Fraction of hyperspheres covering data (T1) builds a hypersphere centered at each training example and grows its radius until the hypersphere reaches an example of another class. Afterwards, smaller hyperspheres contained in larger hyperspheres are eliminated. T1 is finally defined as the ratio between the number of remaining hyperspheres and the total number of examples in the dataset.

"LSC"

Local Set Average Cardinality (LSC) is based on the Local Set (LS) of an example, defined as the set of points from the dataset whose distance to that example is smaller than the distance between the example and its nearest example of a different class (nearest enemy). LSC is the average cardinality of the LS.
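
The definitions above can be illustrated with a minimal base R sketch that works directly on a distance matrix. It is not the package's implementation: it assumes plain Euclidean distances on the numeric iris attributes, reports raw (unnormalized) values, and uses illustrative variable names (d0, intra, inter, ls_size and so on), so the numbers are not expected to match the output of neighborhood(). N1 also depends on how ties are broken when the MST is not unique.

```r
# Illustrative sketch only; assumes Euclidean distance, no normalization.
data(iris)
x <- iris[, 1:4]
y <- iris$Species
n <- nrow(x)

d0 <- as.matrix(dist(x))  # pairwise distances, zero diagonal
d <- d0
diag(d) <- Inf            # an example is never its own neighbor

# Distance from each example to its nearest same-class neighbor
# and to its nearest enemy (closest example of another class)
intra <- sapply(seq_len(n), function(i) min(d[i, y == y[i]]))
inter <- sapply(seq_len(n), function(i) min(d[i, y != y[i]]))

# N2 (raw form): ratio of the intra-class sum to the inter-class sum
n2_ratio <- sum(intra) / sum(inter)

# N3: leave-one-out error rate of the 1NN classifier
nn <- apply(d, 1, which.min)
n3 <- mean(y[nn] != y)

# Local set size: points closer to an example than its nearest enemy
# (the example itself is counted, since its distance is zero)
ls_size <- sapply(seq_len(n), function(i) sum(d0[i, ] < inter[i]))
ls_avg <- mean(ls_size)

# N1: build an MST with Prim's algorithm and flag every vertex that
# touches an edge joining two different classes
in_tree <- c(TRUE, rep(FALSE, n - 1))
best <- d0[1, ]           # cheapest known link from the tree to each vertex
parent <- rep(1L, n)      # tree vertex that provides that link
borderline <- rep(FALSE, n)
for (k in seq_len(n - 1)) {
  cand <- which(!in_tree)
  j <- cand[which.min(best[cand])]
  in_tree[j] <- TRUE
  if (y[j] != y[parent[j]]) borderline[c(j, parent[j])] <- TRUE
  closer <- !in_tree & d0[j, ] < best
  best[closer] <- d0[j, closer]
  parent[closer] <- j
}
n1 <- mean(borderline)
```

In this sketch, larger values of n1, n2_ratio and n3 indicate more class overlap near the decision boundary, while a larger ls_avg indicates that examples tend to sit further from their nearest enemy.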

Value

A list named by the requested neighborhood measure.

References

Albert Orriols-Puig, Nuria Macia and Tin K Ho. (2010). Documentation for the data complexity library in C++. Technical Report. La Salle - Universitat Ramon Llull.

Enrique Leyva, Antonio Gonzalez and Raul Perez. (2014). A Set of Complexity Measures Designed for Applying Meta-Learning to Instance Selection. IEEE Transactions on Knowledge and Data Engineering 27, 2, 354–367.

See Also

Other complexity-measures: balance, correlation, dimensionality, linearity, network, overlapping, smoothness

Examples

## Extract all neighborhood measures for classification task
data(iris)
neighborhood(Species ~ ., iris)
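
The measures and summary arguments listed under Usage can also be set explicitly. The calls below are a sketch based only on the signatures documented above; the variable name res is illustrative, and the exact layout of the returned values may differ between package versions.

```r
library(ECoL)
data(iris)

## Formula interface, restricted to two measures
res <- neighborhood(Species ~ ., iris, measures = c("N1", "N3"))
names(res)

## Default (x, y) interface with a single summarization function
neighborhood(iris[, 1:4], iris$Species, measures = "N2", summary = "mean")
```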

[Package ECoL version 0.3.0 Index]