gknn {cba}

R Documentation

Generalized k-Nearest Neighbor Classification

Description

Computes the k-nearest neighbor classification from a matrix of cross-distances and a factor of class values. For each row the majority class is determined, with ties broken at random (default). If several candidates tie for the kth nearest neighbor, all of them are included in the vote (default).

Usage

gknn(x, y, k = 1, l = 0, break.ties = TRUE, use.all = TRUE,
     prob = FALSE)

Arguments

`x`	a cross-distance matrix (rows: test samples, columns: training samples).
`y`	a factor of class values for the columns of `x`.
`k`	number of nearest neighbors to consider.
`l`	minimum number of votes required for a definite decision.
`break.ties`	logical; whether to break ties in the vote (at random).
`use.all`	logical; whether to include all neighbors that are tied with the kth neighbor.
`prob`	logical; whether to return the proportions of winning votes.

Details

The rows of the cross-distances matrix are interpreted as referencing the test samples and the columns as referencing the training samples.
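The orientation described above can be illustrated with a small base R sketch. The toy data, the Euclidean metric, and the helper cross_dist are assumptions made for illustration only; they are not part of cba.

```r
# Hypothetical toy data: 4 training points and 2 test points in the plane.
train <- rbind(c(0, 0), c(0, 1), c(5, 5), c(6, 5))
test  <- rbind(c(0.2, 0.1), c(5.4, 5.2))
y <- factor(c("a", "a", "b", "b"))   # one class label per training sample

# Euclidean cross-distances: d[i, j] = distance from test i to training j.
# cross_dist is an illustrative helper, not a cba function.
cross_dist <- function(test, train) {
  d <- matrix(0, nrow(test), nrow(train))
  for (i in seq_len(nrow(test)))
    d[i, ] <- sqrt(colSums((t(train) - test[i, ])^2))
  d
}

d <- cross_dist(test, train)
dim(d)                            # rows = test samples, columns = training samples
stopifnot(ncol(d) == length(y))   # y labels the columns of d
```

A matrix built this way has the shape that the x argument expects, with y supplying one class value per column.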

The options are modeled on knn in package class, but extended with control over how ties among votes are broken, e.g. for the case where only definite (majority) votes are of interest.
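To make the interplay of the options concrete, here is a minimal base R sketch of the vote for a single row of x. It is one possible reading of the documented behavior, not the cba implementation: the helper vote_row is hypothetical, and the treatment of l (the winning class needs at least l votes) and of rows with fewer than k usable distances (result NA) are assumptions.

```r
# Illustrative only: one reading of the voting rule, not the cba source code.
vote_row <- function(d, y, k = 1, l = 0, break.ties = TRUE, use.all = TRUE) {
  ok <- !is.na(d)                            # missing distances are ignored
  d <- d[ok]; y <- y[ok]
  if (length(d) < k) return(NA_character_)   # assumption: too few neighbors
  o <- order(d)
  kth <- d[o[k]]
  # With use.all, every neighbor tied with the kth one joins the vote.
  nn <- if (use.all) which(d <= kth) else o[seq_len(k)]
  votes <- table(y[nn])
  best <- names(votes)[votes == max(votes)]
  if (max(votes) < l) return(NA_character_)  # doubt: too few winning votes
  if (length(best) > 1) {
    if (!break.ties) return(NA_character_)   # unresolved tie
    best <- sample(best, 1)                  # tie broken at random
  }
  best
}

y <- factor(c("a", "a", "b", "b"))
vote_row(c(0.1, 0.4, 2.0, 2.5), y, k = 3)               # "a"
vote_row(c(1, 1, 1, 1), y, k = 1, break.ties = FALSE)   # NA
```

In the second call all four distances tie for the nearest neighbor, so all four vote, the classes draw 2 to 2, and without tie breaking the result is NA.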

Missing class values are not allowed because they would be indistinguishable from a missing classification result.

Missing distance values are ignored, possibly resulting in missing classification results. Whether this happens depends on the option settings.

Value

Returns a factor of class values (one per row of x), which may be NA in case of doubt (no definite decision), ties, or missing neighborhood information.

If prob = TRUE, the proportions of winning votes are returned as attribute prob.
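A short usage sketch for retrieving the attribute follows. It uses only the arguments listed under Usage; the distance matrix d and the labels y are made up, and the call is skipped if cba is not installed.

```r
# Made-up cross-distances: 2 test samples (rows) x 4 training samples (columns).
d <- rbind(c(0.1, 0.4, 2.0, 2.5),
           c(2.2, 1.9, 0.3, 0.2))
y <- factor(c("a", "a", "b", "b"))

res <- NULL
if (requireNamespace("cba", quietly = TRUE)) {
  res <- cba::gknn(d, y, k = 3, prob = TRUE)
  print(res)                 # predicted class for each row of d
  print(attr(res, "prob"))   # proportions of winning votes
}
```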

Author(s)

Christian Buchta

Examples

## Not run: 
### extend Rock example
data(Votes)
x <- as.dummy(Votes[-17])
rc <- rockAll(x, n=2, m=100, theta=0.73, predict=FALSE, debug=TRUE)
gc <- gknn(dist(x, rc$y, method="binary"), rc$cl, k=3)
table(gc[rc$s], rc$cl)

## End(Not run)

[Package cba version 0.2-24 Index]