R: k-Nearest Neighbour Classification

kNN {liver}

R Documentation

k-Nearest Neighbour Classification

Description

`kNN` performs k-nearest neighbour classification of a test set using a training set. For each row of the test set, the k nearest training set vectors (based on Euclidean distance) are found, and the class is then decided by majority vote, with ties broken at random. This function provides a formula interface to the `knn` function of the R package class. In addition, it allows the given data to be normalized via the `transform` argument.
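
The procedure described above can be sketched in base R as follows. This is a minimal illustration of the technique, not the code behind `kNN` or `class::knn`. The helper name `knn_sketch` and the toy data are hypothetical. Distance ties at the `k`th neighbour are not handled (`order()` simply keeps the earlier rows; compare the `use.all` argument), while vote ties are broken at random as described.

```r
# Hypothetical helper: k-nearest neighbour classification with Euclidean
# distance and majority vote (vote ties broken at random).
knn_sketch = function( train, test, cl, k = 1 )
{
    train = as.matrix( train )
    test  = as.matrix( test )
    cl    = as.factor( cl )

    pred = character( nrow( test ) )
    for( i in seq_len( nrow( test ) ) )
    {
        # Euclidean distance from the i-th test row to every training row
        d = sqrt( colSums( ( t( train ) - test[ i, ] ) ^ 2 ) )

        # Labels of the k closest training rows (distance ties not handled)
        nn = cl[ order( d )[ seq_len( k ) ] ]

        # Majority vote; if several classes tie, pick one of them at random
        votes   = table( nn )
        winners = names( votes )[ votes == max( votes ) ]
        pred[ i ] = if( length( winners ) > 1 ) sample( winners, 1 ) else winners
    }
    factor( pred, levels = levels( cl ) )
}

# Toy data (hypothetical): two well-separated groups
train = data.frame( x = c( 0, 0, 1, 5, 5, 6 ), y = c( 0, 1, 0, 5, 6, 5 ) )
cl    = c( "a", "a", "a", "b", "b", "b" )
test  = data.frame( x = c( 0.5, 5.5 ), y = c( 0.5, 5.5 ) )

knn_sketch( train, test, cl, k = 3 )   # predicts "a" for the first row, "b" for the second
```

Working with a matrix of training cases keeps the distance computation vectorised per test row. For large data sets a dedicated implementation such as `class::knn` is preferable.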

Usage

kNN( formula, train, test, k = 1, transform = FALSE, type = "class", l = 0, 
     use.all = TRUE, na.rm = FALSE )

Arguments

`formula`	a formula, with a response but no interaction terms. When a data frame is supplied, it is taken as the model frame (see `model.frame`).
`train`	data frame or matrix of training set cases.
`test`	data frame or matrix of test set cases.
`k`	number of neighbours considered.
`transform`	either `FALSE` (default) or a character string, `"minmax"` or `"zscore"`. Option `FALSE` means no transformation; `"minmax"` applies min-max normalization and `"zscore"` applies z-score standardization. This option allows users to run the kNN algorithm on normalized versions of the training and test sets.
`type`	either `"class"` (default) for the predicted class or `"prob"` for model confidence values.
`l`	minimum vote for a definite decision, otherwise `doubt`. (More precisely, fewer than `k-l` dissenting votes are allowed, even if `k` is increased by ties.)
`use.all`	controls the handling of distance ties. If `TRUE`, all training cases whose distance equals that of the `k`th nearest neighbour are included. If `FALSE`, a random selection of the cases tied at the `k`th distance is chosen so that exactly `k` neighbours are used.
`na.rm`	a logical value indicating whether `NA` values in the data should be stripped before the computation proceeds.
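
The two normalizations offered by `transform` can be sketched in base R. The sketch assumes the conventional definitions: min-max maps each training column onto [0, 1], and z-score subtracts the column mean and divides by the standard deviation (here R's `sd()`, which uses the n - 1 denominator). As a further assumption, not documented behaviour of `kNN`, the scaling parameters are computed on the training set and reused for the test set. The helper names `rescale_cols`, `minmax_sketch` and `zscore_sketch` are hypothetical.

```r
# Hypothetical helper: subtract 'center' from each column, then divide by 'spread'
# (a constant column would give a zero divisor)
rescale_cols = function( m, center, spread )
{
    m = sweep( as.matrix( m ), 2, center, "-" )
    sweep( m, 2, spread, "/" )
}

# Min-max: map each training column onto [0, 1]; reuse its min and max for the test set
minmax_sketch = function( train, test )
{
    lo = apply( train, 2, min )
    hi = apply( train, 2, max )
    list( train = rescale_cols( train, lo, hi - lo ),
          test  = rescale_cols( test,  lo, hi - lo ) )
}

# Z-score: zero mean and unit standard deviation on the training set
zscore_sketch = function( train, test )
{
    mu = colMeans( train )
    s  = apply( train, 2, sd )
    list( train = rescale_cols( train, mu, s ),
          test  = rescale_cols( test,  mu, s ) )
}

train = data.frame( age = c( 20, 30, 40 ), income = c( 1000, 3000, 5000 ) )
test  = data.frame( age = 35, income = 2000 )

minmax_sketch( train, test )$test   # age = 0.75, income = 0.25
zscore_sketch( train, test )$test   # age = 0.5,  income = -0.5
```

Without such scaling, a variable measured on a large scale (here `income`) would dominate the Euclidean distance over one on a small scale (here `age`).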

Value

When `type = "class"` (default), a factor of predicted classes is returned, in which doubtful predictions (see `l`) are returned as `NA`. When `type = "prob"`, a matrix of confidence values is returned, with one column per class.
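
To illustrate how these return values relate to the neighbours' votes, the sketch below turns the class labels of the `k` nearest neighbours of one test case into a predicted class (or `NA` when the winning vote is below `l`) and into per-class vote proportions. It is a simplified, hypothetical illustration: `vote_sketch` is not part of liver or class, it assumes the confidence values are vote proportions, and it does not reproduce the exact tie rules of `class::knn`.

```r
# Hypothetical helper: summarise the labels of the k nearest neighbours
vote_sketch = function( nn_labels, classes, l = 0 )
{
    votes = table( factor( nn_labels, levels = classes ) )
    prob  = as.vector( votes ) / sum( votes )   # vote share per class
    names( prob ) = classes

    # Classes with the highest vote count; ties broken at random
    winners = names( votes )[ votes == max( votes ) ]
    winner  = if( length( winners ) > 1 ) sample( winners, 1 ) else winners

    # Too few votes for a definite decision: report doubt as NA
    if( max( votes ) < l ) winner = NA

    list( class = factor( winner, levels = classes ), prob = prob )
}

nn = c( "good", "bad", "good", "good", "bad" )   # labels of k = 5 neighbours

vote_sketch( nn, classes = c( "bad", "good" ) )          # class "good"; prob 0.4 and 0.6
vote_sketch( nn, classes = c( "bad", "good" ), l = 4 )   # class NA: only 3 of 5 votes agree
```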

Author(s)

Reza Mohammadi a.mohammadi@uva.nl and Kevin Burke kevin.burke@ul.ie

References

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples

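library( liver )

data( risk )

# Training set: the first 100 observations; test set: observation 101
train = risk[ 1:100, ]
test  = risk[   101, ]

# Predict the risk class of the test case from income and age (k = 1 by default)
kNN( risk ~ income + age, train = train, test = test )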

[Package liver version 1.15 Index]