R: Generalized k Nearest Neighbors

gknn {scrime}

R Documentation

Generalized k Nearest Neighbors

Description

Predicts the classes of new observations with k Nearest Neighbors based on a user-specified distance measure.

Usage

gknn(data, cl, newdata, nn = 5, distance = NULL, use.weights = FALSE, ...)

Arguments

`data`	a numeric matrix in which each row represents an observation and each column a variable. If `distance` is `"smc"`, `"cohen"` or `"pcc"`, the values in `data` must be integers between 1 and `n_{cat}`, where `n_{cat}` is the maximum number of levels that any of the variables can take. Missing values are allowed.
`cl`	a numeric vector of length `nrow(data)` giving the class labels of the observations represented by the rows of `data`. `cl` must consist of integers between 1 and `n_{cl}`, where `n_{cl}` is the number of groups.
`newdata`	a numeric matrix in which each row represents a new observation for which the class label should be predicted and each column contains the same variable as the corresponding column of `data`.
`nn`	an integer specifying the number of nearest neighbors used to classify the new observations.
`distance`	a character string naming the distance measure used to identify the `nn` nearest neighbors. Must be one of `"smc"`, `"cohen"`, `"pcc"`, `"euclidean"`, `"maximum"`, `"manhattan"`, `"canberra"`, or `"minkowski"`. If `NULL`, the measure is chosen ad hoc: if the data appear to be categorical, `distance` is set to `"smc"`; otherwise, it is set to `"euclidean"`.
`use.weights`	a logical value. If `TRUE`, the votes of the nearest neighbors are weighted by the reciprocal of their distances to the new observation when its class is predicted.
`...`	further arguments for the distance measure. If, e.g., `distance = "minkowski"`, then `p` can also be specified, see `dist`. If `distance = "pcc"`, then `version` can also be specified, see `pcc`.
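
To illustrate how `nn`, `distance`, `use.weights` and `...` interact, the following is a minimal base-R sketch of distance-based k nearest neighbors. It is not the scrime implementation: `knn_sketch` is a hypothetical helper written for this illustration, it only supports the measures offered by `dist`, and it assumes complete numeric data with class labels coded 1, 2, ...

```r
# Hypothetical helper, NOT the scrime implementation: k nearest neighbors
# with a dist()-based distance and optional reciprocal-distance weights.
knn_sketch <- function(data, cl, newdata, nn = 5, distance = "euclidean",
                       use.weights = FALSE, ...) {
  n <- nrow(data)
  # Distances between all rows; '...' is passed on to dist(),
  # e.g. p for distance = "minkowski".
  d <- as.matrix(dist(rbind(data, newdata), method = distance, ...))
  # Keep rows = new observations, columns = training observations.
  d <- d[-seq_len(n), seq_len(n), drop = FALSE]
  n.cl <- max(cl)
  pred <- apply(d, 1, function(d.new) {
    idx <- order(d.new)[seq_len(nn)]   # indices of the nn nearest neighbors
    # Weight 1 per neighbor, or 1 / distance (Inf if the distance is 0).
    w <- if (use.weights) 1 / d.new[idx] else rep(1, nn)
    votes <- sapply(seq_len(n.cl), function(k) sum(w[cl[idx] == k]))
    which.max(votes)                   # class with the largest vote total
  })
  unname(pred)
}

# Toy data (made up for this sketch): two well separated groups.
train <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
cl <- c(1, 1, 1, 2, 2, 2)
test <- rbind(c(0.5, 0.5), c(5.5, 5.5))
pred.euclid <- knn_sketch(train, cl, test, nn = 3)
pred.minkowski <- knn_sketch(train, cl, test, nn = 3,
                             distance = "minkowski", p = 3)

# One-dimensional toy data in which the weighting changes the vote.
train2 <- matrix(c(0, 3, 3.2), ncol = 1)
cl2 <- c(1, 2, 2)
new2 <- matrix(1, ncol = 1)
pred.plain <- knn_sketch(train2, cl2, new2, nn = 3)
pred.weighted <- knn_sketch(train2, cl2, new2, nn = 3, use.weights = TRUE)
```

In the second toy data set, the two neighbors from class 2 outvote the single neighbor from class 1 when all votes count equally, whereas the closer class 1 neighbor wins once the votes are weighted by reciprocal distance.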

Value

The predicted classes of the new observations.

Author(s)

Holger Schwender, holger.schwender@udo.edu

References

Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.

Examples

## Not run: 
# Using the data from the example of the function knn (package class).

library(class)
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- c(rep(2, 25), rep(1, 25), rep(1, 25))

knn.out <- knn(train, test, as.factor(cl), k = 3, use.all = FALSE)
gknn.out <- gknn(train, cl, test, nn = 3)

# Both applications lead to the same predictions.

knn.out == gknn.out

# But gknn also allows distance measures other than the Euclidean
# distance, e.g., the Manhattan distance.

gknn(train, cl, test, nn = 3, distance = "manhattan")


## End(Not run)
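
For categorical data coded as integers between 1 and `n_{cat}`, the idea behind a matching-based distance such as `"smc"` can be sketched in base R. The sketch assumes that the simple matching distance is the proportion of variables in which two observations differ; `smc_dist_sketch` is a hypothetical helper written for this illustration, and the functions in scrime may differ in details such as the handling of missing values.

```r
# Hypothetical helper, not part of scrime: proportion of variables in which
# a new observation differs from each training observation.
smc_dist_sketch <- function(data, newobs) {
  new.mat <- matrix(newobs, nrow(data), ncol(data), byrow = TRUE)
  # Variables with a missing value are left out of the proportion.
  rowMeans(data != new.mat, na.rm = TRUE)
}

# Toy categorical data (made up for this sketch) with levels 1, 2 and 3.
train.cat <- rbind(c(1, 1, 2, 3),
                   c(1, 1, 2, 2),
                   c(3, 2, 1, 1),
                   c(3, 3, 1, 1))
cl.cat <- c(1, 1, 2, 2)
new.cat <- c(1, 1, 1, 3)

d.cat <- smc_dist_sketch(train.cat, new.cat)

# Unweighted majority vote among the 3 nearest neighbors.
nearest <- order(d.cat)[1:3]
pred.cat <- as.integer(names(which.max(table(cl.cat[nearest]))))
```

Here the new observation differs from the four training observations in one, two, three and three of the four variables, so two of its three nearest neighbors belong to class 1.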

[Package scrime version 1.3.5 Index]