gknn {e1071}    R Documentation

Generalized k-Nearest Neighbors Classification or Regression

Description

gknn is an implementation of the k-nearest neighbours algorithm making use of general distance measures. A formula interface is provided.
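The core idea can be sketched in a few lines of base R. This is an illustration only, not the e1071 code: knn_fit() and knn_predict() are hypothetical helpers. The sketch assumes numeric predictors and z-score scaling whose statistics are stored at fit time (mirroring the scale argument). The distance is supplied as a plain R function, playing the role that method plays through dist() in proxy. Unlike gknn, it uses exactly k neighbours and breaks vote ties by taking the first class level.

```r
## Minimal k-NN classifier sketch (hypothetical helpers, not e1071 internals).
euclidean <- function(a, b) sqrt(sum((a - b)^2))
manhattan <- function(a, b) sum(abs(a - b))

knn_fit <- function(x, y, k = 1, dist_fun = euclidean, scale = TRUE) {
  x <- as.matrix(x)
  center <- if (scale) colMeans(x) else rep(0, ncol(x))
  spread <- if (scale) apply(x, 2, sd) else rep(1, ncol(x))
  ## keep the training statistics so new data is transformed identically
  list(x = sweep(sweep(x, 2, center), 2, spread, "/"),
       y = y, k = k, dist_fun = dist_fun,
       center = center, spread = spread)
}

knn_predict <- function(model, newdata) {
  newdata <- as.matrix(newdata)
  newdata <- sweep(sweep(newdata, 2, model$center), 2, model$spread, "/")
  apply(newdata, 1, function(row) {
    ## distance from this query row to every training row
    d <- apply(model$x, 1, model$dist_fun, b = row)
    ## exactly k neighbours; distance ties resolved by row order
    nn <- order(d)[seq_len(model$k)]
    ## majority vote (vote ties: first factor level wins in this sketch)
    votes <- table(model$y[nn])
    names(votes)[which.max(votes)]
  })
}
```

For example, `knn_predict(knn_fit(iris[-c(1, 51, 101), 1:4], iris$Species[-c(1, 51, 101)], k = 3, dist_fun = manhattan), iris[c(1, 51, 101), 1:4])` classifies three held-out flowers using Manhattan distance.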

Usage

## S3 method for class 'formula'
gknn(formula, data = NULL, ..., subset, na.action = na.pass, scale = TRUE)
## Default S3 method:
gknn(x, y, k = 1, method = NULL, 
                       scale = TRUE, use_all = TRUE, 
                       FUN = mean, ...)
## S3 method for class 'gknn'
predict(object, newdata, 
                         type = c("class", "votes", "prob"), 
                         ...,
                         na.action = na.pass)

Arguments

formula

a symbolic description of the model to be fit.

data

an optional data frame containing the variables in the model. By default the variables are taken from the environment which ‘gknn’ is called from.

x

a data matrix or data frame.

y

a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

k

number of neighbours considered.

scale

a logical vector indicating the variables to be scaled. If scale has length 1, the value is recycled as many times as needed. By default, numeric matrices are scaled to zero mean and unit variance. The center and scale values are stored in the fitted object and reused for later predictions. Note that the default metric for data frames is the Gower metric, which standardizes the values to the unit interval.

method

Argument passed to dist() from the proxy package to select the distance metric: either a function or a mnemonic string naming the distance measure. Defaults to "Euclidean" for numeric matrices, "Jaccard" for logical matrices and "Gower" for data frames.

use_all

controls the handling of ties. If TRUE, all neighbours whose distance equals the kth smallest distance are included, so more than k neighbours may be used. If FALSE, neighbours tied at the kth smallest distance are selected at random so that exactly k neighbours are used.

FUN

function used to aggregate the k nearest target values in case of regression.

object

object of class gknn.

newdata

matrix or data frame with new instances.

type

character specifying the return type in case of class predictions: for "class", the class labels; for "prob", the class proportions among the k neighbours considered; for "votes", the raw vote counts for each class.

...

additional parameters passed to dist()

subset

An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action

A function to specify the action to be taken if NAs are found. The default action is na.pass. (NOTE: If given, this argument must be named.)
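The tie rule for use_all, the "votes"/"prob" output and the regression aggregation via FUN can be illustrated in base R. The helpers select_neighbours() and aggregate_neighbours() are hypothetical names, not part of e1071. The sketch assumes d holds the distances from one query point to all training rows and that y_nn holds the targets of the selected neighbours (a factor for classification, numeric for regression).

```r
## Sketch of per-query neighbour selection and aggregation (hypothetical helpers).
select_neighbours <- function(d, k, use_all = TRUE) {
  kth <- sort(d)[k]                 # k-th smallest distance
  closer <- which(d < kth)          # strictly closer than the k-th neighbour
  tied <- which(d == kth)           # everything sitting exactly at that distance
  if (use_all) {
    c(closer, tied)                 # may return more than k indices
  } else {
    need <- k - length(closer)      # how many tied points are still needed
    c(closer, tied[sample.int(length(tied), need)])  # exactly k indices
  }
}

aggregate_neighbours <- function(y_nn, type = c("votes", "prob"), FUN = mean) {
  if (is.factor(y_nn)) {
    tab <- table(y_nn)              # one count per class level, zeros included
    votes <- setNames(as.vector(tab), names(tab))
    if (match.arg(type) == "votes") votes else votes / sum(votes)
  } else {
    FUN(y_nn)                       # regression: e.g. mean (default) or median
  }
}
```

With d <- c(0.5, 1, 1, 1, 2) and k = 2, use_all = TRUE returns four indices because three points tie at the second-smallest distance, whereas use_all = FALSE returns index 1 plus one randomly chosen tied index.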

Value

For gknn(), an object of class "gknn" containing the data and the specified parameters. For predict.gknn(), a vector of predictions, or a matrix with the votes (or, for type = "prob", the class proportions) for all classes. In case of an overall class tie, the predicted class is chosen at random.
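One way to realize the random tie rule for the final class label is sketched below. vote_to_class() is a hypothetical helper, not an e1071 function. It assumes a named numeric vector of vote counts for a single query and picks uniformly at random among the classes sharing the maximum count.

```r
## Sketch: turn one named vector of vote counts into a class label,
## breaking an overall tie uniformly at random (hypothetical helper).
vote_to_class <- function(votes) {
  winners <- names(votes)[votes == max(votes)]
  if (length(winners) == 1) winners else sample(winners, 1)
}
```

Applied row-wise, e.g. apply(votes_matrix, 1, vote_to_class) on a matrix whose column names are the class labels, this yields one label per query; the result for tied rows depends on the random seed.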

Author(s)

David Meyer (David.Meyer@R-project.org)

See Also

dist (in package proxy)

Examples

library(e1071)
data(iris)

## 1-NN classifier with the default settings
model <- gknn(Species ~ ., data = iris)
predict(model, iris[c(1, 51, 101),])

## hold out the last six observations of each species
test <- c(45:50, 95:100, 145:150)

## 3-NN with Manhattan distance: raw vote counts per class
model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "votes")

## same model: class proportions instead of counts
predict(model, iris[test,], type = "prob")


[Package e1071 version 1.7-14 Index]