rqda {dawai} | R Documentation |
Restricted Quadratic Discriminant Analysis
Description
Build quadratic classification rules with additional information expressed as inequality restrictions among the population means.
Usage
rqda(x, ...)
## S3 method for class 'matrix'
rqda(x, ...)
## S3 method for class 'data.frame'
rqda(x, grouping, ...)
## S3 method for class 'formula'
rqda(formula, data, ...)
## Default S3 method:
rqda(x, grouping, subset = NULL, resmatrix = NULL, restext = NULL,
gamma = c(0, 1), prior = NULL, ...)
Arguments
formula |
A formula of the form |
data |
Data frame from which variables specified in |
x |
(Required if no formula is given as the principal argument.) A data frame or matrix containing the explanatory variables. |
grouping |
(Required if no formula is given as the principal argument.) A numeric vector or factor with numeric levels specifying the class for each observation. |
subset |
An index vector specifying the cases to be used in the training sample. |
resmatrix |
A matrix specifying the linear restrictions on the mean vectors: |
restext |
(Required if no |
gamma |
A vector of values in the unit interval that determine the classification rules with additional information (see references). |
prior |
The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities must be specified in the order of the factor levels. |
... |
Arguments passed to or from other methods. |
Details
Specifying the prior will affect the classification and error unless overridden in predict.rqda and err.est.rqda, respectively.
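The Arguments section states that, when no prior is given, the class proportions of the training set are used, and that a supplied prior must follow the order of the factor levels. A minimal base R sketch of that idea is shown here; the toy `grouping` vector and the names `default_prior` and `custom_prior` are illustrative assumptions, and the computation inside `rqda` may differ.

```r
# Hypothetical toy labels; not taken from the package.
grouping <- factor(c(1, 1, 2, 3, 3, 3, 2, 1, 3, 3))

# Class proportions, in the order of the factor levels.
default_prior <- as.vector(table(grouping)) / length(grouping)
names(default_prior) <- levels(grouping)
default_prior

# A user-supplied prior should follow the same level order and sum to 1.
custom_prior <- c("1" = 0.5, "2" = 0.25, "3" = 0.25)
stopifnot(identical(names(custom_prior), levels(grouping)),
          isTRUE(all.equal(sum(custom_prior), 1)))
```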
Value
An object of class 'rqda' containing the following components:
call |
The (matched) function call. |
trainset |
Matrix with the training set used (first columns) and the class for each observation (last column). |
restrictions |
Edited character string with the linear restrictions on the mean vectors detailed. |
resmatrix |
The matrix with the restrictions on the mean vectors used. |
prior |
Prior probabilities of class membership used. |
counts |
The number of observations of the classes used. |
N |
The total number of observations used. |
samplemeans |
Matrix with the sample means in rows. |
samplevariances |
Array with the sample covariance matrices of the classes. |
gamma |
Gamma values used. |
estimatedmeans |
Array with the estimated means for each classification rule. |
apparent |
Apparent error rate for each classification rule. |
Note
This function may be called using either a formula and data frame, or a data frame and grouping factor, or a matrix and grouping factor as the first two arguments. All other arguments are optional.
Classes must be identified, either in a column of data or in the grouping vector, by natural numbers varying from 1 to the number of classes. The number of classes must be greater than 1.
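One way to obtain class codes of this kind from text labels is sketched here in base R. The label vector and the ordering in `class_order` are hypothetical; the ordering that makes sense depends on the restrictions to be imposed.

```r
# Hypothetical class labels, listed in arbitrary order.
labels <- c("van", "bus", "saab", "opel", "bus", "van")

# Fix the intended class order explicitly, then take the integer codes 1..k.
class_order <- c("saab", "opel", "van", "bus")
grouping <- as.integer(factor(labels, levels = class_order))
grouping

# Check that the codes are exactly the natural numbers 1, ..., k with k > 1.
k <- length(class_order)
stopifnot(k > 1, setequal(grouping, seq_len(k)))
```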
If there are missing values in either data, x or grouping, corresponding observations will be deleted.
To avoid singular covariance matrices, the number of observations in each class must be greater than or equal to the number of explanatory variables.
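Both conditions can be checked before fitting. The sketch below uses base R only and hypothetical toy data; it mimics the deletion of incomplete observations to show what to expect, and is not the package's own code.

```r
# Hypothetical toy data: 2 explanatory variables, 2 classes.
x <- data.frame(v1 = c(1.0, 2.1, NA, 3.3, 2.8, 4.0, 3.9),
                v2 = c(0.5, 0.7, 0.9, 1.4, 1.1, NA, 1.6))
grouping <- c(1, 1, 1, 2, 2, 2, 2)

# TRUE for rows with no missing value in x or grouping.
keep <- complete.cases(x, grouping)

# Per-class counts among the complete rows, compared with the
# number of explanatory variables.
counts <- table(grouping[keep])
enough <- counts >= ncol(x)
counts
enough
```

If any entry of `enough` is FALSE, that class has too few complete observations for its covariance matrix to be estimated reliably.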
Author(s)
David Conde
References
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.
See Also
predict.rqda, err.est.rqda, rlda, predict.rlda, err.est.rlda
Examples
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2[, 1:4]
grouping <- Vehicle2$Class
levels(grouping) <- c(4, 2, 1, 3)
## classes ordered by increasing size
##
## according to variable definitions, we can consider
## the following restrictions on the mean vectors:
## mu11 >= mu21 >= mu31 >= mu41
## mu12 >= mu22 >= mu32 >= mu42
## mu13 >= mu23 >= mu33 >= mu43
##
## we can specify these restrictions by restext = "s>1,2,3"
set.seed(7964)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
obj <- rqda(data, grouping, subset = trainsubset,
            gamma = (1:5)/5, restext = "s>1,2,3")
obj
## we can see that the apparent error rate of the restricted
## rules increases with gamma:
## gamma=0.2 gamma=0.4 gamma=0.6 gamma=0.8 gamma=1
## 30.40936 30.99415 30.99415 30.99415 31.57895