
polyclass {polspline}

R Documentation

Polyclass: polychotomous regression and multiple classification

Description

Fits a polychotomous regression and multiple classification model using linear splines and selected tensor products.

Usage

polyclass(data, cov, weight, penalty, maxdim, exclude, include,
additive = FALSE, linear, delete = 2, fit, silent = TRUE,
normweight = TRUE, tdata, tcov, tweight, cv, select, loss, seed)

Arguments

`data`	vector of classes: `data` should range over consecutive integers, with 0 or 1 as the minimum value.
`cov`	covariates: matrix with as many rows as the length of `data`.
`weight`	optional vector of case-weights. Should have the same length as `data`.
`penalty`	the parameter to be used in the AIC criterion if the model selection is carried out by AIC. The program chooses the number of knots that minimizes `-2 * loglikelihood + penalty * (dimension)`. The default is to use `penalty = log(length(data))` as in BIC. If the model selection is carried out by cross-validation or using a test set, the program uses the number of knots that minimizes `loss + penalty * dimension * (loss for smallest model)`. In this case the default of `penalty` is 0.
`maxdim`	maximum dimension (default is `min(n, 4 * n^(1/3) * (cl - 1))`, where `n` is `length(data)` and `cl` is the number of classes).
`exclude`	combinations to be excluded. This should be a matrix with 2 columns; for example, if `exclude[1, 1] = 2` and `exclude[1, 2] = 3`, no interaction between covariates 2 and 3 is included. 0 represents time.
`include`	those combinations that can be included. Should have the same format as `exclude`. Only one of `exclude` and `include` can be specified.
`additive`	should the model selection be restricted to additive models?
`linear`	vector indicating for which of the variables no knots should be entered. For example, if `linear = c(2, 3)` no knots for either covariate 2 or 3 are entered. 0 represents time.
`delete`	should complete basis functions be deleted at once (2), should only individual dimensions be deleted (1) or should only the addition stage of the model selection be carried out (0)?
`fit`	`polyclass` object. If `fit` is specified, `polyclass` adds basis functions starting with those in `fit`.
`silent`	suppresses the printing of diagnostic output about basis functions added or deleted, Rao statistics, Wald statistics and log-likelihoods.
`normweight`	should the weights be normalized so that they average to one? This option only has an effect if the model is selected using AIC.
`tdata`, `tcov`, `tweight`	test set. Should satisfy the same requirements as `data`, `cov` and `weight`. If all test set weights are one, `tweight` can be omitted. If `tdata` and `tcov` are specified, the model selection is carried out using this test set, irrespective of the input for `penalty` or `cv`.
`cv`	into how many subsets should the data be divided for cross-validation? If `cv` is specified and `tdata` is omitted, the model selection is carried out by cross-validation.
`select`	if a test set is provided, or if the model is selected using cross-validation, should the selected model be the one that minimizes (misclassification) loss (0), that maximizes test set log-likelihood (1), or that minimizes test set squared error loss (2)?
`loss`	a rectangular matrix specifying the loss function, with one row per class and one column per action. Used for cross-validation and test set model selection. `loss[i, j]` contains the loss for assigning action `j` to an object whose true class is `i`. The default is 1 minus the identity matrix. `loss` does not need to be square.
`seed`	optional seed for the random number generator that determines the sequence of the cases for cross-validation. If the seed has length 12 or more, the first twelve elements are assumed to be `.Random.seed`, otherwise the function `set.seed` is used. If `seed` is 0 or `rep(0, 12)`, it is assumed that the user has already provided a (random) ordering. If `seed` is not provided, while a fit with an element `fit$seed` is provided, `.Random.seed` is set using `set.seed(fit$seed)`. Otherwise the present value of `.Random.seed` is used.
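The arithmetic behind `penalty` and `maxdim` can be sketched in base R. The helpers `aic_criterion`, `holdout_criterion` and `default_maxdim` are hypothetical and not part of polspline, and the log-likelihoods, losses and dimensions are made-up illustrative numbers, not output from a fit. The sketch assumes that "loss for smallest model" means the loss of the lowest-dimensional candidate; the documentation also does not say whether `polyclass` rounds the default `maxdim`.

```r
# Hypothetical helpers illustrating the documented selection criteria
# (a sketch, not code taken from polspline).

# AIC-type criterion: -2 * loglikelihood + penalty * dimension.
# The default penalty log(n) corresponds to BIC.
aic_criterion <- function(loglik, dim, n, penalty = log(n)) {
  -2 * loglik + penalty * dim
}

# Test-set / cross-validation criterion:
#   loss + penalty * dimension * (loss for smallest model), default penalty 0.
# Assumption: "smallest model" = the candidate with the lowest dimension.
holdout_criterion <- function(loss, dim, penalty = 0) {
  loss_smallest <- loss[which.min(dim)]
  loss + penalty * dim * loss_smallest
}

# Documented default for maxdim: min(n, 4 * n^(1/3) * (cl - 1)).
default_maxdim <- function(n, cl) {
  min(n, 4 * n^(1 / 3) * (cl - 1))
}

# Made-up candidates of dimension 2, 4 and 6
dims   <- c(2, 4, 6)
loglik <- c(-120, -100, -97)
loss   <- c(0.30, 0.10, 0.09)

which.min(aic_criterion(loglik, dims, n = 150))               # BIC picks candidate 2
which.min(aic_criterion(loglik, dims, n = 150, penalty = 2))  # classical AIC picks 3
which.min(holdout_criterion(loss, dims))                      # penalty 0 picks 3
which.min(holdout_criterion(loss, dims, penalty = 0.02))      # heavier penalty picks 2
default_maxdim(150, 3)                                        # about 42.5
```

A larger `penalty` shifts the choice toward smaller models under both criteria; with the default `penalty = 0`, the holdout criterion simply picks the lowest loss.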
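The `loss` argument can be built with base R. The sketch below constructs the default loss (1 minus the identity) and a non-square variant with an extra "doubt" action, then picks the action with the smallest expected loss for a vector of class probabilities. The doubt action, its cost of 0.3, the probabilities and the `bayes_action` helper are illustrative assumptions; the documentation does not state that `polyclass` assigns actions exactly this way.

```r
# Default loss: 0 for the correct class, 1 otherwise (1 minus the identity).
default_loss <- function(nclass) 1 - diag(nclass)

# Hypothetical helper: choose the action with the smallest expected loss,
# where loss[i, j] is the loss of action j when the true class is i.
bayes_action <- function(prob, loss) {
  stopifnot(length(prob) == nrow(loss))
  expected <- as.vector(prob %*% loss)  # expected loss of each action (column)
  which.min(expected)
}

loss3 <- default_loss(3)

# Non-square loss: a 4th "doubt" action costing 0.3 whatever the true class.
loss_doubt <- cbind(loss3, rep(0.3, 3))

p <- c(0.5, 0.3, 0.2)         # made-up class probabilities
bayes_action(p, loss3)        # action 1 (expected losses 0.5, 0.7, 0.8)
bayes_action(p, loss_doubt)   # action 4 (doubt costs 0.3 < 0.5)
```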
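The `exclude`, `include` and `linear` arguments can be assembled as in the minimal sketch below. `check_pairs` is a hypothetical helper that only restates the documented format rules (two columns, covariate indices with 0 for time, at most one of `exclude` and `include`). The `polyclass` call uses only documented arguments and runs only if polspline is installed.

```r
# Each row names one pair of covariates (column indices of `cov`)
# whose interaction should not be considered.
excl <- rbind(c(2, 3),   # no interaction between covariates 2 and 3
              c(1, 4))   # no interaction between covariates 1 and 4

# Hypothetical validator restating the documented format rules.
check_pairs <- function(exclude = NULL, include = NULL, ncov) {
  if (!is.null(exclude) && !is.null(include))
    stop("only one of 'exclude' and 'include' can be specified")
  pairs <- if (is.null(exclude)) include else exclude
  if (is.null(pairs)) return(invisible(TRUE))
  pairs <- as.matrix(pairs)
  if (ncol(pairs) != 2) stop("'exclude'/'include' must have 2 columns")
  if (any(pairs < 0 | pairs > ncov)) stop("covariate index out of range")  # 0 = time
  invisible(TRUE)
}

check_pairs(exclude = excl, ncov = 4)

# Fit without the two excluded interactions and without knots for covariate 4.
if (requireNamespace("polspline", quietly = TRUE)) {
  fit.excl <- polspline::polyclass(iris[, 5], iris[, 1:4],
                                   exclude = excl, linear = 4)
}
```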

Value

The output is an object of class `polyclass`, organized to serve as input for `plot.polyclass`, `beta.polyclass`, `summary.polyclass`, `ppolyclass` (fitted probabilities), `cpolyclass` (fitted classes) and `rpolyclass` (random classes). The function returns a list with the following members:

`call`	the command that was executed.
`ncov`	number of covariates.
`ndim`	number of dimensions of the fitted model.
`nclass`	number of classes.
`nbas`	number of basis functions.
`naction`	number of possible actions that are considered.
`fcts`	matrix of size `nbas x (nclass + 4)`. Each row is a basis function. First element: first covariate involved (`NA` means constant); second element: which knot (`NA` means constant or linear); third element: second covariate involved (`NA` means this is a function of one variable); fourth element: knot involved (irrelevant if the third element is `NA`); fifth, sixth, ... elements: beta (coefficient) for class one, two, ...
`knots`	a matrix with `ncov` rows. Covariate `i` has row `i+1`, time has row 1. First column: number of knots in this dimension; other columns: the knots, appended with `NA`s to make it a matrix.
`cv`	the number of subsets into which the data were divided for cross-validation. Only provided if `method = 2`.
`loss`	the loss matrix used in cross-validation and test set. Only provided if `method = 1` or `method = 2`.
`penalty`	the parameter used in the AIC criterion. Only provided if `method = 0`.
`method`	0 = AIC, 1 = test set, 2 = cross-validation.
`ranges`	column `i` gives the range of the `i`-th covariate.
`logl`	matrix with eight or eleven columns summarizing the fits. Column one gives the dimension; column two gives the AIC or loss value, whichever was used during model selection; columns three, four and five give the training set log-likelihood, (misclassification) loss and squared error loss; columns six to eight give the same information for the test set; column nine (or column six if `method = 0` or `method = 2`) indicates whether the model was fitted during the addition stage (1) or during the deletion stage (0); columns ten and eleven (or seven and eight) give the minimum and maximum penalty parameter for which AIC would have selected this model.
`sample`	sample size.
`tsample`	the sample size of the test set. Only provided if `method = 1`.
`wgtsum`	sum of the case weights.
`covnames`	names of the covariates.
`classnames`	(numerical) names of the classes.
`cv.aic`	the penalty value that was determined optimal by cross-validation. Only provided if `method = 2`.
`cv.tab`	table with three columns. Columns one and two indicate the penalty parameter range for which the cv-loss in column three would be realized. Only provided if `method = 2`.
`seed`	the random seed that was used to determine the order of the cases for cross-validation. Only provided if `method = 2`.
`delete`	were complete basis functions deleted at once (2), were only individual dimensions deleted (1) or was only the addition stage of the model selection carried out (0)?
`beta`	moments of basis functions. Needed for `beta.polyclass`.
`select`	if a test set is provided, or if the model is selected using cross-validation, was the selected model the one that minimized (misclassification) loss (0), that maximized test set log-likelihood (1), or that minimized test set squared error loss (2)?
`anova`	matrix with three columns. The first two elements in a line indicate the subspace to which the line refers. The third element indicates the percentage of variance explained by that subspace.
`twgtsum`	sum of the test set case weights (only if `method = 1`).
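The row layout of `fcts` can be turned into readable labels. `describe_basis` is a hypothetical helper, and the matrix below is hand-made in the documented layout (four descriptor columns followed by one coefficient per class) with made-up coefficients rather than taken from a fit. Knot entries are printed as stored, without assuming whether they index into `knots` or hold knot values.

```r
# Hypothetical helper: label one row of a polyclass-style `fcts` matrix.
describe_basis <- function(row, covnames = NULL) {
  name <- function(i) if (is.null(covnames)) paste0("cov", i) else covnames[i]
  term <- function(cov, knot) {
    if (is.na(knot)) name(cov) else paste0(name(cov), "[knot ", knot, "]")
  }
  if (is.na(row[1])) return("constant")   # NA first covariate: constant
  label <- term(row[1], row[2])            # NA knot: linear term
  if (!is.na(row[3]))                      # second covariate: tensor product
    label <- paste(label, "x", term(row[3], row[4]))
  label
}

# Hand-made example with 3 classes: columns 5-7 hold made-up coefficients.
fcts <- rbind(c(NA, NA, NA, NA, 0.5,  1.2, -0.3),
              c( 2, NA, NA, NA, 0.2,  0.4,  0.9),
              c( 1,  3, NA, NA, 0.1, -0.7,  0.2),
              c( 1, NA,  4,  2, 0.3,  0.1, -0.5))

labels <- apply(fcts, 1, describe_basis)
beta   <- fcts[, -(1:4), drop = FALSE]  # one column per class
labels  # "constant" "cov2" "cov1[knot 3]" "cov1 x cov4[knot 2]"
```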
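The last two columns of `logl` record the penalty interval over which AIC would pick each model. That interval follows from the AIC criterion: model i is preferred to model j when `-2 * l_i + p * d_i <= -2 * l_j + p * d_j`, so every larger model bounds `p` from below and every smaller model bounds it from above. The sketch below computes these intervals from log-likelihoods and dimensions. `penalty_range` and `logl_colnames` are hypothetical helpers, the column labels are shorthand of our own, the inputs are made-up, and polspline's own computation may treat ties differently.

```r
# Hypothetical shorthand for the documented logl columns (8 or 11 of them).
logl_colnames <- function(method) {
  train <- c("dim", "criterion", "train.logl", "train.loss", "train.sqerr")
  test  <- c("test.logl", "test.loss", "test.sqerr")  # only with a test set
  rest  <- c("addition", "pen.min", "pen.max")
  if (method == 1) c(train, test, rest) else c(train, rest)
}

# Interval of penalties p for which AIC = -2 * loglik + p * dim selects each model.
penalty_range <- function(loglik, dim) {
  k <- length(loglik)
  out <- matrix(NA_real_, k, 2, dimnames = list(NULL, c("pen.min", "pen.max")))
  for (i in seq_len(k)) {
    lo <- 0; hi <- Inf; beaten <- FALSE
    for (j in seq_len(k)[-i]) {
      if (dim[j] > dim[i]) {           # larger model: p must be large enough
        lo <- max(lo, 2 * (loglik[j] - loglik[i]) / (dim[j] - dim[i]))
      } else if (dim[j] < dim[i]) {    # smaller model: p must be small enough
        hi <- min(hi, 2 * (loglik[i] - loglik[j]) / (dim[i] - dim[j]))
      } else if (loglik[j] > loglik[i]) {
        beaten <- TRUE                 # same dimension, strictly better fit
      }
    }
    if (!beaten && lo <= hi) out[i, ] <- c(lo, hi)  # NA row: never selected
  }
  out
}

penalty_range(loglik = c(-120, -100, -97), dim = c(2, 4, 6))
```

For these inputs the intervals are [20, Inf), [3, 20] and [0, 3], so the BIC default `log(150)` (about 5.01) falls in the middle model's range.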

Author(s)

Charles Kooperberg clk@fredhutch.org.

References

Charles Kooperberg, Smarajit Bose, and Charles J. Stone (1997). Polychotomous regression. Journal of the American Statistical Association, 92, 117–127.

Charles J. Stone, Mark Hansen, Charles Kooperberg, and Young K. Truong (1997). The use of polynomial splines and their tensor products in extended linear modeling (with discussion). Annals of Statistics, 25, 1371–1470.

Examples

data(iris)
fit.iris <- polyclass(iris[,5], iris[,1:4])
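The example can be extended to inspect the fitted object. This sketch assumes polspline is installed and uses only the `summary` generic (which dispatches to `summary.polyclass`) and list members documented under Value. With neither a test set nor `cv` supplied, the model is selected by AIC, so `method` should be 0 and `logl` should have eight columns.

```r
library(polspline)
data(iris)

# Species (column 5) as the class, the four measurements as covariates
fit.iris <- polyclass(iris[, 5], iris[, 1:4])

summary(fit.iris)    # dispatches to summary.polyclass
fit.iris$nclass      # number of classes
fit.iris$ndim        # dimension of the selected model
fit.iris$method      # 0 = AIC, used when no test set or cv is given
head(fit.iris$logl)  # one row per fitted model
```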

[Package polspline version 1.1.25 Index]