getPoly {polyreg} | R Documentation |
Get polynomial terms
Description
Generate polynomial terms of predictor variables for a data frame or data matrix.
Usage
getPoly(xdata = NULL, deg = 1, maxInteractDeg = deg,
Xy = NULL, standardize = FALSE,
noisy = TRUE, intercept = FALSE, returnDF = TRUE,
modelFormula = NULL, retainedNames = NULL, ...)
Arguments
xdata |
Data matrix or data frame without response variable. Categorical variables (> 2 levels) should be passed as factors, not dummy variables or integers, to ensure the polynomial matrix is constructed properly. |
deg |
The maximum degree of power terms. The default is 1, which simply returns the model matrix. |
maxInteractDeg |
The max degree of nondummy interaction terms. x1 * x2 is degree 2. x1^3 * x2^2 is degree 5. Implicitly constrained by deg. For example, if deg = 3 and maxInteractDeg = 2, x1^1 * x2^2 (i.e., degree 3) will be included but x1^2 * x2^2 (i.e., degree 4) will not. |
Xy |
The data frame with the response in the final column (provide xdata or Xy but not both). Categorical variables (> 2 levels) should be passed as factors, not dummy variables or integers, to ensure the polynomial matrix is constructed properly. |
standardize |
Standardize all continuous variables? (Default: FALSE.) |
noisy |
Output progress updates? (Default: TRUE.) |
intercept |
Include intercept? (Default: FALSE.) |
returnDF |
Return a data.frame (as opposed to model.matrix)? (Default: TRUE.) |
modelFormula |
Internal use. Formula used to generate the training model matrix. Note: anticipates that polynomial terms are generated using internal functions of library(polyreg). Also, providing modelFormula bypasses deg and maxInteractDeg. |
retainedNames |
Internal use. colnames of polyMatrix object$xdata. Requires modelFormula be inputted as well. |
... |
Additional arguments to be passed to model.matrix() via polyreg:::model_matrix(). Note na.action = "na.omit". |
Details
The getPoly function takes in a data frame or data matrix and generates polynomial terms of predictor variables.
Note the subtleties involving dummy variables. The squared, cubed, and higher-power terms of a dummy variable are identical to the original variable, so these duplicates must be eliminated.
Similarly, after dummy variables are created from a categorical variable having more than two levels, the resulting columns will be orthogonal to each other. In almost all cases, this argument should be set to TRUE at the training stage, and then in predictions one should use the vector of names in the component in the return value; predict.polyFit does the latter automatically.
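The two dummy-variable facts above can be checked with a minimal base R sketch. It does not call polyreg; the vectors d and g are made-up illustrative data, and model.matrix() is used here only to build the dummy columns by hand.

```r
# Toy data (illustrative values only): a 0/1 dummy and a three-level factor
d <- c(0, 1, 1, 0, 1, 0)
g <- factor(c("a", "b", "c", "a", "b", "c"))

# Powers of a 0/1 dummy reproduce the dummy itself,
# so squared and cubed terms would be exact duplicates
identical(d^2, d)
identical(d^3, d)

# Dummy columns built from a single factor never overlap:
# each row has a 1 in exactly one column, so the elementwise
# product of two distinct columns is all zeros
D <- model.matrix(~ g - 1)
all(D[, "ga"] * D[, "gb"] == 0)

# Equivalently, every off-diagonal entry of t(D) %*% D is 0
XtX <- crossprod(D)
all(XtX[upper.tri(XtX)] == 0)
```

Because such products are identically zero, interaction terms between dummies of the same factor carry no information.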
Note: If a column that is an R factor has levels with spaces in the names, this will interfere with the parsing, and must be avoided.
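One possible way to avoid spaces in level names is to rename the levels before calling getPoly. The sketch below uses only base R; the factor f and the choice of an underscore as the replacement character are assumptions made for illustration, not something polyreg requires.

```r
# Hypothetical factor whose level names contain spaces
f <- factor(c("new york", "los angeles", "new york"))

# Replace each space in the level names with an underscore;
# the underlying observations keep their (renamed) levels
levels(f) <- gsub(" ", "_", levels(f), fixed = TRUE)

levels(f)
any(grepl(" ", levels(f), fixed = TRUE))  # FALSE once the spaces are gone
```

The same replacement would need to be applied to the test data so that level names match at prediction time.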
Value
The return value of getPoly is a polyMatrix object. This is an S3 class containing a model.matrix, xdata, of the generated polynomial terms. The predictor variables have column names V1, V2, etc. The object also contains modelFormula, the formula used to construct the model matrix, and XtestFormula, the formula which should be used out-of-sample (when y_test is not available).
Examples
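N <- 125
# stringsAsFactors = TRUE so the categorical columns are factors
# (the default is FALSE in R >= 4.0.0, which would make levels() return NULL)
rawdata <- data.frame(x1 = rnorm(N),
                      x2 = rnorm(N),
                      group = sample(letters[1:5], N, replace = TRUE),
                      z = sample(c("treatment", "control"), N, replace = TRUE),
                      result = sample(c("win", "lose", "tie"), N, replace = TRUE),
                      stringsAsFactors = TRUE)
head(rawdata)
# P: number of first-degree columns
# (one dummy per non-reference factor level, plus the numeric predictors)
P <- length(levels(rawdata$group)) - 1 +
  length(levels(rawdata$z)) - 1 +
  length(levels(rawdata$result)) - 1 +
  sum(unlist(lapply(rawdata, is.numeric)))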
# quadratic polynomial, includes interactions
# since maxInteractDeg defaults to deg
X <- getPoly(rawdata, 2)$xdata
ncol(X) # 40
# cubic polynomial, no interactions
X <- getPoly(rawdata, 3, 1)$xdata
ncol(X) # 13
# cubic polynomial, interactions with maxInteractDeg = 2
X <- getPoly(rawdata, 3, 2)$xdata
ncol(X) # 58
# cubic polynomial, interactions with maxInteractDeg = 3
# (maxInteractDeg defaults to deg)
X <- getPoly(rawdata, 3)$xdata
ncol(X) # 101
# passing the data as Xy makes the final column the response variable, y,
# so fewer columns are generated and the comparison below is TRUE
ncol(getPoly(Xy=rawdata, deg=2)$xdata) < ncol(getPoly(rawdata, 2)$xdata)
# preparing polynomial matrices for crossvalidation
# getPoly() returns a polyMatrix object containing XtestFormula
# which should be used to ensure factors are handled correctly out-of-sample
Xtrain <- getPoly(rawdata[1:100,],2)
Xtest <- getPoly(rawdata[101:125,], 2, modelFormula = Xtrain$XtestFormula)