polywog {polywog}    R Documentation
Polynomial regression with oracle variable selection
Description
Fits a regression model using a polynomial basis expansion of the input variables, with penalization via the adaptive LASSO or SCAD to provide oracle variable selection.
Usage
polywog(formula, data, subset, weights, na.action, degree = 3,
        family = c("gaussian", "binomial"), method = c("alasso", "scad"),
        penwt.method = c("lm", "glm"), unpenalized = character(0),
        .parallel = FALSE, boot = 0,
        control.boot = control.bp(.parallel = .parallel),
        lambda = NULL, nlambda = 100, lambda.min.ratio = 1e-04,
        nfolds = 10, foldid = NULL,
        thresh = ifelse(method == "alasso", 1e-07, 0.001),
        maxit = ifelse(method == "alasso", 1e+05, 5000),
        model = TRUE, X = FALSE, y = FALSE)
Arguments
formula: model formula specifying the response and input variables. See "Details" for more information.
data: a data frame, list, or environment containing the variables specified in the model formula.
subset: an optional vector specifying a subset of observations to be used in fitting.
weights: an optional vector specifying weights for each observation to be used in fitting.
na.action: a function specifying what to do with observations containing NAs.
degree: integer specifying the degree of the polynomial expansion of the input variables.
family: model family, either "gaussian" (the default) or "binomial".
method: variable selection method, either "alasso" (the default) for the adaptive LASSO or "scad" for SCAD.
penwt.method: estimator for obtaining first-stage estimates in logistic models when method = "alasso": "lm" (the default) or "glm".
unpenalized: names of model terms to be exempt from the adaptive penalty (only available when method = "alasso").
.parallel: logical: whether to perform k-fold cross-validation in parallel (only available when method = "alasso"). See "Details" on registering a parallel backend.
boot: number of bootstrap iterations (0 for no bootstrapping).
control.boot: list of arguments to be passed to bootPolywog when bootstrapping; see control.bp.
lambda: a vector of values from which the penalty factor is to be selected via k-fold cross-validation.
nlambda: number of values of the penalty factor to examine via cross-validation if lambda is not specified in advance.
lambda.min.ratio: ratio of the lowest value to the highest in the generated sequence of values of the penalty factor if lambda is not specified.
nfolds: number of folds to use in cross-validation to select the penalization factor.
foldid: optional vector manually assigning fold numbers to each observation used for fitting (only available when method = "alasso").
thresh: convergence threshold, passed as the thresh argument to glmnet (adaptive LASSO) or the eps argument to ncvreg (SCAD).
maxit: maximum number of iterations to allow in adaptive LASSO or SCAD fitting.
model: logical: whether to include the model frame in the returned object.
X: logical: whether to include the raw design matrix (i.e., the matrix of input variables prior to taking their polynomial expansion) in the returned object.
y: logical: whether to include the response variable in the returned object.
Details
The design matrix for the regression is a polynomial basis expansion of the
matrix of raw input variables. This includes all powers and interactions of
the input variables up to the specified degree. For example, the
following terms will be included in polywog(y ~ x1 + x2, degree = 3,
...):
terms of degree 0: intercept
terms of degree 1: x1, x2
terms of degree 2: x1^2, x2^2, x1*x2
terms of degree 3: x1^3, x2^3, x1*x2^2, x1^2*x2
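As an illustrative sketch only (polywog constructs this expansion internally), the same set of raw terms can be previewed with base R's polym():
set.seed(1)
x1 <- rnorm(5)
x2 <- rnorm(5)
## column names such as "2.1" denote x1^2 * x2^1, matching the terms listed above
polym(x1, x2, degree = 3, raw = TRUE)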
To exclude certain terms from the basis expansion, use a model formula like
y ~ x1 + x2 | z1 + z2. Only the degree 1 terms of z1 and
z2 will be included.
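For instance, in the following hypothetical sketch (the data frame dat and its columns are placeholders), x1 and x2 receive the full degree-3 expansion while z1 and z2 enter only linearly:
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                  z1 = rnorm(100), z2 = rnorm(100))
dat$y <- with(dat, x1 * x2 + z1 + rnorm(100))
## x1 and x2 are fully expanded; z1 and z2 contribute only degree-1 terms
fit_restricted <- polywog(y ~ x1 + x2 | z1 + z2, data = dat, degree = 3)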
It is possible that the "raw" basis expansion will be rank-deficient, such
as if there are binary input variables (in which case x_i = x_i^n for
all n > 0). The procedure detects collinearity via qr and
removes extraneous columns before fitting.
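To see why a binary input makes the raw expansion rank-deficient:
## for a 0/1 variable d, every positive power of d equals d itself, so the
## columns d, d^2, d^3 of the raw expansion are duplicates that qr() detects
d <- rbinom(20, 1, 0.5)
all(d^2 == d) && all(d^3 == d)  # TRUE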
For both the adaptive LASSO and SCAD, the penalization factor \lambda
is chosen by k-fold cross-validation. The selected value minimizes the
average mean squared error of out-of-sample fits. (To select both
\lambda and the polynomial degree simultaneously via cross-validation,
see cv.polywog.)
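After fitting, the selected penalty factor and the full cross-validation results are stored in the returned object; a sketch of inspecting them (element names as described under "Value" below, assuming a fitted model fit):
## fit$lambda                  # penalty factor used in the final model
## fit$lambda.cv$lambdaMin     # value of lambda minimizing the CV error
## plot(log(fit$lambda.cv$lambda), fit$lambda.cv$cvError, type = "l")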
The cross-validation process may be run in parallel via
foreach by registering an appropriate backend and specifying
.parallel = TRUE. The appropriate backend is system-specific; see
foreach for information on selecting and registering a
backend. The bootstrap iterations may also be run in parallel by
specifying control.boot = control.bp(.parallel = TRUE).
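A sketch of setting up parallel computation before fitting; doParallel is only one of several available foreach backends, and the formula and data here are placeholders:
## library(doParallel)
## registerDoParallel(cores = 2)
## fit <- polywog(y ~ x1 + x2, data = dat, .parallel = TRUE,
##                boot = 100, control.boot = control.bp(.parallel = TRUE))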
Value
An object of class "polywog", a list containing:
coefficients: the estimated coefficients.
lambda: value of the penalty factor \lambda used to fit the final model.
lambda.cv: a list containing the results of the cross-validation procedure used to select the penalty factor:
  lambda: values of the penalty factor tested in cross-validation.
  cvError: out-of-fold prediction error corresponding to each value of lambda.
  lambdaMin: value of lambda with the minimal cross-validation error.
  errorMin: minimized value of the cross-validation error.
fitted.values: the fitted mean values for each observation used in fitting.
lmcoef: coefficients from an unpenalized least-squares regression of the response variable on the polynomial expansion of the input variables.
penwt: adaptive weight given to each term in the LASSO penalty (NULL for models fit via SCAD).
formula: model formula, as a "Formula" object.
degree: degree of the polynomial basis expansion.
family: model family, "gaussian" or "binomial".
weights: observation weights if specified.
method: the specified regularization method.
penwt.method: the specified method for calculating the adaptive LASSO weights (NULL for models fit via SCAD).
unpenalized: logical vector indicating which terms were not included in the LASSO penalty.
thresh: convergence threshold used in fitting.
maxit: iteration limit used in fitting.
terms: the terms object used to construct the model frame.
polyTerms: a matrix indicating the power of each raw input term (columns) in each term of the polynomial expansion used in fitting (rows).
nobs: the number of observations used to fit the model.
na.action: information on how NA values in the input data were handled.
xlevels: levels of factor variables used in fitting.
varNames: names of the raw input variables included in the model formula.
call: the original function call.
model: if model = TRUE, the model frame used in fitting; otherwise NULL.
X: if X = TRUE, the raw model matrix (i.e., prior to taking the polynomial expansion); otherwise NULL. For calculating the expanded model matrix, see model.matrix.polywog.
y: if y = TRUE, the response variable used in fitting; otherwise NULL.
boot.matrix: if boot > 0, a sparse matrix of class "dgCMatrix" where each column is the estimate from a bootstrap replicate. See bootPolywog for more information on bootstrapping (a usage sketch follows this list).
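As a sketch (assuming a model fit with boot > 0, such as fit1 in the Examples below), percentile intervals for each coefficient could be computed from the bootstrap replicates:
## rows of boot.matrix correspond to coefficients, columns to replicates
## t(apply(as.matrix(fit1$boot.matrix), 1, quantile, probs = c(0.025, 0.975)))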
Author(s)
Brenton Kenkel and Curtis S. Signorino
References
Brenton Kenkel and Curtis S. Signorino. 2012. "A Method for Flexible Functional Form Estimation: Bootstrapped Basis Regression with Variable Selection." Typescript, University of Rochester.
See Also
To estimate variation via the bootstrap, see
bootPolywog. To generate fitted values, see
predVals (and the underlying method
predict.polywog). For plots, see plot.polywog.
The polynomial degree may be selected via cross-validation using
cv.polywog.
Adaptive LASSO estimates are provided via glmnet and
cv.glmnet from the glmnet package. SCAD estimates are
provided via ncvreg and cv.ncvreg from the ncvreg
package.
Examples
## Using occupational prestige data
data(Prestige, package = "carData")
Prestige <- transform(Prestige, income = income / 1000)
## Fit a polywog model with bootstrap iterations
## (note: a loose convergence threshold is used to shorten the computation
## time of the example; *not* recommended in practice!)
set.seed(22)
fit1 <- polywog(prestige ~ education + income + type,
                data = Prestige,
                degree = 2,
                boot = 5,
                thresh = 1e-4)
## Basic information
print(fit1)
summary(fit1)
## See how fitted values change with education holding all else fixed
predVals(fit1, "education", n = 10)
## Plot univariate relationships
plot(fit1)
## Use SCAD instead of adaptive LASSO
fit2 <- update(fit1, method = "scad", thresh = 1e-3)
cbind(coef(fit1), coef(fit2))