pentrace {rms} | R Documentation |
Trace AIC and BIC vs. Penalty
Description
For an ordinary unpenalized fit from lrm
or ols
and for a vector or list of penalties,
fits a series of logistic or linear models using penalized maximum likelihood
estimation, and saves the effective degrees of freedom, Akaike Information
Criterion (AIC
), Schwarz Bayesian Information Criterion (BIC
), and
Hurvich and Tsai's corrected AIC
(AIC_c
). Optionally
pentrace
can
use the nlminb
function to solve for the optimum penalty factor or
combination of factors penalizing different kinds of terms in the model.
The effective.df
function prints the original and effective
degrees of freedom for a penalized fit or for an unpenalized fit and
the best penalization determined from a previous invocation of
pentrace
if method="grid"
(the default).
The effective d.f. is computed separately for each class of terms in
the model (e.g., interaction, nonlinear).
A plot
method exists to plot the results, and a print
method exists
to print the most pertinent components. Both AIC
and BIC
may be plotted if
there is only one penalty factor type specified in penalty
. Otherwise,
the first two types of penalty factors are plotted, showing only the AIC
.
Usage
pentrace(fit, penalty, penalty.matrix,
method=c('grid','optimize'),
which=c('aic.c','aic','bic'), target.df=NULL,
fitter, pr=FALSE, tol=1e-7,
keep.coef=FALSE, complex.more=TRUE, verbose=FALSE, maxit=12,
subset, noaddzero=FALSE)
effective.df(fit, object)
## S3 method for class 'pentrace'
print(x, ...)
## S3 method for class 'pentrace'
plot(x, method=c('points','image'),
which=c('effective.df','aic','aic.c','bic'), pch=2, add=FALSE,
ylim, ...)
Arguments
fit |
a result from |
penalty |
can be a vector or a list. If it is a vector, all types of terms in
the model will be penalized by the same amount, specified by elements in
|
object |
an object returned by |
penalty.matrix |
see |
method |
The default is |
which |
the objective to maximize for either |
target.df |
applies only to |
fitter |
a fitting function. Default is |
pr |
set to |
tol |
tolerance for declaring a matrix singular (see |
keep.coef |
set to |
complex.more |
By default if |
verbose |
set to |
maxit |
maximum number of iterations to allow in a model fit (default=12).
This is passed to the appropriate fitter function with the correct
argument name. Increase |
subset |
a logical or integer vector specifying rows of the design and response
matrices to subset in fitting models. This is most useful for
bootstrapping |
noaddzero |
set to |
x |
a result from |
pch |
used for |
add |
set to |
ylim |
2-vector of y-axis limits for plots other than effective d.f. |
... |
other arguments passed to |
Value
a list of class "pentrace"
with elements penalty, df, objective, fit, var.adj, diag, results.all
, and
optionally Coefficients
.
The first 6 elements correspond to the fit that had the best objective
as named in the which
argument, from the sequence of fits tried.
Here fit
is the fit object from fitter
which was a penalized fit,
diag
is the diagonal of the matrix used to compute the effective
d.f., and var.adj
is Gray (1992) Equation 2.9, which is an improved
covariance matrix for the penalized beta. results.all
is a data
frame whose first few variables are the components of penalty
and
whose other columns are df, aic, bic, aic.c
. results.all
thus
contains a summary of results for all fits attempted. When
method="optimize"
, only two components are returned: penalty
and
objective
, and the object does not have a class.
Author(s)
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
References
Gray RJ: Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA 87:942–951, 1992.
Hurvich CM, Tsai, CL: Regression and time series model selection in small samples. Biometrika 76:297–307, 1989.
See Also
lrm
, ols
, solvet
, rmsMisc
, image
Examples
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
f <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
x=TRUE, y=TRUE)
p <- pentrace(f, seq(.2,1,by=.05))
plot(p)
p$diag # may learn something about fractional effective d.f.
# for each original parameter
pentrace(f, list(simple=c(0,.2,.4), nonlinear=c(0,.2,.4,.8,1)))
# Bootstrap pentrace 5 times, making a plot of corrected AIC plot with 5 reps
n <- nrow(f$x)
plot(pentrace(f, seq(.2,1,by=.05)), which='aic.c',
col=1, ylim=c(30,120)) #original in black
for(j in 1:5)
plot(pentrace(f, seq(.2,1,by=.05), subset=sample(n,n,TRUE)),
which='aic.c', col=j+1, add=TRUE)
# Find penalty giving optimum corrected AIC. Initial guess is 1.0
# Not implemented yet
# pentrace(f, 1, method='optimize')
# Find penalty reducing total regression d.f. effectively to 5
# pentrace(f, 1, target.df=5)
# Re-fit with penalty giving best aic.c without differential penalization
f <- update(f, penalty=p$penalty)
effective.df(f)