
shrink {shrink}

R Documentation

Global, Parameterwise and Joint Shrinkage of Regression Coefficients

Description

Obtain global, parameterwise and joint post-estimation shrinkage factors for regression coefficients from fit objects of class lm, glm, coxph, or mfp.

Usage

shrink(
  fit,
  type = c("parameterwise", "global", "all"),
  method = c("jackknife", "dfbeta"),
  join = NULL,
  notes = TRUE,
  postfit = TRUE
)

Arguments

`fit`	a fit object of class `lm`, `glm`, `coxph`, or `mfp`. The fit object must have been called with `x = TRUE` (and `y = TRUE` in case of class `lm`).
`type`	of shrinkage, either `"parameterwise"` (default), `"global"` shrinkage, or `"all"`.
`method`	of shrinkage estimation, either `"jackknife"` (based on leave-one-out resampling, default) or `"dfbeta"` (excellent approximation based on DFBETA residuals).
`join`	compute optional joint shrinkage factors for sets of specified columns of the design matrix, if `type = "parameterwise"`. See Details.
`notes`	print notes. Default is TRUE.
`postfit`	obtain fit with shrunken regression coefficients. This option is only available for models without an intercept. Default is TRUE.

Details

While global shrinkage modifies all regression coefficients by the same factor, parameterwise shrinkage factors differ between regression coefficients. For variables that are either highly correlated or related in content, such as several columns of a design matrix describing a nonlinear effect, parameterwise shrinkage factors are not interpretable. Joint shrinkage of a set of such associated design variables gives one common shrinkage factor for this set.
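The difference between global and parameterwise factors can be sketched with a small leave-one-out computation for a linear model. This is only an illustration of the general idea described in the references (regressing the outcome on cross-validated linear predictors); it is not the package's implementation, and the simulated data and all object names (`dat`, `loo_beta`, `partial`, `c_global`, `c_pw`) are assumptions made for this example.

```r
# Illustrative sketch only: leave-one-out shrinkage factors for a linear model
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 1.0 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)
X   <- fit$x[, -1, drop = FALSE]   # design matrix without the intercept column

# Row i holds the slopes estimated with observation i left out
loo_beta <- t(sapply(seq_len(n), function(i) {
  coef(lm(y ~ x1 + x2, data = dat[-i, ]))[-1]
}))

# Partial cross-validated linear predictors, one column per coefficient
partial <- X * loo_beta

# Global factor: a single slope for the summed cross-validated linear predictor
c_global <- unname(coef(lm(dat$y ~ rowSums(partial)))[2])

# Parameterwise factors: one slope per partial linear predictor
c_pw <- coef(lm(dat$y ~ partial))[-1]

c_global
c_pw
```

In this sketch a joint factor for a set of columns would correspond to summing the respective columns of `partial` before the final regression, so that the set receives one common slope.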

Joint shrinkage factors may be useful when analysing highly correlated or otherwise associated columns of the design matrix, e.g. dummy variables corresponding to a categorical explanatory variable with more than two levels, two variables and their pairwise interaction term, or several transformations of an explanatory variable enabling estimation of nonlinear effects. The analyst can define 'joint' shrinkage factors by specifying the join option if type = "parameterwise". join expects a list with at least one character vector containing the names of the columns of the design matrix for which a joint shrinkage factor is requested. For example, join = list(c("dummy1", "dummy2", "dummy3"), c("main1", "main2", "interaction"), c("varX.fp1", "varX.fp2")) requests joint shrinkage factors for a) "dummy1", "dummy2" and "dummy3", b) "main1", "main2" and "interaction" and c) "varX.fp1" and "varX.fp2".
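Because join refers to column names of the design matrix, it can help to inspect those names before building the list. The following sketch uses base R only; the toy data and the variable names (`group`, `u`, `v`) are invented for illustration, and the final call to shrink is left as a comment because it requires the shrink package.

```r
# Toy data with a three-level factor and an interaction; all names are made up
set.seed(2)
dat <- data.frame(
  y     = rnorm(60),
  group = factor(rep(c("a", "b", "c"), each = 20)),
  u     = rnorm(60),
  v     = rnorm(60)
)
fit <- lm(y ~ group + u * v, data = dat, x = TRUE, y = TRUE)

# The column names of the design matrix are the names 'join' expects
colnames(fit$x)

join <- list(
  c("groupb", "groupc"),   # dummy variables of one categorical variable
  c("u", "v", "u:v")       # two main effects and their interaction
)

# Guard against typos: every requested name must match a design-matrix column
stopifnot(all(unlist(join) %in% colnames(fit$x)))

# With the shrink package installed, the call would then be:
# shrink(fit, type = "parameterwise", method = "dfbeta", join = join)
```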

Restricted cubic splines using `rcs`

shrink also works for models incorporating restricted cubic splines computed with the rcs function from the rms package. A joint shrinkage factor of explanatory variable varX transformed with rcs can be obtained by join = list(c("rcs(varX)")) or by stating the names of the rcs-transformed variables as given in the respective fit object. (These two notations should not be mixed within one call to shrink.)

Jackknife versus DFBETA method

For linear regression models (lm or glm with family = "gaussian"), shrinkage factors obtained by the jackknife and by the DFBETA approximation are identical. For all other types of regression, the computational effort of estimating shrinkage factors may be greatly reduced by using method = "dfbeta" instead. However, for (very) small data sets method = "jackknife" may be advantageous, because DFBETA residuals may underestimate the influence of some highly influential observations.
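The agreement of the two methods for linear models is consistent with a standard result: in ordinary least squares, the coefficients obtained after deleting one observation have a closed form, so no refitting is needed. The sketch below checks this identity numerically with base R. It illustrates the underlying algebra only and makes no claim about how shrink computes its DFBETA residuals internally; the data and object names are assumptions for this example.

```r
# Leave-one-out coefficients of a linear model: closed form versus refitting
set.seed(3)
n   <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 + dat$x1 - 0.5 * dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = dat, x = TRUE)
X   <- fit$x
e   <- residuals(fit)
h   <- hatvalues(fit)

# Closed form: b_(i) = b - (X'X)^{-1} x_i e_i / (1 - h_i)
XtX_inv   <- solve(crossprod(X))
delta     <- t(XtX_inv %*% t(X)) * (e / (1 - h))   # n x p, row i = change for case i
loo_exact <- matrix(coef(fit), n, ncol(X), byrow = TRUE) - delta

# Brute-force jackknife: refit the model n times
loo_refit <- t(sapply(seq_len(n), function(i) {
  coef(lm(y ~ x1 + x2, data = dat[-i, ]))
}))

# Largest absolute discrepancy; only floating-point noise is expected
max(abs(loo_exact - loo_refit))
```

For generalised linear and Cox models no such exact identity holds, which is why the DFBETA method is described above as an approximation in those settings.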

Shrunken intercept

A shrunken intercept is estimated as follows: For all columns of the design matrix except for the intercept the shrinkage factors are multiplied with the respective regression coefficients and a linear predictor is computed. Then the shrunken intercept is estimated by modeling fit$y ~ offset(linear predictor).
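The procedure described above can be sketched in base R for a logistic model. The shrinkage factors in `sf` are hypothetical placeholder values rather than estimates, and the simulated data and object names are assumptions for this illustration; the package's own code may differ in detail.

```r
# Sketch: re-estimating the intercept after shrinking the slopes
set.seed(4)
n  <- 300
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.6 * x2))

fit <- glm(y ~ x1 + x2, family = binomial, x = TRUE, y = TRUE)

# Hypothetical parameterwise shrinkage factors (placeholders, not estimated here)
sf <- c(x1 = 0.95, x2 = 0.85)

# Shrunken slopes and the linear predictor built from them (intercept excluded)
shrunken_slopes <- sf * coef(fit)[names(sf)]
lp <- drop(fit$x[, names(sf), drop = FALSE] %*% shrunken_slopes)

# Re-estimate only the intercept, holding the shrunken linear predictor fixed
refit <- glm(fit$y ~ offset(lp), family = binomial)
shrunken_intercept <- unname(coef(refit))

c(original = unname(coef(fit)[1]), shrunken = shrunken_intercept)
```

Using offset() fixes the coefficient of the linear predictor at 1, so the intercept is the only parameter estimated in the refit.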

For regression models without an intercept, i.e., fit objects of class coxph, the shrunken regression coefficients can be directly estimated. This postfit is retained in the $postfit component of the shrink object.

Value

shrink returns an object with the following components:

`ShrinkageFactors`	a vector of shrinkage factors of regression coefficients.
`ShrinkageFactorsVCOV`	the covariance matrix of the shrinkage factors.
`ShrunkenRegCoef`	a vector with the shrunken regression coefficients.
`postfit`	an optional postfit model with shrunken regression coefficients and associated standard errors for models without an intercept.
`fit`	the original (unshrunken) fit object.
`type`	the requested shrinkage `type`.
`method`	the requested shrinkage `method`.
`join`	the requested joint shrinkage factors.
`call`	the function call.

If type = "all" then the object returned by shrink additionally contains

`global`	a list with the following elements: `ShrinkageFactors`, `ShrinkageFactorsVCOV` and `ShrunkenRegCoef`.
`parameterwise`	a list with the following elements: `ShrinkageFactors`, `ShrinkageFactorsVCOV` and `ShrunkenRegCoef`.
`joint`	an optional list with the following elements: `ShrinkageFactors`, `ShrinkageFactorsVCOV` and `ShrunkenRegCoef`.

Note

For fit objects of class mfp with family != cox, the regression coefficients of fit (obtained by coef(fit)) and of fit$fit may not always be identical, because of the pretransformation that mfp applies to the explanatory variables in the model. The shrink function uses a) the names as given in names(coef(fit)) and b) the regression coefficients as given in summary(fit), which correspond to the pretransformed explanatory variables.

References

Dunkler D, Sauerbrei W, Heinze G (2016). Global, Parameterwise and Joint Shrinkage Factor Estimation. Journal of Statistical Software. 69(8), 1-19. doi:10.18637/jss.v069.i08
Sauerbrei W (1999). The use of resampling methods to simplify regression models in medical statistics. Applied Statistics. 48(3), 313-329.
Verweij P, van Houwelingen J (1993). Cross-validation in survival analysis. Statistics in Medicine. 12(24), 2305-2314.

Examples

## Example with mfp (family = cox)
data("GBSG")
library("mfp")
fit1 <- mfp(Surv(rfst, cens) ~ fp(age, df = 4, select = 0.05) +
              fp(prm, df = 4, select = 0.05), family = cox, data = GBSG)

shrink(fit1, type = "global", method = "dfbeta")

dfbeta.pw <- shrink(fit1, type = "parameterwise", method = "dfbeta")
dfbeta.pw
dfbeta.pw$postfit

# correlations between shrinkage factors and standard errors of shrinkage factors
cov2cor(dfbeta.pw$ShrinkageFactorsVCOV)
sqrt(diag(dfbeta.pw$ShrinkageFactorsVCOV))

shrink(fit1, type = "parameterwise", method = "dfbeta",
       join = list(c("age.1", "age.2")))

#shrink(fit1, type = "global", method = "jackknife")
#shrink(fit1, type = "parameterwise", method = "jackknife")
#shrink(fit1, type = "parameterwise", method = "jackknife",
#       join = list(c("age.1", "age.2")))

# obtain global, parameterwise and joint shrinkage with one call to 'shrink'
shrink(fit1, type = "all", method = "dfbeta",
       join = list(c("age.1", "age.2")))

## Example with rcs
library("rms")
fit2 <- coxph(Surv(rfst, cens) ~ rcs(age) + log(prm + 1), data = GBSG, x = TRUE)

shrink(fit2, type = "global", method = "dfbeta")
shrink(fit2, type = "parameterwise", method = "dfbeta")
shrink(fit2, type = "parameterwise", method = "dfbeta",
       join = list(c("rcs(age)")))
shrink(fit2, type = "parameterwise", method = "dfbeta",
       join = list(c("rcs(age)"), c("log(prm + 1)")))


## Examples with glm & mfp (family = binomial)
set.seed(888)
intercept <- 1
beta <- c(0.5, 1.2)
n <- 1000
x1 <- rnorm(n, mean = 1, sd = 1)
x2 <- rbinom(n, size = 1, prob = 0.3)
linpred <- intercept + x1 * beta[1] + x2 * beta[2]
prob <- exp(linpred) / (1 + exp(linpred))
runis <- runif(n, 0, 1)
ytest <- ifelse(test = runis < prob, yes = 1, no = 0)
# reuse the simulated outcome instead of recomputing it
simdat <- data.frame(y = ytest, x1, x2)

fit3 <- glm(y ~ x1 + x2, family = binomial, data = simdat, x = TRUE)
summary(fit3)

shrink(fit3, type = "global", method = "dfbeta")
shrink(fit3, type = "parameterwise", method = "dfbeta")
shrink(fit3, type = "parameterwise", method = "dfbeta", join = list(c("x1", "x2")))


utils::data("Pima.te", package = "MASS")
utils::data("Pima.tr", package = "MASS")
Pima <- rbind(Pima.te, Pima.tr)
fit4 <- mfp(type ~ npreg + glu + bmi + ped + fp(age, select = 0.05),
            family = binomial, data = Pima)
summary(fit4)

shrink(fit4, type = "global", method = "dfbeta")
shrink(fit4, type = "parameterwise", method = "dfbeta")
# fit objects of class mfp: for 'join' use variable names as given in 'names(coef(fit4))'
shrink(fit4, type = "parameterwise", method = "dfbeta", join = list(c("age.1")))


## Examples with glm & mfp (family = gaussian) and lm
utils::data("anorexia", package = "MASS")
contrasts(anorexia$Treat) <- contr.treatment(n = 3, base = 2)
fit5 <- glm(Postwt ~ Prewt + Treat, family = gaussian, data = anorexia, x = TRUE)
fit5

shrink(fit5, type = "global", method = "dfbeta")
# which is identical to the more time-consuming jackknife approach:
# shrink(fit5, type = "global", method = "jackknife")

shrink(fit5, type = "parameterwise", method = "dfbeta")
shrink(fit5, type = "parameterwise", method = "dfbeta",
       join = list(c("Treat1", "Treat3")))


fit6 <- lm(Postwt ~ Prewt + Treat, data = anorexia, x = TRUE, y = TRUE)
fit6

shrink(fit6, type = "global", method = "dfbeta")
shrink(fit6, type = "parameterwise", method = "dfbeta")
shrink(fit6, type = "parameterwise", method = "dfbeta",
       join = list(c("Treat1", "Treat3")))


utils::data("GAGurine", package = "MASS")
fit7 <- mfp(Age ~ fp(GAG, select = 0.05), family = gaussian, data = GAGurine)
summary(fit7)

shrink(fit7, type = "global", method = "dfbeta")
shrink(fit7, type = "parameterwise", method = "dfbeta")
# fit objects of class mfp: for 'join' use variable names as given in 'names(coef(fit7))'
shrink(fit7, type = "parameterwise", method = "dfbeta",
       join = list(c("GAG.1", "GAG.2")))

[Package shrink version 1.2.3 Index]