WH_2d {WH}R Documentation

2D Whittaker-Henderson Smoothing

Description

Main package function to apply Whittaker-Henderson smoothing in a bidimensional survival analysis framework. It takes as input a matrix of observed events and a matrix of associated central exposure, both depending on two covariates, and build a smooth version of the log-hazard rate. Smoothing parameters may be supplied or automatically chosen according to a specific criterion such as "REML" (the default), "AIC", "BIC" or "GCV". Whittaker-Henderson may be applied in a full maximum likelihood framework or an approximate gaussian framework. As Whittaker-Henderson smoothing relies on full-rank smoothers, computation time and memory usage in the bidimensional case may prove overwhelming and the function integrates a rank-reduction procedure to avoid such issues.

Usage

WH_2d(
  d,
  ec,
  lambda,
  criterion,
  method,
  max_dim = 200,
  p,
  q = c(2, 2),
  framework,
  y,
  wt,
  quiet = FALSE,
  ...
)

Arguments

d

Matrix of observed events, whose rows and columns must be named.

ec

Matrix of central exposure. The central exposure corresponds to the sum of the exposure duration over the insured population. An individual experiencing an event of interest during the year will no longer be exposed afterward and the exposure should be computed accordingly.

lambda

Smoothing parameter vector of size 2. If missing, an optimization procedure will be used to find the optimal smoothing parameter. If supplied, no optimal smoothing parameter search will take place unless the method argument is also supplied, in which case lambda will be used as the starting parameter for the optimization procedure.

criterion

Criterion to be used for the selection of the optimal smoothing parameter. Default is "REML" which stands for restricted maximum likelihood. Other options include "AIC", "BIC" and "GCV".

method

Method to be used to find the optimal smoothing parameter. Default to "fixed_lambda" if lambda is supplied, meaning no optimization is performed. Otherwise, default to "perf" which means the faster performance iteration method is used. The alternative "outer" method is safer but slower. Both those methods rely on the optim function from package stats.

max_dim

Number of parameters to be kept in the optimization problem. Default is 200. Values higher than 1000 may result in very high computation times and memory usage.

p

Optional vector of size 2. Maximum number of eigenvectors to keep on each dimension after performing the eigen decomposition of the penalization matrix. If missing, will be automatically computed so that the number of elements of the (square) matrices involved in the optimization problem remains lower that the max_dim argument

q

Order of penalization vector of size 2. Polynoms of degrees ⁠(q[[1]] - 1,q[[2]] - 1)⁠ are considered smooth and are therefore unpenalized. Should be left to the default of c(2,2) for most practical applications.

framework

Default framework is "ml" which stands for maximum likelihood unless the y argument is also provided, in which case an "reg" or (approximate) regression framework is used.

y

Optional matrix of observations whose rows and columns should be named. Used only in the regression framework and if the d and ec arguments are missing (otherwise y is automatically computed from d and ec). May be useful when using Whittaker-Henderson smoothing outside of the survival analysis framework.

wt

Optional matrix of weights. As for the observation vector y, Used only in the regression framework and if the d and ec arguments are missing (otherwise wt is set equal to d). May be useful when using Whittaker-Henderson smoothing outside of the survival analysis framework.

quiet

Should messages and warnings be silenced ? Default to FALSE, may be set to TRUE is the function is called repeatedly.

...

Additional parameters passed to the smoothing function called.

Value

An object of class WH_2d i.e. a list containing :

Examples

keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2)
keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3)

d  <- portfolio_LTC$d[keep_age, keep_duration]
ec <- portfolio_LTC$ec[keep_age, keep_duration]

y <- log(d / ec) # observation vector
y[d == 0] <- - 20
wt <- d

# Maximum likelihood
WH_2d(d, ec, lambda = c(1e2, 1e2))
WH_2d(d, ec) # performance iteration default method
WH_2d(d, ec, method = "outer") # slower but safer outer iteration method
WH_2d(d, ec, criterion = "GCV")
# alternative optimization criteria for smoothing parameter selection

# Regression
WH_2d(y = y, wt = wt, lambda = c(1e2, 1e2)) # regression framework is triggered when y is supplied
WH_2d(d, ec, framework = "reg", lambda = c(1e2, 1e2))
# setting framework = "reg" forces computation of y from d and ec

# Rank reduction
keep_age <- which(rowSums(portfolio_LTC$ec) > 1e2)
keep_duration <- which(colSums(portfolio_LTC$ec) > 1e2)

d  <- portfolio_LTC$d[keep_age, keep_duration]
ec <- portfolio_LTC$ec[keep_age, keep_duration]

prod(dim(d)) # problem dimension is 627 !
WH_2d(d, ec)
# rank-reduction is used to find an approximate solution using 200 parameters


[Package WH version 1.1.1 Index]