crossbasis {dlnm}

R Documentation

Generate a Cross-Basis Matrix for a DLNM

Description

The function generates the basis matrices for the two dimensions of predictor and lags, given the functions selected to model the relationship in each space. These one-dimensional basis matrices are then combined to create the related cross-basis matrix, which can be included in a model formula to fit distributed lag linear models (DLMs) and distributed lag non-linear models (DLNMs).

Usage

crossbasis(x, lag, argvar=list(), arglag=list(), group=NULL, ...)

## S3 method for class 'crossbasis'
summary(object, ...)

Arguments

`x`	either a numeric vector representing a complete series of ordered observations (for time series data), or a matrix of exposure histories over the same lag period for each observation. See Details below.
`lag`	either an integer scalar or a vector of length 2, defining the maximum lag or the lag range, respectively.
`argvar`, `arglag`	lists of arguments to be passed to the function `onebasis` for generating the two basis matrices for predictor and lags, respectively. See Details below.
`group`	a factor or a list of factors defining groups of observations. Only for time series data.
`object`	an object of class `"crossbasis"`.
`...`	additional arguments. See Details below.

Details

The argument x defines the type of data. If x is an n-dimensional vector, the data are interpreted as a time series of equally-spaced and complete observations. If x is an n x (L-L0+1) matrix, the data are interpreted as a set of complete exposure histories at equally-spaced lags over the same lag period from L0 to L for each observation. The latter form is more general and can be used to apply DLMs and DLNMs beyond time series data. Lags are usually non-negative integers: if lag is not provided, the minimum lag L0 defaults to 0, and the maximum lag L defaults to 0 if x is a vector or to ncol(x)-1 otherwise. Negative lags are rarely needed but allowed.
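As a rough illustration of the two input types, the base-R sketch below expands a short time series into the matrix of exposure histories that a vector input implies. The helper lag_matrix and the toy data are hypothetical and not part of dlnm; the sketch assumes non-negative lags and only shows the layout, not the package's internal code.

```r
# Hypothetical helper (not part of dlnm): column l holds x lagged by l steps,
# with NA where the lagged value would fall before the start of the series.
# Assumes non-negative lags.
lag_matrix <- function(x, lag) {
  lags <- seq(lag[1], lag[2])
  Q <- vapply(lags, function(l) c(rep(NA_real_, l), x[seq_len(length(x) - l)]),
    numeric(length(x)))
  colnames(Q) <- paste0("lag", lags)
  Q
}

x <- c(5, 3, 8, 6, 2, 7)            # toy time series, n = 6
Q <- lag_matrix(x, lag = c(0, 2))   # n x (L-L0+1) matrix of exposure histories
Q
```

Each row of Q is the exposure history of one observation, which is the form expected when x is supplied as a matrix.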

The lists in argvar and arglag are passed to onebasis, which calls existing or user-defined functions to build the related basis matrices. Each list should contain the argument fun naming the chosen function, plus any additional arguments of that function. The argvar list is applied to x to generate the basis matrix for the space of the predictor. The arglag list is applied to the sequence of lags defined by lag to generate the basis matrix for the space of lags. By default, the basis functions for lags include an intercept (unless otherwise stated). Some arguments can be automatically re-set by onebasis. The two basis matrices are then combined to create the related cross-basis matrix.
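The combination step can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the bases are written by hand instead of through onebasis, the toy data and the helper lagged are hypothetical, and only non-negative lags are handled.

```r
# Toy series and lag range (hypothetical values)
x <- c(5, 3, 8, 6, 2, 7, 4, 9)
lags <- 0:2

# Basis for the predictor space: linear and quadratic terms, no intercept
basisvar <- cbind(x, x^2)
# Basis for the lag space: intercept and linear term
basislag <- cbind(1, lags)

# Hypothetical helper: matrix of lagged values of a vector (NA before the start)
lagged <- function(v) vapply(lags, function(l)
  c(rep(NA_real_, l), v[seq_len(length(v) - l)]), numeric(length(v)))

# Lag each predictor-basis column, then weight the lags by each lag-basis column
cb <- do.call(cbind, lapply(seq_len(ncol(basisvar)), function(i)
  lagged(basisvar[, i]) %*% basislag))

dim(cb)   # one column per pair of predictor-basis and lag-basis functions
```

The first rows are NA because their exposure histories are incomplete.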

Common choices for fun are ns and bs from package splines, or the internal functions of package dlnm, namely poly, strata, thr, integer and lin. In particular, DLMs can be considered a special case of DLNMs with a linear function in argvar. Functions ps and cr are used to specify penalized models with an external method (see cbPen). See help(onebasis) and the help pages of these functions for information on the additional arguments to be specified. Other existing or user-defined functions can also be applied.

The argument group, only used for time series data, defines groups of observations representing independent series. Each series must be consecutive, complete and ordered.

Value

A matrix object of class "crossbasis" which can be included in a model formula in order to fit a DLM or DLNM. It contains the attributes df (vector of length 2 with the df for each dimension), range (range of the original vector of observations), lag (lag range), argvar and arglag (lists of arguments defining the basis functions in each space, which may differ from the lists used in the call). The method summary.crossbasis returns a summary of the cross-basis matrix and the related attributes, and can be used to check the options for the basis functions chosen for the two dimensions.
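A minimal sketch of how the returned object might be inspected, assuming package dlnm is installed. The simulated series and the chosen bases are arbitrary and only serve the illustration.

```r
library(dlnm)

set.seed(1)
x <- rnorm(200)   # simulated series, for illustration only

# linear function for the predictor, quadratic polynomial for the lags
cb <- crossbasis(x, lag = 5, argvar = list(fun = "lin"),
  arglag = list(fun = "poly", degree = 2))

class(cb)
attr(cb, "df")       # df for the predictor and lag dimensions
attr(cb, "lag")      # lag range
attr(cb, "range")    # range of x
attr(cb, "arglag")   # arguments actually used, possibly re-set by onebasis
summary(cb)
```

Comparing attr(cb, "arglag") with the list supplied in the call is one way to see which arguments were re-set.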

Warnings

In previous versions of the package the function adopted a different usage. In particular, the argvar list should no longer include a cen argument (see Note in this help page or onebasis). Users are strongly advised to comply with the current usage, as backward compatibility may be discontinued in future versions of the package.

Meaningless combinations of arguments in argvar and arglag passed to onebasis could lead to collinear variables, with identifiability problems in the model and the exclusion of some of them.

It is strongly recommended to avoid the inclusion of an intercept in the basis for x (intercept in argvar should be FALSE, as default), otherwise a rank-deficient cross-basis matrix will be specified, causing some of the cross-variables to be excluded in the regression model. Conversely, an intercept is included by default in the basis for the space of lags.
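The rank deficiency can be reproduced with a small base-R sketch. It is an illustration under assumptions, not the package's code: the bases are written by hand, and the toy data and the helpers lagged, make_cb and rank_of are hypothetical.

```r
x <- c(5, 3, 8, 6, 2, 7, 4, 9)   # toy series
lags <- 0:2
basislag <- cbind(1, lags)       # lag basis with intercept

# Hypothetical helpers: lagged values of a vector, and a hand-made cross-basis
lagged <- function(v) vapply(lags, function(l)
  c(rep(NA_real_, l), v[seq_len(length(v) - l)]), numeric(length(v)))
make_cb <- function(basisvar) do.call(cbind,
  lapply(seq_len(ncol(basisvar)), function(i)
    lagged(basisvar[, i]) %*% basislag))

cb_noint <- make_cb(cbind(x))      # predictor basis without intercept
cb_int <- make_cb(cbind(1, x))     # predictor basis with intercept

# Rank computed on the complete rows only
rank_of <- function(m) qr(m[complete.cases(m), , drop = FALSE])$rank
c(columns = ncol(cb_noint), rank = rank_of(cb_noint))
c(columns = ncol(cb_int), rank = rank_of(cb_int))
```

With an intercept in the predictor basis, the corresponding cross-variables are constant over the observations, so they are collinear with each other and with the intercept of the regression model.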

Note

Missing values in x are allowed, but this causes the observation (for non-time series data with x as a matrix) or the following observations corresponding to the lag period (for time series data with x as a vector series) to be set to NA. Although correct, this could generate computational problems in the presence of a high number of missing observations.
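For time series data, the propagation of a missing value can be sketched in base R by listing the observations whose exposure history is incomplete. The toy series is hypothetical, and the lagging is done by hand instead of through dlnm.

```r
x <- c(5, 3, 8, 6, NA, 7, 4, 9, 1, 2)   # toy series with one missing value
lags <- 0:2

# Matrix of exposure histories: column l holds x lagged by l steps
Q <- vapply(lags, function(l) c(rep(NA_real_, l), x[seq_len(length(x) - l)]),
  numeric(length(x)))

# Observations with an incomplete exposure history
which(!complete.cases(Q))
```

Besides the observation with the missing value and those following it within the lag period, the first observations of the series are also incomplete, since their history starts before the series does.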

The name of the crossbasis object will be used by crosspred in order to extract the related estimated parameters. If more than one variable is transformed through cross-basis functions in the same model, different names must be specified.

Before version 2.2.0 of dlnm, the argvar list could include a cen argument to be passed internally to onebasis for centering the basis. This step is now moved to the prediction stage, with a cen argument in crosspred or crossreduce (see the related help pages). For backward compatibility, the use of cen in crossbasis is still allowed (with a warning), but may be discontinued in future versions.

Author(s)

Antonio Gasparrini <antonio.gasparrini@lshtm.ac.uk>

References

Gasparrini A. Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software. 2011;43(8):1-20.

Gasparrini A, Scheipl F, Armstrong B, Kenward MG. A penalized framework for distributed lag non-linear models. Biometrics. 2017;73(3):938-948.

Gasparrini A. Modeling exposure-lag-response associations with distributed lag non-linear models. Statistics in Medicine. 2014;33(5):881-899.

Gasparrini A, Armstrong B, Kenward MG. Distributed lag non-linear models. Statistics in Medicine. 2010;29(21):2224-2234.

Examples

### example of application in time series analysis - see vignette("dlnmTS")

# load the package (provides crossbasis, crosspred and the example data)
library(dlnm)

# create the crossbasis objects and summarize their contents
cb1.pm <- crossbasis(chicagoNMMAPS$pm10, lag=15, argvar=list(fun="lin"),
  arglag=list(fun="poly",degree=4))
cb1.temp <- crossbasis(chicagoNMMAPS$temp, lag=3, argvar=list(df=5),
  arglag=list(fun="strata",breaks=1))
summary(cb1.pm)
summary(cb1.temp)

# run the model and get the predictions for pm10
library(splines)
model1 <- glm(death ~ cb1.pm + cb1.temp + ns(time, 7*14) + dow,
  family=quasipoisson(), chicagoNMMAPS)
pred1.pm <- crosspred(cb1.pm, model1, at=0:20, bylag=0.2, cumul=TRUE)

# plot the lag-response curves for specific and incremental cumulative effects
plot(pred1.pm, "slices", var=10, col=3, ylab="RR", ci.arg=list(density=15,lwd=2),
  main="Lag-response curve for a 10-unit increase in PM10")
plot(pred1.pm, "slices", var=10, col=2, cumul=TRUE, ylab="Cumulative RR",
  main="Lag-response curve of incremental cumulative effects")

### example of application beyond time series - see vignette("dlnmExtended")

# generate the matrix of exposure histories from the 5-year periods
Qnest <- t(apply(nested, 1, function(sub) exphist(rep(c(0,0,0,sub[5:14]), 
  each=5), sub["age"], lag=c(3,40))))

# define the cross-basis
cbnest <- crossbasis(Qnest, lag=c(3,40), argvar=list("bs",degree=2,df=3),
  arglag=list(fun="ns",knots=c(10,30),intercept=FALSE))
summary(cbnest)

# run the model and predict
library(survival)
mnest <- clogit(case~cbnest+strata(riskset), nested)
pnest <- crosspred(cbnest,mnest, cen=0, at=0:20*5)

# bi-dimensional exposure-lag-response association
plot(pnest, zlab="OR", xlab="Exposure", ylab="Lag (years)")
# lag-response curve for exposure 50
plot(pnest, var=50, ylab="OR for exposure 50", xlab="Lag (years)", xlim=c(0,40))
# exposure-response curve for lag 5
plot(pnest, lag=5, ylab="OR at lag 5", xlab="Exposure", ylim=c(0.95,1.15))

### example of extended predictions - see vignette("dlnmExtended")

# compute exposure profiles and exposure history
expnested <- rep(c(10,0,13), c(5,5,10))
hist <- exphist(expnested, time=length(expnested), lag=c(3,40))

# predict association with a specific exposure history
pnesthist <- crosspred(cbnest, mnest, cen=0, at=hist)
with(pnesthist, c(allRRfit,allRRlow,allRRhigh))

### example of user-defined functions - see vignette("dlnmExtended")

# define a log function
mylog <- function(x) log(x+1)

# define the cross-basis
cbnest2 <- crossbasis(Qnest, lag=c(3,40), argvar=list("mylog"),
  arglag=list(fun="ns",knots=c(10,30),intercept=FALSE))
summary(cbnest2)

# run the model and predict
mnest2 <- clogit(case~cbnest2+strata(riskset), nested)
pnest2 <- crosspred(cbnest2, mnest2, cen=0, at=0:20*5)

# plot and compare with previous fit
plot(pnest2, zlab="OR", xlab="Exposure", ylab="Lag (years)")
plot(pnest2, var=50, ylab="OR for exposure 50", xlab="Lag (years)", xlim=c(0,40))
lines(pnest, var=50, lty=2)
plot(pnest2, lag=5, ylab="OR at lag 5", xlab="Exposure", ylim=c(0.95,1.15))
lines(pnest, lag=5, lty=2)

### example of penalized models - see vignette("dlnmPenalized")

# to be added soon

[Package dlnm version 2.4.7 Index]