R: Constrained groupwise additive index models

cgaim {cgaim}

R Documentation

Constrained groupwise additive index models

Description

Fits constrained groupwise additive index models (CGAIM) to data. A CGAIM estimates indices subject to constraints on their coefficients and on the shape of their association with the outcome. Such constraints are specified in the formula through g for grouped terms and s for smooth covariates.

Usage

cgaim(formula, data, weights, subset, na.action, Cmat = NULL, bvec = NULL,
  control = list())

Arguments

`formula`	A CGAIM formula with index terms `g`, smooth terms `s` and linear terms. See details.
`data`	A data.frame containing the variables of the model.
`weights`	An optional vector of observation weights.
`subset`	An optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	A function indicating how to treat NAs. The default is set by the `na.action` setting of `options`. See `na.fail`.
`Cmat`	A constraint matrix for index coefficients alpha. Columns must match all variables entering any index through `g`. See details.
`bvec`	A vector of lower bounds for the constraints in `Cmat`. Potentially recycled to match the number of constraints.
`control`	A list of parameters controlling the fitting process. See `cgaim.control`.

Details

The CGAIM is expressed as

y_{i} = \beta_{0} + \sum_{j} \beta_{j} g_{j}(\alpha_{j}^{T} x_{ij}) + \sum_{k} \gamma_{k} f_{k}(w_{ik}) + \sum_{l} \theta_{l} u_{il} + e_{i}

where the x_{ij} are variables entering grouped indices, the w_{ik} are smooth covariates and the u_{il} are linear covariates.

The formula interface uses g to identify index terms and s to identify smooth functions; linear terms can be included as usual. All smooth terms can be shape constrained.

The CGAIM allows for linear constraints on the alpha coefficients. Such constraints can be specified through the g interface in the formula, or through the Cmat argument of cgaim. The g interface is used for constraints meant for a specific index only. In this case, common constraints can easily be specified through the acons argument (see build_constraints). Alternatively, more general constraints on a single index can be specified by passing a matrix to the Cmat argument of g. Constraints encompassing several indices can be specified through the Cmat argument of cgaim, whose number of columns must match the total number of index coefficients alpha to estimate. In all cases, the corresponding bvec argument specifies the bounds of the constraints.
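As a rough illustration of how constraints spanning several indices fit together, the base R sketch below stacks two per-index constraint blocks into one block-diagonal matrix and checks candidate coefficient vectors against it. It assumes the convention Cmat %*% alpha >= bvec, which is consistent with bvec being described as a vector of lower bounds. The helpers block_diag and satisfies are hypothetical names used for illustration only and are not part of the cgaim package.

```r
# Hypothetical helper: place constraint blocks on the diagonal (base R only).
block_diag <- function(...) {
  blocks <- list(...)
  out <- matrix(0, sum(sapply(blocks, nrow)), sum(sapply(blocks, ncol)))
  r <- 0
  cc <- 0
  for (b in blocks) {
    out[r + seq_len(nrow(b)), cc + seq_len(ncol(b))] <- b
    r <- r + nrow(b)
    cc <- cc + ncol(b)
  }
  out
}

# Index 1 (two variables): decreasing coefficients, alpha_1 - alpha_2 >= 0
C1 <- matrix(c(1, -1), nrow = 1)
# Index 2 (two variables): both coefficients nonnegative
C2 <- diag(2)

Cmat <- block_diag(C1, C2)   # 3 constraints, 4 index coefficients
bvec <- rep(0, nrow(Cmat))   # one lower bound per constraint

# Assumed convention: alpha is feasible when Cmat %*% alpha >= bvec
satisfies <- function(alpha) all(Cmat %*% alpha >= bvec)

satisfies(c(0.8, 0.6, 0.5, 0.5))  # TRUE
satisfies(c(0.6, 0.8, 0.5, 0.5))  # FALSE: first index is not decreasing
```

The number of columns of the stacked matrix equals the total number of index coefficients, which is the requirement stated above for constraints passed at the model level.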

Both indices (g) and smooth covariate terms (s) allow shape constraints. See dedicated help for the list of constraints allowed.

The CGAIM is fitted through an iterative algorithm that alternates between estimating the ridge functions g_{j} (and other non-index terms) and updating the coefficients \alpha_{j}. The smoothing of ridge functions currently supports three methods: scam (the default), cgam and scar. The list smooth.control controls the smoothing with allowed parameters defined in cgaim.control.
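The alternating idea can be sketched in base R for a single, unconstrained index. This is only an illustration under simplifying assumptions, not the algorithm implemented in cgaim: it uses smooth.spline in place of a shape-constrained smoother, a plain Gauss-Newton least-squares step in place of a constrained update of alpha, and a unit-norm rescaling chosen here for identifiability. All variable names are hypothetical.

```r
set.seed(1)
n <- 300
X <- cbind(rnorm(n), rnorm(n))
alpha_true <- c(0.8, 0.6)
y <- as.vector(exp(X %*% alpha_true)) + rnorm(n, sd = 0.2)

alpha <- c(1, 1) / sqrt(2)            # starting value with unit norm
for (iter in 1:20) {
  z <- as.vector(X %*% alpha)         # current index values

  # Step 1: estimate the ridge function g given alpha
  fit <- smooth.spline(z, y)
  ghat <- predict(fit, z)$y           # g evaluated at the index
  dg <- predict(fit, z, deriv = 1)$y  # first derivative of g at the index

  # Step 2: update alpha given g, using the linearisation
  # y - g(z) ~ (g'(z) * X) %*% delta
  V <- X * dg                         # row i of X scaled by dg[i]
  delta <- qr.coef(qr(V), y - ghat)
  alpha_new <- alpha + delta
  alpha_new <- alpha_new / sqrt(sum(alpha_new^2))  # keep unit norm

  converged <- max(abs(alpha_new - alpha)) < 1e-6
  alpha <- alpha_new
  if (converged) break
}
alpha
```

In the constrained setting described here, step 1 would use one of the shape-constrained smoothers and step 2 would have to respect the linear constraints on alpha, which a plain least-squares update does not.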

Value

A cgaim object, i.e. a list with components:

`alpha`	A named list of index coefficients.
`gfit`	A matrix containing the ridge and smooth functions evaluated at the observations. Note that column ordering puts indices first and covariates after.
`indexfit`	A matrix containing the indices evaluated at the observations.
`beta`	A vector containing the intercept and the scale coefficient of each ridge and smooth function. Includes the `\gamma_{k}` of the CGAIM model above. Note that ordering puts indices first and covariates after.
`index`	A vector identifying to which index the columns of the element `x` belong.
`fitted`	A vector of fitted responses.
`residuals`	A vector of residuals.
`rss`	The residual sum of squares of the fit.
`flag`	A flag indicating how the algorithm stopped. 1 for proper convergence, 2 when the algorithm stopped for failing to decrease the RSS and 3 when the maximum number of iterations has been reached.
`niter`	Number of iterations performed.
`edf`	Effective degrees of freedom of the estimator.
`gcv`	Generalized cross validation score.
`dg`	A matrix containing derivatives of ridge and smooth functions.
`gse`	A matrix containing standard errors of ridge and smooth functions.
`active`	A logical vector indicating which constraints are active at convergence.
`Cmat`	The constraint matrix used to fit index coefficients alpha. Will include all constraints given through `g` and the `Cmat` parameter.
`bvec`	The lower bound vector associated with `Cmat`.
`x`	A matrix containing the variables entering the indices. The variables are mapped to each index through the element `index`.
`y`	The response vector.
`weights`	The weights used for estimation.
`sm_mod`	A list of model elements for the smoothing step of the estimation. Notably includes the matrix `Xcov` that includes the covariates not entering any index. Other elements depend on the method chosen for smoothing.
`control`	The control list used to fit the cgaim.
`terms`	The model terms.
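
The stopping codes stored in flag can be turned into readable messages with a small helper. The sketch below is base R only; describe_flag is a hypothetical function, not part of the package, and the list fit is a stand-in that mimics two of the components documented above rather than a real cgaim object.

```r
# Hypothetical helper mapping the documented flag codes to messages.
describe_flag <- function(flag) {
  switch(as.character(flag),
    "1" = "converged",
    "2" = "stopped: RSS failed to decrease",
    "3" = "stopped: maximum number of iterations reached",
    "unknown flag")
}

# Stand-in for a fitted object, with made-up values for illustration
fit <- list(flag = 3, niter = 50)
describe_flag(fit$flag)
```

With a real fit, the same check would read the flag component of the returned object.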

Note

A model without intercept can only be fitted when the smoothing step is performed with scam.

Examples

## Simulate some data
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- rnorm(n)
mu <- 4 * exp(8 * x1) / (1 + exp(8 * x1)) + exp(x3)
y <- mu + rnorm(n)
df1 <- data.frame(y, x1, x2, x3, x4)

## Fit an unconstrained model
ans <- cgaim(y ~ g(x1, x2) + g(x3, x4), data = df1)

# Compute confidence intervals
# In practice, higher B values are warranted
cia <- confint(ans, B = 100)
cia$alpha
cia$beta

# Display ridge functions
plot(ans, ci = cia)

# Predict
newdf <- as.data.frame(matrix(rnorm(100), 25, 4))
names(newdf) <- sprintf("x%i", 1:4)
yhat <- predict(ans, newdf)

## Fit constrained model
ans2 <- cgaim(y ~ g(x1, x2, acons = list(monotone = -1)) + 
  g(x3, x4, fcons = "cvx"), data = df1)

# Check results
ans2
plot(ans2)

# Same result
Cmat <- as.matrix(Matrix::bdiag(list(build_constraints(2, monotone = -1), 
  build_constraints(2, first = 1))))
ans3 <- cgaim(y ~ g(x1, x2) + g(x3, x4, fcons = "cvx"), data = df1,
  Cmat = Cmat)

## A mis-specified model
ans4 <- cgaim(y ~ g(x1, x2, acons = list(monotone = 1)) + 
  g(x3, x4, fcons = "dec"), data = df1)

[Package cgaim version 1.0.1 Index]