R: Single-index models with a surface-link (main function)

simsl {simsl}

R Documentation

Single-index models with a surface-link (main function)

Description

simsl is the wrapper function for fitting a single-index model with a surface-link (SIMSL). The function estimates a linear combination (a single-index) of baseline covariates X, and models a nonlinear interactive structure between the single-index and a treatment variable defined on a continuum, by estimating a smooth link function on the index-treatment domain. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.

Usage

simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
  bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
  knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
  method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
  random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
  lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
  center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE,
  bootstrap = FALSE, nboot = 200, boot.conf = 0.95, seed = 1357)

Arguments

`y`	a n-by-1 vector of treatment outcomes; y is a member of the exponential family; any distribution supported by `mgcv::gam`; y can also be an ordinal categorial response with `R` categories taking a value from 1 to `R`.
`A`	a n-by-1 vector of treatment variable; each element is assumed to take a value on a continuum.
`X`	a n-by-p matrix of baseline covarates.
`Xm`	a n-by-q design matrix associated with an X main effect model; the defult is `NULL` and it is taken as a vector of zeros
`family`	specifies the distribution of y; e.g., "gaussian", "binomial", "poisson"; can be any family supported by `mgcv::gam`; can also be "ordinal", for an ordinal categorical response y.
`R`	the number of response categories for the case of family = "ordinal".
`bs`	basis type for the treatment (A) and single-index domains, respectively; the defult is "ps" (p-splines); any basis supported by `mgcv::gam` can be used, e.g., "cr" (cubic regression splines); see `mgcv::s` for detail.
`k`	basis dimension for the treatment (A) and single-index domains, respectively.
`m`	a length 2 list (e.g., m=list(c(2,3), c(2,2))), for the treatment (A) and single-index domains, respectively, where each element specifies the order of basis and penalty (note, for bs="ps", c(2,3) means a 2nd order P-spline basis (cubic spline) and a 3rd order difference penalty; the default "NA" sets c(2,2) for each domain); see `mgcv::s` for details.
`sp`	a vector of smoothing parameters; Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula (i.e., A, and then the single-index); negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible; see `mgcv::gam` for detail.
`knots`	a list containing user-specified knot values to be used for basis construction, for the treatment (A) and single-index domains, respectively.
`sep.A.effect`	If `TRUE`, the g term of SIMSL is further decomposed into: the A main effect + the A-by-X interaction effect; the default is `FALSE`.
`mc`	a length 2 vector indicating which marginals (i.e., A and the single-index, respectively) should have centering (i.e., the sum-to-zero) constraints applied; the default is `mc = c(TRUE, FALSE)` (see `mgcv::te` for detail of the constraint), which is sufficient for the so-called "orthogonality" constraint of the SIMSL.
`method`	the smoothing parameter estimation method; "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; any method supported by `mgcv::gam` can be used.
`beta.ini`	an initial value for `beta.coef`; a p-by-1 vector; the defult is `NULL`, in which case a linear model estimate is used.
`ind.to.be.positive`	for identifiability of the solution `beta.coef`, the user can restrict the jth (e.g., j=1) component of `beta.coef` to be positive; by default, we match the "overall" sign of `beta.coef` with that of the linear estimate (i.e., the initial estimate), by restricting the inner product between the two to be positive.
`random.effect`	if `TRUE`, as part of the main effects, the user can incorporate z-specific random intercepts.
`z`	a factor that specifies the random intercepts when `random.effect = TRUE`.
`gamma`	increase this beyond 1 to produce smoother models. `gamma` multiplies the effective degrees of freedom in the GCV or UBRE/AIC (see `mgcv::gam` for detail); the default is 1.
`pen.order`	0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty, used in a penalized least squares (LS) estimation of `beta.coef`.
`lambda`	a regularization parameter associated with the penalized LS for `beta.coef` update.
`max.iter`	an integer specifying the maximum number of iterations for `beta.coef` update.
`eps.iter`	a value specifying the convergence criterion of algorithm.
`trace.iter`	if `TRUE`, trace the estimation process and print the differences in `beta.coef`.
`center.X`	if `TRUE`, center X to have zero mean.
`scale.X`	if `TRUE`, scale X to have unit variance.
`uncons.final.fit`	if `TRUE`, once the convergence in the estimates of `beta.coef` is reached, include the main effect associated with the fitted single-index (beta.coef'X) to the final surface-link estimate.
`bootstrap`	if `TRUE`, compute bootstrap confidence intervals for the single-index coefficients, `beta.coef`; the default is `FALSE`.
`nboot`	when `bootstrap=TRUE`, a value specifying the number of bootstrap replications.
`boot.conf`	a value specifying the confidence level of the bootstrap confidence intervals; the defult is `boot.conf = 0.95`.
`seed`	when `bootstrap=TRUE`, randomization seed used in bootstrap resampling.

Details

SIMSL captures the effect of covariates via a single-index and their interaction with the treatment via a 2-dimensional smooth link function. Interaction effects are determined by shapes of the link surface. The SIMSL allows comparing different individual treatment levels and constructing individual treatment rules, as functions of a biomarker signature (single-index), efficiently utilizing information on patient’s characteristics. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.

Value

a list of information of the fitted SIMSL including

`beta.coef`	the estimated single-index coefficients.
`g.fit`	a `mgcv:gam` object containing information about the estimated 2-dimensional link function.
`beta.ini`	the initial value used in the estimation of `beta.coef`
`beta.path`	solution path of `beta.coef` over the iterations
`d.beta`	records the change in `beta.coef` over the solution path, `beta.path`
`X.scale`	sd of pretreatment covariates X
`X.center`	mean of pretreatment covariates X
`A.range`	range of the observed treatment variable A
`p`	number of baseline covariates X
`n`	number of subjects
`boot.ci`	`boot.conf`-level bootstrap CIs (LB, UB) associated with `beta.coef`
`boot.mat`	a (nboot x p) matrix of bootstrap estimates of `beta.coef`

Author(s)

Park, Petkova, Tarpey, Ogden

Examples


set.seed(1234)
n.test <- 500


## simulation 1
# generate training data
p <- 30
n <- 200
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
D_opt <- 1 + 0.5*X[,2] + 0.5*X[,1]
mean.fn <- function(X, D_opt, A){ 8 + 4*X[,1] - 2*X[,2] - 2*X[,3] - 25*((D_opt-A)^2) }
mu <-   mean.fn(X, D_opt, A)
y <- rnorm(length(mu),mu,1)
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=X)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- 1 + 0.5*X.test[,2] + 0.5*X.test[,1]
pred <- pred.simsl(simsl.obj, newX= X.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*X.test[,1] - 2*X.test[,2] - 2*X.test[,3] - 25*((f_opt.test- pred$trt.rule)^2))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.


## simulation 2
p <- 10
n <- 400
# generate training data
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
f_opt <- I(X[,1] > -0.5)*I(X[,1] < 0.5)*0.6 + 1.2*I(X[,1] > 0.5) +
 1.2*I(X[,1] < -0.5) + X[,4]^2 + 0.5*log(abs(X[,7])+1) - 0.6
mu <-   8 + 4*cos(2*pi*X[,2]) - 2*X[,4] - 8*X[,5]^3 - 15*abs(f_opt-A)
y  <- rnorm(length(mu),mu,1)
Xq <- cbind(X, X^2)  # include a quadratic term
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=Xq)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- I(X.test[,1] > -0.5)*I(X.test[,1] < 0.5)*0.6 + 1.2*I(X.test[,1] > 0.5) +
 1.2*I(X.test[,1] < -0.5) + X.test[,4]^2 + 0.5*log(abs(X.test[,7])+1) - 0.6
Xq.test <- cbind(X.test, X.test^2)
pred <- pred.simsl(simsl.obj, newX= Xq.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*cos(2*pi*X.test[,2]) - 2*X.test[,4] - 8*X.test[,5]^3 -
              15*abs(f_opt.test-pred$trt.rule))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.



 ### air pollution data application
 data(chicago); head(chicago)
 chicago <- chicago[,-3][complete.cases(chicago[,-3]), ]
 chicago <- chicago[-c(2856:2859), ]  # get rid of the gross outliers in y
 chicago <- chicago[-which.max(chicago$pm10median), ]  # get rid of the gross outliers in x

 # create lagged variables
 lagard <- function(x,n.lag=5) {
   n <- length(x); X <- matrix(NA,n,n.lag)
   for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
   X
 }
 chicago$pm10 <- lagard(chicago$pm10median)
 chicago <- chicago[complete.cases(chicago), ]
 # create season varaible
 chicago$time.day <- round(chicago$time %%  365)

 # fit SIMSL for modeling the season-by-pm10 interactions on their effects on outcomes
 simsl.obj <- simsl(y=chicago$death, A=chicago$time.day, X=chicago[,7], bs=c("cc","ps"),
                    ind.to.be.positive = 1, family="poisson", method = "REML",
                    bootstrap =FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef  # the estimated single-index coefficients
 summary(simsl.obj$g.fit)
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=-135, phi = 30,
               color="heat", se=2,ylab = "single-index", zlab = " ",
               main=expression(paste("Interaction surface g")))



 ### Warfarin data application
 data(warfarin)
 X <- warfarin$X
 A <- warfarin$A
 y <- -abs(warfarin$INR - 2.5)  # the target INR is 2.5
 X[,1:3] <- scale(X[,1:3]) # standardize continuous variables

 # Estimate the main effect, using an additive model
 mu.fit <- mgcv::gam(y-mean(y)  ~ X[, 4:13] +
                       s(X[,1], k=5, bs="ps")+
                       s(X[,2], k=5, bs="ps") +
                       s(X[,3], k=5, bs="ps"), method="REML")
 summary(mu.fit)
 mu.hat <- predict(mu.fit)
 # fit SIMSL
 simsl.obj <- simsl(y, A, X, Xm= mu.hat, scale.X = FALSE, center.X=FALSE, method="REML",
                    bootstrap = FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=52, phi = 18,
               color="heat", se=2, ylab = "single-index", zlab = "Y",
               main=expression(paste("Interaction surface g")))

[Package simsl version 0.2.1 Index]