simsl {simsl} | R Documentation |
Single-index models with a surface-link (main function)
Description
simsl
is the wrapper function for fitting a single-index model with a surface-link (SIMSL).
The function estimates a linear combination (a single-index) of baseline covariates X, and models a nonlinear interactive structure between the single-index and a treatment variable defined on a continuum, by estimating a smooth link function on the index-treatment domain. The resulting simsl
object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.
Usage
simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE,
bootstrap = FALSE, nboot = 200, boot.conf = 0.95, seed = 1357)
Arguments
y |
a n-by-1 vector of treatment outcomes; y is a member of the exponential family; any distribution supported by |
A |
a n-by-1 vector of treatment variable; each element is assumed to take a value on a continuum. |
X |
a n-by-p matrix of baseline covarates. |
Xm |
a n-by-q design matrix associated with an X main effect model; the defult is |
family |
specifies the distribution of y; e.g., "gaussian", "binomial", "poisson"; can be any family supported by |
R |
the number of response categories for the case of family = "ordinal". |
bs |
basis type for the treatment (A) and single-index domains, respectively; the defult is "ps" (p-splines); any basis supported by |
k |
basis dimension for the treatment (A) and single-index domains, respectively. |
m |
a length 2 list (e.g., m=list(c(2,3), c(2,2))), for the treatment (A) and single-index domains, respectively, where each element specifies the order of basis and penalty (note, for bs="ps", c(2,3) means a 2nd order P-spline basis (cubic spline) and a 3rd order difference penalty; the default "NA" sets c(2,2) for each domain); see |
sp |
a vector of smoothing parameters; Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula (i.e., A, and then the single-index); negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible; see |
knots |
a list containing user-specified knot values to be used for basis construction, for the treatment (A) and single-index domains, respectively. |
sep.A.effect |
If |
mc |
a length 2 vector indicating which marginals (i.e., A and the single-index, respectively) should have centering (i.e., the sum-to-zero) constraints applied; the default is |
method |
the smoothing parameter estimation method; "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; any method supported by |
beta.ini |
an initial value for |
ind.to.be.positive |
for identifiability of the solution |
random.effect |
if |
z |
a factor that specifies the random intercepts when |
gamma |
increase this beyond 1 to produce smoother models. |
pen.order |
0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty, used in a penalized least squares (LS) estimation of |
lambda |
a regularization parameter associated with the penalized LS for |
max.iter |
an integer specifying the maximum number of iterations for |
eps.iter |
a value specifying the convergence criterion of algorithm. |
trace.iter |
if |
center.X |
if |
scale.X |
if |
uncons.final.fit |
if |
bootstrap |
if |
nboot |
when |
boot.conf |
a value specifying the confidence level of the bootstrap confidence intervals; the defult is |
seed |
when |
Details
SIMSL captures the effect of covariates via a single-index and their interaction with the treatment via a 2-dimensional smooth link function.
Interaction effects are determined by shapes of the link surface.
The SIMSL allows comparing different individual treatment levels and constructing individual treatment rules,
as functions of a biomarker signature (single-index), efficiently utilizing information on patient’s characteristics.
The resulting simsl
object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.
Value
a list of information of the fitted SIMSL including
beta.coef |
the estimated single-index coefficients. |
g.fit |
a |
beta.ini |
the initial value used in the estimation of |
beta.path |
solution path of |
d.beta |
records the change in |
X.scale |
sd of pretreatment covariates X |
X.center |
mean of pretreatment covariates X |
A.range |
range of the observed treatment variable A |
p |
number of baseline covariates X |
n |
number of subjects |
boot.ci |
|
boot.mat |
a (nboot x p) matrix of bootstrap estimates of |
Author(s)
Park, Petkova, Tarpey, Ogden
See Also
pred.simsl
, fit.simsl
Examples
set.seed(1234)
n.test <- 500
## simulation 1
# generate training data
p <- 30
n <- 200
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
D_opt <- 1 + 0.5*X[,2] + 0.5*X[,1]
mean.fn <- function(X, D_opt, A){ 8 + 4*X[,1] - 2*X[,2] - 2*X[,3] - 25*((D_opt-A)^2) }
mu <- mean.fn(X, D_opt, A)
y <- rnorm(length(mu),mu,1)
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=X)
# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- 1 + 0.5*X.test[,2] + 0.5*X.test[,1]
pred <- pred.simsl(simsl.obj, newX= X.test) # make prediction based on the estimated SIMSL
value <- mean(8 + 4*X.test[,1] - 2*X.test[,2] - 2*X.test[,3] - 25*((f_opt.test- pred$trt.rule)^2))
value # "value" of the estimated treatment rule; the "oracle" value is 8.
## simulation 2
p <- 10
n <- 400
# generate training data
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
f_opt <- I(X[,1] > -0.5)*I(X[,1] < 0.5)*0.6 + 1.2*I(X[,1] > 0.5) +
1.2*I(X[,1] < -0.5) + X[,4]^2 + 0.5*log(abs(X[,7])+1) - 0.6
mu <- 8 + 4*cos(2*pi*X[,2]) - 2*X[,4] - 8*X[,5]^3 - 15*abs(f_opt-A)
y <- rnorm(length(mu),mu,1)
Xq <- cbind(X, X^2) # include a quadratic term
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=Xq)
# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- I(X.test[,1] > -0.5)*I(X.test[,1] < 0.5)*0.6 + 1.2*I(X.test[,1] > 0.5) +
1.2*I(X.test[,1] < -0.5) + X.test[,4]^2 + 0.5*log(abs(X.test[,7])+1) - 0.6
Xq.test <- cbind(X.test, X.test^2)
pred <- pred.simsl(simsl.obj, newX= Xq.test) # make prediction based on the estimated SIMSL
value <- mean(8 + 4*cos(2*pi*X.test[,2]) - 2*X.test[,4] - 8*X.test[,5]^3 -
15*abs(f_opt.test-pred$trt.rule))
value # "value" of the estimated treatment rule; the "oracle" value is 8.
### air pollution data application
data(chicago); head(chicago)
chicago <- chicago[,-3][complete.cases(chicago[,-3]), ]
chicago <- chicago[-c(2856:2859), ] # get rid of the gross outliers in y
chicago <- chicago[-which.max(chicago$pm10median), ] # get rid of the gross outliers in x
# create lagged variables
lagard <- function(x,n.lag=5) {
n <- length(x); X <- matrix(NA,n,n.lag)
for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
X
}
chicago$pm10 <- lagard(chicago$pm10median)
chicago <- chicago[complete.cases(chicago), ]
# create season varaible
chicago$time.day <- round(chicago$time %% 365)
# fit SIMSL for modeling the season-by-pm10 interactions on their effects on outcomes
simsl.obj <- simsl(y=chicago$death, A=chicago$time.day, X=chicago[,7], bs=c("cc","ps"),
ind.to.be.positive = 1, family="poisson", method = "REML",
bootstrap =FALSE) # bootstrap = TRUE
simsl.obj$beta.coef # the estimated single-index coefficients
summary(simsl.obj$g.fit)
#round(simsl.obj$boot.ci,3)
mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=-135, phi = 30,
color="heat", se=2,ylab = "single-index", zlab = " ",
main=expression(paste("Interaction surface g")))
### Warfarin data application
data(warfarin)
X <- warfarin$X
A <- warfarin$A
y <- -abs(warfarin$INR - 2.5) # the target INR is 2.5
X[,1:3] <- scale(X[,1:3]) # standardize continuous variables
# Estimate the main effect, using an additive model
mu.fit <- mgcv::gam(y-mean(y) ~ X[, 4:13] +
s(X[,1], k=5, bs="ps")+
s(X[,2], k=5, bs="ps") +
s(X[,3], k=5, bs="ps"), method="REML")
summary(mu.fit)
mu.hat <- predict(mu.fit)
# fit SIMSL
simsl.obj <- simsl(y, A, X, Xm= mu.hat, scale.X = FALSE, center.X=FALSE, method="REML",
bootstrap = FALSE) # bootstrap = TRUE
simsl.obj$beta.coef
#round(simsl.obj$boot.ci,3)
mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=52, phi = 18,
color="heat", se=2, ylab = "single-index", zlab = "Y",
main=expression(paste("Interaction surface g")))