simml {simml}    R Documentation

Single-index models with multiple-links (main function)


Description

simml is the wrapper function for single-index models with multiple links (SIMML). The function estimates a linear combination (a single-index) of the covariates X and models the treatment-specific outcome y via treatment-specific, nonparametrically defined link functions.


Usage

simml(y, A, X, Xm = NULL, aug = NULL, family = "gaussian",
  R = NULL, bs = "cr", k = 8, sp = NULL, = FALSE,
  method = "GCV.Cp", gamma = 1, rho = 0, beta.ini = NULL, = NULL, = FALSE, max.iter = 20,
  eps.iter = 0.01, trace.iter = TRUE, lambda = 0, pen.order = 0,
  scale.X = TRUE, center.X = TRUE, ortho.constr = TRUE,
  si.main.effect = FALSE, random.effect = FALSE, z = NULL,
  plots = FALSE, bootstrap = FALSE, nboot = 200, boot.conf = 0.95,
  seed = 1357)



Arguments

y: an n-by-1 vector of treatment outcomes; y is assumed to follow an exponential-family distribution, and any distribution supported by mgcv::gam can be used; y can also be an ordinal categorical response with R categories, taking values from 1 to R.

A: an n-by-1 vector of the treatment variable; each element is assumed to take a value in a finite discrete space.

X: an n-by-p matrix of baseline covariates.

Xm: an n-by-q design matrix associated with an X main effect model; the default is NULL, in which case it is taken as a vector of zeros.

aug: an n-by-1 additional augmentation vector associated with the X main effect; the default is NULL, in which case it is taken as a vector of zeros.

family: specifies the distribution of y, e.g., "gaussian", "binomial", "poisson"; can be any family supported by mgcv::gam; can also be "ordinal", for an ordinal categorical response y.

R: the number of response categories when family = "ordinal".

bs: basis type for the treatment (A) and single-index joint effect; the default is "cr" (cubic regression splines); any basis supported by mgcv::gam can be used, e.g., "ps" (P-splines); see mgcv::s for details.

k: basis dimension for the spline representation of the treatment-specific link functions.

sp: smoothing parameter for the treatment-specific link functions; if NULL, it is estimated from the data.

if TRUE, the link function is restricted to be linear.


method: the smoothing parameter estimation method; "GCV.Cp" uses GCV when the scale parameter is unknown and Mallows' Cp/UBRE/AIC when it is known; any method supported by mgcv::gam can be used.

gamma: increase this beyond 1 to produce smoother models; gamma multiplies the effective degrees of freedom in the GCV or UBRE/AIC criterion (see mgcv::gam for details); the default is 1.

rho: a tuning parameter associated with the additional augmentation vector aug; the default is 0.

beta.ini: an initial value for beta.coef; a p-by-1 vector; the default is NULL, in which case a linear model estimate is used.

for identifiability of the solution beta.coef, the user can restrict the jth (e.g., j=1) component of beta.coef to be positive; by default, we match the "overall" sign of beta.coef with that of the linear estimate (i.e., the initial estimate), by restricting the inner product between the two to be positive.

if TRUE, re-scale the index coefficients to restrict the index to the interval [0,1]; in such a case, an intercept term is induced.


max.iter: an integer specifying the maximum number of iterations for the beta.coef update.

eps.iter: a value specifying the convergence criterion of the algorithm.

trace.iter: if TRUE, trace the estimation process and print the differences in beta.coef.

lambda: a regularization parameter associated with the penalized least squares (LS) update of beta.coef; the default is 0, in which case the index coefficients are not penalized.

pen.order: the type of penalty used in the penalized LS estimation of beta.coef; 0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty.

scale.X: if TRUE, scale X to have unit variance.

center.X: if TRUE, center X to have zero mean.

ortho.constr: if TRUE, separates the interaction effects from the main effect (without this, the interaction effect can be confounded by the main effect); the default is TRUE.

si.main.effect: if TRUE, once the estimates of beta.coef have converged, include the main effect associated with the fitted single-index (beta.coef'X) in the final fit; the default is FALSE.

random.effect: if TRUE, the user can incorporate z-specific random intercepts as part of the main effects.

z: a factor that specifies the random intercepts when random.effect = TRUE.

plots: if TRUE, produce a plot of the estimated effect contrast on the linear predictor scale (for binary treatment cases).

bootstrap: if TRUE, compute bootstrap confidence intervals for the single-index coefficients, beta.coef; the default is FALSE.

nboot: when bootstrap = TRUE, a value specifying the number of bootstrap replications.

boot.conf: a value specifying the confidence level of the bootstrap confidence intervals; the default is boot.conf = 0.95.

seed: when bootstrap = TRUE, the randomization seed used in bootstrap resampling.
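
The three pen.order options can be pictured through the penalty matrix each one implies. The base-R sketch below assumes the standard construction, in which an order-d difference penalty on a coefficient vector beta is lambda times the squared norm of the d-th order differences of beta. The helpers make_penalty and penalized_ls, and the inputs Z and r, are hypothetical illustrations and are not part of the simml package, whose internal update may differ.

```r
# Hypothetical helper (not part of simml): builds the p-by-p matrix P such that
# the penalty on beta equals lambda * t(beta) %*% P %*% beta.
make_penalty <- function(p, pen.order = 0) {
  if (pen.order == 0) {
    return(diag(p))                              # ridge penalty: sum(beta^2)
  }
  D <- diff(diag(p), differences = pen.order)    # (p - pen.order)-by-p differencing matrix
  crossprod(D)                                   # t(D) %*% D
}

# Penalized least squares solution for a design matrix Z and response r
# (illustrative stand-ins, not quantities exported by simml).
penalized_ls <- function(Z, r, lambda = 0, pen.order = 0) {
  P <- make_penalty(ncol(Z), pen.order)
  drop(solve(crossprod(Z) + lambda * P, crossprod(Z, r)))
}

set.seed(1)
Z <- matrix(rnorm(200 * 5), 200, 5)
r <- drop(Z %*% c(1, 1, 1, -1, -1) + rnorm(200))

penalized_ls(Z, r, lambda = 0)                   # lambda = 0: ordinary least squares
penalized_ls(Z, r, lambda = 50, pen.order = 1)   # neighbouring coefficients are pulled together
```

With pen.order = 0 every coefficient is shrunk toward zero, whereas the difference penalties only shrink the gaps between adjacent coefficients, which is mainly meaningful when the columns of X have a natural ordering.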


Details

SIMML captures the effect of covariates via a single-index and their interaction with the treatment via nonparametric link functions. Interaction effects are determined by distinct shapes of the link functions. The estimated single-index is useful for comparing differential treatment efficacy. The resulting simml object can be used to estimate an optimal treatment decision rule for a new patient with pretreatment clinical information.
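
To make the roles of the single-index and the link functions concrete, the base-R sketch below hand-picks a coefficient vector and two link functions and derives the treatment rule they imply. The objects beta, g1, g2 and newX are made-up illustrations rather than estimates produced by simml, and the rule assumes that a larger outcome is better.

```r
# Made-up ingredients of a two-treatment SIMML (not output of simml).
beta <- c(0.6, -0.8)                    # single-index coefficients (unit norm)
g1   <- function(u) 0.5 * u             # link function for treatment 1
g2   <- function(u) sin(u)              # link function for treatment 2

# Three new patients with two pretreatment covariates each.
newX <- rbind(c( 1.0, 0.0),
              c( 0.0, 1.0),
              c(-2.0, 1.5))

index    <- drop(newX %*% beta)         # single-index beta'x for each patient
pred     <- cbind(g1(index), g2(index)) # predicted outcome under each treatment
trt.rule <- max.col(pred, ties.method = "first")  # pick the treatment with the larger prediction

data.frame(index, trt1 = pred[, 1], trt2 = pred[, 2], trt.rule)
```

Because both treatments share the same index, patients are ordered along a single axis, and the recommendation changes only where the two link functions cross.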


Value

a list of information about the fitted SIMML, including:

the estimated single-index coefficients.

a mgcv::gam object containing information about the estimated treatment-specific link functions.

the initial value used in the estimation of beta.coef.

the solution path of beta.coef over the iterations.

the change in beta.coef recorded along the solution path, beta.path.

the standard deviations of the pretreatment covariates X.

the means of the pretreatment covariates X.

the number of different treatment options.

the number of pretreatment covariates X.

the number of subjects.

the percentile bootstrap confidence intervals (lower bound LB, upper bound UB) for beta.coef, at the confidence level boot.conf.
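
The percentile bootstrap intervals can be sketched in a few lines of base R. The code below is an assumption-laden illustration of the general technique, not the package's implementation: percentile_ci is a hypothetical helper, and an ordinary least-squares fit (ols) stands in for the SIMML coefficient estimator so that the example runs without the package.

```r
# Hypothetical helper (not part of simml): percentile bootstrap CIs for a
# vector-valued estimator est(y, X).
percentile_ci <- function(y, X, est, nboot = 200, boot.conf = 0.95, seed = 1357) {
  set.seed(seed)
  n <- length(y)
  boot.est <- replicate(nboot, {
    idx <- sample.int(n, n, replace = TRUE)      # resample subjects with replacement
    est(y[idx], X[idx, , drop = FALSE])
  })                                             # one column per bootstrap replicate
  alpha <- 1 - boot.conf
  ci <- apply(boot.est, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2))
  data.frame(LB = ci[1, ], UB = ci[2, ])
}

# Stand-in estimator: least-squares coefficients (simml would estimate beta.coef instead).
ols <- function(y, X) drop(solve(crossprod(X), crossprod(X, y)))

set.seed(2)
X <- matrix(rnorm(300 * 3), 300, 3)
y <- drop(X %*% c(1, 0, -1) + rnorm(300))

percentile_ci(y, X, ols, nboot = 200, boot.conf = 0.95)
```

Each row of the result corresponds to one coefficient; a coefficient whose interval excludes zero is a candidate for a covariate that contributes to the index.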


Author(s)

Park, Petkova, Tarpey, Ogden

See Also

pred.simml, fit.simml


Examples

family <- "gaussian"   # or "poisson"
delta <- 1             # moderate main effect
s <- 2                 # s = 2 gives a nonlinear contrast function; s = 1 gives a linear one
n <- 500               # number of subjects
p <- 10                # number of pretreatment covariates

# generate training data
data <- n, p=p, delta = delta, s= s, family = family)
data$SNR  # the ratio of interactions("signal") vs. main effects("noise")
A <- data$A
y <- data$y
X <- data$X

# generate testing data
data.test <-^5, p=p, delta = delta,  s= s, family = family)
A.test <- data.test$A
y.test <- data.test$y
X.test <- data.test$X
data.test$value.opt     # the optimal "value"

# fit SIMML
#1) SIMML without X main effect
simml.obj1 <- simml(y, A, X, family = family)

#2) SIMML with X main effect (estimation efficiency for the g term of SIMML can be improved)
simml.obj2 <- simml(y, A, X, Xm = X, family = family)

# apply the estimated SIMML to the testing set and obtain treatment assignment rules.
simml.trt.rule1 <- pred.simml(simml.obj1, newX= X.test)$trt.rule
# "value" estimation (estimated by IPWE)
simml.value1 <-  mean(y.test[simml.trt.rule1 == A.test])

simml.trt.rule2 <- pred.simml(simml.obj2, newX= X.test)$trt.rule
simml.value2 <-  mean(y.test[simml.trt.rule2 == A.test])

# compare these to the optimal "value"

# fit MC (modified covariates) model of Tian et al. (2014)
n.A <- summary(as.factor(A)); pi.A <- n.A/sum(n.A)
mc  <- (as.numeric(A) + pi.A[1] -2) *cbind(1, X)  # 0.5*(-1)^as.numeric(A) *cbind(1, X)
mc.coef  <-  coef(glm(y ~ mc, family =  family))
mc.trt.rule <- (cbind(1, X.test) %*% mc.coef[-1] > 0) +1
# "value" estimation (estimated by IPWE)
mc.value  <-  mean(y.test[mc.trt.rule == A.test])

# visualization of the estimated link functions of SIMML
simml.obj1$beta.coef        # estimated single-index coefficients
 <- simml.obj1$   # estimated trt-specific link functions; "" is a mgcv::gam object.

# can improve visualization by using the package "mgcViz"
# mgcViz depends on "rgl". "rgl" depends on XQuartz, which you can download from
# transform the "mgcv::gam" object to a "mgcViz" object (to improve visualization)
 <- getViz(

plot1  <- plot( sm(,1) )  # for treatment group 1
plot1 + l_fitLine(colour = "red") + l_rug(mapping = aes(x=x, y=y), alpha = 0.8) +
  l_ciLine(mul = 5, colour = "blue", linetype = 2) +
  l_points(shape = 19, size = 1, alpha = 0.1) +
  xlab(expression(paste("z = ", alpha*minute, "x")))  +  ylab("y") +
  ggtitle("Treatment group 1 (Trt =1)") +  theme_classic()

plot2 <- plot( sm(,2) )   # for treatment group 2
plot2 + l_fitLine(colour = "red") + l_rug(mapping = aes(x=x, y=y), alpha = 0.8) +
  l_ciLine(mul = 5, colour = "blue", linetype = 2) +
  l_points(shape = 19, size = 1, alpha = 0.1) +
  xlab(expression(paste("z = ", alpha*minute, "x"))) +ylab("y") +
  ggtitle("Treatment group 2 (Trt =2)") + theme_classic()

trans = function(x) x +$coefficients[2]
plotDiff(s1 = sm(, 2), s2 = sm(, 1), trans=trans) +  l_ciPoly() +
  l_fitLine() + geom_hline(yintercept = 0, linetype = 2) +
  xlab(expression(paste("z = ", alpha*minute, "x")) ) +
  ylab("(Treatment 2 effect) - (Treatment 1 effect)") +
  ggtitle("Contrast between two treatment effects")

# yet another way of visualization, using ggplot2
dat  <- data.frame(y= simml.obj1$$model$y,
                   x= simml.obj1$$model$single.index,
                   Treatment= simml.obj1$$model$A)
g.plot<- ggplot(dat, aes(x=x,y=y,color=Treatment,shape=Treatment,linetype=Treatment))+
   geom_point(aes(color=Treatment, shape=Treatment), size=1, fill="white") +
   scale_colour_brewer(palette="Set1", direction=-1) +
   xlab(expression(paste(beta*minute,"x"))) + ylab("y")
g.plot + geom_smooth(method=gam, formula= y~ s(x, bs=simml.obj1$bs, k=simml.obj1$k),
                     se=TRUE, fullrange=TRUE, alpha = 0.35)

# can obtain bootstrap CIs for beta.coef.
simml.obj <- simml(y,A,X,Xm=X, family=family,bootstrap=TRUE,nboot=15)  #nboot=500.

# compare the estimates to the true beta.coef.

# an application to data with ordinal categorical response
dat <-, p=5, R = 11,  # 11 response levels
                   s = "nonlinear",     # nonlinear interactions
                   delta = 1)
y <- dat$y  # ordinal response
X <- dat$X  # X matrix
A <- dat$A  # treatment
dat$true.beta  # the "true" single-index coefficient

# 1) fit a cumulative logit simml, with a flexible link function
res <-  simml(y,A,X, family="ordinal", R=11)
res$beta.coef  # single-index coefficients.
res$$family$getTheta(TRUE)  # the estimated R-1 threshold values.

# 2) fit a cumulative logit simml, with a linear link function
res2 <-  simml(y,A,X, family="ordinal", R=11, = TRUE)
res2$beta.coef  # single-index coefficients.

family = mgcv::ocat(R=11)  # ocat: ordered categorical response family, with R categories.
# the treatment A's effect.
tmp <- mgcv::gam(y ~ A, family =family)
exp(coef(tmp)[2])  #odds ratio (OR) comparing treatment A=2 vs. A=1.

ind2 <- pred.simml(res)$trt.rule ==2  # subgroup recommended with A=2 under SIMML ITR
tmp2 <- mgcv::gam(y[ind2] ~ A[ind2], family = family)
exp(coef(tmp2)[2]) #OR comparing treatment A=2 vs. A=1, for subgroup recommended with A=2

ind1 <- pred.simml(res)$trt.rule ==1  # subgroup recommended with A=1 under SIMML ITR
tmp1 <- mgcv::gam(y[ind1] ~ A[ind1], family = family)
exp(coef(tmp1)[2]) #OR comparing treatment A=2 vs. A=1, for subgroup recommended with A=1

[Package simml version 0.3.0 Index]