qgcomp.partials {qgcomp} | R Documentation |
Partial effect sizes, confidence intervals, hypothesis tests
Description
Obtain effect estimates for "partial positive" and "partial negative" effects using quantile g-computation. This approach uses sample splitting to evaluate the overall impact of a set of variables with effects in a single direction, where, using training data, all variables with effects in the same direction are grouped.
Usage
qgcomp.partials(
fun = c("qgcomp.glm.noboot", "qgcomp.cox.noboot", "qgcomp.zi.noboot"),
traindata = NULL,
validdata = NULL,
expnms = NULL,
.fixbreaks = TRUE,
.globalbreaks = FALSE,
...
)
Arguments
fun |
character variable in the set "qgcomp.glm.noboot" (binary, count, continuous outcomes), "qgcomp.cox.noboot" (survival outcomes), "qgcomp.zi.noboot" (zero inflated outcomes). This describes which qgcomp package function is used to fit the model. (default = "qgcomp.glm.noboot") |
traindata |
Data frame with training data |
validdata |
Data frame with validation data |
expnms |
Exposure mixture of interest |
.fixbreaks |
(logical, overridden by .globalbreaks) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior versions. Setting to TRUE increases variance but greatly decreases bias in smaller samples. |
.globalbreaks |
(logical; if TRUE, overrides .fixbreaks) Use the same quantile cutpoints in the training and validation data (selected in the combined training and validation data). The default is FALSE (see Usage), which matches the implicit behavior in versions prior to 2.8.11. Setting to TRUE increases variance but greatly decreases bias in smaller samples. |
... |
Arguments passed to the function named in fun (for example, f and q, as used in the Examples) |
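The difference between separate, fixed, and global quantile cutpoints (the .fixbreaks and .globalbreaks arguments) can be illustrated with a small base R sketch. This is not the package's internal code: the helper `quantize_with` and the simulated data are hypothetical, and the sketch only shows where the cutpoints come from in each case.

```r
# Hypothetical illustration; not the package's internal quantization code
set.seed(1)
train <- data.frame(x1 = rnorm(40))
valid <- data.frame(x1 = rnorm(60, mean = 0.3))
q <- 4
probs <- seq(0, 1, length.out = q + 1)

# Hypothetical helper: score x as 0..(q-1) using interior cutpoints taken from `ref`
quantize_with <- function(x, ref, probs) {
  inner <- quantile(ref, probs = probs)[-c(1, length(probs))]
  findInterval(x, inner)
}

# Separate cutpoints: chosen in the validation data itself
valid_sep <- quantize_with(valid$x1, valid$x1, probs)
# Analogue of .fixbreaks = TRUE: cutpoints chosen in the training data
valid_fix <- quantize_with(valid$x1, train$x1, probs)
# Analogue of .globalbreaks = TRUE: cutpoints chosen in the combined data
valid_glob <- quantize_with(valid$x1, c(train$x1, valid$x1), probs)

table(valid_sep)
table(valid_fix)
table(valid_glob)
```

With separate cutpoints the validation scores are balanced across categories by construction, whereas fixed or global cutpoints give the scores the same meaning in both samples at the cost of unequal category sizes.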
Details
In the basic (non-bootstrapped) qgcomp functions, the positive and negative "sums of coefficients" or "partial effect sizes" are given, which equal the sums of the positive and negative coefficients, respectively, in the underlying model. Unfortunately, these partial effects do not constitute variables for which we can derive confidence intervals or hypothesis tests, so they are mainly for exploratory purposes. By employing sample splitting, however, we can obtain better estimates of these partial effects.
Sample splitting proceeds by partitioning the data into two samples (a 40/60 training/validation split seems acceptable in many circumstances). The "overall mixture effect" is then estimated in the training data, and the mixture variables with positive and negative coefficients are split into separate groups. These two groups are then each used as "the mixture of interest" in two additional qgcomp fits, where the mixture of interest is adjusted for the other exposure variables. For example, if the "positive partial effect" is of interest, then this effect is equal to the sum of the coefficients in the qgcomp model fit to the validation data, with the mixture of interest selected by the original fit to the training data (note that some of these coefficients may be negative in the fit to the validation data - this is expected and necessary for valid hypothesis tests).
The positive/negative partial effects are necessarily exploratory, but sample splitting preserves the statistical properties at the expense of wider confidence intervals and larger variances. The two resulting mixture groups should be inspected for
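The sample-splitting steps described in this section can be sketched in base R. This is a simplified stand-in rather than the package's implementation: `lm()` replaces the qgcomp fit, the simulated exposures are already on a quantile scale (so no quantization step is shown), and the helper `partial` is hypothetical. Because the stand-in model is linear in every exposure, a single validation fit serves both groups here, whereas qgcomp.partials returns two separate fits.

```r
# Sketch only: lm() stands in for a qgcomp fit on already-quantized exposures
set.seed(2)
n <- 1000
expnms <- c("x1", "x2", "x3", "x4")
dat <- as.data.frame(matrix(sample(0:3, n * 4, replace = TRUE), ncol = 4,
                            dimnames = list(NULL, expnms)))
dat$y <- 0.25 * dat$x1 - 0.25 * dat$x2 + rnorm(n)

# 40/60 training/validation split
trainidx <- sample(seq_len(n), round(n * 0.4))
train <- dat[trainidx, ]
valid <- dat[-trainidx, ]

# Step 1: overall fit in the training data; group exposures by coefficient sign
b_train <- coef(lm(y ~ ., data = train))[expnms]
posmix <- expnms[b_train > 0]
negmix <- expnms[b_train < 0]

# Step 2: refit in the validation data, adjusting for all exposures.
# Each partial effect is the sum of the validation coefficients in its group;
# its variance is the sum of the corresponding covariance matrix entries.
fit_valid <- lm(y ~ ., data = valid)
partial <- function(fit, mix) {
  est <- sum(coef(fit)[mix])
  se <- sqrt(sum(vcov(fit)[mix, mix, drop = FALSE]))
  c(estimate = est,
    lower = est - qnorm(0.975) * se,
    upper = est + qnorm(0.975) * se)
}
res <- rbind(positive = partial(fit_valid, posmix),
             negative = partial(fit_valid, negmix))
res
```

The groups are chosen using the training data only, so the validation estimates and intervals are not affected by the selection step, which is the property the sample split is meant to preserve.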
Value
A 'qgcompmultifit' object, which inherits from list and contains
- posmix
character vector of variable names with positive coefficients in the qgcomp model fit to the training data
- negmix
character vector of variable names with negative coefficients in the qgcomp model fit to the training data
- pos.fit
a qgcompfit object fit to the validation data, in which the exposures of interest are contained in 'posmix'
- neg.fit
a qgcompfit object fit to the validation data, in which the exposures of interest are contained in 'negmix'
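As a brief sketch of how the returned components might be inspected, the following reuses the simulated data call from the Examples and accesses only the elements listed above. It assumes the qgcomp package is installed; the object names `dat`, `trainidx`, and `splitres` are illustrative.

```r
library(qgcomp)
set.seed(123223)
dat <- simdata_quantized(n = 1000, outcomtype = "continuous", cor = c(.75, 0),
                         b0 = 0, coef = c(0.25, -0.25, 0, 0), q = 4)
trainidx <- sample(1:nrow(dat), round(nrow(dat) * 0.4))
splitres <- qgcomp.partials(fun = "qgcomp.glm.noboot", f = y ~ ., q = NULL,
                            traindata = dat[trainidx, ],
                            validdata = dat[-trainidx, ],
                            expnms = c("x1", "x2", "x3", "x4"))

# The result inherits from list, so components are available with `$`
inherits(splitres, "qgcompmultifit")
splitres$posmix   # exposures with positive training coefficients
splitres$negmix   # exposures with negative training coefficients
splitres$pos.fit  # validation fit with 'posmix' as the mixture of interest
splitres$neg.fit  # validation fit with 'negmix' as the mixture of interest
```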
See Also
Other qgcomp_methods: qgcomp.cch.noboot(), qgcomp.cox.boot(), qgcomp.cox.noboot(), qgcomp.glm.boot(), qgcomp.glm.noboot(), qgcomp.hurdle.boot(), qgcomp.hurdle.noboot(), qgcomp.multinomial.boot(), qgcomp.multinomial.noboot(), qgcomp.zi.boot(), qgcomp.zi.noboot()
Examples
set.seed(123223)
dat = qgcomp::simdata_quantized(n=1000, outcomtype="continuous", cor=c(.75, 0),
b0=0, coef=c(0.25,-0.25,0,0), q=4)
cor(dat)
# overall fit (more or less null due to counteracting exposures)
(overall <- qgcomp.glm.noboot(f=y~., q=NULL, expnms=c("x1", "x2", "x3", "x4"), data=dat))
# partial effects using 40% training/60% validation split
trainidx <- sample(1:nrow(dat), round(nrow(dat)*0.4))
valididx <- setdiff(1:nrow(dat),trainidx)
traindata = dat[trainidx,]
validdata = dat[valididx,]
splitres <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL,
traindata=traindata,validdata=validdata, expnms=c("x1", "x2", "x3", "x4"))
splitres
## Not run:
# under the null, both should give null results
set.seed(123223)
dat = simdata_quantized(n=1000, outcomtype="continuous", cor=c(.75, 0),
b0=0, coef=c(0,0,0,0), q=4)
# 40% training/60% validation
trainidx2 <- sample(1:nrow(dat), round(nrow(dat)*0.4))
valididx2 <- setdiff(1:nrow(dat),trainidx2)
traindata2 <- dat[trainidx2,]
validdata2 <- dat[valididx2,]
splitres2 <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~.,
q=NULL, traindata=traindata2,validdata=validdata2, expnms=c("x1", "x2", "x3", "x4"))
splitres2
# 60% training/40% validation
trainidx3 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx3 <- setdiff(1:nrow(dat),trainidx3)
traindata3 <- dat[trainidx3,]
validdata3 <- dat[valididx3,]
splitres3 <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL,
traindata=traindata3,validdata=validdata3, expnms=c("x1", "x2", "x3", "x4"))
splitres3
# survival outcome
set.seed(50)
N=1000
dat = simdata_quantized(n=1000, outcomtype="survival", cor=c(.75, 0, 0, 0, 1),
b0=0, coef=c(1,0,0,0,0,1), q=4)
names(dat)[which(names(dat=="x5"))] = "z"
trainidx4 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx4 <- setdiff(1:nrow(dat),trainidx4)
traindata4 <- dat[trainidx4,]
validdata4 <- dat[valididx4,]
expnms=paste0("x", 1:5)
f = survival::Surv(time, d)~x1 + x2 + x3 + x4 + x5 + z
(fit1 <- survival::coxph(f, data = dat))
(overall <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
(splitres4 <- qgcomp.partials(fun="qgcomp.cox.noboot", f=f, q=4,
traindata=traindata4,validdata=validdata4,
expnms=expnms))
# zero inflated count outcome
set.seed(50)
n=1000
dat <- data.frame(y= (yany <- rbinom(n, 1, 0.5))*(ycnt <- rpois(n, 1.2)), x1=runif(n)+ycnt*0.2,
x2=runif(n)-ycnt*0.2, x3=runif(n),
x4=runif(n) , z=runif(n))
# poisson count model, mixture in both portions, but note that the qgcomp.partials
# function defines the "positive" variables only by the count portion of the model
(overall5 <- qgcomp.zi.noboot(f=y ~ z + x1 + x2 + x3 + x4 | x1 + x2 + x3 + x4 + z,
expnms = c("x1", "x2", "x3", "x4"),
data=dat, q=4, dist="poisson"))
trainidx5 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx5 <- setdiff(1:nrow(dat),trainidx5)
traindata5 <- dat[trainidx5,]
validdata5 <- dat[valididx5,]
splitres5 <- qgcomp.partials(fun="qgcomp.zi.noboot",
f=y ~ x1 + x2 + x3 + x4 + z | x1 + x2 + x3 + x4 + z, q=4,
traindata=traindata5, validdata=validdata5,
expnms=c("x1", "x2", "x3", "x4"))
splitres5
## End(Not run)