R.s.estimate {Rsurrogate} | R Documentation |
Calculates the proportion of treatment effect explained
Description
This function calculates the proportion of treatment effect on the primary outcome explained by the treatment effect on the surrogate marker(s). This function is intended to be used for a fully observed continuous outcome. The user can also request a variance estimate and a 95% confidence interval, both estimated using perturbation-resampling. If a confidence interval is requested, three versions are provided: a normal approximation based interval, a quantile based interval, and Fieller's confidence interval.
Usage
R.s.estimate(sone, szero, yone, yzero, var = FALSE, conf.int = FALSE,
  weight.perturb = NULL, number = "single", type = "robust", extrapolate = FALSE,
  transform = FALSE, warn.te = FALSE, warn.support = FALSE)
Arguments
sone |
numeric vector or matrix; surrogate marker for treated observations, assumed to be continuous. If there are multiple surrogates then this should be a matrix with one row per treated observation and one column per surrogate marker. |
szero |
numeric vector or matrix; surrogate marker for control observations, assumed to be continuous. If there are multiple surrogates then this should be a matrix with one row per control observation and one column per surrogate marker. |
yone |
numeric vector; primary outcome for treated observations, assumed to be continuous. |
yzero |
numeric vector; primary outcome for control observations, assumed to be continuous. |
var |
TRUE or FALSE; indicates whether a variance estimate is requested, default is FALSE. |
conf.int |
TRUE or FALSE; indicates whether a 95% confidence interval is requested, default is FALSE |
weight.perturb |
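a matrix of weights with $n_1 + n_0$ rows, where $n_1$ is the length of yone and $n_0$ is the length of yzero; used for perturbation-resampling, default is NULL. |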
number |
specifies the number of surrogate markers; choices are "multiple" or "single", default is "single" |
type |
specifies the type of estimation; choices are "robust" or "model" or "freedman", default is "robust" |
extrapolate |
TRUE or FALSE; indicates whether the user wants to use extrapolation, default is FALSE. |
transform |
TRUE or FALSE; indicates whether the user wants to use a transformation for the surrogate marker, default is FALSE. |
warn.te |
value to control warnings; user does not need to specify. |
warn.support |
value to control warnings; user does not need to specify. |
Details
Let $Y^{(1)}$ and $Y^{(0)}$ denote the primary outcome under the treatment and under the control, respectively. Let $S^{(1)}$ and $S^{(0)}$ denote the surrogate marker under the treatment and under the control, respectively. The residual treatment effect is defined as
$$\Delta_S = \int_{-\infty}^{\infty} \psi_1(s) \, dF_0(s) - E(Y^{(0)}),$$
where $\psi_1(s) = E(Y^{(1)} \mid S^{(1)} = s)$ and $F_0(\cdot)$ is the marginal cumulative distribution function of $S^{(0)}$, the surrogate marker measured under the control. The proportion of treatment effect explained by the surrogate marker, which we denote by $R_S$, can be expressed using a contrast between $\Delta$ and $\Delta_S$:
$$R_S = (\Delta - \Delta_S)/\Delta = 1 - \Delta_S/\Delta.$$
The definition and estimation of $\Delta$ are described in the delta.estimate documentation.
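A flexible model-based approach to estimate $\Delta_S$ in the single marker setting is to specify:
$$E(S^{(0)}) = \alpha_0, \quad E(S^{(1)}) - E(S^{(0)}) = \alpha_1,$$
$$E(Y^{(0)} \mid S^{(0)}) = \beta_0 + \beta_1 S^{(0)}, \quad E(Y^{(1)} \mid S^{(1)}) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) S^{(1)}.$$
It can be shown that when these models hold, $\Delta_S = \beta_2 + \beta_3 \alpha_0$. Thus, reasonable estimates for $\Delta_S$ and $R_S$ using this approach would be $\hat{\Delta}_S = \hat{\beta}_2 + \hat{\beta}_3 \hat{\alpha}_0$ and $\hat{R}_S = 1 - \hat{\Delta}_S/\hat{\Delta}$.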
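A minimal sketch of this model-based calculation in base R, on simulated data. It assumes linear models for the outcome given the surrogate in each group, with the treated-group intercept and slope written as offsets from the control-group ones. The data and the names beta2.hat, beta3.hat and alpha0.hat are hypothetical and chosen for illustration; this is not the package's internal code.

```r
# Simulated data (hypothetical): one surrogate, continuous outcome.
set.seed(1)
n1 <- 200; n0 <- 200
s0 <- rnorm(n0, mean = 1)             # surrogate, control group
s1 <- rnorm(n1, mean = 2)             # surrogate, treated group
y0 <- 1 + 2 * s0 + rnorm(n0)          # outcome, control group
y1 <- 2 + 3 * s1 + rnorm(n1)          # outcome, treated group

# Separate linear fits of the outcome on the surrogate in each group.
fit0 <- lm(y0 ~ s0)
fit1 <- lm(y1 ~ s1)

alpha0.hat <- mean(s0)                               # estimate of E(S^(0))
beta2.hat  <- coef(fit1)[[1]] - coef(fit0)[[1]]      # intercept offset
beta3.hat  <- coef(fit1)[[2]] - coef(fit0)[[2]]      # slope offset

delta.hat   <- mean(y1) - mean(y0)                   # overall treatment effect
delta.s.hat <- beta2.hat + beta3.hat * alpha0.hat    # residual treatment effect
R.s.hat     <- 1 - delta.s.hat / delta.hat           # proportion explained
```

Under these working models, delta.s.hat equals the average treated-group prediction evaluated at the control surrogate values minus the control outcome mean.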
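For robust estimation of $\Delta_S$ in the single marker setting, we estimate $\psi_1(s) = E(Y^{(1)} \mid S^{(1)} = s)$ nonparametrically using kernel smoothing:
$$\hat{\psi}_1(s) = \frac{\sum_{i=1}^{n_1} K_h(S_{1i} - s) Y_{1i}}{\sum_{i=1}^{n_1} K_h(S_{1i} - s)},$$
where $S_{1i}$ is the observed $S^{(1)}$ for person $i$, $Y_{1i}$ is the observed $Y^{(1)}$ for person $i$, $K(\cdot)$ is a smooth symmetric density function with finite support, $K_h(\cdot) = K(\cdot/h)/h$, and $h$ is a specified bandwidth. As in most nonparametric functional estimation procedures, the choice of the smoothing parameter $h$ is critical. To eliminate the impact of the bias of the conditional mean function on the resulting estimator, we require the standard undersmoothing assumption of $h = O(n_1^{-\delta})$ with $\delta \in (1/4, 1/2)$. To obtain an appropriate $h$ we first use bw.nrd to obtain $h_{opt}$, and then we let $h = h_{opt} n_1^{-c_0}$ with $c_0 = 0.25$. We then estimate $\Delta_S$ as
$$\hat{\Delta}_S = n_0^{-1} \sum_{i=1}^{n_0} \left[ \hat{\psi}_1(S_{0i}) - Y_{0i} \right],$$
where $S_{0i}$ is the observed $S^{(0)}$ for person $i$ and $Y_{0i}$ is the observed $Y^{(0)}$ for person $i$. Lastly, we estimate $R_S$ as $\hat{R}_S = 1 - \hat{\Delta}_S/\hat{\Delta}$.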
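The kernel-smoothing steps can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the data are simulated, the helper kernel.mean is hypothetical, a Gaussian kernel is used for brevity even though it does not have finite support, and the shrinkage exponent of 0.25 applied to the bw.nrd bandwidth is an assumed value.

```r
# Simulated data (hypothetical).
set.seed(2)
n1 <- 300; n0 <- 300
s0 <- rnorm(n0, mean = 1);   y0 <- 1 + 2 * s0 + rnorm(n0)
s1 <- rnorm(n1, mean = 1.5); y1 <- 2 + 3 * s1 + rnorm(n1)

# Kernel-weighted mean of y.obs at a single point s (Nadaraya-Watson form).
# Gaussian kernel chosen for simplicity; it is not finitely supported.
kernel.mean <- function(s, s.obs, y.obs, h) {
  w <- dnorm((s.obs - s) / h) / h
  sum(w * y.obs) / sum(w)
}

# Undersmoothed bandwidth: rule-of-thumb bandwidth shrunk by n1^(-0.25).
h <- bw.nrd(s1) * n1^(-0.25)

# Smoothed treated-group mean evaluated at each control surrogate value.
mu1.at.s0 <- vapply(s0, kernel.mean, numeric(1), s.obs = s1, y.obs = y1, h = h)

delta.hat   <- mean(y1) - mean(y0)
delta.s.hat <- mean(mu1.at.s0 - y0, na.rm = TRUE)
R.s.hat     <- 1 - delta.s.hat / delta.hat
```

Shrinking the rule-of-thumb bandwidth trades a little extra variance in the smoothed curve for less bias, which is the point of the undersmoothing requirement.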
This function also allows for estimation of $R_S$ using Freedman's approach. Let $Y$ denote the primary outcome, $S$ denote the surrogate marker, and $G$ denote the treatment group (0 for control, 1 for treatment). Freedman's approach to calculating the proportion of treatment effect explained by the surrogate marker is to fit the following two regression models:
$$E(Y \mid G) = \gamma_0 + \gamma_1 I(G = 1) \quad \mbox{and} \quad E(Y \mid G, S) = \gamma_{0S} + \gamma_{1S} I(G = 1) + \gamma_{2S} S$$
and to estimate the proportion of treatment effect explained, denoted by $R_S$, as $1 - \hat{\gamma}_{1S}/\hat{\gamma}_1$.
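As an illustration, Freedman's two regressions can be fit with lm on simulated data. The variable names g, s and y and the data-generating values are hypothetical; the sketch only shows the ratio of the adjusted to the unadjusted treatment coefficient.

```r
# Simulated data (hypothetical): g is the treatment indicator.
set.seed(3)
n <- 400
g <- rep(c(0, 1), each = n / 2)
s <- 1 + 1 * g + rnorm(n)               # surrogate shifts with treatment
y <- 1 + 0.5 * g + 2 * s + rnorm(n)     # outcome depends on g and s

fit.unadj <- lm(y ~ g)                  # treatment effect, unadjusted
fit.adj   <- lm(y ~ g + s)              # treatment effect, adjusted for s

# One minus the ratio of adjusted to unadjusted treatment coefficients.
R.freedman <- 1 - coef(fit.adj)[["g"]] / coef(fit.unadj)[["g"]]
```

With several surrogate markers, the same sketch would add each marker as a further term in the second model.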
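This function also estimates $R_S$ in a multiple marker setting. A flexible model-based approach to estimate $\Delta_S$ in the multiple marker setting is to specify models for $E(Y^{(g)} \mid S^{(g)})$ and $E(S_j^{(g)})$, $g = 0, 1$, for each $S_j$ in $S = \{S_1, \ldots, S_p\}$ (where $p$ is the number of surrogate markers). Without loss of generality, consider the case where there are three surrogate markers, $S = \{S_1, S_2, S_3\}$, and one specifies the following linear models:
$$E(S_j^{(0)}) = \alpha_j, \quad E(S_j^{(1)}) - E(S_j^{(0)}) = \gamma_j, \quad j = 1, 2, 3,$$
$$E(Y^{(0)} \mid S^{(0)}) = \beta_0 + \beta_1 S_1^{(0)} + \beta_2 S_2^{(0)} + \beta_3 S_3^{(0)},$$
$$E(Y^{(1)} \mid S^{(1)}) = (\beta_0 + \beta_4) + (\beta_1 + \beta_5) S_1^{(1)} + (\beta_2 + \beta_6) S_2^{(1)} + (\beta_3 + \beta_7) S_3^{(1)}.$$
It can be shown that when these models hold
$$\Delta_S = \beta_4 + \beta_5 \alpha_1 + \beta_6 \alpha_2 + \beta_7 \alpha_3.$$
Thus, reasonable estimates for $\Delta_S$ and $R_S$ here would be easily obtained by replacing the unknown regression coefficients in the models above by their consistent estimators.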
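A sketch of the plug-in calculation with three markers, assuming linear working models for the outcome in each group. The simulated data and all object names are hypothetical; the matrices S0 and S1 hold one row per observation and one column per marker.

```r
# Simulated data (hypothetical): three surrogate markers per group.
set.seed(4)
n1 <- 250; n0 <- 250
S0 <- cbind(rnorm(n0, 1), rnorm(n0, 0), rnorm(n0, 2))
S1 <- cbind(rnorm(n1, 2), rnorm(n1, 0.5), rnorm(n1, 2.5))
y0 <- drop(1 + S0 %*% c(1, 0.5, 2) + rnorm(n0))
y1 <- drop(1.5 + S1 %*% c(1.5, 0.5, 2.5) + rnorm(n1))

# Linear fits of the outcome on all markers, separately by group.
fit0 <- lm(y0 ~ S0)
fit1 <- lm(y1 ~ S1)

# Treated-minus-control coefficient offsets (intercept first, then slopes).
coef.diff <- unname(coef(fit1)) - unname(coef(fit0))
alpha.hat <- colMeans(S0)               # control-group marker means

delta.hat   <- mean(y1) - mean(y0)
delta.s.hat <- coef.diff[1] + sum(coef.diff[-1] * alpha.hat)
R.s.hat     <- 1 - delta.s.hat / delta.hat
```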
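For robust estimation of $\Delta_S$ in the multiple marker setting, we use a two-stage procedure combining the model-based approach and the nonparametric estimation procedure from the single marker setting. Specifically, we use a working semiparametric model:
$$E(Y^{(1)} \mid S^{(1)}) = \beta_0 + \beta_1 S_1^{(1)} + \beta_2 S_2^{(1)} + \beta_3 S_3^{(1)}$$
and define $Q^{(1)} = \hat{\beta}_0 + \hat{\beta}_1 S_1^{(1)} + \hat{\beta}_2 S_2^{(1)} + \hat{\beta}_3 S_3^{(1)}$ and $Q^{(0)} = \hat{\beta}_0 + \hat{\beta}_1 S_1^{(0)} + \hat{\beta}_2 S_2^{(0)} + \hat{\beta}_3 S_3^{(0)}$ to reduce the dimension of $S$ in the first stage. In the second stage, we apply the robust approach used in the single marker setting to $Q$ to estimate its surrogacy.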
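The two-stage idea can be sketched as below. This assumes the first stage is a linear working model fit in the treated group whose fitted linear predictor serves as a one-dimensional score, and that the second stage uses a Gaussian kernel with an undersmoothed bw.nrd bandwidth; the data, the names q1 and q0, and these choices are illustrative assumptions rather than the package's code.

```r
# Simulated data (hypothetical): three surrogate markers per group.
set.seed(7)
n1 <- 300; n0 <- 300
S0 <- cbind(rnorm(n0, 1), rnorm(n0, 0), rnorm(n0, 2))
S1 <- cbind(rnorm(n1, 1.5), rnorm(n1, 0.3), rnorm(n1, 2.3))
y0 <- drop(1 + S0 %*% c(1, 0.5, 2) + rnorm(n0))
y1 <- drop(1.5 + S1 %*% c(1.5, 0.5, 2.5) + rnorm(n1))

# Stage 1: working linear model in the treated group; its linear
# predictor reduces the markers to a single score in both groups.
fit1 <- lm(y1 ~ S1)
b    <- unname(coef(fit1))
q1   <- drop(cbind(1, S1) %*% b)
q0   <- drop(cbind(1, S0) %*% b)

# Stage 2: kernel smoothing of y1 on q1, evaluated at the control scores.
h <- bw.nrd(q1) * n1^(-0.25)            # assumed undersmoothing exponent
mu1.at.q0 <- vapply(q0, function(q) {
  w <- dnorm((q1 - q) / h)
  sum(w * y1) / sum(w)
}, numeric(1))

delta.hat   <- mean(y1) - mean(y0)
delta.s.hat <- mean(mu1.at.q0 - y0, na.rm = TRUE)
R.s.hat     <- 1 - delta.s.hat / delta.hat
```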
To use Freedman's approach in the presence of multiple markers, the markers are simply additively entered into the second regression model.
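Variance estimation and confidence interval construction are performed using perturbation-resampling. Specifically, let $\{V^{(b)} = (V_{11}^{(b)}, \ldots, V_{1n_1}^{(b)}, V_{01}^{(b)}, \ldots, V_{0n_0}^{(b)})^T, b = 1, \ldots, D\}$ be $n \times D$ independent copies of a positive random variable $V$ from a known distribution with unit mean and unit variance, where $n = n_1 + n_0$. Let
$$\hat{\Delta}^{(b)} = \frac{\sum_{i=1}^{n_1} V_{1i}^{(b)} Y_{1i}}{\sum_{i=1}^{n_1} V_{1i}^{(b)}} - \frac{\sum_{i=1}^{n_0} V_{0i}^{(b)} Y_{0i}}{\sum_{i=1}^{n_0} V_{0i}^{(b)}},$$
where $Y_{gi}$ is the observed outcome for person $i$ in group $g$. The variance of $\hat{\Delta}$ is obtained as the empirical variance of $\{\hat{\Delta}^{(b)}, b = 1, \ldots, D\}$. In this package, we use weights generated from an Exponential(1) distribution and use $D = 500$. Variance estimates for $\hat{\Delta}_S$ and $\hat{R}_S$ are calculated similarly. We construct two versions of the 95% confidence interval for each estimate: one based on a normal approximation using the estimated variance and another taking the 2.5th and 97.5th empirical percentiles of the perturbed quantities. In addition, we use Fieller's method to obtain a third confidence interval for $R_S$ as
$$\left\{ 1 - r : \frac{(\hat{\Delta}_S - r \hat{\Delta})^2}{\hat{\sigma}_{11} - 2 r \hat{\sigma}_{12} + r^2 \hat{\sigma}_{22}} \le c_{\alpha} \right\},$$
where $\hat{\Sigma} = (\hat{\sigma}_{ij})_{1 \le i, j \le 2}$ is the estimated covariance matrix of $(\hat{\Delta}_S, \hat{\Delta})^T$ and $c_{\alpha}$ is the $(1 - \alpha)$th percentile of
$$\left\{ \frac{\{\hat{\Delta}_S^{(b)} - (1 - \hat{R}_S) \hat{\Delta}^{(b)}\}^2}{\hat{\sigma}_{11} - 2 (1 - \hat{R}_S) \hat{\sigma}_{12} + (1 - \hat{R}_S)^2 \hat{\sigma}_{22}}, b = 1, \ldots, D \right\},$$
where $\alpha = 0.05$.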
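A minimal sketch of perturbation-resampling for the overall treatment effect, using Exponential(1) weights and 500 perturbations. The data and object names are hypothetical. Only the normal-approximation and quantile intervals are shown; Fieller's interval is not sketched here.

```r
# Simulated outcomes (hypothetical).
set.seed(5)
n1 <- 150; n0 <- 150
y1 <- rnorm(n1, mean = 3)
y0 <- rnorm(n0, mean = 1)

D <- 500
# One column of positive weights per perturbation; treated rows come first.
W <- matrix(rexp((n1 + n0) * D, rate = 1), nrow = n1 + n0, ncol = D)

# Weighted difference in means for each column of weights.
delta.perturbed <- apply(W, 2, function(w) {
  w1 <- w[seq_len(n1)]
  w0 <- w[n1 + seq_len(n0)]
  weighted.mean(y1, w1) - weighted.mean(y0, w0)
})

delta.hat   <- mean(y1) - mean(y0)
delta.var   <- var(delta.perturbed)     # empirical variance of perturbed estimates
ci.normal   <- delta.hat + c(-1, 1) * qnorm(0.975) * sqrt(delta.var)
ci.quantile <- unname(quantile(delta.perturbed, c(0.025, 0.975)))
```

The same weights would be reused when perturbing the residual treatment effect, so that the perturbed proportion explained can be formed from matching pairs.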
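Note that if the observed supports for $S$ are not the same, then the kernel estimate $\hat{\psi}_1(s)$ of $E(Y^{(1)} \mid S^{(1)} = s)$ for $S_{0i} = s$ outside the support of $S_{1i}$ may return NA (depending on the bandwidth). If extrapolate = TRUE, then the $\hat{\psi}_1(s)$ values for these surrogate values are set to the closest non-NA value. If transform = TRUE, then $S_{1i}$ and $S_{0i}$ are transformed such that the new transformed values, $S^{tr}_{1i}$ and $S^{tr}_{0i}$, are defined as:
$$S^{tr}_{gi} = F\left( [S_{gi} - \mu]/\sigma \right)$$
for $g = 0, 1$, where $F(\cdot)$ is the cumulative distribution function for a standard normal random variable, and $\mu$ and $\sigma$ are the sample mean and standard deviation, respectively, of the pooled surrogate values $(S_{1i}, S_{0i})^T$.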
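The transformation can be illustrated in a few lines of base R, assuming the mean and standard deviation are taken over the pooled surrogate values from both groups. The data and names are hypothetical.

```r
# Simulated surrogate values (hypothetical).
set.seed(6)
s1 <- rnorm(100, mean = 5, sd = 2)      # treated group
s0 <- rnorm(100, mean = 4, sd = 2)      # control group

# Standardize with the pooled mean and standard deviation, then map
# through the standard normal CDF so values fall in (0, 1).
pooled <- c(s1, s0)
mu     <- mean(pooled)
sigma  <- sd(pooled)

s1.tr <- pnorm((s1 - mu) / sigma)
s0.tr <- pnorm((s0 - mu) / sigma)
```

Because the map is monotone, the ordering of observations within each group is unchanged; only the scale is compressed onto the unit interval.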
Value
A list is returned:
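R.s |
the estimate, $\hat{R}_S$ |
R.s.var |
the variance estimate of $\hat{R}_S$ |
conf.int.normal.R.s |
a vector of size 2; the 95% confidence interval for $R_S$ based on a normal approximation |
conf.int.quantile.R.s |
a vector of size 2; the 95% confidence interval for $R_S$ based on sample quantiles of the perturbed values |
conf.int.fieller.R.s |
a vector of size 2; the 95% confidence interval for $R_S$ based on Fieller's approach |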
For all options other than "freedman", the following are also returned:
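delta |
the estimate, $\hat{\Delta}$ |
delta.s |
the estimate, $\hat{\Delta}_S$ |
delta.var |
the variance estimate of $\hat{\Delta}$ |
delta.s.var |
the variance estimate of $\hat{\Delta}_S$ |
conf.int.normal.delta |
a vector of size 2; the 95% confidence interval for $\Delta$ based on a normal approximation |
conf.int.quantile.delta |
a vector of size 2; the 95% confidence interval for $\Delta$ based on sample quantiles of the perturbed values |
conf.int.normal.delta.s |
a vector of size 2; the 95% confidence interval for $\Delta_S$ based on a normal approximation |
conf.int.quantile.delta.s |
a vector of size 2; the 95% confidence interval for $\Delta_S$ based on sample quantiles of the perturbed values |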
Note
If the treatment effect is not significant, the user will receive the following message: "Warning: it looks like the treatment effect is not significant; may be difficult to interpret the proportion of treatment effect explained in this setting". In the single marker case with the robust estimation approach, if the observed support of the surrogate marker for the control group is outside the observed support of the surrogate marker for the treatment group, the user will receive the following message: "Warning: observed supports do not appear equal, may need to consider a transformation or extrapolation"
Author(s)
Layla Parast
References
Freedman, L. S., Graubard, B. I., & Schatzkin, A. (1992). Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine, 11(2), 167-178.
Parast, L., McDermott, M., & Tian, L. (2016). Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine, 35(10), 1637-1653.
Wang, Y., & Taylor, J. M. (2002). A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics, 58(4), 803-812.
Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society. Series B (Methodological), 175-185.
Fieller, E. C. (1940). The biological standardization of insulin. Supplement to the Journal of the Royal Statistical Society, 1-64.
Examples
data(d_example)
names(d_example)
R.s.estimate(yone=d_example$y1, yzero=d_example$y0, sone=d_example$s1.a, szero=d_example$s0.a,
number = "single", type = "robust")
R.s.estimate(yone=d_example$y1, yzero=d_example$y0, sone=cbind(d_example$s1.a,d_example$s1.b,
d_example$s1.c), szero=cbind(d_example$s0.a, d_example$s0.b, d_example$s0.c),
number = "multiple", type = "model")