DTR.KernSmooth {DTRKernSmooth} | R Documentation |
Estimate the optimal treatment regime among all linear regimes with smoothed estimation methods
Description
This function estimates the optimal treatment regime among all linear regimes using smoothed estimation methods with a doubly robust correction, and returns a 'DTR.KernSmooth' model object.
Usage
DTR.KernSmooth(
X,
y,
a,
intercept = TRUE,
prob = 0.5,
m0 = mean(y[a == 0]),
m1 = mean(y[a == 1]),
kernel = "normal",
phi0 = 1,
gamma = 2,
err_tol = 1e-04,
iter_tol = 200
)
Arguments
X |
Input matrix, of dimension n_obs x n_vars; each row is an observation vector. |
y |
Response variable to be maximized on average if every subject follows the treatment recommended by the optimal regime. |
a |
Received treatments for the n_obs subjects. Must be binary and coded as {0,1}. |
intercept |
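Logical. Whether an intercept is included in estimating the optimal treatment regime. The default value is TRUE. |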
prob |
The probability to receive the assigned treatments for the n_obs subjects, i.e., P(a=a_i|X_i). If |
m0 |
The estimated response values if the subjects receive treatment 0. The default is the average response value of all subjects who receive treatment 0. |
m1 |
The estimated response values if the subjects receive treatment 1. The default is the average response value of all subjects who receive treatment 1. |
kernel |
The kernel function to be used in smoothed estimation. Should be one of "normal", "poly1" and "poly2". The default value is "normal". See more details in "Details". |
phi0 |
The initial step size to be used in the Proximal Algorithm. The default value is 1. |
gamma |
The multiplier of the step sizes to be used in the Proximal Algorithm. Must be greater than 1. The default value is 2. |
err_tol |
The desired accuracy in the estimation. The default value is 1e-4. |
iter_tol |
The maximum number of iterations in the estimation algorithm. The default value is 200. |
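The prob, m0 and m1 arguments can be filled from fitted working models. The following is a minimal sketch in base R, assuming a logistic propensity model and a separate linear outcome model in each treatment arm. Both model choices are illustrative assumptions, not requirements of the function, and the simulated data and the names ps_fit, pi_hat, prob_hat, m0_hat and m1_hat are hypothetical.

```r
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 2), n)
a <- rbinom(n, 1, plogis(0.5 * X[, 1]))
y <- 1 + X[, 1] + a * (X[, 2] - 0.5) + rnorm(n)
dat <- data.frame(y = y, a = a, x1 = X[, 1], x2 = X[, 2])

# Propensity model: estimate pi(x) = P(A = 1 | x) by logistic regression
ps_fit <- glm(a ~ x1 + x2, family = binomial(), data = dat)
pi_hat <- fitted(ps_fit)

# prob is the probability of the treatment actually received, P(a = a_i | X_i)
prob_hat <- a * pi_hat + (1 - a) * (1 - pi_hat)

# Outcome models fitted within each arm, then predicted for every subject
m0_hat <- predict(lm(y ~ x1 + x2, data = dat, subset = a == 0), newdata = dat)
m1_hat <- predict(lm(y ~ x1 + x2, data = dat, subset = a == 1), newdata = dat)
```

These vectors could then be passed on, for example as DTR.KernSmooth(X, y, a, prob = prob_hat, m0 = m0_hat, m1 = m1_hat).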
Details
This function estimates the optimal linear treatment regime that maximizes
the average outcome among the population if every individual follows the treatment
recommended by this treatment regime.
Assume the propensity score \pi(\bm{x})=P(A=1|\bm{x}) can be modeled as \pi(\bm{x},\bm{\xi}), where \bm{\xi} is a finite-dimensional parameter (e.g., via logistic regression). Let \widehat{\bm{\xi}} be an estimate of \bm{\xi}. Let
\pi_a(\bm{x}_i, \widehat{\bm{\xi}})=A_i\pi(\bm{x}_i, \widehat{\bm{\xi}}) + (1-A_i)\left[1-\pi(\bm{x}_i, \widehat{\bm{\xi}})\right],
and
\widehat{m}_c(\bm{x}_i, \widehat{\bm{\beta}}) = I\left(\bm{x}_i^T\bm{\beta}>0\right)\widehat{m}_1(\bm{x}_i) + I\left(\bm{x}_i^T\bm{\beta}\leq 0\right)\widehat{m}_0(\bm{x}_i),
where \widehat{m}_0 and \widehat{m}_1 are the estimated responses under treatments 0 and 1 (the m0 and m1 arguments).
Our goal is to estimate the \bm{\beta}
that maximizes:
V_n(\bm{\beta})=n^{-1}\sum_{i=1}^n \frac{\left[A_i I\left(\bm{x}_i^T\bm{\beta}>0\right)+(1-A_i)I\left(\bm{x}_i^T\bm{\beta}\leq 0\right)\right]Y_i}
{\pi_a(\bm{x}_i, \widehat{\bm{\xi}})}- n^{-1}\sum_{i=1}^n \frac{ A_i I\left(\bm{x}_i^T\bm{\beta}>0\right)+(1-A_i)I\left(\bm{x}_i^T\bm{\beta}\leq 0\right) -\pi_a(\bm{x}_i, \widehat{\bm{\xi}})}
{\pi_a(\bm{x}_i, \widehat{\bm{\xi}})}\widehat{m}_c(\bm{x}_i, \widehat{\bm{\beta}}),
with the second term as the doubly robust correction.
For identifiability, we normalize the estimator so that its second element
has magnitude 1, i.e., |\widehat{\beta}_2|=1.
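The unsmoothed objective V_n(\bm{\beta}) above can be sketched directly in R. This is an illustration of the formula only, not the package's internal code. The helper name dr_value is hypothetical, and it assumes that beta has the intercept as its first element and that X carries no intercept column.

```r
# Hypothetical helper: doubly robust value estimate V_n(beta) for a linear regime.
# Assumes beta = (intercept, slopes) and X is the n x p covariate matrix.
dr_value <- function(beta, X, y, a, prob, m0, m1) {
  d <- as.numeric(cbind(1, X) %*% beta > 0)   # recommended treatment I(x'beta > 0)
  follows <- a * d + (1 - a) * (1 - d)        # 1 if the received treatment matches the regime
  m_c <- d * m1 + (1 - d) * m0                # outcome-model prediction under the regime
  ipw <- mean(follows * y / prob)             # inverse-probability-weighted term
  aug <- mean((follows - prob) / prob * m_c)  # doubly robust correction term
  ipw - aug
}

# Toy data: four subjects, one covariate, randomized treatment with probability 0.5
X_toy <- matrix(c(-1, 1, -2, 2), ncol = 1)
a_toy <- c(0, 1, 1, 0)
y_toy <- c(1, 3, 2, 4)
v_toy <- dr_value(c(0, 1), X_toy, y_toy, a_toy, prob = rep(0.5, 4), m0 = 0, m1 = 0)
```

With m0 = m1 = 0 the correction term vanishes and the estimate reduces to the inverse-probability-weighted value.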
To alleviate the computational challenge posed by the nonsmooth indicator function,
and to derive the asymptotic distribution of the estimators, we use a smooth
function K(\cdot) to approximate the indicator function I(\cdot).
That is, we estimate the \bm{\beta}
that maximizes:
n^{-1}\sum_{i=1}^n \frac{\left[A_i K\left(\frac{\bm{x}_i^T\bm{\beta}}{h_n}\right)+(1-A_i)\left\{1-K\left(\frac{\bm{x}_i^T\bm{\beta}}{h_n}\right)\right\}\right]Y_i}
{\pi_a(\bm{x}_i, \widehat{\bm{\xi}})}- n^{-1}\sum_{i=1}^n \frac{\left[A_i-\pi_a(\bm{x}_i, \widehat{\bm{\xi}})\right] \widehat{m}_1(\bm{x}_i)K\left(\frac{\bm{x}_i^T\bm{\beta}}{h_n}\right)+\left[1-A_i-\pi_a(\bm{x}_i, \widehat{\bm{\xi}})\right] \widehat{m}_0(\bm{x}_i) \left\{1-K\left(\frac{\bm{x}_i^T\bm{\beta}}{h_n}\right)\right\}}
{\pi_a(\bm{x}_i, \widehat{\bm{\xi}})}.
In this function, we provide three options for the smoothed kernel functions:
- "normal"
The c.d.f. of the N(0,1) distribution. The bandwidth is set as
h_n=0.9n^{-0.2} \min\{std (\bm{x}_i^T\bm{\beta}),IQR(\bm{x}_i^T\bm{\beta})/1.34\}.
- "poly1"
A polynomial function
K(v) =\left[0.5 + \frac{105}{64}\{\frac{v}{5}-\frac{5}{3}(\frac{v}{5})^3 +\frac{7}{5}(\frac{v}{5})^5 - \frac{3}{7}(\frac{v}{5})^7\}\right]I( -5\leq v \leq 5)+I(v>5).
The bandwidth is set as
h_n=0.9n^{-1/9} \min\{std (\bm{x}_i^T\bm{\beta}),IQR(\bm{x}_i^T\bm{\beta})/1.34\}.
- "poly2"
A polynomial function
K(v) =\left[0.5 + \frac{225}{128}\{\frac{v}{5}-\frac{14}{9}(\frac{v}{5})^3 +\frac{21}{25}(\frac{v}{5})^5\}\right]I( -5\leq v \leq 5)+I(v>5).
The bandwidth is set as
h_n=0.9n^{-1/13} \min\{std (\bm{x}_i^T\bm{\beta}),IQR(\bm{x}_i^T\bm{\beta})/1.34\}.
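The three kernels and the bandwidth rule listed above can be written out as a short R sketch. The function names K_normal, K_poly1, K_poly2 and bandwidth are hypothetical and are not exported by the package; the code only transcribes the formulas as stated.

```r
# Smooth replacements K(v) for the indicator I(v > 0)
K_normal <- function(v) pnorm(v)

K_poly1 <- function(v) {
  u <- v / 5
  inside <- 0.5 + 105 / 64 * (u - 5 / 3 * u^3 + 7 / 5 * u^5 - 3 / 7 * u^7)
  ifelse(v > 5, 1, ifelse(v < -5, 0, inside))
}

K_poly2 <- function(v) {
  u <- v / 5
  inside <- 0.5 + 225 / 128 * (u - 14 / 9 * u^3 + 21 / 25 * u^5)
  ifelse(v > 5, 1, ifelse(v < -5, 0, inside))
}

# Bandwidth h_n = 0.9 * n^(-rate) * min(sd, IQR / 1.34), applied to the index x'beta
bandwidth <- function(index, kernel = c("normal", "poly1", "poly2")) {
  kernel <- match.arg(kernel)
  rate <- switch(kernel, normal = 0.2, poly1 = 1 / 9, poly2 = 1 / 13)
  0.9 * length(index)^(-rate) * min(sd(index), IQR(index) / 1.34)
}
```

Each kernel equals 0.5 at v = 0, and the two polynomial kernels reach exactly 0 and 1 at v = -5 and v = 5, so they join continuously with the constant pieces outside that interval.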
Because the resulting optimization problem is non-convex, we employ a proximal gradient descent algorithm for estimation. See the reference for more details.
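To give a feel for how tuning parameters such as phi0, gamma, err_tol and iter_tol typically enter a gradient scheme of this kind, here is a generic backtracking gradient descent on a toy smooth loss. It is a sketch under stated assumptions and not the package's algorithm: the function backtracking_gd is hypothetical, there is no nonsmooth penalty (so the proximal step reduces to a plain gradient step), and treating phi as an inverse step size that is multiplied by gamma on a failed trial is an assumption about how such parameters are commonly used. A maximization problem can be handled the same way by minimizing the negated objective.

```r
# Hypothetical sketch: backtracking gradient descent on a smooth loss f with gradient g.
# phi acts as an inverse step size; it is multiplied by gamma (> 1) whenever
# the quadratic upper bound at the trial point fails.
backtracking_gd <- function(f, g, beta0, phi0 = 1, gamma = 2,
                            err_tol = 1e-4, iter_tol = 200) {
  beta <- beta0
  phi <- phi0
  converge <- FALSE
  iter_num <- 0
  while (iter_num < iter_tol) {
    iter_num <- iter_num + 1
    grad <- g(beta)
    repeat {
      cand <- beta - grad / phi
      # Accept once f(cand) lies on or below the quadratic model built at beta
      if (f(cand) <= f(beta) - sum(grad^2) / (2 * phi)) break
      phi <- gamma * phi
    }
    step <- sqrt(sum((cand - beta)^2))
    beta <- cand
    if (step < err_tol) {
      converge <- TRUE
      break
    }
  }
  list(beta = beta, converge = converge, iter_num = iter_num)
}

# Toy loss with minimizer (1, 2)
f <- function(b) sum((b - c(1, 2))^2)
g <- function(b) 2 * (b - c(1, 2))
fit <- backtracking_gd(f, g, beta0 = c(0, 0))
```

The returned list mirrors the converge and iter_num components described under Value, purely for illustration.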
Value
An object of class "DTR.KernSmooth", which is a list containing at least the following components:
X |
The input matrix used. |
y |
The response variable used. |
a |
The treatment vector received by each subject. |
intercept |
Logical which indicates whether the intercept is included in estimating the optimal treatment regime. |
prob |
The propensity score vector for each subject. |
m0 |
The estimated response values used if the subjects receive treatment 0. |
m1 |
The estimated response values used if the subjects receive treatment 1. |
kernel |
The kernel function used in smoothed estimation. |
beta_smooth |
The estimated optimal treatment regime vector. |
opt_treatment |
The predicted optimal treatments for the input data given the estimated optimal regime. |
value_smooth |
The estimated optimal average response value among all linear treatment regimes. |
converge |
Logical. Indicates whether the estimation algorithm converged. |
iter_num |
The number of iterations the algorithm used to converge. |
Author(s)
Yunan Wu and Lan Wang
Maintainer:
Yunan Wu <yunan.wu@utdallas.edu>
References
Wu, Y. and Wang, L. (2021).
Resampling-based Confidence Intervals for Model-free Robust Inference
on Optimal Treatment Regimes. Biometrics, 77: 465–476. doi:10.1111/biom.13337.
Nesterov, Y. (2007).
Gradient methods for minimizing composite objective function. Core
discussion papers, Université catholique de Louvain, Center for Operations
Research and Econometrics (CORE).
See Also
predict.DTR.KernSmooth, obj_value, DTR.Boots.KernSmooth
Examples
# Simulate n subjects with p covariates
n <- 500; p <- 3
beta <- c(0.2,1,-0.5,-0.8)*0.7
beta1 <- c(1,-0.5,-0.5,0.5)
set.seed(12345)
X <- matrix(rnorm(n*p),n)
a <- rbinom(n,1,0.7)                              # treatment received, P(a = 1) = 0.7
mean1 <- exp(cbind(1,X) %*% beta1)                # mean response under treatment 0
mean2 <- 8/(1 + exp(-cbind(1,X) %*% beta)) - 4    # added effect of treatment 1
y <- mean1 + a * mean2 + rnorm(n)
# True propensity scores supplied; outcome models set to 0
smooth_model_ci <- DTR.KernSmooth(X, y, a, prob = 0.3 + 0.4*a, m0 = 0, m1 = 0)
smooth_model_ci$beta_smooth
smooth_model_ci$value_smooth
# Default prob = 0.5; true outcome means supplied
smooth_model_ic <- DTR.KernSmooth(X, y, a, m0 = mean1, m1 = mean1 + mean2)
smooth_model_ic$beta_smooth
smooth_model_ic$value_smooth
# True propensity scores and true outcome means supplied
smooth_model_cc <- DTR.KernSmooth(X, y, a, prob = 0.3 + 0.4*a, m0 = mean1, m1 = mean1 + mean2)
smooth_model_cc$beta_smooth
smooth_model_cc$value_smooth
# All defaults: prob = 0.5, m0 and m1 equal to the arm-wise sample means
smooth_model_ii <- DTR.KernSmooth(X, y, a)
smooth_model_ii$beta_smooth
smooth_model_ii$value_smooth