policy_eval {polle}		R Documentation
Policy Evaluation
Description
policy_eval() is used to estimate the value of a given fixed policy or a data-adaptive policy (e.g. a policy learned from the data).
Usage
policy_eval(
  policy_data,
  policy = NULL,
  policy_learn = NULL,
  g_functions = NULL,
  g_models = g_glm(),
  g_full_history = FALSE,
  save_g_functions = TRUE,
  q_functions = NULL,
  q_models = q_glm(),
  q_full_history = FALSE,
  save_q_functions = TRUE,
  type = "dr",
  M = 1,
  future_args = list(future.seed = TRUE),
  name = NULL
)
## S3 method for class 'policy_eval'
coef(object, ...)
## S3 method for class 'policy_eval'
IC(x, ...)
## S3 method for class 'policy_eval'
vcov(object, ...)
## S3 method for class 'policy_eval'
print(x, ...)
## S3 method for class 'policy_eval'
summary(object, ...)
## S3 method for class 'policy_eval'
estimate(x, ..., labels = x$name)
## S3 method for class 'policy_eval'
merge(x, y, ..., paired = TRUE)
## S3 method for class 'policy_eval'
x + ...
Arguments
policy_data
Policy data object created by policy_data().
policy
Policy object created by policy_def().
policy_learn
Policy learner object created by policy_learn().
g_functions
Fitted g-model objects, see nuisance_functions. Preferably, use get_g_functions().
g_models
List of action probability models/g-models for each stage, created by g_glm(), g_rf(), or similar g-model constructors.
g_full_history
If TRUE, the full history is used to fit each g-model. If FALSE, the state/Markov-type history is used to fit each g-model.
save_g_functions
If TRUE, the fitted g-functions are saved.
q_functions
Fitted Q-model objects, see nuisance_functions. Only valid if the Q-functions are fitted using the same policy. Preferably, use get_q_functions().
q_models
Outcome regression models/Q-models created by q_glm(), q_rf(), or similar Q-model constructors.
q_full_history
Similar to g_full_history.
save_q_functions
Similar to save_g_functions.
type
Character string. Type of evaluation: "dr" (doubly robust), "ipw" (inverse propensity weighting), or "or" (outcome regression).
M
Number of folds for cross-fitting.
future_args
List of arguments passed on to the future.apply call used for cross-fitting (e.g. future.seed = TRUE for reproducible parallel random numbers).
name
Character string. Name (label) of the policy evaluation, used e.g. when printing or merging estimates.
object, x, y
Objects of class "policy_eval".
...
Additional arguments.
labels
Name(s) of the estimate(s).
paired
Logical. If TRUE, the merged estimates are assumed to be based on the same observations (paired), so that their influence curves can be combined.
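As a hedged sketch (not taken from the package examples), cross-fitting with M > 1 can be combined with a parallel backend from the future package; the future_args list is assumed to be forwarded to the underlying future.apply call, and future.seed = TRUE keeps the fold-wise random numbers reproducible:
library("polle")
library("future")  # assumption: the cross-fitting folds are dispatched via the future framework
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
plan("multisession")  # evaluate the folds in parallel sessions
pe_cf <- policy_eval(policy_data = pd,
                     policy = policy_def(1),
                     type = "dr",
                     M = 2,  # 2-fold cross-fitting
                     future_args = list(future.seed = TRUE))  # reproducible parallel RNG
plan("sequential")  # reset the backend
pe_cf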
Details
Each observation has the sequential form
O = {B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}},
for a possibly stochastic number of stages K.
- B is a vector of baseline covariates.
- U_k is the reward at stage k (not influenced by the action A_k).
- X_k is a vector of state covariates summarizing the state at stage k.
- A_k is the categorical action within the action set \mathcal{A} at stage k.
The utility is given by the sum of the rewards, i.e.,
U = \sum_{k = 1}^{K+1} U_k.
A policy is a set of functions
d = \{d_1, ..., d_K\},
where d_k for k \in \{1, ..., K\} maps \{B, X_1, A_1, ..., A_{k-1}, X_k\} into the action set.
Recursively define the Q-models (q_models):
Q^d_K(h_K, a_K) = E[U | H_K = h_K, A_K = a_K]
Q^d_k(h_k, a_k) = E[Q^d_{k+1}(H_{k+1}, d_{k+1}(B, X_1, A_1, ..., X_{k+1})) | H_k = h_k, A_k = a_k].
If q_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if q_full_history = FALSE, H_k = \{B, X_k\}.
The g-models (g_models) are defined as
g_k(h_k, a_k) = P(A_k = a_k | H_k = h_k).
If g_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if g_full_history = FALSE, H_k = \{B, X_k\}. Furthermore, if g_full_history = FALSE and g_models is a single model, it is assumed that g_1(h_1, a_1) = ... = g_K(h_K, a_K).
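As a minimal sketch (not from the original documentation), the g-models can either be a single model, reused at every stage as described above, or a list with one model per stage; the full-history variants are requested via g_full_history/q_full_history. The policy_def() call with reuse = TRUE is an assumption about how to repeat a static action at both stages:
library("polle")
d2 <- sim_two_stage(5e2, seed = 1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
p2 <- policy_def(1, reuse = TRUE)  # assumed: static policy A_k = 1 reused at both stages
# single g-model shared across the two stages (g_full_history = FALSE):
pe_shared <- policy_eval(policy_data = pd2, policy = p2,
                         g_models = g_glm(), q_models = q_glm())
# one g-model per stage, each fitted on the full history:
pe_staged <- policy_eval(policy_data = pd2, policy = p2,
                         g_models = list(g_glm(), g_glm()),
                         g_full_history = TRUE,
                         q_models = q_glm())
coef(pe_shared); coef(pe_staged)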
If type = "or", policy_eval returns the empirical estimate of the value (value_estimate):
E[Q^d_1(H_1, d_1(...))],
for an appropriate input ... to the policy.
If type = "ipw", policy_eval returns the empirical estimates of the value (value_estimate) and the score (IC):
E[(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U],
(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U - E[(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U].
If type = "dr", policy_eval returns the empirical estimates of the value (value_estimate) and the influence curve (IC):
E[Z^d_1],
Z^d_1 - E[Z^d_1],
where
Z^d_1 = Q^d_1(H_1, d_1(...)) + \sum_{r = 1}^K \prod_{j = 1}^{r} \frac{I\{A_j = d_j(...)\}}{g_{j}(H_j, A_j)} \{Q^d_{r+1}(H_{r+1}, d_{r+1}(...)) - Q^d_{r}(H_r, d_r(...))\}.
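The three estimator types can be compared directly; a brief sketch (reusing the single-stage setup from the Examples section below), where the doubly robust evaluation additionally stores the associated IPW and OR estimates (see value_estimate_ipw and value_estimate_or under Value):
library("polle")
d1 <- sim_single_stage(5e2, seed = 1)
pd1 <- policy_data(d1, action = "A", covariates = list("Z", "B", "L"), utility = "U")
p1 <- policy_def(1)
pe_or  <- policy_eval(policy_data = pd1, policy = p1, type = "or",  q_models = q_glm())
pe_ipw <- policy_eval(policy_data = pd1, policy = p1, type = "ipw", g_models = g_glm())
pe_dr  <- policy_eval(policy_data = pd1, policy = p1, type = "dr",
                      g_models = g_glm(), q_models = q_glm())
c(or = coef(pe_or), ipw = coef(pe_ipw), dr = coef(pe_dr))  # compare the three value estimates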
Value
policy_eval()
returns an object of class "policy_eval".
The object is a list containing the following elements:
value_estimate
Numeric. The estimated value of the policy.
type
Character string. The type of evaluation ("dr", "ipw", or "or").
IC
Numeric vector. Estimated influence curve associated with the value estimate.
value_estimate_ipw
(only if type = "dr") The associated IPW estimate of the value.
value_estimate_or
(only if type = "dr") The associated outcome regression estimate of the value.
id
Character vector. The IDs of the observations.
policy_actions
data.table with keys id and stage. Actions associated with the policy for every observation and stage.
policy_object
(only if policy = NULL and M = 1) The policy object returned by the policy learner, see policy_learn.
g_functions
(only if M = 1 and save_g_functions = TRUE) The fitted g-functions, see nuisance_functions.
g_values
The fitted g-function values.
q_functions
(only if M = 1 and save_q_functions = TRUE) The fitted Q-functions, see nuisance_functions.
q_values
The fitted Q-function values.
cross_fits
(only if M > 1) List of the cross-fitted evaluations, one per fold.
folds
(only if M > 1) The folds used for cross-fitting.
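Since the returned object is a list, the components listed above can also be accessed directly; a small sketch (the influence-curve based standard error shown last is an approximation of sqrt(vcov())):
library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
pe <- policy_eval(policy_data = pd, policy = policy_def(1))
pe$value_estimate                 # the estimated value, also returned by coef(pe)
pe$type                           # "dr" (the default evaluation type)
head(pe$IC)                       # influence curve, one entry per observation
sqrt(var(pe$IC) / length(pe$IC))  # IC-based standard error, approximately sqrt(vcov(pe))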
S3 generics
The following S3 generic functions are available for an object of
class policy_eval
:
get_g_functions()
Extract the fitted g-functions.
get_q_functions()
Extract the fitted Q-functions.
get_policy()
Extract the fitted policy object.
get_policy_functions()
Extract the fitted policy function for a given stage.
get_policy_actions()
Extract the (fitted) policy actions.
plot.policy_eval()
Plot diagnostics.
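A short sketch of the extractor functions applied to an evaluation of a learned policy (without cross-fitting, so that the policy object is saved); treating the object returned by get_policy() as a function of a policy_data object is an assumption about its calling convention:
library("polle")
d_ql <- sim_two_stage(5e2, seed = 1)
pd_ql <- policy_data(d_ql,
                     action = c("A_1", "A_2"),
                     covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                     utility = c("U_1", "U_2", "U_3"))
pe_ql <- policy_eval(policy_data = pd_ql,
                     policy_learn = policy_learn(type = "ql"),
                     g_models = g_glm(), q_models = q_glm())
get_g_functions(pe_ql)             # fitted g-functions
head(get_policy_actions(pe_ql))    # actions dictated by the fitted policy
head(get_policy(pe_ql)(pd_ql))     # assumed calling convention: policy applied to policy_data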
References
van der Laan, Mark J., and Alexander R. Luedtke. "Targeted learning of the
mean outcome under an optimal dynamic treatment rule." Journal of causal
inference 3.1 (2015): 61-95. doi:10.1515/jci-2013-0022
Tsiatis, Anastasios A., et al. Dynamic treatment regimes: Statistical methods
for precision medicine. Chapman and Hall/CRC, 2019. doi:10.1201/9780429192692.
See Also
lava::IC, lava::estimate.default.
Examples
library("polle")
### Single stage:
d1 <- sim_single_stage(5e2, seed=1)
pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U")
pd1
# defining a static policy (A=1):
pl1 <- policy_def(1)
# evaluating the policy:
pe1 <- policy_eval(policy_data = pd1,
                   policy = pl1,
                   g_models = g_glm(),
                   q_models = q_glm(),
                   name = "A=1 (glm)")
# summarizing the estimated value of the policy:
# (equivalent to summary(pe1)):
pe1
coef(pe1) # value coefficient
sqrt(vcov(pe1)) # value standard error
# getting the g-function and Q-function values:
head(predict(get_g_functions(pe1), pd1))
head(predict(get_q_functions(pe1), pd1))
# getting the fitted influence curve (IC) for the value:
head(IC(pe1))
# evaluating the policy using random forest nuisance models:
set.seed(1)
pe1_rf <- policy_eval(policy_data = pd1,
                      policy = pl1,
                      g_models = g_rf(),
                      q_models = q_rf(),
                      name = "A=1 (rf)")
# merging the two estimates (equivalent to pe1 + pe1_rf):
(est1 <- merge(pe1, pe1_rf))
coef(est1)
head(IC(est1))
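# (a hedged sketch, not part of the original examples:) assuming the additional
# arguments of estimate() are forwarded to lava::estimate.default, a contrast
# between the two (paired) value estimates can be computed from the merged object:
estimate(est1, function(x) x[1] - x[2])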
### Two stages:
d2 <- sim_two_stage(5e2, seed=1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"),
                                     C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
pd2
# defining a policy learner based on cross-fitted doubly robust Q-learning:
pl2 <- policy_learn(type = "drql",
                    control = control_drql(qv_models = list(q_glm(~C_1),
                                                            q_glm(~C_1 + C_2))),
                    full_history = TRUE,
                    L = 2) # number of folds for cross-fitting
# evaluating the policy learner using 2-fold cross fitting:
pe2 <- policy_eval(type = "dr",
                   policy_data = pd2,
                   policy_learn = pl2,
                   q_models = q_glm(),
                   g_models = g_glm(),
                   M = 2, # number of folds for cross-fitting
                   name = "drql")
# summarizing the estimated value of the policy:
pe2
# getting the cross-fitted policy actions:
head(get_policy_actions(pe2))