policy_eval {polle}		R Documentation
Policy Evaluation
Description
policy_eval() is used to estimate the value of a given fixed policy or a data-adaptive policy (e.g. a policy learned from the data).
Usage
policy_eval(
  policy_data,
  policy = NULL,
  policy_learn = NULL,
  g_functions = NULL,
  g_models = g_glm(),
  g_full_history = FALSE,
  save_g_functions = TRUE,
  q_functions = NULL,
  q_models = q_glm(),
  q_full_history = FALSE,
  save_q_functions = TRUE,
  type = "dr",
  M = 1,
  future_args = list(future.seed = TRUE),
  name = NULL
)
## S3 method for class 'policy_eval'
coef(object, ...)
## S3 method for class 'policy_eval'
IC(x, ...)
## S3 method for class 'policy_eval'
vcov(object, ...)
## S3 method for class 'policy_eval'
print(x, ...)
## S3 method for class 'policy_eval'
summary(object, ...)
## S3 method for class 'policy_eval'
estimate(x, ..., labels = x$name)
## S3 method for class 'policy_eval'
merge(x, y, ..., paired = TRUE)
## S3 method for class 'policy_eval'
x + ...
Arguments
policy_data
Policy data object created by policy_data().
policy
Policy object created by policy_def().
policy_learn
Policy learner object created by policy_learn().
g_functions
Fitted g-model objects, see nuisance_functions. Preferably, use get_g_functions().
g_models
List of action probability models/g-models for each stage, created by g_glm(), g_rf(), or similar g-model constructors.
g_full_history
If TRUE, the full history is used to fit each g-model. If FALSE, the state/Markov-type history is used to fit each g-model.
save_g_functions
If TRUE, the fitted g-functions are saved.
q_functions
Fitted Q-model objects, see nuisance_functions. Only valid if the Q-functions are fitted using the same policy. Preferably, use get_q_functions().
q_models
Outcome regression models/Q-models created by q_glm(), q_rf(), or similar Q-model constructors.
q_full_history
Similar to g_full_history.
save_q_functions
Similar to save_g_functions.
type
Character string. Type of evaluation: "dr" (doubly robust), "ipw" (inverse propensity weighting), or "or" (outcome regression).
M
Number of folds for cross-fitting.
future_args
List of arguments passed on to the future.apply call used for cross-fitting (e.g. future.seed = TRUE for reproducible parallel random numbers).
name
Character string. Name (label) of the policy evaluation, used e.g. when printing or merging estimates.
object, x, y
Objects of class "policy_eval".
...
Additional arguments.
labels
Name(s) of the estimate(s).
paired
Logical. If TRUE, the merged estimates are assumed to be based on the same observations (paired), so that their influence curves can be combined.
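As a hedged sketch (not taken from the package examples), cross-fitting with M > 1 can be combined with a parallel backend from the future package; the future_args list is assumed to be forwarded to the underlying future.apply call, and future.seed = TRUE keeps the fold-wise random numbers reproducible:
library("polle")
library("future")  # assumption: the cross-fitting folds are dispatched via the future framework
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
plan("multisession")  # evaluate the folds in parallel sessions
pe_cf <- policy_eval(policy_data = pd,
                     policy = policy_def(1),
                     type = "dr",
                     M = 2,  # 2-fold cross-fitting
                     future_args = list(future.seed = TRUE))  # reproducible parallel RNG
plan("sequential")  # reset the backend
pe_cf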
Details
Each observation has the sequential form
O = {B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}},
for a possibly stochastic number of stages K.
- B is a vector of baseline covariates.
- U_k is the reward at stage k (not influenced by the action A_k).
- X_k is a vector of state covariates summarizing the state at stage k.
- A_k is the categorical action within the action set \mathcal{A} at stage k.
The utility is given by the sum of the rewards, i.e.,
U = \sum_{k = 1}^{K+1} U_k.
A policy is a set of functions
d = \{d_1, ..., d_K\},
where d_k for k \in \{1, ..., K\} maps \{B, X_1, A_1, ..., A_{k-1}, X_k\} into the action set.
Recursively define the Q-models (q_models):
Q^d_K(h_K, a_K) = E[U | H_K = h_K, A_K = a_K]
Q^d_k(h_k, a_k) = E[Q^d_{k+1}(H_{k+1}, d_{k+1}(B, X_1, A_1, ..., X_{k+1})) | H_k = h_k, A_k = a_k].
If q_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if q_full_history = FALSE, H_k = \{B, X_k\}.
The g-models (g_models) are defined as
g_k(h_k, a_k) = P(A_k = a_k | H_k = h_k).
If g_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if g_full_history = FALSE, H_k = \{B, X_k\}. Furthermore, if g_full_history = FALSE and g_models is a single model, it is assumed that g_1(h_1, a_1) = ... = g_K(h_K, a_K).
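As a minimal sketch (not from the original documentation), the g-models can either be a single model, reused at every stage as described above, or a list with one model per stage; the full-history variants are requested via g_full_history/q_full_history. The policy_def() call with reuse = TRUE is an assumption about how to repeat a static action at both stages:
library("polle")
d2 <- sim_two_stage(5e2, seed = 1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
p2 <- policy_def(1, reuse = TRUE)  # assumed: static policy A_k = 1 reused at both stages
# single g-model shared across the two stages (g_full_history = FALSE):
pe_shared <- policy_eval(policy_data = pd2, policy = p2,
                         g_models = g_glm(), q_models = q_glm())
# one g-model per stage, each fitted on the full history:
pe_staged <- policy_eval(policy_data = pd2, policy = p2,
                         g_models = list(g_glm(), g_glm()),
                         g_full_history = TRUE,
                         q_models = q_glm())
coef(pe_shared); coef(pe_staged)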
If type = "or", policy_eval returns the empirical estimate of the value (value_estimate):
E[Q^d_1(H_1, d_1(...))],
for an appropriate input ... to the policy.
If type = "ipw", policy_eval returns the empirical estimates of the value (value_estimate) and the score (IC):
E[(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U],
(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U - E[(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U].
If type = "dr", policy_eval returns the empirical estimates of the value (value_estimate) and the influence curve (IC):
E[Z^d_1],
Z^d_1 - E[Z^d_1],
where
Z^d_1 = Q^d_1(H_1, d_1(...)) + \sum_{r = 1}^K \prod_{j = 1}^{r} \frac{I\{A_j = d_j(...)\}}{g_{j}(H_j, A_j)} \{Q^d_{r+1}(H_{r+1}, d_{r+1}(...)) - Q^d_{r}(H_r, d_r(...))\}.
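The three estimator types can be compared directly; a brief sketch (reusing the single-stage setup from the Examples section below), where the doubly robust evaluation additionally stores the associated IPW and OR estimates (see value_estimate_ipw and value_estimate_or under Value):
library("polle")
d1 <- sim_single_stage(5e2, seed = 1)
pd1 <- policy_data(d1, action = "A", covariates = list("Z", "B", "L"), utility = "U")
p1 <- policy_def(1)
pe_or  <- policy_eval(policy_data = pd1, policy = p1, type = "or",  q_models = q_glm())
pe_ipw <- policy_eval(policy_data = pd1, policy = p1, type = "ipw", g_models = g_glm())
pe_dr  <- policy_eval(policy_data = pd1, policy = p1, type = "dr",
                      g_models = g_glm(), q_models = q_glm())
c(or = coef(pe_or), ipw = coef(pe_ipw), dr = coef(pe_dr))  # compare the three value estimates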
Value
policy_eval()
returns an object of class "policy_eval".
The object is a list containing the following elements:
value_estimate
Numeric. The estimated value of the policy.
type
Character string. The type of evaluation ("dr", "ipw", or "or").
IC
Numeric vector. Estimated influence curve associated with the value estimate.
value_estimate_ipw
(only if type = "dr") The associated IPW estimate of the value.
value_estimate_or
(only if type = "dr") The associated outcome regression estimate of the value.
id
Character vector. The IDs of the observations.
policy_actions
data.table with keys id and stage. Actions associated with the policy for every observation and stage.
policy_object
(only if policy = NULL and M = 1) The policy object returned by the policy learner, see policy_learn.
g_functions
(only if M = 1 and save_g_functions = TRUE) The fitted g-functions, see nuisance_functions.
g_values
The fitted g-function values.
q_functions
(only if M = 1 and save_q_functions = TRUE) The fitted Q-functions, see nuisance_functions.
q_values
The fitted Q-function values.
cross_fits
(only if M > 1) List of the cross-fitted evaluations, one per fold.
folds
(only if M > 1) The folds used for cross-fitting.
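Since the returned object is a list, the components listed above can also be accessed directly; a small sketch (the influence-curve based standard error shown last is an approximation of sqrt(vcov())):
library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
pe <- policy_eval(policy_data = pd, policy = policy_def(1))
pe$value_estimate                 # the estimated value, also returned by coef(pe)
pe$type                           # "dr" (the default evaluation type)
head(pe$IC)                       # influence curve, one entry per observation
sqrt(var(pe$IC) / length(pe$IC))  # IC-based standard error, approximately sqrt(vcov(pe))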
S3 generics
The following S3 generic functions are available for an object of
class policy_eval
:
get_g_functions()
Extract the fitted g-functions.
get_q_functions()
Extract the fitted Q-functions.
get_policy()
Extract the fitted policy object.
get_policy_functions()
Extract the fitted policy function for a given stage.
get_policy_actions()
Extract the (fitted) policy actions.
plot.policy_eval()
Plot diagnostics.
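A short sketch of the extractor functions applied to an evaluation of a learned policy (without cross-fitting, so that the policy object is saved); treating the object returned by get_policy() as a function of a policy_data object is an assumption about its calling convention:
library("polle")
d_ql <- sim_two_stage(5e2, seed = 1)
pd_ql <- policy_data(d_ql,
                     action = c("A_1", "A_2"),
                     covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                     utility = c("U_1", "U_2", "U_3"))
pe_ql <- policy_eval(policy_data = pd_ql,
                     policy_learn = policy_learn(type = "ql"),
                     g_models = g_glm(), q_models = q_glm())
get_g_functions(pe_ql)             # fitted g-functions
head(get_policy_actions(pe_ql))    # actions dictated by the fitted policy
head(get_policy(pe_ql)(pd_ql))     # assumed calling convention: policy applied to policy_data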
References
van der Laan, Mark J., and Alexander R. Luedtke. "Targeted learning of the
mean outcome under an optimal dynamic treatment rule." Journal of causal
inference 3.1 (2015): 61-95. doi:10.1515/jci-2013-0022
Tsiatis, Anastasios A., et al. Dynamic treatment regimes: Statistical methods
for precision medicine. Chapman and Hall/CRC, 2019. doi:10.1201/9780429192692.
See Also
lava::IC, lava::estimate.default.
Examples
library("polle")
### Single stage:
d1 <- sim_single_stage(5e2, seed=1)
pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U")
pd1
# defining a static policy (A=1):
pl1 <- policy_def(1)
# evaluating the policy:
pe1 <- policy_eval(policy_data = pd1,
                   policy = pl1,
                   g_models = g_glm(),
                   q_models = q_glm(),
                   name = "A=1 (glm)")
# summarizing the estimated value of the policy:
# (equivalent to summary(pe1)):
pe1
coef(pe1) # value coefficient
sqrt(vcov(pe1)) # value standard error
# getting the g-function and Q-function values:
head(predict(get_g_functions(pe1), pd1))
head(predict(get_q_functions(pe1), pd1))
# getting the fitted influence curve (IC) for the value:
head(IC(pe1))
# evaluating the policy using random forest nuisance models:
set.seed(1)
pe1_rf <- policy_eval(policy_data = pd1,
                      policy = pl1,
                      g_models = g_rf(),
                      q_models = q_rf(),
                      name = "A=1 (rf)")
# merging the two estimates (equivalent to pe1 + pe1_rf):
(est1 <- merge(pe1, pe1_rf))
coef(est1)
head(IC(est1))
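# (a hedged sketch, not part of the original examples:) assuming the additional
# arguments of estimate() are forwarded to lava::estimate.default, a contrast
# between the two (paired) value estimates can be computed from the merged object:
estimate(est1, function(x) x[1] - x[2])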
### Two stages:
d2 <- sim_two_stage(5e2, seed=1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"),
                                     C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
pd2
# defining a policy learner based on cross-fitted doubly robust Q-learning:
pl2 <- policy_learn(type = "drql",
                    control = control_drql(qv_models = list(q_glm(~C_1),
                                                            q_glm(~C_1 + C_2))),
                    full_history = TRUE,
                    L = 2) # number of folds for cross-fitting
# evaluating the policy learner using 2-fold cross fitting:
pe2 <- policy_eval(type = "dr",
                   policy_data = pd2,
                   policy_learn = pl2,
                   q_models = q_glm(),
                   g_models = g_glm(),
                   M = 2, # number of folds for cross-fitting
                   name = "drql")
# summarizing the estimated value of the policy:
pe2
# getting the cross-fitted policy actions:
head(get_policy_actions(pe2))