policy_eval {polle}    R Documentation

Policy Evaluation

Description

policy_eval() is used to estimate the value of a given fixed policy or a data-adaptive policy (e.g., a policy learned from the data).

Usage

policy_eval(
  policy_data,
  policy = NULL,
  policy_learn = NULL,
  g_functions = NULL,
  g_models = g_glm(),
  g_full_history = FALSE,
  save_g_functions = TRUE,
  q_functions = NULL,
  q_models = q_glm(),
  q_full_history = FALSE,
  save_q_functions = TRUE,
  type = "dr",
  M = 1,
  future_args = list(future.seed = TRUE),
  name = NULL
)

## S3 method for class 'policy_eval'
coef(object, ...)

## S3 method for class 'policy_eval'
IC(x, ...)

## S3 method for class 'policy_eval'
vcov(object, ...)

## S3 method for class 'policy_eval'
print(x, ...)

## S3 method for class 'policy_eval'
summary(object, ...)

## S3 method for class 'policy_eval'
estimate(x, ..., labels = x$name)

## S3 method for class 'policy_eval'
merge(x, y, ..., paired = TRUE)

## S3 method for class 'policy_eval'
x + ...

Arguments

policy_data

Policy data object created by policy_data().

policy

Policy object created by policy_def().

policy_learn

Policy learner object created by policy_learn().

g_functions

Fitted g-model objects, see nuisance_functions. Preferably, use g_models.

g_models

List of action probability models/g-models for each stage created by g_empir(), g_glm(), g_rf(), g_sl() or similar functions. Only used for evaluation if g_functions is NULL. If a single model is provided and g_full_history is FALSE, a single g-model is fitted across all stages. If g_full_history is TRUE, the single model specification is refitted at every stage.

g_full_history

If TRUE, the full history is used to fit each g-model. If FALSE, the state/Markov type history is used to fit each g-model.

save_g_functions

If TRUE, the fitted g-functions are saved.

q_functions

Fitted Q-model objects, see nuisance_functions. Only valid if the Q-functions are fitted using the same policy. Preferably, use q_models.

q_models

Outcome regression models/Q-models created by q_glm(), q_rf(), q_sl() or similar functions. Only used for evaluation if q_functions is NULL. If a single model is provided, the model is reused at every stage.

q_full_history

Similar to g_full_history.

save_q_functions

Similar to save_g_functions.

type

Character string. Type of evaluation: "dr" (doubly robust), "ipw" (inverse probability weighting), or "or" (outcome regression).

M

Number of folds for cross-fitting.

future_args

Arguments passed to future.apply::future_apply(), e.g., for parallel cross-fitting; see the sketch following the Arguments section.

name

Character string. Name used to label the value estimate, see estimate() and merge().

object, x, y

Objects of class "policy_eval".

...

Additional arguments.

labels

Name(s) of the estimate(s).

paired

TRUE indicates that the estimates are based on the same data sample.
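
As a minimal sketch of parallel cross-fitting (assuming the future package is installed), the cross-fitting folds given by M > 1 can be evaluated in parallel by setting a future plan before calling policy_eval():

library("polle")
library("future")
plan(multisession) # evaluate the cross-fitting folds in parallel
d <- sim_single_stage(2e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
policy_eval(policy_data = pd,
            policy = policy_def(1),
            M = 5, # 5-fold cross-fitting
            future_args = list(future.seed = TRUE))
plan(sequential) # reset to sequential processing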

Details

Each observation has the sequential form

O = \{B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}\},

for a possibly stochastic number of stages K.

The utility is given by the sum of the rewards, i.e., U = \sum_{k = 1}^{K+1} U_k.

A policy is a set of functions

d = \{d_1, ..., d_K\},

where d_k for k \in \{1, ..., K\} maps \{B, X_1, A_1, ..., A_{k-1}, X_k\} into the action set.
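
For instance, a dynamic two-stage policy can be specified via policy_def() as a list of stage-specific policy functions. A sketch, assuming state covariates named L and C as in the two-stage example in the Examples section:

p2 <- policy_def(
  policy_functions = list(
    function(L, C) (L > 0) * 1, # d_1
    function(L, C) (C > 0) * 1  # d_2
  ),
  full_history = FALSE
)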

Recursively define the Q-models (q_models):

Q^d_K(h_K, a_K) = E[U|H_K = h_K, A_K = a_K]

Q^d_k(h_k, a_k) = E[Q^d_{k+1}(H_{k+1}, d_{k+1}(B, X_1, A_1, ..., A_k, X_{k+1}))|H_k = h_k, A_k = a_k], for k = K-1, ..., 1.

If q_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if q_full_history = FALSE, H_k = \{B, X_k\}.

The g-models (g_models) are defined as

g_k(h_k, a_k) = P(A_k = a_k|H_k = h_k).

If g_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if g_full_history = FALSE, H_k = \{B, X_k\}. Furthermore, if g_full_history = FALSE and g_models is a single model, it is assumed that g_1(h_1, a_1) = ... = g_K(h_K, a_K).
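
As a sketch, a separate g-model can be specified for each stage by passing a list, here reusing the two-stage simulated data from the Examples section:

d <- sim_two_stage(2e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
policy_eval(policy_data = pd,
            policy = policy_def(1, reuse = TRUE), # static policy at both stages
            g_models = list(g_glm(), g_rf()), # one g-model per stage
            q_models = q_glm())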

If type = "or", policy_eval() returns the empirical estimate of the value (value_estimate):

E[Q^d_1(H_1, d_1(...))]

for an appropriate input ... to the policy.

If type = "ipw", policy_eval() returns the empirical estimates of the value (value_estimate) and the associated score (IC):

E[(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U].

(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U - E[(\prod_{k=1}^K I\{A_k = d_k(...)\} g_k(H_k, A_k)^{-1}) U].

If type = "dr", policy_eval() returns the empirical estimates of the value (value_estimate) and the associated influence curve (IC):

E[Z^d_1],

Z^d_1 - E[Z^d_1],

where

Z^d_1 = Q^d_1(H_1 , d_1(...)) + \sum_{r = 1}^K \prod_{j = 1}^{r} \frac{I\{A_j = d_j(...)\}}{g_{j}(H_j, A_j)} \{Q_{r+1}^d(H_{r+1} , d_{r+1}(...)) - Q_{r}^d(H_r , d_r(...))\}.
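
A short sketch comparing the three estimator types on the same data, reusing the single-stage setup from the Examples section:

d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
p <- policy_def(1)
policy_eval(pd, policy = p, type = "or")  # outcome regression
policy_eval(pd, policy = p, type = "ipw") # inverse probability weighting
policy_eval(pd, policy = p, type = "dr")  # doubly robust (default)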

Value

policy_eval() returns an object of class "policy_eval". The object is a list containing the following elements (a short access sketch follows the list):

value_estimate

Numeric. The estimated value of the policy.

type

Character string. The type of evaluation ("dr", "ipw", "or").

IC

Numeric vector. Estimated influence curve associated with the value estimate.

value_estimate_ipw

(only if type = "dr") Numeric. The estimated value of the policy based on inverse probability weighting.

value_estimate_or

(only if type = "dr") Numeric. The estimated value of the policy based on outcome regression.

id

Character vector. The IDs of the observations.

policy_actions

data.table with keys id and stage. Actions associated with the policy for every observation and stage.

policy_object

(only if policy = NULL and M = 1) The policy object returned by the policy learner, see policy_learn().

g_functions

(only if M = 1) The fitted g-functions. Object of class "nuisance_functions".

g_values

The fitted g-function values.

q_functions

(only if M = 1) The fitted Q-functions. Object of class "nuisance_functions".

q_values

The fitted Q-function values.

cross_fits

(only if M > 1) List containing the "policy_eval" object for every (validation) fold.

folds

(only if M > 1) The (validation) folds used for cross-fitting.
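
For instance, the listed elements can be accessed directly (a sketch reusing the evaluation pe1 from the Examples section):

pe1$value_estimate # equivalent to coef(pe1)
pe1$type           # "dr"
head(pe1$IC)       # equivalent to head(IC(pe1))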

S3 generics

The following S3 generic functions are available for an object of class "policy_eval" (a usage sketch follows the list):

get_g_functions()

Extract the fitted g-functions.

get_q_functions()

Extract the fitted Q-functions.

get_policy()

Extract the fitted policy object.

get_policy_functions()

Extract the fitted policy function for a given stage.

get_policy_actions()

Extract the (fitted) policy actions.

plot.policy_eval()

Plot diagnostics.
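
For example, the fitted policy function for stage 1 can be extracted from a learned policy (a sketch reusing pd2 and pl2 from the Examples section; the policy object is only saved when M = 1):

pe <- policy_eval(policy_data = pd2, policy_learn = pl2)
pf1 <- get_policy_functions(pe, stage = 1)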

References

van der Laan, Mark J., and Alexander R. Luedtke. "Targeted learning of the mean outcome under an optimal dynamic treatment rule." Journal of Causal Inference 3.1 (2015): 61-95. doi:10.1515/jci-2013-0022.

Tsiatis, Anastasios A., et al. Dynamic treatment regimes: Statistical methods for precision medicine. Chapman and Hall/CRC, 2019. doi:10.1201/9780429192692.

See Also

lava::IC, lava::estimate.default.

Examples

library("polle")
### Single stage:
d1 <- sim_single_stage(5e2, seed=1)
pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U")
pd1

# defining a static policy (A=1):
pl1 <- policy_def(1)

# evaluating the policy:
pe1 <- policy_eval(policy_data = pd1,
                   policy = pl1,
                   g_models = g_glm(),
                   q_models = q_glm(),
                   name = "A=1 (glm)")

# summarizing the estimated value of the policy:
# (equivalent to summary(pe1)):
pe1
coef(pe1) # value coefficient
sqrt(vcov(pe1)) # value standard error

# getting the g-function and Q-function values:
head(predict(get_g_functions(pe1), pd1))
head(predict(get_q_functions(pe1), pd1))

# getting the fitted influence curve (IC) for the value:
head(IC(pe1))

# evaluating the policy using random forest nuisance models:
set.seed(1)
pe1_rf <- policy_eval(policy_data = pd1,
                      policy = pl1,
                      g_models = g_rf(),
                      q_models = q_rf(),
                      name = "A=1 (rf)")

# merging the two estimates (equivalent to pe1 + pe1_rf):
(est1 <- merge(pe1, pe1_rf))
coef(est1)
head(IC(est1))

### Two stages:
d2 <- sim_two_stage(5e2, seed=1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"),
                                     C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
pd2

# defining a policy learner based on cross-fitted doubly robust Q-learning:
pl2 <- policy_learn(type = "drql",
                    control = control_drql(qv_models = list(q_glm(~C_1),
                                                            q_glm(~C_1+C_2))),
                    full_history = TRUE,
                    L = 2) # number of folds for cross-fitting

# evaluating the policy learner using 2-fold cross fitting:
pe2 <- policy_eval(type = "dr",
                   policy_data = pd2,
                   policy_learn = pl2,
                   q_models = q_glm(),
                   g_models = g_glm(),
                   M = 2, # number of folds for cross-fitting
                   name = "drql")
# summarizing the estimated value of the policy:
pe2

# getting the cross-fitted policy actions:
head(get_policy_actions(pe2))
