policy_learn {polle} | R Documentation
Create Policy Learner
Description
policy_learn() is used to specify a policy learning method (Q-learning, doubly robust Q-learning, policy tree learning, and outcome weighted learning). Evaluating the policy learner returns a policy object.
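A minimal sketch of this two-step workflow (specify the learner, then evaluate it), assuming pd is the policy_data object constructed in the Examples section below; the call pattern mirrors those examples:

# step 1: specify the learner (the default type "ql" is Q-learning)
pl_ql <- policy_learn(type = "ql")
# step 2: evaluate the learner on the data via policy_eval() and extract
# the learned policy object
pe_ql <- policy_eval(policy_data = pd,
                     policy_learn = pl_ql,
                     q_models = q_glm(),
                     g_models = g_glm())
po_ql <- get_policy_object(pe_ql)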
Usage
policy_learn(
  type = "ql",
  control = list(),
  alpha = 0,
  full_history = FALSE,
  L = 1,
  cross_fit_g_models = TRUE,
  save_cross_fit_models = FALSE,
  future_args = list(future.seed = TRUE),
  name = type
)
## S3 method for class 'policy_learn'
print(x, ...)
## S3 method for class 'policy_object'
print(x, ...)
Arguments
type
Type of policy learner method: "ql" (Q-learning), "drql" (doubly robust Q-learning), "ptl" (policy tree learning), or "owl" (outcome weighted learning).
control
List of control arguments. Values (and default values) are set using the control function matching the chosen type, e.g. control_drql() for type = "drql".
alpha
Probability threshold for determining realistic actions.
full_history
If TRUE, the full history is used to fit each policy function; if FALSE, only the single-stage (Markov-type) history is used.
L
Number of folds for cross-fitting nuisance models.
cross_fit_g_models
If TRUE, the g-models are cross-fitted along with the Q-models when L > 1.
save_cross_fit_models
If TRUE, the cross-fitted models are saved.
future_args
Arguments passed to future.apply::future_apply().
name
Character string.
x
Object of class "policy_object" or "policy_learn".
...
Additional arguments passed to print.
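For illustration, a sketch combining several of the arguments above; the threshold, fold count, and name are arbitrary values chosen only to show how the arguments fit together:

# illustrative settings: 5% realistic-action threshold and 2-fold
# cross-fitting of the nuisance models (values are arbitrary)
pl_cf <- policy_learn(type = "ql",
                      alpha = 0.05,
                      L = 2,
                      cross_fit_g_models = TRUE,
                      save_cross_fit_models = FALSE,
                      future_args = list(future.seed = TRUE),
                      name = "ql_cf")
pl_cf   # print.policy_learn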
Value
Function of inherited class "policy_learn". Evaluating the function on a policy_data object returns an object of class "policy_object". A policy object is a list containing all or some of the following elements:
q_functions
Fitted Q-functions. Object of class "nuisance_functions".
g_functions
Fitted g-functions. Object of class "nuisance_functions".
action_set
Sorted character vector describing the action set, i.e., the possible actions at each stage.
alpha
Numeric. Probability threshold to determine realistic actions.
K
Integer. Maximal number of stages.
qv_functions
(Only if type = "drql") Fitted V-restricted Q-functions, i.e., a fitted QV-model for each stage and action.
ptl_objects
(Only if type = "ptl") Fitted policy trees, one for each stage.
ptl_designs
(Only if type = "ptl") Specification of the design matrix used to fit the policy tree at each stage.
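As a sketch of how these elements can be inspected, assuming po is a fitted policy_object (e.g. as obtained in the Examples section below):

# assuming po is a policy_object, e.g. from the Examples below
names(po)        # which of the elements listed above are present
po$action_set    # sorted character vector of possible actions
po$K             # maximal number of stages
po$alpha         # probability threshold for realistic actions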
S3 generics
The following S3 generic functions are available for an object of class "policy_object":
get_g_functions()
Extract the fitted g-functions.
get_q_functions()
Extract the fitted Q-functions.
get_policy()
Extract the fitted policy object.
get_policy_functions()
Extract the fitted policy function for a given stage.
get_policy_actions()
Extract the (fitted) policy actions.
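A brief sketch of these accessors, again assuming po is a fitted policy_object and pd the associated policy_data object; the stage argument shown for get_policy_functions() is an assumption based on its description above:

get_g_functions(po)                          # fitted g-functions
get_q_functions(po)                          # fitted Q-functions
head(get_policy(po)(pd))                     # fitted policy applied to the data
pf1 <- get_policy_functions(po, stage = 1)   # policy function for stage 1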
References
Doubly Robust Q-learning (type = "drql"): Luedtke, Alexander R., and Mark J. van der Laan. "Super-learning of an optimal dynamic treatment rule." The International Journal of Biostatistics 12.1 (2016): 305-332. doi:10.1515/ijb-2015-0052.
Policy Tree Learning (type = "ptl"): Zhou, Zhengyuan, Susan Athey, and Stefan Wager. "Offline multi-action policy learning: Generalization and optimization." Operations Research (2022). doi:10.1287/opre.2022.2271.
(Augmented) Outcome Weighted Learning (type = "owl"): Liu, Ying, et al. "Augmented outcome‐weighted learning for estimating optimal dynamic treatment regimens." Statistics in Medicine 37.26 (2018): 3776-3788. doi:10.1002/sim.7844.
See Also
policy_eval(), policy_data().
Examples
library("polle")
### Two stages:
d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
### V-restricted (Doubly Robust) Q-learning
# specifying the learner:
pl <- policy_learn(
  type = "drql",
  control = control_drql(qv_models = list(q_glm(formula = ~ C_1 + BB),
                                          q_glm(formula = ~ L_1 + BB))),
  full_history = TRUE
)
# evaluating the learned policy:
pe <- policy_eval(policy_data = pd,
                  policy_learn = pl,
                  q_models = q_glm(),
                  g_models = g_glm())
pe
# getting the policy object:
po <- get_policy_object(pe)
# inspecting the fitted QV-model for each action stratum at stage 1:
po$qv_functions$stage_1
head(get_policy(pe)(pd))
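
# Additional sketch (not part of the original example): the learner can also
# be applied directly to the policy data, as stated under Value; the
# q_models/g_models argument names are assumed to mirror policy_eval().
po2 <- pl(policy_data = pd,
          q_models = q_glm(),
          g_models = g_glm())
po2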