ContextualLinTSPolicy {contextual}    R Documentation

Policy: Linear Thompson Sampling with unique linear models


Description

ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal (2011). It is a contextual multi-armed bandit policy that assumes the relationship between contexts and rewards is linear. See the reference for details.


Usage

policy <- ContextualLinTSPolicy$new(v = 0.2)



Arguments

v

double, a positive real value (R+); hyperparameter that scales the variance of the posterior Gaussian distribution.



Methods

new(v = 0.2)

instantiates a new ContextualLinTSPolicy instance. Arguments are defined in the Arguments section above.


set_parameters(context_params)

initializes the policy parameters, using context_params$k (number of arms) and context_params$d (number of context features).
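The initialization step can be sketched in base R as follows. This is a minimal illustration, assuming the per-arm statistics usual for linear Thompson Sampling (a d x d matrix A starting at the identity and a length-d vector b starting at zero). The helper name init_arm_params and the list layout are hypothetical and are not taken from the package internals.

```r
# Hypothetical helper (not part of the contextual package):
# build one independent linear model per arm.
init_arm_params <- function(k, d) {
  lapply(seq_len(k), function(arm) {
    list(
      A = diag(1, d, d),  # d x d matrix, starts as the identity
      b = rep(0, d)       # reward-weighted feature sum, starts at zero
    )
  })
}

context_params <- list(k = 4, d = 3)  # 4 arms, 3 context features
theta <- init_arm_params(context_params$k, context_params$d)

length(theta)       # one entry per arm
dim(theta[[1]]$A)   # d x d
```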


get_action(t, context)

selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument is a list containing context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions d x k.
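The arm selection step might look roughly like the sketch below. It assumes each arm keeps its own A and b (a hypothetical layout, not the package's internal fields) and that, for every arm, a parameter vector is drawn from a Gaussian with mean solve(A) %*% b and covariance v^2 * solve(A). The arm whose sampled parameters give the highest predicted reward is chosen. The helper name choose_arm is an assumption made for illustration.

```r
# Hypothetical helper: one Thompson Sampling draw per arm, then argmax.
# theta: list of per-arm lists with A (d x d) and b (length d).
# X:     d x k feature matrix, one column per arm.
choose_arm <- function(theta, X, v) {
  d <- nrow(X)
  scores <- vapply(seq_along(theta), function(arm) {
    A_inv  <- solve(theta[[arm]]$A)
    mu_hat <- A_inv %*% theta[[arm]]$b        # posterior mean
    # Draw mu_til ~ N(mu_hat, v^2 * A_inv) using a Cholesky factor.
    L      <- t(chol(A_inv))
    mu_til <- mu_hat + v * (L %*% rnorm(d))
    as.numeric(crossprod(X[, arm], mu_til))   # sampled expected reward
  }, numeric(1))
  which.max(scores)
}

set.seed(1)
k <- 4; d <- 3
theta <- lapply(seq_len(k), function(i) list(A = diag(d), b = rep(0, d)))
X <- matrix(runif(d * k), nrow = d, ncol = k)
action <- list(choice = choose_arm(theta, X, v = 0.2))
action$choice
```

A larger v widens each draw around the posterior mean, so the policy explores more; a smaller v keeps the draws close to the current estimates.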

set_reward(t, context, action, reward)

updates parameter list theta in accordance with the current reward$reward, action$choice and the feature matrix context$X with dimensions d x k. Returns the updated theta.
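A possible form of this update, again as a sketch under the same assumed per-arm A and b layout: only the statistics of the arm that was played are changed, by a rank-one update with that arm's feature column. The helper name update_arm is hypothetical.

```r
# Hypothetical helper: update the statistics of the arm that was played.
update_arm <- function(theta, X, choice, reward) {
  x <- X[, choice]                                   # features of the played arm
  theta[[choice]]$A <- theta[[choice]]$A + outer(x, x)
  theta[[choice]]$b <- theta[[choice]]$b + reward * x
  theta                                              # return the updated list
}

d <- 3; k <- 4
theta <- lapply(seq_len(k), function(i) list(A = diag(d), b = rep(0, d)))
X <- matrix(seq_len(d * k) / 10, nrow = d, ncol = k)  # d x k feature matrix

theta <- update_arm(theta, X, choice = 2, reward = 1)
theta[[2]]$b   # equals X[, 2] after a single reward of 1
```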


References

Shipra Agrawal and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Advances in Neural Information Processing Systems 24. 2011.

See Also

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy


Examples

## Not run: 

horizon       <- 100L
simulations   <- 100L

bandit        <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

agents        <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
                      Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))

simulation     <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)

history        <- simulation$run()

plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")

## End(Not run)

