ab_test {abtest}    R Documentation

Bayesian A/B Test

Description

Function for conducting a Bayesian A/B test (i.e., test between two proportions).

Usage

ab_test(
  data = NULL,
  prior_par = list(mu_psi = 0, sigma_psi = 1, mu_beta = 0, sigma_beta = 1),
  prior_prob = NULL,
  nsamples = 10000,
  is_df = 5,
  posterior = FALSE,
  y = NULL,
  n = NULL
)

Arguments

data

list or data frame with the data. This list (data frame) needs to contain the following elements: y1 (number of "successes" in the control condition), n1 (number of trials in the control condition), y2 (number of "successes" in the experimental condition), n2 (number of trials in the experimental condition). Each of these elements needs to be an integer. Alternatively, the user can provide, for each element, a vector with the cumulative sequence of "successes"/trials. This allows the user to produce a sequential plot of the posterior probabilities for each hypothesis by passing the result object of class "ab" to the plot_sequential function. Sequential data can also be provided in the form of a data frame or matrix that has the columns "outcome" (containing only 0 and 1 to indicate the binary outcome) and "group" (containing only 1 and 2 to indicate the group membership). Note that the data can also be provided via the arguments y and n instead (not possible for sequential data).

prior_par

list with prior parameters. This list needs to contain the following elements: mu_psi (prior mean for the normal prior on the test-relevant log odds ratio), sigma_psi (prior standard deviation for the normal prior on the test-relevant log odds ratio), mu_beta (prior mean for the normal prior on the grand mean of the log odds), sigma_beta (prior standard deviation for the normal prior on the grand mean of the log odds). Each of the elements needs to be a real number (the standard deviations need to be positive). The defaults are standard normal priors for both the log odds ratio parameter and the grand mean of the log odds parameter.

prior_prob

named vector with prior probabilities for the four hypotheses "H1", "H+", "H-", and "H0". "H1" states that the "success" probability differs between the control and the experimental condition but does not specify which one is higher. "H+" states that the "success" probability in the experimental condition is higher than in the control condition, "H-" states that the "success" probability in the experimental condition is lower than in the control condition. "H0" states that the "success" probability is identical (i.e., there is no effect). The one-sided hypotheses "H+" and "H-" are obtained by truncating the normal prior on the log odds ratio so that it assigns prior mass only to the allowed log odds ratio values (e.g., for "H+" a normal prior that is truncated from below at 0). If NULL (default), the prior probabilities are set to c(0, 1/4, 1/4, 1/2). That is, the default assigns prior probability .5 to the hypothesis that there is no effect (i.e., "H0"). The remaining prior probability (i.e., also .5) is split evenly across the hypothesis that there is a positive effect (i.e., "H+") and the hypothesis that there is a negative effect (i.e., "H-").

nsamples

determines the number of importance samples for obtaining the log marginal likelihood for "H+" and "H-" and the number of posterior samples in case posterior = TRUE. The default is 10000.

is_df

degrees of freedom of the multivariate t importance sampling proposal density. The default is 5.

posterior

Boolean which indicates whether posterior samples should be returned. The default is FALSE.

y

integer vector of length 2 containing the number of "successes" in the control and experimental condition.

n

integer vector of length 2 containing the number of trials in the control and experimental condition.
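
The two sequential input formats described for the data argument can be related to each other with a few lines of base R. The following is only a sketch: the trial-by-trial data frame is made up for illustration, and whether the package performs the conversion in exactly this way internally is an assumption.

```r
# Hypothetical trial-by-trial data: one row per observation, in order of arrival.
trials <- data.frame(
  outcome = c(0, 1, 1, 0, 1, 0, 1, 1),  # binary outcome (0 = failure, 1 = success)
  group   = c(1, 2, 1, 2, 2, 1, 1, 2)   # 1 = control, 2 = experimental
)

in_g1 <- trials$group == 1
in_g2 <- trials$group == 2

# Running totals after each observation (assumed layout of the cumulative vectors).
seq_data <- list(
  y1 = cumsum(trials$outcome * in_g1),
  n1 = cumsum(in_g1),
  y2 = cumsum(trials$outcome * in_g2),
  n2 = cumsum(in_g2)
)

# The last entry of each vector gives the aggregate counts
# that a non-sequential analysis would use.
final_data <- lapply(seq_data, function(v) v[length(v)])
str(final_data)
```

The final counts could equally be supplied through the y and n arguments as c(y1, y2) and c(n1, n2).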

Details

The implemented Bayesian A/B test is based on the following model by Kass and Vaidyanathan (1992, section 3):

log(p1/(1 - p1)) = \beta - \psi/2

log(p2/(1 - p2)) = \beta + \psi/2

y1 ~ Binomial(n1, p1)

y2 ~ Binomial(n2, p2).
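
As an illustration of this parameterization, the base R sketch below maps a pair (\beta, \psi) to the two "success" probabilities and evaluates the binomial log-likelihood. The parameter values are arbitrary, the counts are those used in the Examples section, and the helper loglik is a hypothetical name that is not part of the package.

```r
beta <- -0.2  # arbitrary grand mean of the log odds
psi  <-  0.5  # arbitrary log odds ratio

# Inverse logit of the two linear predictors
p1 <- plogis(beta - psi / 2)  # control condition
p2 <- plogis(beta + psi / 2)  # experimental condition

# The log odds ratio recovered from the probabilities equals psi
log_or <- qlogis(p2) - qlogis(p1)

# Hypothetical helper: joint binomial log-likelihood of the model
loglik <- function(beta, psi, y1, n1, y2, n2) {
  dbinom(y1, n1, plogis(beta - psi / 2), log = TRUE) +
    dbinom(y2, n2, plogis(beta + psi / 2), log = TRUE)
}

ll <- loglik(beta, psi, y1 = 10, n1 = 28, y2 = 14, n2 = 26)
c(p1 = p1, p2 = p2, log_or = log_or, loglik = ll)
```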

"H0" states that \psi = 0, "H1" states that \psi != 0, "H+" states that \psi > 0, and "H-" states that \psi < 0. Normal priors are assigned to the two parameters \psi (i.e., the test-relevant log odds ratio) and \beta (i.e., the grand mean of the log odds which is a nuisance parameter). Log marginal likelihoods for "H0" and "H1" are obtained via Laplace approximations (see Kass & Vaidyanathan, 1992) which work well even for very small sample sizes. For the one-sided hypotheses "H+" and "H-" the log marginal likelihoods are obtained based on importance sampling which uses as a proposal a multivariate t distribution with location and scale matrix obtained via a Laplace approximation to the (log-transformed) posterior. If posterior = TRUE, posterior samples are obtained using importance sampling.
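
The computations described above can be sketched in base R. This is not the package's implementation: it is a minimal illustration that assumes the default prior parameters, uses the counts from the Examples section, interprets the "log-transformed" posterior for "H+" as working on the scale xi = log(\psi), and relies on numerical derivatives from optim. All helper names (log_lik_beta, laplace, log_f_plus) are hypothetical.

```r
set.seed(123)
y1 <- 10; n1 <- 28; y2 <- 14; n2 <- 26  # counts from the Examples section
pr <- list(mu_psi = 0, sigma_psi = 1, mu_beta = 0, sigma_beta = 1)  # default prior_par

# Log-likelihood plus log prior on beta (shared by all hypotheses)
log_lik_beta <- function(beta, psi) {
  dbinom(y1, n1, plogis(beta - psi / 2), log = TRUE) +
    dbinom(y2, n2, plogis(beta + psi / 2), log = TRUE) +
    dnorm(beta, pr$mu_beta, pr$sigma_beta, log = TRUE)
}

# Laplace approximation: log f(mode) + (d / 2) * log(2 * pi) - log(det(H)) / 2,
# where H is the Hessian of -log f at the mode
laplace <- function(log_f, start) {
  fit <- optim(start, function(p) -log_f(p), method = "BFGS", hessian = TRUE)
  d <- length(start)
  log_det <- as.numeric(determinant(fit$hessian, logarithm = TRUE)$modulus)
  list(logml = -fit$value + d / 2 * log(2 * pi) - log_det / 2,
       mode = fit$par, hessian = fit$hessian)
}

h0 <- laplace(function(p) log_lik_beta(p[1], 0), 0)
h1 <- laplace(function(p) log_lik_beta(p[1], p[2]) +
                dnorm(p[2], pr$mu_psi, pr$sigma_psi, log = TRUE), c(0, 0))
log_bf10 <- h1$logml - h0$logml

# H+: psi > 0, handled on the scale xi = log(psi); "+ xi" is the log Jacobian
log_f_plus <- function(beta, xi) {
  psi <- exp(xi)
  log_lik_beta(beta, psi) +
    dnorm(psi, pr$mu_psi, pr$sigma_psi, log = TRUE) -
    pnorm(0, pr$mu_psi, pr$sigma_psi, lower.tail = FALSE, log.p = TRUE) + xi
}
lap <- laplace(function(p) log_f_plus(p[1], p[2]), c(0, 0))
scale_mat <- solve(lap$hessian)  # scale matrix of the t proposal

# Draws from a multivariate t proposal centred at the mode
nsamples <- 10000; is_df <- 5; d <- 2
z <- matrix(rnorm(nsamples * d), ncol = d) %*% chol(scale_mat)
dev <- z * sqrt(is_df / rchisq(nsamples, df = is_df))
x <- sweep(dev, 2, lap$mode, "+")

# Log density of the proposal at the draws
q <- rowSums((dev %*% lap$hessian) * dev)
log_prop <- lgamma((is_df + d) / 2) - lgamma(is_df / 2) -
  d / 2 * log(is_df * pi) +
  as.numeric(determinant(lap$hessian, logarithm = TRUE)$modulus) / 2 -
  (is_df + d) / 2 * log1p(q / is_df)

# Importance sampling estimate of the log marginal likelihood (log-sum-exp)
lw <- log_f_plus(x[, 1], x[, 2]) - log_prop
logml_plus <- max(lw) + log(mean(exp(lw - max(lw))))

c(logml_H0 = h0$logml, logml_H1 = h1$logml, logml_Hplus = logml_plus)
```

The heavy tails of the t proposal (controlled by is_df) are what keep the importance weights from becoming extreme. The values produced by this sketch need not match those reported by ab_test exactly.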

Value

returns an object of class "ab" with components:

Author(s)

Quentin F. Gronau

References

Kass, R. E., & Vaidyanathan, S. K. (1992). Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions. Journal of the Royal Statistical Society, Series B, 54, 129-144. doi: 10.1111/j.2517-6161.1992.tb01868.x

Gronau, Q. F., Raj K. N., A., & Wagenmakers, E.-J. (2021). Informed Bayesian Inference for the A/B Test. Journal of Statistical Software, 100. doi: 10.18637/jss.v100.i17

See Also

elicit_prior allows the user to elicit a prior based on providing quantiles for either the log odds ratio, the odds ratio, the relative risk, or the absolute risk. The resulting prior is always translated to the corresponding normal prior on the log odds ratio. The plot_prior function allows the user to visualize the prior distribution. The simulate_priors function produces samples from the prior distribution. The prior and posterior probabilities of the hypotheses can be visualized using the prob_wheel function. Parameter posteriors can be visualized using the plot_posterior function. The plot_sequential function allows the user to sequentially plot the posterior probabilities of the hypotheses (only possible if the data object contains vectors with the cumulative "successes"/trials).

Examples

# synthetic data
data <- list(y1 = 10, n1 = 28, y2 = 14, n2 = 26)

# Bayesian A/B test with default settings
ab <- ab_test(data = data)
print(ab)

# different prior parameter settings
prior_par <- list(mu_psi = 0.2, sigma_psi = 0.8,
                  mu_beta = 0, sigma_beta = 0.7)
ab2 <- ab_test(data = data, prior_par = prior_par)
print(ab2)

# different prior probabilities
prior_prob <- c(.1, .3, .2, .4)
names(prior_prob) <- c("H1", "H+", "H-", "H0")
ab3 <- ab_test(data = data, prior_prob = prior_prob)
print(ab3)

# also possible to obtain posterior samples
ab4 <- ab_test(data = data, posterior = TRUE)

# plot parameter posterior
plot_posterior(x = ab4, what = "logor")

[Package abtest version 1.0.1 Index]