bayesboot {bayesboot} | R Documentation |
The Bayesian bootstrap
Description
Performs a Bayesian bootstrap and returns a data.frame with a sample
of size R representing the posterior distribution of the (possibly
multivariate) summary statistic.
Usage
bayesboot(data, statistic, R = 4000, R2 = 4000, use.weights = FALSE,
.progress = "none", .parallel = FALSE, ...)
Arguments
data |
Either a vector or a list, or a matrix or a data.frame with one
datapoint per row. The format of |
statistic |
A function implementing the summary statistic of interest
where the first argument should take the data. If |
R |
The size of the posterior sample from the Bayesian bootstrap. |
R2 |
When |
use.weights |
When |
.progress |
The type of progress bar ("none", "text", "tk", and "win").
See the |
.parallel |
If |
... |
Other arguments passed on to |
Details
The summary statistic is a function of the data that represents a feature of
interest, where a typical statistic is the mean. In bayesboot it is
most efficient to define the statistic as a function taking the data as the
first argument and a vector of weights as the second argument. An example of
such a function is weighted.mean. Indicate that you are using a
statistic defined in this way by setting use.weights = TRUE.
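As a rough illustration of the weighted approach described above, the following base-R sketch draws Dirichlet(1, ..., 1) weights (by normalizing exponential draws) and passes them to weighted.mean. It does not call bayesboot and is not necessarily how the package works internally; the names x, n_draws and post_mean are made up for this illustration.

```r
# Sketch only: one weighted statistic per posterior draw, in base R.
set.seed(1)
x <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
n_draws <- 1000  # plays the role of R in bayesboot (hypothetical name)

post_mean <- replicate(n_draws, {
  w <- rexp(length(x))   # independent Exp(1) draws ...
  w <- w / sum(w)        # ... normalized, giving Dirichlet(1, ..., 1) weights
  weighted.mean(x, w)    # statistic taking data first, weights second
})

# Each element of post_mean is one draw from the approximate posterior
# of the mean.
quantile(post_mean, c(0.025, 0.5, 0.975))
```

Because every draw only needs one call to the weighted statistic on the original data, this form avoids building a resampled dataset for each draw.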
It is also possible to define the statistic as a function only taking data
(and no weights) by having use.weights = FALSE (the default). For each of
the R Bayesian bootstrap draws, this passes a resampled version of the data
of size R2 to statistic. This will be much slower than using
use.weights = TRUE but will work with a larger range of statistics
(the median, for example).
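The resampling approach can be sketched in base R as follows. This is an assumption-laden illustration rather than the package's code: it assumes the resampled dataset is drawn with replacement using Dirichlet(1, ..., 1) weights as sampling probabilities, and the helper name bb_resample_sketch and its arguments n_draws and n_resample (standing in for R and R2) are hypothetical.

```r
# Sketch only: resampling-based Bayesian bootstrap for an unweighted statistic.
bb_resample_sketch <- function(x, statistic, n_draws = 1000, n_resample = 1000) {
  vapply(seq_len(n_draws), function(i) {
    w <- rexp(length(x))
    w <- w / sum(w)                      # Dirichlet(1, ..., 1) weights
    # Index-based sampling avoids the sample() surprise for length-one vectors.
    idx <- sample.int(length(x), size = n_resample, replace = TRUE, prob = w)
    statistic(x[idx])                    # statistic sees only resampled data
  }, numeric(1))
}

set.seed(1)
heights_cm <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
post_median <- bb_resample_sketch(heights_cm, median, n_draws = 500, n_resample = 500)
summary(post_median)
```

Each draw here costs a resample of size n_resample plus a call to the statistic, which is why this form is slower than passing weights directly.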
For more information regarding this implementation of the Bayesian bootstrap see the blog post Easy Bayesian Bootstrap in R. For more information about the model behind the Bayesian bootstrap see the blog post The Non-parametric Bootstrap as a Bayesian Model and, of course, the original Bayesian bootstrap paper by Rubin (1981).
Value
A data.frame with R rows, each row being a draw from
the posterior distribution of the Bayesian bootstrap. The number of columns
is decided by the length of the output from statistic. If statistic
does not return a vector or data frame with named values
then the columns will be given the names V1, V2, V3, etc.
While the output is a data.frame it has subclass bayesboot
which enables specialized summary and plot functions for the result
of a bayesboot call.
Note
While R and R2 are set to 4000 by default, that should not be taken to indicate that a sample of size 4000 is sufficient or recommended.
When using use.weights = FALSE it is important to use a summary statistic that does not depend on the sample size. That is, doubling the size of a dataset by cloning data should result in the same statistic as when using the original dataset. An example of a statistic that depends on the sample size is the sample standard deviation (that is, sd), and when using bayesboot it would make more sense to use the population standard deviation (as in the example below).
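The cloning criterion in the note can be checked directly in base R. The sketch below is illustrative only; the names pop_sd_sketch, v and v_cloned are made up here, and the function mirrors the population standard deviation used in the Examples section.

```r
# Sketch only: does a statistic change when the data are cloned?
pop_sd_sketch <- function(x) {
  n <- length(x)
  sd(x) * sqrt((n - 1) / n)   # rescale the n - 1 denominator to n
}

v <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
v_cloned <- c(v, v)           # same data, doubled by cloning

# The sample standard deviation shrinks when the data are cloned ...
c(original = sd(v), cloned = sd(v_cloned))

# ... while the population standard deviation stays the same.
c(original = pop_sd_sketch(v), cloned = pop_sd_sketch(v_cloned))
```

A statistic that passes this check gives results that do not depend on the resample size chosen through R2.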
References
Miller, R. G. (1974). The jackknife - a review. Biometrika, 61(1), 1–15.
Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130–134.
Examples
### A Bayesian bootstrap analysis of a mean ###
# Heights of the last ten American presidents in cm (Kennedy to Obama).
heights <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
b1 <- bayesboot(heights, mean)
# But it's more efficient to use a weighted statistic.
b2 <- bayesboot(heights, weighted.mean, use.weights = TRUE)
# The result of bayesboot can be plotted and summarized
plot(b2)
summary(b2)
# It can also be easily post processed.
# Here the probability that the mean is > 182 cm.
mean( b2[,1] > 182)
### A Bayesian bootstrap analysis of a SD ###
# When use.weights = FALSE it is important that the summary statistic
# does not change as a function of sample size. The sample standard
# deviation does change with sample size, so here we have to implement a
# function calculating the population standard deviation.
pop.sd <- function(x) {
  n <- length(x)
  sd(x) * sqrt( (n - 1) / n)
}
b3 <- bayesboot(heights, pop.sd)
summary(b3)
### A Bayesian bootstrap analysis of a correlation coefficient ###
# Data comparing two methods of measuring blood flow.
# From Table 1 in Miller (1974) and used in an example
# by Rubin (1981, p. 132).
blood.flow <- data.frame(
  dye = c(1.15, 1.7, 1.42, 1.38, 2.80, 4.7, 4.8, 1.41, 3.9),
  efp = c(1.38, 1.72, 1.59, 1.47, 1.66, 3.45, 3.87, 1.31, 3.75))
# Using the weighted correlation (corr) from the boot package.
library(boot)
b4 <- bayesboot(blood.flow, corr, R = 1000, use.weights = TRUE)
hist(b4[,1])
### A Bayesian bootstrap analysis of lm coefficients ###
# A custom function that returns the coefficients of
# a weighted linear regression on the blood.flow data
lm.coefs <- function(d, w) {
  coef( lm(efp ~ dye, data = d, weights = w) )
}
b5 <- bayesboot(blood.flow, lm.coefs, R = 1000, use.weights = TRUE)
# Plotting the marginal posteriors
plot(b5)
# Plotting a scatter of regression lines from the posterior
plot(blood.flow)
for(i in sample(nrow(b5), size = 20)) {
  abline(coef = b5[i, ], col = "grey")
}