R: Combine parameter estimates via bootstrap

bootComb {bootComb}

R Documentation

Combine parameter estimates via bootstrap

Description

This package propagates uncertainty from several estimates when they are combined via a function. It uses the parametric bootstrap to simulate values from the distribution of each estimate and so builds up an empirical distribution of the combined parameter. Finally, a confidence interval with the desired coverage is derived for the combined parameter, either by the percentile method or as the highest density interval.
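The procedure described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the beta shape parameters are made-up values, and `shortestInterval` is a hypothetical helper that approximates the highest density interval by the shortest window containing the requested share of the draws.

```r
# Illustrative sketch only; assumed inputs, not taken from the package.
set.seed(1)
N <- 1e5

# Simulate from the (assumed) distribution of each of two probability estimates.
p1 <- rbeta(N, shape1 = 50, shape2 = 50)
p2 <- rbeta(N, shape1 = 40, shape2 = 10)

# Combine the draws element-wise: empirical distribution of the product.
combined <- p1 * p2

# Percentile method: take the 2.5% and 97.5% empirical quantiles.
ciQuantile <- quantile(combined, probs = c(0.025, 0.975), names = FALSE)

# Hypothetical helper: shortest interval containing `coverage` of the draws,
# used here as a simple approximation to the highest density interval.
shortestInterval <- function(x, coverage = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(coverage * n)            # number of draws inside each window
  starts <- seq_len(n - k + 1)          # all possible window start indices
  widths <- x[starts + k - 1] - x[starts]
  i <- which.min(widths)                # narrowest window wins
  c(x[i], x[i + k - 1])
}
ciHdi <- shortestInterval(combined)

print(ciQuantile)
print(ciHdi)
```

For a skewed empirical distribution the two intervals differ, and the shortest interval is never wider than the percentile interval at the same coverage.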

Usage

bootComb(
  distList,
  combFun,
  N = 1e+06,
  distributions = NULL,
  qLowVect = NULL,
  qUppVect = NULL,
  alphaVect = 0.05,
  Sigma = NULL,
  method = "quantile",
  coverage = 0.95,
  doPlot = FALSE,
  legPos = "topright",
  returnBootVals = FALSE,
  validRange = NULL,
  seed = NULL
)

Arguments

`distList`	If `Sigma` is set to NULL, this is a list where each element is a sampling function for a probability distribution (e.g. rnorm, rbeta, ...). If `Sigma` is specified, this needs to be a list of quantile functions, one for the distribution of each parameter.
`combFun`	The function that combines the different estimates into a new parameter. It needs to take a single list as its input argument, with one element for each estimate, so this list has the same length as `distList`.
`N`	The number of bootstrap samples to take. Defaults to 1e6.
`distributions`	Alternatively to specifying `distList`, the parameters `distributions`, `qLowVect`, `qUppVect` and (optionally) `alphaVect` can be specified. The first 3 of these need to be either all specified as vectors of the same length or all set to NULL. The `distributions` parameter needs to be a vector specifying the names of the distributions for each parameter (one of "beta", "exponential", "gamma", "normal", "Poisson" or "NegativeBinomial").
`qLowVect`	Alternatively to specifying `distList`, the parameters `distributions`, `qLowVect`, `qUppVect` and (optionally) `alphaVect` can be specified. The first 3 of these need to be either all specified as vectors of the same length or all set to NULL. The `qLowVect` parameter needs to be a vector specifying the lower confidence interval limits for each parameter.
`qUppVect`	Alternatively to specifying `distList`, the parameters `distributions`, `qLowVect`, `qUppVect` and (optionally) `alphaVect` can be specified. The first 3 of these need to be either all specified as vectors of the same length or all set to NULL. The `qUppVect` parameter needs to be a vector specifying the upper confidence interval limits for each parameter.
`alphaVect`	Alternatively to specifying `distList`, the parameters `distributions`, `qLowVect`, `qUppVect` and (optionally) `alphaVect` can be specified. The first 3 of these need to be either all specified as vectors of the same length or all set to NULL. The `alphaVect` parameter needs to be a vector specifying the alpha level (i.e. 1 minus the coverage) of each confidence interval. It can be given as a single number if the level is the same for all parameters. Defaults to 0.05.
`Sigma`	Set to NULL if parameters are assumed to be independent (the default). If specified, this needs to be a valid covariance matrix for a multivariate normal distribution with variances equal to 1 for all variables (in other words, this really is a correlation matrix).
`method`	The method used to derive a confidence interval from the empirical distribution of the combined parameter. Needs to be one of 'quantile' (default; uses the percentile method to derive the confidence interval) or 'hdi' (computes the highest density interval).
`coverage`	The desired coverage of the resulting confidence interval. Defaults to 0.95.
`doPlot`	Logical; indicates whether a graph should be produced showing the input distributions and the resulting empirical distribution of the combined estimate together with the reported confidence interval. Defaults to FALSE.
`legPos`	Legend position (only used if doPlot==TRUE); either NULL (no legend) or one of "top", "topleft", "topright", "bottom", "bottomleft", "bottomright", "left", "right", "center". Defaults to "topright".
`returnBootVals`	Logical; if TRUE then the parameter values computed from the bootstrapped input parameter values will be returned; values for the individual parameters will be reported as a second list element; defaults to FALSE.
`validRange`	Optional; if not NULL, a vector of length 2 giving the range within which the values obtained from the bootstrapped input parameters must lie; values outside this range will be discarded. Behaviour that results in the need for this option arises when parameters are not independent. Use with caution.
`seed`	If desired, a random seed can be specified so that the same results can be reproduced. Defaults to NULL.
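The `Sigma` argument (a correlation matrix for a multivariate normal) together with the requirement that `distList` then holds quantile functions is consistent with a Gaussian copula construction. The base-R sketch below shows that construction under this assumption; it is not taken from the package source, and the beta shape parameters and the name `qList` are hypothetical.

```r
# Illustrative Gaussian copula sketch; assumed inputs, not the package's code.
set.seed(2)
N <- 1e5

# Correlation matrix with unit variances, as required for `Sigma`.
Sigma <- matrix(c(1, 0.5, 0.5, 1), ncol = 2, byrow = TRUE)

# One quantile function per parameter (hypothetical beta shapes).
qList <- list(
  function(p) qbeta(p, shape1 = 50, shape2 = 50),
  function(p) qbeta(p, shape1 = 40, shape2 = 10)
)

# Correlated standard normals: if Z has independent N(0,1) columns and
# R = chol(Sigma) (so that t(R) %*% R == Sigma), then Z %*% R has covariance Sigma.
z <- matrix(rnorm(N * 2), ncol = 2) %*% chol(Sigma)

# Map each margin to Uniform(0,1), then through the matching quantile function.
u <- pnorm(z)
draws <- lapply(seq_along(qList), function(i) qList[[i]](u[, i]))

# The dependence carries over to the transformed draws.
print(cor(draws[[1]], draws[[2]]))
```

The marginal distributions stay exactly as given by the quantile functions; only the dependence between the draws changes. The correlation of the transformed draws is generally close to, but not identical to, the off-diagonal entry of `Sigma`.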

Value

A list with 3 elements:

`conf.int`	A vector of length 2 giving the lower and upper limits of the computed confidence interval.
`bootstrapValues`	A vector containing the computed / combined parameter values from the bootstrap samples of the input parameters. (Only non-NULL if `returnBootVals` is set to TRUE.)
`bootstrapValuesInput`	A list where each element is the vector of the bootstrapped values for the corresponding input parameter. This can be useful to check the dependence structure that was specified. (Only non-NULL if `returnBootVals` is set to TRUE.)
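As a usage sketch for the returned list, the call below sets `returnBootVals=TRUE` and inspects each element. It assumes the bootComb package is installed and reuses the inputs of the dependent-parameter example on this page; the object name `res` is arbitrary.

```r
library(bootComb)

# Product of two correlated probability parameters, known only via their 95% CIs.
combFunEx <- function(pars) { pars[[1]] * pars[[2]] }
res <- bootComb(distributions = c("beta", "beta"),
                qLowVect = c(0.4, 0.7),
                qUppVect = c(0.6, 0.9),
                Sigma = matrix(byrow = TRUE, ncol = 2, c(1, 0.5, 0.5, 1)),
                combFun = combFunEx,
                N = 1e5,
                returnBootVals = TRUE,
                seed = 352)

# Lower and upper limits of the confidence interval for the combined parameter.
print(res$conf.int)

# Bootstrapped values of the combined parameter.
print(summary(res$bootstrapValues))

# Check the dependence structure between the bootstrapped input parameters.
print(cor(res$bootstrapValuesInput[[1]], res$bootstrapValuesInput[[2]]))
```

Checking the correlation of the bootstrapped inputs is a quick way to confirm that the matrix passed as `Sigma` produced the intended dependence.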

Examples

## Example 1 - product of 2 probability parameters for which only the 95% CIs are reported
dist1<-getBetaFromCI(qLow=0.4,qUpp=0.6,alpha=0.05)
dist2<-getBetaFromCI(qLow=0.7,qUpp=0.9,alpha=0.05)
distListEx<-list(dist1$r,dist2$r)
combFunEx<-function(pars){pars[[1]]*pars[[2]]}
bootComb(distList=distListEx,
         combFun=combFunEx,
         doPlot=TRUE,
         method="hdi",
         N=1e5, # reduced from N=1e6 so that it runs quicker; larger values => more accurate
         seed=352)

# Alternatively, the same example can be run in just 2 lines of code:
combFunEx<-function(pars){pars[[1]]*pars[[2]]}
bootComb(distributions=c("beta","beta"),
         qLowVect=c(0.4,0.7),
         qUppVect=c(0.6,0.9),
         combFun=combFunEx,
         doPlot=TRUE,
         method="hdi",
         N=1e5, # reduced from N=1e6 so that it runs quicker; larger values => more accurate
         seed=352)

## Example 2 - sum of 3 Gaussian distributions
dist1<-function(n){rnorm(n,mean=5,sd=3)}
dist2<-function(n){rnorm(n,mean=2,sd=2)}
dist3<-function(n){rnorm(n,mean=1,sd=0.5)}
distListEx<-list(dist1,dist2,dist3)
combFunEx<-function(pars){pars[[1]]+pars[[2]]+pars[[3]]}
bootComb(distList=distListEx,combFun=combFunEx,doPlot=TRUE,method="quantile")

# Compare with theoretical result:
exactCI<-qnorm(c(0.025,0.975),mean=5+2+1,sd=sqrt(3^2+2^2+0.5^2))
print(exactCI)
x<-seq(-10,30,length=1e3)
y<-dnorm(x,mean=5+2+1,sd=sqrt(3^2+2^2+0.5^2))
lines(x,y,col="red")
abline(v=exactCI[1],col="red",lty=3)
abline(v=exactCI[2],col="red",lty=3)

## Example 3 - same as Example 1 but assuming the 2 parameters to be dependent / correlated
combFunEx<-function(pars){pars[[1]]*pars[[2]]}
bootComb(distributions=c("beta","beta"),
         qLowVect=c(0.4,0.7),
         qUppVect=c(0.6,0.9),
         Sigma=matrix(byrow=TRUE,ncol=2,c(1,0.5,0.5,1)),
         combFun=combFunEx,
         doPlot=TRUE,
         method="hdi",
         N=1e5, # reduced from N=1e6 so that it runs quicker; larger values => more accurate
         seed=352)