R: Multiscale Bootstrap Resampling

scaleboot {scaleboot}

R Documentation

Multiscale Bootstrap Resampling

Description

Performs multiscale bootstrap resampling for a specified statistic.

Usage


scaleboot(dat,nb,sa,fun,parm=NULL,count=TRUE,weight=TRUE,
          cluster=NULL,onlyboot=FALSE,seed=NULL,...)

countw.assmax(x,w,ass)

countw.shtest(x,w,obs)

countw.shtestass(x,w,assobs)

Arguments

`dat`	data matrix or data-frame. Row vectors are to be resampled.
`nb`	vector of the numbers of bootstrap replicates.
`sa`	vector of scales in sigma squared (`\sigma^2`).
`fun`	function for a statistic.
`parm`	parameter to be passed to `fun` above.
`count`	logical. Should only the accumulative counts be returned? Otherwise, raw statistic vectors are returned.
`weight`	logical. In `fun` above, resampling is specified by a weight vector. Otherwise, resampling is specified by a vector of indices.
`cluster`	parallel cluster object which may be generated by function `makeCluster`.
`onlyboot`	logical. Should only bootstrap resampling be performed? Otherwise, `sbfit` or `sbconf` is called internally.
`seed`	If non NULL, random seed is set. Specifying a seed is particularly important when `cluster` is non NULL, in which case `seed + seq(along=cluster)` are set to cluster nodes.
`...`	further arguments passed to and from other methods.
`x`	data matrix or data-frame passed from `scaleboot`.
`w`	weight vector for resampling.
`ass`	a list of association vectors. An example of `parm` above.
`obs`	a vector of observed test statistics. An example of `parm` above.
`assobs`	a list of ass and obs above. An example of `parm` above.

Details

These functions are used internally by relltest for computing raw bootstrap probabilities of phylogenetic inference. Alternatively, we used pvclust to get raw bootstrap probabilities of hierarchical clustering. In other cases, users may utilize scaleboot function or prepare their own functions.

scaleboot performs multiscale bootstrap resampling for a statistic defined by fun, which should be one of the two possible forms fun(x,w,parm) and fun(x,i,parm). The former is used when weight=TRUE, and the weight vector w is generated by a multinomial distribution. The latter is used when weight=FALSE, and the index vector i is generated by resampling n' elements from \{1,...,n\}. When count=TRUE, fun should return a logical, or a vector of logicals.

Examples of fun(x,w,parm) are countw.assmax for AU p-values, countw.shtest for SH-test of trees, and countw.shtestass for SH-test of both trees and edges. The definitions are given below.

countw.assmax <- function(x,w,ass) {
  y <- maxdif(wsumrow(x,w)) <= 0 # countw.max
  if(is.null(ass)) y
  else {
    z <- vector("logical",length(ass))
    for(i in seq(along=ass)) z[i] <- any(y[ass[[i]]])
    z
  }
}

countw.shtest <- function(x,w,obs)  maxdif(wsumrow(x,w)) >= obs

countw.shtestass <- function(x,w,assobs)
  unlist(assmaxdif(wsumrow(x,w),assobs$ass)) >= assobs$obs
    
### weighted sum of row vectors
##
## x = matrix (array of row vectors)
## w = weight vector (for rows)
##
wsumrow <- function(x,w) {
  apply(w*x,2,sum)*nrow(x)/sum(w)
}

### calc max diff
##
## y[i] := max_{j neq i} x[j] - x[i]
##
maxdif <- function(x) {
  i1 <- which.max(x)  # the largest element
  x <- -x + x[i1]
  x[i1] <- -min(x[-i1])  # the second largest value
  x
}

### calc assmaxdif
##
## y[[i]][j] := max_{k neq ass[[i]]} x[k] - x[ass[[i]][j]]
##
assmaxdif <-  function(x,a) {
  y <- vector("list",length(a))
  names(y) <- names(a)
  for(i in seq(along=a))  y[[i]] <- max(x[-a[[i]]]) - x[a[[i]]]
  y
}

When count=TRUE, the summation of outputs from fun is calculated. This gives the frequencies for how many times the hypotheses are supported by the bootstrap replicates.

Value

If onlyboot=TRUE, then a list of raw results from the multiscale bootstrap resampling is returned. The components are "stat" for list vectors of outputs from fun (only when count=FALSE), "bps" for a matrix of multiscale bootstrap probabilities (only when count=FALSE), "nb" for the number of bootstrap replicates used, and "sa" for the scales used. Note that scales are redefined by sa <- nsize/round(nsize/sa), where nsize is the sample size.

If onlyboot=FALSE, then the result of a call to sbfit is returned when count=TRUE, otherwise the result of sbconf is returned when count=FALSE.

Author(s)

Hidetoshi Shimodaira

Examples

## Not run: 
## An example to calculate AU p-values for phylogenetic trees
## See also the Examples of "sbconf"
data(mam15) # load mam15.mt
sa <- 9^seq(-1,1,length=13) # parameter for multiscale bootstrap
nb <- 1000 # nb=10000 is better but slower
# Now compute bootstrap probabilities and fit models to them
sim <- scaleboot(mam15.mt,nb,sa,countw.assmax) # takes some time (< 1 min)
sim # show bootstrap probabilities and model fitting
summary(sim) # show AU p-vaslues

## End(Not run)

## Not run: 
## The following lines are only for illustration purpose
## a line from the definition of relltest
scaleboot(dat,nb,sa,countw.assmax,ass,cluster=cluster,
                 names.hp=na,nofit=nofit,models=models,seed=seed)

## two lines from rell.shtest (internal function)
scaleboot(z,nb,1,countw.shtest,tobs,cluster=cluster,
                 onlyboot=TRUE,seed=seed)
scaleboot(z,nb,1,countw.shtestass,pa,cluster=cluster,
                 onlyboot=TRUE,seed=seed)

## End(Not run)

[Package scaleboot version 1.0-1 Index]