R: Fitting Models to Bootstrap Probabilities

sbfit {scaleboot}

R Documentation

Fitting Models to Bootstrap Probabilities

Description

sbfit is used to fit parametric models to multiscale bootstrap probabilities by the maximum likelihood method.

Usage

sbfit(x, ...)

## Default S3 method:
sbfit(x,nb,sa,models=NULL,nofit=FALSE,bpm=NULL,sam=NULL,...)

## S3 method for class 'matrix'
sbfit(x,nb,sa,models=NULL,names.hp=rownames(x),
  bpms=NULL,sam=NULL,nofit=FALSE,cluster=NULL,...)

## S3 method for class 'data.frame'
sbfit(x,...)

## S3 method for class 'scaleboot'
sbfit(x,models=names(x$fi),...)

## S3 method for class 'scalebootv'
sbfit(x,models=attr(x,"models"),...)

## S3 method for class 'scaleboot'
print(x,sort.by=c("aic","none"),...)

## S3 method for class 'scalebootv'
print(x,...)

Arguments

`x`	an object used to select a method. For `sbfit.default`, `x` is denoted by `nb` and is a vector of bootstrap probabilities for a hypothesis. For `sbfit.matrix`, `x` is denoted by `bps` and is a matrix with row vectors of `bp` for several hypotheses.
`nb`	vector of numbers of bootstrap replicates. A short vector (or scalar) is cyclically extended to match the size of `bp`.
`sa`	vector of scales in sigma squared (`\sigma^2`). Should be the same size as `bp`.
`models`	character vector of model names. Valid model names are `poly.m` for m>=1 and `sing.m` for m>=3. The default is set by `sboptions()$models`, whose default is c("poly.1","poly.2","poly.3","sing.3","sphe.3"). If `models` is an integer value, `sbmodelnames(m=models)` is used.
`nofit`	logical. If TRUE, fitting is not performed.
`bpm`	(experimental: bootstrap probabilities for 2-step bootstrap)
`sam`	(experimental: scales for 2-step bootstrap)
`bpms`	(experimental: bootstrap probabilities for 2-step bootstrap)
`names.hp`	character vector of hypotheses names.
`cluster`	parallel cluster object which may be generated by function `makeCluster`.
`sort.by`	sort key.
`...`	further arguments passed to and from other methods.

Details

sbfit.default fits parametric models to bp by maximizing the log-likelihood value of a binomial model. A set of multiscale bootstrap resampling should be performed before a call to sbfit for preparing bp, where bp[i] is a bootstrap probability of a hypothesis calculated with a number of bootstrap replicates nb[i] and a scale \sigma^2=sa[i]. The scale is defined as \sigma^2=n/n', where n is the sample size of data, and n' is the sample size of replicated data for bootstrap resampling.

Each model specifies a psi(beta,s)=\psi(\sigma^2 | \beta) function with a parameter vector \beta. The model may describe how the bootstrap probability changes along the scale. Let cnt[i]=bp[i]*nb[i] be the frequency indicating how many times the hypothesis of interest is observed in bootstrap replicates at scale sa[i]. Then we assume that cnt[i] is binomially distributed with number of trials nb[i] and success probability 1-pnorm(psi(beta,s=sa[i])/sqrt(sa[i])). Currently, sbpsi.poly and sbpsi.sing are available as \psi functions. The estimated model parameters are accessed by the coef.scaleboot method.

The model fitting is performed in the order specified in models, and the initial values for numerical optimization of the likelihood function are prepared by using previously estimated model parameters. Thus, "poly.(m-1)" should be specified before "poly.m", and "poly.(m-1)" and "sing.(m-1)" should be specified before "sing.m".

sbfit.matrix calls sbfit.default repeatedly, once for each row vector bp of the matrix bps. Parallel computing is performed when cluster is non NULL.

sbfit.scaleboot calls sbfit.default with bp, nb, and sa components in x object for refitting by giving another models argument. It discards the previous result of fitting, and recomputes the model parameters.

sbfit.scalebootv calls sbfit.matrix with the bps, nb, and sa components in the attributes of x.

Value

sbfit.default and sbfit.scaleboot return an object of class "scaleboot", and sbfit.matrix and sbfit.scalebootv return an object of class "scalebootv".

An object of class "scaleboot" is a list containing at least the following components:

`bp`	the vector of bootstrap probabilities used.
`nb`	the `rep(nb,length=length(bp))` used.
`sa`	the `sa` used.
`fi`	list vector of fitted results for `models` used. Each list consists of components `"par"` (estimated parameter), `"mag"` (magnification factor for `"par"` to make the actual parameter vector `beta=parmag`), `"value"` (maximum log-likelihood), `"hessian"` (hessian matrix), `"var"` (variance estimate of `"par"`), `"mask"` (logical vector indicating parameter elements which are not at boundaries), `"init"` (initial values used for optimization), `"psi"` (psi function name of the model), `"df"` (degrees of freedom), `"rss"` (equivalent to the residual sum of squares, but actually defined as 2(lik0-lik) where lik0 and lik are the log-likelihood function of the non-restricted model and the model of interest, respectively), `"pfit"` (p-value for `"rss"`), `"aic"` (aic value of the model relative to the non-restricted model).

An object of class "scalebootv" is a vector of "scaleboot" objects, and in addition, it has attributes "models", "bps", "nb", and "sa".

Author(s)

Hidetoshi Shimodaira <shimo@i.kyoto-u.ac.jp>

References

Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection, Systematic Biology, 51, 492-508.

Shimodaira, H. (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling, Annals of Statistics, 32, 2616-2641.

Shimodaira, H. (2008). Testing Regions with Nonsmooth Boundaries via Multiscale Bootstrap, Journal of Statistical Planning and Inference, 138, 1227-1241. (http://dx.doi.org/10.1016/j.jspi.2007.04.001).

Examples

## Testing a hypothesis
## Examples of fitting models to a vector of bp's
## mam15.relltest$t4 of data(mam15), but
## using a different set of scales (sigma^2 values).
## In the below, sigma^2 ranges 0.01 to 100 in sa[i]
## This very large range is only for illustration.
## Typically, the range around 0.1 to 10
## is recommended for much better model fitting.
## In other examples, we have used
## sa = 9^seq(-1,1,length=13).

cnt <- c(0,0,0,0,6,220,1464,3565,5430,6477,6754,
         6687,5961) # observed frequencies at scales
nb <- 100000 # number of replicates at each scale
bp <- cnt/nb # bootstrap probabilities (bp's)
sa <- 10^seq(-2,2,length=13) # scales (sigma squared)
## model fitting to bp's 
f <- sbfit(bp,nb,sa) # model fitting ("scaleboot" object)
f # print the result of fitting
plot(f,legend="topleft") # observed bp's and fitted curves
## approximately unbiased p-values
summary(f) # calculate and print p-values
## refitting with models up to "poly.4" and "sing.4"
f <- sbfit(f,models=1:4)
f # print the result of fitting
plot(f,legend="topleft") # observed bp's and fitted curves
summary(f) # calculate and print p-values

## Not run: 
## Testing multiple hypotheses (only two here)
## Examples of fitting models to vectors of bp's
## mam15.relltest[c("t1,t2")]
cnt1 <- c(85831,81087,76823,72706,67946,62685,57576,51682,
       45887,41028,35538,31232,27832)  # cnt for "t1"
cnt2 <- c(2,13,100,376,975,2145,3682,5337,7219,8559,
       10069,10910,11455)  # cnt for "t2"
cnts <- rbind(cnt1,cnt2)
nb <- 100000 # number of replicates at each scale
bps <- cnts/nb # row vectors are bp's
sa <- 9^seq(-1,1,length=13) # scales (sigma squared)
fv <- sbfit(bps,nb,sa) # returns a "scalebootv" object
fv # print the result of fitting
plot(fv) # multiple plots
summary(fv) # calculate and print p-values

## End(Not run)

[Package scaleboot version 1.0-1 Index]