BCE {BCE} | R Documentation |
Bayesian Composition Estimator
Description
this function is now superseded by the alternative
link{bce1}
.
estimates probability distributions of a sample composition
based on an input ratio matrix, Rat
, containing
biomarker ratios in (field) samples, and an input data matrix,
Dat
, containing the biomarker ratios for several taxonomic groups
Usage
BCE(Rat, Dat, relsdRat = 0, abssdRat = 0, minRat = 0,
maxRat = +Inf, relsdDat = 0, abssdDat = 0, tol = 1e-4, tolX = 1e-4,
positive = 1:ncol(Rat), iter = 100, outputlength = 1000,
burninlength = 0, jmpRat = 0.01, jmpX = 0.01, unif = FALSE,
verbose = TRUE, initRat = Rat, initX = NULL, userProb = NULL,
confInt = 2/3, export = FALSE, file = "BCE")
Arguments
Rat |
initial ratio matrix. Each row of |
Dat |
initial data matrix. Each row of |
relsdRat |
relative standard deviation on ratio matrix. Either one
number or a matrix with the same dimensions as |
abssdRat |
absolute standard deviation on ratio matrix. Either one
number or a matrix with the same dimensions as |
minRat |
minimum values of ratio matrix. Either one number or a
matrix with the same dimensions as |
maxRat |
maximum values of ratio matrix. Either one number or a
matrix with the same dimensions as |
relsdDat |
relative standard deviation on data matrix. Either one
number or a matrix with the same dimensions as |
abssdDat |
absolute standard deviation on data matrix. Either one
number or a matrix with the same dimensions as |
tol |
minimum standard deviation for data matrix |
tolX |
minimum x values. Used for MCMC initiation. One value. |
positive |
A vector containing numbers of columns that should
contain strictly positive data. Only these columns are rescaled. The
other columns (not in |
iter |
number of iterations for MCMC. |
outputlength |
number of iterations kept in the output. |
burninlength |
number of initial iterations to be removed from output. |
jmpRat |
jump length of the ratio matrix |
jmpX |
jump length of the composition matrix (in a
simplex). Either one number, a vector of length equal to the number of
taxa (number of rows in |
unif |
logical; if TRUE a uniform distribution for ratio matrix is used. This is similar as in chemtax. |
verbose |
logical; if TRUE, extra information is provided during the run of the function, such as extra warnings, elapsed time and expected time until the end of the MCMC. |
initRat |
ratio matrix used to start the markov chain: defaults to the initial ratio matrix. |
initX |
composition matrix used to start the markov chain: default the LSEI solution of Ax=B. |
userProb |
function taking two arguments: ratio matrix RAT and composition matrix X, and returning the posterior probability. Dependence of the probability on the data should be incorporated in the function. If not specified, the default probability distribution is the product of a non-informative distribution on the composition matrix, and gamma distributions for the ratio matrix and the data given the model output. |
confInt |
confidence interval in output; because the distributions may not be symmetrical, standard deviations are not always a useful measure; instead, upper and lower boundaries of the given confidence interval are given. Default is 2/3, i.e there is a probability of 0.66 for a value to be contained within the interval. |
export |
logical; if |
file |
Only if |
Details
The function BCE
searches probability distributions for all
elements of a taxonomical composition matrix X
and a ratio
matrix Rat
for which:
X\%*\%Rat \simeq Dat
It does this by returning iter
samples for X and Rat, organized
in three-dimensional arrays. The input
data matrix Dat
and ratio matrix Rat
should be
in the following formats, with the relative concentrations per
biomarker organized in columns:
data matrix:
marker1 | marker2 | marker3 | marker4 | |
sample1 | 0.14 | 0.005 | 0.35 | 0.033 |
sample2 | 0.15 | 0.004 | 0.36 | 0.034 |
sample3 | 0.13 | 0.004 | 0.31 | 0.030 |
sample4 | 0.13 | 0.005 | 0.33 | 0.031 |
sample5 | 0.14 | 0.008 | 0.33 | 0.036 |
sample6 | 0.11 | 0.082 | 0.34 | 0.044 |
and ratio matrix:
marker1 | marker2 | marker3 | marker4 | |
species1 | 0.27 | 0.13 | 0.35 | 0.076 |
species2 | 0.084 | 0 | 0.5 | 0.24 |
species3 | 0.195 | 0.3 | 0 | 0.1 |
species4 | 0.06 | 0 | 0 | 0 |
species5 | 0 | 0 | 0 | 0 |
species6 | 0 | 0 | 0 | 0 |
Value
A bce (bayesian compositional estimator) object; a list containing 4 elements
Rat |
Array with dimension c(nrow( |
X |
Array with dimension c(nrow( |
logp |
vector with length |
naccepted |
integer indicating the number of runs that were accepted. |
Note
Producing sensible output:
Markov Chain Monte Carlo simulations are not as straightforward as one might wish; several preliminary runs might be necessary to determine the desired number of iterations, burn-in length and jump length. For all estimated values of Rat and X, their trace (evolution of the values over all iterations) has to display random behaviour; no obvious trends should appear. A few parameters can be tuned to obtain such behaviour:
jump length The jump length determines how big the jumps are for each step in the random walk. A longer jump length will make you jump around faster in the parameter space, but acceptance of new points can get very low. Smaller jump lengths increase the acceptance rate, but the algorithm will move too slowly, and a lot more runs will be needed to scan the whole parameter space. A good way to find a good jump length, is look at the number of points accepted. If the output is saved under the name
MCMC
, you can find the number of accepted points underMCMC$naccepted
. It is also given if you run the model withverbose=TRUE
(default). This value should be somewhere between 5% and 40%. For long runs, 5 % can be acceptable, for short runs, you will prefer a higher acceptance in order to have enough different points. 20% accepted is usually a good number. Do some preliminary runs withiter=1000-10000
and tune the jump length parametersjmpRat
andjmpX
. You can set different jump lengths for each column of the ratio matrix, or 1 jump length for the whole ratio matrix, and 1 jump length for the composition matrix. Decreasing the jump lengths will generally increase the acceptance rate and vice versa. Also the mixing rate (the speed with which accepted points change their values) will be influenced. You want this mixing rate to be as high as possible, whilst maintaining enough accepted points.burninlength The program uses the solution of lsei using the original ratio matrix as starting values for the MCMC. This might in some cases be far from the optimal solution, and the MCMC algorithm will start with moving towards this optimal solution. This is called a burn-in. When there is a slow mixing rate, this can take a considerable number of cycles. As it can influence the averages and standard deviations, you might want to remove it from the mcmc objects. By defining a burnin length, the first '
burninlength
' cycles will not be written to the output. Look at some plots to determine if you need to specify a burnin length.iter the number of iterations: start with 10000 runs or less; check the output and estimate how many runs you will need to get a random pattern in the output.
Author(s)
Karel Van den Meersche <karel.van_den_meersche@cirad.fr>, Karline Soetaert <karline.soetaert@nioz.nl>.
References
Van den Meersche, K., K. Soetaert and J.J. Middelburg (2008) A Bayesian compositional estimator for microbial taxonomy based on biomarkers, Limnology and Oceanography Methods 6, 190-199
See Also
summary.bce
, plot.bce
,
export.bce
, pairs.bce
Examples
##====================================
# example using bceInput data
# first try
X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
iter=1000,outputlength=5000,jmpX=.01,jmpRat=.01)
## the number of accepted runs is too low;
## we play around with the jump lengths jmpX and jmpRat
X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
iter=1000,outputlength=5000,jmpX=.02,jmpRat=.002)
## we inspect the output:
plot(X)
## For every element of X and Rat, we want to obtain a well-mixed,
## random trace. In this case, mixing is still a little poor.
## to optimize mixing in the ratio matrix, it is a good idea
## to make the jump length linear to the ratio matrix
## standard deviation (sdrat=.2*rat) :
X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
iter=1000,outputlength=5000,jmpX=.02,
jmpRat=.2*(.2*bceInput$Rat))
plot(X)
## mixing improved a lot; we repeat the run with more iterations
## to improve the reliability of the results.
## the following run can take a few minutes - so it is toggled off
#X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
# iter=100000,outputlength=5000,jmpX=.02,
# jmpRat=.2*(.2*bceInput$Rat))
#plot(X)
## you can see in the plots that traces for all elements of Rat and X
## are well-mixed. This run was saved in "bceOutput"
Sum <-summary(bceOutput)
# show results as mean with ranges
print(Sum$meanX)
# plot estimated means and ranges (lbX=lower, ubX=upper bound)
xlim <- range(c(Sum$lbX,Sum$ubX))
# first the mean
dotchart(x=t(Sum$meanX),xlim=xlim,
main="Taxonomic composition",
sub="using bce",pch=16)
# then ranges
nr <- nrow(Sum$meanX)
nc <- ncol(Sum$meanX)
for (i in 1:nr)
{ip <-(nr-i)*(nc+2)+1
cc <- ip : (ip+nc-1)
segments(t(Sum$lbX[i,]),cc,t(Sum$ubX[i,]),cc)
}
# show results as pairs plot
pairs(bceOutput,sample=3,main="Station 3")