bootf2 {bootf2}    R Documentation

Estimate 90% Confidence Intervals of f_2 with Bootstrap Methodology
Description
Main function to estimate 90% confidence intervals of f_2 using bootstrap methodology.
Usage
bootf2(test, ref, path.in, file.in, path.out, file.out,
n.boots = 10000L, seed = 306L, digits = 2L, alpha = 0.05,
regulation = c("EMA", "FDA", "WHO","Canada", "ANVISA"),
min.points = 1L, both.TR.85 = FALSE, print.report = TRUE,
report.style = c("concise", "intermediate", "detailed"),
f2.type = c("all", "est.f2", "exp.f2", "bc.f2",
"vc.exp.f2", "vc.bc.f2"),
ci.type = c("all", "normal", "basic", "percentile",
"bca.jackknife", "bca.boot"),
quantile.type = c("all", as.character(1:9), "boot"),
jackknife.type = c("all", "nt+nr", "nt*nr", "nt=nr"),
time.unit = c("min", "h"), output.to.screen = FALSE,
sim.data.out = FALSE)
Arguments
test, ref
Data frames of dissolution profiles of test and reference products if path.in and file.in are not provided; otherwise, character strings of the worksheet names in the Excel file. See Input/Output in Details.

path.in, file.in, path.out, file.out
Character strings of input and output directories and file names. See Input/Output in Details.

n.boots
An integer indicating the number of bootstrap samples.

seed
Integer seed value for reproducibility. If missing, a random seed is generated and recorded so that the run can be reproduced.

digits
An integer indicating the number of decimal places for the output.

alpha
A numeric value between 0 and 1 used to estimate the (1 - 2*alpha)*100% confidence interval; the default alpha = 0.05 gives the 90% interval.

regulation
Character strings indicating the regulatory guidelines.

min.points
An integer indicating the minimum number of time points to be used to calculate f_2.

both.TR.85
Logical. If TRUE, all data up to the time points at which both test and reference products dissolve more than 85% are included in the calculation.

print.report
Logical. If TRUE, a plain text report is generated. See Input/Output in Details.

report.style
Character strings indicating the level of detail of the report: "concise", "intermediate", or "detailed".

f2.type
Character strings indicating which type of f_2 should be calculated. See Types of Estimators in Details.

ci.type
Character strings indicating which type of confidence interval should be estimated. See Types of Confidence Interval in Details.

quantile.type
Character strings indicating the type of percentile.

jackknife.type
Character strings indicating the type of jackknife method. See Details.

time.unit
Character strings indicating the unit of time. It should be either "min" for minute or "h" for hour.

output.to.screen
Logical. If TRUE, a summary report very similar to the concise style report is printed on screen. See Input/Output in Details.

sim.data.out
Logical. If TRUE, all individual bootstrap data sets are included in the output.
Details
Minimum required arguments that must be provided by the user
Arguments test and ref must be provided by the user. They should be R data frames, with time as the first column and all individual profiles as the remaining columns. The actual names of the columns do not matter since they will be renamed internally.
Input/Output
The dissolution data of test and reference products can either be provided as data frames for test and ref, as explained above, or be read from an Excel file with the data of test and reference stored in separate worksheets. In the latter case, the argument path.in, the directory where the Excel file is located, and file.in, the name of the Excel file including the file extension .xls or .xlsx, must be provided. In that case, the arguments test and ref must be the names of the worksheets in quotation marks. The first column of each Excel worksheet must be time, and the remaining columns are individual dissolution profiles. The first row should contain column names, such as time, unit01, unit02, ... The actual names of the columns do not matter as they will be renamed internally.
Arguments path.out and file.out are the names of the output directory and file. If they are not provided, but argument print.report is TRUE, then a plain text report will be generated automatically in the current working directory with the file name test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt, where test and ref are the data set names of test and reference, TZ is the time zone such as CEST, YYYY-MM-DD is the numeric date format, and HHMMSS is the numeric time format for hour, minute, and second.

For a quick check, set argument output.to.screen = TRUE; a summary report very similar to the concise style report will be printed on screen.
Types of Estimators
According to Shah et al., the population f_2 for the inference is

f_2 = 100 - 25\log\left(1 + \frac{1}{P}\sum_{i=1}^P \left(\mu_{\mathrm{T},i} - \mu_{\mathrm{R},i}\right)^2\right)\,,

where P is the number of time points; \mu_{\mathrm{T},i} and \mu_{\mathrm{R},i} are the population means of the test and reference products at time point i, respectively; \sum_{i=1}^P is the summation from i = 1 to P.
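The population f_2 above can be illustrated with a small sketch. The helper name f2_point and the two mean profiles below are hypothetical (not part of the package); the log in the formula is base 10, as is conventional for f_2:

```r
# A minimal sketch of the f2 formula applied to two mean dissolution
# profiles measured at the same time points (illustrative data only).
f2_point <- function(mt, mr) {
  p <- length(mt)                          # number of time points P
  100 - 25 * log10(1 + sum((mt - mr)^2) / p)
}

mt <- c(35, 55, 72, 85, 92)                # test mean profile (% dissolved)
mr <- c(30, 50, 70, 84, 91)                # reference mean profile
f2_point(mt, mr)                           # ~72.84
```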
Five estimators for f_2 are included in the function:

- The estimated f_2, denoted by \hat{f}_2, is the one written in various regulatory guidelines. It is expressed differently, but mathematically equivalently, as

  \hat{f}_2 = 100 - 25\log\left(1 + \frac{1}{P}\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2\right)\,,

  where P is the number of time points; \bar{X}_{\mathrm{T},i} and \bar{X}_{\mathrm{R},i} are the mean dissolution data at the ith time point of random samples chosen from the test and the reference populations, respectively. Compared to the equation of the population f_2 above, the only difference is that in the equation of \hat{f}_2 the sample means of the dissolution profiles replace the population means for the approximation. In other words, a point estimate is used for the statistical inference in practice.

- The bias-corrected f_2, denoted by \hat{f}_{2,\mathrm{bc}}, was described in the article of Shah et al. as

  \hat{f}_{2,\mathrm{bc}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 - \frac{1}{n}\sum_{i=1}^P\left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\right)\right)\,,

  where S_{\mathrm{T},i}^2 and S_{\mathrm{R},i}^2 are unbiased estimates of the variance at the ith time point of random samples chosen from the test and reference populations, respectively; and n is the sample size.

- The variance- and bias-corrected f_2, denoted by \hat{f}_{2,\mathrm{vcbc}}, does not assume equal weight of variance as \hat{f}_{2,\mathrm{bc}} does:

  \hat{f}_{2,\mathrm{vcbc}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 - \frac{1}{n}\sum_{i=1}^P\left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 + w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right)\right)\right)\,,

  where w_{\mathrm{T},i} and w_{\mathrm{R},i} are weighting factors for the variance of the test and reference products, respectively, which can be calculated as

  w_{\mathrm{T},i} = 0.5 + \frac{S_{\mathrm{T},i}^2}{S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2}\,,

  and

  w_{\mathrm{R},i} = 0.5 + \frac{S_{\mathrm{R},i}^2}{S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2}\,.

- The expected f_2, denoted by \hat{f}_{2,\mathrm{exp}}, is calculated based on the mathematical expectation of the estimated f_2,

  \hat{f}_{2,\mathrm{exp}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P\left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\right)\right)\,,

  using mean dissolution profiles and variances from samples for the approximation of population values.

- The variance-corrected expected f_2, denoted by \hat{f}_{2,\mathrm{vcexp}}, is calculated as

  \hat{f}_{2,\mathrm{vcexp}} = 100 - 25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P\left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 + w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right)\right)\right)\,.
Types of Confidence Interval
The following confidence intervals are included:

- The normal interval with bias correction, denoted by normal in the function, is estimated according to Davison and Hinkley,

  \hat{f}_{2,\mathrm{L,U}} = \hat{f}_{2,\mathrm{S}} - E_B \mp \sqrt{V_B}\cdot Z_{1-\alpha}\,,

  where \hat{f}_{2,\mathrm{L,U}} are the lower and upper limits of the confidence interval estimated from bootstrap samples; \hat{f}_{2,\mathrm{S}} denotes the estimators described above; Z_{1-\alpha} represents the inverse of the standard normal cumulative distribution function with type I error \alpha; E_B and V_B are the resampling estimates of bias and variance, calculated as

  E_B = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_{2,b}^\star - \hat{f}_{2,\mathrm{S}} = \bar{f}_2^\star - \hat{f}_{2,\mathrm{S}}\,,

  and

  V_B = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{f}_{2,b}^\star - \bar{f}_2^\star\right)^2\,,

  where B is the number of bootstrap samples; \hat{f}_{2,b}^\star is the f_2 estimate of the bth bootstrap sample, and \bar{f}_2^\star is their mean value.

- The basic interval, denoted by basic in the function, is estimated according to Davison and Hinkley,

  \hat{f}_{2,\mathrm{L}} = 2\hat{f}_{2,\mathrm{S}} - \hat{f}_{2,(B+1)(1-\alpha)}^\star\,,

  and

  \hat{f}_{2,\mathrm{U}} = 2\hat{f}_{2,\mathrm{S}} - \hat{f}_{2,(B+1)\alpha}^\star\,,

  where \hat{f}_{2,(B+1)\alpha}^\star and \hat{f}_{2,(B+1)(1-\alpha)}^\star are the (B+1)\alpha th and (B+1)(1-\alpha) th ordered resampling estimates of f_2, respectively. When (B+1)\alpha is not an integer, the following equation is used for interpolation:

  \hat{f}_{2,(B+1)\alpha}^\star = \hat{f}_{2,k}^\star + \frac{\Phi^{-1}\left(\alpha\right) - \Phi^{-1}\left(\frac{k}{B+1}\right)}{\Phi^{-1}\left(\frac{k+1}{B+1}\right) - \Phi^{-1}\left(\frac{k}{B+1}\right)}\left(\hat{f}_{2,k+1}^\star - \hat{f}_{2,k}^\star\right),

  where k is the integer part of (B+1)\alpha; \hat{f}_{2,k+1}^\star and \hat{f}_{2,k}^\star are the (k+1)th and the kth ordered resampling estimates of f_2, respectively.

- The percentile intervals, denoted by percentile in the function, are estimated using nine different types of quantiles, Type 1 to Type 9, as summarized in Hyndman and Fan's article and implemented in R's quantile function. Using R's boot package, the program bootf2BCA outputs a percentile interval using the equation above for interpolation. To be able to compare the results among different programs, the same interval, denoted by Percentile (boot), is also included in the function.

- The bias-corrected and accelerated (BCa) intervals are estimated according to Efron and Tibshirani,

  \hat{f}_{2,\mathrm{L}} = \hat{f}_{2,\alpha_1}^\star\,,
  \hat{f}_{2,\mathrm{U}} = \hat{f}_{2,\alpha_2}^\star\,,

  where \hat{f}_{2,\alpha_1}^\star and \hat{f}_{2,\alpha_2}^\star are the 100\alpha_1 th and the 100\alpha_2 th percentiles of the resampling estimates of f_2, respectively. Type I errors \alpha_1 and \alpha_2 are obtained as

  \alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_\alpha}{1 - \hat{a}\left(\hat{z}_0 + \hat{z}_\alpha\right)}\right),

  and

  \alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_{1-\alpha}}{1 - \hat{a}\left(\hat{z}_0 + \hat{z}_{1-\alpha}\right)}\right),

  where \hat{z}_0 and \hat{a} are called the bias correction and acceleration, respectively.

  There are different methods to estimate \hat{z}_0 and \hat{a}. Shah et al. used the jackknife technique, denoted by bca.jackknife in the function,

  \hat{z}_0 = \Phi^{-1}\left(\frac{N\left\{\hat{f}_{2,b}^\star < \hat{f}_{2,\mathrm{S}}\right\}}{B}\right),

  and

  \hat{a} = \frac{\sum_{i=1}^{n}\left(\hat{f}_{2,\mathrm{m}} - \hat{f}_{2,i}\right)^3}{6\left(\sum_{i=1}^{n}\left(\hat{f}_{2,\mathrm{m}} - \hat{f}_{2,i}\right)^2\right)^{3/2}}\,,

  where N\left\{\cdot\right\} denotes the number of elements in the set, \hat{f}_{2,i} is the ith jackknife statistic, \hat{f}_{2,\mathrm{m}} is the mean of the jackknife statistics, and \sum is the summation from 1 to the sample size n.

  The program bootf2BCA gives a slightly different BCa interval with R's boot package. This approach, denoted by bca.boot in the function, is also implemented for estimating the interval.
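The normal interval with bias correction can be sketched directly from a vector of bootstrap replicates. The helper name normal_ci and the simulated replicates below are illustrative assumptions, not the package's implementation:

```r
# Sketch of the bias-corrected normal interval: f2.obs is the observed
# estimate \hat{f}_{2,S}; f2.star is a vector of bootstrap replicates
# \hat{f}_{2,b}^\star (simulated here for illustration).
normal_ci <- function(f2.obs, f2.star, alpha = 0.05) {
  eb <- mean(f2.star) - f2.obs            # resampling bias E_B
  vb <- var(f2.star)                      # resampling variance V_B
  z  <- qnorm(1 - alpha)                  # Z_{1 - alpha}
  c(lower = f2.obs - eb - sqrt(vb) * z,
    upper = f2.obs - eb + sqrt(vb) * z)
}

set.seed(306)
f2.star <- rnorm(1000, mean = 52, sd = 3)  # stand-in bootstrap replicates
normal_ci(52.5, f2.star)
```

By construction the interval is centered on the bias-corrected point estimate f2.obs - E_B, not on f2.obs itself.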
Notes on the argument jackknife.type
For any sample of size n, the jackknife estimator is obtained by estimating the parameter for each subsample, omitting the ith observation. However, when two samples (e.g., test and reference) are involved, there are several possible ways to do it. Assuming the sample sizes of test and reference are n_\mathrm{T} and n_\mathrm{R}, the following three possibilities are considered:

- Estimate by removing one observation from both test and reference samples. In this case, the prerequisite is n_\mathrm{T} = n_\mathrm{R}, denoted by nt=nr in the function. So if there are 12 units in the test and reference data sets, there will be 12 jackknife estimators.

- Estimate the jackknife for the test sample while keeping the reference data unchanged; then estimate the jackknife for the reference sample while keeping the test data unchanged. This is denoted by nt+nr in the function and is the default method. So if there are 12 units in the test and reference data sets, there will be 12 + 12 = 24 jackknife estimators.

- For each observation deleted from the test sample, estimate the jackknife for the reference sample. This is denoted by nt*nr in the function. So if there are 12 units in the test and reference data sets, there will be 12 × 12 = 144 jackknife estimators.
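The three schemes differ only in which leave-one-out index sets they enumerate, which the following sketch makes concrete (the index vectors are illustrative; any statistic such as f_2 would be evaluated once per index):

```r
# Enumerate the leave-one-out index sets for the three jackknife
# schemes with hypothetical sample sizes nt and nr.
nt <- 12; nr <- 12

# nt=nr: drop unit i from both samples -> nt estimates (requires nt == nr)
idx.eq   <- seq_len(nt)
# nt+nr: drop one unit from test OR from reference -> nt + nr estimates
idx.plus <- c(paste0("T", seq_len(nt)), paste0("R", seq_len(nr)))
# nt*nr: drop one test unit AND one reference unit -> nt * nr estimates
idx.prod <- expand.grid(t = seq_len(nt), r = seq_len(nr))

c(length(idx.eq), length(idx.plus), nrow(idx.prod))   # 12 24 144
```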
Value
A list of 3 or 5 components:

- boot.ci: A data frame of bootstrap f_2 confidence intervals.

- boot.f2: A data frame of all individual f_2 values for all bootstrap data sets.

- boot.info: A data frame with detailed information about the bootstrap for reproducibility purposes, such as all arguments used in the function, the time points used for the calculation of f_2, and the number of NAs.

- boot.summary: A data frame with descriptive statistics of the bootstrap f_2.

- boot.t and boot.r: Lists of individual bootstrap samples for the test and reference products if sim.data.out = TRUE.
References
Shah, V. P.; Tsong, Y.; Sathe, P.; Liu, J.-P. In Vitro
Dissolution Profile Comparison—Statistics and Analysis of the
Similarity Factor, f_2
. Pharmaceutical Research 1998,
15 (6), 889–896. DOI: 10.1023/A:1011976615750.
Davison, A. C.; Hinkley, D. V. Bootstrap Methods and Their Application. Cambridge University Press, 1997.
Hyndman, R. J.; Fan, Y. Sample Quantiles in Statistical Packages. The American Statistician 1996, 50 (4), 361–365. DOI: 10.1080/00031305.1996.10473566.
Efron, B.; Tibshirani, R. An Introduction to the Bootstrap. Chapman & Hall, 1993.
Examples
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)
# model.par for reference with low variability
par.r <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 14,
tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 8)
# simulate reference data
dr <- sim.dp(tp, model.par = par.r, seed = 100, plot = FALSE)
# model.par for test
par.t <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 12,
tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 9)
# simulate test data with low variability
dt <- sim.dp(tp, model.par = par.t, seed = 100, plot = FALSE)
# bootstrap. to reduce test run time, n.boots of 100 was used in the example.
# In practice, it is recommended to use n.boots of 5000--10000.
# Set `output.to.screen = TRUE` to view the result on screen
d <- bootf2(dt$sim.disso, dr$sim.disso, n.boots = 100, print.report = FALSE)