exploreHypers {openEBGM} | R Documentation
Explore various hyperparameter estimates
Description
exploreHypers finds hyperparameter estimates using a variety of starting points to examine the consistency of the optimization procedure.
Usage
exploreHypers(
  data,
  theta_init,
  squashed = TRUE,
  zeroes = FALSE,
  N_star = 1,
  method = c("nlminb", "nlm", "bfgs"),
  param_limit = 100,
  max_pts = 20000,
  std_errors = FALSE
)
Arguments
data: A data frame from processRaw or squashData.
theta_init: A data frame of initial hyperparameter guesses with columns ordered as: \alpha_1, \beta_1, \alpha_2, \beta_2, P.
squashed: A scalar logical (TRUE or FALSE) indicating whether or not data squashing was used.
zeroes: A scalar logical specifying if zero counts are included.
N_star: A positive scalar whole number value for the minimum count size to be used for hyperparameter estimation. If zeroes are used, set N_star = NULL.
method: A scalar string indicating which optimization procedure is to be used. Choices are "nlminb", "nlm", or "bfgs".
param_limit: A scalar numeric value for the largest acceptable value for the \alpha and \beta estimates.
max_pts: A scalar whole number for the largest number of data points allowed. Used to help prevent extremely long run times.
std_errors: A scalar logical indicating if standard errors should be returned for the hyperparameter estimates.
Details
The method argument determines which optimization procedure is used. All the options use functions from the stats package:
"nlminb": nlminb
"nlm": nlm
"bfgs": optim (method = "BFGS")
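As a rough illustration of how these stats routines differ in the way they report results, the sketch below minimizes a toy quadratic with each one. The objective toy_nll and the starting point are made up for this example; it is not the likelihood that exploreHypers optimizes.

```r
# Toy objective with a known minimum at (1, 2); a stand-in for a
# negative log-likelihood, NOT the one exploreHypers() optimizes.
toy_nll <- function(theta) (theta[1] - 1)^2 + (theta[2] - 2)^2
start <- c(0, 0)

fit_nlminb <- stats::nlminb(start = start, objective = toy_nll)
fit_nlm    <- stats::nlm(f = toy_nll, p = start)
fit_bfgs   <- stats::optim(par = start, fn = toy_nll, method = "BFGS")

# Each routine stores its estimate and its convergence code under
# different element names, and the codes use different scales.
results <- data.frame(
  method = c("nlminb", "nlm", "bfgs"),
  est1   = c(fit_nlminb$par[1], fit_nlm$estimate[1], fit_bfgs$par[1]),
  est2   = c(fit_nlminb$par[2], fit_nlm$estimate[2], fit_bfgs$par[2]),
  code   = c(fit_nlminb$convergence, fit_nlm$code, fit_bfgs$convergence)
)
print(results)
```

Because the meaning of the code differs between routines (for example, 0 signals success for nlminb and optim, while nlm uses codes 1 through 5), the codes should be read against the documentation of the routine that produced them.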
Since this function runs multiple optimization procedures, it is best to start with 5 or fewer initial starting points (rows in theta_init). If the function runs in a reasonable amount of time, this number can be increased.
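One way to follow this advice is to build a larger pool of candidate guesses and pass only a handful of rows at first. The sketch below uses base R only; the candidate values are arbitrary and the object names grid and theta_small are hypothetical.

```r
# Candidate values for each hyperparameter (arbitrary, for illustration).
grid <- expand.grid(
  alpha1 = c(0.2, 1),
  beta1  = c(0.1, 1),
  alpha2 = c(2, 4),
  beta2  = c(1, 4),
  p      = c(0.1, 0.3)
)

# Start small: keep 5 randomly chosen rows as the first batch of guesses.
set.seed(1)
theta_small <- grid[sample(nrow(grid), 5), ]
rownames(theta_small) <- NULL
print(theta_small)
```

theta_small could then be supplied as theta_init. Wrapping the call in system.time() gives a sense of whether more rows of grid can be added on the next run.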
This function should not be used with very large data sets unless data squashing is used first since each optimization call will take a long time.
It is recommended to use N_star = 1 when practical. Data squashing (see squashData) can be used to reduce the number of data points.
The converge column in the resulting data frame is determined by examining the convergence code of the chosen optimization method. In some instances, the code is somewhat ambiguous, so the determination of converge is intentionally conservative (leaning towards FALSE when questionable). See the documentation for the chosen method for details about code.
Standard errors, if requested, are calculated using the observed Fisher information matrix as discussed in DuMouchel (1999).
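The general idea behind standard errors from the observed Fisher information can be sketched on a much simpler model. The example below fits a one-parameter exponential model to simulated data and inverts the Hessian of the negative log-likelihood at the estimate. It is only an illustration of the technique under these assumptions, not the computation performed inside exploreHypers; the data and the function nll are made up.

```r
set.seed(42)
x <- rexp(200, rate = 2)  # simulated data, for illustration only

# Negative log-likelihood of an exponential model with a single rate.
nll <- function(rate) -sum(dexp(x, rate = rate, log = TRUE))

fit <- stats::nlminb(start = 1, objective = nll, lower = 1e-8)

# Observed Fisher information: the Hessian of the negative
# log-likelihood evaluated at the estimate (computed numerically here).
obs_info <- stats::optimHess(fit$par, nll)

# The inverse of the observed information approximates the covariance
# matrix; standard errors are the square roots of its diagonal.
std_err <- sqrt(diag(solve(obs_info)))

# Closed form for this particular model, for comparison.
closed_form <- fit$par / sqrt(length(x))
print(c(numeric = std_err, closed_form = closed_form))
```

For this toy model the numerical and closed-form values should agree closely, which is a useful sanity check before trusting the same recipe on a likelihood without a closed form.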
Value
A list including the data frame estimates of hyperparameter estimates corresponding to the initial guesses from theta_init (plus convergence results):
code: The convergence code returned by the chosen optimization function (see nlminb, nlm, and optim for details).
converge: A logical indicating whether or not convergence was reached. See "Details" section for more information.
in_bounds: A logical indicating whether or not the estimates were within the bounds of the parameter space (upper bound for \alpha_1, \beta_1, \alpha_2, and \beta_2 was determined by the param_limit argument).
minimum: The negative log-likelihood value corresponding to the estimated optimal value of the hyperparameter.
Also returns the data frame std_errs if standard errors are requested.
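A typical way to use these columns is to keep only the rows that converged and stayed in bounds, then compare the negative log-likelihoods. The sketch below works on a hand-made stand-in for the estimates data frame; the guess_num column and all values are invented for illustration, and only the code, converge, in_bounds, and minimum columns are taken from the description above.

```r
# Hypothetical stand-in for the `estimates` data frame. The guess_num
# column and every value here are invented for illustration.
estimates <- data.frame(
  guess_num = 1:4,
  code      = c(0, 0, 0, 1),
  converge  = c(TRUE, TRUE, TRUE, FALSE),
  in_bounds = c(TRUE, TRUE, FALSE, TRUE),
  minimum   = c(4161.9, 4161.9, 4180.2, 4162.4)
)

# Keep only starting points that converged within the parameter space.
usable <- estimates[estimates$converge & estimates$in_bounds, ]

# Among those, the smallest negative log-likelihood is the best fit.
best <- usable[which.min(usable$minimum), ]
print(best)
```

If several usable rows share nearly the same minimum and similar estimates, that agreement is the kind of consistency this function is meant to reveal.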
Warning
Make sure to properly specify the squashed, zeroes, and N_star arguments for your data set, since these will determine the appropriate likelihood function. Also, this function will not filter out data points. For instance, if you use N_star = 2, you must filter out the ones and zeroes (if present) from data prior to using this function.
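The filtering step can be done with ordinary subsetting before the call. The sketch below assumes the counts live in a column named N, which is an assumption made for this illustration; the data frame counts and its values are hypothetical.

```r
# Hypothetical processed counts. The column name N (observed count) is
# an assumption for this illustration, and the values are made up.
counts <- data.frame(
  N = c(0, 1, 1, 2, 3, 7),
  E = c(0.4, 0.8, 1.3, 1.1, 2.5, 3.0)
)

# With N_star = 2, drop the zeroes and ones before estimation.
N_star <- 2
filtered <- counts[counts$N >= N_star, ]
print(filtered)
```

The filtered data frame, rather than the original, would then be passed as data together with the matching N_star value.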
References
DuMouchel W (1999). "Bayesian Data Mining in Large Frequency Tables, With an Application to the FDA Spontaneous Reporting System." The American Statistician, 53(3), 177-190.
See Also
nlminb, nlm, and optim for optimization details
squashData for data preparation
Other hyperparameter estimation functions: autoHyper(), hyperEM()
Examples
library(openEBGM)
data.table::setDTthreads(2)  # only needed for CRAN checks

# Start with 2 or more guesses (one row per starting point)
theta_init <- data.frame(
  alpha1 = c(0.5, 1),
  beta1  = c(0.5, 1),
  alpha2 = c(2, 3),
  beta2  = c(2, 3),
  p      = c(0.1, 0.2)
)

# Process the raw data, then squash it to reduce the number of data points
data(caers)
proc <- processRaw(caers)
squashed <- squashData(proc, bin_size = 300, keep_pts = 10)
squashed <- squashData(squashed, count = 2, bin_size = 13, keep_pts = 10)

# Run the optimization once per row of theta_init
suppressWarnings(
  exploreHypers(squashed, theta_init = theta_init)
)