UNCOVER.opts {UNCOVER} | R Documentation |
Additional argument generator for UNCOVER()
Description
This function is used to specify additional arguments to UNCOVER.
Usage
UNCOVER.opts(
N = 1000,
train_frac = 1,
max_K = Inf,
min_size = 0,
reg = 0,
n_min_class = 0,
SMC_thres = 30,
BIC_memo_thres = Inf,
SMC_memo_thres = Inf,
ess = N/2,
n_move = 1,
prior.override = FALSE,
rprior = NULL,
dprior = NULL,
diagnostics = TRUE,
RIBIS_thres = 30,
BIC_cache = cachem::cache_mem(max_size = 1024 * 1024^2, evict = "lru"),
SMC_cache = cachem::cache_mem(max_size = 1024 * 1024^2, evict = "lru"),
...
)
Arguments
N |
Number of particles for the SMC sampler. Defaults to 1000. |
train_frac |
What fraction of the data should be used for training.
Should only be directly specified if |
max_K |
The maximum number of clusters allowed in the final output.
Should only be directly specified if |
min_size |
The minimum number of observations allowed for any cluster
in the final model. Should only be directly specified if
|
reg |
Numerical natural logarithm of the tolerance parameter. Must be
positive. Should only be directly specified if
|
n_min_class |
Each cluster will have an associated minority class.
|
SMC_thres |
The number of observations a cluster must exceed before BIC is considered as an estimator. Defaults to 30 if not specified. |
BIC_memo_thres |
Only used when estimating the log Bayesian evidence of
a cluster using BIC. When the number of observations exceeds |
SMC_memo_thres |
Only used when estimating the log Bayesian evidence of
a cluster using SMC. When the number of observations exceeds |
ess |
Effective Sample Size Threshold: If the effective sample size of
the particles falls below this value then a resample move step is
triggered. Defaults to N/2. |
n_move |
Number of Metropolis-Hastings steps to apply each time a resample move step is triggered. Defaults to 1. |
prior.override |
Are you overriding the default multivariate normal
form of the prior? Defaults to FALSE. |
rprior |
Function which produces samples from your prior if the default prior form is to be overridden. If using the default prior form this does not need to be specified. |
dprior |
Function which evaluates the density of your specified prior for inputted samples if the default prior form is to be overridden. If using the default prior form this does not need to be specified. |
diagnostics |
Should diagnostic data be recorded and outputted?
Defaults to TRUE. |
RIBIS_thres |
The number of observations a cluster must exceed before RIBIS is ever considered as an estimator. Defaults to 30 if not specified. See details. |
BIC_cache |
The cache for the function which estimates the log Bayesian evidence using BIC. Defaults to a cache with standard size and least recently used eviction policy. |
SMC_cache |
The cache for the function which estimates the log Bayesian evidence using SMC. Defaults to a cache with standard size and least recently used eviction policy. |
... |
Additional arguments required for complete specification of the two prior functions given, if the default prior form is to be overridden. |
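The ess and n_move arguments above govern when a resample move step is triggered. As a rough illustration only, the sketch below computes the commonly used effective sample size estimate, 1 / sum(w^2) for normalised weights w, and compares it with the default threshold N/2. The function name ess_from_logw and the simulated log-weights are hypothetical, and the exact estimator used inside UNCOVER may differ.

```r
# Hypothetical helper: effective sample size from unnormalised log-weights
ess_from_logw <- function(logw) {
  w <- exp(logw - max(logw))  # subtract the maximum for numerical stability
  w <- w / sum(w)             # normalise the weights so they sum to 1
  1 / sum(w^2)                # standard ESS estimate
}

N <- 1000                     # number of particles (the default for N)
set.seed(1)
logw <- stats::rnorm(N)       # made-up log-weights, for illustration only

ess_val <- ess_from_logw(logw)
resample_needed <- ess_val < N / 2  # compare against the default ess = N/2
```

With equal weights the estimate equals the number of particles, and it shrinks towards 1 as the weight concentrates on a single particle.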
Details
This function should only be used to provide additional control
arguments to UNCOVER. Arguments that are specific to a particular
deforestation criterion should not be altered from their defaults when a
different deforestation criterion is used.
BIC refers to the Bayesian Information Criterion. The use of BIC when
estimating the log Bayesian evidence is valid assuming the number of
observations is large, and if specifying SMC_thres this should be balanced
with computational expense (as the function which relies
on BIC values is much faster than the SMC sampler).
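To make the BIC approximation concrete, the sketch below fits a logistic regression to simulated data and approximates the log Bayesian evidence by -BIC/2, which is the standard large-sample relationship. The data, the use of stats::glm() and the variable names are assumptions made for illustration; this is not necessarily how UNCOVER computes the estimate internally.

```r
# Simulated binary-response data (illustrative only)
set.seed(42)
n <- 200
X <- matrix(stats::rnorm(n * 2), n, 2)
y <- stats::rbinom(n, 1, stats::plogis(as.vector(X %*% c(1, -1))))

# Fit a logistic regression with an intercept and two covariates
fit <- stats::glm(y ~ X, family = stats::binomial())

# Large-sample approximation: log evidence is roughly -BIC / 2
log_evidence_bic <- -stats::BIC(fit) / 2

# The same quantity written out: logLik - (k / 2) * log(n)
k <- length(stats::coef(fit))
log_evidence_manual <- as.numeric(stats::logLik(fit)) - (k / 2) * log(n)
```

The approximation only needs a single model fit, which is why it is much cheaper than running a sampler, but its accuracy depends on the number of observations being large.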
In an attempt to improve computational time, the SMC sampler and the
function which uses BIC values are both memoised, with the cache for each of
these memoised functions being specified by SMC_cache and BIC_cache
respectively. See memoise::memoise() for more details. If we do
not get an exact match between the function input and a previously evaluated
input, we may wish to search the cache for similar inputs which could
provide a reasonable starting point. Checking the cache takes time, however,
and so we allow the user to specify the cluster size above which they deem it
worthwhile to check. Which threshold value optimises run time is
problem specific; however, for BIC_memo_thres it is almost always
beneficial to never check the cache (the exception being when the
cluster sizes are extremely large, for example containing a million
observations). SMC_memo_thres can be much lower as the SMC sampler is a
much more expensive function to run. See Emerson and Aslett (2023) for more
details.
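The memoisation described above can be sketched with memoise::memoise() and a cachem::cache_mem() cache built with the same settings as the defaults shown in Usage. The function slow_log_evidence below is a hypothetical stand-in for an expensive estimator, not a function from UNCOVER, and the sketch only demonstrates exact-match caching, not the search for similar inputs.

```r
# Counter recording how often the expensive function is really evaluated
calls <- 0

# Hypothetical stand-in for an expensive log-evidence estimator
slow_log_evidence <- function(n_obs) {
  calls <<- calls + 1
  -n_obs * log(2)
}

# In-memory cache with the same settings as the BIC_cache/SMC_cache defaults
cache <- cachem::cache_mem(max_size = 1024 * 1024^2, evict = "lru")

# Memoised version: repeated identical inputs are served from the cache
fast_log_evidence <- memoise::memoise(slow_log_evidence, cache = cache)

first  <- fast_log_evidence(50)  # evaluated, then stored in the cache
second <- fast_log_evidence(50)  # exact match, returned from the cache
```

After both calls the counter shows a single real evaluation. Supplying a smaller max_size would bound memory use at the cost of more evictions.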
RIBIS_thres can be specified to have a higher value to ensure that the
asymptotic properties which Reverse Iterated Batch Importance Sampling
(RIBIS) relies upon hold. See Emerson and Aslett (2023) for more details.
Specifying rprior and dprior will not override the default prior form
unless prior.override=TRUE. If a multivariate normal form is required then
the arguments for this prior should be specified in UNCOVER.
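Before overriding the prior it can help to check that the two functions agree with each other. The sketch below assumes the calling convention used in the Examples section: the sampler takes the number of draws as its first argument and returns one draw per row, and the density takes that matrix and returns one value per row. The independent Student-t prior and the names rmvit, dmvit, df and d are hypothetical choices for illustration.

```r
# Hypothetical sampler: n draws from d independent Student-t distributions
rmvit <- function(n, df, d) {
  matrix(stats::rt(n * d, df = df), nrow = n, ncol = d)
}

# Hypothetical density: product of the marginal t densities for each row
dmvit <- function(x, df, d) {
  x <- matrix(x, ncol = d)  # also handles a single draw given as a vector
  dens <- matrix(stats::dt(as.vector(x), df = df), ncol = d)
  apply(dens, 1, prod)
}

set.seed(1)
draws <- rmvit(5, df = 4, d = 3)    # 5 x 3 matrix of prior samples
dens  <- dmvit(draws, df = 4, d = 3)  # one density value per sample

# With the UNCOVER package loaded, the pair could then be supplied as:
# UNCOVER.opts(prior.override = TRUE, rprior = rmvit, dprior = dmvit,
#              df = 4, d = 3)
```

The extra arguments df and d are passed through ..., and neither function takes effect unless prior.override = TRUE is also set.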
Value
A list consisting of:
N
Number of particles for the SMC sampler
train_frac
Training data fraction
max_K
Maximum number of clusters allowed
min_size
Minimum size of clusters allowed
reg
Log of the maximum regret tolerance parameter
n_min_class
Minimum size of cluster minority class allowed
SMC_thres
Threshold for when estimation with BIC is attempted
BIC_memo_thres
Threshold for when we review previous inputs of the BIC function for similarities
SMC_memo_thres
Threshold for when we review previous inputs of the SMC function for similarities
ess
Effective Sample Size Threshold
n_move
Number of Metropolis-Hastings steps
rprior
Function which produces samples from your prior. NULL if prior.override==FALSE.
dprior
Function which produces your specified prior's density for inputted samples. NULL if prior.override==FALSE.
prior.override
Logical value indicating if the prior has been overridden or not
diagnostics
Logical value indicating whether diagnostic information should be included in the output of UNCOVER
RIBIS_thres
The threshold for allowing the use of RIBIS
BIC_cache
Cache for the memoised function which estimates the log Bayesian evidence using BIC
SMC_cache
Cache for the memoised function which estimates the log Bayesian evidence using SMC
MoreArgs
A list of the additional arguments required for rprior and dprior. NULL if prior.override==FALSE.
References
Emerson, S.R. and Aslett, L.J.M. (2023). Joint cohort and prediction modelling through graphical structure analysis (to be released)
See Also
Examples
# Specifying a multivariate independent uniform prior
# rmviu() draws n samples; column j is uniform on [a[j], b[j]]
rmviu <- function(n, a, b){
  return(mapply(FUN = function(min.vec, max.vec, pn){
                  stats::runif(pn, min.vec, max.vec)
                },
                min.vec = a, max.vec = b, MoreArgs = list(pn = n)))
}
# dmviu() evaluates the joint density for each row of the matrix x
dmviu <- function(x, a, b){
  for(ii in seq_len(ncol(x))){
    x[,ii] <- stats::dunif(x[,ii], a[ii], b[ii])
  }
  return(apply(x, 1, prod))
}
UNCOVER.opts(prior.override = TRUE, rprior = rmviu,
             dprior = dmviu, a = rep(0,3), b = rep(1,3))
# If we generate a covariate matrix and binary response vector
CM <- matrix(rnorm(200),100,2)
rv <- sample(0:1,100,replace=TRUE)
# We can then run our algorithm with an SMC threshold of 50 and an SMC cache
# checking threshold of 25 to see if this is quicker than the standard
# version
system.time(UNCOVER(X = CM,y = rv,verbose = FALSE))
system.time(UNCOVER(X = CM,y = rv,
                    options = UNCOVER.opts(SMC_thres = 50),
                    verbose = FALSE))
system.time(UNCOVER(X = CM,y = rv,
                    options = UNCOVER.opts(SMC_thres = 50,
                                           SMC_memo_thres = 25),
                    verbose = FALSE))