UNCOVER.opts {UNCOVER}    R Documentation

Additional argument generator for UNCOVER()

Description

This function is used to specify additional arguments to UNCOVER.

Usage

UNCOVER.opts(
  N = 1000,
  train_frac = 1,
  max_K = Inf,
  min_size = 0,
  reg = 0,
  n_min_class = 0,
  SMC_thres = 30,
  BIC_memo_thres = Inf,
  SMC_memo_thres = Inf,
  ess = N/2,
  n_move = 1,
  prior.override = FALSE,
  rprior = NULL,
  dprior = NULL,
  diagnostics = TRUE,
  RIBIS_thres = 30,
  BIC_cache = cachem::cache_mem(max_size = 1024 * 1024^2, evict = "lru"),
  SMC_cache = cachem::cache_mem(max_size = 1024 * 1024^2, evict = "lru"),
  ...
)

Arguments

N

Number of particles for the SMC sampler. Defaults to 1000.

train_frac

Fraction of the data to be used for training. Should only be specified directly if deforest_criterion == "Validation". Defaults to 1.

max_K

The maximum number of clusters allowed in the final output. Should only be specified directly if deforest_criterion == "NoC". Defaults to Inf.

min_size

The minimum number of observations allowed in any cluster of the final model. Should only be specified directly if deforest_criterion == "SoC". Defaults to 0.

reg

Natural logarithm of the tolerance parameter. Must be non-negative. Should only be specified directly if deforest_criterion == "MaxReg". Defaults to 0.

n_min_class

Each cluster has an associated minority class. n_min_class specifies the minimum number of observations of that minority class required in every cluster. Should only be specified directly if deforest_criterion == "Diverse". Defaults to 0.

SMC_thres

The number of observations a cluster must exceed before BIC is considered as an estimator of the log Bayesian evidence in place of the SMC sampler. Defaults to 30.

BIC_memo_thres

Only used when estimating the log Bayesian evidence of a cluster using BIC. When the number of observations exceeds BIC_memo_thres, the function checks for similar inputs evaluated previously. See Details. Defaults to Inf (never check).

SMC_memo_thres

Only used when estimating the log Bayesian evidence of a cluster using SMC. When the number of observations exceeds SMC_memo_thres, the function checks for similar inputs evaluated previously. See Details. Defaults to Inf (never check).

ess

Effective sample size (ESS) threshold. If the ESS of the particles falls below this value, a resample-move step is triggered. Defaults to N/2.

n_move

Number of Metropolis-Hastings steps applied each time a resample-move step is triggered. Defaults to 1.

prior.override

Logical; should the default multivariate normal form of the prior be overridden by the prior given in rprior and dprior? Defaults to FALSE.

rprior

Function that draws samples from the user-specified prior. Only required if prior.override = TRUE.

dprior

Function that evaluates the density of the user-specified prior at supplied samples. Only required if prior.override = TRUE.

diagnostics

Logical; should diagnostic data be recorded and returned? Defaults to TRUE.

RIBIS_thres

The number of observations a cluster must exceed before RIBIS is ever considered as an estimator. Defaults to 30. See Details.

BIC_cache

The cache for the function which estimates the log Bayesian evidence using BIC. Defaults to an in-memory cache (cachem::cache_mem()) with a maximum size of 1 GiB (1024 * 1024^2 bytes) and a least-recently-used (LRU) eviction policy.

SMC_cache

The cache for the function which estimates the log Bayesian evidence using SMC. Defaults to an in-memory cache (cachem::cache_mem()) with a maximum size of 1 GiB (1024 * 1024^2 bytes) and a least-recently-used (LRU) eviction policy.

...

Additional arguments required to fully specify rprior and dprior when the default prior form is overridden.
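The ess and n_move arguments above govern the resample-move steps of the SMC sampler. As an illustration of the ess rule only, the sketch below computes the standard ESS estimate, 1 / sum(w^2) for normalised weights w, from particle log-weights and checks it against the default threshold of N/2. It is a generic sketch of the textbook rule, not UNCOVER's internal code, and the helper name needs_resample_move is hypothetical.

```r
# Hypothetical helper (not part of UNCOVER): decide whether a resample-move
# step would be triggered under the rule described for the `ess` argument.
needs_resample_move <- function(log_w, ess_thres = length(log_w) / 2) {
  # Normalise the weights on the log scale for numerical stability
  w <- exp(log_w - max(log_w))
  w <- w / sum(w)
  # Standard ESS estimate: 1 / sum of squared normalised weights
  ess <- 1 / sum(w^2)
  list(ess = ess, trigger = ess < ess_thres)
}

N <- 1000
# Equal weights give ESS = N, so no resample-move step is needed
equal <- needs_resample_move(rep(0, N))

# Highly uneven weights collapse the ESS far below N / 2
set.seed(1)
uneven <- needs_resample_move(stats::rnorm(N, sd = 5))
```

When the step is triggered, the particles are resampled and then n_move Metropolis-Hastings steps are applied; that move step depends on the model and is not sketched here.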

Details

This function should only be used to provide additional control arguments to UNCOVER. Arguments specific to one deforestation criterion should be left at their defaults when a different deforestation criterion is used.

BIC refers to the Bayesian Information Criterion. Using BIC to estimate the log Bayesian evidence is only justified when the number of observations is large. When choosing SMC_thres, this accuracy should be balanced against computational expense, as the function which relies on BIC values is much faster than the SMC sampler.
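For intuition, the sketch below shows the standard BIC approximation to the log Bayesian evidence of a single cluster: log p(y) is approximately log L(theta_hat) - (k/2) log(n) = -BIC/2, where theta_hat is the maximum likelihood estimate, k the number of parameters and n the number of observations. It assumes a logistic regression likelihood for a binary response, as in the Examples, and uses stats::glm() and stats::BIC(); UNCOVER's own BIC-based function may differ in detail.

```r
# Minimal sketch (assumption: logistic regression likelihood for one cluster)
set.seed(42)
n <- 500
X <- matrix(stats::rnorm(2 * n), n, 2)
y <- stats::rbinom(n, 1, stats::plogis(0.5 + X[, 1] - X[, 2]))

# Maximum likelihood fit; BIC = -2 * logLik + k * log(n)
fit <- stats::glm(y ~ X, family = stats::binomial())

# BIC approximation to the log Bayesian evidence
log_evidence_bic <- -stats::BIC(fit) / 2

# The same quantity from its definition: k coefficients, n observations
k <- length(stats::coef(fit))
log_evidence_manual <- as.numeric(stats::logLik(fit)) - (k / 2) * log(n)

# An intercept-only model for comparison: the covariates should be favoured
fit0 <- stats::glm(y ~ 1, family = stats::binomial())
log_evidence_null <- -stats::BIC(fit0) / 2
```

The approximation drops the prior and all terms that do not grow with n, which is why it is only trusted for clusters larger than SMC_thres.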

To improve computational time, the SMC sampler and the function which uses BIC values are memoised, with the cache for each memoised function specified by SMC_cache and BIC_cache respectively. See memoise::memoise() for more details. If a function input does not exactly match a previously evaluated input, we may wish to search the cache for similar inputs which could provide a reasonable starting point. Checking the cache, however, takes time, so the user can specify the cluster size above which this check is deemed worthwhile. The threshold that optimises run time is problem specific; however, for BIC_memo_thres it is almost always beneficial never to check the cache (the exception being when clusters are extremely large, for example containing a million observations). SMC_memo_thres can be much lower, as the SMC sampler is far more expensive to run. See Emerson and Aslett (2023) for more details.
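The sketch below illustrates exact-match memoisation with an explicitly constructed cachem cache, built the same way as the BIC_cache and SMC_cache defaults. The function slow_log_evidence is a hypothetical stand-in for an expensive estimator; UNCOVER's additional search for similar (non-identical) inputs above the memo thresholds is internal and is not reproduced here. The memoise and cachem packages must be installed.

```r
# Hypothetical stand-in for an expensive log-evidence estimator; the counter
# records how many times the function body genuinely runs.
n_evaluations <- 0
slow_log_evidence <- function(obs_ids) {
  n_evaluations <<- n_evaluations + 1
  -length(obs_ids) * log(2)  # placeholder for an expensive computation
}

# Same construction as the BIC_cache / SMC_cache defaults:
# an in-memory cache of at most 1 GiB with least-recently-used eviction
cache <- cachem::cache_mem(max_size = 1024 * 1024^2, evict = "lru")
memo_log_evidence <- memoise::memoise(slow_log_evidence, cache = cache)

first <- memo_log_evidence(1:50)   # evaluated, then stored in the cache
second <- memo_log_evidence(1:50)  # exact match: read from the cache
third <- memo_log_evidence(1:51)   # new input: evaluated again
```

A cache built this way could be supplied as, for example, UNCOVER.opts(SMC_cache = cache), which is useful when a smaller or larger memory budget than the 1 GiB default is wanted.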

RIBIS_thres can be set to a higher value to help ensure that the asymptotic properties on which Reverse Iterated Batch Importance Sampling (RIBIS) relies hold. See Emerson and Aslett (2023) for more details.

Specifying rprior and dprior will not override the default prior form unless prior.override = TRUE. If the default multivariate normal prior is wanted, its arguments should instead be specified in UNCOVER.
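To illustrate prior.override with a non-normal prior, the sketch below defines an independent, scaled Student-t prior, which has heavier tails than the default. The names rmvt_ind and dmvt_ind and their arguments df and scale are hypothetical, and the interface (rprior(n, ...) returning an n-row matrix of samples, dprior(x, ...) returning one density per row of x) is an assumption inferred from the uniform prior in the Examples.

```r
# Hypothetical rprior: n x p matrix, column j drawn from a t distribution
# with df degrees of freedom scaled by scale[j]
rmvt_ind <- function(n, df, scale) {
  p <- length(scale)
  draws <- matrix(stats::rt(n * p, df = df), nrow = n, ncol = p)
  # rep(scale, each = n) matches the column-major layout of the matrix
  draws * rep(scale, each = n)
}

# Hypothetical dprior: product over coordinates of dt(x / s, df) / s
dmvt_ind <- function(x, df, scale) {
  x <- as.matrix(x)
  z <- sweep(x, 2, scale, "/")
  dens <- matrix(stats::dt(z, df = df), nrow = nrow(x)) /
    rep(scale, each = nrow(x))
  apply(dens, 1, prod)
}

set.seed(1)
prior_scale <- c(1, 2, 2.5)
samples <- rmvt_ind(10, df = 3, scale = prior_scale)
densities <- dmvt_ind(samples, df = 3, scale = prior_scale)
```

Such a pair could then be supplied as UNCOVER.opts(prior.override = TRUE, rprior = rmvt_ind, dprior = dmvt_ind, df = 3, scale = rep(2.5, 3)), with df and scale passed through ... and returned in MoreArgs.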

Value

A list consisting of:

N

Number of particles for the SMC sampler

train_frac

Training data fraction

max_K

Maximum number of clusters allowed

min_size

Minimum size of clusters allowed

reg

Log of the maximum regret tolerance parameter

n_min_class

Minimum size of cluster minority class allowed

SMC_thres

Threshold for when estimation with BIC is attempted

BIC_memo_thres

Threshold above which previous inputs to the BIC function are searched for similar inputs

SMC_memo_thres

Threshold above which previous inputs to the SMC function are searched for similar inputs

ess

Effective Sample Size Threshold

n_move

Number of Metropolis-Hastings steps

rprior

Function that draws samples from the user-specified prior. NULL if prior.override == FALSE.

dprior

Function that evaluates the density of the user-specified prior at supplied samples. NULL if prior.override == FALSE.

prior.override

Logical value indicating if the prior has been overridden or not

diagnostics

Logical value indicating whether diagnostic information should be included in the output of UNCOVER

RIBIS_thres

The threshold for allowing the use of RIBIS

BIC_cache

Cache for the memoised function which estimates the log Bayesian evidence using BIC

SMC_cache

Cache for the memoised function which estimates the log Bayesian evidence using SMC

MoreArgs

A list of the additional arguments required for rprior and dprior. NULL if prior.override==FALSE.

References

See Also

UNCOVER()

Examples


# Specifying a multivariate independent uniform prior

# Draw n samples; column j is sampled from U(a[j], b[j])
rmviu <- function(n, a, b) {
  return(mapply(FUN = function(min.vec, max.vec, pn) {
    stats::runif(pn, min.vec, max.vec)
  }, min.vec = a, max.vec = b, MoreArgs = list(pn = n)))
}
# Density of each row of x under the independent uniform prior
dmviu <- function(x, a, b) {
  for (ii in seq_len(ncol(x))) {
    x[, ii] <- stats::dunif(x[, ii], a[ii], b[ii])
  }
  return(apply(x, 1, prod))
}

UNCOVER.opts(prior.override = TRUE, rprior = rmviu,
             dprior = dmviu, a = rep(0, 3), b = rep(1, 3))


# If we generate a covariate matrix and binary response vector
CM <- matrix(rnorm(200),100,2)
rv <- sample(0:1,100,replace=TRUE)

# We can then compare the run time of the default settings against an SMC
# threshold of 50, and against an SMC threshold of 50 combined with an SMC
# cache checking threshold of 25
system.time(UNCOVER(X = CM,y = rv,verbose = FALSE))
system.time(UNCOVER(X = CM,y = rv,
                    options = UNCOVER.opts(SMC_thres = 50),
                    verbose = FALSE))
system.time(UNCOVER(X = CM,y = rv,
                    options = UNCOVER.opts(SMC_thres = 50,
                                           SMC_memo_thres = 25),
                    verbose = FALSE))



[Package UNCOVER version 1.1.0 Index]