infer_logLs {Infusion}

R Documentation

Infer log Likelihoods using simulated distributions of summary statistics

Description

For each simulated distribution of summary statistics, infer_logLs infers a probability density function and deduces from it the density of the observed values of the summary statistics. By default, inference of each density is performed by infer_logL_by_Rmixmod, which fits a distribution of summary statistics using procedures from the Rmixmod package.

Usage

infer_logLs(object, stat.obs, 
            logLname = Infusion.getOption("logLname"), 
            verbose = list(most=interactive(), 
                           final=FALSE), 
            method = Infusion.getOption("mixturing"),
            nb_cores = NULL, packages = NULL, cluster_args,
            ...)
infer_tailp(object, refDensity, stat.obs,
                tailNames=Infusion.getOption("tailNames"),
                verbose=interactive(), method=NULL, cluster_args, ...)
infer_logL_by_GLMM(EDF,stat.obs,logLname,verbose)
infer_logL_by_Rmixmod(EDF,stat.obs,logLname,verbose)
infer_logL_by_mclust(EDF,stat.obs,logLname,verbose)
infer_logL_by_Hlscv.diag(EDF,stat.obs,logLname,verbose)

Arguments

`object`	A list of simulated distributions (the return object of `add_simulation`)
`EDF`	An empirical distribution, with a required `par` attribute (an element of the `object` list).
`stat.obs`	Named numeric vector of observed values of summary statistics.
`logLname`	The name to be given to the log Likelihood in the return object, or the root of the latter name in case of conflict with other names in this object.
`tailNames`	Names of “positives” and “negatives” in the binomial response for the inference of tail probabilities.
`refDensity`	An object representing a reference density (such as an `HLfit` fit object or other objects with a similar `predict` method) which, together with the density inferred from each empirical density, defines a likelihood ratio used to define a rejection region.
`verbose`	A list as shown by the default, or simply a vector of booleans, indicating respectively whether to display (1) some information about progress; (2) a final summary of the results after all elements of `object` have been processed. If a count of 'outlier'(s) is reported, this typically means that `stat.obs` is not within the envelope of a simulated distribution (or whatever other meaning the user attaches to a `FALSE` `isValid` code: see Details).
`method`	A function for density estimation. See Description for the default behaviour and Details for the constraints on input and output of the function.
`nb_cores`	Number of cores for parallel computation. The default is `spaMM.getOption("nb_cores")`, and 1 if the latter is NULL. `nb_cores=1` prevents the use of parallelisation procedures.
`cluster_args`	A list of arguments, passed to `makeCluster`. May contain a non-null `spec` element, in which case the distinct `nb_cores` argument is ignored.
`packages`	For parallel evaluation: Names of additional libraries to be loaded on the cores, necessary for evaluation of a user-defined 'method'.
`...`	further arguments passed to or from other methods (currently not used).

Details

By default, density estimation is based on Rmixmod methods. Other available methods are not routinely used, and not all Infusion features may work with them. The function Rmixmod::mixmodCluster is called, with arguments nbCluster=seq_nbCluster(nr=nrow(data)) and mixmodGaussianModel=Infusion.getOption("mixmodGaussianModel"). If Infusion.getOption("seq_nbCluster") specifies a sequence of values, then several clusterings are computed and AIC is used to select among them.

infer_logL_by_GLMM, infer_logL_by_Rmixmod, infer_logL_by_mclust, and infer_logL_by_Hlscv.diag are examples of functions that may be provided as the method argument for density estimation. Other functions may be provided, as long as they take the same arguments. Their return value must include the element logL, an estimate of the log-density of stat.obs, and the element isValid with values FALSE/TRUE (or 0/1). The standard format for the return value is unlist(c(attr(EDF,"par"),logL,isValid=isValid)).
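As an illustration of these constraints, here is a minimal sketch of a user-defined method. It is not one of the package's estimators: the function name infer_logL_by_gaussian is hypothetical, and it fits a single multivariate Gaussian by moments using base R only. It assumes that EDF is a matrix (or data frame) whose columns include the names of stat.obs, and that attr(EDF,"par") is a named vector of parameter values. The floor value used for invalid estimates is arbitrary.

```r
# Hypothetical user-defined method (not part of Infusion): a single
# multivariate Gaussian fitted by moments to the simulated statistics.
infer_logL_by_gaussian <- function(EDF, stat.obs, logLname, verbose) {
  # Assumption: the columns of EDF include the names of stat.obs.
  stats_mat <- as.matrix(EDF[, names(stat.obs), drop = FALSE])
  mu <- colMeans(stats_mat)
  Sigma <- stats::cov(stats_mat)
  d <- length(mu)
  dev <- stat.obs - mu
  # Upper-triangular R with Sigma = t(R) %*% R; NULL if Sigma is not
  # positive definite.
  cholS <- tryCatch(chol(Sigma), error = function(e) NULL)
  if (is.null(cholS)) {
    logL <- -Inf
  } else {
    z <- backsolve(cholS, dev, transpose = TRUE)  # solves t(R) z = dev
    logL <- -0.5 * sum(z^2) - sum(log(diag(cholS))) - 0.5 * d * log(2 * pi)
  }
  isValid <- is.finite(logL)
  # Replace an unusable estimate by a low value (arbitrary, for illustration).
  if (!isValid) logL <- -1e6
  if (isTRUE(verbose[1])) message("logL = ", signif(logL, 6))
  names(logL) <- logLname
  # Standard return format described above.
  unlist(c(attr(EDF, "par"), logL, isValid = isValid))
}

# Toy input mimicking one element of 'object' (assumed structure).
set.seed(1)
EDF <- matrix(rnorm(400), ncol = 2, dimnames = list(NULL, c("mean", "var")))
attr(EDF, "par") <- c(mu = 0, s2 = 1)
res <- infer_logL_by_gaussian(EDF, c(mean = 0.1, var = -0.2), "logL", FALSE)
```

Such a function would then be supplied through the method argument of infer_logLs, with any package it needs listed in packages when evaluation is parallel.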

isValid is primarily intended to indicate whether the log likelihood of stat.obs inferred by a given density estimation method was suitable input for inference of the likelihood surface. isValid has two effects: to distinguish points for which isValid is FALSE in the plot produced by plot.SLik; and more critically, to control the sampling of new parameter points within refine so that points for which isValid is FALSE are less likely to be sampled.

Invalid values may for example indicate a likelihood estimated as zero (since log(0) is not suitable input), or (for density estimation methods which may infer erroneously large values when extrapolating) that stat.obs is not within the convex hull of the EDF. In user-defined methods, an invalid inferred logL should be replaced by some alternative low estimate, as all methods included in the package do.

The source code of infer_logL_by_Hlscv.diag illustrates how to test whether stat.obs is within the convex hull of the EDF, using functions resetCHull and isPointInCHull (exported from the blackbox package).
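For readers who only want to see the idea of the convex hull test, the following base R sketch handles the two-dimensional case. It does not use resetCHull or isPointInCHull, whose arguments are not described here; instead it relies on grDevices::chull and a sign test on cross products. The helper name is_in_hull_2d is hypothetical, and the sketch assumes exactly two summary statistics and at least three non-collinear simulated points.

```r
# Hypothetical helper (not part of Infusion or blackbox): TRUE if the 2D
# point 'p' lies inside or on the convex hull of the rows of 'pts'.
is_in_hull_2d <- function(pts, p) {
  pts <- as.matrix(pts)
  stopifnot(ncol(pts) == 2L)
  # Hull vertices, in the order returned by chull().
  hull <- pts[grDevices::chull(pts), , drop = FALSE]
  stopifnot(nrow(hull) >= 3L)
  # Next vertex along the hull for each vertex (wrapping around).
  nxt <- hull[c(seq_len(nrow(hull))[-1L], 1L), , drop = FALSE]
  # Cross product of each edge with the vector from its start to 'p';
  # 'p' is inside (or on the boundary) iff all have the same sign.
  cross <- (nxt[, 1] - hull[, 1]) * (p[2] - hull[, 2]) -
           (nxt[, 2] - hull[, 2]) * (p[1] - hull[, 1])
  all(cross >= 0) || all(cross <= 0)
}

set.seed(1)
EDF <- cbind(mean = rnorm(200), var = rnorm(200))
inside  <- is_in_hull_2d(EDF, c(mean = 0, var = 0))
outside <- is_in_hull_2d(EDF, c(mean = 10, var = 10))
```

In a user-defined method, the result of such a test could be used to set isValid; higher dimensions require a different hull test.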

infer_logL_by_Rmixmod calls Rmixmod::mixmodCluster, infer_logL_by_mclust calls mclust::densityMclust, infer_logL_by_Hlscv.diag calls ks::kde, and infer_logL_by_GLMM fits a binned distribution of summary statistics using a Poisson GLMM with autocorrelated random effects, where the binning is based on a tessellation of a volume containing the whole simulated distribution. Limited experiments so far suggest that the mixture model methods are fast and appropriate (Rmixmod, being a bit faster, is the default method); that the kernel smoothing method is more erratic and moreover requires additional input from the user, hence is not really applicable for distributions in dimension d=4 or above; and that the GLMM method is a very good density estimator for d=2 but will challenge one's patience for d=3 and further challenge the computer's memory for d=4.

Value

For infer_logLs, a data frame containing parameter values and their log likelihoods, together with additional information such as attributes recording the parameter names and statistics names (not detailed here). These attributes are essential for further inferences.