forde {arf}  R Documentation 
Forests for Density Estimation
Description
Uses a pretrained ARF model to estimate leaf and distribution parameters.
Usage
forde(
arf,
x,
oob = FALSE,
family = "truncnorm",
finite_bounds = FALSE,
alpha = 0,
epsilon = 0,
parallel = TRUE
)
Arguments
arf 
Pretrained adversarial random forest. In principle, any object of class ranger.
x 
Training data for estimating parameters. 
oob 
Only use out-of-bag samples for parameter estimation? If TRUE, x must be the data used to fit arf.
family 
Distribution to use for density estimation of continuous
features. Current options include truncated normal (the default)
and uniform.
finite_bounds 
Impose finite bounds on all continuous variables? 
alpha 
Optional pseudocount for Laplace smoothing of categorical features. This avoids zero-mass points when test data fall outside the support of training data. Effectively parametrizes a flat Dirichlet prior on multinomial likelihoods. 
epsilon 
Optional slack parameter on empirical bounds when

parallel 
Compute in parallel? Must register backend beforehand, e.g.
via 
Details
forde extracts leaf parameters from a pretrained forest and learns
distribution parameters for data within each leaf. The former includes
coverage (proportion of data falling into the leaf) and split criteria. The
latter includes proportions for categorical features and mean/variance for
continuous features. The result is a probabilistic circuit, stored as a
data.table, which can be used for various downstream inference tasks.
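The quantities named in the paragraph above can be illustrated in base R. This is a minimal sketch, not the package's implementation: the two-leaf partition is hand-made (a hypothetical split on Petal.Length) and stands in for the leaves of a single tree.

```r
# Hypothetical two-leaf partition of iris, standing in for one tree's leaves
leaf <- ifelse(iris$Petal.Length < 2.5, 1L, 2L)

# Coverage: proportion of rows falling into each leaf
coverage <- as.numeric(table(leaf)) / nrow(iris)

# Continuous feature: mean and standard deviation within each leaf
mu    <- tapply(iris$Sepal.Length, leaf, mean)
sigma <- tapply(iris$Sepal.Length, leaf, sd)

# Categorical feature: class proportions within each leaf (rows sum to 1)
props <- prop.table(table(leaf, iris$Species), margin = 1)
```

A real forest repeats this per tree and per leaf, with leaves defined by the learned split criteria rather than a single threshold.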
Currently, forde only provides support for a limited number of
distributional families: truncated normal or uniform for continuous data,
and multinomial for discrete data. Future releases will accommodate a larger
set of options.
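For readers unfamiliar with these families, the densities can be written out in base R. The sketch below is illustrative only: dtruncnorm_sketch and smoothed_props are hypothetical helper names, and the pseudocount formula is standard additive smoothing, which is assumed (not confirmed by this page) to match what the alpha argument does.

```r
# Truncated normal density on [lower, upper]; zero outside the bounds
dtruncnorm_sketch <- function(x, mu, sigma, lower = -Inf, upper = Inf) {
  # Probability mass of the untruncated normal inside the bounds
  mass <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)
  dens <- dnorm(x, mu, sigma) / mass
  ifelse(x >= lower & x <= upper, dens, 0)
}

# The uniform density on [lower, upper] is available in base R as dunif()

# Multinomial proportions with an optional pseudocount alpha
smoothed_props <- function(counts, alpha = 0) {
  (counts + alpha) / (sum(counts) + alpha * length(counts))
}

counts <- c(a = 8, b = 2, c = 0)
p0 <- smoothed_props(counts)             # level "c" has zero mass
p1 <- smoothed_props(counts, alpha = 1)  # every level has positive mass
```

With alpha = 0 an unseen level receives probability zero, so any test point carrying that level has zero likelihood; a positive alpha avoids this at the cost of slightly shrinking the observed proportions.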
Though forde was designed to take an adversarial random forest as
input, the function's first argument can in principle be any object of class
ranger. This allows users to test performance with alternative
pipelines (e.g., with supervised forest input). There is also no requirement
that x be the data used to fit arf, unless oob = TRUE.
In fact, using another dataset here may protect against overfitting. This
connects with Wager & Athey's (2018) notion of "honest trees".
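One way to follow the suggestion above is to fit the forest on one half of the data and estimate parameters on the other. This is a sketch under stated assumptions: the arf package is installed, the 50/50 split and the seed are arbitrary choices, and the parallel and verbose arguments passed to adversarial_rf are assumed to be available in the installed version.

```r
library(arf)  # assumes the arf package is installed

set.seed(1)
# Arbitrary 50/50 split: one half to grow the forest, one to estimate parameters
idx      <- sample(nrow(iris), size = nrow(iris) / 2)
fit_half <- iris[idx, ]
est_half <- iris[-idx, ]

# Structure (splits) is learned from fit_half only
arf <- adversarial_rf(fit_half, parallel = FALSE, verbose = FALSE)

# Leaf and distribution parameters are estimated from est_half only;
# oob must stay FALSE because est_half was not used to fit arf
psi <- forde(arf, est_half, oob = FALSE, parallel = FALSE)
```

Setting parallel = FALSE sidesteps the need to register a backend for a small example like this.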
Value
A list with 5 elements: (1) parameters for continuous data; (2)
parameters for discrete data; (3) leaf indices and coverage; (4) metadata on
variables; and (5) the data input class. This list is used for estimating
likelihoods with lik and generating data with forge.
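A short sketch of inspecting the returned list and passing it downstream. It assumes the arf package is installed and that forge takes the parameter list first and a sample size n_synth, as in released versions of arf; lik is left out because its signature is not shown on this page.

```r
library(arf)  # assumes the arf package is installed

arf <- adversarial_rf(iris, parallel = FALSE, verbose = FALSE)
psi <- forde(arf, iris, parallel = FALSE)

# Top-level structure of the returned list
str(psi, max.level = 1)

# Draw 10 synthetic rows from the estimated density
synth <- forge(psi, n_synth = 10, parallel = FALSE)
```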
References
Watson, D., Blesch, K., Kapar, J., & Wright, M. (2023). Adversarial random forests for density estimation and generative modeling. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pp. 5357-5375.
Wager, S. & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc., 113(523): 1228-1242.
See Also
Examples
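# Train an adversarial random forest on iris
arf <- adversarial_rf(iris)
# Estimate leaf and distribution parameters
psi <- forde(arf, iris)
# Inspect the result
head(psi)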