forge {arf} | R Documentation
Forests for Generative Modeling
Description
Uses a pre-trained FORDE model to simulate synthetic data.
Usage
forge(params, n_synth, evidence = NULL)
Arguments
params
Circuit parameters learned via forde.
n_synth
Number of synthetic samples to generate.
evidence
Optional set of conditioning events. This can take one of three forms: (1) a partial sample, i.e. a single row of data with some but not all columns; (2) a data frame of conditioning events, which allows for inequalities; or (3) a posterior distribution over leaves. See Details.
Details
forge simulates a synthetic dataset of n_synth samples. First, leaves are sampled in proportion to either their coverage (if evidence = NULL) or their posterior probability. Then, each feature is sampled independently within each leaf according to the probability mass or density function learned by forde. This will create realistic data so long as the adversarial RF used in the previous step satisfies the local independence criterion. See Watson et al. (2023).
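The two-step sampling scheme described above can be sketched in base R. This is an illustration only, not the package's implementation: the leaves table, its column names (leaf, cvg, mu, sigma, p_a), and the choice of a plain Gaussian and a Bernoulli as per-leaf distributions are all assumptions made up for this example.

```r
set.seed(42)

# Hypothetical toy "forest": three leaves, each with a coverage weight and
# per-leaf parameters for one continuous feature (x1) and one binary
# categorical feature (x2). None of these names come from the arf package.
leaves <- data.frame(
  leaf  = 1:3,
  cvg   = c(0.5, 0.3, 0.2),  # share of training data falling in each leaf
  mu    = c(0, 5, 10),       # per-leaf mean of x1
  sigma = c(1, 1, 2),        # per-leaf standard deviation of x1
  p_a   = c(0.9, 0.5, 0.1)   # per-leaf probability that x2 == "a"
)

n_synth <- 1000

# Step 1: draw leaves in proportion to their coverage
idx <- sample(leaves$leaf, size = n_synth, replace = TRUE, prob = leaves$cvg)

# Step 2: draw each feature independently, given the sampled leaf
x1 <- rnorm(n_synth, mean = leaves$mu[idx], sd = leaves$sigma[idx])
x2 <- ifelse(runif(n_synth) < leaves$p_a[idx], "a", "b")

x_synth <- data.frame(x1 = x1, x2 = x2)
```

Dependence between x1 and x2 in the synthetic data arises only through the leaf assignment, which is why the local independence criterion matters for realism.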
There are three methods for (optionally) encoding conditioning events via the evidence argument. The first is to provide a partial sample, where some but not all columns from the training data are present. The second is to provide a data frame with three columns: variable, relation, and value. This supports inequalities via relation. Alternatively, users may directly input a pre-calculated posterior distribution over leaves, with columns f_idx and wt. This may be preferable for complex constraints. See Examples.
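As a rough sketch of what a pre-calculated posterior over leaves might look like, the base R snippet below reweights leaf coverage by the probability of an event within each leaf, following Bayes' rule. The toy leaf table and its columns cvg and p_a are hypothetical; only the output column names f_idx and wt follow the format described above, and the package's own internal computation may differ.

```r
# Hypothetical toy leaves: coverage (prior) and per-leaf probability that a
# categorical feature x2 equals "a". These values are made up for illustration.
leaves <- data.frame(
  f_idx = 1:3,
  cvg   = c(0.5, 0.3, 0.2),
  p_a   = c(0.9, 0.5, 0.1)
)

# Evidence: x2 == "a".
# Bayes' rule: P(leaf | evidence) is proportional to
# P(leaf) * P(evidence | leaf).
unnormalized <- leaves$cvg * leaves$p_a

# Normalize so the weights sum to one
post <- data.frame(
  f_idx = leaves$f_idx,
  wt    = unnormalized / sum(unnormalized)
)
```

A data frame of this shape, built from the real leaf indices of a fitted model, is the kind of object the third form of evidence expects.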
Value
A dataset of n_synth synthetic samples.
References
Watson, D., Blesch, K., Kapar, J., & Wright, M. (2023). Adversarial random forests for density estimation and generative modeling. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pp. 5357-5375.
See Also
Examples
arf <- adversarial_rf(iris)
psi <- forde(arf, iris)
x_synth <- forge(psi, n_synth = 100)
# Condition on Species = "setosa"
evi <- data.frame(Species = "setosa")
x_synth <- forge(psi, n_synth = 100, evidence = evi)
# Condition on Species = "setosa" and Sepal.Length > 6
evi <- data.frame(variable = c("Species", "Sepal.Length"),
                  relation = c("==", ">"),
                  value = c("setosa", 6))
x_synth <- forge(psi, n_synth = 100, evidence = evi)
# Or just input some distribution on leaves
# (Weights that do not sum to unity are automatically scaled)
n_leaves <- nrow(psi$forest)
evi <- data.frame(f_idx = psi$forest$f_idx, wt = rexp(n_leaves))
x_synth <- forge(psi, n_synth = 100, evidence = evi)