maxdepth_sampler {pre}    R Documentation
Sampling function generator for specifying varying maximum tree depth in a prediction rule ensemble (pre)
Description
maxdepth_sampler generates a random sampling function, governed by a pre-specified average number of terminal nodes or average tree depth.
Usage
maxdepth_sampler(av.no.term.nodes = 4L, av.tree.depth = NULL)
Arguments
av.no.term.nodes
integer of length one. Specifies the average number of terminal nodes in trees used for rule induction.
av.tree.depth
integer of length one. Specifies the average maximum tree depth in trees used for rule induction.
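The two arguments express tree size on different scales. A binary tree of depth d has at most 2^d terminal nodes, which matches the example that equates 16 terminal nodes with an average maxdepth of 4. A minimal sketch of the conversion, using hypothetical helpers that are not part of pre:

```r
# Hypothetical helpers (not exported by pre) converting between the two scales.
# A binary tree of depth d has at most 2^d terminal nodes.
depth_to_nodes <- function(depth) {
  2^depth
}

# Smallest depth that can accommodate a given number of terminal nodes.
nodes_to_depth <- function(nodes) {
  ceiling(log2(nodes))
}

depth_to_nodes(4)   # 16
nodes_to_depth(16)  # 4
nodes_to_depth(2)   # 1
```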
Details
The original RuleFit implementation used varying tree sizes for rule induction. Furthermore, it defined tree size in terms of the number of terminal nodes. In contrast, function pre defines the maximum tree size in terms of a (constant) tree depth. Function maxdepth_sampler allows for mimicking the behavior of the original RuleFit implementation. In effect, the maximum tree depth is derived from a number of terminal nodes that is sampled from an exponential distribution with rate $1/(\bar{L}-2)$, where $\bar{L} \ge 2$ represents the average number of terminal nodes for trees in the ensemble. See Friedman & Popescu (2008, section 3.3).
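Friedman & Popescu (2008) take the number of terminal nodes of each tree to be 2 plus the floor of an exponential draw with mean $\bar{L}-2$. The following minimal sketch assumes that scheme and converts each terminal-node count to a maximum depth with ceiling(log2(.)); sample_depths is a hypothetical name, and the code is an illustration rather than pre's own implementation.

```r
# Hypothetical sketch of the RuleFit-style sampling scheme; it is not
# necessarily identical to the code inside pre.
# av_nodes plays the role of L-bar and is assumed to be >= 2.
sample_depths <- function(ntrees, av_nodes = 4) {
  # Exponential draws with rate 1/(L-bar - 2), i.e. mean L-bar - 2.
  # For av_nodes == 2 the rate is Inf and rexp() returns zeros.
  draws <- rexp(ntrees, rate = 1 / (av_nodes - 2))
  # Number of terminal nodes per tree: 2 + floor(draw).
  n_nodes <- 2 + floor(draws)
  # Smallest maximum depth whose binary tree can hold n_nodes terminal nodes.
  ceiling(log2(n_nodes))
}

set.seed(42)
sample_depths(10)
table(sample_depths(1000, av_nodes = 2))  # every tree gets depth 1
```

With av_nodes = 2 every draw is zero, so each tree has two terminal nodes and a depth of 1, in line with the func4 example.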
Value
Returns a random sampling function with single argument ntrees, which can be supplied to the maxdepth argument of function pre to specify varying tree depths.
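The returned object is an ordinary R closure: it stores the settings passed to maxdepth_sampler and, when called with ntrees, returns one maximum depth per tree. The sketch below builds a hand-written generator with the same shape; uniform_depth_sampler is hypothetical, and the assumption that pre accepts any such function of ntrees returning positive integers rests only on the description above.

```r
# Hypothetical generator mirroring the closure pattern described above:
# it returns a function of a single argument, ntrees.
# Assumption: pre accepts any function of ntrees returning ntrees positive integers.
uniform_depth_sampler <- function(min.depth = 1L, max.depth = 4L) {
  function(ntrees) {
    # One maximum depth per tree, drawn uniformly from min.depth..max.depth.
    # sample.int() avoids sample()'s special case for length-one vectors.
    min.depth - 1L + sample.int(max.depth - min.depth + 1L,
                                size = ntrees, replace = TRUE)
  }
}

func <- uniform_depth_sampler(1L, 3L)
set.seed(42)
func(10)
```

Under the stated assumption, such a function could be passed as maxdepth = func in the same way as the output of maxdepth_sampler.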
References
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
Examples
library("pre")
## RuleFit default is max. 4 terminal nodes, on average:
func1 <- maxdepth_sampler()
set.seed(42)
func1(10)
mean(func1(1000))
## Max. 16 terminal nodes, on average (equals average maxdepth of 4):
func2 <- maxdepth_sampler(av.no.term.nodes = 16L)
set.seed(42)
func2(10)
mean(func2(1000))
## Max. tree depth of 3, on average:
func3 <- maxdepth_sampler(av.tree.depth = 3)
set.seed(42)
func3(10)
mean(func3(1000))
## Max. 2 terminal nodes, on average (always yields maxdepth of 1):
func4 <- maxdepth_sampler(av.no.term.nodes = 2L)
set.seed(42)
func4(10)
mean(func4(1000))
## Create rule ensemble with varying maxdepth:
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),],
maxdepth = func1)
airq.ens