sim_postcode_samples {oneclust} | R Documentation |
Simulate a high-cardinality feature and a binary response
Description
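Simulate samples of a high-cardinality categorical feature (postal codes) together with a binary response. The probability of the class 1 event depends on whether a sample's postal code is rare.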
Usage
sim_postcode_samples(
df_levels,
n = 2000L,
threshold = 1000,
prob = c(0.3, 0.1),
seed = 1001
)
Arguments
df_levels | Data frame of postal code levels, such as the output of sim_postcode_levels() (see Examples).
n | Number of samples.
threshold | The threshold for determining if a postal code is rare.
prob | Length-2 vector giving the probability of the class 1 event in rare and non-rare postal codes, in that order.
seed | Random seed.
Value
A data frame of samples with postal codes, response labels, and level rarity status.
Note
The code is derived from the example described in the "rare levels" vignette in the vtreat package.
Examples
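# Simulate 500 postal code levels
df_levels <- sim_postcode_levels(nlevels = 500, seed = 42)
# Draw 10,000 samples; class 1 probability is 0.2 for rare codes and 0.1 otherwise
df_postcode <- sim_postcode_samples(
  df_levels,
  n = 10000, threshold = 3000, prob = c(0.2, 0.1), seed = 43
)
# Inspect the first rows of the simulated samples
head(df_postcode)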
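To illustrate how the threshold and prob arguments could interact, the following is a minimal base R sketch of the same idea. It is not the oneclust implementation. The function name sketch_postcode_samples, the column names (postcode, size, is_rare, label), the log-normal level sizes, and the rule "a level is rare when its size is below the threshold" are all assumptions made for illustration.

```r
# Hypothetical sketch only; names and the rarity rule are assumptions,
# not taken from the oneclust source.
sketch_postcode_samples <- function(n_levels = 50L, n = 2000L, threshold = 1000,
                                    prob = c(0.3, 0.1), seed = 1001) {
  set.seed(seed)

  # Assumed: every level has a population size, skewed so most levels are small
  levels_df <- data.frame(
    postcode = sprintf("z%05d", seq_len(n_levels)),
    size = ceiling(rlnorm(n_levels, meanlog = 6, sdlog = 1.5))
  )

  # Assumed rarity rule: a level is rare when its size is below the threshold
  levels_df$is_rare <- levels_df$size < threshold

  # Draw sample levels with probability proportional to level size
  idx <- sample.int(n_levels, size = n, replace = TRUE, prob = levels_df$size)
  samples <- levels_df[idx, c("postcode", "is_rare")]
  rownames(samples) <- NULL

  # Class 1 probability is prob[1] for rare levels and prob[2] otherwise
  p <- ifelse(samples$is_rare, prob[1], prob[2])
  samples$label <- rbinom(n, size = 1L, prob = p)
  samples
}

df_sketch <- sketch_postcode_samples()

# Observed class 1 rate by rarity status
aggregate(label ~ is_rare, data = df_sketch, FUN = mean)
```

With enough samples in each group, the observed class 1 rates should be close to the two values in prob. Fixing the seed makes repeated calls return the same data frame.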
[Package oneclust version 0.3.0 Index]