| rulelist {tidyrules} | R Documentation |
Rulelist
Description
Structure
A rulelist is an ordered list of rules stored as a dataframe. Each row
specifies a rule (LHS), its expected outcome (RHS) and some other details.
It has these mandatory columns:
- rule_nbr: (integer vector) Rule number
- LHS: (character vector) A rule is a string that can be parsed using base::parse()
- RHS: (character vector or a literal)
Example
| rule_nbr|LHS |RHS | support| confidence| lift|
|--------:|:--------------------------------------------------------------------|:---------|-------:|----------:|--------:|
| 1|( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 ) |Gentoo | 122| 1.0000000| 2.774193|
| 2|( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 ) |Adelie | 46| 0.9565217| 2.164760|
| 3|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 ) |Chinstrap | 65| 0.9538462| 4.825339|
| 4|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 ) |Adelie | 111| 0.9459459| 2.140825|
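Because each LHS is an R-parsable string, a rule can be checked against a dataframe with base R alone. The following is a minimal sketch of that idea, not the package's internal implementation; the dataframe penguins_toy and its values are made up for illustration and only reuse column names from the example table.

```r
# Toy data (hypothetical values); column names follow the example table.
penguins_toy <- data.frame(
  island            = c("Biscoe", "Biscoe", "Dream", "Torgersen"),
  flipper_length_mm = c(210, 190, 195, 188),
  bill_length_mm    = c(47.0, 38.5, 49.2, 39.1),
  stringsAsFactors  = FALSE
)

# LHS of rule 1 from the example table.
lhs <- "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )"

# parse() turns the string into an expression; eval() runs it with the
# dataframe's columns in scope, giving one logical value per row.
rule_expr <- parse(text = lhs)
covered   <- eval(rule_expr, envir = penguins_toy)

print(covered)
```

Here covered marks the rows that satisfy the rule, so penguins_toy[covered, ] would return the rows to which rule 1 applies.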
Create a rulelist
A rulelist can be created using tidy() on some supported model fits
(run: utils::methods(tidy)). It can also be created manually from an
existing dataframe using as_rulelist.
Keys and attributes
Columns identified as 'keys', along with rule_nbr, form a unique
combination. Each distinct set of key values defines a group of rules.
For example, a rule-based C5 model with multiple trials creates rules
for each trial_nbr. The predict method understands 'keys' and therefore
predicts a rule number (for each row in new data / test data) within
the same trial_nbr.
A rulelist has these mandatory attributes:
- estimation_type: One among regression, classification
A rulelist has these optional attributes:
- keys: (character vector) Names of the columns that form a key.
- model_type: (string) Name of the model
Set Validation data
A few methods, such as augment, calculate, prune and reorder, require a few additional attributes, which can be set using set_validation_data.
Methods for rulelist
- Predict: Given a dataframe (possibly without a dependent variable column, aka 'test data'), predicts the first applicable rule (as ordered in the rulelist) per 'keys' for each row. When multiple = TRUE, returns all rules applicable to a row (per key).
- Augment: Outputs summary statistics per rule over validation data and returns a rulelist with a new dataframe-column.
- Calculate: Computes metrics for a rulelist in a cumulative manner, such as cumulative_coverage, cumulative_overlap, cumulative_accuracy.
- Prune: Suggests pruning a rulelist such that some expectations are met (based on metrics). Example: a cumulative_coverage of 80% can be met with the first few rules.
- Reorder: Reorders a rulelist in order to maximize a metric.
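The "first applicable rule" behaviour of Predict and the cumulative nature of Calculate can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the dataframes rules and new_data are toy objects, rows where a rule evaluates to NA are treated as not covered, and cumulative coverage is read here as the fraction of rows covered by the first k rules, which may differ from the package's exact definition.

```r
# Toy rulelist: the four rules from the example table (mandatory columns only).
rules <- data.frame(
  rule_nbr = 1:4,
  LHS = c(
    "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )",
    "( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )",
    "( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 )",
    "( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 )"
  ),
  RHS = c("Gentoo", "Adelie", "Chinstrap", "Adelie"),
  stringsAsFactors = FALSE
)

# Toy new data (hypothetical values); the last row has a missing bill length.
new_data <- data.frame(
  island            = c("Biscoe", "Biscoe", "Dream", "Torgersen", "Dream"),
  flipper_length_mm = c(210, 190, 195, 188, 200),
  bill_length_mm    = c(47.0, 38.5, 49.2, 39.1, NA),
  stringsAsFactors  = FALSE
)

# Logical matrix: one row per data row, one column per rule.
applies <- vapply(
  rules$LHS,
  function(lhs) eval(parse(text = lhs), envir = new_data),
  FUN.VALUE = logical(nrow(new_data)),
  USE.NAMES = FALSE
)
# Assumption: a rule that evaluates to NA does not apply to that row.
applies[is.na(applies)] <- FALSE

# First applicable rule per row, in rulelist order (NA when no rule applies).
first_rule <- vapply(seq_len(nrow(new_data)), function(i) {
  hits <- which(applies[i, ])
  if (length(hits) > 0) rules$rule_nbr[hits[1]] else NA_integer_
}, FUN.VALUE = integer(1))

# Fraction of rows covered by at least one of the first k rules.
cumulative_coverage <- vapply(seq_len(nrow(rules)), function(k) {
  mean(rowSums(applies[, seq_len(k), drop = FALSE]) > 0)
}, FUN.VALUE = numeric(1))

print(first_rule)
print(cumulative_coverage)
```

Since coverage is accumulated in rulelist order, changing the order of the rows in rules changes both the predicted rule number and the cumulative metric, which is why reordering and pruning are meaningful operations on a rulelist.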
Manipulating a rulelist
Rulelists are essentially dataframes. Hence, any dataframe operation that preserves attributes will output a rulelist. as_rulelist and as.data.frame help in moving back and forth between the rulelist and dataframe worlds.
Utilities for a rulelist
- as_rulelist: Create a rulelist from a dataframe with some mandatory columns.
- set_keys: Set or unset 'keys' of a rulelist.
- to_sql_case: Outputs a SQL case statement for a rulelist.
- convert_rule_flavor: Converts R-parsable rule strings to python/SQL parsable rule strings.
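To make the idea behind to_sql_case and convert_rule_flavor concrete, the sketch below builds a SQL CASE statement from LHS/RHS pairs using base R string functions. It does not call the package and does not claim to reproduce its output: r_rule_to_sql is a hypothetical helper that handles only the `%in% c(...)` and `&` patterns seen in the example table, and the output column name "output" is an assumption.

```r
# Toy rulelist: the first two rules from the example table.
rules <- data.frame(
  rule_nbr = 1:2,
  LHS = c(
    "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )",
    "( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )"
  ),
  RHS = c("Gentoo", "Adelie"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rewrites the two R idioms used above into SQL.
# fixed = TRUE makes gsub() match the patterns literally, not as regex.
r_rule_to_sql <- function(lhs) {
  sql <- gsub("%in% c(", "IN (", lhs, fixed = TRUE)
  gsub(" & ", " AND ", sql, fixed = TRUE)
}

# One WHEN clause per rule, kept in rulelist order so the first match wins.
when_clauses <- paste0(
  "  WHEN ", r_rule_to_sql(rules$LHS), " THEN '", rules$RHS, "'"
)
sql_case <- paste(
  c("CASE", when_clauses, "  ELSE NULL", "END AS output"),
  collapse = "\n"
)

cat(sql_case, "\n")
```

A CASE expression evaluates its WHEN clauses in order and returns the first match, which mirrors the ordered, first-applicable-rule semantics of a rulelist.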
See Also
rulelist, tidy, augment, predict, calculate, prune, reorder