rulelist {tidyrules} | R Documentation |
Rulelist
Description
Structure
A rulelist is an ordered list of rules stored as a dataframe. Each row
specifies a rule (LHS), the expected outcome (RHS) and some other details.
It has these mandatory columns:
- rule_nbr: (integer vector) Rule number
- LHS: (character vector) A rule is a string that can be parsed using base::parse()
- RHS: (character vector or a literal)
Example
| rule_nbr|LHS                                                                  |RHS       | support| confidence|     lift|
|--------:|:--------------------------------------------------------------------|:---------|-------:|----------:|--------:|
|        1|( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )            |Gentoo    |     122|  1.0000000| 2.774193|
|        2|( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )           |Adelie    |      46|  0.9565217| 2.164760|
|        3|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 )  |Chinstrap |      65|  0.9538462| 4.825339|
|        4|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 ) |Adelie    |     111|  0.9459459| 2.140825|
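As a rough illustration of the structure above, the following base R sketch builds a plain dataframe with the three mandatory columns and checks that each LHS string parses. It does not use tidyrules itself; `rules_df`, `new_data` and their values are hypothetical and only mirror the example table.

```r
# Hypothetical rule table with the three mandatory columns
rules_df <- data.frame(
  rule_nbr = 1:2,
  LHS = c("( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )",
          "( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )"),
  RHS = c("Gentoo", "Adelie"),
  stringsAsFactors = FALSE
)

# Each LHS is expected to be parsable by base::parse()
parsed <- lapply(rules_df$LHS, function(rule) parse(text = rule))
lhs_is_parsable <- vapply(parsed, is.expression, logical(1))

# Evaluate the first rule against a small, made-up dataset
new_data <- data.frame(
  island = c("Biscoe", "Biscoe", "Dream"),
  flipper_length_mm = c(210, 190, 205)
)
rule_1_hits <- eval(parsed[[1]][[1]], envir = new_data)
```

Here `rule_1_hits` is a logical vector with one entry per row of `new_data`, marking the rows that satisfy the first rule.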
Create a rulelist
A rulelist can be created using tidy() on some supported model fits
(run: utils::methods(tidy)). It can also be created manually from an
existing dataframe using as_rulelist.
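A minimal sketch of the manual route, assuming tidyrules is installed. The `estimation_type` argument name is assumed to match the mandatory attribute of the same name; check the as_rulelist help page for the exact signature. The dataframe contents are hypothetical.

```r
# Hypothetical dataframe with the mandatory columns
rules_df <- data.frame(
  rule_nbr = 1:2,
  LHS = c("var_1 > 50", "var_2 < 30"),
  RHS = c(2, 1)
)

rl <- NULL
# Guarded so the sketch still runs when tidyrules is not installed
if (requireNamespace("tidyrules", quietly = TRUE)) {
  # Assumption: as_rulelist() accepts the estimation type as an argument
  rl <- tidyrules::as_rulelist(rules_df, estimation_type = "regression")
  print(rl)
}
```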
Keys and attributes
Columns identified as 'keys', together with rule_nbr, form a unique
combination; each distinct set of key values identifies a group of rules.
For example, a rule-based C5 model with multiple trials creates a set of
rules for each trial_nbr. The predict method understands 'keys' and
therefore predicts a rule number (for each row in new data / test data)
within the same trial_nbr.
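The idea of keys can be sketched in base R, without tidyrules: the key columns plus rule_nbr should never repeat, and each key value picks out its own group of rules. The column values below are hypothetical.

```r
# Hypothetical rules from two trials; trial_nbr plays the role of a key
rules_df <- data.frame(
  trial_nbr = c(1L, 1L, 2L, 2L),
  rule_nbr  = c(1L, 2L, 1L, 2L),
  LHS = c("x > 5", "x <= 5", "y > 0", "y <= 0"),
  RHS = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)
keys <- "trial_nbr"

# keys + rule_nbr should form a unique combination
key_cols <- rules_df[, c(keys, "rule_nbr"), drop = FALSE]
is_unique <- anyDuplicated(key_cols) == 0L

# Each key value identifies one group of rules
rules_per_key <- split(rules_df, rules_df[[keys]])
```

Prediction would then be carried out separately within each element of `rules_per_key`.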
A rulelist has these mandatory attributes:
- estimation_type: One among regression, classification
A rulelist has these optional attributes:
- keys: (character vector) Names of the columns that form a key.
- model_type: (string) Name of the model
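Attributes of this kind can be inspected with base R's attr(). The sketch below attaches them by hand to a plain dataframe purely to show the mechanism; it is not how tidyrules constructs a rulelist, and the values ("C5" in particular) are hypothetical.

```r
rules_df <- data.frame(rule_nbr = 1L, LHS = "x > 5", RHS = "a")

# Mandatory attribute (set by hand here, for illustration only)
attr(rules_df, "estimation_type") <- "classification"

# Optional attributes
attr(rules_df, "keys") <- "trial_nbr"
attr(rules_df, "model_type") <- "C5"

# A simple check that the mandatory attribute holds an allowed value
estimation_ok <- attr(rules_df, "estimation_type") %in%
  c("regression", "classification")
```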
Set Validation data
A few methods, such as augment, calculate, prune and reorder, require additional attributes, which can be set using set_validation_data.
Methods for rulelist
- Predict: Given a dataframe (possibly without a dependent variable column, aka 'test data'), predicts the first rule (as ordered in the rulelist) per 'keys' that is applicable to each row. When multiple = TRUE, returns all rules applicable to a row (per key).
- Augment: Outputs summary statistics per rule over validation data and returns a rulelist with a new dataframe-column.
- Calculate: Computes metrics for a rulelist in a cumulative manner, such as cumulative_coverage, cumulative_overlap, cumulative_accuracy.
- Prune: Suggests pruning a rulelist such that some expectations are met (based on metrics). Example: a cumulative_coverage of 80% can be met with the first few rules.
- Reorder: Reorders a rulelist in order to maximize a metric.
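The first-match idea behind Predict, and a cumulative metric of the kind Calculate reports, can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the rules and data are hypothetical, and cumulative coverage is taken here to mean the share of rows matched by at least one of the first k rules.

```r
rules <- c("x > 7", "x > 3", "x <= 3")     # ordered LHS strings (hypothetical)
new_data <- data.frame(x = c(9, 5, 1, 8))

# Logical matrix: one row per observation, one column per rule
hits <- sapply(rules, function(rule) eval(parse(text = rule), envir = new_data))

# First applicable rule per row, following the order of the rules
first_rule <- vapply(
  seq_len(nrow(hits)),
  function(i) unname(which(hits[i, ])[1]),
  integer(1)
)

# All applicable rules per row (the idea behind multiple = TRUE)
all_rules <- lapply(seq_len(nrow(hits)), function(i) unname(which(hits[i, ])))

# Assumed definition: share of rows covered by the first k rules
cumulative_coverage <- vapply(
  seq_along(rules),
  function(k) mean(rowSums(hits[, seq_len(k), drop = FALSE]) > 0),
  numeric(1)
)

# Smallest number of leading rules reaching a 70% coverage target
rules_needed <- which(cumulative_coverage >= 0.7)[1]
```

Pruning in this sense keeps only the first `rules_needed` rules once the coverage target is met.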
Manipulating a rulelist
Rulelists are essentially dataframes. Hence, any dataframe operation that preserves attributes will output a rulelist. as_rulelist and as.data.frame help in moving back and forth between the rulelist and dataframe worlds.
Utilities for a rulelist
- as_rulelist: Create a rulelist from a dataframe with some mandatory columns.
- set_keys: Set or unset 'keys' of a rulelist.
- to_sql_case: Outputs a SQL case statement for a rulelist.
- convert_rule_flavor: Converts R-parsable rule strings to python/SQL parsable rule strings.
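To give a sense of what a SQL case statement for a set of rules looks like, here is a hand-rolled base R sketch. It does not call to_sql_case or convert_rule_flavor and may differ from their actual output; the rule strings are assumed to be already in SQL flavor, and all names are hypothetical.

```r
# Hypothetical rules whose LHS is already written in SQL syntax
sql_rules <- data.frame(
  LHS = c("island IN ('Biscoe') AND flipper_length_mm > 203",
          "island IN ('Biscoe') AND flipper_length_mm <= 203"),
  RHS = c("Gentoo", "Adelie"),
  stringsAsFactors = FALSE
)

# One WHEN clause per rule, kept in rule order so the first match wins
when_clauses <- sprintf("  WHEN %s THEN '%s'", sql_rules$LHS, sql_rules$RHS)

case_sql <- paste(c("CASE", when_clauses, "  ELSE NULL", "END"), collapse = "\n")
cat(case_sql, "\n")
```

Because SQL evaluates WHEN clauses in order, the statement mirrors the ordered, first-match behaviour of a rulelist.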
See Also
rulelist, tidy, augment, predict, calculate, prune, reorder