R: Fit the Scalable Bayesian Rule Lists model

sbrl {sbrl}

R Documentation

Fit the Scalable Bayesian Rule Lists model

Description

Fit the Scalable Bayesian Rule Lists (SBRL) model to the given data with the given parameters. The result is a probabilistic classifier, obtained by optimizing the posterior of a Bayesian hierarchical model over pre-mined association rules.

Usage

sbrl(tdata, iters=30000, pos_sign="1", 
 neg_sign="0", rule_minlen=1, rule_maxlen=1, 
 minsupport_pos=0.10, minsupport_neg=0.10, 
 lambda=10.0, eta=1.0, alpha=c(1,1), nchain=10)

Arguments

`tdata`	a data frame with a "label" column giving the correct label for each observation.
`iters`	the number of iterations for each MCMC chain.
`pos_sign`	the sign for the positive labels in the "label" column.
`neg_sign`	the sign for the negative labels in the "label" column.
`rule_minlen`	the minimum cardinality of the rules to be mined from the data frame.
`rule_maxlen`	the maximum cardinality of the rules to be mined from the data frame.
`minsupport_pos`	a number between 0 and 1, giving the minimum support (as a proportion) among the positive observations.
`minsupport_neg`	a number between 0 and 1, giving the minimum support (as a proportion) among the negative observations.
`lambda`	a hyperparameter for the expected length of the rule list.
`eta`	a hyperparameter for the expected cardinality of the rules in the optimal rule list.
`alpha`	prior pseudo-counts for the positive and negative classes; fixed at 1's.
`nchain`	an integer giving the number of MCMC chains to run.
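
As a minimal sketch of the input that `tdata` describes, the snippet below builds a small data frame with a "label" column and converts every column to a factor, mirroring what the Examples section does for tictactoe. The toy data and the column names `outlook`, `windy` and `play` are made up for illustration and are not part of the package.

```r
# Hypothetical toy data; the column names and values are illustrative only.
raw <- data.frame(
  outlook = c("sunny", "rainy", "sunny", "overcast"),
  windy   = c(TRUE, FALSE, TRUE, FALSE),
  play    = c("yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Recode the outcome into a "label" column using the default signs "1" and "0".
raw$label <- ifelse(raw$play == "yes", "1", "0")
raw$play <- NULL

# Convert every column, including the label, to a factor.
tdata <- raw
tdata[] <- lapply(tdata, as.factor)

str(tdata)
```

A data frame prepared this way could then be passed as the first argument of `sbrl()`, with `pos_sign` and `neg_sign` left at their defaults.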

Value

Returns a list with the following components:

`rs`	a rule set containing the rule indices and their positive-class probabilities for the best rule list found by training sbrl on the given data and parameters.
`rulenames`	a list of all the rule names mined with `arules`.
`featurenames`	a list of all the feature names.
`mat_feature_rule`	a binary matrix representing which features are included in which rules.
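
To illustrate how a binary feature-by-rule matrix can be read, here is a small self-contained sketch. It assumes features are in rows and rules in columns, which this page does not state; the names and matrix values are invented for the example and do not come from a fitted model.

```r
# Hypothetical feature and rule names, standing in for `featurenames`
# and `rulenames` from a fitted model.
featurenames <- c("a=1", "b=1", "c=1")
rulenames    <- c("{a=1}", "{a=1,b=1}", "{c=1}")

# Assumed orientation: one row per feature, one column per rule.
# matrix() fills column by column, so each triple below is one rule.
mat <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 3,
  dimnames = list(featurenames, rulenames)
)

# For each rule (column), collect the features marked with a 1.
features_in_rule <- lapply(seq_len(ncol(mat)), function(j) {
  featurenames[mat[, j] == 1]
})
names(features_in_rule) <- rulenames

print(features_in_rule)
```

If the matrix in a real model turns out to be rules by features instead, transposing it with `t()` before the loop gives the same lookup.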

Author(s)

Hongyu Yang, Morris Chen, Cynthia Rudin, Margo Seltzer

References

Hongyu Yang, Cynthia Rudin, Margo Seltzer (2017) Scalable Bayesian Rule Lists. Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3921-3930.

Benjamin Letham, Cynthia Rudin, Tyler McCormick and David Madigan (2015) Building Interpretable Classifiers with Rules using Bayesian Analysis. Annals of Applied Statistics.

Examples

# Let us use the tictactoe dataset
data(tictactoe)
# Convert every column to a factor
for (name in names(tictactoe)) {
  tictactoe[[name]] <- as.factor(tictactoe[[name]])
}

# Train on two-thirds of the data
b <- round(2*nrow(tictactoe)/3, digits=0)
data_train <- tictactoe[1:b, ]
# Test on the remaining one third of the data
data_test <- tictactoe[(b+1):nrow(tictactoe), ]
# data_train and data_test are data frames with factor columns
# The class column is "label"

# Run the sbrl algorithm on the training set
sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1",
  neg_sign="0", rule_minlen=1, rule_maxlen=3,
  minsupport_pos=0.10, minsupport_neg=0.10,
  lambda=10.0, eta=1.0, nchain=25)
print(sbrl_model)

# Make predictions on the test set
yhat <- predict(sbrl_model, data_test)
# yhat will be a list of predicted negative and positive probabilities for the test data.

# Clean up
rm(list = ls())
gc()
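
The example above stops at the predicted probabilities. One way to turn positive-class probabilities into hard labels and an accuracy figure is sketched below. The helper `classify()`, the 0.5 threshold and the probability values are hypothetical; how to pull the positive-class probabilities out of `yhat` depends on the structure of the object returned by `predict()`, which this page does not spell out.

```r
# Hypothetical helper: threshold positive-class probabilities into labels.
classify <- function(prob_pos, threshold = 0.5, pos_sign = "1", neg_sign = "0") {
  ifelse(prob_pos >= threshold, pos_sign, neg_sign)
}

# Made-up probabilities and true labels standing in for real predictions.
prob_pos <- c(0.91, 0.12, 0.55, 0.40)
truth    <- c("1", "0", "0", "0")

pred <- classify(prob_pos)

# Share of observations whose predicted label matches the true label.
accuracy <- mean(pred == truth)
print(accuracy)

# Cross-tabulate predictions against the truth.
print(table(predicted = pred, actual = truth))
```

With real output, `prob_pos` would be replaced by the positive-class probabilities taken from `yhat`, and `truth` by the "label" column of the test data converted to character.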

[Package sbrl version 1.4 Index]