RCAR {arulesCBA}  R Documentation 
Regularized Class Association Rules for Multiclass Problems (RCAR+)
Description
Build a classifier based on association rules mined from an input dataset and weighted with LASSO-regularized logistic regression, following RCAR (Azmi et al., 2019). RCAR+ extends RCAR from a binary classifier to a multilabel classifier and can use support-balanced CARs.
Usage
RCAR(
  formula,
  data,
  lambda = NULL,
  alpha = 1,
  glmnet.args = NULL,
  cv.glmnet.args = NULL,
  parameter = NULL,
  control = NULL,
  balanceSupport = FALSE,
  disc.method = "mdlp",
  verbose = FALSE,
  ...
)
Arguments
formula 
A symbolic description of the model to be fitted. Has to be
of form class ~ . or class ~ predictor1 + predictor2.
data 
A data.frame or arules::transactions containing the training data.
Data frames are automatically discretized and converted to transactions with
prepareTransactions().
lambda 
The amount of weight given to regularization during the
logistic regression learning process. If not specified (NULL), cross-validation is used to determine the value (see Details).
alpha 
The elastic net mixing parameter. alpha = 1 (the default) is the lasso penalty; alpha = 0 is the ridge penalty.
cv.glmnet.args , glmnet.args 
A list of arguments passed on to
glmnet::cv.glmnet() and glmnet::glmnet(), respectively.
parameter , control 
Optional parameter and control lists for arules::apriori().
balanceSupport 
balanceSupport parameter passed to mineCARs().
disc.method 
Discretization method for factorizing numeric input
(default: "mdlp").
verbose 
Report progress? 
... 
For convenience, additional parameters are used to create the
parameter list for arules::apriori() (e.g., to specify the support and confidence thresholds).
Details
RCAR+ extends RCAR from a binary classifier to a multilabel classifier using regularized multinomial logistic regression via glmnet.
In arulesCBA, the class variable is always represented by a set of items.
For a binary classification problem, we use an item and its complement
(typically called <item label>=TRUE and <item label>=FALSE). For
a multilabel classification problem, we use one item for each possible class
label (format <class item>=<label>). See prepareTransactions() for details.
RCAR+ first mines CARs to find itemsets (the LHS of the CARs) that are related
to the class items. Then a transaction x lhs(CAR) coverage matrix X is created.
The matrix contains a 1 if the LHS of the CAR applies to the transaction, and a 0 otherwise.
A regularized multinomial logistic model is then fitted to predict the true class y
for each transaction given X. Note that the RHS of the
CARs is ignored in this process, so the algorithm effectively
uses rules consisting of each LHS of a CAR paired with each class label.
This is important to keep in mind when interpreting the rules used in
the classifier.
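The coverage matrix described above can be illustrated with a minimal base R sketch. The transactions, the LHS itemsets, and the helper covers() are hypothetical toy values made up for this illustration; this is not the code the package uses internally.

```r
# Toy transactions: each one is a character vector of items (hypothetical data).
transactions <- list(
  t1 = c("color=red", "size=small"),
  t2 = c("color=red", "size=large"),
  t3 = c("color=blue", "size=large")
)

# LHS itemsets of three hypothetical CARs.
lhs_sets <- list(
  r1 = "color=red",
  r2 = c("color=red", "size=large"),
  r3 = "size=large"
)

# An LHS covers a transaction if all of its items occur in the transaction.
covers <- function(trans, itemset) as.integer(all(itemset %in% trans))

# X[i, j] is 1 if LHS j covers transaction i, and 0 otherwise.
X <- sapply(lhs_sets, function(l) sapply(transactions, covers, itemset = l))
print(X)
```

Rows correspond to transactions and columns to rule LHSs, so each column of X acts as one binary predictor in the logistic model.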
If lambda for regularization is not specified during training (lambda = NULL),
then cross-validation is used
to determine the largest value of lambda such that the error is within 1 standard error of the
minimum (see glmnet::cv.glmnet() for how to perform cross-validation in parallel).
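The one-standard-error choice of lambda can be sketched directly with glmnet. The 0-1 matrix below is a hand-made stand-in for a real coverage matrix (its thresholds are arbitrary assumptions), so this only illustrates the selection step, not what RCAR() computes for iris.

```r
library(glmnet)

set.seed(42)  # cross-validation folds are assigned at random

# Hypothetical 0-1 predictors standing in for a rule coverage matrix.
X <- cbind(
  petal_short = iris$Petal.Length < 2.5,
  petal_long  = iris$Petal.Length > 4.9,
  petal_wide  = iris$Petal.Width > 1.6,
  sepal_long  = iris$Sepal.Length > 5.8
) * 1
y <- iris$Species

# Lasso-penalized (alpha = 1) multinomial logistic regression with
# cross-validation over a sequence of lambda values.
cv <- cv.glmnet(X, y, family = "multinomial", alpha = 1)

# Largest lambda whose error is within 1 standard error of the minimum.
cv$lambda.1se

# Coefficients at that lambda: a list with one sparse column per class.
beta <- coef(cv, s = "lambda.1se")
```

Because lambda.1se is never smaller than lambda.min, this choice favors a sparser model with fewer nonzero coefficients.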
For the final classifier, we keep only the rules that have a weight greater than 0 for at least one class label. Each rule stores the beta coefficients of the model as its weight.
Prediction for a new transaction is performed in two steps:
1. Translate the transaction into a 0-1 coverage vector indicating which class association rules' LHS cover the transaction.
2. Calculate the predicted label given the multinomial logistic regression model.
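The two prediction steps can be traced by hand in base R. The coefficient matrix, intercepts, and coverage vector below are made-up numbers for illustration; in practice predict() performs this calculation on the fitted model.

```r
# Hypothetical fitted weights: one row per rule LHS, one column per class.
beta <- matrix(
  c( 2.0, -1.0,  0.0,
     0.0,  1.5, -0.5,
    -1.0,  0.0,  2.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("r1", "r2", "r3"), c("A", "B", "C"))
)
intercept <- c(A = 0.1, B = 0.0, C = -0.1)

# Step 1: 0-1 coverage vector (here, the LHS of r1 and r3 cover the transaction).
x <- c(r1 = 1, r2 = 0, r3 = 1)

# Step 2: linear score per class, turned into probabilities with the softmax.
scores <- intercept + drop(x %*% beta)
probs <- exp(scores) / sum(exp(scores))

# The predicted label is the class with the highest probability.
predicted <- names(which.max(probs))
predicted
```

With these numbers the class scores are 1.1, -1.0, and 1.9, so class C is predicted.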
Value
Returns an object of class CBA representing the trained
classifier with the additional field model containing a list with the
following elements:
reg_model 
the multinomial logistic regression model as an object of class glmnet::glmnet. 
cv 
only available if lambda = NULL, that is, if cross-validation was used to determine lambda. 
all_rules 
the actual classifier only contains the rules with
nonzero weights. This field contains all rules used to build the classifier,
including the rules with a weight of zero. This is consistent with the
model in reg_model.
Author(s)
Tyler Giallanza and Michael Hahsler
References
M. Azmi, G.C. Runger, and A. Berrado (2019). Interpretable regularized class association rules algorithm for classification in a categorical data space. Information Sciences, Volume 483, May 2019. Pages 313-331.
See Also
Other classifiers: CBA(), CBA_helpers, CBA_ruleset(), FOIL(), LUCS_KDD_CBA, RWeka_CBA
Examples
data("iris")
classifier <- RCAR(Species ~ ., iris)
classifier
# inspect the rule base sorted by the largest class weight
inspect(sort(classifier$rules, by = "weight"))
# make predictions for the first few instances of iris
predict(classifier, head(iris))
table(pred = predict(classifier, iris), true = iris$Species)
# plot the cross-validation curve as a function of lambda and add a
# red line at lambda.1se used to determine lambda.
plot(classifier$model$cv)
abline(v = log(classifier$model$cv$lambda.1se), col = "red")
# plot the coefficient profile plot (regularization path) for each class
# label. Note the line for the chosen lambda is only added to the last plot.
# You can manually add it to the others.
plot(classifier$model$reg_model, xvar = "lambda", label = TRUE)
abline(v = log(classifier$model$cv$lambda.1se), col = "red")
# inspect rule 11 which has a large weight for class virginica
inspect(classifier$model$all_rules[11])
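As a further sketch, the mining and cross-validation settings can be adjusted through the documented list arguments. The specific values below (support, confidence, and number of folds) are arbitrary assumptions chosen for illustration, not recommended settings, and classifier2 is a hypothetical name.

```r
library("arulesCBA")
data("iris")

# Assumed settings, for illustration only:
# - parameter is handed to the rule miner (support/confidence thresholds),
# - cv.glmnet.args is handed to glmnet::cv.glmnet() (number of folds).
classifier2 <- RCAR(
  Species ~ ., iris,
  parameter = list(support = 0.05, confidence = 0.5),
  cv.glmnet.args = list(nfolds = 5)
)
classifier2

# compare the predictions with the true labels
table(pred = predict(classifier2, iris), true = iris$Species)
```

Lower support and confidence thresholds usually produce more candidate rules, leaving more of the selection to the regularization step.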