RCAR {arulesCBA} | R Documentation |
Regularized Class Association Rules for Multi-class Problems (RCAR+)
Description
Build a classifier based on association rules mined from an input dataset and weighted with LASSO regularized logistic regression following RCAR (Azmi et al., 2019). RCAR+ extends RCAR from a binary classifier to a multi-label classifier and can use support-balanced CARs.
Usage
RCAR(
formula,
data,
lambda = NULL,
alpha = 1,
glmnet.args = NULL,
cv.glmnet.args = NULL,
parameter = NULL,
control = NULL,
balanceSupport = FALSE,
disc.method = "mdlp",
verbose = FALSE,
...
)
Arguments
formula |
A symbolic description of the model to be fitted. Has to be
of form |
data |
A data.frame or arules::transactions containing the training data.
Data frames are automatically discretized and converted to transactions with
|
lambda |
The amount of weight given to regularization during the
logistic regression learning process. If not specified ( |
alpha |
The elastic net mixing parameter. |
cv.glmnet.args , glmnet.args |
A list of arguments passed on to
|
parameter , control |
Optional parameter and control lists for |
balanceSupport |
balanceSupport parameter passed to |
disc.method |
Discretization method for factorizing numeric input
(default: "mdlp"). |
verbose |
Report progress? |
... |
For convenience, additional parameters are used to create the
|
Details
RCAR+ extends RCAR from a binary classifier to a multi-label classifier using regularized multinomial logistic regression via glmnet.
In arulesCBA, the class variable is always represented by a set of items.
For a binary classification problem, we use an item and its complement
(typically called <item label>=TRUE and <item label>=FALSE). For
a multi-label classification problem, we use one item for each possible class
label (format <class item>=<label>). See prepareTransactions() for details.
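The class-item encoding can be inspected directly. The following is a minimal sketch, assuming that prepareTransactions() accepts a formula and a data.frame and that the class items for iris follow the Species=<label> format described above; the variable names are illustrative only.

```r
library("arulesCBA")  # also attaches arules, which provides itemLabels()
data("iris")

# Discretize the numeric columns and convert the data frame to transactions.
trans <- prepareTransactions(Species ~ ., iris)

# Class items are expected to use the format <class item>=<label>.
class_items <- grep("^Species=", itemLabels(trans), value = TRUE)
class_items
```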
RCAR+ first mines CARs to find itemsets (LHS of the CARs) that are related
to the class items. Then, a transaction x lhs(CAR) coverage matrix X
is created. The matrix contains a 1 if the LHS of the CAR applies to the
transaction, and 0 otherwise. A regularized multinomial logistic model is then
fitted to predict the true class y for each transaction given X.
Note that the RHSs of the CARs are actually ignored in this process, so the
algorithm effectively uses rules consisting of each LHS of a CAR paired with
each class label. This is important to keep in mind when trying to interpret
the rules used in the classifier.
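To make the coverage matrix concrete, here is a toy sketch in base R. It is not the package's implementation; the transactions, the rule LHSs, and the helper covers() are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: each transaction is a set of items.
transactions <- list(
  t1 = c("color=red", "size=small"),
  t2 = c("color=red", "size=large"),
  t3 = c("color=blue", "size=large")
)

# Hypothetical LHS itemsets of three mined CARs (the RHS is ignored).
lhs_list <- list(
  r1 = "color=red",
  r2 = c("color=red", "size=large"),
  r3 = "size=large"
)

# An LHS covers a transaction if all of its items occur in the transaction.
covers <- function(items, lhs) all(lhs %in% items)

# Rows are transactions, columns are rule LHSs.
X <- sapply(lhs_list, function(lhs)
  vapply(transactions, covers, logical(1), lhs = lhs))
storage.mode(X) <- "integer"  # convert TRUE/FALSE to 1/0
X
```

Each column of X then acts as one binary predictor in the regularized multinomial model.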
If lambda for regularization is not specified during training (lambda = NULL),
then cross-validation is used to determine the largest value of lambda such
that the error is within 1 standard error of the minimum (see
glmnet::cv.glmnet() for how to perform cross-validation in parallel).
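The one-standard-error rule can be illustrated with glmnet alone. This sketch uses the raw numeric iris columns as predictors instead of a rule coverage matrix, which is an assumption made only to keep the example short; lambda.min and lambda.1se are the fields returned by glmnet::cv.glmnet().

```r
library("glmnet")
data("iris")

set.seed(1)  # cross-validation folds are assigned at random

# Illustration only: numeric predictors stand in for the coverage matrix X.
x <- as.matrix(iris[, 1:4])
y <- iris$Species

# alpha = 1 gives the LASSO penalty; family = "multinomial" handles 3 classes.
cv <- cv.glmnet(x, y, family = "multinomial", alpha = 1)

cv$lambda.min  # lambda with the minimum cross-validated error
cv$lambda.1se  # largest lambda within 1 standard error of that minimum
```

In RCAR() itself, a fixed value can be supplied through lambda, and settings for the cross-validation can be passed through cv.glmnet.args.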
For the final classifier, we only keep the rules that have a weight greater than 0 for at least one class label. The beta coefficients of the model are stored as the rule weights.
Prediction for a new transaction is performed in two steps:
1. Translate the transaction into a 0-1 coverage vector indicating which CARs have an LHS that covers the transaction.
2. Calculate the predicted label using the multinomial logistic regression model.
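The two prediction steps can be sketched in base R under simplifying assumptions. The coefficient matrix beta, the intercepts, and the coverage vector x below are made-up values, and the softmax formula is the standard form of a multinomial logistic model rather than code taken from the package.

```r
classes <- c("a", "b", "c")

# Hypothetical rule weights: one row per rule LHS, one column per class.
beta <- matrix(
  c(2.0, 0.0, 0.0,
    0.0, 1.5, 0.0,
    0.0, 0.0, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("r1", "r2", "r3"), classes)
)
intercept <- c(a = 0, b = 0, c = 0)

# Step 1: 0-1 coverage vector for a new transaction (only r1 covers it).
x <- c(r1 = 1, r2 = 0, r3 = 0)

# Step 2: linear predictor per class, softmax, and the most likely label.
eta <- intercept + drop(x %*% beta)
prob <- exp(eta) / sum(exp(eta))
pred <- names(which.max(prob))
pred
```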
Value
Returns an object of class CBA representing the trained
classifier with the additional field model containing a list with the
following elements:
reg_model |
the multinomial logistic regression model as an object of class glmnet::glmnet. |
cv |
only available if |
all_rules |
the actual classifier only contains the rules with
non-zero weights. This field contains all rules used to build the classifier,
including the rules with a weight of zero. This is consistent with the
model in |
Author(s)
Tyler Giallanza and Michael Hahsler
References
M. Azmi, G.C. Runger, and A. Berrado (2019). Interpretable regularized class association rules algorithm for classification in a categorical data space. Information Sciences, Volume 483, May 2019. Pages 313-331.
See Also
Other classifiers: CBA(), CBA_helpers, CBA_ruleset(), FOIL(), LUCS_KDD_CBA, RWeka_CBA
Examples
library("arulesCBA")
data("iris")
classifier <- RCAR(Species ~ ., iris)
classifier
# inspect the rule base sorted by the largest class weight
inspect(sort(classifier$rules, by = "weight"))
# make predictions for the first few instances of iris
predict(classifier, head(iris))
table(pred = predict(classifier, iris), true = iris$Species)
# plot the cross-validation curve as a function of lambda and add a
# red line at lambda.1se used to determine lambda.
plot(classifier$model$cv)
abline(v = log(classifier$model$cv$lambda.1se), col = "red")
# plot the coefficient profile plot (regularization path) for each class
# label. Note the line for the chosen lambda is only added to the last plot.
# You can manually add it to the others.
plot(classifier$model$reg_model, xvar = "lambda", label = TRUE)
abline(v = log(classifier$model$cv$lambda.1se), col = "red")
# inspect rule 11, which has a large weight for class virginica
inspect(classifier$model$all_rules[11])