bn.cv {bnlearn}R Documentation

Cross-validation for Bayesian networks

Description

Perform a k-fold or hold-out cross-validation for a learning algorithm or a fixed network structure.

Usage

bn.cv(data, bn, loss = NULL, ..., algorithm.args = list(),
  loss.args = list(), fit, fit.args = list(), method = "k-fold",
  cluster, debug = FALSE)

## S3 method for class 'bn.kcv'
plot(x, ..., main, xlab, ylab, connect = FALSE)
## S3 method for class 'bn.kcv.list'
plot(x, ..., main, xlab, ylab, connect = FALSE)

loss(x)

Arguments

data

a data frame containing the variables in the model.

bn

either a character string (the label of the learning algorithm to be applied to the training data in each iteration) or an object of class bn (a fixed network structure).

loss

a character string, the label of a loss function. If none is specified, the default loss function is the Classification Error for Bayesian networks classifiers; otherwise, the Log-Likelihood Loss for both discrete and continuous data sets. See below for additional details.

algorithm.args

a list of extra arguments to be passed to the learning algorithm.

loss.args

a list of extra arguments to be passed to the loss function specified by loss.

fit

a character string, the label of the method used to fit the parameters of the network. See bn.fit for details.

fit.args

additional arguments for the parameter estimation procedure, see again bn.fit for details.

method

a character string, either k-fold, custom-folds or hold-out. See below for details.

cluster

an optional cluster object from package parallel.

debug

a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.

x

an object of class bn.kcv or bn.kcv.list returned by bn.cv().

...

additional objects of class bn.kcv or bn.kcv.list to plot alongside the first.

main, xlab, ylab

the title of the plot, an array of labels for the boxplot, the label for the y axis.

connect

a logical value. If TRUE, the medians points in the boxplots will be connected by a segmented line.

Value

bn.cv() returns an object of class bn.kcv.list if runs is at least 2, an object of class bn.kcv if runs is equal to 1.

loss() returns a numeric vector with a length equal to runs.

Cross-Validation Strategies

The following cross-validation methods are implemented:

If cross-validation is used with multiple runs, the overall loss is the averge of the loss estimates from the different runs.

To clarify, cross-validation methods accept the following optional arguments:

Loss Functions

The following loss functions are implemented:

Optional arguments that can be specified in loss.args are:

Note that if bn is a Bayesian network classifier, pred and pred-lw both give exact posterior predictions computed using the closed-form formulas for naive Bayes and TAN.

Plotting Results from Cross-Validation

Both plot methods accept any combination of objects of class bn.kcv or bn.kcv.list (the first as the x argument, the remaining as the ... argument) and plot the respected expected loss values side by side. For a bn.kcv object, this mean a single point; for a bn.kcv.list object this means a boxplot.

Author(s)

Marco Scutari

References

Koller D, Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

See Also

bn.boot, rbn, bn.kcv-class.

Examples

bn.cv(learning.test, 'hc', loss = "pred", loss.args = list(target = "F"))

folds = list(1:2000, 2001:3000, 3001:5000)
bn.cv(learning.test, 'hc', loss = "logl", method = "custom-folds",
  folds = folds)

xval = bn.cv(gaussian.test, 'mmhc', method = "hold-out",
         k = 5, m = 50, runs = 2)
xval
loss(xval)

## Not run: 
# comparing algorithms with multiple runs of cross-validation.
gaussian.subset = gaussian.test[1:50, ]
cv.gs = bn.cv(gaussian.subset, 'gs', runs = 10)
cv.iamb = bn.cv(gaussian.subset, 'iamb', runs = 10)
cv.inter = bn.cv(gaussian.subset, 'inter.iamb', runs = 10)
plot(cv.gs, cv.iamb, cv.inter,
  xlab = c("Grow-Shrink", "IAMB", "Inter-IAMB"), connect = TRUE)

# use custom folds.
folds = split(sample(nrow(gaussian.subset)), seq(5))
bn.cv(gaussian.subset, "hc", method = "custom-folds", folds = folds)

# multiple runs, with custom folds.
folds = replicate(5, split(sample(nrow(gaussian.subset)), seq(5)),
          simplify = FALSE)
bn.cv(gaussian.subset, "hc", method = "custom-folds", folds = folds)

## End(Not run)

[Package bnlearn version 5.0 Index]