predict and impute {bnlearn}    R Documentation

Predict or impute missing data from a Bayesian network

Description

Impute missing values in a data set or predict a variable from a Bayesian network.

Usage

## S3 method for class 'bn.fit'
predict(object, node, data, cluster, method = "parents", ...,
  prob = FALSE, debug = FALSE)

impute(object, data, cluster, method, ..., strict = TRUE, debug = FALSE)

Arguments

object

an object of class bn.fit for impute; or an object of class bn or bn.fit for predict.

data

a data frame containing the data to be imputed. Complete observations will be ignored.

node

a character string, the label of a node.

cluster

an optional cluster object from package parallel.

method

a character string, the method used to impute the missing values or predict new ones. The default value is parents.

...

additional arguments for the imputation method. See below.

prob

a boolean value. If TRUE and object is a discrete network, the probabilities used for prediction are attached to the predicted values as an attribute called prob.

strict

a boolean value. If TRUE, impute() will produce an error if the data were not imputed successfully, that is, if they still contain missing values. If FALSE, it will return the partially imputed data with a warning.

debug

a boolean value. If TRUE, a lot of debugging output is printed; otherwise the function is completely silent.

Details

predict() returns the predicted values for node given the data specified by data and the fitted network. How the predicted values are computed depends on the value of method (parents, bayes-lw or exact).

impute() is based on predict(), and can impute missing values with the same methods (parents, bayes-lw and exact). The method bayes-lw takes an optional argument n, the number of random samples that are averaged for each observation. As in predict(), imputed values will differ in each call to impute() when method is set to bayes-lw.
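As a rough illustration of the paragraph above, the sketch below imputes the same incomplete data set once with parents and once with bayes-lw. The seed, the value n = 200 and the object names (incomplete, imputed.parents, imputed.lw) are arbitrary choices made for this example, not recommendations from the package.

```r
library(bnlearn)

# Arbitrary seed: bayes-lw relies on random sampling.
set.seed(42)

# Remove 500 values of F from the gaussian.test data shipped with bnlearn.
incomplete = gaussian.test
incomplete[sample(nrow(incomplete), 500), "F"] = NA

dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
fitted = bn.fit(dag, gaussian.test)

# Impute from the parents of F only, then by likelihood weighting with
# n = 200 samples per observation (an illustrative value).
imputed.parents = impute(fitted, incomplete, method = "parents")
imputed.lw = impute(fitted, incomplete, method = "bayes-lw", n = 200)

# Check whether any missing values are left in either result.
anyNA(imputed.parents)
anyNA(imputed.lw)
```

Here the parents of F (A, D, E and G) are fully observed, which is why the parents method can be applied directly.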

If object contains NA parameter estimates (because some configurations of discrete parents were not observed in the data the parameters were learned from), predict() will return NA whenever those parent configurations appear in data. See bn.fit for details on how to make sure bn.fit objects contain no NA parameter estimates.
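One way to see the issue is to remove a parent configuration from the data before fitting. The sketch below assumes that maximum likelihood estimates leave the corresponding column of the conditional probability table undefined, and that Bayesian posterior estimates (method = "bayes" with an imaginary sample size iss, here set to 1 for illustration) do not; the object names are made up for this example.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")

# Drop every row with A == "a" and C == "a", so that this configuration of
# the parents of D is never observed.
sparse = learning.test[!(learning.test$A == "a" & learning.test$C == "a"), ]

# Maximum likelihood estimates versus Bayesian posterior estimates.
fitted.mle = bn.fit(dag, sparse, method = "mle")
fitted.bayes = bn.fit(dag, sparse, method = "bayes", iss = 1)

# Inspect the conditional probability table of D under each estimator.
anyNA(coef(fitted.mle$D))
anyNA(coef(fitted.bayes$D))
```

Predicting D from fitted.mle for an observation with A = "a" and C = "a" would then be expected to give NA, while fitted.bayes has a proper distribution for that configuration.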

Value

predict() returns a numeric vector (for Gaussian and conditional Gaussian nodes), a factor (for categorical nodes) or an ordered factor (for ordinal nodes). If prob = TRUE and the network is discrete, the probabilities used for prediction are attached to the predicted values as an attribute called prob.

impute() returns a data frame with the same structure as data.
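The return types described above can be checked with a short sketch such as the following; the object names (gfit, gpred, dfit, dpred) and the choice of the first ten rows as new data are arbitrary.

```r
library(bnlearn)

# Gaussian network: the prediction for F is a numeric vector.
gfit = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
         gaussian.test)
gpred = predict(gfit, node = "F", data = gaussian.test[1:10, ])
is.numeric(gpred)

# Discrete network: the prediction for D is a factor.
dfit = bn.fit(model2network("[A][C][F][B|A][D|A:C][E|B:F]"), learning.test)
dpred = predict(dfit, node = "D", data = learning.test[1:10, ])
is.factor(dpred)
levels(dpred)
```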

Note

Ties in prediction are broken using Bayesian tie breaking, i.e. sampling at random from the tied values. Therefore, setting the random seed is required to get reproducible results.
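A minimal sketch of what this means in practice: resetting the seed to the same value before each call should give identical predictions. The seed value 123, the use of bayes-lw and the object names are arbitrary choices for illustration.

```r
library(bnlearn)

fitted = bn.fit(model2network("[A][C][F][B|A][D|A:C][E|B:F]"), learning.test)
newdata = learning.test[1:50, ]

# Same seed before each call, so both calls draw the same random numbers.
set.seed(123)
first = predict(fitted, node = "C", data = newdata, method = "bayes-lw")
set.seed(123)
second = predict(fitted, node = "C", data = newdata, method = "bayes-lw")

# Compare the two sets of predictions.
identical(first, second)
```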

Classifiers have a separate predict() method; see naive.bayes.

Author(s)

Marco Scutari

Examples

# missing data imputation.
with.missing.data = gaussian.test
with.missing.data[sample(nrow(with.missing.data), 500), "F"] = NA
fitted = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
           gaussian.test)
imputed = impute(fitted, with.missing.data)

# predicting a variable in the test set.
training = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
           gaussian.test[1:2000, ])
test = gaussian.test[2001:nrow(gaussian.test), ]
predicted = predict(training, node = "F", data = test)

# obtain the conditional probabilities for the values of a single variable
# given a subset of the rest; these probabilities are what predict() uses
# to determine the predicted values.
fitted = bn.fit(model2network("[A][C][F][B|A][D|A:C][E|B:F]"), learning.test)
evidence = data.frame(A = factor("a", levels = levels(learning.test$A)),
                      F = factor("b", levels = levels(learning.test$F)))
predicted = predict(fitted, "C", evidence,
              method = "bayes-lw", prob = TRUE)
attr(predicted, "prob")

[Package bnlearn version 5.0 Index]