
BNDataset-class {bnstruct}

R Documentation

BNDataset class.

Description

Contains all of the data that can be extracted from a given dataset: raw data, imputed data, and bootstrap samples of both the raw and the imputed data.

Usage

BNDataset(data, discreteness, variables = NULL, node.sizes = NULL, ...)

## S4 method for signature 'BNDataset'
initialize(.Object)

Arguments

`.Object`	an empty BNDataset.
`data`	a raw data.frame (or matrix), or the path/name of the file containing the raw dataset (see 'Details').
`discreteness`	a vector of booleans indicating whether each variable is discrete or continuous (`TRUE` and `FALSE`, respectively), or the path/name of the file containing the header information for the dataset (discreteness, variable names and cardinalities; see 'Details').
`variables`	vector of variable names.
`node.sizes`	vector of variable cardinalities (for discrete variables) or numbers of quantization levels (for continuous variables).
`...`	further arguments for reading a dataset from files (see documentation for `read.dataset`).

Details

There are two ways to build a BNDataset: from two files, containing the header information and the data respectively, or by manually providing the data table and the related header information (variable names, cardinalities and discreteness).

The key pieces of information needed are: 1. the data; 2. the type of each variable (discrete or continuous); 3. the names of the variables; 4. the cardinalities of the variables (if discrete), or the number of levels they have to be quantized into (if continuous). Names and cardinalities/levels can be guessed by looking at the data, but it is strongly advised to provide _all_ of this information, in order to avoid problems later on.
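To see why explicit cardinalities are safer than guessed ones, here is a minimal base-R sketch of one plausible guessing heuristic: take the largest value observed in each column, assuming values are coded from 1. The helper guess.node.sizes is hypothetical and not part of bnstruct, whose own guessing logic may differ.

```r
# Hypothetical helper (not part of bnstruct): guess the cardinality of each
# discrete variable as the largest value observed in its column, assuming
# values are coded from 1.
guess.node.sizes <- function(data) {
  apply(data, 2, max, na.rm = TRUE)
}

# Toy data, filled column by column: a = 1,2,1,3 and b = 2,2,1,1.
data <- matrix(c(1, 2, 1, 3,
                 2, 2, 1, 1),
               nrow = 4, ncol = 2,
               dimnames = list(NULL, c("a", "b")))

sizes <- guess.node.sizes(data)
print(sizes)  # a = 3, b = 2
```

If b actually has three states but the value 3 never occurs in this sample, the guessed size (2) is wrong and every later step inherits the mistake; passing node.sizes explicitly avoids this.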

Data can be provided as a data.frame or a matrix, and may contain NAs. By default, NAs are indicated with '?'; a different NA symbol can be specified with the na.string.symbol parameter. The values in the data have to be numeric (real for continuous variables, integer for discrete ones).

The default range of values for a discrete variable X is [1,|X|], with |X| being the cardinality of X; the same applies to the quantization levels of continuous variables. If the data use a different range, a different starting value (for the whole dataset) can be specified with the starts.from parameter: for example, with starts.from=0 the values of each variable are assumed to lie in [0,|X|-1]. Keep in mind that the internal representation of bnstruct always starts from 1, so the original starting values are lost.
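The NA symbol and the starts.from shift can be illustrated with base R alone. The sketch below reads a small table in which '?' marks a missing value and values are coded from 0, then shifts them to the 1-based coding used internally by bnstruct. It only mimics the effect of na.string.symbol = '?' and starts.from = 0; it is not bnstruct's own reader.

```r
# Two discrete variables coded from 0 (as with starts.from = 0);
# '?' marks a missing value (the default NA symbol).
txt <- "0 1
1 ?
2 0"

# read.table's na.strings plays the role of na.string.symbol here.
raw <- read.table(text = txt, header = FALSE, na.strings = "?")

# Shift every value by one so the range becomes [1, |X|]; NAs stay NA.
shifted <- as.matrix(raw) + 1
print(shifted)
```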

Instead of providing all of the information manually, it is possible to use two files, one for the data and one for the metadata. The data file must be a text file in tabular format, with one tuple per row and the values of the variables separated by a space or a tab. Values have to be numbers, starting from 1 for discrete variables. Data files can have a first row containing the names of the corresponding variables.
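As a sketch, a data file in this format can be produced with base R's write.table: values separated by a single space, an optional first row of variable names, and no row names or quotes. The file name comes from tempfile() and is purely illustrative.

```r
# Discrete data coded from 1, one tuple per row.
data <- matrix(c(1, 2, 1, 3,
                 2, 2, 1, 1),
               nrow = 4, ncol = 2,
               dimnames = list(NULL, c("a", "b")))

data.file <- tempfile(fileext = ".data")

# Space-separated values, with the variable names on the first row.
write.table(data, data.file, sep = " ",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

lines <- readLines(data.file)
print(lines)
```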

In addition to the data file, a header file containing further information can be provided. A header file is composed of three rows of tab-delimited values: 1. the names of the variables, in the same order as in the data file; 2. a list of integers giving, for each variable, its cardinality (for discrete variables) or the number of levels it has to be quantized into (for continuous variables); 3. a list indicating, for each variable, whether it is continuous (c or C), and thus has to be quantized before learning, or discrete (d or D).

For more advanced options when reading a dataset from files, please refer to the documentation of the read.dataset method. Imputation and bootstrap are also available as separate routines (impute and bootstrap, respectively).
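A header file for three variables can be sketched in base R as below: names on the first row, cardinalities or quantization levels on the second, and d/c flags on the third, all tab-delimited. Reading the third row back and testing for 'd' also shows how the flags map onto the TRUE/FALSE convention of the discreteness argument. The variable names and sizes are made up for illustration.

```r
header.file <- tempfile(fileext = ".header")

# Row 1: names; row 2: cardinality (discrete) or number of quantization
# levels (continuous); row 3: d/D for discrete, c/C for continuous.
writeLines(c(paste(c("a", "b", "x"), collapse = "\t"),
             paste(c(3, 2, 4),       collapse = "\t"),
             paste(c("d", "D", "c"), collapse = "\t")),
           header.file)

# Parse it back: one character vector per row.
hdr <- strsplit(readLines(header.file), "\t", fixed = TRUE)
variables    <- hdr[[1]]
node.sizes   <- as.integer(hdr[[2]])
discreteness <- tolower(hdr[[3]]) == "d"   # TRUE = discrete

print(variables)
print(node.sizes)
print(discreteness)
```

Together with a data file, such a header is what the first call in the Examples, BNDataset("file.data", "file.header"), expects.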

When an evolving system is to be modeled as a Dynamic Bayesian Network, it is possible to describe only the variables of a single instant; this description is then replicated for all the num.time.steps instants that compose the dataset, where num.time.steps has to be given as a parameter. In this case, the N variables v1, v2, ..., vN of a single instant are assumed to appear in the dataset as v1_t1, v2_t1, ..., vN_t1, v1_t2, v2_t2, ..., in exactly this order. The user can also provide the information for all the variables in all the instants; if this is not done, the variable names are edited to include the instant. For an evolving system, the num.variables slot still refers to the total number of variables observed over all the instants (the number of columns in the dataset), not to a single instant.
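The replication of a single-instant description can be sketched as follows. The helper expand.time.steps is hypothetical (bnstruct performs this step internally, and the exact suffix it appends to names may differ); it only illustrates the column order v1_t1, ..., vN_t1, v1_t2, ... and why num.variables counts every instant.

```r
# Hypothetical helper: replicate a single-instant description over
# num.time.steps instants, in the column order v1_t1..vN_t1, v1_t2, ...
expand.time.steps <- function(variables, node.sizes, discreteness,
                              num.time.steps) {
  step <- rep(seq_len(num.time.steps), each = length(variables))
  list(variables    = paste0(rep(variables, num.time.steps), "_t", step),
       node.sizes   = rep(node.sizes, num.time.steps),
       discreteness = rep(discreteness, num.time.steps))
}

desc <- expand.time.steps(c("v1", "v2", "v3"), c(2, 3, 2),
                          c(TRUE, TRUE, TRUE), num.time.steps = 2)
print(desc$variables)

# num.variables covers all instants: 3 variables x 2 instants = 6 columns.
print(length(desc$variables))
```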

Value

A BNDataset object.

Slots

name:: name of the dataset
header.file:: name and location of the header file
data.file:: name and location of the data file
variables:: names of the variables in the network
node.sizes:: cardinality of each variable of the network
num.variables:: number of variables (columns) in the dataset
discreteness:: TRUE if the variable is discrete, FALSE if it is continuous
quantiles:: list of vectors containing the quantiles, one vector per variable. Each vector is NULL if the variable is discrete, and contains the quantiles if it is continuous
num.items:: number of observations (rows) in the dataset
has.raw.data:: TRUE if the dataset contains data read from a file
has.imputed.data:: TRUE if the dataset contains imputed data (computed from raw data)
raw.data:: matrix containing raw data
imputed.data:: matrix containing imputed data
has.boots:: TRUE if the dataset has bootstrap samples
boots:: list of bootstrap samples
has.imputed.boots:: TRUE if the dataset has imputed bootstrap samples
imp.boots:: list of imputed bootstrap samples
num.boots:: number of bootstrap samples
num.time.steps:: number of instants in which the network is observed (1, unless it is a dynamic system)
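The quantiles slot suggests how continuous variables are discretized: the breakpoints that split a variable into its requested number of levels are kept per variable. The sketch below shows quantile-based binning with base R's quantile and cut; it is an assumption about the general technique, not a copy of bnstruct's quantization code, and it requires the breakpoints to be distinct (heavily tied data would need extra handling).

```r
# Continuous observations to be quantized into 3 levels.
x <- c(0.1, 0.4, 0.35, 0.8, 0.9, 0.2)
num.levels <- 3

# Breakpoints at the 0, 1/3, 2/3 and 1 quantiles (what one entry of a
# per-variable list of quantiles could hold).
breaks <- quantile(x, probs = seq(0, 1, length.out = num.levels + 1),
                   names = FALSE)

# Map each value to a level in 1..num.levels;
# include.lowest keeps min(x) in level 1.
q <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
print(q)  # 1 2 2 3 3 1
```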

Examples

## Not run: 
library(bnstruct)

# create from files: a data file and a header file
dataset <- BNDataset("file.data", "file.header")

# other way: create from raw dataset and metadata
# (column j holds values 4*(j-1)+1 .. 4*j, within the declared node.sizes)
data <- matrix(1:16, nrow = 4, ncol = 4)
dataset <- BNDataset(data = data,
                     discreteness = rep('d', 4),
                     variables = c("a", "b", "c", "d"),
                     node.sizes = c(4, 8, 12, 16))

## End(Not run)

[Package bnstruct version 1.0.15 Index]