preprocess {zebu} | R Documentation |
Preprocess data
Description
Subroutine called by lassie. Discretizes, subsets and removes missing data from a data.frame.
Usage
preprocess(x, select, continuous, breaks, default_breaks = 4)
Arguments
x |
data.frame or matrix. |
select |
optional vector of column numbers or column names specifying a subset of data to be used. By default, uses all columns. |
continuous |
optional vector of column numbers or column names specifying continuous variables that should be discretized. By default, assumes that every variable is categorical. |
breaks |
numeric vector or list passed on to |
default_breaks |
default break points for discretizations.
Same syntax as in |
Value
List containing the following values:
raw: raw subsetted data.frame
pp: discretized, subsetted and complete data.frame
select
continuous
breaks
default_breaks
Examples
# This is what happens behind the curtains in the 'lassie' function
# Here we compute the association between the 'Girth' and 'Height' variables
# of the 'trees' dataset
# 'select' and 'continuous' take column numbers or names
select <- c('Girth', 'Height') # select subset of trees
continuous <- c(1, 2) # both 'Girth' and 'Height' are continuous
# equal-width discretization with 3 bins
breaks <- 3
# Preprocess data: subset, discretize and remove missing data
pre <- preprocess(trees, select, continuous, breaks)
# Estimates marginal and multivariate probabilities from preprocessed data.frame
prob <- estimate_prob(pre$pp)
# Computes local and global association using Ducher's Z
lam <- local_association(prob, measure = 'z')
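
For intuition, the subset, discretize and remove-missing steps described above can be approximated in base R. This is only a sketch: it assumes that a single number for breaks means that many equal-width bins, as in base R's cut (consistent with the 3-bin example above). It is not zebu's actual implementation, and raw_sketch and pp_sketch are illustrative names.

```r
# Base-R approximation of the three preprocessing steps (assumption, not zebu code)

# 1. Subset: keep only the selected columns of the built-in 'trees' dataset
raw_sketch <- trees[, c("Girth", "Height")]

# 2. Discretize: cut every selected (continuous) column into 3 equal-width bins
pp_sketch <- raw_sketch
pp_sketch[] <- lapply(raw_sketch, cut, breaks = 3)

# 3. Remove missing data: drop rows that contain any NA
pp_sketch <- na.omit(pp_sketch)

# Each column is now a factor with 3 levels
str(pp_sketch)
```

Here raw_sketch plays the role of the raw element and pp_sketch the role of the pp element of the list returned by preprocess.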