inDep {AssocBin}    R Documentation

Test pairwise variable independence

Description

A high-level function that accepts a data set, stop criteria, and splitting functions for continuous variables, then applies a chi-squared test of independence to each pair of variables. The bins tested are either generated by recursively binning the ranks of continuous variables or implied by the combinations of levels of categorical variables.

Usage

inDep(
  data,
  stopCriteria,
  catCon = uniRIntSplit,
  conCon = rIntSplit,
  dropPoints = FALSE
)

Arguments

data

'data.frame' or object coercible to a 'data.frame'

stopCriteria

output of 'makeCriteria' giving the criteria used to stop binning; passed on to the binning functions

catCon

splitting function to apply to pairs of one categorical and one continuous variable

conCon

splitting function to apply to pairs of continuous variables

dropPoints

logical; should points be dropped from the returned bins?

Details

The output of 'inDep' is a list. Its 'binnings' element is a list of lists, each of which records the details of the binning of a particular pair of variables.
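
As a rough illustration of the kind of test applied to each binning, the base R sketch below computes a chi-squared statistic on bins of the ranks of two continuous variables. It is not the algorithm used by 'inDep': a fixed 2 x 2 grid split at the median ranks stands in for recursive binning, Pearson residuals are assumed as the per-bin residual, and the degrees of freedom follow the textbook contingency-table rule, which may differ from how 'dfs' is determined in the package. The simulated data and all variable names are hypothetical.

```r
# Illustrative only: a fixed 2 x 2 grid on the ranks, not the recursive
# binning that inDep performs.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- x + rnorm(n)   # y depends on x by construction

# Work on ranks, so only the ordering of each variable matters
rx <- rank(x, ties.method = "first")
ry <- rank(y, ties.method = "first")

# Split each margin at the median rank to form four bins
binX <- rx > n / 2
binY <- ry > n / 2
observed <- table(binX, binY)

# Under independence each bin expects (row total * column total) / n points
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson residual per bin (assumed form), chi-squared statistic,
# textbook degrees of freedom, and the upper-tail p-value on the log scale
resids <- (observed - expected) / sqrt(expected)
statistic <- sum(resids^2)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
logp <- pchisq(statistic, df, lower.tail = FALSE, log.p = TRUE)
```

Bins with large positive residuals hold more points than independence would predict, and bins with large negative residuals hold fewer.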

Value

An 'inDep' object that stores the results of using recursive binning with the specified splitting logic to test independence on a data set. It has the following slots:

data

the name of the data object in the global environment which was split

types

character vector giving the data types of each pair

pairs

character vector of the variable names of each pair

binnings

list of lists, where each list is the binning fit to the corresponding pair by the recursive binning algorithm

residuals

list of numeric vectors giving the residual for each bin of each pairwise binning

statistics

numeric vector giving the chi-squared statistic for each binning

dfs

numeric vector giving the degrees of freedom of each binning, based on the variable type combination and the final number of bins

logps

the natural logarithm of the p-value of each statistic

pvalues

numeric vector of p-values for 'statistics', assuming a chi-squared null distribution with 'dfs' degrees of freedom

Internally, the p-values are computed on the log scale to better distinguish between strongly dependent pairs, and the 'pvalues' returned are computed by calling 'exp(logps)'. All returned values are ordered by increasing 'logps'.
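
The benefit of working on the log scale can be seen with base R alone. The sketch below is not taken from the AssocBin source: it assumes the log p-values are obtained from something like 'pchisq' with 'log.p = TRUE', and the pair labels, statistics, and degrees of freedom are made-up values chosen for illustration.

```r
# Hypothetical labels, statistics, and degrees of freedom for three pairs
# (made-up numbers, not output from inDep)
pairLabels <- c("weak", "strong", "stronger")
statistics <- c(12, 2000, 5000)
dfs <- c(3, 3, 3)

# Upper-tail p-values computed directly on the log scale
logps <- pchisq(statistics, dfs, lower.tail = FALSE, log.p = TRUE)

# Exponentiating recovers ordinary p-values, but the two strong pairs
# both underflow to zero and can no longer be told apart
pvalues <- exp(logps)

# Ordering by increasing log p-value still separates them
ranking <- pairLabels[order(logps)]
```

Sorting on 'logps' rather than 'pvalues' therefore keeps the strongest dependencies in a meaningful order even when their p-values are numerically zero.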

Author(s)

Chris Salahub


[Package AssocBin version 1.0-0 Index]