dmi {mpmi}    R Documentation
Calculate BCMI for categorical (discrete) data
Description
This function calculates mutual information (MI) and bias-corrected mutual information (BCMI) between all pairs of discrete variables held as columns of a matrix. It also performs jackknife bias correction and provides a z-score for the hypothesis of no association. Also included are the *.pw functions, which calculate MI between two vectors only. The *njk functions do not perform the jackknife and are therefore faster.
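As a rough illustration of the quantity being estimated, the sketch below computes the plug-in (maximum-likelihood) MI between two categorical vectors from their contingency table. The helper plugin_mi() is hypothetical and not part of mpmi. It assumes natural logarithms (nats) and is not claimed to match mpmi's internal estimator.

```r
# Hypothetical helper, NOT part of mpmi: plug-in MI between two categorical
# vectors, in nats (natural log is an assumption made for this sketch).
plugin_mi <- function(x, y) {
  stopifnot(length(x) == length(y))
  # Joint relative frequencies p(x, y) from the contingency table
  pxy <- table(x, y) / length(x)
  # Marginal frequencies p(x) and p(y)
  px <- rowSums(pxy)
  py <- colSums(pxy)
  # Sum p(x,y) * log(p(x,y) / (p(x) p(y))) over non-empty cells only
  nz <- pxy > 0
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

plugin_mi(c(1, 1, 2, 2), c("a", "a", "b", "b"))  # perfectly dependent labels
plugin_mi(c(1, 1, 2, 2), c("a", "b", "a", "b"))  # independent labels
```

For these toy vectors the perfectly dependent pair gives log(2), about 0.693 nats, and the independent pair gives 0. Plug-in estimates like this tend to be biased upwards in small samples, which is what the jackknife correction described above is meant to address.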
Usage
dmi(dmat)
dminjk(dmat)
dmi.pw(disc1, disc2)
dminjk.pw(disc1, disc2)
Arguments
dmat: The data matrix. Each row is an observation and each column is a variable of interest. It should contain categorical data; data of any type are coerced to integers via factors.
disc1: A vector of categorical values, used by the pairwise (*.pw) functions.
disc2: A second vector of categorical values for the pairwise functions, the same length as disc1.
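The dmat description says that data of any type are coerced via factors to integers. A minimal base-R sketch of that kind of coercion follows; to_integer_codes() is a hypothetical helper for illustration only, not mpmi's internal code.

```r
# Hypothetical helper, NOT mpmi's internal code: convert every column of a
# data frame to integer category codes via factor().
to_integer_codes <- function(df) {
  df <- as.data.frame(df)
  # factor() assigns one level per distinct value; as.integer() takes its code
  codes <- lapply(df, function(col) as.integer(factor(col)))
  # Bind the per-column codes back into an integer matrix
  out <- do.call(cbind, codes)
  colnames(out) <- names(df)
  out
}

dat <- data.frame(colour = c("red", "blue", "red"),
                  size = c(10, 20, 10),
                  flag = c(TRUE, FALSE, TRUE))
to_integer_codes(dat)
```

Because factor() sorts the levels, the codes are arbitrary labels (here "blue" becomes 1 and "red" becomes 2). MI does not depend on how categories are labelled, so any one-to-one recoding of a column gives the same estimate.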
Details
The result of dmi() is in many ways similar to a correlation matrix: each row and column index corresponds to a given variable. dminjk() and dminjk.pw() return only the MI values, without performing the jackknife. The number of processor cores used can be changed by setting the environment variable "OMP_NUM_THREADS" before starting R.
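To illustrate the correlation-matrix analogy, the sketch below fills a symmetric matrix with plug-in MI values for every pair of columns. Both plugin_mi() and mi_matrix() are hypothetical helpers, not mpmi functions, and they skip the jackknife entirely.

```r
# Hypothetical helper, NOT part of mpmi: plug-in MI in nats (assumed log base).
plugin_mi <- function(x, y) {
  stopifnot(length(x) == length(y))
  pxy <- table(x, y) / length(x)
  px <- rowSums(pxy)
  py <- colSums(pxy)
  nz <- pxy > 0
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

# Hypothetical sketch: one row and one column per variable, like cor().
mi_matrix <- function(dmat) {
  dmat <- as.data.frame(dmat)
  p <- ncol(dmat)
  out <- matrix(NA_real_, p, p, dimnames = list(names(dmat), names(dmat)))
  for (i in seq_len(p)) {
    # MI is symmetric, so compute the upper triangle and mirror it
    for (j in i:p) {
      out[i, j] <- out[j, i] <- plugin_mi(dmat[[i]], dmat[[j]])
    }
  }
  out
}

m <- mi_matrix(mtcars[, c("cyl", "vs", "am")])
m
```

In this plug-in sketch the diagonal holds each variable's MI with itself, which equals its plug-in entropy and is usually much larger than the off-diagonal values.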
Value
Returns a list of three matrices, each of size ncol(dmat) by ncol(dmat):
mi: The raw MI estimates.
bcmi: Jackknife bias-corrected MI estimates (BCMI). Each is the corresponding MI value minus its jackknife estimate of bias.
zvalues: Z-scores for the hypothesis that each corresponding bcmi value is zero. These have poor statistical properties but can be useful as a rough measure of the strength of association.
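The bcmi and zvalues entries can be illustrated with a delete-one jackknife. The sketch below is an illustration under stated assumptions, not mpmi's implementation: plugin_mi() and jackknife_mi() are hypothetical, the bias estimate is the textbook (n - 1) * (mean of leave-one-out estimates - full-sample estimate), and the z-score is assumed to be BCMI divided by its jackknife standard error.

```r
# Hypothetical helper, NOT part of mpmi: plug-in MI in nats (assumed log base).
plugin_mi <- function(x, y) {
  stopifnot(length(x) == length(y))
  pxy <- table(x, y) / length(x)
  px <- rowSums(pxy)
  py <- colSums(pxy)
  nz <- pxy > 0
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

# Hypothetical sketch of jackknife bias correction for one pair of vectors.
jackknife_mi <- function(x, y) {
  n <- length(x)
  mi <- plugin_mi(x, y)
  # MI recomputed with each observation left out in turn
  loo <- vapply(seq_len(n), function(i) plugin_mi(x[-i], y[-i]), numeric(1))
  # Standard jackknife bias estimate; BCMI = MI minus estimated bias
  bias <- (n - 1) * (mean(loo) - mi)
  bcmi <- mi - bias
  # Jackknife standard error; the z-score formula is an ASSUMPTION
  se <- sqrt((n - 1) / n * sum((loo - mean(loo))^2))
  list(mi = mi, bcmi = bcmi, zvalue = bcmi / se)
}

x <- c(1, 1, 2, 2, 1, 2, 1, 2, 3, 3)
y <- c("a", "a", "b", "b", "a", "b", "b", "a", "c", "c")
jackknife_mi(x, y)
```

If every leave-one-out estimate is identical, the standard error is zero and the z-score is not finite; this sketch does not guard against that case.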
Examples
library(mpmi)
data(cars)
# Discretise the data first
d <- cut(cars$dist, breaks = 10)
s <- cut(cars$speed, breaks = 10)
# Discrete MI values
dmi.pw(s, d)
# For comparison, analysed as continuous data:
cmi.pw(cars$dist, cars$speed)
# Exploring a group of categorical variables
dat <- mtcars[, c("cyl","vs","am","gear","carb")]
discresults <- dmi(dat)
discresults
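# Plot the relative magnitude of the BCMI values, first blanking out the
# diagonal (each variable paired with itself) so only pairs of variables are compared
diag(discresults$bcmi) <- NA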
mp(discresults$bcmi)