entropy_trivar {netropy} | R Documentation
Trivariate Entropy
Description
Computes trivariate entropies of all triples of (discrete) variables in a multivariate data set.
Usage
entropy_trivar(dat)
Arguments
dat
Data frame with rows as observations and columns as variables. All variables must be categorical, either as observed or after transformation, with finite range spaces.
Details
Trivariate entropies can be used to check for functional relationships and
stochastic independence between triples of variables.
The trivariate entropy H(X,Y,Z) of three discrete random variables X, Y and Z
is bounded according to
H(X,Y) <= H(X,Y,Z) <= H(X,Z) + H(Y,Z) - H(Z).
The difference between the trivariate entropy and its lower bound, H(X,Y,Z) - H(X,Y), is equal to the expected conditional entropy of Z given X and Y.
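Both bounds follow from the chain rule for entropy. The short derivation below is a sketch; the conditional-entropy notation H(Z | X,Y) is standard but is introduced here and does not appear elsewhere on this page.

```latex
\begin{align*}
H(X,Y,Z) &= H(X,Y) + H(Z \mid X,Y) \;\ge\; H(X,Y), \\
H(X,Y,Z) &= H(X,Z) + H(Y \mid X,Z) \;\le\; H(X,Z) + H(Y \mid Z)
          = H(X,Z) + H(Y,Z) - H(Z).
\end{align*}
```

The lower bound is attained exactly when H(Z | X,Y) = 0, that is, when Z is a function of (X,Y). The upper bound is attained exactly when X and Y are conditionally independent given Z.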
Value
Data frame in which the first three columns identify each triple of variables (V1,V2,V3) and the fourth column gives the trivariate entropy H(V1,V2,V3).
Author(s)
Termeh Shafie
References
Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data.
Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.
Nowicki, K., Shafie, T., & Frank, O. (Forthcoming 2022). Statistical Entropy Analysis of Network Data.
See Also
entropy_bivar, prediction_power
Examples
# use internal data set
data(lawdata)
df.att <- lawdata[[4]]
# three steps of data editing:
# 1. categorize variables 'years' and 'age' into three groups of
#    approximately equal size (cut points based on the cdf)
# 2. make sure all outcomes start from the value 0 (optional)
# 3. remove variable 'senior' as it consists of only unique values (thus redundant)
df.att.ed <- data.frame(
  status    = df.att$status,
  gender    = df.att$gender,
  office    = df.att$office - 1,
  years     = ifelse(df.att$years <= 3, 0,
                     ifelse(df.att$years <= 13, 1, 2)),
  age       = ifelse(df.att$age <= 35, 0,
                     ifelse(df.att$age <= 45, 1, 2)),
  practice  = df.att$practice,
  lawschool = df.att$lawschool - 1)
# calculate trivariate entropies
H.triv <- entropy_trivar(df.att.ed)
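The bounds in Details can be checked by hand on a small data set. The sketch below uses base R only and does not call entropy_trivar. Both the helper joint_entropy and the toy data are hypothetical, introduced here for illustration, and the use of base-2 logarithms is an assumption about the unit of measurement rather than a statement about how the package computes its values.

```r
# Toy data (hypothetical): z is computed from x and y, so Z is a
# function of (X, Y) and the lower bound H(X,Y) should be attained.
toy <- expand.grid(x = 0:1, y = 0:2)
toy$z <- (toy$x + toy$y) %% 2

# Hypothetical helper, not part of netropy: empirical joint entropy
# of the named columns, in bits (log base 2 is an assumption).
joint_entropy <- function(df, vars) {
  p <- prop.table(table(df[, vars, drop = FALSE]))  # joint relative frequencies
  p <- p[p > 0]                                     # drop empty cells (0 * log 0 = 0)
  -sum(p * log2(p))
}

h_xy  <- joint_entropy(toy, c("x", "y"))
h_xyz <- joint_entropy(toy, c("x", "y", "z"))
upper <- joint_entropy(toy, c("x", "z")) +
         joint_entropy(toy, c("y", "z")) -
         joint_entropy(toy, "z")

# Lower bound, trivariate entropy and upper bound side by side
c(lower = h_xy, joint = h_xyz, upper = upper)
```

In this toy data every (x, y) combination occurs exactly once, so H(X,Y) and H(X,Y,Z) both equal log2(6), and the trivariate entropy sits on its lower bound.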