discrepancy {FactorCopula}R Documentation

Diagnostics to detect a factor dependence structure

Description

The diagnostic method in Joe (2014, pages 245-246) to show that each dataset has a factor structure based on linear factor analysis. The correlation matrix \R_{\mathrm{observed}} has been obtained based on the sample correlations from the bivariate pairs of the observed variables. These are the linear (when both variables are continuous), polychoric (when both variables are ordinal), and polyserial (when one variable is continuous and the other is ordinal) sample correlations among the observed variables. The resulting \R_{\mathrm{observed}} is generally positive definite if the sample size is not small enough; if not one has to convert it to positive definite. We calculate various measures of discrepancy between \R_{\mathrm{observed}} and \R_{\mathrm{model}} (the resulting correlation matrix of linear factor analysis), such as the maximum absolute correlation difference D_1=\max|\R_{\mathrm{model}} - \R_{\mathrm{observed}}|, the average absolute correlation difference D_2=\mathrm{avg}| \R_{\mathrm{model}} - \R_{\mathrm{observed}}|, and the correlation matrix discrepancy measure D_3=\log\bigl( \det(\R_{\mathrm{model}}) \bigr) - \log\bigl( \det(\R_{\mathrm{observed}})\bigr) + \mathrm{tr}( \R^{-1}_{\mathrm{model}} \R_{\mathrm{observed}} ) - d.

Usage

discrepancy(cormat, n, f3)

Arguments

cormat

\R_{\mathrm{observed}}.

n

Sample size.

f3

If TRUE, then the linear 3-factor analysis is fitted.

Value

A matrix with the calculated discrepancy measures for different number of factors.

Author(s)

Sayed H. Kadhem s.kadhem@uea.ac.uk
Aristidis K. Nikoloulopoulos a.nikoloulopoulos@uea.ac.uk

References

Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.

Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.

Examples

#------------------------------------------------
#                     PE Data
#------------------             -----------------
data(PE)
#correlation
continuous.PE1 <- -PE[,1] 
continuous.PE <- cbind(continuous.PE1, PE[,2])
u.PE <- apply(continuous.PE, 2, rank)/(nrow(PE)+1)
z.PE <- qnorm(u.PE)
categorical.PE <- data.frame(apply(PE[, 3:5], 2, factor))
nPE <- cbind(z.PE, categorical.PE)

#-------------------------------------------------
# Discrepancy measures----------------------------
#-------------------------------------------------
#correlation matrix for mixed data
cormat.PE <- as.matrix(polycor::hetcor(nPE, std.err=FALSE))
#discrepancy measures
out.PE = discrepancy(cormat.PE, n = nrow(nPE), f3 = FALSE)

#------------------------------------------------
#------------------------------------------------
#                    GSS Data
#------------------             -----------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME,AGE)
continuous.GSS <- apply(continuous.GSS, 2, rank)/(nrow(GSS)+1)
z.GSS <- qnorm(continuous.GSS)
ordinal.GSS <- cbind(DEGREE,PINCOME,PDEGREE) 
count.GSS <- cbind(CHILDREN,PCHILDREN)

# Transforming the count variables to ordinal
# count1 : CHILDREN
count1  = count.GSS[,1]
count1[count1 > 3] = 3

# count2: PCHILDREN
count2  = count.GSS[,2]
count2[count2 > 7] = 7

# Combining both transformed count variables
ncount.GSS = cbind(count1, count2)

# Combining ordinal and transformed count variables
categorical.GSS <- cbind(ordinal.GSS, ncount.GSS)
categorical.GSS <- data.frame(apply(categorical.GSS, 2, factor))

# combining continuous and categorical variables
nGSS = cbind(z.GSS, categorical.GSS)

#-------------------------------------------------
# Discrepancy measures----------------------------
#-------------------------------------------------
#correlation matrix for mixed data
cormat.GSS <- as.matrix(polycor::hetcor(nGSS, std.err=FALSE))
#discrepancy measures
out.GSS = discrepancy(cormat.GSS, n = nrow(nGSS), f3 = TRUE)


[Package FactorCopula version 0.9.3 Index]