R: Core consistency diagnostic for PARAFAC models

feemcorcondia {albatross}

R Documentation

Core consistency diagnostic for PARAFAC models

Description

Compute the core consistency diagnostic (“CORCONDIA”) by fitting a “Tucker3” core array to the existing PARAFAC loadings.

Usage

  feemcorcondia(
    model, divisor = c("nfac", "core"),
    kind = c('pinv', 'iterative', 'vec'), ...
  )
  ## S3 method for class 'feemcorcondia'
print(x, ...)

Arguments

`model`	A PARAFAC model returned by `feemparafac`.
`divisor`	The divisor used in computation of the CORCONDIA value, see Details.
`kind`	A string choosing the method used to compute the least squares Tucker3 core. Defaults to “pinv” for PARAFAC models without missing data and “iterative” for models where missing data is present. See Details.
`x`	An object returned by `feemcorcondia`.
`...`	`feemcorcondia` For `kind = 'iterative'`, forwarded to `optim` (see Details). Otherwise, not allowed. `print.feemcorcondia` Ignored.

Details

The “Tucker3” model uses three loading matrices and a small three-way “core array” to describe a larger three-way array:

X_{i,j,k} = \sum_r \sum_s \sum_t A_{i,r} B_{j,s} C_{k,t} G_{r,s,t}

It's easy to show that constraining G_{r,s,t} = 1_{r = s = t} makes the Tucker3 model equivalent to a PARAFAC model. The core consistency diagnostic works by constraining the loading matrices of a Tucker3 model to the existing loading matrices from a PARAFAC model and estimating the core array. The closer the resulting \mathbf{G} tensor is to a diagonal one, the better.

Given the least-squares estimated core tensor \mathbf{G}, the ideal core tensor T_{r,s,t} = 1_{r = s = t} and the denominator D, the CORCONDIA metric is defined as follows:

\left( 1 - \frac{ \sum_r \sum_s \sum_t (G_{r,s,t} - T_{r,s,t})^2 }{D} \right) \cdot 100\%

The denominator can be chosen to be either \sum_r \sum_s \sum_t T_{r,s,t}^2, which is equal to the number of factors in the model (divisor = 'nfac'), or \sum_r \sum_s \sum_t G_{r,s,t}^2, which will avoid resulting negative values (divisor = 'core').

There are multiple ways how the least squares Tucker3 core can be computed. When no data is missing, the matricised form of the model can be used to derive the expression (kind = 'pinv', the default in such cases):

\mathbf{X} = \mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top + \epsilon

\hat{\mathbf{G}} = \mathbf{A}^{+} \mathbf{X} ( (\mathbf{C}^\top)^{+} \otimes (\mathbf{B}^\top)^{+} )

With missing data present, a binary matrix of weights \mathbf{W} appears:

\min_\mathbf{G} \left\| \mathbf W \circ ( \mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top - \mathbf{X} ) \right\|^2

A gradient-based method can be used to solve this problem iteratively without allocating too much memory, but care must be taken to ensure convergence. For kind = 'iterative' (which is the default for models with missing data), optim is used with parameters method = 'L-BFGS-B', control = list(maxit = 1000, factr = 1). Warnings will be produced if the method doesn't indicate successful convergence.

The problem can also be solved exactly by unfolding the tensor into a vector and skipping the elements marked as missing:

\min_\mathbf{G} \left\| \mbox{vec}( \mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top )_{\mbox{non-missing}} - \mbox{vec}(\mathbf{X})_{\mbox{non-missing}} \right\|^2

\mbox{vec}(\mathbf{C} \otimes \mathbf{B} \otimes \mathbf{A}) _{\mbox{non-missing}} \times \mbox{vec}\:\mathbf{G} = \mbox{vec}(\mathbf{X})_{\mbox{non-missing}}

Unfortunately, when this method is used (kind = 'vec'), the left-hand side of the equation has the size of \mathbf{X} times the number of components cubed, which grows very quickly for models with large numbers of components.

Value

A numeric scalar of class feemcorcondia with the following attributes:

`divisor`	The `divisor` argument, expanded to one of the valid options.
`core`	A three-way array containing the least-squares Tucker core for the given PARAFAC model.

References

Bro R, Kiers HAL (2003). “A new efficient method for determining the number of components in PARAFAC models.” Journal of Chemometrics, 17(5), 274-286. ISSN 0886-9383, doi:10.1002/cem.801.

Examples

  data(feems)
  cube <- feemscale(feemscatter(cube, c(20, 14)), na.rm = TRUE)
  # kind = 'vec' is exact but may take a lot of memory
  feemcorcondia(feemparafac(cube, nfac = 3, ctol = 1e-4), kind = 'vec')
  # kind = 'iterative' used by default for models with missing data
  feemcorcondia(feemparafac(cube, nfac = 4, ctol = 1e-4))

[Package albatross version 0.3-8 Index]