R: Sparse Principal Loading Analysis

spla {prinvars}

R Documentation

Sparse Principal Loading Analysis

Description

This function performs sparse principal loading analysis (SPLA) on the given data matrix. We refer to Bauer (2022) for more information. The sparse loadings are calculated either using PMD from the PMA package (Witten et al., 2009) or using spca from the elasticnet package (Zou et al., 2006).

Usage

spla(
  x,
  method = c("pmd", "spca"),
  para,
  cor = FALSE,
  criterion = c("corrected", "normal"),
  threshold = 1e-07,
  rho = 1e-06,
  max.iter = 200,
  trace = FALSE,
  eps.conv = 0.001,
  orthogonal = TRUE,
  check = c("rnc", "rows"),
  ...
)

Arguments

`x`	a numeric matrix or data frame which provides the data for the sparse principal loading analysis.
`method`	the method used to calculate the sparse loadings. `pmd` uses the method from Witten et al. (2009) and `spca` uses the method from Zou et al. (2006).
`para`	when `method="pmd"`: a numeric value giving the bound for the L1 regularization. When `method="spca"`: a numeric vector containing the regularization parameter for each variable.
`cor`	a logical value indicating whether the calculation should use the correlation or the covariance matrix.
`criterion`	a character string indicating if the weight-corrected evaluation criterion (CEC) or the evaluation criterion (EC) is used. `corrected` changes the loadings to weight all variables equally while `normal` does not change the loadings.
`threshold`	a numeric value used to determine zero elements in the loading. This serves mostly to correct approximation errors.
`rho`	penalty parameter. When `method="spca"`, further regularization is needed for the case when the number of variables is larger than the number of observations. We refer to Zou et al. (2006) and Bauer (2022) for more details.
`max.iter`	maximum number of iterations.
`trace`	a logical value indicating if the progress is printed.
`eps.conv`	a numerical value as convergence criterion.
`orthogonal`	a logical value indicating if the sparse loadings are orthogonalized.
`check`	a character string indicating if only rows or rows as well as columns are used to detect the underlying block structure. `rows` checks if the rows fulfill the required structure. `rnc` checks if rows and columns fulfill the required structure.
`...`	further arguments passed to or from other methods.
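
As a rough illustration of what `threshold` is for, the sketch below zeroes out near-zero entries of a small loading matrix in base R. The matrix `A` and the helper `zero_small` are made up for this illustration and are not part of prinvars; how `spla` applies the threshold internally is an assumption here.

```r
# Hypothetical helper (not part of prinvars): treat entries whose absolute
# value is below `threshold` as exact zeros.
zero_small <- function(A, threshold = 1e-07) {
  A[abs(A) < threshold] <- 0
  A
}

# Made-up loading matrix with two entries that are zero up to numerical error.
A <- matrix(c(0.8, 3e-09, -2e-10, 0.6), nrow = 2)
A_clean <- zero_small(A)
print(A_clean)
```

With the default `threshold = 1e-07`, only entries that are numerically negligible are affected, which matches the stated purpose of correcting approximation errors.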

Value

A single object of class `pla`, or a list of such objects, containing the following attributes:

`x`	a numeric matrix or data frame which equals the input of `x`.
`EC`	a numeric vector that contains the weight-corrected evaluation criterion (CEC) if `criterion="corrected"` and the evaluation criterion (EC) if `criterion="normal"`.
`loadings`	a matrix of variable loadings (i.e. a matrix containing the sparse loadings).
`blocks`	a list of blocks which are identified by sparse principal loading analysis.
`W`	a matrix of variable loadings used to calculate the evaluation criterion. If `criterion="corrected"`, `W` contains an orthogonal matrix with equal weights in the first column of each loading-block. If `criterion="normal"`, `W` equals `loadings`.

References

Bauer JO (2022). “Variable selection and covariance structure identification using sparse principal loading analysis.” Working Paper.

Witten DM, Tibshirani R, Hastie TA (2009). “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.” Biostatistics, 10(3), 515–534. doi:10.1093/biostatistics/kxp008.

Zou H, Hastie T, Tibshirani R (2006). “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics, 15(2), 265–286. ISSN 1061-8600, doi:10.1198/106186006X113430.

Examples

#############
## First example: we apply SPLA to a classic example from PCA
#############

spla(USArrests, method = "spca", para=c(0.5, 0.5, 0.5, 0.5), cor=TRUE)

## we obtain two blocks:
## 1x1 (UrbanPop) and 3x3 (Murder, Assault, Rape).
## The large CEC of 0.922 indicates that the given structure is reasonable.

spla(USArrests, method = "spca", para=c(0.5, 0.5, 0.7, 0.5), cor=TRUE)

## we obtain three blocks:
## 1x1 (UrbanPop), 1x1 (Rape) and 2x2 (Murder, Assault).
## The moderate CEC of 0.571 for (Murder, Assault) indicates that the
## structure found might not be adequate.
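
Both calls above set `cor=TRUE`. A quick base R look at USArrests suggests why that is a sensible choice for this data set: the variables are on very different scales, so an analysis of the covariance matrix would be dominated by the variable with the largest variance. The sketch only inspects the data and does not call prinvars.

```r
# Variances of the four USArrests variables differ widely.
vars <- apply(USArrests, 2, var)
print(round(vars, 1))

# The correlation matrix is the covariance matrix rescaled to unit variances,
# so every variable enters on the same scale.
R <- cor(USArrests)
print(round(R, 2))
```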

#############
## Second example: we replicate a synthetic example similar to Bauer (2022)
#############

set.seed(1)
N = 500
V1 = rnorm(N,0,10)
V2 = rnorm(N,0,11)

## Create the blocks (X_1,...,X_4) and (X_5,...,X_8) synthetically

X1 = V1 + rnorm(N,0,1) #X_j = V_1 + N(0,1) for j =1,...,4
X2 = V1 + rnorm(N,0,1)
X3 = V1 + rnorm(N,0,1)
X4 = V1 + rnorm(N,0,1)

X5 = V2 + rnorm(N,0,1) #X_j = V_2 + N(0,1) for j =5,...,8
X6 = V2 + rnorm(N,0,1)
X7 = V2 + rnorm(N,0,1)
X8 = V2 + rnorm(N,0,1)

X = cbind(X1, X2, X3, X4, X5, X6, X7, X8)

## Conduct SPLA to obtain the blocks (X_1,...,X_4) and (X_5,...,X_8)

## use method = "pmd" (default)
spla(X, para = 1.4)

## use method = "spca"
spla(X, method = "spca", para = c(500,60,3,8,5,7,13,4))
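
For the synthetic data, a plain correlation matrix already hints at the two blocks that SPLA is expected to recover. The sketch below uses base R only and rebuilds the data in the same way so that it runs on its own; the names `R`, `within` and `between` are introduced here for illustration. It is a sanity check on the data, not a substitute for `spla`.

```r
set.seed(1)
N <- 500
V1 <- rnorm(N, 0, 10)
V2 <- rnorm(N, 0, 11)

# Columns 1-4 share the factor V1, columns 5-8 share the factor V2.
X <- cbind(sapply(1:4, function(j) V1 + rnorm(N, 0, 1)),
           sapply(5:8, function(j) V2 + rnorm(N, 0, 1)))
colnames(X) <- paste0("X", 1:8)

R <- cor(X)
within  <- abs(R[1:4, 1:4])   # correlations inside the first block
between <- abs(R[1:4, 5:8])   # correlations across the two blocks

print(round(R, 2))
```

Correlations within a block should be close to one, while correlations between the blocks should be close to zero, since `V1` and `V2` are drawn independently.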

[Package prinvars version 1.0.0 Index]