R: Convex Logistic Principal Component Analysis

convexLogisticPCA {logisticPCA}

R Documentation

Convex Logistic Principal Component Analysis

Description

Dimensionality reduction for binary data by extending Pearson's PCA formulation to minimize binomial deviance. The set of projection matrices is replaced by its convex relaxation, the Fantope.
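To make the Fantope concrete: the rank-`k` Fantope is commonly defined as the set of symmetric matrices with all eigenvalues in [0, 1] and trace equal to `k`, which is the convex hull of the rank-`k` projection matrices. The base R sketch below checks membership under that definition. `in_fantope` is a hypothetical helper written for this illustration and is not part of logisticPCA.

```r
# Hypothetical helper (not part of logisticPCA): test whether H lies in the
# rank-k Fantope {H : H symmetric, 0 <= eigenvalues <= 1, trace(H) = k}.
in_fantope <- function(H, k, tol = 1e-8) {
  if (!isSymmetric(H, tol = tol)) return(FALSE)
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol) && all(ev <= 1 + tol) && abs(sum(ev) - k) <= tol
}

set.seed(1)
d <- 5
Q <- qr.Q(qr(matrix(rnorm(d * d), d, d)))  # random orthonormal basis

# Rank-2 projection matrices U U^T are extreme points of the Fantope.
P1 <- tcrossprod(Q[, 1:2])
P2 <- tcrossprod(Q[, 3:4])

# A convex combination of projections stays in the Fantope but is
# generally no longer a projection (it has fractional eigenvalues).
H <- 0.5 * P1 + 0.5 * P2

in_fantope(P1, k = 2)
in_fantope(H, k = 2)
isTRUE(all.equal(H %*% H, H))  # checks idempotence; H is not a projection
```

Optimizing over this convex set rather than over projection matrices is what makes the problem convex; the returned `H` may therefore have fractional eigenvalues.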

Usage

convexLogisticPCA(x, k = 2, m = 4, quiet = TRUE, partial_decomp = FALSE,
  max_iters = 1000, conv_criteria = 1e-06, random_start = FALSE, start_H,
  mu, main_effects = TRUE, ss_factor = 4, weights, M)

Arguments

`x`	matrix with all binary entries
`k`	number of principal components to return
`m`	value to approximate the saturated model
`quiet`	logical; if `FALSE`, the function prints feedback on the progress of the calculation
`partial_decomp`	logical; if `TRUE`, the function uses the rARPACK package to quickly initialize `H` when `ncol(x)` is large and `k` is small
`max_iters`	maximum number of iterations
`conv_criteria`	convergence criterion. The algorithm stops when the difference in average deviance between successive iterations falls below this value
`random_start`	logical; whether to randomly initialize the parameters. If `FALSE`, the function uses an eigen-decomposition as the starting value
`start_H`	starting value for the Fantope matrix
`mu`	main effects vector. Only used if `main_effects = TRUE`
`main_effects`	logical; whether to include main effects in the model
`ss_factor`	step size multiplier: the amount by which the step size is multiplied. A quadratic convergence rate can be proven for `ss_factor = 1`, but higher values sometimes work better in practice. The default is `ss_factor = 4`. If the algorithm does not converge, try `ss_factor = 1`.
`weights`	an optional matrix of the same size as `x` with non-negative weights
`M`	deprecated. Use `m` instead
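As a rough illustration of how `m` enters: in the formulation of Landgraf & Lee (2015), the natural parameters of the saturated model, which would be infinite for binary data, are approximated by `m * (2 * x - 1)`, that is, `+m` where `x` is 1 and `-m` where `x` is 0. The sketch below uses base R only. The names `theta_sat` and `theta_hat`, and the choices of `mu` and `H`, are illustrative assumptions and do not reproduce the package internals.

```r
set.seed(1)
x <- matrix(rbinom(20 * 4, 1, 0.4), 20, 4)  # toy binary data
m <- 4

# Saturated natural parameters would be +/- Inf; approximate them by +/- m.
theta_sat <- m * (2 * x - 1)

# The implied probabilities are close to, but not exactly, 0 and 1.
range(plogis(theta_sat))

# Assumed form of the estimate: center by mu, multiply by H, add mu back.
mu <- colMeans(theta_sat)                                # illustrative main effects
U  <- eigen(cov(theta_sat))$vectors[, 1, drop = FALSE]   # illustrative loading
H  <- tcrossprod(U)                                      # rank-1 projection as a stand-in for H
theta_hat <- sweep(sweep(theta_sat, 2, mu) %*% H, 2, mu, "+")
dim(theta_hat)
```

Larger values of `m` bring the approximation closer to the saturated model, at the cost of more extreme natural parameters.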

Value

An S3 object of class `clpca`, which is a list with the following components:

`mu`	the main effects
`H`	a rank `k` Fantope matrix
`U`	a `ceiling(k)`-dimensional orthonormal matrix with the loadings
`PCs`	the principal component scores
`m`	the value of `m` that was supplied
`iters`	number of iterations required for convergence
`loss_trace`	the average negative log-likelihood at each iteration, computed using the Fantope matrix
`proj_loss_trace`	the average negative log-likelihood at each iteration, computed using the projection matrix
`prop_deviance_expl`	the proportion of deviance explained by this model. If `main_effects = TRUE`, the null model is just the main effects, otherwise the null model estimates 0 for all natural parameters.
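The two null models mentioned for `prop_deviance_expl` can be compared with a small base R sketch. It assumes the usual definition, one minus the ratio of model deviance to null deviance, with the Bernoulli deviance taken as minus twice the log-likelihood. `bernoulli_deviance` is a hypothetical helper, and the main effects used here (the logit of each column mean) are an illustrative choice, not necessarily what the package computes.

```r
# Hypothetical helper: Bernoulli deviance, -2 * log-likelihood, for binary x
# and a matrix of natural parameters (logits) theta.
bernoulli_deviance <- function(x, theta) {
  -2 * sum(x * theta - log1p(exp(theta)))
}

set.seed(1)
n <- 200; d <- 6
theta_true <- outer(rnorm(n), rnorm(d))            # rank-1 logits
x <- (matrix(runif(n * d), n, d) <= plogis(theta_true)) * 1

# Null model without main effects: all natural parameters are 0.
dev_null0 <- bernoulli_deviance(x, matrix(0, n, d))

# Null model with main effects only: one logit per column (illustrative).
mu <- qlogis(colMeans(x))
dev_null_mu <- bernoulli_deviance(x, matrix(mu, n, d, byrow = TRUE))

# Stand-in for a fitted model: here simply the true logits.
dev_model <- bernoulli_deviance(x, theta_true)

1 - dev_model / dev_null0     # relative to the all-zero null model
1 - dev_model / dev_null_mu   # relative to the main-effects null model
```

Because the main-effects null model already explains part of the deviance, the proportion explained relative to it is generally the smaller of the two.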

References

Landgraf, A.J. & Lee, Y., 2015. Dimensionality reduction for binary data through the projection of natural parameters. arXiv preprint arXiv:1510.06112.

Examples

library(logisticPCA)

# construct a low-rank matrix on the logit scale
rows = 100
cols = 10
set.seed(1)
mat_logit = outer(rnorm(rows), rnorm(cols))

# generate a binary matrix
mat = (matrix(runif(rows * cols), rows, cols) <= inv.logit.mat(mat_logit)) * 1.0

# run convex logistic PCA on it
clpca = convexLogisticPCA(mat, k = 1, m = 4)
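
The components listed under Value can then be inspected directly. The following is a self-contained sketch that repeats the simulation above and is guarded so that it is skipped when logisticPCA is not installed. It uses base R's `plogis` in place of `inv.logit.mat`, and the object name `fit` is arbitrary.

```r
# Sketch: fit the model and inspect the documented components.
fit <- NULL
if (requireNamespace("logisticPCA", quietly = TRUE)) {
  set.seed(1)
  rows <- 100
  cols <- 10
  mat_logit <- outer(rnorm(rows), rnorm(cols))
  mat <- (matrix(runif(rows * cols), rows, cols) <= plogis(mat_logit)) * 1.0

  fit <- logisticPCA::convexLogisticPCA(mat, k = 1, m = 4)

  print(fit$iters)               # number of iterations used
  print(fit$prop_deviance_expl)  # proportion of deviance explained
  print(dim(fit$H))              # dimensions of the Fantope matrix
  print(dim(fit$PCs))            # dimensions of the score matrix
  print(tail(fit$loss_trace))    # last few values of the loss trace
}
```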