ssvdEN_sol_path {MOSS}    R Documentation
'Solution path' for sparse Singular Value Decomposition via Elastic Net.
Description
This function explores values along the solution path of the sparse singular value decomposition (SVD) problem, with the goal of tuning the degree of sparsity of subjects, features, or both. It performs a penalized SVD that imposes sparsity/smoothing on both left and right singular vectors. The penalties at both levels are Elastic Net-like, and the trade-off between ridge- and lasso-like penalties is controlled by two 'alpha' parameters. The proportion of explained variance is the criterion used to choose the optimal degrees of sparsity.
Usage
ssvdEN_sol_path(
O,
center = TRUE,
scale = TRUE,
dg.grid.right = seq_len(ncol(O)) - 1,
dg.grid.left = NULL,
n.PC = 1,
svd.0 = NULL,
alpha.f = 1,
alpha.s = 1,
maxit = 500,
tol = 0.001,
approx = FALSE,
plot = FALSE,
ncores = 1,
verbose = TRUE,
lib.thresh = TRUE,
left.lab = "Subjects",
right.lab = "Features",
exact.dg = FALSE
)
Arguments
O
Numeric matrix of n subjects (rows) and p features (columns). The only supported classes are 'matrix' and 'FBM'.
center
Should we center? Logical. Defaults to TRUE.
scale
Should we scale? Logical. Defaults to TRUE.
dg.grid.right
Grid with degrees of sparsity at the features level. Numeric. Defaults to the entire solution path for features (i.e. seq_len(ncol(O)) - 1).
dg.grid.left
Grid with degrees of sparsity at the subjects level. Numeric. Defaults to NULL, which corresponds to dg.grid.left = nrow(O).
n.PC
Number of desired principal axes. Numeric. Defaults to 1.
svd.0
Initial SVD (i.e. least-squares solution). Defaults to NULL.
alpha.f
Elastic net mixture parameter at the features level. Measures the compromise between lasso (alpha = 1) and ridge (alpha = 0) types of sparsity. Numeric. Defaults to 1.
alpha.s
Elastic net mixture parameter at the subjects level. Defaults to alpha.s = 1.
maxit
Maximum number of iterations. Defaults to 500.
tol
Tolerance for convergence. Convergence is declared when ||U_j - U_{j-1}||_F < tol, where U_j is the matrix of estimated left regularized singular vectors at iteration j. Defaults to 0.001.
approx
Should we use the standard SVD or random approximations? Defaults to FALSE. If TRUE and is(O, 'matrix') == TRUE, irlba is called. If TRUE and is(O, 'FBM') == TRUE, big_randomSVD is called.
plot
Should we plot the solution path? Logical. Defaults to FALSE.
ncores
Number of cores used by big_randomSVD. The default does not use parallelism. Ignored unless is(O, 'FBM') == TRUE (see the sketch after this list).
verbose
Should we print messages? Logical. Defaults to TRUE.
lib.thresh
Should we use a liberal or conservative threshold to tune the degrees of sparsity? Logical. Defaults to TRUE.
left.lab
Label for the subjects level. Character. Defaults to 'Subjects'.
right.lab
Label for the features level. Character. Defaults to 'Features'.
exact.dg
Should we compute exact degrees of sparsity? Logical. Defaults to FALSE. Only relevant when alpha.s or alpha.f is in the (0, 1) interval.
Details
The function returns the degree of sparsity for which the change in the proportion of explained variance (PEV) is steepest (the 'liberal' option), or for which the change in PEV stabilizes (the 'conservative' option). These heuristics relax the need to tune the degrees of sparsity on a test set.
For one PC (the rank-1 case), the algorithm finds vectors u and w that minimize ||x - u w'||_F^2 + lambda_w (alpha_w ||w||_1 + (1 - alpha_w) ||w||_F^2) + lambda_u (alpha_u ||u||_1 + (1 - alpha_u) ||u||_F^2), subject to ||u|| = 1. The right singular vector is obtained from v = w / ||w||, and the corresponding singular value is d = u' x v. The penalties lambda_u and lambda_w are mapped from the specified degrees of sparsity (dg.spar.features and dg.spar.subjects).
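To make the rank-1 update concrete, the following is a minimal base-R sketch of the alternating scheme for the pure-lasso case (alpha_w = 1, with no penalty on u); the helper soft(), the threshold lambda_w, and the simulated matrix are illustrative assumptions, and the fixed threshold stands in for the penalty mapped internally from the desired degree of sparsity:

# Illustrative sketch of the rank-1 penalized SVD; not the MOSS internals.
soft <- function(z, l) sign(z) * pmax(abs(z) - l, 0) # soft-thresholding
x <- matrix(rnorm(30 * 10), 30, 10)
u <- svd(x, nu = 1, nv = 0)$u[, 1] # least-squares (plain SVD) start
lambda_w <- 0.5
for (j in seq_len(50)) {
  u_old <- u
  w <- soft(crossprod(x, u), lambda_w) # sparse update of w given u
  u <- drop(x %*% w)
  u <- u / sqrt(sum(u^2)) # enforce ||u|| = 1
  if (sqrt(sum((u - u_old)^2)) < 1e-3) break # same criterion as 'tol'
}
v <- drop(w / sqrt(sum(w^2))) # right singular vector, v = w / ||w||
d <- drop(crossprod(u, x %*% v)) # singular value, d = u' x v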
Value
A list with the results of the (sparse) SVD and, if plot = TRUE, the corresponding graphical displays.
SVD: a list with the results of the (sparse) SVD, containing:
u: Matrix with left eigenvectors.
v: Matrix with right eigenvectors.
d: Matrix with singular values.
opt.dg.right: Selected degrees of sparsity for right eigenvectors.
opt.dg.left: Selected degrees of sparsity for left eigenvectors.
plot: A ggplot object.
Note
Although the degree of sparsity maps onto the number of features/subjects for the lasso, the user needs to be aware that this conceptual correspondence is lost for the full elastic net (alpha in (0, 1)); e.g. the number of features selected with alpha < 1 will eventually be larger than the specified degree of sparsity. This allows the number of non-zero elements to increase rapidly when tuning the degrees of sparsity. To obtain exact degrees of sparsity at the subjects or features level, set the 'exact.dg' argument from 'FALSE' (the default) to 'TRUE'.
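A hedged illustration of this behavior (the simulated matrix and object names are made up; exact counts will vary with the data):

# With a full elastic net (alpha.f in (0, 1)), the tuned degree of
# sparsity can differ from the non-zero count unless exact.dg = TRUE.
X <- matrix(rnorm(100 * 50), 100, 50)
out_en <- ssvdEN_sol_path(X, alpha.f = 0.5, verbose = FALSE)
out_ex <- ssvdEN_sol_path(X, alpha.f = 0.5, exact.dg = TRUE, verbose = FALSE)
# Compare tuned degree of sparsity against non-zero feature loadings:
c(out_en$SVD$opt.dg.right, sum(out_en$SVD$v != 0))
c(out_ex$SVD$opt.dg.right, sum(out_ex$SVD$v != 0))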
References
Shen, Haipeng, and Jianhua Z. Huang. 2008. Sparse Principal Component Analysis via Regularized Low Rank Matrix Approximation. Journal of Multivariate Analysis 99 (6).
Baglama, Jim, Lothar Reichel, and B. W. Lewis. 2018. irlba: Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices.
Examples
library("MOSS")
# Extracting simulated omic blocks.
sim_blocks <- simulate_data()$sim_blocks
X <- sim_blocks$`Block 3`
# Tuning sparsity degree for features (increments of 20 units).
out <- ssvdEN_sol_path(X, dg.grid.right = seq(1, 1000, by = 20))
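# The tuned degrees of sparsity, and the solution-path display when
# plot = TRUE, can be inspected from the returned list; the fields
# used below are those documented under Value.
out$SVD$opt.dg.right
# Re-run with two principal axes and the solution-path plot.
out2 <- ssvdEN_sol_path(X, dg.grid.right = seq(1, 1000, by = 20),
                        n.PC = 2, plot = TRUE)
out2$plot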