moss {MOSS}R Documentation

Multi-Omic integration via Sparse Singular value decomposition.

Description

This function integrates omic blocks to perform sparse singular value decomposition (SVD), non-linear embedding, and/or cluster analysis. Both supervised and unsupervised methods are supported. In both cases, if multiple omic blocks are used as predictors, they are concatenated and normalized to form an 'extended' omic matrix 'X' (Gonzalez-Reymundez and Vazquez, 2020). Supervised analysis can be obtained by indicating which omic block defines a multivariate response 'Y'. Each method within MOSS returns a matrix 'B', which form depends on the technique used (e.g. B = X in pca; B = X'Y, for pls; B = (X'X)^-X'Y, for lrr). A sparse SVD of matrix B is then obtained to summarize the variability among samples and features in terms of latent factors.

Usage

moss(
  data.blocks,
  scale.arg = TRUE,
  norm.arg = TRUE,
  method = "pca",
  resp.block = NULL,
  covs = NULL,
  K.X = 5,
  K.Y = K.X,
  verbose = TRUE,
  nu.parallel = FALSE,
  nu.u = NULL,
  nu.v = NULL,
  alpha.u = 1,
  alpha.v = 1,
  plot = FALSE,
  cluster = FALSE,
  clus.lab = NULL,
  tSNE = FALSE,
  axes.pos = seq_len(K.Y),
  approx.arg = FALSE,
  exact.dg = FALSE,
  use.fbm = FALSE,
  lib.thresh = TRUE
)

Arguments

data.blocks

List containing omic blocks of class 'matrix' or 'FBM'. In each block, rows represent subjects and columns features.

scale.arg

Should the omic blocks be centered and scaled? Logical. Defaults to TRUE.

norm.arg

Should omic blocks be normalized? Logical. Defaults to TRUE.

method

Multivariate method. Character. Defaults to 'pca'. Possible options are pca, mbpca, pca-lda, mbpca-lda, pls, mbpls, pls-lda, mbpls-lda, lrr, mblrr, lrr-lda, mblrr-lda.

resp.block

What block should be used as response? Integer. Only used when the specified method is supervised.

covs

Covariates which effect we wish to adjust for. Accepts matrix, data.frame, numeric, or character vectors.

K.X

Number of principal components for predictors. Integer. Defaults to 5.

K.Y

Number of responses PC index when method is supervised. Defaults to K.X.

verbose

Should we print messages? Logical. Defaults to TRUE.

nu.parallel

Tuning degrees of sparsity in parallel. Defaults to FALSE.

nu.u

A grid with increasing integers representing degrees of sparsity for left Eigenvectors. Defaults to NULL.

nu.v

Same but for right Eigenvectors. Defaults to NULL.

alpha.u

Elastic Net parameter for left Eigenvectors. Numeric between 0 and 1. Defaults to 1.

alpha.v

Elastic Net parameter for right Eigenvectors. Numeric between 0 and 1. Defaults to 1.

plot

Should results be plotted? Logical. Defaults to FALSE.

cluster

Arguments passed to the function tsne2clus as a list. Defaults to FALSE. If cluster=TRUE, default parameters are used (eps_range=c(0,4), eps_res=100).

clus.lab

A vector of same length than number of subjects with labels used to visualize clusters. Factor. Defaults to NULL. When sparsity is imposed on the left Eigenvectors, the association between non-zero loadings and labels' groups is shown by a Chi-2 statistics for each pc. When sparsity is not imposed, the association between labels and PC is addressed by a Kruskal-Wallis statistics.

tSNE

Arguments passed to the function pca2tsne as a list. Defaults to FALSE. If tSNE=T, default parameters are used (perp=50,n.samples=1,n.iter=1e3).

axes.pos

PC index used for tSNE. Defaults to 1 : K.Y. Used only when tSNE is different than NULL.

approx.arg

Should we use standard SVD or random approximations? Defaults to FALSE. If TRUE and at least one block is of class 'matrix', irlba is called. If TRUE & is(O,'FBM')==TRUE, big_randomSVD is called.

exact.dg

Should we compute exact degrees of sparsity? Logical. Defaults to FALSE. Only relevant When alpha.s or alpha.f are in the (0,1) interval and exact.dg = TRUE.

use.fbm

Should we treat omic blocks as Filed Backed Matrix (FBM)? Logical. Defaults to FALSE.

lib.thresh

Should we use a liberal or conservative threshold to tune degrees of sparsity? Logical. Defaults to TRUE.

Details

Once 'dense' solutions are found (the result of SVD on a matrix B), the function ssvdEN_sol_path is called to perform sparse SVD (sSVD) on a grid of possible degrees of sparsity (nu), for a possible value of the elastic net parameter (alpha). The sSVD is performed using the algorithm of Shen and Huang (2008), extended to include Elastic Net type of regularization. For one latent factor (rank 1 case), the algorithm finds vectors u and v' and scalar d that minimize:

||B-d* uv'||^2 + lambda(nu_v)(alpha_v||v'||_1 + (1-alpha_v)||v'||^2) + lambda(nu_u)(alpha_u||u||_1 + (1-alpha_u)||u||^2)

such that ||u|| = 1. The right Eigenvector is obtained from v / ||v|| and the corresponding d from u'Bv. The element lambda(nu_.) is a monotonically decreasing function of nu_. (the number of desired element different from zero) onto positive real numbers, and alpha_. is any number between zero and one balancing shrinking and variable selection. Selecting degree of sparsity: The function allows to tune the degree of sparsity using an ad-hoc method based on the one presented in Shen & Huang (2008, see reference) and generalized for tuning both nu_v and nu_u. This is done by exploring the proportion of explained variance (PEV) on a grid of possible values. Drastic and/or steep changes in the PEV trajectory across degrees of sparsity are used for automatic selection (see help for the function ssvdEN_sol_path). By imposing the additional assumption of omic blocks being conditionally independent, each multivariate technique can be extended using a 'multi-block' approach, where the contribution of each omic block to the total (co)variance is addressed. When response Y is a character column matrix, with classes or categories by subject, each multivariate technique can be extended to perform linear discriminant analysis.

Value

Returns a list with the results of the sparse SVD. If plot=TRUE, a series of plots is generated as well.

Note

  1. The function does not return PEV for EN parameter (alpha_v and/or alpha_u), the user needs to provide a single value for each.

  2. When number of PC index > 1, columns of T might not be orthogonal.

  3. Although the user is encouraged to perform data projection and cluster separately, MOSS allows to do this automatically. However, both tasks might require further tuning than the provided by default, and computations could become cumbersome.

  4. Tuning of degrees of sparsity is done heuristically on training set. In our experience, this results in high specificity, but rather low sensitivity. (i.e. too liberal cutoffs, as compared with extensive cross-validation on testing set).

  5. When 'method' is an unsupervised technique, 'K.X' is the number of latent factors returned and used in further analysis. When 'method' is a supervised technique, 'K.X' is used to perform a SVD to facilitate the product of large matrices and inverses.

  6. If 'K.X' (or 'K.Y') equal 1, no plots are returned.

  7. Although the degree of sparsity maps onto number of features/subjects for Lasso, the user needs to be aware that this conceptual correspondence is lost for full EN (alpha belonging to (0, 1); e.g. the number of features selected with alpha < 1 will be eventually larger than the optimal degree of sparsity). This allows to rapidly increase the number of non-zero elements when tuning the degrees of sparsity. In order to get exact values for the degrees of sparsity at subjects or features levels, the user needs to set the value of 'exact.dg' parameter from 'FALSE' (the default) to 'TRUE'.

References

Examples

# Example1: sparse PCA of a list of omic blocks.
library("MOSS")
sim_data <- simulate_data()
set.seed(43)

# Extracting simulated omic blocks.
sim_blocks <- sim_data$sim_blocks

# Extracting subjects and features labels.
lab.sub <- sim_data$labels$lab.sub
lab.feat <- sim_data$labels$lab.feat
out <- moss(sim_blocks[-4],
  method = "pca",
  nu.v = seq(1, 200, by = 100),
  nu.u = seq(1, 100, by = 50),
  alpha.v = 0.5,
  alpha.u = 1
)

library(ggplot2)
library(ggthemes)
library(viridis)
library(cluster)
library(fpc)

set.seed(43)

# Example2: sparse PCA with t-SNE, clustering, and association with
# predefined groups of subjects.
out <- moss(sim_blocks[-4],axes.pos=c(1:5),
  method = "pca",
  nu.v = seq(1, 200, by = 10),
  nu.u = seq(1, 100, by = 2),
  alpha.v = 0.5,
  alpha.u = 1,
  tSNE = TRUE,
  cluster = TRUE,
  clus.lab = lab.sub,
  plot = TRUE
)

# This shows clusters obtained with labels from pre-defined groups
# of subjects.
out$clus_plot

# This shows the statistical overlap between PCs and the pre-defined
# groups of subjects.
out$subLabels_vs_PCs

# This shows the contribution of each omic to the features 
# selected by PC index.
out$selected_items

# This shows features forming signatures across clusters.
out$feat_signatures

# Example3: Multi-block PCA with sparsity.
out <- moss(sim_blocks[-4],axes.pos=1:5,
  method = "mbpca",
  nu.v = seq(1, 200, by = 10),
  nu.u = seq(1, 100, by = 2),
  alpha.v = 0.5,
  alpha.u = 1,
  tSNE = TRUE,
  cluster = TRUE,
  clus.lab = lab.sub,
  plot = TRUE
)
out$clus_plot

# This shows the 'weight' each omic block has on the variability
# explained by each PC. Weights in each PC add up to one.
out$block_weights

# Example4: Partial least squares with sparsity (PLS).
out <- moss(sim_blocks[-4],axes.pos=1:5,
  K.X = 500,
  K.Y = 5,
  method = "pls",
  nu.v = seq(1, 100, by = 2),
  nu.u = seq(1, 100, by = 2),
  alpha.v = 1,
  alpha.u = 1,
  tSNE = TRUE,
  cluster = TRUE,
  clus.lab = lab.sub,
  resp.block = 3,
  plot = TRUE
)
out$clus_plot

# Get some measurement of accuracy at detecting features with signal
# versus background noise.
table(out$sparse$u[, 1] != 0, lab.feat[1:2000])
table(out$sparse$v[, 1] != 0, lab.feat[2001:3000])

# Example5: PCA-LDA
out <- moss(sim_blocks,
  method = "pca-lda",
  cluster = TRUE,
  resp.block = 4,
  clus.lab = lab.sub,
  plot = TRUE
)
out$clus_plot



[Package MOSS version 0.2.2 Index]