PlotLossvsLatentFactors {jrSiCKLSNMF}    R Documentation

Create plots to help determine the number of latent factors

Description

Generate plots of the lowest loss achieved after a pre-specified number of iterations (default 100) for each candidate number of latent factors (default 2:20). The plot works like a scree plot: select the number of latent factors that corresponds to the elbow of the curve. This method is not appropriate for larger datasets (more than 1000 cells).
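The documentation asks you to pick the elbow by eye. If a numeric aid is helpful, one common heuristic is to choose the d whose point lies farthest from the straight line joining the first and last points of the loss curve. The sketch below assumes you already have one loss value per d; the helper find_elbow and the loss values are hypothetical and are not part of jrSiCKLSNMF.

```r
# Hypothetical helper (not part of jrSiCKLSNMF): return the d whose point is
# farthest from the straight line joining the first and last points of the curve.
find_elbow <- function(d, loss) {
  stopifnot(length(d) == length(loss), length(d) >= 3, max(loss) > min(loss))
  n <- length(d)
  # Rescale both axes to [0, 1] so that their units do not dominate the distance
  x <- (d - min(d)) / (max(d) - min(d))
  y <- (loss - min(loss)) / (max(loss) - min(loss))
  # Perpendicular distance from each point to the chord (x[1], y[1])-(x[n], y[n])
  dist <- abs((y[n] - y[1]) * x - (x[n] - x[1]) * y + x[n] * y[1] - y[n] * x[1]) /
    sqrt((y[n] - y[1])^2 + (x[n] - x[1])^2)
  d[which.max(dist)]
}

# Made-up losses for illustration only
d <- 2:10
loss <- c(100, 60, 40, 35, 32, 30, 29, 28.5, 28)
find_elbow(d, loss)  # 4
```

This heuristic assumes a decreasing, roughly convex curve; it is a starting point for the visual check, not a replacement for it.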

Usage

PlotLossvsLatentFactors(
  SickleJr,
  rounds = 100,
  differr = 1e-04,
  d_vector = c(2:20),
  parallel = FALSE,
  nCores = detectCores() - 1,
  subsampsize = NULL,
  minibatch = FALSE,
  random = FALSE,
  random_W_updates = FALSE,
  seed = NULL,
  batchsize = -1,
  lossonsubset = FALSE,
  losssubsetsize = dim(SickleJr@count.matrices[[1]])[2]
)

Arguments

SickleJr

An object of class SickleJr

rounds

Number of rounds to use; defaults to 100. This process is time-consuming, so a high number of rounds is not recommended

differr

Tolerance for the percentage update in the likelihood: for these plots, this defaults to 1e-4

d_vector

Vector of candidate numbers of latent factors d to test: default is 2 to 20

parallel

Boolean indicating whether to use parallel computation

nCores

Number of cores to use; defaults to the number of cores on the current machine minus 1

subsampsize

Size of the random subsample (defaults to NULL, which means all cells will be used); using a random subsample decreases computation time but sacrifices accuracy

minibatch

Boolean indicating whether to use the mini-batch algorithm: default is FALSE

random

Boolean indicating whether to use random initialization to generate the \mathbf{W}^v matrices and the \mathbf{H} matrix: defaults to FALSE

random_W_updates

Boolean parameter for mini-batch algorithm; if TRUE, only updates \mathbf{W}^v once per epoch on the penultimate subset of \mathbf{H}; otherwise updates \mathbf{W}^v after every update of the subset of \mathbf{H}

seed

Number representing the random seed

batchsize

Desired batch size; do not use if using a subsample

lossonsubset

Boolean indicating whether to calculate the loss on a subset rather than the full dataset; speeds up computation for larger datasets

losssubsetsize

Number of cells to use for the loss subset; default is the total number of cells
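The rounds and differr arguments together decide when each fit stops: either the round limit is reached or the change in the objective becomes small. A minimal sketch of that logic, assuming differr is compared against the relative change in the loss between successive rounds (whether the package scales this by 100 is not stated here). The function run_until_converged and the toy update rule are hypothetical.

```r
# Hypothetical sketch: iterate an update until either `rounds` iterations
# have run or the relative change in the loss falls below `differr`.
run_until_converged <- function(update_loss, initial_loss,
                                rounds = 100, differr = 1e-4) {
  stopifnot(rounds >= 1)
  loss <- initial_loss
  for (i in seq_len(rounds)) {
    new_loss <- update_loss(loss)
    rel_change <- abs(loss - new_loss) / abs(loss)  # relative change this round
    loss <- new_loss
    if (rel_change < differr) break  # converged before reaching `rounds`
  }
  list(loss = loss, iterations = i)
}

# Toy update that moves the loss halfway towards 10 each round
res <- run_until_converged(function(l) 10 + (l - 10) / 2,
                           initial_loss = 100, rounds = 100, differr = 1e-4)
res$iterations  # 17
```

With a small rounds value (as in the examples, rounds = 5) the loop usually stops on the round limit rather than on differr, which is why the loss values in the elbow plot are best read as relative comparisons across d.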

Value

An object of class SickleJr with the following additions: a list of initialized \mathbf{W}^v matrices and an \mathbf{H} matrix for each number of latent factors d in d_vector, added to the WHinitials slot; a data frame holding the values used to draw the elbow plot, added to the latent.factor.elbow.values slot; diagnostic plots of the loss vs. the number of latent factors, added to the plots slot; and the cell indices used to calculate the loss on the subsample, added to the lossCalcSubSample slot
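The lossCalcSubSample slot holds the cell indices used to compute the loss when lossonsubset = TRUE. The sketch below shows one way to draw such a reproducible subsample in base R, assuming cells are the columns of each count matrix (as the losssubsetsize default, dim(SickleJr@count.matrices[[1]])[2], suggests) and that cells are drawn uniformly without replacement. The helper draw_loss_subset is hypothetical; the package's actual sampling scheme may differ.

```r
# Hypothetical helper: draw a reproducible subsample of cell (column) indices
# for computing the loss, assuming uniform sampling without replacement.
draw_loss_subset <- function(count_matrix, losssubsetsize = ncol(count_matrix),
                             seed = NULL) {
  n_cells <- ncol(count_matrix)
  stopifnot(losssubsetsize >= 1, losssubsetsize <= n_cells)
  # Note: set.seed() changes the global RNG state, which keeps the draw reproducible
  if (!is.null(seed)) set.seed(seed)
  sort(sample.int(n_cells, size = losssubsetsize, replace = FALSE))
}

# Toy 50-gene x 200-cell count matrix with random Poisson counts
set.seed(1)
counts <- matrix(rpois(50 * 200, lambda = 2), nrow = 50, ncol = 200)
idx <- draw_loss_subset(counts, losssubsetsize = 40, seed = 123)
length(idx)  # 40
```

Using the default losssubsetsize returns every cell index, which corresponds to computing the loss on the full dataset.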

References

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, 2 edition. Springer International Publishing, Cham, Switzerland. ISBN 978-3-319-24277-4, doi:10.1007/978-3-319-24277-4, https://ggplot2.tidyverse.org/.

Examples

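# Clear any elbow-plot values already stored on the example object
SimSickleJrSmall@latent.factor.elbow.values <- data.frame(NULL, NULL)
# Compute the loss for d = 2 to 5 sequentially; 5 rounds keeps the example fast
SimSickleJrSmall <- PlotLossvsLatentFactors(SimSickleJrSmall, d_vector = c(2:5),
  rounds = 5, parallel = FALSE)
# Next, we compute 2 more values of d (6 and 7) in parallel on 2 cores.
## Not run: 
SimSickleJrSmall <- PlotLossvsLatentFactors(SimSickleJrSmall,
  d_vector = c(6:7), rounds = 5, parallel = TRUE, nCores = 2)
## End(Not run)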
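Each value of d is fit independently, which is what makes the parallel option possible. The general pattern can be sketched with base R's parallel package: one job per value of d, dispatched with parLapply when parallel = TRUE and with lapply otherwise, and a core count that falls back to 1 if detectCores() returns NA. The function toy_loss_for_d is a hypothetical stand-in for fitting the model; this is not jrSiCKLSNMF's internal code.

```r
library(parallel)

# Hypothetical stand-in for fitting a model with d latent factors and
# returning its final loss; the real fitting routine is not shown here.
toy_loss_for_d <- function(d) 100 / d + 0.5 * d

run_over_d <- function(d_vector, parallel = FALSE,
                       nCores = max(1L, detectCores() - 1L, na.rm = TRUE)) {
  if (parallel && nCores > 1L) {
    cl <- makeCluster(nCores)            # PSOCK cluster; works on all platforms
    on.exit(stopCluster(cl), add = TRUE)  # always release the workers
    losses <- parLapply(cl, d_vector, toy_loss_for_d)
  } else {
    losses <- lapply(d_vector, toy_loss_for_d)  # sequential fallback
  }
  data.frame(d = d_vector, loss = unlist(losses))
}

run_over_d(2:5)
```

Because each job is independent, the sequential and parallel paths return the same data frame; parallelism only pays off when each fit takes long enough to outweigh the cost of starting the workers.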

[Package jrSiCKLSNMF version 1.2.1 Index]