reset {RESET}    R Documentation

Reconstruction Set Test (RESET)

Description

Implementation of the Reconstruction Set Test (RESET) method, which transforms an n-by-p input matrix X into an n-by-m matrix of sample-level variable set scores and a length m vector of overall variable set scores. Execution of RESET involves the following sequence of steps:

Usage

reset(X, X.test, center.X=TRUE, scale.X=FALSE, center.X.test=TRUE, scale.X.test=FALSE, 
      var.sets, k=2, random.threshold, k.buff=0, q=0, test.dist="normal", norm.type="2",
      per.var=FALSE)

Arguments

X

The n-by-p target matrix; columns represent variables and rows represent samples.

X.test

Matrix that will be combined with the var.set variables to compute the reduced rank reconstruction. This is typically a subset or transformation of X, e.g., projection on top PCs. Reconstruction error will be measured on the variables in X.test. If not specified, the entire X matrix will be used for calculating reconstruction error.

center.X

Flag that controls whether the values in X are mean centered during execution of the algorithm. If only X is specified and center.X=TRUE, then all columns in X will be centered. If both X and X.test are specified, then centering is performed on just the columns of X contained in the specified variable sets. Mean centering is especially important for accurate performance when X.test is specified as a reduced rank representation of X, e.g., as the projection of X onto the top principal components. However, mean centering the entire matrix X can dramatically increase memory requirements if X is a large sparse matrix, because the centered matrix is no longer sparse. In this case, a non-centered X and an appropriate X.test (e.g., the projection onto the top PCs of X) can be provided, and mean centering is then performed on just the needed variables during execution of RESET. This "just-in-time" centering is enabled by setting center.X=TRUE and providing both X and X.test. If X has already been mean centered (and X.test is a subset of this mean-centered matrix or was computed from it), then center.X should be specified as FALSE.
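
The "just-in-time" centering idea can be illustrated with a minimal base R sketch. This is not the package's internal code; the matrix, the variable set indices in set.idx, and all object names are hypothetical and chosen only for illustration.

```r
# Illustrative only: center just the columns of one variable set,
# leaving the full matrix X untouched (not the RESET internals).
set.seed(1)
X <- matrix(rpois(2000, lambda = 1), nrow = 100)  # 100 samples, 20 variables
set.idx <- 1:5                                    # hypothetical variable set

# Column means are computed without modifying X
col.means <- colMeans(X)

# Subtract the means only from the columns that are actually needed
X.set.centered <- sweep(X[, set.idx, drop = FALSE], 2, col.means[set.idx])

# The centered subset has (numerically) zero column means
max(abs(colMeans(X.set.centered)))
```

Only a 100-by-5 dense block is created here, while X itself keeps its original (possibly sparse) values.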

scale.X

Flag that controls whether the values in X are scaled to have variance 1 during execution of the algorithm. Defaults to FALSE. If only X is specified and scale.X=TRUE, then all columns in X will be scaled. If both X and X.test are specified, then scaling is performed on just the columns of X contained in the specified variable sets.

center.X.test

Flag that controls whether the values in X.test, if specified, are mean centered during execution of the algorithm. Centering should be performed consistently for X and X.test, i.e., if center.X is TRUE or X was previously centered, then center.X.test should be TRUE unless X.test was previously centered or was generated from a centered X.

scale.X.test

Flag that controls whether the values in X.test, if specified, are scaled to have variance 1 during execution of the algorithm. As with centering, scaling should be performed consistently for X and X.test, i.e., if scale.X is TRUE or X was previously scaled, then scale.X.test should be TRUE unless X.test was previously scaled or was generated from a scaled X.

var.sets

List of m variable sets. Each element is a vector of indices of the variables in the set, where the indices correspond to columns in X. If variable set information is instead available in terms of variable names, the appropriate format can be generated using createVarSetCollection.
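
To make the expected index format concrete, the following base R sketch maps name-based sets to column indices with match(). It is only an illustration of the format and does not reproduce createVarSetCollection; the names var.names and named.sets are hypothetical.

```r
# Illustrative only: turn name-based sets into column-index sets.
var.names <- c("g1", "g2", "g3", "g4", "g5", "g6")  # stand-in for colnames(X)
named.sets <- list(setA = c("g1", "g3", "g9"),      # "g9" is not a column of X
                   setB = c("g2", "g6"))

var.sets <- lapply(named.sets, function(members) {
  idx <- match(members, var.names)  # NA for names that are not in X
  idx[!is.na(idx)]                  # keep only variables present in X
})

var.sets
```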

k

Rank of reconstruction. Defaults to 2. Cannot be larger than the minimum variable set size.

random.threshold

If specified, indicates the variable set size above which a randomized reduced-rank reconstruction is used. If the variable set size is less than or equal to random.threshold, then a non-random reconstruction is computed. Defaults to k and cannot be less than k.
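
The rule described above can be written as a small helper. The function uses_randomized is hypothetical and is not part of the package; it only restates the documented threshold logic.

```r
# Illustrative only: restates the documented rule for choosing between
# randomized and non-random reconstruction.
uses_randomized <- function(set.size, k, random.threshold = k) {
  stopifnot(random.threshold >= k)  # documented constraint
  set.size > random.threshold       # randomized only above the threshold
}

uses_randomized(10, k = 2)                         # default threshold is k
uses_randomized(10, k = 2, random.threshold = 10)  # size equals threshold
```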

k.buff

Additional dimensions used in the randomized reduced-rank reconstruction algorithm. Defaults to 0. Values above 0 can improve the accuracy of the randomized reconstruction at the expense of additional computational complexity. If k.buff=0, then the reduced rank reconstruction can be generated directly from the output of randomColumnSpace; otherwise, a reduced rank SVD must also be computed, with the reconstruction based on the top k components.

q

Number of power iterations for randomized SVD (see randomSVD). Defaults to 0. Although power iterations can improve randomized SVD performance in general, they can decrease the sensitivity of the RESET method to mean or covariance differences.
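
The roles of k.buff and q can be sketched with a generic randomized range finder in base R. This is a textbook-style illustration under the assumption of a Gaussian test matrix; it is not the implementation of randomColumnSpace or randomSVD, and the object names are hypothetical.

```r
# Illustrative only: generic randomized range finder with oversampling
# (k.buff) and power iterations (q).
set.seed(42)
A <- matrix(rnorm(100 * 10), nrow = 100)  # e.g., columns of one variable set
k <- 2; k.buff <- 2; q <- 1

# Random test matrix with k + k.buff columns
Omega <- matrix(rnorm(ncol(A) * (k + k.buff)), nrow = ncol(A))
Y <- A %*% Omega

# q power iterations: Y becomes (A A^T)^q A Omega
for (i in seq_len(q)) {
  Y <- A %*% crossprod(A, Y)
}

# Orthonormal basis with k + k.buff columns
Q <- qr.Q(qr(Y))

# With k.buff > 0 the basis has extra columns, so a small SVD is
# needed to extract the top k directions
B <- crossprod(Q, A)
U.k <- Q %*% svd(B)$u[, 1:k, drop = FALSE]

dim(U.k)
```

With k.buff=0 the basis Q already has exactly k columns, which is why no additional SVD step would be needed in that case.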

test.dist

Distribution for non-zero elements of random test matrix used in randomized SVD algorithm. See description for test.dist parameter of randomSVD method.

norm.type

The type of norm to use for computing reconstruction error. Defaults to "2" for the Euclidean/Frobenius norm. The other supported option is "1" for the L1 norm.
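
The difference between the two norms can be seen on a toy reconstruction. The sketch below assumes that sample-level error is measured row by row on the residual of a rank-k projection; that assumption, the exact SVD basis, and all object names are illustrative and may differ from what reset() does internally.

```r
# Illustrative only: row-wise L2 and L1 reconstruction error for a
# rank-k projection onto the column space of one variable set.
set.seed(7)
X <- scale(matrix(rnorm(50 * 8), nrow = 50), center = TRUE, scale = FALSE)
set.idx <- 1:4   # hypothetical variable set
k <- 2

# Rank-k orthonormal basis from the variable set columns
U.k <- svd(X[, set.idx])$u[, 1:k, drop = FALSE]

# Residual after projecting X onto the basis
E <- X - U.k %*% crossprod(U.k, X)

err.l2 <- sqrt(rowSums(E^2))  # Euclidean norm of each row
err.l1 <- rowSums(abs(E))     # L1 norm of each row

head(cbind(err.l2, err.l1))
```

The row-wise Euclidean errors combine into the Frobenius norm of the residual matrix, i.e., sqrt(sum(err.l2^2)) equals norm(E, type="F").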

per.var

If true, the computed scores for each variable set are divided by the scaled variable set size to generate per-variable scores. Variable set size scaling is performed by dividing all sizes by the mean size (this will generate per-variable scores of approximately the same magnitude as the non-per-variable scores).
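
The size scaling described above amounts to the following arithmetic. The set sizes and scores are made-up numbers used only to show the calculation.

```r
# Illustrative only: per-variable scaling with made-up sizes and scores.
set.sizes <- c(set1 = 10, set2 = 20, set3 = 30)
scores    <- c(set1 = 4,  set2 = 6,  set3 = 12)

# Divide all sizes by the mean size (mean of scaled sizes is 1)
scaled.sizes <- set.sizes / mean(set.sizes)   # 0.5, 1.0, 1.5

# Per-variable scores: overall score divided by the scaled size
per.var.scores <- scores / scaled.sizes       # 8, 6, 8

per.var.scores
```

Because the scaled sizes average to 1, the per-variable scores stay on roughly the same scale as the original scores.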

Value

A list with the following elements:

See Also

createVarSetCollection, randomColumnSpace

Examples

  # Create a collection of 5 variable sets each of size 10
  var.sets = list(set1=1:10, 
                  set2=11:20,
                  set3=21:30,
                  set4=31:40,
                  set5=41:50)                  

  # Simulate a 100-by-100 matrix of random Poisson data
  X = matrix(rpois(10000, lambda=1), nrow=100)

  # Inflate first 10 rows for first 10 variables, i.e., the first
  # 10 samples should have elevated scores for the first variable set
  X[1:10,1:10] = rpois(100, lambda=5)

  # Execute RESET using non-randomized basis computation
  reset(X, var.sets=var.sets, k=2, random.threshold=10)

  # Execute RESET with randomized basis computation
  # (random.threshold will default to k value which is less
  # than the size of all variable sets)
  reset(X, var.sets=var.sets, k=2)

  # Execute RESET with non-zero k.buff
  reset(X, var.sets=var.sets, k=2, k.buff=2)
  
  # Execute RESET with non-zero q
  reset(X, var.sets=var.sets, k=2, q=1)

  # Execute RESET with the L1 norm instead of the default L2 norm
  reset(X, var.sets=var.sets, k=2, norm.type="1")

  # Project the X matrix onto the first 5 PCs and use that as X.test
  # Scale X before calling prcomp() so that no centering or scaling
  # is needed within reset()
  X = scale(X)
  X.test = prcomp(X,center=FALSE,scale=FALSE,retx=TRUE)$x[,1:5]
  reset(X, X.test=X.test, center.X=FALSE, scale.X=FALSE, 
    center.X.test=FALSE, scale.X.test=FALSE, var.sets=var.sets, k=2)

[Package RESET version 1.0.0 Index]