reset {RESET} | R Documentation |
Reconstruction Set Test (RESET)
Description
Implementation of the Reconstruction Set Test (RESET) method, which transforms an n-by-p input matrix X
into an n-by-m matrix of sample-level variable set scores and a length m vector of overall variable set scores. Execution of RESET involves the following sequence of steps:
1. If center.X=TRUE, mean center the columns of X. If X.test is specified, the centering is instead performed on just the columns of X corresponding to each variable set. See documentation for the X and center.X parameters for more details.
2. If scale.X=TRUE, scale the columns of X to have variance 1. If X.test is specified, the scaling is instead performed on just the columns of X corresponding to each variable set. See documentation for the X and scale.X parameters for more details.
3. If center.X.test=TRUE, mean center the columns of X.test. See documentation for the X.test and center.X.test parameters for more details.
4. If scale.X.test=TRUE, scale the columns of X.test. See documentation for the X.test and scale.X.test parameters for more details.
5. Set the reconstruction target matrix T to X or, if X.test is specified, to X.test.
6. Compute the norm of T and the norm of each row of T. By default, these are the Frobenius and Euclidean norms, respectively.
7. For each set in var.sets, sample-level and matrix-level scores are generated as follows:
   a. Create a subset of X called X.var.set that only includes the columns of X corresponding to the variables in the set.
   b. Compute a rank k orthonormal basis Q for the column space of X.var.set. If the size of the set is less than or equal to random.threshold, Q is computed as the top k columns of the Q matrix from a column-pivoted QR decomposition of X.var.set; otherwise, it is approximated using a randomized algorithm implemented by randomColumnSpace.
   c. The reduced rank reconstruction of T is then created as Q Q^T T.
   d. The original T is subtracted from the reconstruction to represent the reconstruction error, and the appropriate norm is computed on each row and on the entire error matrix.
   e. The overall score is the log2 ratio of the norm of the original T to the norm of the reconstruction error matrix.
   f. The score for each sample is the log2 ratio of the norm of the corresponding row of the original T to the norm of the same row of the reconstruction error matrix.
   g. If per.var=TRUE, then the overall and sample-level scores are divided by the scaled variable set size (see documentation for the per.var parameter).
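The per-set scoring steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: orthoBasis() and scoreOneSet() are hypothetical helper names, X.test is assumed to be unspecified (so T is the centered X), the default Euclidean/Frobenius norms are used, and the randomized branch is a generic Gaussian range finder standing in for randomColumnSpace.

```r
# Hypothetical helpers for illustration; not functions from the RESET package.

# Rank-k orthonormal basis for the column space of A.
orthoBasis <- function(A, k, randomized = FALSE, k.buff = 0, q = 0) {
  if (randomized) {
    # Gaussian test matrix with k + k.buff columns (assumed form of the
    # randomized algorithm; the package version may differ).
    Omega <- matrix(rnorm(ncol(A) * (k + k.buff)), nrow = ncol(A))
    Y <- A %*% Omega
    for (i in seq_len(q)) {
      # Power iteration, re-orthonormalizing to limit round-off error
      Y <- A %*% crossprod(A, qr.Q(qr(Y)))
    }
    A <- Y
  }
  # Top k columns of Q from a column-pivoted QR decomposition
  qr.Q(qr(A, LAPACK = TRUE))[, seq_len(k), drop = FALSE]
}

# Sample-level and overall scores for a single variable set.
scoreOneSet <- function(X, set, k = 2, ...) {
  X <- scale(X, center = TRUE, scale = FALSE)   # center.X=TRUE, scale.X=FALSE
  Tm <- X                                       # target matrix T (no X.test)
  Q <- orthoBasis(X[, set, drop = FALSE], k, ...)
  err <- Q %*% crossprod(Q, Tm) - Tm            # reconstruction minus T
  rowNorm <- function(M) sqrt(rowSums(M^2))     # Euclidean norm of each row
  list(S = log2(rowNorm(Tm) / rowNorm(err)),
       v = log2(norm(Tm, "F") / norm(err, "F")))
}

set.seed(1)
X <- matrix(rpois(2000, lambda = 1), nrow = 40)
X[1:5, 1:10] <- rpois(50, lambda = 5)
det.res <- scoreOneSet(X, set = 1:10, k = 2)
rnd.res <- scoreOneSet(X, set = 1:10, k = 2, randomized = TRUE, k.buff = 2, q = 1)
```

Because Q Q^T is an orthogonal projection, the overall score in this sketch is never negative; sample-level scores can be, since the row norms of the error matrix are not bounded by the row norms of T.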
Usage
reset(X, X.test, center.X=TRUE, scale.X=FALSE, center.X.test=TRUE, scale.X.test=FALSE,
var.sets, k=2, random.threshold, k.buff=0, q=0, test.dist="normal", norm.type="2",
per.var=FALSE)
Arguments
X |
The n-by-p target matrix; columns represent variables and rows represent samples. |
X.test |
Matrix that will be combined with the |
center.X |
Flag which controls whether the values in |
scale.X |
Flag which controls whether the values in |
center.X.test |
Flag which controls whether the values in |
scale.X.test |
Flag which controls whether the values in |
var.sets |
List of m variable sets, each element is a vector of indices of variables in the set that correspond to columns in |
k |
Rank of reconstruction. Defaults to 2. Cannot be larger than the minimum variable set size. |
random.threshold |
If specified, indicates the variable set size above which a randomized reduced-rank reconstruction is used. If the variable set size is less than or equal to random.threshold, then a non-random reconstruction is computed. Defaults to k and cannot be less than k. |
k.buff |
Additional dimensions used in randomized reduced-rank construction algorithm. Defaults to 0.
Values above 0 can improve the accuracy of the
randomized reconstruction at the expense of additional computational complexity. If |
q |
Number of power iterations for randomized SVD (see |
test.dist |
Distribution for non-zero elements of random test matrix used in randomized SVD algorithm. See description for |
norm.type |
The type of norm to use for computing reconstruction error. Defaults to "2" for the Euclidean/Frobenius norm. The other supported option is "1" for the L1 norm. |
per.var |
If true, the computed scores for each variable set are divided by the scaled variable set size to generate per-variable scores. Variable set size scaling is performed by dividing all sizes by the mean size (this will generate per-variable scores of approximately the same magnitude as the non-per-variable scores). |
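
The variable set size scaling described for per.var can be illustrated with a few lines of R. The set sizes and scores below are hypothetical values chosen for the arithmetic, and the variable names are not taken from the package.

```r
# Hypothetical variable set sizes and overall scores (illustrative values only)
set.sizes <- c(set1 = 10, set2 = 20, set3 = 30)
v <- c(set1 = 1.2, set2 = 0.9, set3 = 0.6)

# Scale the sizes by their mean so the scaled sizes average to 1
scaled.sizes <- set.sizes / mean(set.sizes)   # 0.5, 1.0, 1.5

# Per-variable scores: divide each score by its scaled set size
v.per.var <- v / scaled.sizes                 # 2.4, 0.9, 0.4
```

Dividing by sizes that average to 1, rather than by the raw sizes, is what keeps the per-variable scores on roughly the same scale as the unadjusted scores.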
Value
A list with the following elements:
- S: an n-by-m matrix of sample-level variable set scores.
- v: a length m vector of overall variable set scores.
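
As a compact restatement of the Description, the two returned quantities can be written as follows for the default settings (norm.type="2", per.var=FALSE). The indices i (sample) and j (variable set), and the symbols Q_j and E_j, are notation introduced here for illustration; Q_j denotes the rank k basis computed for set j.

```latex
E_j = Q_j Q_j^{T} T - T, \qquad
v_j = \log_2 \frac{\lVert T \rVert_F}{\lVert E_j \rVert_F}, \qquad
S_{ij} = \log_2 \frac{\lVert T_{i,\cdot} \rVert_2}{\lVert (E_j)_{i,\cdot} \rVert_2}
```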
See Also
createVarSetCollection, randomColumnSpace
Examples
# Create a collection of 5 variable sets each of size 10
var.sets = list(set1=1:10,
                set2=11:20,
                set3=21:30,
                set4=31:40,
                set5=41:50)
# Simulate a 100-by-100 matrix of random Poisson data
X = matrix(rpois(10000, lambda=1), nrow=100)
# Inflate first 10 rows for first 10 variables, i.e., the first
# 10 samples should have elevated scores for the first variable set
X[1:10,1:10] = rpois(100, lambda=5)
# Execute RESET using non-randomized basis computation
reset(X, var.sets=var.sets, k=2, random.threshold=10)
# Execute RESET with randomized basis computation
# (random.threshold will default to k value which is less
# than the size of all variable sets)
reset(X, var.sets=var.sets, k=2)
# Execute RESET with non-zero k.buff
reset(X, var.sets=var.sets, k=2, k.buff=2)
# Execute RESET with non-zero q
reset(X, var.sets=var.sets, k=2, q=1)
# Execute RESET with L1 vs L2 norm
reset(X, var.sets=var.sets, k=2, norm.type="1")
# Project the X matrix onto the first 5 PCs and use that as X.test
# Scale X before calling prcomp() so that no centering or scaling
# is needed within reset()
X = scale(X)
X.test = prcomp(X,center=FALSE,scale=FALSE,retx=TRUE)$x[,1:5]
reset(X, X.test=X.test, center.X=FALSE, scale.X=FALSE,
      center.X.test=FALSE, scale.X.test=FALSE, var.sets=var.sets, k=2)