getK {ruv}    R Documentation
Get K
Description
Finds an often-suitable value of K for use in RUV-4.
Usage
getK(Y, X, ctl, Z = 1, eta = NULL, include.intercept = TRUE,
     fullW0 = NULL, cutoff = NULL, method = "select", l = 1, inputcheck = TRUE)
Arguments
Y: The data. An m by n matrix, where m is the number of samples and n is the number of features.
X: The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Note that X should have only a single column, i.e. p = 1; if X has more than one column, only one column is used, selected by the argument l.
ctl: An index vector specifying the negative controls. Either a logical vector of length n or a vector of integers (column indices of Y).
Z: Any additional covariates to include in the model, typically an m by q matrix. Factors and data frames are also permissible and are converted to a matrix internally.
eta: Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be either (1) a matrix with n columns, (2) a matrix with n rows, (3) a data frame with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term.
include.intercept: Applies to both
fullW0: Can be included to speed up execution. A fullW0 returned by a previous call, such as the fullW0 element returned by getK (see Value), can be supplied here.
cutoff: An alternative cutoff to use instead of the default. The default is the (approximate) 90th percentile of the distribution of the first singular value of an m by n Gaussian matrix.
method: Can be set to either
l: Which column of X to use in the getK algorithm.
inputcheck: Perform a basic sanity check on the inputs and issue a warning if there is a problem.
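The ctl and eta arguments each accept several input forms. The sketch below shows one way to normalise them to a single canonical form: a logical vector of length n for ctl, and a q by n matrix (one row per gene-wise covariate) for eta. The helpers ctl_to_logical and eta_to_matrix are hypothetical and not part of ruv; the package's own conversion, including how include.intercept adds an intercept, may differ.

```r
## Hypothetical helpers, NOT part of ruv: normalise ctl and eta
## to one canonical form each, following the forms listed above.

## ctl: a logical vector of length n, or integer column indices of Y.
ctl_to_logical = function(ctl, n) {
  if (is.logical(ctl)) {
    stopifnot(length(ctl) == n)
    return(ctl)
  }
  out = rep(FALSE, n)
  out[ctl] = TRUE          # mark the indexed features as controls
  out
}

## eta: returns a q by n matrix (one row per gene-wise covariate).
## Intercept handling via include.intercept is deliberately not modelled.
eta_to_matrix = function(eta, n) {
  if (is.null(eta)) return(NULL)
  if (is.data.frame(eta)) {                  # form (3): data frame, n rows
    stopifnot(nrow(eta) == n)
    return(t(model.matrix(~ . - 1, data = eta)))
  }
  if (is.factor(eta)) {                      # form (4): factor of length n
    stopifnot(length(eta) == n)
    return(t(model.matrix(~ eta - 1)))       # one indicator row per level
  }
  if (is.matrix(eta)) {
    if (ncol(eta) == n) return(eta)          # form (1): n columns
    if (nrow(eta) == n) return(t(eta))       # form (2): n rows
    stop("eta must have n rows or n columns")
  }
  if (length(eta) == 1) return(matrix(eta, 1, n))  # form (5): intercept
  stopifnot(length(eta) == n)                # form (4): numeric vector
  matrix(eta, 1, n)
}
```

A square n by n matrix is ambiguous between forms (1) and (2); this sketch simply treats it as form (1).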
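The default cutoff is described as the approximate 90th percentile of the first (largest) singular value of an m by n Gaussian matrix. That quantity can be estimated by simulation, as sketched below. The function first_sv_quantile is hypothetical; getK may compute its default differently (for example with an analytical approximation or a different scaling), so this illustrates the quantity rather than reproducing getK's internal value.

```r
## Hypothetical illustration, NOT the ruv implementation:
## Monte Carlo estimate of a quantile of the largest singular value
## of an m by n matrix with i.i.d. standard Gaussian entries.
first_sv_quantile = function(m, n, prob = 0.9, B = 200) {
  sv1 = replicate(B, {
    G = matrix(rnorm(m * n), m, n)
    svd(G, nu = 0, nv = 0)$d[1]   # largest singular value only
  })
  unname(quantile(sv1, prob))
}

set.seed(1)
first_sv_quantile(m = 10, n = 200, B = 100)
```

With m = 50 and n = 10000, as in the Examples, each replicate needs an SVD of a 50 by 10000 matrix, so a smaller B keeps the run time manageable. For scale, the expected largest singular value of such a Gaussian matrix is at most sqrt(m) + sqrt(n).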
Value
A list containing
k: The estimated value of K.
cutoff: The cutoff value used.
sizeratios: A measure of the relative sizes of the rows of alpha.
fullW0: Can be used to speed up future calls of RUV4.
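For reference, alpha (mentioned under sizeratios) is the k by n matrix of unwanted-factor effects in the linear model that the Examples simulate, where W holds the k unobserved factors and the columns of beta for the negative controls are zero:

```latex
Y = X\beta + W\alpha + \varepsilon, \qquad
Y \in \mathbb{R}^{m \times n},\;
X \in \mathbb{R}^{m \times p},\;
\beta \in \mathbb{R}^{p \times n},\;
W \in \mathbb{R}^{m \times k},\;
\alpha \in \mathbb{R}^{k \times n},
\qquad \beta_{\cdot j} = 0 \ \text{for each negative control } j.
```

Each of the k rows of alpha holds one unwanted factor's effects on the n features.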
Warning
This value of K will not be the best choice in all cases. Moreover, it will often be a poor choice of K for use with RUV2. See Gagnon-Bartsch and Speed (2012) for commentary on estimating k.
Author(s)
Johann Gagnon-Bartsch johanngb@umich.edu
References
Using control genes to correct for unwanted variation in microarray data. Gagnon-Bartsch and Speed, 2012. Available at: http://biostatistics.oxfordjournals.org/content/13/3/539.full.
Removing Unwanted Variation from High Dimensional Data with Negative Controls. Gagnon-Bartsch, Jacob, and Speed, 2013. Available at: http://statistics.berkeley.edu/tech-reports/820.
See Also
Examples
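## Load the package and fix the seed so the simulation is reproducible
library(ruv)
set.seed(1)
## Create some simulated data
m = 50       # number of samples
n = 10000    # number of features
nc = 1000    # number of negative controls
p = 1        # number of factors of interest
k = 20       # true number of unwanted factors
## The first nc features are the negative controls
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
## Factor of interest: two groups of samples coded 0 and 1
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
## Effects of interest; the negative controls are unaffected by X
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0
## Unwanted factors W, their effects alpha, and noise
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon
## Run getK and extract the estimated number of unwanted factors
temp = getK(Y, X, ctl)
K = temp$k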