RUV4 {ruv} | R Documentation |
Remove Unwanted Variation, 4-step
Description
The RUV-4 algorithm. Estimates and adjusts for unwanted variation using negative controls.
Usage
RUV4(Y, X, ctl, k, Z = 1, eta = NULL, include.intercept = TRUE,
     fullW0 = NULL, inputcheck = TRUE)
Arguments
Y |
The data. An m by n matrix, where m is the number of samples and n is the number of features. |
X |
The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. Factors and dataframes are also permissible, and converted to a matrix by |
ctl |
An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers. |
k |
The number of unwanted factors to use. Can be 0. |
Z |
Any additional covariates to include in the model, typically an m by q matrix. Factors and dataframes are also permissible, and converted to a matrix by |
eta |
Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be either (1) a matrix with n columns, (2) a matrix with n rows, (3) a dataframe with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term. |
include.intercept |
Applies to both |
fullW0 |
Can be included to speed up execution. Is returned by previous calls of |
inputcheck |
Perform a basic sanity check on the inputs, and issue a warning if there is a problem. |
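The orientation of Y and X and the two accepted forms of ctl can be illustrated in base R. This is a small sketch with made-up dimensions; it does not call the ruv package, and the names ctl_logical and ctl_integer are chosen here for illustration only.

```r
m = 6   # number of samples (rows of Y)
n = 8   # number of features (columns of Y)

## Logical form: one entry per feature, TRUE marks a negative control
ctl_logical = rep(FALSE, n)
ctl_logical[c(2, 5, 7)] = TRUE

## Integer form: the column positions of the same negative controls
ctl_integer = which(ctl_logical)

set.seed(1)
Y = matrix(rnorm(m * n), m, n)

## Both forms select the same control columns of Y
same_columns = identical(Y[, ctl_logical], Y[, ctl_integer])
print(same_columns)

## A single two-group factor of interest as an m by 1 matrix (p = 1)
X = matrix(rep(c(0, 1), each = m / 2), m, 1)
print(dim(X))
```

Either form of ctl indexes the columns of Y, since features are stored column-wise.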
Details
Implements the RUV-4 algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013), using the SVD as the factor analysis routine. Unwanted factors W are estimated using control genes. Y is then regressed on the variables X, Z, and W.
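The steps above can be sketched in base R. This is a simplified illustration written for this page, not the package's implementation: it assumes p = 1, handles the default intercept (Z = 1) by column-centring, omits eta, and all variable names are chosen here. The simulated unwanted factors are deliberately correlated with X so that the adjustment has something to correct.

```r
set.seed(1)
m = 40; n = 300; nc = 100; k = 2

## Simulated data: the first nc features are negative controls (beta = 0)
ctl = rep(FALSE, n); ctl[1:nc] = TRUE
X = matrix(rep(c(0, 1), each = m / 2), m, 1)
beta = matrix(rnorm(n), 1, n); beta[, ctl] = 0
W_true = matrix(rnorm(m * k), m, k) + X %*% matrix(1, 1, k)
alpha_true = matrix(rnorm(k * n), k, n)
Y = X %*% beta + W_true %*% alpha_true + matrix(rnorm(m * n), m, n)

## Centre the columns; this stands in for adjusting for an intercept
Yc = sweep(Y, 2, colMeans(Y))
Xc = sweep(X, 2, colMeans(X))

## Regress Y on X, then factor-analyse the residuals with the SVD
byx = solve(crossprod(Xc), crossprod(Xc, Yc))   # 1 by n
Y0 = Yc - Xc %*% byx
W0 = svd(Y0)$u[, 1:k, drop = FALSE]             # m by k, orthonormal columns
alpha0 = crossprod(W0, Y0)                      # k by n

## Use the controls, where beta = 0, to estimate how W relates to X
byx_c = byx[, ctl, drop = FALSE]
alpha_c = alpha0[, ctl, drop = FALSE]
bwx = byx_c %*% t(alpha_c) %*% solve(alpha_c %*% t(alpha_c))   # 1 by k
W = W0 + Xc %*% bwx

## Regress Y on X and W together; the first row is the adjusted estimate
D = cbind(Xc, W)
coefs = solve(crossprod(D), crossprod(D, Yc))
betahat = coefs[1, , drop = FALSE]

## Compare mean squared error with and without the adjustment
mse_naive = mean((byx - beta)^2)
mse_adjusted = mean((betahat - beta)^2)
print(c(naive = mse_naive, adjusted = mse_adjusted))
```

In this simulation the unadjusted coefficients byx absorb part of the unwanted variation because W_true is correlated with X; regressing on the estimated W as well removes most of that contribution.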
Value
A list containing
betahat |
The estimated coefficients of the factor(s) of interest. A p by n matrix. |
sigma2 |
Estimates of the features' variances. A vector of length n. |
t |
t statistics for the factor(s) of interest. A p by n matrix. |
p |
P-values for the factor(s) of interest. A p by n matrix. |
Fstats |
F statistics for testing all of the factors in |
Fpvals |
P-values for testing all of the factors in |
multiplier |
The constant by which |
df |
The number of residual degrees of freedom. |
W |
The estimated unwanted factors. |
alpha |
The estimated coefficients of W. |
byx |
The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots. |
bwx |
The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots. |
X |
|
k |
|
ctl |
|
Z |
|
eta |
|
fullW0 |
Can be used to speed up future calls of RUV4. |
include.intercept |
|
method |
Character variable with value "RUV4". Included for reference. |
Note
Additional resources can be found at http://www-personal.umich.edu/~johanngb/ruv/.
Author(s)
Johann Gagnon-Bartsch johanngb@umich.edu
References
Using control genes to correct for unwanted variation in microarray data. Gagnon-Bartsch and Speed, 2012. Available at: http://biostatistics.oxfordjournals.org/content/13/3/539.full.
Removing Unwanted Variation from High Dimensional Data with Negative Controls. Gagnon-Bartsch, Jacob, and Speed, 2013. Available at: http://statistics.berkeley.edu/tech-reports/820.
See Also
RUV2, RUVinv, RUVrinv, variance_adjust
Examples
## Load the package
library(ruv)
## Create some simulated data
m = 50
n = 10000
nc = 1000
p = 1
k = 20
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon
## Run RUV-4
fit = RUV4(Y, X, ctl, k)
## Get adjusted variances and p-values
fit = variance_adjust(fit)
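The fullW0 element of the returned list can be passed back to RUV4 to avoid repeating work, as noted under Arguments and Value. The sketch below is a hedged illustration on a smaller simulated data set: it assumes the stored fullW0 can be reused when only k changes, and it is wrapped in a requireNamespace check so that it only runs when the ruv package is installed.

```r
if (requireNamespace("ruv", quietly = TRUE)) {
  set.seed(1)
  m = 20; n = 500; nc = 100; p = 1
  ctl = rep(FALSE, n); ctl[1:nc] = TRUE
  X = matrix(rep(c(0, 1), each = m / 2), m, p)
  Y = matrix(rnorm(m * n), m, n)

  ## First call computes and returns fullW0
  fit1 = ruv::RUV4(Y, X, ctl, k = 3)

  ## Later call with a smaller k reuses the stored fullW0
  fit2 = ruv::RUV4(Y, X, ctl, k = 2, fullW0 = fit1$fullW0)

  print(dim(fit2$betahat))    # documented as p by n
  print(length(fit2$sigma2))  # documented as length n
}
```

Reuse of this kind is mainly useful when comparing fits across several values of k on the same Y, X, and ctl.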