RUV4 {ruv}R Documentation

Remove Unwanted Variation, 4-step

Description

The RUV-4 algorithm. Estimates and adjusts for unwanted variation using negative controls.

Usage

RUV4(Y, X, ctl, k, Z = 1, eta = NULL, include.intercept=TRUE,
fullW0=NULL, inputcheck=TRUE)

Arguments

Y

The data. A m by n matrix, where m is the number of samples and n is the number of features.

X

The factor(s) of interest. A m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. Factors and dataframes are also permissible, and converted to a matrix by design.matrix.

ctl

An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.

k

The number of unwanted factors to use. Can be 0.

Z

Any additional covariates to include in the model, typically a m by q matrix. Factors and dataframes are also permissible, and converted to a matrix by design.matrix. Alternatively, may simply be 1 (the default) for an intercept term. May also be NULL.

eta

Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be either (1) a matrix with n columns, (2) a matrix with n rows, (3) a dataframe with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term.

include.intercept

Applies to both Z and eta. When Z or eta (or both) is specified (not NULL) but does not already include an intercept term, this will automatically include one. If only one of Z or eta should include an intercept, this variable should be set to FALSE, and the intercept term should be included manually where desired.

fullW0

Can be included to speed up execution. Is returned by previous calls of RUV4, RUVinv, or RUVrinv (see below).

inputcheck

Perform a basic sanity check on the inputs, and issue a warning if there is a problem.

Details

Implements the RUV-4 algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013), using the SVD as the factor analysis routine. Unwanted factors W are estimated using control genes. Y is then regressed on the variables X, Z, and W.

Value

A list containing

betahat

The estimated coefficients of the factor(s) of interest. A p by n matrix.

sigma2

Estimates of the features' variances. A vector of length n.

t

t statistics for the factor(s) of interest. A p by n matrix.

p

P-values for the factor(s) of interest. A p by n matrix.

Fstats

F statistics for testing all of the factors in X simultaneously.

Fpvals

P-values for testing all of the factors in X simultaneously.

multiplier

The constant by which sigma2 must be multiplied in order get an estimate of the variance of betahat

df

The number of residual degrees of freedom.

W

The estimated unwanted factors.

alpha

The estimated coefficients of W.

byx

The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.

bwx

The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.

X

X. Included for reference.

k

k. Included for reference.

ctl

ctl. Included for reference.

Z

Z. Included for reference.

eta

eta. Included for reference.

fullW0

Can be used to speed up future calls of RUV4.

include.intercept

include.intercept. Included for reference.

method

Character variable with value "RUV4". Included for reference.

Note

Additional resources can be found at http://www-personal.umich.edu/~johanngb/ruv/.

Author(s)

Johann Gagnon-Bartsch johanngb@umich.edu

References

Using control genes to correct for unwanted variation in microarray data. Gagnon-Bartsch and Speed, 2012. Available at: http://biostatistics.oxfordjournals.org/content/13/3/539.full.

Removing Unwanted Variation from High Dimensional Data with Negative Controls. Gagnon-Bartsch, Jacob, and Speed, 2013. Available at: http://statistics.berkeley.edu/tech-reports/820.

See Also

RUV2, RUVinv, RUVrinv, variance_adjust

Examples

## Create some simulated data
m = 50
n = 10000
nc = 1000
p = 1
k = 20
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon

## Run RUV-4
fit = RUV4(Y, X, ctl, k)

## Get adjusted variances and p-values
fit = variance_adjust(fit)

[Package ruv version 0.9.7.1 Index]