R: Fit design matrices to one or multiple outcomes

GLM fits {brainGraph}

R Documentation

Fit design matrices to one or multiple outcomes

Description

These are the “base” model-fitting functions that solve the least squares problem to estimate model coefficients, residuals, etc. for brain network data.

fastLmBG_t and fastLmBG_f calculate contrast-based statistics for T or F contrasts, respectively. It accepts any number of contrasts (i.e., a multi-row contrast matrix).

Usage

fastLmBG(X, Y, QR = qr.default(X), Q = qr_Q2(QR, n = n, p = p),
  R = qr_R2(QR, p), n = dim(X)[1L], p = QR$rank, ny = dim(Y)[2L],
  dfR = n - p, XtXinv = inv(QR))

fastLmBG_3d(X, Y, runX, QR = qr(X[, , runX, drop = FALSE]),
  Q = lapply(QR, qr_Q2, n = n, p = p), R = lapply(QR, qr_R2, p),
  n = dim(X)[1L], p = QR[[1L]]$rank, ny = length(runX), dfR = n -
  p, XtXinv = inv(QR))

fastLmBG_3dY(X, Y, runX, QR = qr(X[, , runX, drop = FALSE]),
  Q = lapply(QR, qr_Q2, n = n, p = p), R = lapply(QR, qr_R2, p),
  n = dim(X)[1L], p = QR[[1L]]$rank, ny = length(runX), dfR = n -
  p, XtXinv = inv(QR))

fastLmBG_3dY_1p(X, Y, runX, QR = qr(X[, , runX, drop = FALSE]),
  Q = lapply(QR, qr_Q2, diag(1L, n, 1L), n, 1L), R = lapply(QR,
  function(r) r$qr[1L]), n = dim(X)[1L], p = 1L, ny = length(runX),
  dfR = n - 1L, XtXinv = inv(QR))

fastLmBG_t(fits, contrasts, alternative = c("two.sided", "less",
  "greater"), alpha = NULL)

fastLmBG_f(fits, contrasts, rkC = NULL, nC = length(contrasts))

Arguments

`X`	Design matrix or 3D array of design matrices
`Y`	Numeric matrix; there should be 1 column for each outcome variable (so that in a graph-level analysis, this is a column matrix)
`QR`, `Q`, `R`	The QR decomposition(s) and Q and R matrix(es) of the design matrix(es). If `X` is a 3D array, these should be lists
`n`, `p`, `ny`, `dfR`	Integers; the number of observations, model rank, number of regions/outcome variables, and residual degrees of freedom
`XtXinv`	Numeric matrix or array; the inverse of the cross-product of the design matrix(es)
`runX`	Character vector of the regions for which the design matrix is not singular
`fits`	List object output by one of the model fitting functions (e.g., `fastLmBG`)
`contrasts`	Numeric matrix (for T statistics) or list of matrices (for F statistics) specifying the contrast(s) of interest; if only one contrast is desired, you can supply a vector (for T statistics)
`alternative`	Character string, whether to do a two- or one-sided test. Default: `'two.sided'`
`alpha`	Numeric; the significance level. Default: 0.05
`rkC`, `nC`	Integers; the rank of the contrast matrix and number of contrasts, respectively (for F contrasts)

Value

A list with elements

`coefficients`	Parameter estimates
`rank`	Model rank
`df.residual`	Residual degrees of freedom
`residuals`	Model residuals
`sigma`	The residual standard deviation, or root mean square error (RMSE)
`fitted.values`	Model fitted values
`qr`	The design matrix QR decomposition(s)
`cov.unscaled`	The “unscaled covariance matrix”

fastLmBG_t – A multidimensional array with the third dimension equaling the number of contrasts; each matrix contains the contrast of parameter estimates, standard error of the contrast, T-statistics, P-values, FDR-adjusted P-values, and confidence intervals (if alpha is given)

fastLmBG_f – A numeric matrix with columns for the effect size, standard error, F statistic, P-values, and FDR-adjusted P-values

Parameter estimation

These functions use the QR decomposition to calculate the least squares solution which is the same as the base lm function. If we substitute X = QR in the standard normal equations, the equation to be solved reduces to

X^T X \hat{\beta} = X^T y \Rightarrow R \hat{\beta} = Q^T y

Since R is an upper-triangular matrix, we can use the backsolve function which is a bit faster than solve. In some cases, the fastLmBG* functions are about as fast or faster (particularly when X is not permuted) as one in which the normal equations are solved directly; additionally, using the QR method affords greater numerical stability.

Different scenarios

There are a few different scenarios for fitting models of the data, with a separate function for each:

fastLmBG: The main function for when there is a single design matrix X and any number of outcome variables Y.
fastLmBG_3d: Fits models when there is a different design matrix X for each region and a single outcome variable Y, which in this case will be a column matrix.
fastLmBG_3dY: Fits models when there is both a different design matrix X and outcome variable Y for each region. Occurs under permutation for the Freedman-Lane, ter Braak, and Still-White methods.
fastLmBG_3dY_1p: Fits models when there is both a different design and outcome variable for each region, and also when X is a rank-1 matrix (i.e., it has 1 column). Only occurs under permutation with the Still-White method if there is a single regressor of interest.

In the last case above, model coefficients are calculated by simple (i.e., non-matrix) algebra.

Improving speed/efficiency

Speed/efficiency gains will be vast for analyses in which there is a single design matrix X for all regions, there are multiple outcome variables (i.e., vertex-level analysis), and the permutation method chosen does not permute X. Specifically, these are Freedman-Lane, ter Braak, and Manly methods. Therefore, the QR decomposition, the Q and R matrices, and the “unscaled covariance matrix” (which is (X^T X)^{-1}) only need to be calculated once for the entire analysis. Other functions (e.g., lm.fit) would recalculate these for each permutation.

Furthermore, this (and the other model fitting functions in the package) will likely only work in models with full rank. I sacrifice proper error checking in favor of speed, but hopefully any issues with the model will be identified prior to the permutation step. Finally, the number of observations, model rank, number of outcome variables, and degrees of freedom will not change and therefore do not need to be recalculated (although these probably amount to a negligible speed boost).

In case there are multiple design matrices, or the permutation method permutes the design, then the QR decomposition will need to be calculated each time anyway. For these cases, I use more simplified functions qr_Q2 and qr_R2 to calculate the Q and R matrices, and then the fitted values, residuals, and residual standard deviation are calculated at the same time (whereas lm.fit and others would calculate these each time).

Contrast-based statistics

The contrast of parameter estimates, \gamma, for T contrasts is

\gamma = C \hat{\beta}

where C is the contrast matrix with size k \times p (where k is the number of contrasts) and \hat{\beta} is the matrix of parameter estimates with size p \times r (where r is the number of regions). For F contrasts, the effect size is the extra sum of squares and is calculated as

\gamma (C (X^T X)^{-1} C^T)^{-1} \gamma^T

The standard error of a T contrast is

\sqrt{\hat{\sigma} (X^T X)^{-1}}

where \hat{\sigma} is the residual standard deviation of the model and the second term is the unscaled covariance matrix. The standard error for F contrasts is simply the residual sum of squares. P-values and FDR-adjusted P-values (across regions) are also calculated. Finally, if \alpha is provided for T contrasts, confidence limits are calculated.

Author(s)

Christopher G. Watson, cgwatson@bu.edu