GLM fits {brainGraph} | R Documentation |
Fit design matrices to one or multiple outcomes
Description
These are the “base” model-fitting functions that solve the least squares problem to estimate model coefficients, residuals, etc. for brain network data.
fastLmBG_t
and fastLmBG_f
calculate contrast-based statistics
for T or F contrasts, respectively. It accepts any number of contrasts
(i.e., a multi-row contrast matrix).
Usage
fastLmBG(X, Y, QR = qr.default(X), Q = qr_Q2(QR, n = n, p = p),
R = qr_R2(QR, p), n = dim(X)[1L], p = QR$rank, ny = dim(Y)[2L],
dfR = n - p, XtXinv = inv(QR))
fastLmBG_3d(X, Y, runX, QR = qr(X[, , runX, drop = FALSE]),
Q = lapply(QR, qr_Q2, n = n, p = p), R = lapply(QR, qr_R2, p),
n = dim(X)[1L], p = QR[[1L]]$rank, ny = length(runX), dfR = n -
p, XtXinv = inv(QR))
fastLmBG_3dY(X, Y, runX, QR = qr(X[, , runX, drop = FALSE]),
Q = lapply(QR, qr_Q2, n = n, p = p), R = lapply(QR, qr_R2, p),
n = dim(X)[1L], p = QR[[1L]]$rank, ny = length(runX), dfR = n -
p, XtXinv = inv(QR))
fastLmBG_3dY_1p(X, Y, runX, QR = qr(X[, , runX, drop = FALSE]),
Q = lapply(QR, qr_Q2, diag(1L, n, 1L), n, 1L), R = lapply(QR,
function(r) r$qr[1L]), n = dim(X)[1L], p = 1L, ny = length(runX),
dfR = n - 1L, XtXinv = inv(QR))
fastLmBG_t(fits, contrasts, alternative = c("two.sided", "less",
"greater"), alpha = NULL)
fastLmBG_f(fits, contrasts, rkC = NULL, nC = length(contrasts))
Arguments
X |
Design matrix or 3D array of design matrices |
Y |
Numeric matrix; there should be 1 column for each outcome variable (so that in a graph-level analysis, this is a column matrix) |
QR , Q , R |
The QR decomposition(s) and Q and R matrix(es) of the design
matrix(es). If |
n , p , ny , dfR |
Integers; the number of observations, model rank, number of regions/outcome variables, and residual degrees of freedom |
XtXinv |
Numeric matrix or array; the inverse of the cross-product of the design matrix(es) |
runX |
Character vector of the regions for which the design matrix is not singular |
fits |
List object output by one of the model fitting functions (e.g.,
|
contrasts |
Numeric matrix (for T statistics) or list of matrices (for F statistics) specifying the contrast(s) of interest; if only one contrast is desired, you can supply a vector (for T statistics) |
alternative |
Character string, whether to do a two- or one-sided test.
Default: |
alpha |
Numeric; the significance level. Default: 0.05 |
rkC , nC |
Integers; the rank of the contrast matrix and number of contrasts, respectively (for F contrasts) |
Value
A list with elements
coefficients |
Parameter estimates |
rank |
Model rank |
df.residual |
Residual degrees of freedom |
residuals |
Model residuals |
sigma |
The residual standard deviation, or root mean square error (RMSE) |
fitted.values |
Model fitted values |
qr |
The design matrix QR decomposition(s) |
cov.unscaled |
The “unscaled covariance matrix” |
fastLmBG_t
– A multidimensional array with the third
dimension equaling the number of contrasts; each matrix contains the
contrast of parameter estimates, standard error of the contrast,
T-statistics, P-values, FDR-adjusted P-values, and confidence intervals (if
alpha
is given)
fastLmBG_f
– A numeric matrix with columns for the effect
size, standard error, F statistic, P-values, and FDR-adjusted P-values
Parameter estimation
These functions use the QR decomposition to calculate the least
squares solution which is the same as the base lm
function. If we substitute X = QR
in the standard normal equations, the
equation to be solved reduces to
X^T X \hat{\beta} = X^T y \Rightarrow R \hat{\beta} = Q^T y
Since R
is an upper-triangular matrix, we can use the
backsolve
function which is a bit faster than
solve
. In some cases, the fastLmBG*
functions are about
as fast or faster (particularly when X
is not permuted) as one in
which the normal equations are solved directly; additionally, using the
QR method affords greater numerical stability.
Different scenarios
There are a few different scenarios for fitting models of the data, with a separate function for each:
- fastLmBG
The main function for when there is a single design matrix
X
and any number of outcome variablesY
.- fastLmBG_3d
Fits models when there is a different design matrix
X
for each region and a single outcome variableY
, which in this case will be a column matrix.- fastLmBG_3dY
Fits models when there is both a different design matrix
X
and outcome variableY
for each region. Occurs under permutation for the Freedman-Lane, ter Braak, and Still-White methods.- fastLmBG_3dY_1p
Fits models when there is both a different design and outcome variable for each region, and also when
X
is a rank-1 matrix (i.e., it has 1 column). Only occurs under permutation with the Still-White method if there is a single regressor of interest.
In the last case above, model coefficients are calculated by simple (i.e., non-matrix) algebra.
Improving speed/efficiency
Speed/efficiency gains will be vast for analyses in which there is a single
design matrix X
for all regions, there are multiple outcome variables
(i.e., vertex-level analysis), and the permutation method chosen does
not permute X
. Specifically, these are Freedman-Lane, ter
Braak, and Manly methods. Therefore, the QR decomposition, the
Q
and R
matrices, and the “unscaled covariance matrix”
(which is (X^T X)^{-1}
) only need to be calculated once for the entire
analysis. Other functions (e.g., lm.fit
) would recalculate these for
each permutation.
Furthermore, this (and the other model fitting functions in the package) will likely only work in models with full rank. I sacrifice proper error checking in favor of speed, but hopefully any issues with the model will be identified prior to the permutation step. Finally, the number of observations, model rank, number of outcome variables, and degrees of freedom will not change and therefore do not need to be recalculated (although these probably amount to a negligible speed boost).
In case there are multiple design matrices, or the permutation method
permutes the design, then the QR decomposition will need to be calculated
each time anyway. For these cases, I use more simplified functions
qr_Q2
and qr_R2
to calculate the Q
and R
matrices,
and then the fitted values, residuals, and residual standard deviation are
calculated at the same time (whereas lm.fit
and others would calculate
these each time).
Contrast-based statistics
The contrast of parameter estimates, \gamma
, for T contrasts is
\gamma = C \hat{\beta}
where C
is the contrast matrix with size k \times p
(where
k
is the number of contrasts) and \hat{\beta}
is the matrix of
parameter estimates with size p \times r
(where r
is the number
of regions). For F contrasts, the effect size is the extra sum of
squares and is calculated as
\gamma (C (X^T X)^{-1} C^T)^{-1} \gamma^T
The standard error of a T contrast is
\sqrt{\hat{\sigma} (X^T X)^{-1}}
where \hat{\sigma}
is the residual standard deviation of the
model and the second term is the unscaled covariance matrix. The standard
error for F contrasts is simply the residual sum of squares. P-values
and FDR-adjusted P-values (across regions) are also calculated. Finally, if
\alpha
is provided for T contrasts, confidence limits are calculated.
Author(s)
Christopher G. Watson, cgwatson@bu.edu
See Also
randomise
Other GLM functions: GLM design
,
GLM
, mtpc