calc_rsquared {mvrsquared} | R Documentation
Calculate R-Squared.
Description
Calculate R-Squared for univariate or multivariate outcomes.
Usage
calc_rsquared(y, yhat, ybar = NULL, return_ss_only = FALSE, threads = 1)
Arguments
y
The true outcome. This must be a numeric vector, numeric matrix, or coercible to a sparse matrix of class dgCMatrix.
yhat
The predicted outcome or a list of two matrices whose dot product makes the predicted outcome. See 'Details' below for more information.
ybar
Numeric scalar or vector; the mean of y.
return_ss_only
Logical. Do you want to forgo calculating R-squared and only return the sums of squares?
threads
Integer number of threads for parallelism; defaults to 1.
Details
There is some flexibility in what you can pass as y and yhat.
In general, y can be a numeric vector, numeric matrix, a sparse matrix of class dgCMatrix from the Matrix package, or any object that can be coerced into a dgCMatrix.
yhat can be a numeric vector, numeric matrix, or a list of two matrices whose dot product has the same dimensionality as y. If yhat is a list of two matrices you may optionally name them x and w indicating the order of multiplication (x left multiplies w). If unnamed or ambiguously named, then it is assumed that yhat[[1]] left multiplies yhat[[2]].
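The list form of yhat can be sketched as follows. This is a minimal illustration, assuming the mtcars regression used elsewhere on this page; the variable names (yhat_full, r2_named, r2_unnamed) are hypothetical. The call into mvrsquared is guarded so the base-R part runs even when the package is not installed.

```r
# Fit a small linear model and build the two factors of the prediction
f <- stats::lm(mpg ~ cyl + disp + hp + wt, data = datasets::mtcars)
y <- matrix(f$model$mpg, ncol = 1)
x <- cbind(1, as.matrix(f$model[, -1]))  # n x p design matrix with intercept column
w <- matrix(stats::coef(f), ncol = 1)    # p x 1 coefficient matrix

# x left-multiplies w, so the product must have the same dimensions as y
yhat_full <- x %*% w
stopifnot(identical(dim(yhat_full), dim(y)))

# Only runs if mvrsquared is installed
if (requireNamespace("mvrsquared", quietly = TRUE)) {
  # Named: the names x and w state the order of multiplication
  r2_named <- mvrsquared::calc_rsquared(y = y, yhat = list(x = x, w = w))
  # Unnamed: yhat[[1]] is assumed to left-multiply yhat[[2]]
  r2_unnamed <- mvrsquared::calc_rsquared(y = y, yhat = list(x, w))
  print(c(named = r2_named, unnamed = r2_unnamed))
}
```

Passing the two factors rather than the dense product is presumably most useful when x %*% w would be too large to hold in memory at once.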
Value
If return_ss_only = FALSE, calc_rsquared returns a numeric scalar R-squared. If return_ss_only = TRUE, calc_rsquared returns a vector; the first element is the error sum of squares (SSE) and the second element is the total sum of squares (SST). R-squared may then be calculated as 1 - SSE / SST.
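For a univariate outcome, the relationship 1 - SSE / SST can be checked by hand. The sketch below uses only base R and the standard univariate definitions of SSE and SST; it is not a description of the package internals, and the names sse, sst, and r2 are illustrative. The guarded block at the end shows how the same two quantities would be obtained from calc_rsquared.

```r
f <- stats::lm(mpg ~ cyl + disp + hp + wt, data = datasets::mtcars)
y <- f$model$mpg
yhat <- f$fitted.values

sse <- sum((y - yhat)^2)     # error sum of squares
sst <- sum((y - mean(y))^2)  # total sum of squares around the mean of y
r2 <- 1 - sse / sst

# Agrees with the R-squared reported by summary.lm
stopifnot(isTRUE(all.equal(r2, summary(f)$r.squared)))

# Only runs if mvrsquared is installed
if (requireNamespace("mvrsquared", quietly = TRUE)) {
  ss <- mvrsquared::calc_rsquared(y = y, yhat = yhat, return_ss_only = TRUE)
  print(1 - ss[1] / ss[2])  # first element is SSE, second is SST
}
```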
Note
On some Linux systems, setting threads greater than 1 for parallelism may introduce some imprecision in the calculation. As of this writing, the cause is still under investigation. In the meantime, setting threads = 1 should fix the issue.
Setting return_ss_only to TRUE is useful for parallel or distributed computing on large data sets, particularly when y is a large matrix. However, if you do parallel execution you MUST pre-calculate ybar and pass it to the function. If you do not, SST will be calculated from the mean of each batch independently, and the resulting R-squared will be incorrect.
See example below for parallel computation with future_map from the furrr package.
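Why ybar must be pre-calculated can be illustrated with a small batch computation. The sketch below is a sequential stand-in using base R lapply rather than furrr::future_map, and batch_ss is a hypothetical helper written for this illustration, not a function from mvrsquared. With the package, each batch would instead call calc_rsquared with ybar = ybar and return_ss_only = TRUE.

```r
f <- stats::lm(mpg ~ cyl + disp + hp + wt, data = datasets::mtcars)
y <- f$model$mpg
yhat <- f$fitted.values

# Hypothetical helper (not part of mvrsquared): c(SSE, SST) for one batch.
# If ybar is not supplied, it falls back to the mean of the batch itself.
batch_ss <- function(y, yhat, ybar = mean(y)) {
  c(sse = sum((y - yhat)^2), sst = sum((y - ybar)^2))
}

ybar <- mean(y)  # global mean, computed once over all of y
batches <- split(seq_along(y), rep(1:4, length.out = length(y)))

# Correct: every batch measures SST against the global mean
ss_good <- Reduce(`+`, lapply(batches, function(i) {
  batch_ss(y[i], yhat[i], ybar = ybar)
}))
r2_good <- 1 - ss_good[["sse"]] / ss_good[["sst"]]

# Incorrect: every batch measures SST against its own mean
ss_bad <- Reduce(`+`, lapply(batches, function(i) {
  batch_ss(y[i], yhat[i])
}))
r2_bad <- 1 - ss_bad[["sse"]] / ss_bad[["sst"]]

print(c(global_ybar = r2_good, per_batch_mean = r2_bad))
```

Summing SSE and SST across batches and dividing once at the end reproduces the full-data R-squared only in the first case. Per-batch means shrink SST, so the second value understates the fit.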
Examples
# standard r-squared with y and yhat as vectors
f <- stats::lm(mpg ~ cyl + disp + hp + wt, data = datasets::mtcars)
y <- f$model$mpg
yhat <- f$fitted.values
calc_rsquared(y = y, yhat = yhat)
# standard r-squared with y as a matrix and yhat containing 'x' and linear coefficients
s <- summary(f)
x <- cbind(1, as.matrix(f$model[, -1]))
w <- matrix(s$coefficients[, 1], ncol = 1)
calc_rsquared(y = matrix(y, ncol = 1), yhat = list(x, w))
# multivariate r-squared with y and yhat as matrices
calc_rsquared(y = cbind(y, y), yhat = cbind(yhat, yhat))
# multivariate r-squared with yhat as a linear reconstruction of two matrices
calc_rsquared(y = cbind(y, y), yhat = list(x, cbind(w, w)))