BranchGLM {BranchGLM} | R Documentation |
Fits generalized linear models via RcppArmadillo, with optional parallelization via OpenMP.
BranchGLM(
formula,
data,
family,
link,
offset = NULL,
method = "Fisher",
grads = 10,
parallel = FALSE,
nthreads = 8,
tol = 1e-06,
maxit = NULL,
init = NULL,
fit = TRUE,
contrasts = NULL,
keepData = TRUE,
keepY = TRUE
)
BranchGLM.fit(
x,
y,
family,
link,
offset = NULL,
method = "Fisher",
grads = 10,
parallel = FALSE,
nthreads = 8,
init = NULL,
maxit = NULL,
tol = 1e-06
)
formula |
a formula for the model. |
data |
a dataframe that contains the response and predictor variables. |
family |
distribution used to model the data, one of "gaussian", "gamma", "binomial", or "poisson". |
link |
link used to link mean structure to linear predictors. One of "identity", "logit", "probit", "cloglog", "sqrt", "inverse", or "log". |
offset |
offset vector; by default the zero vector is used. |
method |
one of "Fisher", "BFGS", or "LBFGS". BFGS and L-BFGS are quasi-Newton methods which are typically faster than Fisher's scoring when there are many covariates (at least 50). |
grads |
number of gradients used to approximate the inverse information, only for L-BFGS. |
parallel |
whether or not to make use of parallelization via OpenMP. |
nthreads |
number of threads used with OpenMP, only used if parallel = TRUE. |
tol |
tolerance used to determine model convergence. |
maxit |
maximum number of iterations performed. The default for Fisher's scoring is 50 and for the other methods the default is 200. |
init |
initial values for the betas; if not specified, they are selected automatically. |
fit |
a logical value indicating whether to fit the model. If FALSE, neither the coefficients matrix nor the variance-covariance matrix is returned. |
contrasts |
see |
keepData |
Whether or not to store a copy of data and design matrix, the default
is TRUE. If this is FALSE, then the results from this cannot be used inside of |
keepY |
Whether or not to store a copy of y, the default is TRUE. If
this is FALSE, then the binomial GLM helper functions may not work and this
cannot be used inside of |
x |
design matrix used for the fit, must be numeric. |
y |
outcome vector, must be numeric. |
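As a rough illustration of how these arguments combine, the sketch below fits a Poisson model with L-BFGS on simulated data. The data frame sim and its columns are hypothetical, and the call is guarded so it only runs when the BranchGLM package is installed.

```r
# Sketch only: simulated data, guarded so nothing runs without BranchGLM.
if (requireNamespace("BranchGLM", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical predictors and a Poisson response, for illustration only
  sim <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
  sim$y <- rpois(200, lambda = exp(0.5 + 0.3 * sim$x1 - 0.2 * sim$x2))

  fit <- BranchGLM::BranchGLM(
    y ~ x1 + x2,
    data = sim,
    family = "poisson",
    link = "log",
    method = "LBFGS",  # quasi-Newton method
    grads = 10,        # gradients used to approximate the inverse information
    tol = 1e-06,       # convergence tolerance
    maxit = 200        # documented default for the non-Fisher methods
  )
  fit$coefficients
}
```

With only two covariates, Fisher's scoring would usually be the more natural choice; L-BFGS is used here only to show where method and grads enter the call.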
Can use BFGS, L-BFGS, or Fisher's scoring to fit the GLM. BFGS and L-BFGS are
typically faster than Fisher's scoring when there are at least 50 covariates
and Fisher's scoring is typically best when there are fewer than 50 covariates.
This function does not currently support the use of weights. In the special
case of gaussian regression with identity link the method argument is ignored
and the normal equations are solved directly.
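To make the gaussian special case concrete, the following base R sketch solves the normal equations for the iris model and compares the result with lm(). It illustrates the idea only and is not the package's C++ code.

```r
# Normal equations for gaussian regression with identity link:
# solve (X'X) beta = X'y for beta.
x <- model.matrix(Sepal.Length ~ ., data = iris)
y <- iris$Sepal.Length
beta <- solve(crossprod(x), crossprod(x, y))

# Reference coefficients from lm(), which uses a QR decomposition instead
ref <- coef(lm(Sepal.Length ~ ., data = iris))

# Compare the two sets of estimates, ignoring names
all.equal(unname(drop(beta)), unname(ref))
```

Because the solution is available in closed form, no iterative method is needed, which is why the method argument has no effect in this case.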
The models are fit in C++ by using Rcpp and RcppArmadillo. To help
convergence, each of the methods uses a backtracking line search with
the strong Wolfe conditions to find an adequate step size. Two conditions
control convergence: whether there is a sufficient decrease in the negative
log-likelihood, and whether the norm of the score is sufficiently small. The
tol argument controls both of these criteria. If the algorithm fails to
converge, then iterations will be -1.
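A fit can be checked for convergence through the iterations component. The sketch below assumes BranchGLM is installed; the gamma model on iris is only a convenient example, and the messages are illustrative wording, not package output.

```r
# Sketch: inspect the iterations component after fitting.
if (requireNamespace("BranchGLM", quietly = TRUE)) {
  fit <- BranchGLM::BranchGLM(
    Sepal.Length ~ ., data = iris,
    family = "gamma", link = "log",
    method = "Fisher", tol = 1e-08, maxit = 50
  )
  if (fit$iterations == -1) {
    # -1 is the documented signal for a failed fit
    warning("the fit did not converge; consider raising maxit or loosening tol")
  } else {
    message("converged in ", fit$iterations, " iterations")
  }
}
```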
All observations with any missing values are removed before model fitting.
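The effect of this row-wise removal can be previewed in base R before fitting. This sketch uses complete.cases() on a copy of iris with a few values set to NA; it mimics listwise deletion and is not the package's internal code.

```r
# Copy of iris with missing values introduced for illustration
dat <- iris
dat$Sepal.Width[c(3, 10)] <- NA
dat$Species[10] <- NA  # row 10 now has two missing values

# A row is dropped if any of its values is missing
keep <- complete.cases(dat)
sum(!keep)          # number of rows with at least one NA: 2
nrow(dat[keep, ])   # rows remaining: 148
```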
The dispersion parameter for gamma regression is estimated via maximum likelihood,
very similar to the gamma.dispersion function from the MASS package.
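For comparison, the MASS approach mentioned above can be run on a base R glm fit. The sketch assumes MASS is installed and contrasts the maximum likelihood estimate with the Pearson-based estimate that summary.glm() reports by default; it does not call BranchGLM.

```r
# Sketch: maximum likelihood versus Pearson-based dispersion for a gamma GLM.
if (requireNamespace("MASS", quietly = TRUE)) {
  gfit <- glm(Sepal.Length ~ ., data = iris, family = Gamma(link = "log"))

  # Maximum likelihood estimate of the dispersion (reciprocal of the shape)
  disp_ml <- MASS::gamma.dispersion(gfit)

  # Pearson chi-squared based estimate used by summary.glm()
  disp_pearson <- summary(gfit)$dispersion

  c(ml = disp_ml, pearson = disp_pearson)
}
```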
BranchGLM.fit can be faster than calling BranchGLM if the x matrix and y vector
are already available, but it doesn't return as much information. The object
returned by BranchGLM.fit is not of class BranchGLM, so none of the methods for
BranchGLM objects, such as predict or VariableSelection, can be used.
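The difference between the two return values can be seen by checking classes and component names. This is a minimal sketch, assuming BranchGLM is installed; the object names full and raw are arbitrary.

```r
# Sketch: compare the objects returned by BranchGLM() and BranchGLM.fit().
if (requireNamespace("BranchGLM", quietly = TRUE)) {
  x <- model.matrix(Sepal.Length ~ ., data = iris)
  y <- iris$Sepal.Length

  full <- BranchGLM::BranchGLM(Sepal.Length ~ ., data = iris,
                               family = "gaussian", link = "identity")
  raw <- BranchGLM::BranchGLM.fit(x, y, family = "gaussian", link = "identity")

  print(inherits(full, "BranchGLM"))  # class carried by the full object
  print(inherits(raw, "BranchGLM"))   # plain list, so BranchGLM methods do not apply

  # Components present only in the full object
  print(setdiff(names(full), names(raw)))
}
```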
BranchGLM returns a BranchGLM object which is a list with the following components
coefficients |
a matrix with the coefficient estimates, SEs, Wald test statistics, and p-values |
iterations |
number of iterations it took the algorithm to converge; if the algorithm failed to converge, this is -1 |
dispersion |
the value of the dispersion parameter |
logLik |
the log-likelihood of the fitted model |
vcov |
the variance-covariance matrix of the fitted model |
resDev |
the residual deviance of the fitted model |
AIC |
the AIC of the fitted model |
preds |
predictions from the fitted model |
linpreds |
linear predictors from the fitted model |
tol |
tolerance used to fit the model |
maxit |
maximum number of iterations used to fit the model |
formula |
formula used to fit the model |
method |
iterative method used to fit the model |
grads |
number of gradients used to approximate inverse information for L-BFGS |
y |
y vector used in the model, not included if keepY = FALSE |
x |
design matrix used to fit the model, not included if keepData = FALSE |
offset |
offset vector in the model, not included if |
data |
original dataframe supplied to the function, not included if keepData = FALSE |
mf |
the model frame, not included if |
numobs |
number of observations in the design matrix |
names |
names of the variables |
yname |
name of y variable |
parallel |
whether parallelization was employed to speed up model fitting process |
missing |
number of missing values removed from the original dataset |
link |
link function used to model the data |
family |
family used to model the data |
ylevel |
the levels of y, only included for binomial glms |
xlev |
the levels of the factors in the dataset |
terms |
the terms object used |
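The components listed above are accessed with the usual list syntax. The sketch below assumes BranchGLM is installed and uses a logistic regression on mtcars purely as an example; bfit is an arbitrary name.

```r
# Sketch: pull individual components out of a fitted BranchGLM object.
if (requireNamespace("BranchGLM", quietly = TRUE)) {
  bfit <- BranchGLM::BranchGLM(am ~ wt + hp, data = mtcars,
                               family = "binomial", link = "logit")

  print(bfit$coefficients)  # estimates, SEs, Wald statistics, p-values
  print(bfit$logLik)        # log-likelihood of the fitted model
  print(bfit$AIC)           # AIC of the fitted model
  print(bfit$iterations)    # -1 would indicate a failure to converge
}
```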
BranchGLM.fit returns a list with the following components
coefficients |
a matrix with the coefficient estimates, SEs, Wald test statistics, and p-values |
iterations |
number of iterations it took the algorithm to converge; if the algorithm failed to converge, this is -1 |
dispersion |
the value of the dispersion parameter |
logLik |
the log-likelihood of the fitted model |
vcov |
the variance-covariance matrix of the fitted model |
resDev |
the residual deviance of the fitted model |
AIC |
the AIC of the fitted model |
preds |
predictions from the fitted model |
linpreds |
linear predictors from the fitted model |
tol |
tolerance used to fit the model |
maxit |
maximum number of iterations used to fit the model |
Data <- iris
### Using BranchGLM
BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
### Using BranchGLM.fit
x <- model.matrix(Sepal.Length ~ ., data = Data)
y <- Data$Sepal.Length
BranchGLM.fit(x, y, family = "gaussian", link = "identity")