rrvglm {VGAM} | R Documentation |
Fitting Reduced-Rank Vector Generalized Linear Models (RR-VGLMs) and Doubly Constrained RR-VGLMs (DRR-VGLMs)
Description
A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs but some of the constraint matrices are estimated. Doubly constrained RR-VGLMs (DRR-VGLMs) can also be fitted.
Usage
rrvglm(formula, family = stop("'family' is unassigned"),
data = list(), weights = NULL, subset = NULL,
na.action = na.fail, etastart = NULL, mustart = NULL,
coefstart = NULL, control = rrvglm.control(...),
offset = NULL, method = "rrvglm.fit", model = FALSE,
x.arg = TRUE, y.arg = TRUE, contrasts = NULL,
constraints = NULL, extra = NULL, qr.arg = FALSE,
smart = TRUE, ...)
Arguments
formula , family , weights |
See |
data |
an optional data frame containing the
variables in the model.
By default the variables are taken from
|
subset , na.action |
See |
etastart , mustart , coefstart |
See |
control |
a list of parameters for controlling
the fitting process.
See |
offset , model , contrasts |
See |
method |
the method to be used in fitting the model.
The default (and presently only)
method |
x.arg , y.arg |
logical values indicating whether the model matrix
and response vector/matrix used in the fitting
process should be assigned in the |
constraints |
See |
extra , smart , qr.arg |
See |
... |
further arguments passed into |
Details
In this documentation, M
is the
number of linear predictors.
For RR-VGLMs,
the central formula is given by
\eta = B_1^T x_1 + A \nu
where x_1
is a vector
(usually just a 1 for an intercept),
x_2
is another vector of explanatory variables, and
\nu = C^T x_2
is an R
-vector of
latent variables.
Here, \eta
is a vector of linear predictors,
e.g., the m
th element is
\eta_m = \log(E[Y_m])
for the
m
th Poisson response.
The dimension of \eta
is M
by
definition.
The matrices B_1
, A
and
C
are estimated from the data, i.e., contain the
regression coefficients. For ecologists, the central
formula represents a constrained linear ordination
(CLO) since it is linear in the latent variables. It
means that the response is a monotonically increasing or
decreasing function of the latent variables.
For identifiability
it is common to enforce corner constraints on A
:
by default, the top R
by R
submatrix is fixed to
be the order-R
identity matrix and the remainder of A
is estimated.
The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).
In theory, any VGAM family function that works for
vglm
and vgam
should work
for rrvglm
too.
The function that actually does the work is rrvglm.fit
;
it is vglm.fit
with some extra code.
Value
An object of class "rrvglm"
, which has the the same
slots as a "vglm"
object. The only difference is
that the some of the constraint matrices are estimates
rather than known. But VGAM stores the models the
same internally. The slots of "vglm"
objects are
described in vglm-class
.
Note
The arguments of rrvglm
are in general the same
as those of vglm
but with some extras in
rrvglm.control
.
The smart prediction (smartpred
) library
is packed with the VGAM library.
In an example below, a rank-1 stereotype
(reduced-rank multinomial logit)
model of Anderson (1984) is fitted to some car data.
The reduced-rank regression is performed, adjusting for
two covariates. Setting a trivial constraint matrix
(diag(M)
)
for the latent variable variables in x_2
avoids
a warning message when it is overwritten by a (common)
estimated constraint matrix. It shows that German cars
tend to be more expensive than American cars, given a
car of fixed weight and width.
If fit <- rrvglm(..., data = mydata)
then
summary(fit)
requires corner constraints and no
missing values in mydata
. Often the estimated
variance-covariance matrix of the parameters is not
positive-definite; if this occurs, try refitting the
model with a different value for Index.corner
.
For constrained quadratic ordination (CQO) see
cqo
for more details about QRR-VGLMs.
With multiple binary responses, one must use
binomialff(multiple.responses = TRUE)
to indicate
that the response is a matrix with one response per column.
Otherwise, it is interpreted as a single binary response
variable.
To fit DRR-VGLMs see the arguments H.A
and
H.C
in rrvglm.control
.
DRR-VGLMs provide structure to the A and
C matrices via constraint matrices.
So instead of them being general unstructured
matrices, one can make many of their elements to
have the same value, else be identically equal
to 0, for example. This gives greater control
over what is modelled as a latent variable,
e.g., if one subset of the covariates are physical
variables and the remainder are psychological
variables then a rank-2 model might have each
latent variable a linear combination of each
of the types of variables separately.
Incidentally,
before I forget, if Corner = TRUE
then
the @H.A
slot indicates that
the Index.corner
rows of A
are estimated. This is a remnant of some
internal computations because it is
more efficient to estimate the entire
A, bar rows str0
, and
then normalize it.
In contrast, optimizing over a subset of
A is slow.
Author(s)
Thomas W. Yee
References
Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.
Anderson, J. A. (1984). Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1–30.
Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.
Yee, T. W., Frigau, L. and Ma, C. (2024). Heaping and seeping, GAITD regression and doubly constrained reduced rank vector generalized linear models, in smoking studies. In preparation.
See Also
rrvglm.control
,
summary.drrvglm
,
lvplot.rrvglm
(same as biplot.rrvglm
),
rrvglm-class
,
grc
,
cqo
,
vglmff-class
,
vglm
,
vglm-class
,
smartpred
,
rrvglm.fit
.
Special family functions include
negbinomial
zipoisson
and zinegbinomial
.
(see Yee (2014)
and what was formerly in COZIGAM).
Methods functions include
Coef.rrvglm
,
calibrate.rrvglm
,
etc.
Data include
crashi
.
Examples
## Not run:
# Example 1: RR NB with Var(Y) = mu + delta1 * mu^delta2
nn <- 1000 # Number of observations
delta1 <- 3.0 # Specify this
delta2 <- 1.5 # Specify this; should be greater than 1
a21 <- 2 - delta2
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata,
y2 = rnbinom(nn, mu = mu, size = (1/delta1)*mu^a21))
plot(y2 ~ x2, mydata, pch = "+", col = 'blue', las = 1,
main = paste0("Var(Y) = mu + ", delta1, " * mu^", delta2))
rrnb2 <- rrvglm(y2 ~ x2 + x3, negbinomial(zero = NULL),
data = mydata, trace = TRUE)
a21.hat <- (Coef(rrnb2)@A)["loglink(size)", 1]
beta11.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(mu)"]
beta21.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(size)"]
(delta1.hat <- exp(a21.hat * beta11.hat - beta21.hat))
(delta2.hat <- 2 - a21.hat)
# delta1.hat:
# exp(a21.hat * predict(rrnb2)[1,1] - predict(rrnb2)[1,2])
summary(rrnb2)
# Obtain a 95 percent CI for delta2:
se.a21.hat <- sqrt(vcov(rrnb2)["I(latvar.mat)", "I(latvar.mat)"])
ci.a21 <- a21.hat + c(-1, 1) * 1.96 * se.a21.hat
(ci.delta2 <- 2 - rev(ci.a21)) # The 95 percent CI
Confint.rrnb(rrnb2) # Quick way to get it
# Plot the abundances and fitted values vs the latent variable
plot(y2 ~ latvar(rrnb2), data = mydata, col = "blue",
xlab = "Latent variable", las = 1)
ooo <- order(latvar(rrnb2))
lines(fitted(rrnb2)[ooo] ~ latvar(rrnb2)[ooo], col = "red")
# Example 2: stereotype model (RR multinomial logit model)
data(car.all)
scar <- subset(car.all,
is.element(Country, c("Germany", "USA", "Japan", "Korea")))
fcols <- c(13,14,18:20,22:26,29:31,33,34,36) # These are factors
scar[, -fcols] <- scale(scar[, -fcols]) # Stdze all numerical vars
ones <- matrix(1, 3, 1)
clist <- list("(Intercept)" = diag(3), Width = ones, Weight = ones,
Disp. = diag(3), Tank = diag(3), Price = diag(3),
Frt.Leg.Room = diag(3))
set.seed(111)
fit <- rrvglm(Country ~ Width + Weight + Disp. + Tank +
Price + Frt.Leg.Room,
multinomial, data = scar, Rank = 2, trace = TRUE,
constraints = clist, noRRR = ~ 1 + Width + Weight,
# Uncor = TRUE, Corner = FALSE, # orig.
Index.corner = c(1, 3), # Less correlation
Bestof = 3)
fit@misc$deviance # A history of the fits
Coef(fit)
biplot(fit, chull = TRUE, scores = TRUE, clty = 2, Ccex = 2,
ccol = "blue", scol = "orange", Ccol = "darkgreen",
Clwd = 2, main = "1=Germany, 2=Japan, 3=Korea, 4=USA")
## End(Not run)