mvord {mvord} | R Documentation |
Multivariate Ordinal Regression Models.
Description
Multivariate ordinal regression models in the R package mvord
can be fitted using the function
mvord()
. Two different data structures can be passed on to mvord()
through
the use of two different multiple measurement objects MMO
and MMO2
in the left-hand side of
the model formula. MMO
uses a long data format, which has the advantage that it allows for
varying covariates across multiple measurements. This flexibility requires the specification a
subject index as well as a multiple measurement index. In contrast to MMO
, the function MMO2
has a simplified data structure, but is only applicable in settings where the covariates do not
vary between the multiple measurements. In this case, the multiple ordinal observations as
well as the covariates are stored in different columns of a data.frame
. We refer to this data
structure as wide data format.
Usage
mvord(
formula,
data,
error.structure = cor_general(~1),
link = mvprobit(),
response.levels = NULL,
coef.constraints = NULL,
coef.values = NULL,
threshold.constraints = NULL,
threshold.values = NULL,
weights.name = NULL,
offset = NULL,
PL.lag = NULL,
contrasts = NULL,
control = mvord.control()
)
Arguments
formula |
an object of class |
data |
|
error.structure |
different |
link |
specifies the link function by |
response.levels |
(optional) |
coef.constraints |
(optional) |
coef.values |
(optional) |
threshold.constraints |
(optional) |
threshold.values |
(optional) |
weights.name |
(optional) character string with the column name of subject-specific weights in |
offset |
(optional) this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset. |
PL.lag |
(optional) specifies the time lag of the pairs in the pairwise likelihood approach to be optimized (can be used with |
contrasts |
(optional) an optional list. See the |
control |
(optional) a list of parameters for controlling the fitting process. See |
Details
- Implementation
MMO
: -
data
:-
In
MMO
we use a long format for the input of data, where each row contains a subject index (i
), a multiple measurement index (j
), an ordinal observation (Y) and all the covariates (X1 to Xp). This long format data structure is internally transformed to a matrix of responses which contains NA in the case of missing entries and a list of covariate matrices. This is performed by the multiple measurement objectMMO(Y, i, j)
specifying the column names of the subject index and the multiple measurement index in data. The column containing the ordinal observations can contain integer or character values or can be of class (ordered) 'factor'. When using the long data structure, this column is basically a concatenated vector of each of the multiple ordinal responses. Internally, this vector is then split according to the measurement index. Then the ordinal variable corresponding to each measurement index is transformed into an ordered factor. For an integer or a character vector the natural ordering is used (ascending, or alphabetical). If for character vectors the alphabetical order does not correspond to the ordering of the categories, the optional argument response.levels allows to specify the levels for each response explicitly. This is performed by a list of length q, where each element contains the names of the levels of the ordered categories in ascending (or if desired descending) order. If all the multiple measurements use the same number of classes and same labelling of the classes, the column Y can be stored as an ordered 'factor' (as it is often the case in longitudinal studies). The order of the multiple measurements is needed when specifying constraints on the thresh- old or regression parameters. This order is based on the type of the multiple measurement index column in data. For 'integer', 'character' or 'factor' the natural ordering is used (ascending, or alphabetical). If a different order of the multiple responses is desired, the multiple measurement index column should be an ordered factor with a corresponding ordering of the levels.If the categories differ across multiple measurements (either the number of categories or the category labels) one needs to specify the
response.levels
explicitly. This is performed by a list of lengthJ
(number of multiple measurements), where each element contains the names of the levels of the ordered categories in ascending or descending order.
response.levels = list(c("G","F","E", "D", "C", "B", "A"), c("G","F","E", "D", "C", "B", "A"), c("O","N","M","L", "K", "J", "I", "H"))
formula
-
The ordinal responses (e.g.,
rating
) are passed by aformula
object. Intercepts can be included or excluded in the model depending on the model paramterization:- Model without intercept:
If the intercept should be removed the
formula
for a given response (rating
) and covariates (X1
toXp
) has the following form:formula = MMO(rating, firm_id, rater_id) ~ 0 + X1 + ... + Xp
.- Model with intercept:
If one wants to include an intercept in the model, there are two equivalent possibilities to set the model
formula
. Either one includes the intercept explicitly by:formula = MMO(rating, firm_id, rater_id) ~ 1 + X1 + ... + Xp
,or by
formula = MMO(rating, firm_id, rater_id) ~ X1 + ... + Xp
.
- Implementation
MMO2
: -
data
:The data structure applied by
MMO2
is slightly simplified, where the multiple ordinal observations as well as the covariates are stored as columns in adata.frame
. Each subjecti
corresponds to one row of the data frame, where all outcomes (with missing observations set to NA) and all the covariates are stored in different columns. Ideally each outcome column is of type ordered factor. For column types like 'integer', 'character' or 'factor' a warning is given and the natural ordering is used (ascending, or alphabetical).formula
-
The ordinal responses (e.g.,
rating
) are passed by aformula
object. Intercepts can be included or excluded in the model depending on the model parameterization:formula = MMO2(rater1, rater2, rater3) ~ X1 + ... + Xp
.
error.structure
-
We allow for different error structures depending on the model parameterization:
-
Correlation:
-
cor_general
The most common parameterization is the general correlation matrix.error.structure = cor_general(~ 1)
This parameterization can be extended by allowing a factor dependent correlation structure, where the correlation of each subject
i
depends on a given subject-specific factorf
. This factorf
is not allowed to vary across multiple measurementsj
for the same subjecti
and due to numerical constraints only up to maximum 30 levels are allowed.error.structure = cor_general(~ f)
-
cor_equi
A covariate dependent equicorrelation structure, where the correlations are equal across allJ
dimensions and depend on subject-specific covariatesS1, ..., Sm
. It has to be noted that these covariatesS1, ..., Sm
are not allowed to vary across multiple measurementsj
for the same subjecti
.error.structure = cor_equi(~ S1 + ... + Sm)
-
cor_ar1
In order to account for some heterogeneity theAR(1)
error structure is allowed to depend on covariatesX1, ..., Xp
that are constant over time for each subjecti
.error.structure = cor_ar1(~ S1 + ... + Sm)
-
-
Covariance:
-
cov_general
In case of a full variance-covariance parameterization the standard parameterization with a full variance-covariance is obtained by:
error.structure = cov_general(~ 1)
This parameterization can be extended to the factor dependent covariance structure, where the covariance of each subject depends on a given factor
f
:error.structure = cov_general(~ f)
-
-
coef.constraints
-
The package supports constraints on the regression coefficients. Firstly, the user can specify whether the regression coefficients should be equal across some or all response dimensions. Secondly, the values of some of the regression coefficients can be fixed.
As there is no unanimous way to specify such constraints, we offer two options. The first option is similar to the specification of constraints on the thresholds. The constraints can be specified in this case as a vector or matrix of integers, where coefficients getting same integer value are set equal. Values of the regression coefficients can be fixed through a matrix. Alternatively constraints on the regression coefficients can be specified by using the design employed by the VGAM package. The constraints in this setting are set through a named list, where each element of the list contains a matrix full-column rank. If the values of some regression coefficients should be fixed, offsets can be used. This design has the advantage that it supports constraints on outcome-specific as well as category-specific regression coefficients. While the first option has the advantage of requiring a more concise input, it does not support category-specific coefficients. The second option offers a more flexible design in this respect. For further information on the second option we refer to the vignette and to the documentation of
vglm
.Using the first option, constraints can be specified by a vector or a matrix
coef.constraints
. First, a simple and less flexible way by specifying a vector
coef.constraints
of dimensionJ
. This vector is allocated in the following way: The first element of the vectorcoef.constraints
gets a value of 1. If the coefficients of the multiple measurementj = 2
should be equal to the coefficients of the first dimension (j=1
) again a value of 1 is set. If the coefficients should be different to the coefficients of the first dimension a value of 2 is set. In analogy, if the coefficients of dimensions two and three should be the same one sets both values to 2 and if they should be different, a value of 3 is set. Constraints on the regression coefficients of the remaining multiple measurements are set analogously.coef.constraints <- c(1,1,2,3)
This vector
coef.constraints
sets the coefficients of the first two raters equal\beta_{1\cdot} = \beta_{2\cdot}
A more flexible way to specify constraints on the regression coefficients is a matrix with
J
rows andp
columns, where each column specifies constraints on one of thep
coefficients in the same way as above. In addition, a value ofNA
excludes a corresponding coefficient (meaning it should be fixed to zero).coef.constraints <- cbind(c(1,2,3,4), c(1,1,1,2), c(NA,NA,NA,1), c(1,1,1,NA), c(1,2,3,4), c(1,2,3,4))
This matrix
coef.constraints
gives the following constraints:-
\beta_{12} = \beta_{22} = \beta_{32}
-
\beta_{13} = 0
-
\beta_{23} = 0
-
\beta_{33} = 0
-
\beta_{44} = 0
-
\beta_{14} = \beta_{24} = \beta_{34}
-
coef.values
-
In addition, specific values on regression coefficients can be set in the matrix
coef.values
. Parameters are removed if the value is set to zero (default forNA
's in
coef.constraints
) or to some fixed value. If constraints on parameters are set, these dimensions need to have the same value incoef.values
. Again each column corresponds to one regression coefficient.Together with the
coef.constraints
from above we impose:coef.constraints <- cbind(c(1,2,2), c(1,1,2), c(NA,1,2), c(NA,NA,NA), c(1,1,2))
coef.values <- cbind(c(NA,NA,NA), c(NA,NA,NA), c(0,NA,NA), c(1,1,1), c(NA,NA,NA))
Interaction terms: When constraints on the regression coefficient should be specified in models with interaction terms, the
coef.constraints
matrix has to be expanded manually. In case of interaction terms (specified either byX1 + X2 + X1:X2
or equivalently byX1*X2
), one additional column at the end ofcoef.constraints
for the interaction term has to be specified for numerical variables. For interaction terms including factor variables suitably more columns have to be added to thecoef.constraints
matrix. threshold.constraints
-
Similarly, constraints on the threshold parameters can be imposed by a vector of positive integers, where dimensions with equal threshold parameters get the same integer. When restricting the thresholds of two outcome dimensions to be the same, one has to be careful that the number of categories in the two outcome dimensions must be the same. In our example with
J=4
different outcomes we impose:threshold.constraints <- c(1,1,2)
gives the following restrictions:
-
\bm\theta_{1} = \bm\theta_{2}
-
\bm\theta_{3}
arbitrary.
-
threshold.values
-
In addition, threshold parameter values can be specified by
threshold.values
in accordance with identifiability constraints. For this purpose we use alist
withJ
elements, where each element specifies the constraints of the particular dimension by a vector of length of the number of threshold parameters (number of categories - 1). A number specifies a threshold parameter to a specific value andNA
leaves the parameter flexible. Fordata_mvord
we havethreshold.constraints <- NULL
threshold.values <- list(c(-4,NA,NA,NA,NA,4.5), c(-4,NA,NA,NA,NA,4.5), c(-5,NA,NA,NA,NA,NA,4.5))
Value
The function mvord
returns an object of class
"mvord"
.
The functions summary
and print
are used to display the results.
The function coef
extracts the regression coefficients, a function thresholds
the threshold coefficients
and the function
error_structure
returns the estimated parameters of the corresponding error structure.
An object of class
"mvord"
is a list containing the following components:
beta
-
a named
matrix
of regression coefficients theta
-
a named
list
of threshold parameters error.struct
-
an object of class
error_struct
containing the parameters of the error structure sebeta
-
a named
matrix
of the standard errors of the regression coefficients setheta
-
a named
list
of the standard errors of the threshold parameters seerror.struct
-
a
vector
of standard errors for the parameters of the error structure rho
-
a
list
of all objects that are used inmvord()
References
Hirk R, Hornik K, Vana L (2020). “mvord: An R Package for Fitting Multivariate Ordinal Regression Models.” Journal of Statistical Software, 93(4), 1–41, doi:10.18637/jss.v093.i04.
See Also
print.mvord
, summary.mvord
, coef.mvord
,
thresholds.mvord
, error_structure.mvord
,
mvord.control
, data_cr_panel
,data_cr
,
data_mvord_panel
,data_mvord
, data_mvord2
Examples
library(mvord)
#toy example
data(data_mvord_toy)
#wide data format with MMO2
res <- mvord(formula = MMO2(Y1, Y2) ~ 0 + X1 + X2,
data = data_mvord_toy)
print(res)
summary(res)
thresholds(res)
coefficients(res)
head(error_structure(res))
# convert data_mvord_toy into long format
df <- cbind.data.frame("i" = rep(1:100,2), "j" = rep(1:2,each = 100),
"Y" = c(data_mvord_toy$Y1,data_mvord_toy$Y2),
"X1" = rep(data_mvord_toy$X1,2),
"X2" = rep(data_mvord_toy$X2,2))
#for long format data, use MMO instead of MMO2
res <- mvord(formula = MMO(Y, i, j) ~ 0 + X1 + X2, #or formula = MMO(Y) ~ 0 + X1 + X2
data = df)
print(res)
summary(res)
thresholds(res)
coefficients(res)
head(error_structure(res))
res2 <- mvord(formula = MMO(Y) ~ 0 + X1 + X2,
data = df,
control = mvord.control(solver = "BFGS"),
threshold.constraints = c(1,1),
coef.constraints = c(1,1))
print(res2)
summary(res2)
thresholds(res2)
coefficients(res2)
head(error_structure(res2))
## examples
#load data
data(data_mvord)
head(data_mvord)
#-------------
# cor_general
#-------------
# approx 2 min
res_cor <- mvord(formula = MMO(rating) ~ 0 + X1 + X2 + X3 + X4 + X5,
data = data_mvord,
coef.constraints = cbind(c(1,2,2),
c(1,1,2),
c(NA,1,2),
c(NA,NA,NA),
c(1,1,2)),
coef.values = cbind(c(NA,NA,NA),
c(NA,NA,NA),
c(0,NA,NA),
c(1,1,1),
c(NA,NA,NA)),
threshold.constraints = c(1,1,2),
control = mvord.control(solver = "newuoa"))
print(res_cor)
summary(res_cor)
thresholds(res_cor)
coefficients(res_cor)
head(error_structure(res_cor))
#-------------
# cov_general
#-------------
#approx 4 min
res_cov <- mvord(formula = MMO(rating) ~ 1 + X1 + X2 + X3 + X4 + X5,
data = data_mvord,
error.structure = cov_general(~1),
threshold.values = list(c(-4,NA,NA,NA,NA,4.5),
c(-4,NA,NA,NA,NA,4),
c(-5,NA,NA,NA,NA,NA,4.5))
) #does not converge with BFGS
print(res_cov)
summary(res_cov)
thresholds(res_cov)
coefficients(res_cov)
head(error_structure(res_cov))
#-------------
# cor_ar1
#-------------
#approx 4min
data(data_mvord_panel)
head(data_mvord_panel)
#select subset of data
subset_dat <- data_mvord_panel$year %in% c("year3", "year4", "year5", "year6", "year7")
data_mvord_panel <- data_mvord_panel[subset_dat,]
mult.obs <- 5
res_AR1 <- mvord(formula = MMO(rating) ~ 0 + X1 + X2 + X3 + X4 + X5,
data = data_mvord_panel,
error.structure = cor_ar1(~1),
threshold.constraints = c(1,1,1,2,2),
coef.constraints = c(1,1,1,2,2),
control = mvord.control(solver = "BFGS"))
print(res_AR1)
summary(res_AR1)
thresholds(res_AR1)
coefficients(res_AR1)
head(error_structure(res_AR1))
head(error_structure(res_AR1, type = "corr"))
data(data_mvord2)
# approx 2 min
res_cor <- mvord(formula = MMO2(rater1, rater2, rater3) ~ 0 + X1 + X2 + X3 + X4 + X5,
data = data_mvord2,
coef.constraints = cbind(c(1,2,2),
c(1,1,2),
c(NA,1,2),
c(NA,NA,NA),
c(1,1,2)),
coef.values = cbind(c(NA,NA,NA),
c(NA,NA,NA),
c(0,NA,NA),
c(1,1,1),
c(NA,NA,NA)),
threshold.constraints = c(1,1,2),
control = mvord.control(solver = "newuoa"))
print(res_cor)
summary(res_cor)
thresholds(res_cor)
coefficients(res_cor)
head(error_structure(res_cor))