...
Arguments passed on to mirt::mirt
SE logical; estimate the standard errors by computing the parameter information matrix?
See SE.type for the type of estimates available
covdata a data.frame of data used for latent regression models
formula an R formula (or list of formulas) indicating how the latent traits
can be regressed using external covariates in covdata . If a named list
of formulas is supplied (where the names correspond to the latent trait names in model )
then specific regression effects can be estimated for each factor. Supplying a single formula
will estimate the regression parameters for all latent traits by default
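A minimal sketch of the single-formula case, assuming a hypothetical person-level covariate named group (Science is a dataset bundled with mirt):

```r
library(mirt)

# Hypothetical person-level covariate for latent regression
covdata <- data.frame(group = factor(rep(c('A', 'B'),
                                         length.out = nrow(Science))))

# A single formula: the regression effect is estimated for all latent traits
mod <- mirt(Science, 1, covdata = covdata, formula = ~ group)
```

A named list of formulas (e.g., list(F1 = ~ group)) would instead restrict the regression effect to the named trait.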
SE.type type of estimation method to use for calculating the parameter information matrix
for computing standard errors and Wald tests. Can be:
-
'Richardson' , 'forward' , or 'central' for the numerical Richardson,
forward difference, and central difference evaluation of observed Hessian matrix
-
'crossprod' and 'Louis' for standard error computations based on the variance of the
Fisher scores as well as Louis' (1982) exact computation of the observed information matrix.
Note that Louis' estimates can take a long time to obtain for large sample sizes and long tests
-
'sandwich' for the sandwich covariance estimate based on the
'crossprod' and 'Oakes' estimates (see Chalmers, 2018, for details)
-
'sandwich.Louis' for the sandwich covariance estimate based on the
'crossprod' and 'Louis' estimates
-
'Oakes' for Oakes' (1999) method using a central difference approximation
(see Chalmers, 2018, for details)
-
'SEM' for the supplemented EM (disables the accelerate option automatically; EM only)
-
'Fisher' for the expected information, 'complete' for information based
on the complete-data Hessian used in EM algorithm
-
'MHRM' and 'FMHRM' for stochastic approximations of observed information matrix
based on the Robbins-Monro filter or a fixed number of MHRM draws without the RM filter.
These are the only options supported when method = 'MHRM'
-
'numerical' to obtain the numerical estimate from a call to optim
when method = 'BL'
Note that the 'SEM' method becomes very sensitive if the ML solution has
not been reached with sufficient precision, and may be further sensitive
if the history of the EM cycles is not stable/sufficient for convergence of the respective estimates.
Increasing the number of iterations (increasing NCYCLES and decreasing
TOL , see below) will help to improve the accuracy, and can be
run in parallel if a mirtCluster object has been defined (this will be
used for Oakes' method as well). Additionally,
inspecting the symmetry of the ACOV matrix for convergence issues by passing
technical = list(symmetric = FALSE) can be helpful to determine if a sufficient
solution has been reached
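As a sketch, requesting standard errors via Oakes' method for a unidimensional model (Science is a dataset bundled with mirt):

```r
library(mirt)

# Fit a unidimensional graded model and compute SEs via Oakes' (1999) method
mod <- mirt(Science, 1, SE = TRUE, SE.type = 'Oakes')

# Print the estimates alongside their standard errors
coef(mod, printSE = TRUE)
```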
method a character object specifying the estimation algorithm to be used. The default is
'EM' , for the standard EM algorithm with fixed quadrature, 'QMCEM' for
quasi-Monte Carlo EM estimation, or 'MCEM' for Monte Carlo EM estimation.
The option 'MHRM' may also be passed to use the MH-RM algorithm,
'SEM' for the Stochastic EM algorithm (the first
two stages of the MH-RM algorithm using an optimizer other than a single Newton-Raphson iteration),
and 'BL' for the Bock and Lieberman
approach (generally not recommended for longer tests).
The 'EM' is generally effective with 1-3 factors, but methods such as the 'QMCEM' ,
'MCEM' , 'SEM' , or 'MHRM' should be used when the dimensions are 3 or more. Note that
when the optimizer is stochastic the associated SE.type is automatically changed to
SE.type = 'MHRM' by default to avoid the use of quadrature
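A sketch of switching to quasi-Monte Carlo EM, which avoids the exponential growth of the fixed quadrature grid in higher dimensions (a two-factor exploratory model on the bundled Science data, used here purely for illustration):

```r
library(mirt)

# Quasi-Monte Carlo EM integration instead of a fixed quadrature grid;
# most useful when the number of factors is 3 or more
mod <- mirt(Science, 2, method = 'QMCEM')
```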
optimizer a character indicating which numerical optimizer to use. By default, the EM
algorithm will use the 'BFGS' when there are no upper or lower box-constraints and
'nlminb' when there are.
Other options include the Newton-Raphson ('NR' ),
which can be more efficient than the 'BFGS' but not as stable for more complex
IRT models (such as the nominal or nested logit models)
and the related 'NR1' which is also the Newton-Raphson
but consists of only 1 update that has been coupled with RM Hessian (only
applicable when the MH-RM algorithm is used). The MH-RM algorithm uses the 'NR1' by default,
though currently the 'BFGS' , 'L-BFGS-B' , and 'NR'
are also supported with this method (with
fewer iterations by default) to emulate stochastic EM updates.
As well, the 'Nelder-Mead' and 'SANN'
estimators are available, but their routine use generally is not required or recommended.
Additionally, estimation subroutines from the Rsolnp and nloptr
packages are available by passing the arguments 'solnp' and 'nloptr' ,
respectively. This should be used in conjunction with the solnp_args and
nloptr_args specified below. If equality constraints were specified in the
model definition only the parameter with the lowest parnum
in the pars = 'values' data.frame is used in the estimation vector passed
to the objective function, and group hyper-parameters are omitted.
Equality and inequality functions should be of the form function(p, optim_args) ,
where optim_args is a list of internal parameters that can largely be ignored
when defining constraints (though use of browser() here may be helpful)
dentype type of density form to use for the latent trait parameters. Current options include
-
'Gaussian' (default) assumes a multivariate Gaussian distribution with an associated
mean vector and variance-covariance matrix
-
'empiricalhist' or 'EH' estimates latent distribution using an empirical histogram described by
Bock and Aitkin (1981). Only applicable for unidimensional models estimated with the EM algorithm.
For this option, the number of cycles, TOL, and quadpts are adjusted to accommodate the
lower precision during estimation (namely: TOL = 3e-5 , NCYCLES = 2000 , quadpts = 121 )
-
'empiricalhist_Woods' or 'EHW' estimates latent distribution using an empirical histogram described by
Bock and Aitkin (1981), with the same specifications as in dentype = 'empiricalhist' ,
but with the extrapolation-interpolation method described by Woods (2007). NOTE: to improve stability
in the presence of extreme response styles (i.e., all highest or lowest in each item) the technical option
zeroExtreme = TRUE may be required to down-weight the contribution of these problematic patterns
-
'Davidian-#' estimates semi-parametric Davidian curves described by Woods and Lin (2009),
where the # placeholder represents the number of Davidian parameters to estimate
(e.g., 'Davidian-6' will estimate 6 smoothing parameters). By default, the number of
quadpts is increased to 121, and this method is only applicable for
unidimensional models estimated with the EM algorithm
Note that when itemtype = 'ULL' then a log-normal(0,1) density is used to support the unipolar scaling
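Two of the density options sketched on the bundled Science data (both are unidimensional, EM-only settings):

```r
library(mirt)

# Empirical histogram estimate of the latent density (Bock and Aitkin, 1981)
mod_eh <- mirt(Science, 1, dentype = 'EH')

# Semi-parametric Davidian curve with 4 smoothing parameters
mod_dc <- mirt(Science, 1, dentype = 'Davidian-4')
```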
constrain a list of user declared equality constraints. To see how to define the
parameters correctly use pars = 'values' initially to see how the parameters are
labeled. To constrain parameters to be equal create a list with separate concatenated
vectors signifying which parameters to constrain. For example, to set parameters 1 and 5
equal, and also set parameters 2, 6, and 10 equal use
constrain = list(c(1,5), c(2,6,10)) . Constraints can also be specified using the
mirt.model syntax (recommended)
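A sketch of the workflow described above, using the bundled Science data (graded items carry four parameters each, so parnums 1 and 5 are the slopes of the first two items):

```r
library(mirt)

# Inspect the parameter table to find the parnum labels
vals <- mirt(Science, 1, pars = 'values')
vals[, c('item', 'name', 'parnum')]

# Constrain the slopes of items 1 and 2 to be equal (parnums 1 and 5)
mod <- mirt(Science, 1, constrain = list(c(1, 5)))
```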
calcNull logical; calculate the Null model for additional fit statistics (e.g., TLI)?
Only applicable if the data contain no NAs and are not overly sparse
draws the number of Monte Carlo draws to estimate the log-likelihood for the MH-RM
algorithm. Default is 5000
survey.weights an optional numeric vector of survey weights to apply for each case in the
data (EM estimation only). If not specified, all cases are weighted equally (the standard IRT
approach). The sum of the survey.weights must equal the total sample size for proper
weighting to be applied
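Because the weights must sum to the sample size, raw weights are typically rescaled first; a base-R sketch with hypothetical values:

```r
n <- 400                      # hypothetical sample size
w_raw <- runif(n, 0.5, 2)     # hypothetical raw survey weights

# Rescale so the weights sum to the total sample size, as required
w <- w_raw * n / sum(w_raw)
sum(w)                        # equals n

# mod <- mirt(dat, 1, survey.weights = w)   # dat is a hypothetical dataset
```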
quadpts number of quadrature points per dimension (must be larger than 2).
By default the number of quadrature points uses the following scheme:
switch(as.character(nfact), '1'=61, '2'=31, '3'=15, '4'=9, '5'=7, 3) .
However, if the method input is set to 'QMCEM' and this argument is left blank then
the default number of quasi-Monte Carlo integration nodes will be set to 5000 in total
TOL convergence threshold for EM or MH-RM; defaults are .0001 and .001. If
SE.type = 'SEM' and this value is not specified, the default is set to 1e-5 .
To evaluate the model using only the starting values pass TOL = NaN , and
to evaluate the starting values without the log-likelihood pass TOL = NA
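A sketch of the two special TOL values on the bundled Science data:

```r
library(mirt)

# Evaluate the model at the starting values only (no parameter updates)
mod_start <- mirt(Science, 1, TOL = NaN)

# Return the starting values themselves, skipping the log-likelihood
mod_vals <- mirt(Science, 1, TOL = NA)
```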
gpcm_mats a list of matrices specifying how the scoring coefficients in the (generalized)
partial credit model should be constructed. If omitted, the standard gpcm format will be used
(i.e., seq(0, k, by = 1) for each trait). This input should be used if traits
should be scored differently for each category (e.g., matrix(c(0:3, 1,0,0,0), 4, 2) for a
two-dimensional model where the first trait is scored like a gpcm, but the second trait is only
positively indicated when the first category is selected). Can be used when itemtype s
are 'gpcm' or 'Rasch' , but only when the respective element in
gpcm_mats is not NULL
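The scoring matrix from the example above can be built and inspected directly; a sketch for a hypothetical two-dimensional design with 4-category items:

```r
# First trait scored 0:3 as in a standard gpcm; second trait positively
# indicated only when the first category is selected
sc <- matrix(c(0:3, 1, 0, 0, 0), 4, 2)
sc
#      [,1] [,2]
# [1,]    0    1
# [2,]    1    0
# [3,]    2    0
# [4,]    3    0

# One list element per item; NULL entries keep the default seq(0, k) scoring
# (dat is a hypothetical 3-item dataset)
# mod <- mirt(dat, 2, itemtype = 'gpcm', gpcm_mats = list(sc, NULL, NULL))
```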
grsm.block an optional numeric vector indicating where the blocking should occur when
using the grsm, NA represents items that do not belong to the grsm block (other items that may
be estimated in the test data). For example, to specify two blocks of 3 with a 2PL item for
the last item: grsm.block = c(rep(1,3), rep(2,3), NA) . If NULL, all items are assumed
to be within the same group and therefore have the same number of item categories
rsm.block same as grsm.block , but for 'rsm' blocks
monopoly.k a vector of values (or a single value to be repeated for each item) which indicates
the degree of the monotone polynomial fitted, where the monotone polynomial
corresponds to monopoly.k * 2 + 1 (e.g., monopoly.k = 2 fits a
5th degree polynomial). Default is monopoly.k = 1 , which fits a 3rd degree polynomial
large a logical indicating whether unique response patterns should be obtained prior
to performing the estimation so as to avoid repeating computations on identical patterns.
The default TRUE provides the correct degrees of freedom for the model since all unique patterns
are tallied (typically only affects goodness of fit statistics such as G2, but also will influence
nested model comparison methods such as anova(mod1, mod2) ), while FALSE will use the
number of rows in data as a placeholder for the total degrees of freedom. As such, model
objects should only be compared if all flags were set to TRUE or all were set to FALSE.
Alternatively, if the collapsed table of frequencies is desired for the purpose of saving computations
(i.e., only computing the collapsed frequencies for the data one time) then a character vector can
be passed with the argument large = 'return' to return a list of all the desired
table information used by mirt . This list object can then be reused by passing it back
into the large argument to avoid re-tallying the data again
(again, useful when the dataset is very large and computing the tabulated data is
computationally burdensome). This strategy is shown below:
- Compute organized data
e.g., internaldat <- mirt(Science, 1, large = 'return')
- Pass the organized data to all estimation functions
e.g.,
mod <- mirt(Science, 1, large = internaldat)
GenRandomPars logical; generate random starting values prior to optimization instead of
using the fixed internal starting values?
accelerate a character vector indicating the type of acceleration to use. Default
is 'Ramsay' , but may also be 'squarem' for the SQUAREM procedure (specifically,
the gSqS3 approach) described in Varadhan and Roland (2008).
To disable the acceleration, pass 'none'
verbose logical; print observed- (EM) or complete-data (MHRM) log-likelihood
after each iteration cycle? Default is TRUE
solnp_args a list of arguments to be passed to the solnp::solnp() function for
equality constraints, inequality constraints, etc
nloptr_args a list of arguments to be passed to the nloptr::nloptr()
function for equality constraints, inequality constraints, etc
spline_args a named list of lists containing information to be passed to the bs (default)
and ns for each spline itemtype. Each element must refer to the name of the itemtype with the
spline, while the internal list names refer to the arguments which are passed. For example, if item 2 were called
'read2', and item 5 were called 'read5', both of which were of itemtype 'spline' but item 5 should use the
ns form, then a modified list for each input might be of the form:
spline_args = list(read2 = list(degree = 4),
read5 = list(fun = 'ns', knots = c(-2, 2)))
This code input changes the bs() splines function to have a degree = 4 input,
while the second element changes to the ns() function with knots set at c(-2, 2)
control a list passed to the respective optimizers (i.e., optim() , nlminb() ,
etc). Additional arguments have been included for the 'NR' optimizer: 'tol'
for the convergence tolerance in the M-step (default is TOL/1000 ), while the default
number of iterations for the Newton-Raphson optimizer is 50 (modified with the 'maxit'
control input)
technical a list containing lower level technical parameters for estimation. May be:
- NCYCLES
maximum number of EM or MH-RM cycles; defaults are 500 and 2000
- MAXQUAD
maximum number of quadratures, which you can increase if you have more than
4GB of RAM on your PC; default 20000
- theta_lim
range of integration grid for each dimension; default is c(-6, 6) . Note that
when itemtype = 'ULL' a log-normal distribution is used and the range is changed to
c(.01, 6^2) , where the second term is the square of the theta_lim input
- set.seed
seed number used during estimation. Default is 12345
- SEtol
standard error tolerance criteria for the S-EM and MHRM computation of the
information matrix. Default is 1e-3
- symmetric
logical; force S-EM/Oakes information matrix estimates to be symmetric? Default is TRUE
so that computation of standard errors is more stable. Setting this to FALSE can help
to detect solutions that have not reached the ML estimate
- SEM_window
ratio of values used to define the S-EM window based on the
observed likelihood differences across EM iterations. The default is
c(0, 1 - SEtol) , which provides nearly the full S-EM window (i.e.,
nearly all EM cycles used). To use a smaller S-EM window, change it
to something like c(.9, .999) to start at a point farther into the EM history
- warn
logical; include warning messages during estimation? Default is TRUE
- message
logical; include general messages during estimation? Default is TRUE
- customK
a numeric vector used to explicitly declare the number of response
categories for each item. This should only be used when constructing a mirt model for
reasons other than parameter estimation (such as to obtain factor scores), and requires
that the input data all have 0 as the lowest category. The format is the same as the
extract.mirt(mod, 'K') slot in all converged models
- customPriorFun
a custom function used to determine the normalized density for
integration in the EM algorithm. Must be of the form function(Theta, Etable){...} ,
and return a numeric vector with the same length as number of rows in Theta . The
Etable input contains the aggregated table generated from the current E-step
computations. For proper integration, the returned vector should sum to
1 (i.e., normalized). Note that if using the Etable it will be NULL
on the first call, therefore the prior will have to deal with this issue accordingly
- zeroExtreme
logical; assign extreme response patterns a survey.weight of 0
(formally equivalent to removing these data vectors during estimation)?
When dentype = 'EHW' , where Woods' extrapolation is utilized,
this option may be required if the extrapolation causes expected densities to tend towards
positive or negative infinity. The default is FALSE
- customTheta
a custom Theta grid, in matrix form, used for integration.
If not defined, the grid is determined internally based on the number of quadpts
- nconstrain
same specification as the constrain list argument,
however imposes a negative equality constraint instead (e.g., a12 = -a21 , which
is specified as nconstrain = list(c(12, 21)) ). Note that each specification
in the list must be of length 2, where the second element is taken to be -1 times the
first element
- delta
the deviation term used in numerical estimates when computing the ACOV matrix
with the 'forward' or 'central' numerical approaches, as well as Oakes' method with the
Richardson extrapolation. Default is 1e-5
- parallel
logical; use the parallel cluster defined by mirtCluster ?
Default is TRUE
- storeEMhistory
logical; store the iteration history when using the EM algorithm?
Default is FALSE. When TRUE, use extract.mirt to extract
- internal_constraints
logical; include the internal constraints when using certain
IRT models (e.g., 'grsm' itemtype). Disable this if you want to use special optimizers
such as the solnp. Default is TRUE
- gain
a vector of two values specifying the numerator and exponent
values for the RM gain function (val1 / cycle)^val2 .
Default is c(0.10, 0.75)
- BURNIN
number of burn in cycles (stage 1) in MH-RM; default is 150
- SEMCYCLES
number of SEM cycles (stage 2) in MH-RM; default is 100
- MHDRAWS
number of Metropolis-Hasting draws to use in the MH-RM at each iteration; default is 5
- MHcand
a vector of values used to tune the MH sampler. Larger values will
cause the acceptance ratio to decrease. One value is required for each group in
unconditional item factor analysis (mixedmirt() requires additional values
for each random effect). If NULL, these values are determined internally, attempting to
tune the acceptance of the draws to be between .1 and .4
- MHRM_SE_draws
number of fixed draws to use when SE=TRUE and SE.type = 'FMHRM'
and the maximum number of draws when SE.type = 'MHRM' . Default is 2000
- MCEM_draws
a function used to determine the number of quadrature points to draw for the
'MCEM' method. Must include one argument which indicates the iteration number of the
EM cycle. Default is function(cycles) 500 + (cycles - 1)*2 , which starts the number of
draws at 500 and increases by 2 after each full EM iteration
- info_if_converged
logical; compute the information matrix when using the MH-RM algorithm
only if the model converged within a suitable number of iterations? Default is TRUE
- logLik_if_converged
logical; compute the observed log-likelihood when using the MH-RM algorithm
only if the model converged within a suitable number of iterations? Default is TRUE
- keep_vcov_PD
logical; attempt to keep the variance-covariance matrix of the latent traits
positive definite during estimation in the EM algorithm? This generally improves the convergence
properties when the traits are highly correlated. Default is TRUE