mml.sdf {EdSurvey}    R Documentation

EdSurvey Direct Estimation


Description

Prepare IRT parameters and score items and then estimate a linear model with direct estimation.


Usage

mml.sdf(
  formula,
  data,
  weightVar = NULL,
  omittedLevels = TRUE,
  composite = TRUE,
  dctPath = NULL,
  verbose = FALSE,
  multiCore = FALSE,
  numberOfCores = NULL,
  minNode = -4,
  maxNode = 4,
  Q = 34,
  scoreDict = defaultNAEPScoreCard(),
  idVar = NULL
)



Arguments

formula: a formula for the model.

data: an edsurvey.data.frame for the National Assessment of Educational Progress (NAEP) or the Trends in International Mathematics and Science Study (TIMSS).

weightVar: a character indicating the weight variable to use. The weightVar must be one of the weights for the edsurvey.data.frame. If NULL, the default weight for the edsurvey.data.frame is used.

omittedLevels: a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in an edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

composite: logical; for a NAEP composite, setting to FALSE fits the model to all items at once, in a single construct, whereas setting to TRUE fits the model as a NAEP composite (i.e., a weighted average of the subscales). This argument is not applicable for TIMSS.

dctPath: a connection that points to the location of a NAEP dct file. A dct file can be used to supply custom item response theory (IRT) parameters and subscale/subtest weights for NAEP assessments in place of those provided in the NAEPirtparams package. Defaults to NULL, in which case the IRT parameters and subscale weights from NAEPirtparams are used. IRT parameters for TIMSS cannot be supplied through a dctPath and are downloaded by using the downloadTIMSS function.

verbose: logical; indicates whether a detailed printout should be displayed during execution. Applies only to NAEP data.

multiCore: logical; when TRUE, allows the foreach package to be used. A parallel backend should already be registered with registerDoParallel.

numberOfCores: the number of cores to be used when multiCore is TRUE. Defaults to 75% of available cores. Users can check available cores with detectCores().

minNode: numeric; minimum integration point in direct estimation; see mml.

maxNode: numeric; maximum integration point in direct estimation; see mml.

Q: integer; number of integration points per student used when integrating over the levels of the latent outcome construct.

scoreDict: a data.frame that includes guidelines for scoring the provided NAEP data. Here, scoring refers to turning item responses into scores on each item. To see the default scoring guidelines, call the function defaultNAEPScoreCard(), or see the Examples section. See Details for more information on possible scores.

idVar: the name of the student identifier variable to use from data. Defaults to NULL, in which case sid is used as the student identifier.
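
The multiCore and numberOfCores arguments could be prepared along the following lines. This is a minimal sketch that runs with only the parallel package shipped with R. The variable names available and nCores are illustrative, the rounding used here only approximates the documented 75% default, and the doParallel and mml.sdf calls are left as comments because they assume that doParallel, EdSurvey, and the NAEPprimer data are installed.

```r
# Sketch: pick a core count close to the documented 75% default.
library(parallel)

available <- detectCores()              # can return NA on some platforms
if (is.na(available)) available <- 1L   # fall back to a single core
nCores <- max(1L, floor(0.75 * available))
nCores

# Assumes the doParallel package is installed and sdfNAEP has been read in:
# library(doParallel)
# registerDoParallel(cores = nCores)
# mmlNAEP <- mml.sdf(algebra ~ dsex + b013801, sdfNAEP, weightVar = "origwt",
#                    multiCore = TRUE, numberOfCores = nCores)
```

Registering the backend before the call matters because multiCore only allows foreach to be used; it does not set up the workers itself.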


Details

Typically, models are fit with NAEP data using plausible values to integrate out the uncertainty in the measurement of individual student outcomes. When direct estimation is used, the measurement error is integrated out explicitly using Q quadrature points. See the documentation for mml in the Dire package.
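
The idea of integrating out measurement error over Q quadrature points can be illustrated with a small base R sketch. It is not the Dire implementation: it assumes equally spaced nodes between minNode and maxNode, a standard normal latent distribution, and three 2PL items whose parameters (a, b) and scored responses (x) are made up for illustration.

```r
# Illustrative only: marginal likelihood of one student's scored responses,
# obtained by summing over a fixed grid of latent trait values.
minNode <- -4
maxNode <- 4
Q <- 34
nodes <- seq(minNode, maxNode, length.out = Q)   # assumed equally spaced grid

# Hypothetical 2PL item parameters (slope a, difficulty b) and scored responses
a <- c(1.0, 1.2, 0.8)
b <- c(-0.5, 0.0, 0.7)
x <- c(1, 0, 1)

# Likelihood of the observed response pattern at each node
patternLik <- sapply(nodes, function(theta) {
  p <- plogis(a * (theta - b))
  prod(p^x * (1 - p)^(1 - x))
})

# Standard normal weights on the grid, normalized to sum to 1
w <- dnorm(nodes)
w <- w / sum(w)

# The latent trait is integrated out as a weighted sum over the nodes
marginalLik <- sum(patternLik * w)
marginalLik
```

Increasing Q or widening the interval from minNode to maxNode refines the grid at the cost of more likelihood evaluations per student.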

The scoreDict converts response categories that are not simple item responses, such as Not Reached and Multiple, into values that can be used as inputs for the mml function in Dire. How mml treats these values depends on the test. For NAEP, for a dichotomous item, 8 is scored as the same proportion correct as the guessing parameter for that item, 0 is an incorrect response, an NA does not change the student's score, and 1 is correct. TIMSS does not require a scoreDict.
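
The scoring rules for a dichotomous NAEP item can be sketched in base R as follows. This is an assumption-laden illustration, not the code used by Dire: the scoreCard data.frame is copied from the example output in the Examples section, the helpers scoreFor and itemLik are hypothetical, the pointMult column is assumed to apply to multiple-choice items, and the item is a simplified 3PL item with made-up parameters. A score of 8 is treated here as fractional credit equal to the guessing parameter g, which is one reading of the rule stated above.

```r
# Score card values copied from the example output of defaultNAEPScoreCard()
scoreCard <- data.frame(
  resCat = c("Multiple", "Not Reached", "Missing", "Omitted",
             "Illegible", "Non-Rateable", "Off Task"),
  pointMult = c(8, NA, NA, 8, 0, 0, 0),
  pointConst = c(0, NA, NA, 0, 0, 0, 0)
)

# Hypothetical helper: look up the score for a response category
scoreFor <- function(category) {
  scoreCard$pointMult[match(category, scoreCard$resCat)]
}

# Hypothetical helper: likelihood contribution of one dichotomous item
itemLik <- function(score, theta, a, b, g) {
  if (is.na(score)) return(1)             # NA leaves the likelihood unchanged
  p <- g + (1 - g) * plogis(a * (theta - b))
  x <- if (score == 8) g else score       # 8: credit equal to guessing parameter
  p^x * (1 - p)^(1 - x)
}

itemLik(1, theta = 0, a = 1, b = 0, g = 0.25)                        # correct
itemLik(0, theta = 0, a = 1, b = 0, g = 0.25)                        # incorrect
itemLik(scoreFor("Omitted"), theta = 0, a = 1, b = 0, g = 0.25)      # scored 8
itemLik(scoreFor("Not Reached"), theta = 0, a = 1, b = 0, g = 0.25)  # scored NA
```

A modified copy of the score card could then be passed to mml.sdf through the scoreDict argument.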


Value

An edSurveyMML object, which is the outcome from mml.sdf, with the following elements:

mml: an object containing information from the mml procedure. ?mml can be used for further information.

scoreDict: the scoring used in the mml procedure.

itemMapping: the item mapping used in the mml procedure.



References

Cohen, J., & Jiang, T. (1999). Comparison of partially measured latent traits across nominal subgroups. Journal of the American Statistical Association, 94(448), 1035–1044.


Examples

## Not run: 
## Direct Estimation with NAEP 
# Load data 
sdfNAEP <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# Inspect scoring guidelines
defaultNAEPScoreCard()

# example output: 
#          resCat pointMult pointConst
# 1     Multiple         8          0
# 2  Not Reached        NA         NA
# 3      Missing        NA         NA
# 4      Omitted         8          0
# 5    Illegible         0          0
# 6 Non-Rateable         0          0
# 7     Off Task         0          0

# Run NAEP model, warnings are about item codings
mmlNAEP <- mml.sdf(algebra ~ dsex + b013801, sdfNAEP, weightVar='origwt')

# Call with Taylor
summary(mmlNAEP, varType="Taylor", strataVar="repgrp1", PSUVar="jkunit")

## Direct Estimation with TIMSS 
# Load data 
downloadTIMSS("~/", year=2015)
sdfTIMSS <- readTIMSS("~/TIMSS/2015", countries="usa", grade = "4")

# Run TIMSS model, warnings are about item codings 
mmlTIMSS <- mml.sdf(mmat ~ itsex + asbg04, sdfTIMSS, weightVar='totwgt')

# Call with Taylor
summary(mmlTIMSS, varType="Taylor", strataVar="jkzone", PSUVar="jkrep")

## End(Not run)

[Package EdSurvey version 2.7.1]