DataGeneration {IRTest}

R Documentation

Generating an artificial item response dataset

Description

This function generates an artificial item response dataset under a range of user-specified options.

Usage

DataGeneration(
  seed = 1,
  N = 2000,
  nitem_D = 0,
  nitem_P = 0,
  nitem_C = 0,
  model_D = "2PL",
  model_P = "GPCM",
  latent_dist = "Normal",
  item_D = NULL,
  item_P = NULL,
  item_C = NULL,
  theta = NULL,
  prob = 0.5,
  d = 1.7,
  sd_ratio = 1,
  m = 0,
  s = 1,
  a_l = 0.8,
  a_u = 2.5,
  b_m = NULL,
  b_sd = NULL,
  c_l = 0,
  c_u = 0.2,
  categ = 5,
  possible_ans = seq(0.1, 0.9, length = 5)
)

Arguments

`seed`	A numeric value used as the seed for random sampling. Fixing the seed makes the result reproducible.
`N`	A numeric value of the number of examinees.
`nitem_D`	A numeric value of the number of dichotomous items.
`nitem_P`	A numeric value of the number of polytomous items.
`nitem_C`	A numeric value of the number of continuous response items.
`model_D`	A vector or a character string that represents the probability model for the dichotomous items.
`model_P`	A character string that represents the probability model for the polytomous items.
`latent_dist`	A character string that determines the type of latent distribution. Currently available options are `"beta"` (four-parameter beta distribution; `rBeta.4P`), `"chi"` (`\chi^2` distribution; `rchisq`), `"normal"`, `"Normal"`, or `"N"` (standard normal distribution; `rnorm`), and `"Mixture"` or `"2NM"` (two-component Gaussian mixture distribution; see Li (2021) for details).
`item_D`	An item parameter matrix for using fixed parameter values. The number of columns should be 3: `a` parameter for the first, `b` parameter for the second, and `c` parameter for the third column. Default is `NULL`.
`item_P`	An item parameter matrix for using fixed parameter values. The number of columns should be 7: `a` parameter for the first, and `b` parameters for the rest of the columns. Default is `NULL`.
`item_C`	An item parameter matrix for using fixed parameter values. The number of columns should be 3: `a` parameter for the first, `b` parameter for the second, and `nu` parameter for the third column. Default is `NULL`.
`theta`	An ability parameter vector for using fixed parameter values. Default is `NULL`.
`prob`	A numeric value used when `latent_dist = "2NM"`. It is the `\pi = \frac{n_1}{N}` parameter of the two-component Gaussian mixture distribution, where `n_1` is the estimated number of examinees belonging to the first Gaussian component and `N` is the total number of examinees (Li, 2021).
`d`	A numeric value used when `latent_dist = "2NM"`. It is the `\delta = \frac{\mu_2 - \mu_1}{\bar{\sigma}}` parameter of the two-component Gaussian mixture distribution, where `\mu_1` and `\mu_2` are the estimated means of the first and second Gaussian components, respectively, and `\bar{\sigma}` is the overall standard deviation of the latent distribution (Li, 2021). Without loss of generality, `\mu_2 \ge \mu_1` is assumed, so `\delta \ge 0`.
`sd_ratio`	A numeric value used when `latent_dist = "2NM"`. It is the `\zeta = \frac{\sigma_2}{\sigma_1}` parameter of the two-component Gaussian mixture distribution, where `\sigma_1` and `\sigma_2` are the estimated standard deviations of the first and second Gaussian components, respectively (Li, 2021).
`m`	A numeric value of the overall mean of the latent distribution. The default is 0.
`s`	A numeric value of the overall standard deviation of the latent distribution. The default is 1.
`a_l`	A numeric value. The lower bound of item discrimination parameters (a).
`a_u`	A numeric value. The upper bound of item discrimination parameters (a).
`b_m`	A numeric value. The mean of item difficulty parameters (b). If unspecified, the value of `m` is used.
`b_sd`	A numeric value. The standard deviation of item difficulty parameters (b). If unspecified, the value of `s` is used.
`c_l`	A numeric value. The lower bound of item guessing parameters (c).
`c_u`	A numeric value. The upper bound of item guessing parameters (c).
`categ`	A scalar or a numeric vector of length `nitem_P`. The default is 5. If `length(categ)>1`, the ith element equals the number of categories of the ith polytomous item.
`possible_ans`	Possible options for continuous items (e.g., 0.1, 0.3, 0.5, 0.7, 0.9).
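
The definitions of `prob`, `d`, `sd_ratio`, `m`, and `s` above determine the two Gaussian components: solving the overall-mean and overall-variance equations of a two-component mixture gives `\mu_1 = m - (1-\pi)\delta s`, `\mu_2 = m + \pi\delta s`, `\sigma_1 = s\sqrt{(1 - \pi(1-\pi)\delta^2) / (\pi + (1-\pi)\zeta^2)}`, and `\sigma_2 = \zeta\sigma_1`. The base-R sketch below illustrates that algebra. The helpers `mixture_components()` and `sample_2nm()` are hypothetical names introduced here for illustration; they are not part of IRTest and are not claimed to match how `DataGeneration()` samples internally.

```r
# Hypothetical helper (not part of IRTest): convert the overall mean/SD and
# the (prob, d, sd_ratio) parameters into component means and SDs.
mixture_components <- function(prob = 0.5, d = 1.7, sd_ratio = 1, m = 0, s = 1) {
  # The between-component variance share, prob * (1 - prob) * d^2, must be
  # below 1 so that some variance is left for the components themselves.
  stopifnot(prob > 0, prob < 1, d >= 0, sd_ratio > 0,
            prob * (1 - prob) * d^2 < 1)
  mu1 <- m - (1 - prob) * d * s
  mu2 <- m + prob * d * s
  sigma1 <- s * sqrt((1 - prob * (1 - prob) * d^2) /
                       (prob + (1 - prob) * sd_ratio^2))
  sigma2 <- sd_ratio * sigma1
  list(mu = c(mu1, mu2), sigma = c(sigma1, sigma2))
}

# Hypothetical sampler: draw N ability values from the implied mixture.
sample_2nm <- function(N, prob = 0.5, d = 1.7, sd_ratio = 1, m = 0, s = 1) {
  comp <- mixture_components(prob, d, sd_ratio, m, s)
  # Component membership: 1 with probability prob, otherwise 2.
  k <- ifelse(runif(N) < prob, 1L, 2L)
  rnorm(N, mean = comp$mu[k], sd = comp$sigma[k])
}

set.seed(1)
comp  <- mixture_components(prob = 0.3, d = 1.664, sd_ratio = 2)
theta <- sample_2nm(1e5, prob = 0.3, d = 1.664, sd_ratio = 2)
c(mean = mean(theta), sd = sd(theta))  # should be close to m = 0 and s = 1
```

The constraint `prob * (1 - prob) * d^2 < 1` follows from the variance equation: if the separation of the component means alone accounted for the whole overall variance, no positive `\sigma_1` would exist.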

Value

This function returns a list of several objects:

`theta`	A vector of ability parameters (`\theta`).
`item_D`	A matrix of dichotomous item parameters.
`initialitem_D`	A matrix that contains initial item parameter values for dichotomous items.
`data_D`	A matrix of dichotomous item responses where rows indicate examinees and columns indicate items.
`item_P`	A matrix of polytomous item parameters.
`initialitem_P`	A matrix that contains initial item parameter values for polytomous items.
`data_P`	A matrix of polytomous item responses where rows indicate examinees and columns indicate items.
`item_C`	A matrix of continuous response item parameters.
`initialitem_C`	A matrix that contains initial item parameter values for continuous response items.
`data_C`	A matrix of continuous response item responses where rows indicate examinees and columns indicate items.
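
As a minimal sketch of how the returned list might be inspected, the snippet below generates a small dichotomous dataset and looks at a few of the elements listed above. It assumes the IRTest package is installed and is skipped otherwise; the object name `sim` is arbitrary.

```r
# Runs only if IRTest is available (assumption: the package is installed).
if (requireNamespace("IRTest", quietly = TRUE)) {
  sim <- IRTest::DataGeneration(N = 500, nitem_D = 10)

  str(sim, max.level = 1)   # overview of the elements of the returned list
  print(dim(sim$data_D))    # examinees in rows, items in columns
  print(length(sim$theta))  # one ability value per examinee
  print(head(sim$item_D))   # generating parameters of the first items
}
```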

Author(s)

Seewoo Li cu@yonsei.ac.kr

References

Li, S. (2021). Using a two-component normal mixture distribution as a latent distribution in estimating parameters of item response models. Journal of Educational Evaluation, 34(4), 759-789.

Examples

# Dichotomous item responses

Alldata <- DataGeneration(N = 500,
                          nitem_D = 10)


# Polytomous item responses

Alldata <- DataGeneration(N = 1000,
                          nitem_P = 10)


# Mixed-format items

Alldata <- DataGeneration(N = 1000,
                          nitem_D = 20,
                          nitem_P = 10)

# Continuous items

Alldata <- DataGeneration(N = 1000,
                          nitem_C = 10)

# Dataset from non-normal latent density using two-component Gaussian mixture distribution

Alldata <- DataGeneration(N = 1000,
                          nitem_P = 10,
                          latent_dist = "2NM",
                          d = 1.664,
                          sd_ratio = 2,
                          prob = 0.3)

[Package IRTest version 2.0.0 Index]