| est_bilog {irt} | R Documentation |
Item Calibration via BILOG-MG
Description
The function est_bilog facilitates item calibration through BILOG-MG.
It offers two modes of operation: executing BILOG-MG in batch mode or
processing pre-generated BILOG-MG output files. When using the former, ensure
BILOG-MG is installed in the directory specified by bilog_exe_folder.
In the latter case, if the necessary BILOG-MG files (e.g.,
"<analysis_name>.PAR", "<analysis_name>.PH1", etc.) exist and overwrite
= FALSE, there is no need for the BILOG-MG program itself. This function is
capable of parsing BILOG-MG output without it.
Both BILOG-MG 3.0 and BILOG-MG 4.0 are supported. Refer to the
bilog_exe_folder argument for guidance on selecting the desired
version.
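Which of the two modes applies can be checked before calling the function. The sketch below is not part of est_bilog: `bilog_output_exists` is a hypothetical helper, and it assumes that the ".PAR" and ".PH1" files named above are enough to indicate a completed run (the full set of files the function reads may be larger).

```r
# Hypothetical helper (not part of the irt package): report whether
# pre-generated BILOG-MG output appears to exist for an analysis.
# Assumption: only the ".PAR" and ".PH1" files mentioned above are checked.
bilog_output_exists <- function(target_dir, analysis_name,
                                extensions = c("PAR", "PH1")) {
  files <- file.path(target_dir, paste0(analysis_name, ".", extensions))
  all(file.exists(files))
}

# Demonstration in a fresh temporary directory with empty placeholder files.
demo_dir <- tempfile("bilog_demo")
dir.create(demo_dir)
before <- bilog_output_exists(demo_dir, "my_analysis")
file.create(file.path(demo_dir, c("my_analysis.PAR", "my_analysis.PH1")))
after <- bilog_output_exists(demo_dir, "my_analysis")
```

If such a check returns TRUE and overwrite = FALSE, the existing output can be parsed without a BILOG-MG installation, as described above.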
Usage
est_bilog(
x = NULL,
model = "3PL",
target_dir = getwd(),
analysis_name = "bilog_calibration",
items = NULL,
examinee_id_var = NULL,
group_var = NULL,
logistic = TRUE,
num_of_alternatives = NULL,
criterion = 0.01,
num_of_quadrature = 81,
max_em_cycles = 100,
newton = 20,
reference_group = NULL,
fix = NULL,
scoring_options = c("METHOD=1", "NOPRINT"),
calib_options = c("NORMAL"),
prior_ability = NULL,
prior_ip = NULL,
overwrite = FALSE,
show_output_on_console = TRUE,
bilog_exe_folder = file.path("C:/Program Files/BILOGMG")
)
Arguments
x |
Either a |
model |
Specifies the item model. Options include:
The default is |
target_dir |
The directory where BILOG-MG analysis and data files will
be stored. The default is the current working directory (i.e.,
|
analysis_name |
A concise filename (without extension) used for the data files created for the analysis. |
items |
A vector of column names or numbers in |
examinee_id_var |
The column name or number containing individual
subject IDs. If not provided (i.e., |
group_var |
The column name or number containing group membership
information for multi-group calibration. Ideally, the grouping variable
should be represented by single-digit integers. If other data types are
provided, integer values will be automatically assigned to the variables.
The default is |
logistic |
A logical value indicating whether to use logistic calibration.
The default value is |
num_of_alternatives |
An integer specifying the maximum number of response alternatives in the raw data. This value is used as an automatic starting value for estimating pseudo-guessing parameters. The default value is |
criterion |
The convergence criterion for EM and Newton iterations. The default value is 0.01. |
num_of_quadrature |
The number of quadrature points used in MML
estimation. The default value is 81. This value will be represented in the
BILOG-MG control file as: |
max_em_cycles |
An integer (0, 1, ...) representing the maximum number
of EM cycles. This value will be represented in the BILOG-MG control file
as: |
newton |
An integer (0, 1, ...) representing the number of Gauss-Newton
iterations following EM cycles. This value will be represented in the
BILOG-MG control file as: |
reference_group |
A value indicating which group's ability distribution
will be set to mean = 0 and standard deviation = 1. For example, if the
When groups are assumed to come from a single population, set this value to 0. The default value is 'NULL'. This value will be represented in the BILOG-MG control file as: 'REFERENCE = reference_group'. |
fix |
Specifies whether the parameters of specific items are free to be
estimated or should be held fixed at their starting values. This argument
accepts a |
scoring_options |
A string vector of keywords/options to be included in
the The default value is The primary option to add to this vector is
Additionally, you can include the following keywords:
Refer to the BILOG-MG manual for detailed explanations of these keywords/options. |
calib_options |
A string vector of additional keywords/options for the
The default value is Including Including If you're calibrating items using the Additional keywords/options that can be added to - Refer to the BILOG-MG manual for detailed explanations of these keywords/options. NOTE: Do not add the following keywords to |
prior_ability |
Prior ability refers to the quadrature points and weights representing the discrete finite distribution of ability for the groups. It should be structured as a list in the following format:
Here, <GROUP-NAME-1> refers to the name of the first group, <GROUP-NAME-2> refers to the name of the second group, and so on. Please refer to the examples section for a practical implementation. |
prior_ip |
Specify prior distributions for item parameters. The default
value is
Quoted descriptions were taken from the BILOG-MG manual. Examples:
In general, one can adjust the alpha and beta parameters to achieve a desired outcome, considering that the mode of the beta distribution is calculated as:
Additionally, setting Note: A non-null |
overwrite |
If set to |
show_output_on_console |
A logical value indicating whether to capture
and display the output of the command on the R console. The default is
|
bilog_exe_folder |
The directory containing the BILOG-MG executable
files. This function supports two versions: BILOG-MG 3 and BILOG-MG 4. For
BILOG-MG version 3, the directory should include the files
|
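As an illustration of the prior_ability argument, a set of quadrature points and weights defines a discrete ability distribution whose mean and standard deviation can be computed directly. The sketch below uses base R only. The element names `points` and `weights` follow the Examples section, and rescaling the weights to sum to 1 is an assumption made here for the arithmetic, not a documented requirement of est_bilog.

```r
# Quadrature points and standard normal density weights.
quad_points <- seq(-4, 4, by = 0.1)
quad_weights <- dnorm(quad_points)

# Assumption: rescale the weights to sum to 1 so that they form a proper
# discrete distribution.
quad_weights <- quad_weights / sum(quad_weights)

# Mean and standard deviation implied by the discrete prior.
prior_mean <- sum(quad_points * quad_weights)
prior_sd <- sqrt(sum(quad_weights * (quad_points - prior_mean)^2))

# One group's entry in the list format used in the Examples section.
lower_prior <- list(points = quad_points, weights = quad_weights)
```

Computing these two summaries for each group is a quick way to confirm that a user-supplied prior describes the intended ability distribution.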
Value
A list with the following elements is returned:
- "ip"
An Itempool-class object holding the item parameters. Check ...$converged to ensure the model has converged before using ip. This element is not created when model = "CTT".
- "score"
A data frame object containing information on examinee scores, such as items attempted (tried), items answered correctly (right), estimated examinee scores (ability), standard errors of ability estimates (se), and response string probabilities (prob). This element is not created when model = "CTT".
- "ctt"
Classical Test Theory (CTT) statistics, including p-values, biserial, and point-biserial estimates calculated by BILOG-MG. If there are groups, group-specific CTT statistics can be found in ctt$group$GROUP-NAME. Overall statistics for the entire group are located at ctt$overall.
- "failed_items"
A data frame containing items that could not be estimated.
- "syntax"
The syntax file.
- "em_cycles"
E-M Cycles of the calibration.
- "newton_cycles"
Newton Cycles of the calibration.
- "cycle"
The number of cycles run before calibration converges or fails to converge.
- "largest_change"
The largest change observed between the last two cycles.
- "neg_2_log_likelihood"
-2 Log Likelihood value of the last step of the E-M cycles. See also $em_cycles. This value is NULL when the model does not converge. This element is not created when model = "CTT".
- "posterior_dist"
Posterior quadrature points and weights.
- "input"
A list object that stores the arguments passed to the function.
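The advice to check convergence before using the item parameters can be wrapped in a small guard. This is a sketch only: `ip_if_converged` is a hypothetical helper, and the mock lists below merely stand in for est_bilog output, assuming it carries `converged` and `ip` elements as used in the Examples section.

```r
# Hypothetical helper: return the item pool only if calibration converged.
# 'calib' is assumed to be a list with 'converged' and 'ip' elements.
ip_if_converged <- function(calib) {
  if (!isTRUE(calib$converged)) {
    stop("Calibration did not converge; item parameters may be unreliable.")
  }
  calib$ip
}

# Mock objects standing in for est_bilog() output (illustration only).
ok_calib <- list(converged = TRUE, ip = data.frame(b = c(-0.5, 0.3)))
bad_calib <- list(converged = FALSE, ip = NULL)

ip_ok <- ip_if_converged(ok_calib)
bad_result <- tryCatch(ip_if_converged(bad_calib),
                       error = function(e) "not converged")
```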
Author(s)
Emre Gonulates
Examples
## Not run:
#############################################
############## Example 1 - 2PL ##############
#############################################
# IRT Two-parameter Logistic Model Calibration
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(4000)
true_ip <- generate_ip(n = 30, model = "2PL")
resp <- sim_resp(true_ip, true_theta)
# The following line will run BILOG-MG, estimate 2PL model and put the
# analysis results under the target directory:
bilog_calib <- est_bilog(x = resp, model = "2PL",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
# Check whether the calibration converged
bilog_calib$converged
# Get the estimated item pool
bilog_calib$ip
# See the BILOG-MG syntax
cat(bilog_calib$syntax)
# See the classical test theory statistics estimated by BILOG-MG:
bilog_calib$ctt
# Get -2LogLikelihood for the model (mainly for model comparison purposes):
bilog_calib$neg_2_log_likelihood
# Get estimated scores
head(bilog_calib$score)
# Compare true and estimated abilities
plot(true_theta, bilog_calib$score$ability, xlab = "True Theta",
ylab = "Estimated theta")
abline(a = 0, b = 1, col = "red", lty = 2)
# Compare true and estimated item parameters
plot(true_ip$a, bilog_calib$ip$a, xlab = "True 'a'", ylab = "Estimated 'a'")
abline(a = 0, b = 1, col = "red", lty = 2)
plot(true_ip$b, bilog_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
abline(a = 0, b = 1, col = "red", lty = 2)
# Note that BILOG-MG centers the ability at mean 0.
mean(bilog_calib$score$ability)
# Quadrature points and posterior weights:
head(bilog_calib$posterior_dist)
#############################################
############## Example 2 - EAP ##############
#############################################
# Getting Expected-a-posteriori theta scores
result <- est_bilog(x = resp, model = "2PL",
scoring_options = c("METHOD=2", "NOPRINT"),
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
head(result$score)
###############################################
############## Example 3 - Rasch ##############
###############################################
# Rasch Model Calibration
true_theta <- rnorm(400)
true_ip <- generate_ip(n = 30, model = "Rasch")
resp <- sim_resp(true_ip, true_theta)
# Run calibration
bilog_calib <- est_bilog(x = resp, model = "Rasch",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
bilog_calib$ip
plot(true_ip$b, bilog_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
abline(a = 0, b = 1, col = "red", lty = 2)
# Note that the 'b' parameters are rescaled so that their arithmetic mean
# equals 0.0.
mean(bilog_calib$ip$b)
#############################################
############## Example 4 - 3PL ##############
#############################################
# IRT Three-parameter Logistic Model Calibration
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(4000)
true_ip <- generate_ip(n = 30, model = "3PL")
resp <- sim_resp(true_ip, true_theta)
# The following line will run BILOG-MG, estimate 3PL model and put the
# analysis results under the target directory:
bilog_calib <- est_bilog(x = resp, model = "3PL",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
# Estimated item pool:
bilog_calib$ip
# Convergence status:
bilog_calib$converged
# Number of EM cycles:
bilog_calib$cycle
# Note that the maximum number of EM cycles was set at:
bilog_calib$input$max_em_cycles
# Largest change at the last cycle (note that convergence criterion is 0.01)
bilog_calib$largest_change
# Estimated Scores:
bilog_calib$score
# CTT stats calculated by BILOG-MG:
bilog_calib$ctt
#############################################
############## Example 5 - 1PL ##############
#############################################
# One-Parameter Logistic Model Calibration
true_theta <- rnorm(800)
true_ip <- generate_ip(n = 30, model = "2PL")
# Set 'a' parameters to a fixed number
true_ip$a <- 1.5
resp <- sim_resp(true_ip, true_theta)
# Run calibration
bilog_calib <- est_bilog(x = resp, model = "1PL",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
# Note that all 'a' parameter values and all 'se_a' values are the same:
bilog_calib$ip
plot(true_ip$b, bilog_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
abline(a = 0, b = 1, col = "red", lty = 2)
#############################################################
############## Example 6.1 - Multi-group - 3PL ##############
#############################################################
# Multi-group IRT calibration - 3PL
## Generate Data ##
ip <- generate_ip(n = 35, model = "3PL", D = 1.7)
n_upper <- sample(1200:3000, 1)
n_lower <- sample(1900:2800, 1)
theta_upper <- rnorm(n_upper, 1.5, .25)
theta_lower <- rnorm(n_lower)
resp <- sim_resp(ip = ip, theta = c(theta_lower, theta_upper))
# Create response data where the first column holds group information
dt <- data.frame(level = c(rep("Lower", n_lower), rep("Upper", n_upper)),
resp)
## Run Calibration ##
mg_calib <- est_bilog(x = dt, model = "3PL",
group_var = "level",
reference_group = "Lower",
items = 2:ncol(dt), # Exclude the 'group' column
num_of_alternatives = 5,
# Use MAP ability estimation.
# "FIT": calculate GOF for response patterns
scoring_options = c("METHOD=3", "NOPRINT", "FIT"),
target_dir = "C:/Temp/Analysis", overwrite = TRUE,
show_output_on_console = FALSE)
# Estimated item pool
mg_calib$ip
# Print group means
mg_calib$group_info
# Check Convergence
mg_calib$converged
# Print estimated scores of the first six examinees
head(mg_calib$score)
# Posterior distributions of 'Lower' (in red) and 'Upper' group
plot(mg_calib$posterior_dist$Upper$point,
mg_calib$posterior_dist$Upper$weight)
points(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Lower$weight, col = "red")
#############################################################
############## Example 6.2 - Multi-group - Response_set #####
#############################################################
# Multi-group IRT calibration - Response_set 2PL
## Generate Data ##
ip <- generate_ip(n = 35, model = "2PL", D = 1.7)
n_upper <- sample(1000:2000, 1)
n_lower <- sample(1000:2000, 1)
resp_set <- generate_resp_set(
ip = ip, theta = c(rnorm(n_lower), rnorm(n_upper, 1.5, .25)))
# Attach the group information
resp_set$mygroup <- c(rep("Lower", n_lower), rep("Upper", n_upper))
## Run Calibration ##
mg_calib <- est_bilog(x = resp_set,
model = "2PL",
group_var = "mygroup",
reference_group = "Lower",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE,
show_output_on_console = FALSE)
# Estimated item pool
mg_calib$ip
# Print group means
mg_calib$group_info
###############################################################
############## Example 6.3 - Multi-group - 1PL ################
###############################################################
# Multi-group IRT calibration - 1PL
## Generate Data ##
n_item <- sample(30:40, 1)
ip <- generate_ip(n = n_item, model = "2PL", D = 1.7)
ip$a <- 1.25
n_upper <- sample(700:1000, 1)
n_lower <- sample(1200:1800, 1)
theta_upper <- rnorm(n_upper, 1.5, .25)
theta_lower <- rnorm(n_lower)
resp <- sim_resp(ip = ip, theta = c(theta_lower, theta_upper))
# Create response data where the first column holds group information
dt <- data.frame(level = c(rep("Lower", n_lower), rep("Upper", n_upper)),
resp)
## Run Calibration ##
mg_calib <- est_bilog(x = dt,
model = "1PL",
group_var = "level",
reference_group = "Lower",
items = 2:ncol(dt), # Exclude the 'group' column
target_dir = "C:/Temp/Analysis",
overwrite = TRUE,
show_output_on_console = FALSE)
# Estimated item pool
mg_calib$ip
# Print group means
mg_calib$group_info
# Check Convergence
mg_calib$converged
# Print estimated scores of the first six examinees
head(mg_calib$score)
###############################################################
############## Example 6.4 - Multi-group - Prior Ability ######
###############################################################
# Multi-group IRT calibration - 3PL with user supplied prior ability
# parameters
n_item <- sample(40:70, 1)
ip <- generate_ip(n = n_item, model = "3PL", D = 1.7)
n_upper <- sample(2000:4000, 1)
n_lower <- sample(3000:5000, 1)
theta_upper <- rgamma(n_upper, shape = 2, rate = 2)
# hist(theta_upper)
theta_lower <- rnorm(n_lower)
true_theta <- c(theta_lower, theta_upper)
resp <- sim_resp(ip = ip, theta = true_theta, prop_missing = .2)
# Create response data where the first column holds group information
dt <- data.frame(level = c(rep("Lower", n_lower), rep("Upper", n_upper)),
resp)
# Set prior ability parameters
points <- seq(-4, 4, .1)
prior_ability <- list(
Lower = list(points = points, weights = dnorm(points)),
# Also try misspecified prior:
# Upper = list(points = points, weights = dnorm(points, 1, .25))
Upper = list(points = points, weights = dgamma(points, 2, 2))
)
mg_calib <- est_bilog(x = dt,
model = "3PL",
group_var = "level",
reference_group = "Lower",
items = 2:ncol(dt), # Exclude the 'group' column
calib_options = c("IDIST = 2"),
prior_ability = prior_ability,
# Use MAP ability estimation.
scoring_options = c("METHOD=3"),
target_dir = "C:/Temp/Analysis",
overwrite = TRUE,
show_output_on_console = FALSE)
# Check whether the model has converged
mg_calib$converged
# Group information
mg_calib$group_info
# Quadrature points and posterior weights:
head(mg_calib$posterior_dist$Lower)
plot(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Lower$weight,
xlab = "Quadrature Points",
ylab = "Weights",
xlim = c(min(c(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Upper$point)),
max(c(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Upper$point))),
ylim = c(min(c(mg_calib$posterior_dist$Lower$weight,
mg_calib$posterior_dist$Upper$weight)),
max(c(mg_calib$posterior_dist$Lower$weight,
mg_calib$posterior_dist$Upper$weight))))
points(mg_calib$posterior_dist$Upper$point,
mg_calib$posterior_dist$Upper$weight, col = "red")
# Comparison of true and estimated item parameters
plot(ip$a, mg_calib$ip$a, xlab = "True 'a'", ylab = "Estimated 'a'")
plot(ip$b, mg_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
plot(ip$c, mg_calib$ip$c, xlab = "True 'c'", ylab = "Estimated 'c'")
# Ability parameters
plot(true_theta, mg_calib$score$ability,
xlab = "True Theta",
ylab = "Estimated Theta")
abline(a = 0, b = 1, col = "red")
####################################################################
############## Example 7 - Read BILOG-MG Output without BILOG-MG ###
####################################################################
# To read BILOG-MG output files saved in the "Analysis/" directory with file
# names like "my_analysis.PH1", "my_analysis.PH2", etc., and without
# performing the calibration (no need for an installed BILOG-MG program on
# your computer), use the following syntax:
result <- est_bilog(target_dir = file.path("Analysis/"), model = "3PL",
analysis_name = "my_analysis", overwrite = FALSE)
####################################################################
############## Example 8 - Fixed Item Parameters ###################
####################################################################
# Fixed item calibration involves setting specific item parameters to
# predefined values while allowing other items' parameters to be freely
# estimated.
# If you want to fix all values of a particular item parameter(s), you can
# use strong priors. Refer to the documentation for the "prior_ip" argument
# for more details.
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(3000)
true_ip <- generate_ip(n = 30, model = "3PL")
resp <- sim_resp(true_ip, true_theta)
# Setup the data frame that will hold 'item_id's to be fixed, and the
# item parameters to be fixed.
fix_pars <- data.frame(item_id = c("Item_5", "Item_4", "Item_10"),
a = c(1, 1.5, 1.75),
b = c(-1, 0.25, 0.75),
c = c(.15, .25, .35))
fixed_calib <- est_bilog(x = resp, fix = fix_pars,
target_dir = "C:/Temp/Analysis", overwrite = TRUE)
# Check item parameters for Item_4, Item_5, Item_10:
fixed_calib$ip
######### #########
# If only some of the parameters are supplied, the defaults will be used
# for the missing parameters. For example, for the example below, the
# default 'a' parameter value is 1, and the default 'c' parameter value is
# (1/num_of_alternatives) = (1/5) = 0.2.
fix_pars2 <- data.frame(item_id = c("Item_1", "Item_2", "Item_3"),
b = c(-1, 0.25, 0.75))
fixed_calib2 <- est_bilog(x = resp, fix = fix_pars2,
target_dir = "C:/Temp/Analysis", overwrite = TRUE)
# Check item parameters for Item_1, Item_2, Item_3:
fixed_calib2$ip
##################################################################
############## Example 9 - 3PL with Common Guessing ##############
##################################################################
# IRT Three-parameter Logistic Model Calibration with Common Guessing
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(4000)
true_ip <- generate_ip(n = 30, model = "3PL")
resp <- sim_resp(true_ip, true_theta)
# Run calibration:
bilog_calib <- est_bilog(x = resp, model = "3PL",
target_dir = "C:/Temp/Analysis",
calib_options = c("NORMAL", "COMMON"),
overwrite = TRUE)
# Note the 'c' parameters
bilog_calib$ip
##################################################################
############## Example 10 - 3PL with Fixed Guessing ##############
##################################################################
# IRT Three-parameter Logistic Model Calibration with Fixed Guessing
# The aim is to fix guessing parameters of all items to a fixed
# number like 0.25
true_theta <- rnorm(3000)
true_ip <- generate_ip(n = 30, model = "3PL")
true_ip$c <- 0.25
resp <- sim_resp(true_ip, true_theta)
prc1 <- est_bilog(x = resp, model = "3PL", target_dir = "C:/Temp/Analysis",
prior_ip = list(ALPHA = 10000000, BETA = 30000000),
overwrite = TRUE)
## End(Not run) # end dontrun
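The prior used in Example 10 can be checked with the standard formula for the mode of a Beta(alpha, beta) distribution, (alpha - 1) / (alpha + beta - 2) for alpha, beta > 1. The sketch below uses base R only. `beta_mode` and `beta_var` are hypothetical helpers, not part of the irt package, and the sketch assumes that the ALPHA and BETA values map directly onto the two shape parameters of the distribution.

```r
# Mode of a Beta(alpha, beta) distribution (defined for alpha, beta > 1).
beta_mode <- function(alpha, beta) {
  stopifnot(alpha > 1, beta > 1)
  (alpha - 1) / (alpha + beta - 2)
}

# Variance of a Beta(alpha, beta) distribution.
beta_var <- function(alpha, beta) {
  alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
}

# Values from Example 10: the mode is very close to 0.25.
strong_mode <- beta_mode(10000000, 30000000)
strong_var <- beta_var(10000000, 30000000)

# A much weaker prior with the same mode: (5 - 1) / (5 + 13 - 2) = 0.25.
weak_mode <- beta_mode(5, 13)
weak_var <- beta_var(5, 13)
```

Both priors share the same mode, but the variance of the first is far smaller. That small variance is what pulls every guessing parameter toward 0.25 in Example 10.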