mle_hetop {HETOP}    R Documentation

Maximum Likelihood Estimation of Heteroskedastic Ordered Probit (HETOP) Model

Description

Computes MLEs of G group means and standard deviations using count data from K ordinal categories under a heteroskedastic ordered probit model. Estimation is conducted conditional on two fixed cutpoints, and additional constraints on group parameters are imposed if needed to achieve identification in the presence of sparse counts.

Usage

mle_hetop(ngk, fixedcuts, svals=NULL, iterlim = 1500, ...)

Arguments

ngk

Numeric matrix of dimension G x K in which column k of row g indicates the number of units from group g falling into category k.
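
As an illustration only, a count matrix of this shape can be tabulated from unit-level records with base R. The vectors group and category below are hypothetical data and are not part of the package; the sketch shows only how the G x K layout arises.

```r
## hypothetical unit-level data: one group label and one ordinal category per unit
group    <- factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c"))
category <- factor(c(1, 2, 2, 1, 3, 3, 3, 2, 1), levels = 1:3)

## cross-tabulate: rows are groups (G = 3), columns are categories (K = 3)
tab <- table(group, category)

## convert the table to a plain numeric matrix, keeping the row/column labels
ngk <- matrix(as.numeric(tab), nrow = nrow(tab), dimnames = dimnames(tab))
print(ngk)
```

Declaring the category levels explicitly keeps a column for every category, including any category that happens to be empty in the data.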

fixedcuts

A vector of length 2 providing the first two cutpoints, to identify the location and scale of the group parameters. Note that this suffices for any K >= 3.

svals

Optional vector of starting values. Its length is 2G + (K-3) when no groups have sparse counts that affect identifiability; otherwise it must be shorter. See Details.

iterlim

Maximum number of iterations used in optimization (passed to nlm).

...

Any other arguments for nlm.

Details

This function requires K >= 3. If ngk has all nonzero counts, all model parameters are identified. When one or more groups have nonzero counts in fewer than three categories, arbitrary identification rules are required to ensure that the MLE exists. This function adopts the following rules. For any group with nonzero counts in fewer than three categories, the log of the group standard deviation is constrained to equal the mean of the log standard deviations for the remaining groups.

Further constraints are imposed to handle groups for which all data fall into either the lowest or highest category. Let S be the set of groups for which it is not the case that all data fall into an extreme category. Then for any group with all data in the lowest category, the mean for that group is constrained to be the minimum of the group means over S. Similarly, for any group with all data in the highest category, the mean for that group is constrained to be the maximum of the group means over S.
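
The rules above can be sketched in base R as a classification of the rows of a count matrix. This is not the package's implementation: the matrix and all object names are hypothetical, and the labels est, mean, min and max are borrowed from the description of pstatus under Value. The final count of free parameters rests on the assumption, not confirmed here, that each constrained parameter simply drops out of the parameter vector.

```r
## hypothetical G x K count matrix (G = 5, K = 4)
ngk <- rbind(c( 5,  8,  0,  0),   # nonzero counts in two categories
             c( 0, 10, 10,  0),   # nonzero counts in two categories
             c(12,  0,  0,  0),   # all data in the lowest category
             c( 0,  0,  0, 10),   # all data in the highest category
             c( 4,  6,  3,  5))   # no sparsity
K <- ncol(ngk)

## number of categories with nonzero counts, per group
n_nonzero <- rowSums(ngk > 0)

## groups whose data all fall into an extreme category
all_low  <- n_nonzero == 1 & ngk[, 1] > 0
all_high <- n_nonzero == 1 & ngk[, K] > 0

## status labels following the rules stated in Details
status <- data.frame(
  mug    = ifelse(all_low, "min", ifelse(all_high, "max", "est")),
  sigmag = ifelse(n_nonzero < 3, "mean", "est"),
  stringsAsFactors = FALSE
)
print(status)

## ASSUMPTION: constrained parameters are removed from the free parameter vector
n_free <- sum(status$mug == "est") + sum(status$sigmag == "est") + (K - 3)
print(n_free)
```

With no sparse groups the same count reduces to 2G + (K-3), which matches the length stated for svals.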

The location and scale of the group means are identified for the purpose of conducting the estimation by fixing two of the cutpoints. However, in practice it may be desirable to express the group means and standard deviations on a scale that is more easily interpreted; see Reardon et al. (2017) for details. This function reports estimates on four different scales: (1) the original estimation scale with two fixed cutpoints; (2) a scale defined by forcing the group means and log group standard deviations each to have a weighted mean of zero, where weights are proportional to the total count for each group; (3) a scale where the population mean of the latent variable is zero and the population standard deviation is one; and (4) a scale similar to (3) but where a bias correction is applied. See Reardon et al. (2017) for details on this bias correction.
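
Scales (2) and (3) can each be reached from scale (1) by a linear change of location and scale applied to the latent variable. The following is a minimal sketch of that idea under stated assumptions, not the package's code: the parameter values, the counts and the helper rescale are hypothetical, and the bias correction of scale (4) is not attempted.

```r
## hypothetical estimates on the fixed-cutpoint scale (1)
mug    <- c(-1.0, 0.2, 1.5)
sigmag <- c( 1.2, 0.9, 1.1)
cuts   <- c(-1.0, 0.0, 0.8)

## weights proportional to the total count for each group
ng <- c(100, 300, 200)
p  <- ng / sum(ng)

## hypothetical helper: shift by a and divide by b; cutpoints move with the means
rescale <- function(a, b) {
  list(mug = (mug - a) / b, sigmag = sigmag / b, cutpoints = (cuts - a) / b)
}

## both scales center on the weighted mean of the group means
a <- sum(p * mug)

## scale (2): weighted means of mug and log(sigmag) are both zero
zero <- rescale(a, exp(sum(p * log(sigmag))))

## scale (3): population mean zero and population standard deviation one
star <- rescale(a, sqrt(sum(p * ((mug - a)^2 + sigmag^2))))

print(c(sum(p * zero$mug), sum(p * log(zero$sigmag))))
print(c(sum(p * star$mug), sum(p * (star$mug^2 + star$sigmag^2))))
```

The two printed checks mirror those applied to est_zero and est_star in the Examples section.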

The function also returns an estimated intracluster correlation (ICC) of the latent variable, defined as the ratio of the between-group variance of the latent variable to its marginal variance. Scales (1)-(3) above lead to the same estimated ICC; scale (4) uses a bias-corrected estimate of the ICC which will not in general equal the estimate from scales (1)-(3).
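
The definition of the ICC, and the reason it does not change under a linear rescaling, can be illustrated with a short sketch. The function icc and its inputs are hypothetical, and weighting groups by their population shares p is an assumption of this sketch rather than a statement about the package internals.

```r
## hypothetical helper: between-group variance over marginal variance
icc <- function(mug, sigmag, p) {
  between <- sum(p * (mug - sum(p * mug))^2)   # variance of the group means
  within  <- sum(p * sigmag^2)                 # average within-group variance
  between / (between + within)
}

## hypothetical group parameters and population shares
mug    <- c(-1.0, 0.2, 1.5)
sigmag <- c( 1.2, 0.9, 1.1)
p      <- c(1, 3, 2) / 6

## a shift by a and division by b scales both variances by 1 / b^2,
## so the ratio is unchanged
icc_orig     <- icc(mug, sigmag, p)
icc_rescaled <- icc((mug - 3) / 2, sigmag / 2, p)
print(c(icc_orig, icc_rescaled))
```

Because the bias correction of scale (4) is not a pure linear rescaling of these estimates, this invariance argument does not extend to it.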

Value

A list with the following components:

est_fc

A list of estimated group means, group standard deviations, cutpoints and ICC on scale (1).

est_zero

A list of estimated group means, group standard deviations, cutpoints and ICC on scale (2).

est_star

A list of estimated group means, group standard deviations, cutpoints and ICC on scale (3).

est_starbc

A list of estimated group means, group standard deviations, cutpoints and ICC on scale (4).

nlmdetails

The object returned by nlm that summarizes details of the optimization.

pstatus

A data frame with one row for each group, summarizing the estimation status of the mean and standard deviation for each group. A value of est means that the parameter was estimated without constraints. A value of mean, used for the group standard deviations, indicates that the log standard deviation was constrained to equal the mean of the log standard deviations of the remaining groups. Values of min or max, used for the group means, indicate that the mean was constrained to equal the minimum or maximum, respectively, of the group means over the set S defined in Details.

Author(s)

J.R. Lockwood jrlockwood@ets.org

References

Reardon S., Shear B.R., Castellano K.E. and Ho A.D. (2017). “Using heteroskedastic ordered probit models to recover moments of continuous test score distributions from coarsened data,” Journal of Educational and Behavioral Statistics 42(1):3–45.

Lockwood J.R., Castellano K.E. and Shear B.R. (2018). “Flexible Bayesian models for inferences from coarsened, group-level achievement data,” Journal of Educational and Behavioral Statistics 43(6):663–692.

Examples

library(HETOP)
set.seed(1001)

## define true parameters
G         <- 10
mug       <- seq(from = -2.0, to = 2.0, length.out = G)
sigmag    <- seq(from =  2.0, to = 0.8, length.out = G)
cutpoints <- c(-1.0, 0.0, 0.8)

## generate data with large counts
ng   <- rep(100000,G)
ngk  <- gendata_hetop(G, K = 4, ng, mug, sigmag, cutpoints)
print(ngk)

## compute MLE and check parameter recovery:
m    <- mle_hetop(ngk, fixedcuts = c(-1.0, 0.0))
print(cbind(true = mug,       est = m$est_fc$mug))
print(cbind(true = sigmag,    est = m$est_fc$sigmag))
print(cbind(true = cutpoints, est = m$est_fc$cutpoints))

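## estimates on other scales:
## scale (2): weighted means of mug and log(sigmag) should both be numerically zero
p    <- ng/sum(ng)
print(sum(p * m$est_zero$mug))
print(sum(p * log(m$est_zero$sigmag)))

## scale (3): population mean should be zero and population variance one
print(sum(p * m$est_star$mug))
print(sum(p * (m$est_star$mug^2 + m$est_star$sigmag^2)))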

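## dealing with sparse counts
ngk_sparse <- matrix(rpois(G*4, lambda=5), ncol=4)
## groups 1-2: nonzero counts in only two categories
ngk_sparse[1,] <- c(5,8,0,0)
ngk_sparse[2,] <- c(0,10,10,0)
## group 3: all data in the lowest category; group 4: all data in the highest
ngk_sparse[3,] <- c(12,0,0,0)
ngk_sparse[4,] <- c(0,0,0,10)
print(ngk_sparse)

m    <- mle_hetop(ngk_sparse, fixedcuts = c(-1.0, 0.0))
print(m$pstatus)

## groups 1-4 share one constrained SD, which should match
## exp(mean log SD) of the remaining groups
print(unique(m$est_fc$sigmag[1:4]))
print(exp(mean(log(m$est_fc$sigmag[5:10]))))

## group 3 mean should equal the minimum of the other group means
print(m$est_fc$mug[3])
print(min(m$est_fc$mug[-3]))

## group 4 mean should equal the maximum of the other group means
print(m$est_fc$mug[4])
print(max(m$est_fc$mug[-4]))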

[Package HETOP version 0.2-6]