R: D-score estimation

dscore {dscore}

R Documentation

D-score estimation

Description

The dscore() function estimates three quantities: the D-score, a numeric score that quantifies child development in a single number; the Development-for-Age Z-score (DAZ), which corrects the D-score for age; and the standard error of measurement (SEM) of the D-score.

Usage

dscore(
  data,
  items = names(data),
  key = NULL,
  population = NULL,
  xname = "age",
  xunit = c("decimal", "days", "months"),
  prepend = NULL,
  itembank = dscore::builtin_itembank,
  metric = c("dscore", "logit"),
  prior_mean = NULL,
  prior_sd = NULL,
  transform = NULL,
  qp = NULL,
  dec = c(2L, 3L),
  relevance = c(-Inf, Inf),
  algorithm = c("current", "1.8.7"),
  verbose = FALSE
)

dscore_posterior(
  data,
  items = names(data),
  key = NULL,
  population = NULL,
  xname = "age",
  xunit = c("decimal", "days", "months"),
  prepend = NULL,
  itembank = dscore::builtin_itembank,
  metric = c("dscore", "logit"),
  prior_mean = NULL,
  prior_sd = NULL,
  transform = NULL,
  qp = NULL,
  dec = c(2L, 3L),
  relevance = c(-Inf, Inf),
  algorithm = c("current", "1.8.7"),
  verbose = FALSE
)

Arguments

`data`	A `data.frame` with the data. A row collects all observations made on a child on a set of milestones administered at a given age. The function calculates a D-score for each row. Different rows can correspond to different children or ages.
`items`	A character vector containing names of items to be included in the D-score calculation. Milestone scores are coded numerically as `1` (pass) and `0` (fail). By default, the D-score calculation uses all items found in the data that have a difficulty parameter under the specified `key`.
`key`	String. Name of the key that bundles the difficulty estimates pertaining to one and the same Rasch model. View `builtin_keys` for an overview of the available keys.
`population`	String. The name of the reference population to calculate DAZ. Use `unique(builtin_references$population)` to obtain the set of currently available reference populations.
`xname`	A string with the name of the age variable in `data`. The default is `"age"`. Do not round age.
`xunit`	A string specifying the unit in which age is measured (either `"decimal"`, `"days"` or `"months"`). The default `"decimal"` corresponds to decimal age in years.
`prepend`	Character vector with column names in `data` that will be prepended to the returned data frame. This is useful for copying columns from data into the result, e.g., for matching.
`itembank`	A `data.frame` with at least two columns named `item` and `tau`. By default, the function uses `dscore::builtin_itembank`. If you specify your own `itembank`, then you should also provide the relevant `transform` and `qp` arguments.
`metric`	A string, either `"dscore"` (default) or `"logit"`, signalling the metric in which ability is estimated. `daz` is not calculated for the logit scale.
`prior_mean`	A string or numeric scalar. If a string, it should refer to a column name in `data` with user-supplied values of the prior mean for each observation. If a numeric scalar, it is used as the prior mean for all observations. The default (`NULL`) will consult the `base_population` field in `builtin_keys`, and use the corresponding median of that reference as prior mean for the D-score.
`prior_sd`	A string or a numeric scalar. If a string, it should refer to a column name in `data` with user-supplied values of the prior sd for each observation. If a numeric scalar, it is used as the prior sd for all observations. The default (`NULL`) uses a value of 5.
`transform`	Numeric vector, length 2, containing the intercept and slope of the linear transform from the logit scale into the D-score scale. The default (`NULL`) searches `builtin_keys` for intercept and slope values.
`qp`	Numeric vector of equally spaced quadrature points. This vector should span the range of all D-score or logit values. The default (`NULL`) creates `seq(from, to, by)`, taking the arguments from `builtin_keys`.
`dec`	A vector of two integers specifying the number of decimals for rounding the D-score and DAZ, respectively. The default is `dec = c(2L, 3L)`.
`relevance`	A numeric vector of length 2 with the lower and upper bounds of the relevance interval. The procedure calculates a dynamic EAP for each item. If the difficulty level (tau) of the next item is outside the relevance interval around the EAP, the procedure ignores the score on that item. The default `c(-Inf, Inf)` does not ignore any scores.
`algorithm`	Computational method, for backward compatibility. Either `"current"` (default) or `"1.8.7"` (deprecated).
`verbose`	Logical. Print settings.
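
The linear transform described under `transform` can be illustrated with a few lines of base R. This is only a sketch: the intercept and slope below are made-up values chosen for easy arithmetic, not the values stored in `builtin_keys`, and the name `tr` is hypothetical.

```r
# Hypothetical intercept and slope; real values come from builtin_keys
tr <- c(intercept = 50, slope = 4)

# Abilities on the logit scale (made-up values)
logit <- c(-2, 0, 1.5)

# Logit scale -> D-score scale
d <- unname(tr["intercept"] + tr["slope"] * logit)

# D-score scale -> logit scale (inverse of the same transform)
back <- unname((d - tr["intercept"]) / tr["slope"])

d
back
```

Because the transform is linear, estimates in the two metrics differ only by a shift and a rescaling.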

Details

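The scoring algorithm is based on the method by Bock and Mislevy (1982). The method uses Bayes' rule to update a prior ability distribution into a posterior ability distribution. The mean of the posterior is the ability estimate (EAP), and its standard deviation is the SEM.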
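
The Bayesian update can be sketched in base R as follows. This is a minimal illustration of EAP estimation on a quadrature grid under a Rasch model, not the package's internal code. The item difficulties, scores, prior and grid are made-up values on the logit scale, and every variable name is an assumption.

```r
# Sketch of EAP scoring on a quadrature grid (logit scale).
# All inputs are made-up illustration values, not taken from any dscore key.
qp    <- seq(-6, 6, by = 0.1)          # equally spaced quadrature points
prior <- dnorm(qp, mean = 0, sd = 2)   # assumed normal starting prior
prior <- prior / sum(prior)            # normalise so the weights sum to 1

tau    <- c(-1.0, 0.0, 0.5, 1.5)       # hypothetical item difficulties
scores <- c(1, 1, 0, NA)               # 1 = pass, 0 = fail, NA = not administered

posterior <- prior
for (i in seq_along(tau)) {
  if (is.na(scores[i])) next           # skip items without a valid score
  p_pass <- plogis(qp - tau[i])        # Rasch probability of a pass at each point
  lik <- if (scores[i] == 1) p_pass else 1 - p_pass
  posterior <- posterior * lik         # Bayes' rule: prior times likelihood
  posterior <- posterior / sum(posterior)
}

eap <- sum(qp * posterior)                   # posterior mean (ability estimate)
sem <- sqrt(sum((qp - eap)^2 * posterior))   # posterior standard deviation
round(c(eap = eap, sem = sem), 2)
```

Updating item by item, as done here, gives the same posterior as multiplying all item likelihoods at once; the sequential form makes it easier to see how each score changes the estimate.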

The item names should correspond to the "gsed" lexicon.

A key is defined by the set of estimated item difficulties.

Key	Model	Quadrature	Instruments	Direct/Caregiver	Reference
`"dutch"`	`75_0`	`-10:80`	1	direct	Van Buuren, 2014/2020
`"gcdg"`	`565_18`	`-10:100`	13	direct	Weber, 2019
`"gsed1912"`	`807_17`	`-10:100`	21	mixed	GSED Team, 2019
`"293_0"`	`293_0`	`-10:100`	2	mixed	GSED Team, 2022
`"gsed2212"`	`818_6`	`-10:100`	27	mixed	GSED Team, 2022
`"gsed2406"`	`818_6`	`-10:100`	27	mixed	GSED Team, 2024

As a general rule, one should only compare D-scores that are calculated using the same key and the same set of quadrature points. For calculating D-scores on new data, the advice is to use the default, which currently is "gsed2406".

The default starting prior is a mean calculated from a so-called "Count model" that describes mean D-score as a function of age. The Count models are implemented in the function count_mu(). By default, the spread of the starting prior is 5 D-score points around the mean D-score, which corresponds to approximately 1.5 to 2 times the normal spread of children of a given age. The starting prior is informative for very short tests (say, fewer than 5 items), but has little impact on the posterior for longer tests.
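
The influence of the prior on short versus longer tests can be explored with a small simulation. The sketch below is an assumption-laden illustration, not package code: `eap_sketch()` is a hypothetical helper, the item difficulties are invented, and the logistic curve is given unit slope on an arbitrary scale, whereas real keys apply their own transform.

```r
# Hypothetical helper: EAP on a grid for 0/1 scores under a Rasch-type model.
# Assumes unit slope on an illustrative scale; real keys use their own transform.
eap_sketch <- function(scores, tau, prior_mean, prior_sd = 5,
                       qp = seq(0, 40, by = 0.1)) {
  post <- dnorm(qp, mean = prior_mean, sd = prior_sd)
  for (i in seq_along(tau)) {
    p_pass <- plogis(qp - tau[i])
    post <- post * (if (scores[i] == 1) p_pass else 1 - p_pass)
  }
  sum(qp * post) / sum(post)   # posterior mean
}

# Short test: two made-up items
tau_short <- c(18, 22)
y_short   <- c(1, 0)

# Longer test: 21 made-up items, passed when difficulty is below 20
tau_long <- seq(10, 30, by = 1)
y_long   <- as.numeric(tau_long < 20)

# Shift in the EAP when the prior mean moves from 15 to 25
shift_short <- eap_sketch(y_short, tau_short, 25) - eap_sketch(y_short, tau_short, 15)
shift_long  <- eap_sketch(y_long, tau_long, 25) - eap_sketch(y_long, tau_long, 15)
c(short = shift_short, long = shift_long)
```

Under these assumptions the two-item estimate moves more than the 21-item estimate when the prior mean changes, which is the pattern the paragraph above describes.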

Value

The dscore() function returns a data.frame with nrow(data) rows. Optionally, columns of data can be copied into a first block of the result by using prepend. The second block consists of the following columns:

Name	Label
`a`	Decimal age
`n`	Number of items with valid (0/1) data
`p`	Percentage of passed milestones
`d`	Ability estimate, mean of posterior
`sem`	Standard error of measurement, standard deviation of the posterior
`daz`	D-score corrected for age, calculated in Z-scale (for metric `"dscore"`)

For more detail, the dscore_posterior() function returns a data frame with nrow(data) rows and length(qp) columns, plus any prepended columns, containing the full posterior density of the D-score at each quadrature point. If no valid responses are found, dscore_posterior() returns the prior density. Versions prior to 1.8.5 returned a matrix instead of a data.frame. Code that depends on the result being a matrix may break and need adaptation.

Author(s)

Stef van Buuren, Iris Eekhout, Arjan Huizing (2022)

References

Bock RD, Mislevy RJ (1982). Adaptive EAP Estimation of Ability in a Microcomputer Environment. Applied Psychological Measurement, 6(4), 431-444.

Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368. https://stefvanbuuren.name/publication/van-buuren-2014-gc/

Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4(6), e001724. https://gh.bmj.com/content/bmjgh/4/6/e001724.full.pdf

Examples

# using all defaults and properly formatted data
ds <- dscore(milestones)
head(ds)

# step-by-step example
data <- data.frame(
  id = c(
    "Jane", "Martin", "ID-3", "No. 4", "Five", "6",
    NA_character_, as.character(8:10)
  ),
  age = rep(round(21 / 365.25, 4), 10),
  ddifmd001 = c(NA, NA, 0, 0, 0, 1, 0, 1, 1, 1),
  ddicmm029 = c(NA, NA, NA, 0, 1, 0, 1, 0, 1, 1),
  ddigmd053 = c(NA, 0, 0, 1, 0, 0, 1, 1, 0, 1)
)
items <- names(data)[3:5]

# third item is not part of the default key
get_tau(items, verbose = TRUE)

# calculate D-score
dscore(data)

# prepend id variable to output
dscore(data, prepend = "id")

# or prepend all data
# dscore(data, prepend = colnames(data))

# calculate full posterior
p <- dscore_posterior(data)

# check that rows sum to 1
rowSums(p)

# plot full posterior for measurement 7
barplot(as.matrix(p[7, 12:36]),
  names.arg = 1:25,
  xlab = "D-score", ylab = "Density", col = "grey",
  main = "Full D-score posterior for measurement in row 7",
  sub = "D-score (EAP) = 11.58, SEM = 3.99")

# plot P10, P50 and P90 of D-score references
g <- expand.grid(age = seq(0.1, 4, 0.1), p = c(0.1, 0.5, 0.9))
d <- zad(z = qnorm(g$p), x = g$age, verbose = TRUE)
matplot(
  x = matrix(g$age, ncol = 3), y = matrix(d, ncol = 3), type = "l",
  lty = 1, col = "blue", xlab = "Age (years)", ylab = "D-score",
  main = "D-score preliminary standards: P10, P50 and P90")
abline(h = seq(10, 80, 10), v = seq(0, 4, 0.5), col = "gray", lty = 2)

# add measurements made on very preterms, ga < 32 weeks
ds <- dscore(milestones)
points(x = ds$a, y = ds$d, pch = 19, col = "red")

[Package dscore version 1.9.0 Index]