R: Logistic randomized response regression

RRlog {RRreg}

R Documentation

Logistic randomized response regression

Description

A dichotomous variable, measured once or more per person by a randomized response (RR) method, serves as the dependent variable and is predicted by one or more continuous and/or categorical predictors.

Usage

RRlog(
  formula,
  data,
  model,
  p,
  group,
  n.response = 1,
  LR.test = TRUE,
  fit.n = 3,
  EM.max = 1000,
  optim.max = 500,
  ...
)

Arguments

`formula`	a formula specifying the regression model; see `formula`
`data`	`data.frame`, in which variables can be found (optional)
`model`	Available RR models: `"Warner"`, `"UQTknown"`, `"UQTunknown"`, `"Mangat"`, `"Kuk"`, `"FR"`, `"Crosswise"`, `"Triangular"`, `"CDM"`, `"CDMsym"`, `"SLD"`, `"custom"`. See `vignette("RRreg")` for details.
`p`	randomization probability/probabilities (depending on model, see `RRuni` for details)
`group`	vector specifying group membership. Can be omitted for single-group RR designs (e.g., Warner). For two-group RR designs (e.g., `CDM` or `SLD`), use 1 and 2 to indicate group membership, matching the respective randomization probabilities `p[1]` and `p[2]`. If both an RR design and a direct question (DQ) were used in the study, the group indices are set to 0 (DQ) and 1 (RR; 1 or 2 for two-group RR designs). This makes it possible to test whether the RR design leads to a different prevalence estimate by including a dummy variable for the question format (RR vs. DQ) as a predictor: if the corresponding regression coefficient is significant, the prevalence estimates differ between RR and DQ. Similarly, interaction hypotheses can be tested (e.g., that the correlation between a sensitive attribute and a predictor is found only with the RR design, not with the DQ design) by including the interaction of the DQ-RR dummy variable and the predictor in `formula` (e.g., `RR ~ dummy*predictor`).
`n.response`	number of responses per participant, e.g., if each participant answers 5 RR questions with the same randomization probability `p`. Either a single number (if all participants give the same number of responses) or a vector.
`LR.test`	whether to test the regression coefficients by likelihood-ratio tests, i.e., by fitting the model repeatedly while excluding one parameter at a time (each nested model is fitted only once, which can result in local maxima). The likelihood-ratio test statistic `G^2(df=1)` is reported in the table of coefficients as `deltaG2`.
`fit.n`	Number of fitting replications using random starting values to avoid local maxima
`EM.max`	maximum number of iterations of the EM algorithm. If `EM.max=0`, the EM algorithm is skipped.
`optim.max`	Maximum number of iterations within each run of `optim`
`...`	ignored
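To make the roles of `group` and `n.response` concrete, the following base-R sketch writes out a likelihood for a mixed DQ/Warner design. It is an illustration under stated assumptions, not the RRreg implementation: the helpers `p_yes()` and `negll()` are hypothetical, only the Warner model is covered, and the `n.response` answers are assumed to be conditionally independent given the predictors, so the number of "yes" answers is binomial. A dummy predictor for the question format shifts the latent prevalence of the RR group, which is what the DQ-vs-RR test described under `group` examines.

```r
# Hypothetical helper (not part of RRreg): probability of a "yes" answer,
# given the latent prevalence pi_x = P(attribute = 1 | x).
# group 0 = direct question (DQ), group 1 = Warner RR with probability p.
p_yes <- function(pi_x, group, p) {
  ifelse(group == 0,
         pi_x,                              # DQ: the answer reveals the state
         p * pi_x + (1 - p) * (1 - pi_x))   # Warner: statement or its negation
}

# Hypothetical negative log-likelihood: y counts the "yes" answers out of
# n_resp questions, assumed conditionally independent given x (binomial).
negll <- function(beta, X, y, group, n_resp, p) {
  pi_x <- plogis(drop(X %*% beta))          # logistic model for the prevalence
  -sum(dbinom(y, size = n_resp, prob = p_yes(pi_x, group, p), log = TRUE))
}

# Toy example: intercept plus a DQ-vs-RR dummy as predictor
group  <- c(0, 0, 1, 1)
X      <- cbind(1, dummy = as.numeric(group == 1))
y      <- c(1, 0, 3, 2)   # number of "yes" answers
n_resp <- c(1, 1, 5, 5)   # DQ asked once, RR asked five times
negll(c(0, 0), X, y, group, n_resp, p = 0.8)
```

With `beta = c(0, 0)` every latent prevalence is 0.5, so each answer is "yes" with probability 0.5 in both formats; this makes the toy value easy to check by hand.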

Details

The logistic regression model is first fitted by an EM algorithm, in which the dependent RR variable is treated as a misclassified binary variable (Magder & Hughes, 1997). The results are used as starting values for a Newton-Raphson based optimization by `optim`.
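The two-stage fitting strategy can be sketched in base R for the simplest case: a single-group Warner design, one response per person, and a known `p` (which must differ from 0.5 for the model to be identified). This is a minimal sketch, not the code used by `RRlog`. The E-step computes the posterior probability that the latent attribute is present; the M-step is a logistic regression on these fractional outcomes, fitted with `glm(..., family = quasibinomial())`, which has the same score equations as the binomial likelihood but does not warn about non-integer responses. Using BFGS (a quasi-Newton method) in `optim` for the refinement step is an assumption of this sketch.

```r
set.seed(1)
n <- 2000
p <- 0.8                                   # Warner randomization probability
x <- rnorm(n)
true <- rbinom(n, 1, plogis(-0.5 + 1 * x)) # latent sensitive attribute
# Warner design: with probability p the answer refers to the attribute,
# otherwise to its negation
y <- ifelse(runif(n) < p, true, 1 - true)
X <- cbind(1, x)

lik1 <- ifelse(y == 1, p, 1 - p)           # P(y | true = 1)
lik0 <- ifelse(y == 1, 1 - p, p)           # P(y | true = 0)

# EM algorithm: the RR answer is a misclassified version of 'true'
beta <- c(0, 0)
for (iter in 1:500) {
  pi_x <- plogis(drop(X %*% beta))
  # E-step: posterior probability that the attribute is present
  w <- pi_x * lik1 / (pi_x * lik1 + (1 - pi_x) * lik0)
  # M-step: logistic regression with the fractional outcome w
  beta_new <- unname(coef(glm(w ~ x, family = quasibinomial())))
  converged <- max(abs(beta_new - beta)) < 1e-8
  beta <- beta_new
  if (converged) break
}

# Refinement: maximize the observed-data likelihood, starting at the EM result
negll <- function(b) {
  pi_x <- plogis(drop(X %*% b))
  -sum(dbinom(y, 1, p * pi_x + (1 - p) * (1 - pi_x), log = TRUE))
}
fit <- optim(beta, negll, method = "BFGS", hessian = TRUE)
se <- sqrt(diag(solve(fit$hessian)))       # inverse observed information
cbind(estimate = fit$par, std_err = se)
```

Because EM already lands close to the maximum, the `optim` step mainly serves to obtain the Hessian, from which the standard errors follow.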

Value

Returns an object of class `RRlog`, which can be analysed with the generic method `summary`. In the table of coefficients, the column `Wald` refers to the Chi^2 test statistic, computed as Chi^2 = z^2 = Estimate^2/StdErr^2. If `LR.test = TRUE`, `deltaG2` is the likelihood-ratio test statistic, computed by fitting a nested logistic model without the corresponding predictor.
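Both test statistics can be reproduced by hand for a simulated Warner data set, as the base-R sketch below shows. It is an illustration, not the output of `summary`: the model is fitted with `optim` (BFGS is an assumption here), the Wald statistic uses the standard error from the numerical Hessian, and `deltaG2` is twice the difference in maximized log-likelihoods between the full model and the nested model without the predictor. Both statistics are referred to a Chi^2 distribution with 1 degree of freedom.

```r
set.seed(2)
n <- 1500
p <- 0.8                                     # Warner randomization probability
x <- rnorm(n)
true <- rbinom(n, 1, plogis(-0.5 + 0.8 * x)) # latent sensitive attribute
y <- ifelse(runif(n) < p, true, 1 - true)    # observed RR answers

# Negative log-likelihood of the Warner logistic model for design matrix X
negll <- function(b, X) {
  pi_x <- plogis(drop(X %*% b))
  -sum(dbinom(y, 1, p * pi_x + (1 - p) * (1 - pi_x), log = TRUE))
}
full   <- optim(c(0, 0), negll, X = cbind(1, x), method = "BFGS", hessian = TRUE)
nested <- optim(0, negll, X = matrix(1, n, 1), method = "BFGS")

est <- full$par[2]
se  <- sqrt(diag(solve(full$hessian)))[2]
wald    <- (est / se)^2                      # Chi^2 = z^2 = Estimate^2 / StdErr^2
deltaG2 <- 2 * (nested$value - full$value)   # optim() minimizes -logLik
c(Wald = wald, p_Wald = pchisq(wald, df = 1, lower.tail = FALSE),
  deltaG2 = deltaG2, p_LR = pchisq(deltaG2, df = 1, lower.tail = FALSE))
```

In large samples the two statistics tend to agree; the likelihood-ratio version avoids relying on a quadratic approximation of the log-likelihood, at the cost of one additional model fit per coefficient.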

Author(s)

Daniel W. Heck

References

van den Hout, A., van der Heijden, P. G., & Gilchrist, R. (2007). The logistic regression model with response variables subject to randomized response. Computational Statistics & Data Analysis, 51, 6060-6069.

Examples

library(RRreg)
set.seed(123)
# simulate data from the Warner model without response biases
dat <- RRgen(1000, pi = .3, "Warner", p = .9)
dat$covariate <- rnorm(1000)
dat$covariate[dat$true == 1] <- rnorm(sum(dat$true == 1), .4, 1)
# analyse
ana <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1)
summary(ana)
# compare with a standard logistic regression on the true, latent states:
glm(true ~ covariate, dat, family = binomial(link = "logit"))

Package RRreg version 0.7.5