R: Auto-correlation screening

rp.acors {responsePatterns}

R Documentation

Auto-correlation screening

Description

Auto-correlations of survey data allow for probabilistic detection of repetitive response patterns. This function calculates auto-correlation coefficients for each observation (respondent) at every lag from min.lag up to the value defined by the max.lag parameter. It then assigns a percentile value to each observation (respondent) based on either the highest absolute auto-correlation or the sum of absolute auto-correlations. It is essential to keep the variables in the order in which they were presented to respondents.
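The idea of a lagged auto-correlation for a single respondent can be sketched in base R. This is an illustration only: lag_acors is a hypothetical helper, not a function of the responsePatterns package, and it assumes a plain (non-circular) shift and Pearson correlation; the package's internal computation may differ.

```r
# Hypothetical helper (not part of responsePatterns): correlate one
# respondent's answers with a copy of themselves shifted by k positions.
lag_acors <- function(x, max.lag, min.lag = 1) {
  n <- length(x)
  sapply(min.lag:max.lag, function(k) {
    head_part <- x[1:(n - k)]      # positions 1 .. n-k
    tail_part <- x[(k + 1):n]      # positions k+1 .. n
    # cor() warns and returns NA when either part has zero variance
    suppressWarnings(cor(head_part, tail_part))
  })
}

# Twelve answers that repeat the sequence 1,2,3,4 three times
responses <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
acors <- lag_acors(responses, max.lag = 6)
round(acors, 2)   # the value at lag 4 (the pattern length) equals 1
```

Because the vector of answers is compared position by position, shuffling the item order would destroy the pattern, which is why the presentation order of the variables must be preserved.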

Usage

rp.acors(
  data,
  max.lag = NULL,
  min.lag = 1,
  id.var = NULL,
  na.rm = FALSE,
  cor.method = c("pearson", "spearman", "kendall"),
  percentile.method = c("max", "sum"),
  na.top = FALSE,
  store.data = TRUE
)

Arguments

`data`	A data frame. A data set containing variables to analyze and, optionally, an ID variable.
`max.lag`	An integer. Define the maximum lag for which auto-correlations should be computed (defaults to the number of items minus 3).
`min.lag`	An integer. Define the minimum lag for which auto-correlations should be computed (defaults to 1).
`id.var`	A string. If the data set contains an ID variable, specify its name.
`na.rm`	A logical scalar. Should missing values be removed from the computation of auto-correlations?
`cor.method`	A string. Defines the method used to compute auto-correlations (defaults to "pearson").
`percentile.method`	A string. Should the percentiles be based on the maximum absolute auto-correlation ("max") or on the sum of the absolute values of all auto-correlations ("sum")? Defaults to "max".
`na.top`	A logical scalar. Should NA indices (i.e., those that could not be computed due to data missingness) be ranked at the top? Defaults to FALSE.
`store.data`	A logical scalar. Should the data be stored within the object? Set to TRUE if you want to use the rp.plot or rp.save2csv functions.
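The difference between the two percentile.method options can be illustrated with made-up numbers. The matrix below is invented for illustration, and the use of ecdf to obtain percentiles is an assumption; the package may rank respondents differently.

```r
# Made-up auto-correlations: one row per respondent, one column per lag
acor_matrix <- rbind(
  resp1 = c(0.10, -0.20, 0.05),
  resp2 = c(0.90,  0.10, 0.00),
  resp3 = c(0.40,  0.45, 0.50)
)

# "max": highest absolute auto-correlation per respondent
index_max <- apply(abs(acor_matrix), 1, max)   # 0.20, 0.90, 0.50
# "sum": sum of absolute auto-correlations per respondent
index_sum <- rowSums(abs(acor_matrix))         # 0.35, 1.00, 1.35

# Assumed percentile step: share of respondents with an index at or below
# each respondent's own index, expressed on a 0-100 scale
pct_max <- ecdf(index_max)(index_max) * 100
pct_sum <- ecdf(index_sum)(index_sum) * 100
```

In this toy example, resp2 ranks highest under "max" (one very strong coefficient), while resp3 ranks highest under "sum" (moderately strong coefficients at every lag).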

Details

A response pattern yields a perfect positive auto-correlation coefficient (r = 1) when the lag equals the length of the pattern, provided the pattern is uninterrupted over the whole vector of responses. The computation of an auto-correlation coefficient can fail for two reasons, both of which are associated with a possible threat to data validity: (1) the responses consist of a vector of identical values (e.g., 2,2,2,2,2,2,2). In such cases, an auto-correlation coefficient cannot be computed because the variance is zero, but we arbitrarily set the value to r = 1 because it meets the definition of a perfectly repetitive pattern; (2) the sequence contains too many missing values. In such cases, we set the value to NA.
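The two failure cases described above can be reproduced with base R's cor. The wrapper safe_acor is hypothetical and deliberately simplified: it returns NA as soon as any value is missing, whereas the package decides how many missing values are too many according to its own rules.

```r
# Hypothetical wrapper (not part of responsePatterns) mirroring the two
# conventions described above for a single lag k.
safe_acor <- function(x, k) {
  n <- length(x)
  a <- x[1:(n - k)]
  b <- x[(k + 1):n]
  if (anyNA(c(a, b))) return(NA_real_)        # case (2), simplified: any NA gives NA
  if (length(unique(x)) == 1) return(1)       # case (1): identical values, set r = 1
  suppressWarnings(cor(a, b))
}

constant <- rep(2, 7)
raw_r <- suppressWarnings(cor(constant[1:6], constant[2:7]))
is.na(raw_r)                     # TRUE: zero variance, cor() cannot compute r
safe_acor(constant, 1)           # 1, by the convention for identical values

with_gaps <- c(1, NA, 3, 4, 5)
safe_acor(with_gaps, 1)          # NA, because a value is missing
```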

Choosing a suitable maximum lag value, i.e., the maximum number of positions by which the data are shifted in the auto-correlation analysis, is very important for reliable screening. The maximum lag value translates into the maximum length of a sequence within a repetitive response pattern that can be efficiently detected. A maximum lag value that is too low hinders the ability of auto-correlation screening to detect longer repetitive response patterns, thus potentially lowering the method's sensitivity (i.e., the ability to correctly detect careless responses). A maximum lag value that is too high, on the other hand, generally lowers reliability for two reasons. First, it makes the instrumental data matrix smaller. Second, because more auto-correlation coefficients are calculated, strong auto-correlations occur by chance more often; these inflate the respondent's final auto-correlation score (determined as the highest absolute auto-correlation coefficient found for the respondent) and thus lower the method's specificity (i.e., the ability to correctly leave attentive respondents unflagged). If not specified by the user, max.lag is set to the number of items minus 3.
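Both effects of a high maximum lag can be seen in a small base R sketch. It assumes a plain (non-circular) shift, so that a lag of k leaves n - k pairs of answers; the simulated answers and all variable names are illustrative only.

```r
set.seed(1)
n_items <- 20
x <- sample(1:5, n_items, replace = TRUE)   # one simulated, patternless respondent

lags <- 1:(n_items - 3)                     # default max.lag: number of items minus 3

# Effect 1: each additional lag leaves fewer pairs of answers to correlate
pairs_per_lag <- n_items - lags             # falls from 19 pairs to 3 pairs

# Effect 2: the highest absolute coefficient can only grow as more lags are added
abs_r <- sapply(lags, function(k) {
  abs(suppressWarnings(cor(x[1:(n_items - k)], x[(k + 1):n_items])))
})
running_max <- cummax(ifelse(is.na(abs_r), 0, abs_r))  # NA (zero variance) counted as 0 here
```

With the default setting, the largest lag is therefore based on only three pairs of answers, where strong correlations arise easily even for a respondent who answered without any pattern.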

To prevent bias, ideally only questions with the same answer scale should be analyzed at one time. Analyzing responses on scales with different numeric ranges together (e.g., answers on a 1-5 scale and answers on a 1-100 scale) can bias the results to a great extent. See GitHub for an example of how to analyze data from several questionnaires simultaneously. Questions with unique scales or answer options where repetitive response patterns are unlikely or impossible to emerge, such as questions about gender or education, should be excluded prior to screening.
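One way to prepare such data is to split the items by answer scale before screening. The sketch below uses an invented data frame with hypothetical column names (id, gender, a1-a3 on a 1-5 scale, b1-b3 on a 1-100 scale); the commented calls only indicate where rp.acors would be applied to each subset.

```r
# Invented example data: an ID, a demographic question, and two item blocks
survey <- data.frame(
  id     = 1:3,
  gender = c("f", "m", "f"),
  a1 = c(1, 5, 3),    a2 = c(2, 5, 3),    a3 = c(3, 5, 2),     # 1-5 scale
  b1 = c(10, 80, 55), b2 = c(20, 75, 60), b3 = c(30, 90, 40)   # 1-100 scale
)

# Select items by scale, keeping the original column order within each block
items_a <- grep("^a[0-9]+$", names(survey), value = TRUE)
items_b <- grep("^b[0-9]+$", names(survey), value = TRUE)

# Each subset keeps the ID and drops the demographic question
scale_a <- survey[, c("id", items_a)]
scale_b <- survey[, c("id", items_b)]

# Each subset would then be screened separately, for example:
# rp.acors(scale_a, id.var = "id")
# rp.acors(scale_b, id.var = "id")
```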

Value

Returns an S4 object of class "ResponsePatterns".

References

Gottfried, J., Jezek, S., & Kralova, M. (2021). Autocorrelation screening: A potentially efficient method for detecting repetitive response patterns in questionnaire data. Manuscript submitted for review.

Examples

rp.acors(rp.simdata, max.lag = 10, id.var = "optional_ID")

[Package responsePatterns version 0.1.1 Index]