repeated_test {jfa} | R Documentation |
Data Auditing: Repeated Values Test
Description
This function analyzes how often values are repeated within a set of numbers. Unlike Benford's law and its generalizations, this approach examines the entire number at once, not only the first or last digit(s).
Usage
repeated_test(
x,
check = c("last", "lasttwo", "all"),
method = c("af", "entropy"),
samples = 2000
)
Arguments
x |
a numeric vector of values from which the digits should be analyzed. |
check |
which digits to shuffle during the procedure. Can be "last", "lasttwo", or "all". |
method |
which statistic is used. Defaults to "af" (average frequency); the alternative is "entropy". |
samples |
how many samples to use in the bootstrapping procedure. |
Details
To determine whether the data show an excessive amount of bunching, the null hypothesis that x does not contain an unexpected amount of repeated values is tested against the alternative hypothesis that x has more repeated values than expected. The statistic can either be the average frequency (AF = sum(f_i^2)/sum(f_i)) of the data or the entropy (E = - sum(p_i * log(p_i)), with p_i = f_i/n) of the data, where f_i is the frequency of the i-th unique value and n is the number of observations. Average frequency and entropy are highly correlated, but the average frequency is often more interpretable. For example, an average frequency of 2.5 means that, on average, an observation has a value that appears 2.5 times in the data set.
To quantify what is expected, this test requires the assumption that the integer portions of the numbers are not associated with their decimal portions.
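The two statistics above can be computed by hand in base R, which may help when interpreting the output. The sketch below uses hypothetical toy data and hypothetical helper names (avg_freq, entropy). The resampling step at the end is only an assumed illustration of the independence idea described above: it shuffles the whole decimal portion across the integer portions, and it is not taken from the package source, nor does it reproduce the check options.

```r
set.seed(1)

# Hypothetical toy data with two decimals; some values are deliberately repeated.
x <- c(12.15, 12.15, 12.15, 48.40, 48.40, 7.01, 93.27, 5.88)

# Average frequency: AF = sum(f_i^2) / sum(f_i)
avg_freq <- function(v) {
  f <- as.numeric(table(v))   # frequency f_i of each distinct value
  sum(f^2) / sum(f)
}

# Entropy: E = -sum(p_i * log(p_i)), with p_i = f_i / n
entropy <- function(v) {
  p <- as.numeric(table(v)) / length(v)
  -sum(p * log(p))
}

af_obs <- avg_freq(x)   # 16 / 8 = 2 for the toy data
e_obs  <- entropy(x)

# Assumed resampling scheme (an illustration, not the package's code):
# split each number into integer and decimal portions, shuffle the decimal
# portions, and recompute the statistic on the recombined values.
int_part <- floor(x)
dec_part <- round((x - int_part) * 100)   # decimals as whole hundredths

af_sim <- replicate(2000, avg_freq(paste(int_part, sample(dec_part))))

# One-sided p-value: share of shuffled data sets at least as bunched as x.
p_value <- mean(af_sim >= af_obs)
```

Pasting the integer and decimal portions into a character key avoids floating-point mismatches when counting repeated values.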
Value
An object of class jfaRv containing:
x |
input data. |
frequencies |
frequencies of observations in x. |
samples |
vector of simulated samples. |
integers |
counts for extracted integers. |
decimals |
counts for extracted decimals. |
n |
the number of observations in x. |
statistic |
the value of the average frequency or entropy statistic. |
p.value |
the p-value for the test. |
cor.test |
correlation test for the integer portions of the numbers versus the decimal portions of the numbers. |
method |
method used. |
check |
checked digits. |
data.name |
a character string giving the name(s) of the data. |
Author(s)
Koen Derks, k.derks@nyenrode.nl
References
Simonsohn, U. (2019, May 25). Number-Bunching: A New Tool for Forensic Data Analysis. Retrieved from https://datacolada.org/77.
See Also
Examples
set.seed(1)
x <- rnorm(50)
# Repeated values analysis shuffling last digit
repeated_test(x, check = "last", method = "af", samples = 2000)