rv.test {digitTests}    R Documentation

Test of Repeated Values

Description

This function analyzes how often values are repeated within a set of numbers. Unlike Benford's law and its generalizations, which look only at the first or last digits, this approach examines each number as a whole.

Usage

rv.test(x, check = 'last', method = 'af', B = 2000)

Arguments

x

a numeric vector of values from which the digits should be analyzed.

check

which digits to shuffle during the procedure. Can be 'last' (the last digit) or 'lasttwo' (the last two digits).

method

which statistic is computed from the data. Defaults to 'af' (average frequency), but can also be 'entropy'.

B

how many samples to use in the bootstrapping procedure.

Details

To determine whether the data show an excessive amount of bunching, the null hypothesis that x does not contain an unexpected amount of repeated values is tested against the alternative hypothesis that x contains more repeated values than expected. The test statistic is either the average frequency of the data, AF = sum(f_i^2)/sum(f_i), or its entropy, E = -sum(p_i * log(p_i)), where f_i is the number of times the i-th distinct value occurs and p_i = f_i/n. The two statistics are strongly related (more repetition tends to raise the average frequency and lower the entropy), but the average frequency is often easier to interpret. For example, an average frequency of 2.5 means that, on average, the value of an observation appears 2.5 times in the data set. To quantify how many repeated values are expected, the test assumes that the integer portions of the numbers are not associated with their decimal portions.
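As a minimal sketch (not the digitTests source), both statistics can be computed from the frequency table of the observed values. The helper names af_stat and entropy_stat are hypothetical, and the use of the natural logarithm is an assumption; a different base only rescales the entropy.

```r
# Hypothetical helpers illustrating the two statistics described in Details.
# f_i = number of times the i-th distinct value occurs; n = sum(f_i).
af_stat <- function(x) {
  f <- as.numeric(table(x))      # frequency of each distinct value
  sum(f^2) / sum(f)              # AF = sum(f_i^2) / sum(f_i)
}

entropy_stat <- function(x) {
  f <- as.numeric(table(x))
  p <- f / sum(f)                # p_i = f_i / n
  -sum(p * log(p))               # natural logarithm (assumption)
}

x <- c(1.2, 1.2, 3.4, 3.4, 3.4, 5.6)
af_stat(x)       # (2^2 + 3^2 + 1^2) / 6 = 14/6, about 2.33
entropy_stat(x)  # about 1.011
```

In this small vector the two observations equal to 1.2 each have a frequency of 2, the three equal to 3.4 each have 3, and 5.6 has 1, so an observation's value occurs (2 + 2 + 3 + 3 + 3 + 1)/6 = 14/6 times on average, which is exactly AF.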

Value

An object of class dt.rv containing:

x

input data.

frequencies

frequencies of observations in x.

samples

vector of simulated samples.

integers

counts for extracted integers.

decimals

counts for extracted decimals.

n

the number of observations in x.

statistic

the value of the average frequency or entropy statistic.

p.value

the p-value for the test.

cor.test

correlation test between the integer portions and the decimal portions of the numbers.

method

method used.

check

checked digits.

data.name

a character string giving the name(s) of the data.
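The statistic, samples, p.value and cor.test components can be understood through a simplified shuffle test. The sketch below is a hypothetical reimplementation in the spirit of Simonsohn (2019), not the digitTests code: it assumes values recorded to two decimals, shuffles both decimal digits (rather than only the digits selected by check), and uses the hypothetical names af_stat and bunching_test.

```r
# Hypothetical sketch of a shuffle-based test for repeated values, in the
# spirit of Simonsohn (2019). NOT the digitTests implementation: it assumes
# values are recorded to two decimals and shuffles both decimal digits.
af_stat <- function(v) {
  f <- as.numeric(table(v))      # frequency of each distinct value
  sum(f^2) / sum(f)              # average frequency
}

bunching_test <- function(x, B = 2000) {
  # Work on an integer scale (hundredths) to avoid floating-point mismatches
  v <- round(x * 100)
  int_part <- v %/% 100          # integer portion (floored)
  dec_part <- v %% 100           # two-digit decimal portion, 0..99
  observed <- af_stat(v)
  # Under the null, decimals are unrelated to integers, so re-pair them at random
  samples <- replicate(B, af_stat(int_part * 100 + sample(dec_part)))
  list(
    statistic = observed,
    samples   = samples,
    p.value   = mean(samples >= observed),   # share of shuffles at least as bunched
    cor.test  = cor.test(int_part, dec_part) # check of the independence assumption
  )
}

set.seed(1)
bunched <- c(rep(3.14, 20), runif(30, 0, 10))
res <- bunching_test(bunched, B = 500)
res$statistic
res$p.value
```

Working on an integer scale keeps the recombined values exactly comparable with the originals. Because the p-value here is the plain share of shuffles, it can be exactly 0; a common variant for permutation tests uses (count + 1)/(B + 1) instead.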

Author(s)

Koen Derks, k.derks@nyenrode.nl

References

Simonsohn, U. (2019, May 25). Number-Bunching: A New Tool for Forensic Data Analysis. Retrieved from https://datacolada.org/77.

See Also

distr.test, distr.btest

Examples

 
library(digitTests)
set.seed(1)
x <- rnorm(50)

# Repeated values analysis, shuffling the last digit
rv.test(x, check = 'last', method = 'af', B = 2000)


[Package digitTests version 0.1.2 Index]