R: Data Auditing: Digit Distribution Test

digit_test {jfa}

R Documentation

Data Auditing: Digit Distribution Test

Description

This function extracts the (leading) digits from a numeric vector and tests their distribution against a reference distribution. By default, the distribution of leading digits is checked against Benford's law.

Usage

digit_test(
  x,
  check = c("first", "last", "firsttwo", "lasttwo"),
  reference = "benford",
  conf.level = 0.95,
  prior = FALSE
)

Arguments

`x`	a numeric vector.
`check`	location of the digits to analyze. Can be `first`, `last`, `firsttwo`, or `lasttwo`.
`reference`	a character string giving the reference distribution for the digits, or a vector of probabilities for each digit. Can be `benford` for Benford's law or `uniform` for the uniform distribution. An error is given if any entry of `reference` is negative. Probabilities that do not sum to one are normalized.
`conf.level`	a numeric value between 0 and 1 specifying the confidence level (i.e., one minus the audit risk or detection risk).
`prior`	a logical specifying whether to use a prior distribution, or a numeric value equal to or larger than 1 specifying the prior concentration parameter, or a numeric vector containing the prior parameters for the Dirichlet distribution on the digit categories.
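
The handling of a numeric `reference` described above (negative entries rejected, other values rescaled to sum to one) can be illustrated with a minimal base R sketch. The helper `normalize_reference` is hypothetical and is not part of jfa; it only mirrors the documented behavior and makes no claim about the package's internals.

```r
# Hypothetical helper (not part of jfa): validate and normalize a
# vector of reference weights so that it sums to one.
normalize_reference <- function(reference) {
  stopifnot(is.numeric(reference))
  if (any(reference < 0)) {
    stop("all entries of 'reference' must be non-negative")
  }
  reference / sum(reference)
}

# Weights 1:9 become probabilities 1/45, 2/45, ..., 9/45
p <- normalize_reference(1:9)
sum(p)
```

Under this reading, passing `reference = 1:9` would be equivalent to passing the nine probabilities `(1:9) / 45`.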

Details

Benford's law is defined as p(d) = log10(1 + 1/d). The uniform distribution assigns equal probability to every digit category, p(d) = 1/k, where k is the number of categories.
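
As a small illustration, the Benford probabilities log10(1 + 1/d) and a uniform reference with equal probability per category can be computed directly in base R. The variable names are chosen for this sketch only; the same Benford formula applied to the integers 10 to 99 gives the first-two-digit probabilities.

```r
# First digits 1..9
d <- 1:9
p_benford <- log10(1 + 1 / d)                   # Benford's law
p_uniform <- rep(1 / length(d), length(d))      # equal probability per digit

# First two digits 10..99 under Benford's law
d2 <- 10:99
p_benford2 <- log10(1 + 1 / d2)

round(p_benford, 3)
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
```

Each of these vectors sums to one, so it can serve as a set of expected relative frequencies.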

Value

An object of class jfaDistr containing:

`data`	the specified data.
`conf.level`	a numeric value between 0 and 1 giving the confidence level.
`observed`	the observed counts.
`expected`	the expected counts under the null hypothesis.
`n`	the number of observations in `x`.
`statistic`	the value of the chi-squared test statistic.
`parameter`	the degrees of freedom of the approximate chi-squared distribution of the test statistic.
`p.value`	the p-value for the test.
`check`	checked digits.
`digits`	vector of digits.
`reference`	the reference distribution.
`match`	a list containing the row numbers corresponding to the observations matching each digit.
`deviation`	a vector indicating which digits deviate from their expected relative frequency under the reference distribution.
`prior`	a logical indicating whether a prior distribution was used.
`data.name`	a character string giving the name(s) of the data.
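
To make the roles of `observed`, `expected`, `n`, `statistic`, `parameter`, and `p.value` concrete, the following base R sketch computes analogous quantities by hand for a first-digit test against Benford's law. It is an illustration under stated assumptions (zeros and non-finite values are dropped, digits are read from scientific notation) and is not necessarily how jfa computes them.

```r
set.seed(1)
x <- rnorm(100)

# Extract the first significant digit from scientific notation; this
# sidesteps rounding problems of log10-based extraction.
x_ok <- x[is.finite(x) & x != 0]
first_digit <- as.integer(substr(sprintf("%.15e", abs(x_ok)), 1, 1))

digits <- 1:9
p_ref <- log10(1 + 1 / digits)                  # reference probabilities

n <- length(first_digit)                        # number of observations
observed <- as.vector(table(factor(first_digit, levels = digits)))
expected <- n * p_ref                           # expected counts under the null

# Pearson chi-squared statistic with (number of categories - 1) df
statistic <- sum((observed - expected)^2 / expected)
parameter <- length(digits) - 1
p_value <- pchisq(statistic, df = parameter, lower.tail = FALSE)
```

A small `p_value` indicates that the observed digit counts are unlikely under the reference distribution.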

Author(s)

Koen Derks, k.derks@nyenrode.nl

References

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78(4), 551-572.

Examples

set.seed(1)
x <- rnorm(100)

# First digit analysis against Benford's law
digit_test(x, check = "first", reference = "benford")

# Bayesian first digit analysis against Benford's law
digit_test(x, check = "first", reference = "benford", prior = TRUE)

# Last digit analysis against the uniform distribution
digit_test(x, check = "last", reference = "uniform")

# Bayesian last digit analysis against the uniform distribution
digit_test(x, check = "last", reference = "uniform", prior = TRUE)

# First digit analysis against a custom distribution
# (the weights 1:9 are normalized to sum to one)
digit_test(x, check = "first", reference = 1:9)

# Bayesian first digit analysis against a custom distribution
digit_test(x, check = "first", reference = 1:9, prior = TRUE)

[Package jfa version 0.7.1 Index]