benford.analysis {benford.analysis} | R Documentation |
Benford Analysis for data validation and forensic analytics
Description
The Benford Analysis package provides tools that make it easier to validate data using Benford's Law. The main purpose of the package is to identify suspicious data that need further verification.
Details
More information is available in the package's help documentation.
The main function is benford. It generates a Benford S3 object.
The package defines S3 methods for plotting and printing Benford type objects.
After running benford you can easily get the "suspicious" data by using the functions suspectsTable, getSuspects, duplicatesTable and getDuplicates. See the help documentation and examples for further details.
The package also includes 6 real datasets for illustration purposes.
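As a rough illustration of what a first-two-digits analysis compares the data against, the expected frequencies under Benford's Law, P(d) = log10(1 + 1/d) for d = 10, ..., 99, can be sketched in base R. This is only a sketch of the underlying law, not the package's internal implementation; the names first_two and expected are illustrative and are not part of benford.analysis.

```r
# Illustrative names; this is base R, not a benford.analysis function.
first_two <- 10:99                      # all possible leading digit pairs
expected  <- log10(1 + 1 / first_two)   # Benford probability of each pair

# Sanity check: the 90 probabilities sum to 1 (up to floating-point error).
stopifnot(isTRUE(all.equal(sum(expected), 1)))

# Show the first few pairs with their expected probabilities.
head(data.frame(digits = first_two, expected = expected))
```

Because the probabilities decrease steadily from 10 to 99, an observed excess for a high pair such as 99 stands out more sharply than the same excess would for a low pair.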
References
Alexander, J. (2009). Remarks on the use of Benford's Law. Working Paper, Case Western Reserve University, Department of Mathematics and Cognitive Science.
Berger, A. and Hill, T. (2011). A basic theory of Benford's Law. Probability Surveys, 8, 1-126.
Hill, T. (1995). A statistical derivation of the significant-digit law. Statistical Science, 10(4), 354-363.
Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing and Fraud Detection. Wiley and Sons: New Jersey.
Nigrini, M. J. (2011). Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations. Wiley and Sons: New Jersey.
Examples
library(benford.analysis) #loads the package
data(corporate.payment) #loads the example data
cp <- benford(corporate.payment$Amount, 2, sign="both") #generates benford object
cp #prints
plot(cp) #plots
head(suspectsTable(cp), 10) #prints the digits by decreasing order of discrepancies
#gets observations of the 2 most suspicious groups
suspects <- getSuspects(cp, corporate.payment, how.many=2)
duplicatesTable(cp) #prints the duplicates by decreasing order
#gets the observations of the 2 values with most duplicates
duplicates <- getDuplicates(cp, corporate.payment, how.many=2)
MAD(cp) #gets the Mean Absolute Deviation
chisq(cp) #gets the Chi-squared test
#gets observations starting with 50 or 99
digits_50_and_99 <- getDigits(cp, corporate.payment, digits=c(50, 99))