small_samptest {ptools}    R Documentation
Small Sample Exact Test for Counts in Bins
Description
Exact small-sample test for counts of N items in M bins, each bin having a specified baseline probability.
Usage
small_samptest(d, p = rep(1/length(d), length(d)), type = "G", cdf = FALSE)
Arguments
d
vector of counts, e.g. c(0,2,1,3,1,4,0) for counts of crimes in days of the week
p
vector of baseline probabilities; defaults to equal probabilities in each bin
type
string specifying the test: "G" for the likelihood ratio G statistic (the default), "V" for Kuiper's test (for circular data), "KS" for the Kolmogorov-Smirnov test, and "Chi" for the chi-square test
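cdf
if FALSE (the default), the exact null distribution table is computed; otherwise you can pass in the CDF table from a prior run (e.g. res$CDF) to avoid recomputing it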
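To make the four choices of type concrete, here is a minimal base R sketch of the textbook form of each statistic for observed counts d and null probabilities p. The function names g_stat(), chi_stat(), ks_stat(), and kuiper_stat() are hypothetical, and ptools may scale or orient these statistics differently (for example, KS on counts rather than proportions).

```r
# Textbook forms of the four statistics for observed counts d and null
# probabilities p (a sketch; ptools may scale or orient them differently).
g_stat <- function(d, p) {
  e <- sum(d) * p                    # expected counts under the null
  pos <- d > 0                       # bins with zero counts contribute 0
  2 * sum(d[pos] * log(d[pos] / e[pos]))
}

chi_stat <- function(d, p) {
  e <- sum(d) * p
  sum((d - e)^2 / e)
}

ks_stat <- function(d, p) {
  diffs <- cumsum(d) / sum(d) - cumsum(p)  # observed minus null CDF
  max(abs(diffs))
}

kuiper_stat <- function(d, p) {
  diffs <- cumsum(d) / sum(d) - cumsum(p)
  max(diffs) + max(-diffs)                 # largest gap above plus below
}

d <- c(0, 2, 1, 3, 1, 4, 0)   # crimes per day of the week
p <- rep(1 / 7, 7)
c(G = g_stat(d, p), Chi = chi_stat(d, p),
  KS = ks_stat(d, p), V = kuiper_stat(d, p))
```

With equal bin probabilities, Kuiper's V is unchanged if the days are rotated (e.g. starting the week on Monday instead of Sunday), whereas the KS statistic generally is not, which is why V suits circular data.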
Details
This constructs the exact null distribution of small-sample test statistics for N counts in M bins. An example use case is checking whether a repeat offender has a proclivity to commit crimes on a particular day of the week (see Wheeler, 2016). It can also be used for Benford's analysis of leading or trailing digits in small samples (see Nigrini, 2012). The referenced paper (Wheeler, 2016) shows that the G test tends to have the most power, although with circular data you may consider Kuiper's test.
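The exact test described above can be sketched in base R by brute force: enumerate every allocation of the N counts to the M bins, weight each allocation by its multinomial probability under p, and add up the probability of every allocation whose statistic is at least as large as the observed one. This is a minimal sketch assuming the G statistic and an upper-tail p-value; compositions(), g_stat(), and exact_g_test() are hypothetical helpers, not the ptools internals.

```r
# Brute-force exact test (sketch). All names here are hypothetical helpers.

# Every way to place n counts into m bins, one row per allocation
compositions <- function(n, m) {
  if (m == 1) return(matrix(n, nrow = 1))
  do.call(rbind, lapply(0:n, function(k) {
    cbind(k, compositions(n - k, m - 1), deparse.level = 0)
  }))
}

# Likelihood ratio G statistic; empty bins contribute 0
g_stat <- function(x, p) {
  e <- sum(x) * p
  pos <- x > 0
  2 * sum(x[pos] * log(x[pos] / e[pos]))
}

exact_g_test <- function(d, p = rep(1 / length(d), length(d))) {
  outcomes <- compositions(sum(d), length(d))
  null_prob <- apply(outcomes, 1, dmultinom, prob = p)  # multinomial weights
  stats <- apply(outcomes, 1, g_stat, p = p)
  obs <- g_stat(d, p)
  # upper-tail p-value; the tolerance keeps floating point ties in the tail
  list(test_stat = obs,
       p_value = sum(null_prob[stats >= obs - 1e-10]),
       n_outcomes = nrow(outcomes),
       total_prob = sum(null_prob))
}

exact_g_test(c(3, 1, 1, 0, 0, 1, 1))
```

Because every allocation is enumerated, the p-value is exact rather than a chi-square approximation, at the cost of a table that grows quickly with N and M.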
Value
A small_sampletest object with slots for:
- CDF, a data frame that contains the exact null probability and test statistic value for every possible allocation of the counts to the bins
- probabilities, the null probabilities you specified
- data, the observed counts you specified
- test, the type of test conducted (e.g. G, KS, Chi, etc.)
- test_stat, the test statistic for the observed data
- p_value, the p-value for the observed statistic based on the exact null distribution
- AggregateStatistics, a reduced-form aggregate table for the CDF/p-value calculation
If you wish to save the object, you may want to drop the CDF part, since it can be quite large. It has choose(n+m-1,m-1) rows in total, where m is the number of bins and n is the total count. So if you have 10 crimes over the 7 days of the week, the result is a data frame with choose(7 + 10 - 1, 7 - 1), which is 8008 rows.
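The row count is the "stars and bars" count of ways to place n indistinguishable counts into m bins. A quick base R check of the formula and of how fast the table grows, using a hypothetical helper n_outcomes():

```r
# Rows in the exact null table: ways to put n counts into m bins
n_outcomes <- function(n, m) choose(n + m - 1, m - 1)

n_outcomes(10, 7)    # 10 crimes over 7 days
#> [1] 8008
n_outcomes(10, 9)    # 10 checks over the 9 Benford first digits
#> [1] 43758
sapply(c(5, 10, 20, 30), n_outcomes, m = 7)   # growth with more crimes
```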
Currently I keep the CDF part, though, to make it easier to calculate the power of a particular test.
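Keeping the full table makes power easy to compute: the exact rejection region is the set of allocations whose null p-value is at most alpha, and the power is the probability of that region under an alternative. The sketch below assumes the G statistic; compositions(), g_stat(), and exact_power() are hypothetical helpers, not the ptools powalt() implementation.

```r
# Exact power sketch for the G test. Helper names are hypothetical and this
# is not the ptools powalt() implementation.
compositions <- function(n, m) {
  if (m == 1) return(matrix(n, nrow = 1))
  do.call(rbind, lapply(0:n, function(k) {
    cbind(k, compositions(n - k, m - 1), deparse.level = 0)
  }))
}

g_stat <- function(x, p) {
  e <- sum(x) * p
  pos <- x > 0
  2 * sum(x[pos] * log(x[pos] / e[pos]))
}

exact_power <- function(n, p0, p_alt, alpha = 0.05) {
  outcomes <- compositions(n, length(p0))
  stats <- apply(outcomes, 1, g_stat, p = p0)
  null_prob <- apply(outcomes, 1, dmultinom, prob = p0)
  alt_prob <- apply(outcomes, 1, dmultinom, prob = p_alt)
  # exact null p-value that each possible outcome would receive
  p_vals <- vapply(stats, function(s) sum(null_prob[stats >= s - 1e-10]),
                   numeric(1))
  reject <- p_vals <= alpha              # exact rejection region
  c(size = sum(null_prob[reject]), power = sum(alt_prob[reject]))
}

# 8 crimes, uniform null vs. an alternative that favors one day
exact_power(8, rep(1 / 7, 7), c(0.4, rep(0.1, 6)))
```

Because the null distribution is discrete, the achieved size is usually below alpha, which is one reason exact tests on small samples are conservative.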
References
Nigrini, M. J. (2012). Benford's Law: Applications for forensic accounting, auditing, and fraud detection. John Wiley & Sons.
Wheeler, A. P. (2016). Testing Serial Crime Events for Randomness in Day-of-Week Patterns with Small Samples. Journal of Investigative Psychology and Offender Profiling, 13(2), 148-165.
See Also
powalt() for calculating the power of a test under an alternative
Examples
# Counts for different days of the week
d <- c(3,1,1,0,0,1,1) #format N observations in M bins
res <- small_samptest(d=d,type="G")
print(res)
# Example for Benford's analysis
f <- 1:9
p_fd <- log10(1 + (1/f)) #first digit probabilities
#check data from Nigrini page 84
checks <- c(1927.48,27902.31,86241.90,72117.46,81321.75,97473.96,
93249.11,89658.17,87776.89,92105.83,79949.16,87602.93,
96879.27,91806.47,84991.67,90831.83,93766.67,88338.72,
94639.49,83709.28,96412.21,88432.86,71552.16)
# To make the example run a bit faster
c1 <- checks[1:10]
#extracting the first digits
fd <- substr(format(c1,trim=TRUE),1,1)
tot <- table(factor(fd, levels=paste(f)))
resG <- small_samptest(d=tot,p=p_fd,type="Chi")
resG
#Can reuse the cdf table if you have the same number of observations
c2 <- checks[11:20]
fd2 <- substr(format(c2,trim=TRUE),1,1)
t2 <- table(factor(fd2, levels=paste(f)))
resG2 <- small_samptest(d=t2,p=p_fd,type="Chi",cdf=resG$CDF)
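The examples above take first digits with substr(format(x), 1, 1), which works for these check amounts but returns "0" for values below 1 and "-" for negative values. A minimal arithmetic alternative, assuming positive or negative nonzero inputs, is sketched below; first_digit() is a hypothetical helper, not part of ptools.

```r
# Benford first-digit probabilities P(D = f) = log10(1 + 1/f); the sum
# telescopes to log10(10) = 1
benford_p <- log10(1 + 1 / (1:9))
sum(benford_p)

# Hypothetical helper: leading digit of nonzero numbers by arithmetic,
# an alternative to substr(format(x), 1, 1)
first_digit <- function(x) {
  x <- abs(x)
  floor(x / 10^floor(log10(x)))
}

x <- c(1927.48, 27902.31, 86241.90)
first_digit(x)
#> [1] 1 2 8
table(factor(first_digit(x), levels = 1:9))  # counts per digit bin
```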