R: Data distribution

dat.distr {BeyondBenford}

R Documentation

Data distribution

Description

The function plots the histogram of the data. It can also overlay one of Blondeau Da Silva's theoretical distributions, which is determined by a lower and an upper bound. The data must follow this ideal theoretical distribution at least approximately for the use of Blondeau Da Silva's model to be well-founded. A specific chi-squared statistic can also be computed to assess whether the data distribution is consistent with the theoretical distribution.

Usage

dat.distr(dat, xlab = "Data", ylab = "Frequency", main = "Distribution of data", 
theor = TRUE, nclass = 50, col = "lightblue", conv = 0, 
lwbound = max(floor(min(abs(dat))) + 1, (10^(dig - 1))), 
upbound = ceiling(max(dat)), dig = 1, colt = "red", ylim = NULL, border = "blue", 
nchi = 0, legend = TRUE, bg.leg = "gray85")
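
As a rough illustration of how the default bounds in the signature above evaluate, the sketch below applies the same expressions to a plain numeric vector. The vector `x` and its values are made up for illustration and stand in for `dat`; the package itself expects a data frame.

```r
# Illustrative values only; not taken from any package dataset.
x <- c(3.7, 12, 58.2, 140)

# Default lower bound, as written in the Usage section, for dig = 1 and dig = 2.
dig <- 1
lw_dig1 <- max(floor(min(abs(x))) + 1, 10^(dig - 1))   # max(4, 1)  = 4
dig <- 2
lw_dig2 <- max(floor(min(abs(x))) + 1, 10^(dig - 1))   # max(4, 10) = 10

# Default upper bound: the maximum rounded up to an integer.
up <- ceiling(max(x))                                  # 140

c(lwbound_dig1 = lw_dig1, lwbound_dig2 = lw_dig2, upbound = up)
```

With these values the default lower bound grows with `dig`, because 10^(dig - 1) is the smallest number that has a digit in position `dig`.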

Arguments

`dat`	The considered dataset, a data frame containing non-zero real numbers.
`xlab`	The x-axis label.
`ylab`	The y-axis label.
`main`	The title of the graph.
`theor`	If theor=TRUE, Blondeau Da Silva's theoretical distribution is plotted; otherwise only the histogram is drawn.
`nclass`	A strictly positive integer: the number of classes in the histogram.
`col`	The color used to fill the bars of the histogram. NULL yields unfilled bars.
`conv`	If conv=1, all values of the dataset are multiplied by 10^k where k is the smallest positive integer such that all non-zero numerical values in the newly multiplied data frame have an absolute value greater than or equal to 1.
`lwbound`	A positive integer that characterizes the data. All (or most) of the data are greater than this "lower bound".
`upbound`	A positive integer that characterizes the data. All (or most) of the data are less than this "upper bound".
`dig`	The chosen position of the digit (from the left).
`colt`	The color used to plot Blondeau Da Silva's theoretical distribution.
`ylim`	A two-component vector: the range of y values.
`border`	The color of the border around the bars.
`nchi`	A non-negative integer: the number of classes for values from 10^(p-1) to max(max(dat),upbound). If nchi>0, the function returns the chi-squared goodness-of-fit statistic (with nchi-1 degrees of freedom) computed over these classes. The null hypothesis is that the studied distribution is consistent with the considered theoretical distribution.
`legend`	If legend=TRUE, the legend is displayed.
`bg.leg`	The background color for the legend box.
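
The rescaling described for `conv` can be sketched in base R as follows. This is only a reading of the description above, not the package's own code: `scale_to_unit` is a hypothetical helper, the input is a made-up numeric vector rather than a data frame, and the search starts at k = 1 because the text asks for a positive integer.

```r
# Hypothetical helper (not part of BeyondBenford): multiply by 10^k, where k is
# the smallest positive integer such that every non-zero value has |value| >= 1.
scale_to_unit <- function(x) {
  nz <- x[!is.na(x) & x != 0]          # zeros and NAs do not constrain k
  k <- 1L                              # assumption: k is at least 1
  while (min(abs(nz)) * 10^k < 1) {
    k <- k + 1L
  }
  list(k = k, scaled = x * 10^k)
}

res <- scale_to_unit(c(0.042, 0.5, -0.0031, 12))   # illustrative values
res$k        # exponent chosen for the rescaling
res$scaled   # rescaled values
```

Here the smallest absolute value is 0.0031, so three powers of ten are needed before every value reaches 1 in absolute value.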

Value

The histogram of the data, optionally overlaid with Blondeau Da Silva's theoretical distribution, and, if requested (nchi>0), a data frame containing the chi-squared statistic and its associated p-value.
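
To make the chi-squared output concrete, here is a minimal base-R sketch of a goodness-of-fit statistic with nchi - 1 degrees of freedom. It does not reproduce the package's computation: the data are simulated, the class limits are arbitrary, and the class probabilities `p_theor` are a uniform placeholder standing in for Blondeau Da Silva's theoretical distribution.

```r
set.seed(1)
x <- runif(200, min = 1, max = 100)     # simulated data, for illustration only
nchi <- 5                               # number of classes

# Count the observations falling in each of the nchi equal-width classes.
breaks <- seq(1, 100, length.out = nchi + 1)
observed <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))

# Placeholder class probabilities (uniform); the package would use
# Blondeau Da Silva's theoretical distribution here instead.
p_theor <- rep(1 / nchi, nchi)
expected <- length(x) * p_theor

# Pearson's statistic and its p-value with nchi - 1 degrees of freedom.
stat <- sum((observed - expected)^2 / expected)
p_value <- pchisq(stat, df = nchi - 1, lower.tail = FALSE)

data.frame(chi2 = stat, p.value = p_value)
```

A small p-value would lead to rejecting the null hypothesis that the data are consistent with the chosen theoretical probabilities.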

Note

The following warning message can appear: NAs introduced during the automatic conversion. It indicates that some values in the supplied dataset are not numerical. Non-numerical values and zeros are not counted.
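
The situation behind this warning can be reproduced in base R. The sketch below assumes the conversion behaves like `as.numeric`, which is not confirmed by this page; the vector `raw` is made up for illustration.

```r
raw <- c("12", "7.5", "n/a", "0", "130")    # illustrative input with one bad entry

# as.numeric() turns "n/a" into NA and normally emits a coercion warning;
# the warning is suppressed here so that the snippet runs quietly.
vals <- suppressWarnings(as.numeric(raw))

# Drop the NA and the zero, mirroring "not counted" in the note above.
kept <- vals[!is.na(vals) & vals != 0]
kept
```

Checking for such entries before calling the function avoids silently losing observations.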

Author(s)

Blondeau Da Silva Stéphane

References

S. Blondeau Da Silva (2020). Benford or not Benford: a systematic but not always well-founded use of an elegant law in experimental fields. Communications in Mathematics and Statistics, 8:167-201. doi: 10.1007/s40304-018-00172-1.

S. Blondeau Da Silva (2018). Benford or not Benford: new results on digits beyond the first. https://arxiv.org/abs/1805.01291.

K. Pearson (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302):157-175.

Examples

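# Histogram with the theoretical distribution and a chi-squared test on 6 classes
data(address_PierreBuffiere)
dat.distr(address_PierreBuffiere, nchi = 6)

# Histogram only (no theoretical distribution), 100 classes, third digit
data(census)
dat.distr(census, theor = FALSE, nclass = 100, dig = 3)

# Theoretical distribution with explicit lower and upper bounds
data(address_AixesurVienne)
dat.distr(address_AixesurVienne, lwbound = 3, upbound = 75)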

[Package BeyondBenford version 1.4 Index]