dat.distr {BeyondBenford} R Documentation

## Data distribution

### Description

The function returns a histogram of the data. It can also plot one of Blondeau Da Silva's theoretical distributions, which is determined by a lower and an upper bound: the data must follow this ideal distribution at least approximately for the use of Blondeau Da Silva's model to be well-founded. A dedicated chi-squared statistic can also be computed to test whether the data distribution is consistent with the theoretical distribution.

### Usage

```
dat.distr(dat, xlab = "Data", ylab = "Frequency", main = "Distribution of data",
          theor = TRUE, nclass = 50, col = "lightblue", conv = 0,
          lwbound = max(floor(min(abs(dat))) + 1, (10^(dig - 1))),
          upbound = ceiling(max(dat)), dig = 1, colt = "red", ylim = NULL, border = "blue",
          nchi = 0, legend = TRUE, bg.leg = "gray85")
```
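As a rough illustration of how the default bounds in the signature evaluate, the sketch below applies the two default expressions to a small numeric vector. The sample values are hypothetical and a plain vector stands in for the data frame that `dat.distr` expects.

```r
# Hypothetical sample values; not taken from the package's datasets.
dat <- c(3.7, 12.4, 58, 249.9, 1020)
dig <- 1

# Default bounds, copied from the Usage signature.
lwbound <- max(floor(min(abs(dat))) + 1, 10^(dig - 1))
upbound <- ceiling(max(dat))

lwbound  # 4: floor(3.7) + 1 = 4, which exceeds 10^(1 - 1) = 1
upbound  # 1020

# With dig = 2 the term 10^(dig - 1) = 10 takes over.
lwbound_dig2 <- max(floor(min(abs(dat))) + 1, 10^(2 - 1))
lwbound_dig2  # 10
```

Because `lwbound` depends on `dig`, changing the digit position also changes the default lower bound unless `lwbound` is supplied explicitly.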

### Arguments

| Argument | Description |
|---|---|
| `dat` | The considered dataset, a data frame containing non-zero real numbers. |
| `xlab` | The x-axis label. |
| `ylab` | The y-axis label. |
| `main` | The title of the graph. |
| `theor` | If `theor=TRUE`, Blondeau Da Silva's theoretical distribution is plotted; otherwise only the histogram is drawn. |
| `nclass` | A strictly positive integer: the number of classes in the histogram. |
| `col` | The color used to fill the bars of the histogram. `NULL` yields unfilled bars. |
| `conv` | If `conv=1`, all values of the dataset are multiplied by 10^k, where k is the smallest positive integer such that all non-zero numerical values in the multiplied data frame have an absolute value greater than or equal to 1. |
| `lwbound` | A positive integer that characterizes the data: all (or most) of the data are greater than this "lower bound". |
| `upbound` | A positive integer that characterizes the data: all (or most) of the data are less than this "upper bound". |
| `dig` | The chosen position of the digit (from the left). |
| `colt` | The color used to plot Blondeau Da Silva's theoretical distribution. |
| `ylim` | A two-component vector: the range of y values. |
| `border` | The color of the border around the bars. |
| `nchi` | A positive integer: the number of classes for values from 10^(p-1) to max(max(data),upbound). If `nchi>0`, the function returns the chi-squared goodness-of-fit statistic (with nchi-1 degrees of freedom) computed over these classes. The null hypothesis states that the studied distribution is consistent with the considered theoretical distribution. |
| `legend` | If `legend=TRUE`, the legend is displayed. |
| `bg.leg` | The background color for the legend box. |
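The rescaling described for `conv` can be sketched in a few lines of base R. This is only an illustration of the rule as worded in the argument description, not the package's implementation: `rescale_power` is a hypothetical helper, and starting the search at k = 1 is an assumption based on the phrase "smallest positive integer".

```r
# Hypothetical helper, not part of BeyondBenford.
# Returns the smallest positive integer k such that every non-zero value,
# once multiplied by 10^k, has an absolute value of at least 1.
rescale_power <- function(x) {
  x <- x[!is.na(x) & x != 0]   # zeros and missing values are ignored
  k <- 1L                      # assumption: k is at least 1
  while (length(x) > 0 && min(abs(x) * 10^k) < 1) {
    k <- k + 1L
  }
  k
}

x <- c(0.004, 0.25, 0, 3.1)    # hypothetical data
k <- rescale_power(x)
scaled <- x * 10^k
k        # 3, because 0.004 * 10^2 = 0.4 is still below 1
```

After rescaling, the smallest non-zero absolute value is at least 1, which is the condition the default `lwbound` expression relies on when it calls `floor(min(abs(dat)))`.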

### Value

The histogram of the data, optionally overlaid with Blondeau Da Silva's theoretical distribution, and, if requested, a data frame containing the chi-squared statistic and its associated p-value.
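To make the chi-squared output more concrete, here is a minimal sketch of a goodness-of-fit test over `nchi` classes using base R's `chisq.test`. It does not reproduce the package's computation: the data are simulated, the class breaks are equally spaced for simplicity, the uniform vector `p_theor` is only a placeholder for Blondeau Da Silva's theoretical class probabilities, and the column names of `result` are hypothetical.

```r
set.seed(1)
x <- runif(200, min = 1, max = 100)   # hypothetical data
nchi <- 5                             # number of classes

# Count the observations falling into each class.
breaks <- seq(1, 100, length.out = nchi + 1)
observed <- table(cut(x, breaks = breaks, include.lowest = TRUE))

# Placeholder probabilities; the real ones would come from the
# theoretical distribution defined by lwbound and upbound.
p_theor <- rep(1 / nchi, nchi)

# Pearson's chi-squared test with nchi - 1 degrees of freedom.
fit <- chisq.test(x = as.vector(observed), p = p_theor)

result <- data.frame(chi2 = unname(fit$statistic), pval = fit$p.value)
result
```

A small p-value would lead to rejecting the null hypothesis that the data are consistent with the theoretical distribution.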

### Note

The following warning message may appear: NAs introduced during the automatic conversion. It means that some entries in the dataset are not numeric. Non-numeric values and zeros are not counted.
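The behaviour described in this note can be reproduced in base R, where coercing text to numbers yields `NA` for non-numeric entries (an English-locale session reports "NAs introduced by coercion"). The snippet below is an illustration with hypothetical values, not the package's internal code.

```r
raw <- c("12", "7.5", "abc", "0", "-3")   # hypothetical raw entries

# "abc" cannot be converted and becomes NA; the warning is silenced here
# only to keep the example quiet.
num <- suppressWarnings(as.numeric(raw))

# Drop non-numeric entries (now NA) and zeros, as the note describes.
kept <- num[!is.na(num) & num != 0]
kept   # 12.0  7.5 -3.0
```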

### Author(s)

Blondeau Da Silva Stéphane

### References

S. Blondeau Da Silva (2020). Benford or not Benford: a systematic but not always well-founded use of an elegant law in experimental fields. Communications in Mathematics and Statistics, 8:167-201. doi: 10.1007/s40304-018-00172-1.

S. Blondeau Da Silva (2018). Benford or not Benford: new results on digits beyond the first. https://arxiv.org/abs/1805.01291.

K. Pearson (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302):157-175.

### Examples

```
data(address_PierreBuffiere)