ARPALdf_Summary {ARPALData} | R Documentation |

'ARPALdf_Summary' returns descriptive statistics summarising the data contained in a data frame of class ARPALdf. Statistics are calculated at the overall level (full sample), by station ID and by year. For each variable, the function reports basic location indices (min, max, mean, median, quantiles) and variability indices (range, standard deviation). It also reports the linear (Pearson) correlation by station and several graphical representations of the distribution (kernel density plot, histogram, Hampel filter and boxplot). In addition, the function returns useful data-quality information: gap length (i.e. the number of missing observations for each variable by station and by year).
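
As an illustration of the gap-length idea described above, here is a minimal base-R sketch that counts missing observations per station and year on a toy data frame. The column names (`IDStation`, `Date`, `NO2`) and the values are assumptions made for this example; this is not the package's internal code.

```r
# Toy data: column names and values are hypothetical, chosen only for illustration
toy <- data.frame(
  IDStation = rep(c(501, 502), each = 4),
  Date = rep(as.Date(c("2019-12-30", "2019-12-31", "2020-01-01", "2020-01-02")), times = 2),
  NO2 = c(21.5, NA, NA, 18.2, 30.1, 28.4, NA, 25.0)
)

# Derive the year from the date so observations can be grouped by year
toy$Year <- as.integer(format(toy$Date, "%Y"))

# Turn the variable into a 0/1 missingness indicator, then sum it within each group
toy$NO2_missing <- as.integer(is.na(toy$NO2))
gap_counts <- aggregate(NO2_missing ~ IDStation + Year, data = toy, FUN = sum)

print(gap_counts)
```

Each row of `gap_counts` holds the number of missing `NO2` observations for one station-year pair.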

```
ARPALdf_Summary(
  Data,
  by_IDStat = 1,
  by_Year = 1,
  gap_length = 1,
  correlation = 1,
  histogram = 0,
  density = 0,
  outlier = 0,
  verbose = T
)
```

`Data` |
Dataset of class 'ARPALdf' containing the data to be summarised. |

`by_IDStat` |
Logical value (0 or 1). Use 1 to compute summary statistics by station ID. Default is 1. |

`by_Year` |
Logical value (0 or 1). Use 1 to compute summary statistics by year. Default is 1. |

`gap_length` |
Logical value (0 or 1). Use 1 to compute summary statistics for the gap length of each variable. Default is 1. |

`correlation` |
Logical value (0 or 1). Use 1 to compute the linear correlation of the available variables. Default is 1. |

`histogram` |
Logical value (0 or 1). Use 1 to plot the histogram of each variable. Default is 0. |

`density` |
Logical value (0 or 1). Use 1 to plot the kernel density of each variable. Default is 0. |

`outlier` |
Logical value (0 or 1). Use 1 to analyse the extreme values of each variable (boxplot and Hampel filter). Default is 0. |

`verbose` |
Logical value (T or F). Toggles warnings and messages. If 'verbose=T' (default), the function prints messages describing the progress of its tasks. If 'verbose=F', all progress messages are suppressed. |
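
To give a feel for the extreme-value analysis that `outlier = 1` refers to, the following base-R sketch applies a global Hampel rule (median plus or minus `k` scaled MADs) and the standard boxplot rule to a toy vector. The helper name `hampel_flags`, the threshold `k = 3` and the data are assumptions for illustration; the package may use a different threshold or a windowed variant.

```r
# Hypothetical helper: flags values further than k scaled MADs from the median
hampel_flags <- function(x, k = 3) {
  med <- median(x, na.rm = TRUE)
  # mad() multiplies by 1.4826 by default, making it comparable to a
  # standard deviation for normally distributed data
  spread <- mad(x, center = med, na.rm = TRUE)
  abs(x - med) > k * spread
}

# Toy series with one obvious spike and one missing value
x <- c(12, 14, 13, 15, 11, 13, 95, 12, NA, 14)

flags <- hampel_flags(x)
hampel_outliers <- x[which(flags)]        # which() drops the NA flag

# Boxplot rule for comparison: points beyond 1.5 times the hinge spread
boxplot_outliers <- boxplot.stats(x)$out

print(hampel_outliers)
print(boxplot_outliers)
```

Both rules rely on robust quantities (median, MAD, hinges), so a single spike does not inflate the threshold the way it would with a mean and standard deviation.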

A list of data.frames containing summary descriptive statistics for a data frame of class 'ARPALdf'. Summary statistics are computed for the overall sample (Descr), by Station ID (Descr_by_IDStat) and by year (Descr_by_Year). Available statistics are: number of NAs, number of negative values, minimum, mean, maximum and standard deviation.
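
The statistics listed above can be reproduced by hand on a small example. The sketch below computes them overall, by station and by year with base R; the helper `describe`, the column names and the toy values are assumptions for illustration, and the layout of the package's actual output tables may differ.

```r
# Hypothetical helper returning the statistics listed above for one numeric vector
describe <- function(x) {
  c(
    NA_count = sum(is.na(x)),
    Negative_count = sum(x < 0, na.rm = TRUE),
    Min = min(x, na.rm = TRUE),
    Mean = mean(x, na.rm = TRUE),
    Max = max(x, na.rm = TRUE),
    SD = sd(x, na.rm = TRUE)
  )
}

# Toy data: column names and values are assumptions
toy <- data.frame(
  IDStation = rep(c(501, 502), each = 4),
  Year = rep(c(2019, 2020), times = 4),
  NO2 = c(20, 24, NA, 28, 31, -1, 35, 33)
)

# Overall sample, then one row per station and one row per year
overall <- describe(toy$NO2)
by_station <- do.call(rbind, lapply(split(toy$NO2, toy$IDStation), describe))
by_year <- do.call(rbind, lapply(split(toy$NO2, toy$Year), describe))

print(overall)
print(by_station)
print(by_year)
```

Counting negative values alongside NAs is a quick data-quality check, since a negative pollutant concentration is physically impossible.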

```
## Download daily air quality data from all the stations for year 2020
d <- get_ARPA_Lombardia_AQ_data(ID_station = NULL, Year = 2020, Frequency = "daily")
## Summarising observed data
sum_stats <- ARPALdf_Summary(Data = d)
```

[Package *ARPALData* version 1.2.3 Index]