R: Plot the top and bottom percentiles of each selected variable

percplot {dataprep}

R Documentation

Plot the top and bottom percentiles of each selected variable

Description

percplot plots the top and bottom percentiles of the selected variables, as calculated by percdata, giving a quick visual check for possible outliers. It melts the data with reshape2::melt or dataprep::melt and draws the plot with ggplot2 and scales.

Usage

percplot(data, start = NULL, end = NULL, group = NULL, ncol = NULL,
diff = 0.1, part = 'both')

Arguments

`data`	A data frame whose columns from `start` to `end` are used to calculate percentiles.
`start`	The column number of the first variable to calculate percentiles for.
`end`	The column number of the last variable to calculate percentiles for.
`group`	The column number of the grouping variable, used when the data should be processed by group. If grouping is not required, leave it at the default (NULL); otherwise set `group` to the column number (position) of the grouping variable. If there is more than one grouping variable, combine them into a single grouping variable in advance.
`ncol`	The number of columns in the plot layout.
`diff`	The common difference between `quantile`'s `probs`. Default is 0.1.
`part`	Which percentiles to plot: the bottom part, the top part, or both. Default is 'both' (or 2) for both parts; set it to 'bottom' (or 0) for the bottom part only, or to 'top' (or 1) for the top part only.
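
The note on `group` about several grouping variables can be sketched in base R as follows. This is only an illustration: the data frame and its column names (`site`, `season`, `v1`, `v2`) are hypothetical, and `interaction()` is one possible way to build the combined group, not necessarily what the package author had in mind.

```r
# Hypothetical toy data: two grouping columns followed by two numeric variables
df <- data.frame(
  site   = rep(c("A", "B"), each = 4),
  season = rep(c("dry", "wet"), times = 4),
  v1     = c(1.2, 3.4, 2.2, 5.1, 0.9, 4.4, 3.3, 2.8),
  v2     = c(10, 12, 9, 15, 11, 14, 13, 8)
)

# Combine the two grouping columns into a single factor
df$site_season <- interaction(df$site, df$season, sep = "_", drop = TRUE)

# Keep the combined group in one column, followed by the variables to plot
df2 <- df[, c("site_season", "v1", "v2")]

# The combined group now sits in column 1 and the variables in columns 2 to 3,
# so a call would look like this (requires the dataprep package):
# percplot(df2, start = 2, end = 3, group = 1)
```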

Details

Four cases are considered, depending on the scales of the x and y axes, that is, on the ranges of the x and y values. For example, the condition sd(diff(log(as.numeric(as.character(names(data[, start:end])))))) / mean(diff(log(as.numeric(as.character(names(data[, start:end])))))) < 0.1 & max(data[, start:end], na.rm = T) / min(data[, start:end], na.rm = T) >= 10^3 means that the coefficient of variation of the lagged differences of log(x) is below 0.1 (the x values, taken from the column names, are roughly evenly spaced on a log scale) and that the maximum y is at least 1000 times the minimum y.
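
To make the condition above concrete, the following sketch evaluates it step by step on a small made-up data frame. The data, the variable names `dat`, `cv_logx` and `y_ratio`, and the choice of column names are assumptions for illustration only; the expression itself is the one quoted in this section.

```r
# Hypothetical data: the column names are numeric x values that are
# evenly spaced on a log scale (1, ~3.16, 10, ~31.6, 100)
x <- 10^seq(0, 2, length.out = 5)
dat <- as.data.frame(matrix(c(1, 5, 20, 300, 2000,
                              2, 8, 50, 700, 5000),
                            nrow = 2, byrow = TRUE))
names(dat) <- x
start <- 1
end <- 5

# Lagged differences of log(x), with x read back from the column names
logx_steps <- diff(log(as.numeric(as.character(names(dat[, start:end])))))

# Coefficient of variation of those differences (near 0 for log-even spacing)
cv_logx <- sd(logx_steps) / mean(logx_steps)

# Ratio of the largest to the smallest y value
y_ratio <- max(dat[, start:end], na.rm = TRUE) / min(dat[, start:end], na.rm = TRUE)

# Both parts of the condition quoted in the text
log_x_and_wide_y <- cv_logx < 0.1 & y_ratio >= 10^3
```

In this toy case the x values are log-evenly spaced and y spans more than three orders of magnitude, so both parts of the condition hold.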

Value

Top (highest or greatest) and bottom (lowest or smallest) percentiles are plotted.

`0th`	Quantile with `probs = 0`
`0.1th`	Quantile with `probs = 0.001`
`0.2th`	Quantile with `probs = 0.002`
`0.3th`	Quantile with `probs = 0.003`
`0.4th`	Quantile with `probs = 0.004`
`0.5th`	Quantile with `probs = 0.005`
`99.5th`	Quantile with `probs = 0.995`
`99.6th`	Quantile with `probs = 0.996`
`99.7th`	Quantile with `probs = 0.997`
`99.8th`	Quantile with `probs = 0.998`
`99.9th`	Quantile with `probs = 0.999`
`100th`	Quantile with `probs = 1`
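
The listed values can be reproduced with base R's `quantile`. The sketch below assumes that `diff` is expressed in percentage points (so the default 0.1 corresponds to steps of 0.001 in `probs`), which matches the table above but is not confirmed by the package source; the simulated vector `v` and the helper names are hypothetical.

```r
# Assumption: `diff` is in percentage points, as the table above suggests
diff_pct <- 0.1

# Percentiles for the bottom (0 to 0.5) and top (99.5 to 100) parts
pct <- c(seq(0, 0.5, by = diff_pct), seq(99.5, 100, by = diff_pct))
probs <- pct / 100

# Simulated right-skewed variable, for illustration only
set.seed(1)
v <- rexp(10000)

# Quantiles at the selected probabilities, labelled as in the table
q <- quantile(v, probs = probs, na.rm = TRUE)
names(q) <- paste0(pct, "th")
```

Changing `diff_pct` changes how many percentiles fall in each part; a value of 0.25, for example, would give 0, 0.25 and 0.5 at the bottom.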

Author(s)

Chun-Sheng Liang <liangchunsheng@lzu.edu.cn>

References

1. Example data is from https://smear.avaa.csc.fi/download. It includes particle number concentrations in SMEAR I Varrio forest.

2. Wickham, H. 2007. Reshaping data with the reshape package. Journal of Statistical Software, 21(12):1-20.

3. Wickham, H. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

4. Wickham, H. 2016. ggplot2: elegant graphics for data analysis. Springer-Verlag New York.

5. Wickham, H. 2017. scales: Scale Functions for Visualization. 0.5.0 ed. https://github.com/hadley/scales.

6. Wickham, H. & Seidel, D. 2019. scales: Scale Functions for Visualization. R package version 1.1.0. https://CRAN.R-project.org/package=scales.

Examples

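# Plot percentiles of columns 5 to 65 of data, grouped by column 4
percplot(data, 5, 65, 4)

# Plot percentiles of columns 3 to 7 of data1, grouped by column 2
percplot(data1, 3, 7, 2)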

[Package dataprep version 0.1.5 Index]