R: Calculate the top and bottom percentiles of each selected...

percdata {dataprep}

R Documentation

Calculate the top and bottom percentiles of each selected variable

Description

Outliers can be preliminarily checked by the calculated top and bottom percentiles. Basic R functions in packages from system library are used to get these percentiles of selected variables in data frames, instead of calling other packages. It saves time.

Usage

percdata(data, start = NULL, end = NULL, group = NULL, diff = 0.1, part = 'both')

Arguments

`data`	A data frame to calculate percentiles, from the column `start` to the column `end`.
`start`	The column number of the first variable to calculate percentiles for.
`end`	The column number of the last variable to calculate percentiles for.
`group`	The column number of the grouping variable. It can be selected according to whether the data needs to be processed in groups. If grouping is not required, leave it default (NULL); if grouping is required, set `group` as the column number (position) where the grouping variable is located. If there are more than one grouping variable, it can be turned into a longer group through combination and transformation in advance.
`diff`	The common difference between `quantile`'s `probs`. Default is 0.1.
`part`	The option of calculating bottom and/or top percentiles (parts). Default is 'both', or 2 for both bottom and top parts. Setting it as 'bottom' or 0 for bottom part and 'top' or 1 for top part.

Details

The data to be processed ranges from the column start to the last column end. The column numbers of these two columns are needed for the arguments. This requires that the variables of the data to be processed are arranged continuously in the database or table. Or else, it is necessary to move the columns in advance to make a continuous arrangement.

Value

Top (highest or greatest) and bottom (lowest or smallest) percentiles are calculated. According to the default diff (=0.1), the calculated values are as follows.

`0th`	Quantile with `probs = 0`
`0.1th`	Quantile with `probs = 0.001`
`0.2th`	Quantile with `probs = 0.002`
`0.3th`	Quantile with `probs = 0.003`
`0.4th`	Quantile with `probs = 0.004`
`0.5th`	Quantile with `probs = 0.005`
`99.5th`	Quantile with `probs = 0.995`
`99.6th`	Quantile with `probs = 0.996`
`99.7th`	Quantile with `probs = 0.997`
`99.8th`	Quantile with `probs = 0.998`
`99.9th`	Quantile with `probs = 0.999`
`100th`	Quantile with `probs = 1`

Author(s)

Chun-Sheng Liang <liangchunsheng@lzu.edu.cn>

References

1. Example data is from https://smear.avaa.csc.fi/download. It includes particle number concentrations in SMEAR I Varrio forest.

Examples

# Select the grouping variable and remaining variables after deletion by varidele.
# Column 4 ('monthyear') is the group and the fraction for varidele is 0.25.
# After extracting according to the result by varidele, the group is in the first column.
percdata(data[,c(4,27:61)],2,36,1)

[Package dataprep version 0.1.5 Index]