percdata {dataprep}    R Documentation

Calculate the top and bottom percentiles of each selected variable

Description

The top and bottom percentiles of a variable give a preliminary check for outliers. This function calculates these percentiles for selected variables in a data frame using only functions from packages in R's system library, without calling other packages, which saves time.

Usage

percdata(data, start = NULL, end = NULL, group = NULL, diff = 0.1, part = 'both')

Arguments

data

A data frame containing the variables for which percentiles are calculated, located in the columns from start to end.

start

The column number of the first variable to calculate percentiles for.

end

The column number of the last variable to calculate percentiles for.

group

The column number of the grouping variable. If the data do not need to be processed in groups, leave it at the default (NULL). If grouping is required, set group to the column number (position) of the grouping variable. If there is more than one grouping variable, combine them into a single grouping variable in advance.

diff

The common difference between successive percentiles, in percentile points. Default is 0.1, which corresponds to a step of 0.001 in the probs of quantile.

part

Which percentiles (parts) to calculate. The default, 'both' (or 2), calculates both the bottom and the top percentiles. Set it to 'bottom' (or 0) for the bottom part only, or to 'top' (or 1) for the top part only.
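
When there are several grouping variables, they can be merged into one column before the call, as the description of group suggests. The following is a minimal sketch in base R. The data frame df and its column names are invented for illustration, and whether percdata expects a character or a factor group is not stated here, so the choice of a character column is an assumption.

```r
# Hypothetical data with two grouping variables (site, month).
df <- data.frame(
  site  = c("A", "A", "B", "B"),
  month = c("Jan", "Feb", "Jan", "Feb"),
  x1    = c(1.2, 3.4, 5.6, 7.8),
  x2    = c(10, 20, 30, 40)
)

# Combine the two grouping variables into a single column.
df$sitemonth <- paste(df$site, df$month, sep = "_")

# Put the combined group first, followed by the variables.
df2 <- df[, c("sitemonth", "x1", "x2")]
```

With this layout the call would take the form percdata(df2, start = 2, end = 3, group = 1).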

Details

The variables to be processed are those in the columns from start to end, and the column numbers of these two columns are passed as arguments. This requires the variables to occupy consecutive columns in the data frame. If they do not, move the columns in advance so that they are contiguous.
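
If the variables of interest are not in consecutive columns, they can be reordered first. The sketch below uses base R only. The data frame df and the names v1, v2, v3 and id are hypothetical.

```r
# Hypothetical data: the variables v1, v2, v3 are separated by 'id'.
df <- data.frame(
  v1 = c(1, 2, 3),
  id = c("x", "y", "z"),
  v2 = c(4, 5, 6),
  v3 = c(7, 8, 9)
)

vars   <- c("v1", "v2", "v3")          # variables to process
others <- setdiff(names(df), vars)     # all remaining columns

# Move the variables to the end so that they are contiguous.
df_contig <- df[, c(others, vars)]

# Column numbers to pass as start and end.
start <- match(vars[1], names(df_contig))
end   <- match(vars[length(vars)], names(df_contig))
```

Here start and end can then be passed to percdata together with df_contig.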

Value

The top (highest) and bottom (lowest) percentiles of each selected variable. With the default diff = 0.1, the following values are calculated.

0th

Quantile with probs = 0

0.1th

Quantile with probs = 0.001

0.2th

Quantile with probs = 0.002

0.3th

Quantile with probs = 0.003

0.4th

Quantile with probs = 0.004

0.5th

Quantile with probs = 0.005

99.5th

Quantile with probs = 0.995

99.6th

Quantile with probs = 0.996

99.7th

Quantile with probs = 0.997

99.8th

Quantile with probs = 0.998

99.9th

Quantile with probs = 0.999

100th

Quantile with probs = 1
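
The probs listed above can be rebuilt from diff with base R, which may help when checking the result by hand. This is an illustrative sketch, not the package's implementation: the helper perc_probs and its width argument are hypothetical, the span of 0.5 percentile points on each side is inferred from the list above, and the quantile type used by percdata is not stated here, so values from quantile's default type may differ slightly.

```r
# Hypothetical helper: probs for the bottom and top percentiles.
# diff and width are given in percentile points.
perc_probs <- function(diff = 0.1, width = 0.5) {
  bottom <- seq(0, width, by = diff) / 100   # 0, 0.001, ..., 0.005
  top    <- 1 - rev(bottom)                  # 0.995, ..., 1
  c(bottom, top)
}

probs <- perc_probs()

# Apply the probs to a simulated variable with base quantile().
set.seed(1)
x <- rnorm(1000)
q <- quantile(x, probs = probs, names = FALSE)
```

The first and last elements of q are the minimum and maximum of x, since probs = 0 and probs = 1 correspond to the 0th and 100th percentiles.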

Author(s)

Chun-Sheng Liang <liangchunsheng@lzu.edu.cn>

References

1. Example data is from https://smear.avaa.csc.fi/download. It includes particle number concentrations in SMEAR I Varrio forest.

See Also

dataprep::percplot

Examples

# Select the grouping variable and the variables that remain after deletion
# by varidele (with a fraction of 0.25).
# Column 4 ('monthyear') of the original data is the grouping variable.
# After this selection, the group is in column 1 and the variables are in
# columns 2 to 36.
percdata(data[, c(4, 27:61)], start = 2, end = 36, group = 1)

[Package dataprep version 0.1.5 Index]