rmshoulders {kpeaks}    R Documentation

Shoulders Removal in Frequency Polygons

Description

Removes the shoulders around the main peaks in a frequency polygon.

Usage

rmshoulders(xm, xc, trmethod, tv)

Arguments

xm

a numeric vector containing the middle values of peaks of a frequency polygon.

xc

an integer vector containing the frequencies of peaks of a frequency polygon.

trmethod

a string specifying the shoulder removal option used to compute the threshold value. The default method is usr. The alternatives are sd, q1, q3, iqr, avg and med. These methods compute the threshold distance value from a statistic of the distances between the middle values of successive peaks in the vector xm.

  • sd uses the standard deviation.

  • q1 uses the first quartile (Q1).

  • q3 uses the third quartile (Q3).

  • iqr uses the interquartile range (IQR).

  • avg uses the arithmetic mean.

  • med uses the median.

  • usr uses a user-specified number.

tv

a numeric value to be used as the threshold distance for deciding which peaks are shoulders. The default threshold is 1 if the removal method usr is chosen. Depending on the selected removal method, tv equals:

  • one standard deviation if trmethod is sd,

  • the first quartile if trmethod is q1,

  • the third quartile if trmethod is q3,

  • one quarter of the interquartile range if trmethod is iqr,

  • the arithmetic mean if trmethod is avg,

  • the median if trmethod is med,

  • a user-specified number if trmethod is usr.
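
The list above can be illustrated with base R. This is only a sketch: the peak positions are made-up values, the name tv_candidates is hypothetical, and the package may compute these statistics differently internally (for example, with another quantile type).

```r
# Hypothetical middle values of peaks (not taken from the package)
xm <- c(10, 12, 20, 30, 33, 50)

# Distances between the middle values of successive peaks
d <- diff(xm)

# One candidate threshold distance per trmethod option described above
# (quantile() and IQR() use R's default quantile type here, an assumption)
tv_candidates <- c(
  sd  = sd(d),
  q1  = unname(quantile(d, 0.25)),
  q3  = unname(quantile(d, 0.75)),
  iqr = IQR(d) / 4,
  avg = mean(d),
  med = median(d)
)
print(tv_candidates)
```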

Details

Literally, a shoulder peak, or simply a shoulder, is a secondary peak located close to the main peak of a mountain, either before or after it. In a frequency polygon, a shoulder is a smaller peak that lies so close to a higher peak that the valley between them is not obvious. Shoulders may arise from random noise or from selecting a high number of classes when building the histogram. It is usually desirable to remove them from the peaks vector of a frequency polygon. In 'kpeaks', a peak is considered a shoulder when its height is smaller than the height of its neighbor peak and its distance to that neighbor is lower than a threshold distance value. Seven options for computing the threshold distance value are proposed, as listed in the Arguments section. The options q1 and iqr can be applied to remove minor shoulders that are very near the main peaks, while q3 is recommended for eliminating substantial shoulders in the processed frequency polygon. The remaining options may be more efficient for removing moderate shoulders.
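
The shoulder rule described above can be sketched in base R as follows. This is not the package's implementation: drop_shoulders_sketch is a hypothetical helper, the input values are made up, and the sketch makes a single pass that compares each peak only with its immediate neighbors in the original vectors.

```r
# Hypothetical helper illustrating the rule: a peak is a shoulder when it is
# lower than a neighboring peak and closer to it than the threshold tv.
drop_shoulders_sketch <- function(xm, xc, tv) {
  n <- length(xm)
  keep <- rep(TRUE, n)
  for (i in seq_len(n)) {
    # Compare with the previous and next peaks, where they exist
    for (j in c(i - 1, i + 1)) {
      if (j >= 1 && j <= n) {
        if (xc[i] < xc[j] && abs(xm[i] - xm[j]) < tv) keep[i] <- FALSE
      }
    }
  }
  # Mirror the structure described in the Value section
  list(pm = data.frame(pvalues = xm[keep], pfreqs = xc[keep]),
       np = sum(keep))
}

# Made-up peaks: the peaks at 10 and 33 sit close to taller neighbors
xm <- c(10, 12, 20, 30, 33, 50)
xc <- c(5, 20, 8, 25, 9, 30)
res <- drop_shoulders_sketch(xm, xc, tv = 4)
print(res)
```

With these inputs, the peaks at 10 and 33 are dropped and four peaks remain.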

Value

pm

a data frame with two columns whose names are pvalues and pfreqs for the middle values and the frequencies of the peaks after removal process, respectively.

np

an integer representing the number of peaks after removal of the shoulders.

Note

The function rmshoulders should normally be called with the input values returned by the function findpolypeaks.

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Cebeci, Z. & Cebeci, C. (2018). "A novel technique for fast determination of K in partitioning cluster analysis", Journal of Agricultural Informatics, 9(2), 1-11. doi: 10.17700/jai.2018.9.2.442.

Cebeci, Z. & Cebeci, C. (2018). "kpeaks: An R Package for Quick Selection of K for Cluster Analysis", In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), IEEE. doi: 10.1109/IDAP.2018.8620896.

See Also

findpolypeaks, plotpolygon, genpolygon

Examples

# Build a data vector with three peaks
x1 <- rnorm(100, mean=20, sd=5)
x2 <- rnorm(50, mean=50, sd=5)
x3 <- rnorm(150, mean=90, sd=10)
x <- c(x1, x3, x2)

# generate the frequency polygon and histogram of x using Doane's rule
hvals <- genpolygon(x, binrule="doane")
plotpolygon(x, nbins=hvals$nbins, ptype="p")

# find the peaks in frequency polygon of x by using the default threshold frequency
resfpp <- findpolypeaks(xm=hvals$mids, xc=hvals$freqs)
print(resfpp)

# remove the shoulders with the threshold distance option 'avg'
resrs <- rmshoulders(resfpp$pm[,1], resfpp$pm[,2], trmethod = "avg")
print(resrs)

# remove the shoulders with the threshold distance option 'iqr'
resrs <- rmshoulders(resfpp$pm[,1], resfpp$pm[,2], trmethod = "iqr")
print(resrs)

data(x5p4c)
# plot the frequency polygon and histogram of p2 in x5p4c data set
hvals <- genpolygon(x5p4c$p2, binrule="usr", nbins=30)
plotpolygon(x5p4c$p2, nbins=hvals$nbins, ptype="ph")

# find the peaks in frequency polygon of p2 
resfpp <- findpolypeaks(xm=hvals$mids, xc=hvals$freqs, tcmethod = "min")
print(resfpp)

# remove the shoulders with threshold distance option 'q1'
resrs <- rmshoulders(resfpp$pm[,1], resfpp$pm[,2], trmethod = "q1")
print(resrs)

## Not run: 
data(iris)
# plot the frequency polygon and histogram of Petal.Length in iris data set 
# by using a user-defined class number 
hvals <- genpolygon(iris$Petal.Length, binrule="usr", nbins=30)
plotpolygon(iris$Petal.Length, nbins=hvals$nbins, ptype="p")

# find the peaks in frequency polygon of Petal.Length with default 
# threshold frequency value
resfpp <- findpolypeaks(xm=hvals$mids, xc=hvals$freqs)
print(resfpp)

# remove the shoulders with threshold option 'med'
resrs <- rmshoulders(resfpp$pm[,1], resfpp$pm[,2], trmethod = "med")
print(resrs)

## End(Not run)

[Package kpeaks version 1.1.0 Index]