Extremes {DescTools} | R Documentation |
Find the k smallest or largest values of a vector x
and return them, optionally together with their frequencies.
Small(x, k = 5, unique = FALSE, na.last = NA)
Large(x, k = 5, unique = FALSE, na.last = NA)
HighLow(x, nlow = 5, nhigh = nlow, na.last = NA)
x |
a |
k |
an integer > 0 defining how many extreme values should be returned. Default is 5. |
unique |
logical, defining if unique values should be considered or not. If this is set to |
na.last |
for controlling the treatment of |
nlow |
a single integer. The number of the smallest elements of a vector to be printed. Defaults to 5. |
nhigh |
a single integer. The number of the greatest elements of a vector to be printed. Defaults to nlow. |
At first sight this does not seem to be a difficult problem: we could simply tabulate and sort the vector and then take the first or last k values. However, sorting and tabulating the whole vector when we are only interested in the few smallest values is a considerable waste of resources, and this approach already becomes impractical for medium vector lengths (~10^5). Several approaches to this problem have been discussed elsewhere. The present implementation is based on highly efficient C++ code and has proved to be very fast.
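The difference between sorting everything and looking only at the ends can be sketched in base R. This is an illustration only, not the C++ code used by the package: the helper names small_full and small_partial are hypothetical, and the partial sort offered by base R's sort() stands in for the package's own technique.

```r
# Hypothetical helpers, base R only; NOT the DescTools implementation.

# Naive approach: sort the whole vector, then keep the first k values.
small_full <- function(x, k = 5) {
  head(sort(x), k)                      # sort() drops NAs by default
}

# Cheaper approach: only positions 1..k are placed correctly.
small_partial <- function(x, k = 5) {
  x <- x[!is.na(x)]                     # drop NAs explicitly
  k <- min(k, length(x))
  idx <- seq_len(k)
  sort(x, partial = idx)[idx]           # partial sort on the first k positions
}

set.seed(1)
x <- runif(1e5)
identical(small_full(x, 3), small_partial(x, 3))
```

Both helpers return the same values; the second one avoids ordering the part of the vector that is never reported.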
HighLow combines the two functions above and reports the nlow smallest and nhigh largest values together with their frequencies in parentheses. It is used for describing univariate variables and is useful for checking the ends of the vector, where wrong values often accumulate in real data. In essence, this is a printing routine for the highest and the lowest values of x.
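The idea of reporting both ends with frequencies in parentheses can be sketched as follows. The function format_ends and the "lowest"/"highest" labels are assumptions made for illustration; the exact layout printed by HighLow may differ, and this sketch sorts all distinct values instead of using the fast routines described above.

```r
# Hypothetical formatter, base R only; the real HighLow output may look different.
format_ends <- function(x, nlow = 5, nhigh = nlow) {
  u <- sort(unique(x))                               # distinct values ascending, NAs dropped
  freq <- tabulate(match(x, u), nbins = length(u))   # count per distinct value
  lab <- paste0(u, " (", freq, ")")                  # value followed by its frequency
  paste0("lowest : ", paste(head(lab, nlow), collapse = ", "), "\n",
         "highest: ", paste(tail(lab, nhigh), collapse = ", "), "\n")
}

cat(format_ends(c(1, 1, 2, 5, 9, 9, 9), nlow = 2))
```

As with HighLow, the result is a character string, so it is passed to cat() for printing.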
If unique is set to FALSE: a vector with the k most extreme values;
else: a list containing the k most extreme values and their frequencies.
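The two return shapes can be illustrated with a small base R sketch. The function k_smallest and the list element names values and freq are hypothetical; the element names used by Small and Large are not stated here and may differ.

```r
# Hypothetical sketch of the two return shapes; NOT the package code.
k_smallest <- function(x, k = 5, unique = FALSE) {
  x <- x[!is.na(x)]                     # ignore missing values
  if (!unique) {
    return(head(sort(x), k))            # plain vector, ties are repeated
  }
  u <- head(sort(unique(x)), k)         # k smallest distinct values
  list(values = u,                      # element names are assumptions
       freq = tabulate(match(x, u), nbins = length(u)))
}

x <- c(3, 1, 2, 1, 3, 3, NA)
k_smallest(x, 2)                  # vector of the 2 smallest values
k_smallest(x, 2, unique = TRUE)   # list of distinct values and their counts
```

With unique = FALSE a tied value can occupy several of the k slots, whereas with unique = TRUE each distinct value appears once and its multiplicity moves into the frequency component.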
Andri Signorell <andri@signorell.net>
C++ parts by Nathan Russell and Romain Francois
https://gallery.rcpp.org/articles/top-elements-from-vectors-using-priority-queue/
x <- sample(1:10, 1000, replace=TRUE)
Large(x, 3)
Large(x, k=3, unique=TRUE)
# works fine for vectors up to length ~ 1e6
x <- runif(1000000)
Small(x, 3, unique=TRUE)
Small(x, 3, unique=FALSE)
# Both ends
cat(HighLow(d.pizza$temperature, na.last=NA))