fnth-fmedian {collapse} | R Documentation
Fast (Grouped, Weighted) N'th Element/Quantile for Matrix-Like Objects
Description
fnth (column-wise) returns the n'th smallest element from a set of unsorted elements x corresponding to an integer index (n), or to a probability between 0 and 1. If n is passed as a probability, ties can be resolved using the lower, upper, or average of the possible elements, or, since v1.9.0, continuous quantile estimation. The new default is quantile type 7 (as in quantile). For n > 1, the lower element is always returned (as in sort(x, partial = n)[n]). See Details.
fmedian is a simple wrapper around fnth, which fixes n = 0.5 and (default) ties = "mean", i.e. it averages eligible elements. See Details.
Usage
fnth(x, n = 0.5, ...)
fmedian(x, ...)
## Default S3 method:
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
use.g.names = TRUE, ties = "q7", nthreads = .op[["nthreads"]],
o = NULL, check.o = is.null(attr(o, "sorted")), ...)
## Default S3 method:
fmedian(x, ..., ties = "mean")
## S3 method for class 'matrix'
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
use.g.names = TRUE, drop = TRUE, ties = "q7", nthreads = .op[["nthreads"]], ...)
## S3 method for class 'matrix'
fmedian(x, ..., ties = "mean")
## S3 method for class 'data.frame'
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
use.g.names = TRUE, drop = TRUE, ties = "q7", nthreads = .op[["nthreads"]], ...)
## S3 method for class 'data.frame'
fmedian(x, ..., ties = "mean")
## S3 method for class 'grouped_df'
fnth(x, n = 0.5, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE, stub = .op[["stub"]],
ties = "q7", nthreads = .op[["nthreads"]], ...)
## S3 method for class 'grouped_df'
fmedian(x, w = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE, stub = .op[["stub"]],
ties = "mean", nthreads = .op[["nthreads"]], ...)
Arguments
x |
a numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').
n |
the element to return using a single integer index such that
g |
a factor,
w |
a numeric vector of (non-negative) weights, may contain missing values only where
TRA |
an integer or quoted operator indicating the transformation to perform:
0 - "na" | 1 - "fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See
na.rm |
logical. Skip missing values in
use.g.names |
logical. Make group-names and add to the result as names (default method) or row-names (matrix and data frame methods). No row-names are generated for data.table's.
ties |
an integer or character string specifying the method to resolve ties between adjacent qualifying elements:
nthreads |
integer. The number of threads to utilize. Parallelism is across groups for grouped computations on vectors and data frames, and at the column-level otherwise. See Details.
o |
integer. A valid ordering of
check.o |
logical.
drop |
matrix and data.frame method: Logical.
keep.group_vars |
grouped_df method: Logical.
keep.w |
grouped_df method: Logical. Retain
stub |
character. If
... |
for
Details
For v1.9.0, fnth was completely rewritten in C and offers significantly enhanced speed and functionality. It uses a combination of quickselect, quicksort, and radixsort algorithms, combined with several (weighted) quantile estimation methods and, where possible, OpenMP multithreading. This synthesis can be summarised as follows:
Without weights, quickselect is used to determine a (lower) order statistic. If ties %!in% c("min", "max"), a second order statistic is found by taking the min of the upper part of the partitioned array, and the two statistics are averaged using a simple mean (ties = "mean"), or a weighted average according to a quantile method (ties = "q5"-"q9"). For n = 0.5, all supported quantile methods give the sample median. With matrices, multithreading is always across columns; for vectors and data frames it is across groups, unless is.null(g) for data frames (in which case it is across columns).
With weights and no groups (is.null(g)), radixorder is called internally (on each column of x). The ordering is used to sum the weights in order of x and determine weighted order statistics or quantiles. See details below. Multithreading is disabled as radixorder cannot be called concurrently on the same memory stack.
With weights and groups (!is.null(g)), R's quicksort algorithm is used to sort the data in each group and return an index which can be used to sum the weights in order and proceed as before. This is multithreaded across columns for matrices, and across groups otherwise.
In fnth.default, an ordering of x can be supplied to 'o', e.g. fnth(x, 0.75, o = radixorder(x)). This dramatically speeds up the estimation both with and without weights, and is useful if fnth is to be invoked repeatedly on the same data. With groups, o needs to also account for the grouping, e.g. fnth(x, 0.75, g, o = radixorder(g, x)). Multithreading is possible across groups. See Examples.
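The unweighted case can be illustrated with a small base R sketch. It is not the C implementation: sort(x, partial = k) stands in for quickselect, the variable names are hypothetical, and the index rule as.integer(p*(N-1))+1 is taken from the n > 1 discussion in this section and assumed here to locate the lower order statistic.

```r
# Base R sketch of the two-order-statistic idea (illustrative only).
x <- c(12, 3, 8, 20, 5, 15)
p <- 0.5
n_obs <- length(x)

pos <- p * (n_obs - 1)             # zero-based fractional position
k <- as.integer(pos) + 1L          # index of the lower order statistic
h <- pos - (k - 1L)                # interpolation weight used by quantile type 7

xp <- sort(x, partial = k)         # xp[k] is in its sorted position
lower <- xp[k]                     # lower order statistic
upper <- min(xp[(k + 1L):n_obs])   # next order statistic: smallest element above position k

tie_mean <- mean(c(lower, upper))        # analogue of ties = "mean"
tie_q7 <- lower + h * (upper - lower)    # analogue of ties = "q7"
```

For p = 0.5 and an even number of observations the interpolation weight is 0.5, so both results coincide with median(x) and with quantile(x, 0.5, type = 7).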
If n > 1, the result is equivalent to (column-wise) sort(x, partial = n)[n]. Internally, n is converted to a probability using p = (n-1)/(NROW(x)-1), and that probability is applied to the set of non-missing elements to find the as.integer(p*(fnobs(x)-1))+1L'th element (which corresponds to option ties = "min").
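The conversion can be traced by hand in base R. The following is a minimal sketch of the two formulas above, with sum(!is.na(x)) standing in for fnobs(x); the data and variable names are made up for illustration.

```r
# Tracing the index-to-probability conversion on a vector with missing values.
x <- c(7, 3, NA, 9, 1, 5, NA, 8)   # 8 elements, 6 of them non-missing
n <- 4L                            # requested integer index

p <- (n - 1) / (length(x) - 1)     # p = (n-1)/(NROW(x)-1)
nobs <- sum(!is.na(x))             # stands in for fnobs(x)
k <- as.integer(p * (nobs - 1)) + 1L   # position among the non-missing elements

x_obs <- x[!is.na(x)]
result <- sort(x_obs, partial = k)[k]  # k'th smallest non-missing value
```

Here p = 3/7, so the 4th of 8 positions maps to the 3rd of the 6 non-missing values, which is 5.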
When using grouped computations with n > 1, n is transformed to a probability p = (n-1)/(NROW(x)/ng-1) (where ng contains the number of unique groups in g).
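With unequal group sizes, the same integer n can therefore point to different positions in different groups. The base R sketch below computes p from the formula above and then, as an assumption, applies the ungrouped rule as.integer(p*(N-1))+1 within each group; the data and names are hypothetical.

```r
# Grouped conversion of an integer n to a probability (illustrative only).
x <- c(4, 9, 2, 7, 5, 1,  8, 3, 6, 10, 12,  11)
g <- c(rep("a", 6), rep("b", 5), "c")   # group sizes 6, 5 and 1
n <- 3L

ng <- length(unique(g))                 # number of groups
p <- (n - 1) / (length(x) / ng - 1)     # p = (n-1)/(NROW(x)/ng-1)

# Assumption: p is applied per group using the ties = "min" index rule
nth_by_group <- vapply(split(x, g), function(v) {
  k <- as.integer(p * (length(v) - 1)) + 1L
  sort(v)[k]
}, numeric(1))
```

In this example p = 2/3, which selects the 4th element in group "a", the 3rd in group "b" and the only element in group "c".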
If weights are used and ties = "q5"-"q9", weighted continuous quantile estimation is done as described in fquantile.
For ties %in% c("mean", "min", "max"), a target partial sum of weights p*sum(w) is calculated, and the weighted n'th element is the element k such that all elements smaller than k have a sum of weights <= p*sum(w), and all elements larger than k have a sum of weights <= (1 - p)*sum(w). If the partial sum of weights (p*sum(w)) is reached exactly for some element k, then (summing from the lower end) both k and k+1 would qualify as the weighted n'th element. If the weight of element k+1 is zero, k, k+1 and k+2 would qualify, and so on. If n > 1, k is chosen (consistent with the unweighted behavior). If 0 < n < 1, the ties option regulates how to resolve such conflicts, yielding lower (ties = "min": k), upper (ties = "max": k+2) or average weighted (ties = "mean": mean(k, k+1, k+2)) n'th elements.
Thus, in the presence of zero weights, the weighted median (default ties = "mean") can be an arithmetic average of >2 qualifying elements. Users may prefer a quantile based weighted median by setting ties = "q5"-"q9", which is a continuous function of p and ignores elements with zero weights.
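The qualification rule for ties %in% c("mean", "min", "max") can be written out in a few lines of base R. This is a sketch of the rule as stated in this section, not the package's C code; weighted_nth is a hypothetical helper, and it assumes distinct values of x and no missing values.

```r
# Hypothetical helper: weighted n'th element via the partial-sum rule.
weighted_nth <- function(x, w, p, ties = c("mean", "min", "max")) {
  ties <- match.arg(ties)
  o <- order(x)
  x <- x[o]
  w <- w[o]
  sw <- sum(w)
  below <- cumsum(w) - w        # weight of all elements smaller than each element
  above <- sw - cumsum(w)       # weight of all elements larger than each element
  ok <- below <= p * sw & above <= (1 - p) * sw
  cand <- x[ok]                 # all qualifying elements
  switch(ties, min = min(cand), max = max(cand), mean = mean(cand))
}

x <- 1:5
w <- c(1, 1, 0, 2, 0)           # target p * sum(w) = 2 is reached exactly at x = 2

lo  <- weighted_nth(x, w, 0.5, "min")
hi  <- weighted_nth(x, w, 0.5, "max")
avg <- weighted_nth(x, w, 0.5, "mean")
```

Because x = 3 carries zero weight, the elements 2, 3 and 4 all qualify, so the three options return 2, 4 and 3 respectively: the "mean" result averages three elements rather than two.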
For data frames, column-attributes and overall attributes are preserved if g is used or drop = FALSE.
Value
The (w weighted) n'th element/quantile of x, grouped by g, or (if TRA is used) x transformed by its (grouped, weighted) n'th element/quantile.
See Also
fquantile, fmean, fmode, Fast Statistical Functions, Collapse Overview
Examples
## default vector method
mpg <- mtcars$mpg
fnth(mpg) # Simple nth element: Median (same as fmedian(mpg))
fnth(mpg, 5) # 5th smallest element
sort(mpg, partial = 5)[5] # Same using base R, fnth is 2x faster.
fnth(mpg, 0.75) # Third quartile
fnth(mpg, 0.75, w = mtcars$hp) # Weighted third quartile: Weighted by hp
fnth(mpg, 0.75, TRA = "-") # Simple transformation: Subtract third quartile
fnth(mpg, 0.75, mtcars$cyl) # Grouped third quartile
fnth(mpg, 0.75, mtcars[c(2,8:9)]) # More groups..
g <- GRP(mtcars, ~ cyl + vs + am) # Precomputing groups gives more speed !
fnth(mpg, 0.75, g)
fnth(mpg, 0.75, g, mtcars$hp) # Grouped weighted third quartile
fnth(mpg, 0.75, g, TRA = "-") # Groupwise subtract third quartile
fnth(mpg, 0.75, g, mtcars$hp, "-") # Groupwise subtract weighted third quartile
## data.frame method
fnth(mtcars, 0.75)
head(fnth(mtcars, 0.75, TRA = "-"))
fnth(mtcars, 0.75, g)
fnth(fgroup_by(mtcars, cyl, vs, am), 0.75) # Another way of doing it..
fnth(mtcars, 0.75, g, use.g.names = FALSE) # No row-names generated
## matrix method
m <- qM(mtcars)
fnth(m, 0.75)
head(fnth(m, 0.75, TRA = "-"))
fnth(m, 0.75, g) # etc..
## method for grouped data frames - created with dplyr::group_by or fgroup_by
mtcars |> fgroup_by(cyl,vs,am) |> fnth(0.75)
mtcars |> fgroup_by(cyl,vs,am) |> fnth(0.75, hp) # Weighted
mtcars |> fgroup_by(cyl,vs,am) |> fnth(0.75, TRA = "/") # Divide by third quartile
mtcars |> fgroup_by(cyl,vs,am) |> fselect(mpg, hp) |> # Faster selecting
  fnth(0.75, hp, "/") # Divide mpg by its third weighted group-quartile, using hp as weights
# Efficient grouped estimation of multiple quantiles
mtcars |> fgroup_by(cyl,vs,am) |>
  fmutate(o = radixorder(GRPid(), mpg)) |>
  fsummarise(mpg_Q1 = fnth(mpg, 0.25, o = o),
             mpg_median = fmedian(mpg, o = o),
             mpg_Q3 = fnth(mpg, 0.75, o = o))
## fmedian()
fmedian(mpg) # Simple median value
fmedian(mpg, w = mtcars$hp) # Weighted median: Weighted by hp
fmedian(mpg, TRA = "-") # Simple transformation: Subtract median value
fmedian(mpg, mtcars$cyl) # Grouped median value
fmedian(mpg, mtcars[c(2,8:9)]) # More groups..
fmedian(mpg, g)
fmedian(mpg, g, mtcars$hp) # Grouped weighted median
fmedian(mpg, g, TRA = "-") # Groupwise subtract median value
fmedian(mpg, g, mtcars$hp, "-") # Groupwise subtract weighted median value
## data.frame method
fmedian(mtcars)
head(fmedian(mtcars, TRA = "-"))
fmedian(mtcars, g)
fmedian(fgroup_by(mtcars, cyl, vs, am)) # Another way of doing it..
fmedian(mtcars, g, use.g.names = FALSE) # No row-names generated
## matrix method
fmedian(m)
head(fmedian(m, TRA = "-"))
fmedian(m, g) # etc..
## method for grouped data frames - created with dplyr::group_by or fgroup_by
mtcars |> fgroup_by(cyl,vs,am) |> fmedian()
mtcars |> fgroup_by(cyl,vs,am) |> fmedian(hp) # Weighted
mtcars |> fgroup_by(cyl,vs,am) |> fmedian(TRA = "-") # De-median
mtcars |> fgroup_by(cyl,vs,am) |> fselect(mpg, hp) |> # Faster selecting
  fmedian(hp, "-") # Weighted de-median mpg, using hp as weights