psel {rPref} | R Documentation |
Preference Selection
Description
Evaluates a preference on a given data set, i.e., returns the maximal elements of a data set for a given preference order.
Usage
psel(df, pref, ...)
psel.indices(df, pref, ...)
peval(pref, ...)
Arguments
df
A data frame or, for a grouped preference selection, a grouped data frame. See below for details.
pref
The preference order constructed via
...
Additional (optional) parameters for top(-level)-k selections:
Details
The difference between the three variants of the preference selection is:
The psel function returns a subset of the data set which contains the maxima according to the given preference.
The function psel.indices returns just the row indices of the maxima (except for top-k queries with show_level = TRUE, see top-k preference selection). Hence psel(df, pref) is equivalent to df[psel.indices(df, pref),] for non-grouped data frames.
Finally, peval does the same as psel, but assumes that pref has an associated data frame which is used for the preference selection. See base_pref for how base preferences are associated with data sets, or use assoc.df to explicitly associate a preference with a data frame.
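The relationship between the three variants can be checked directly. The following is a minimal sketch, assuming rPref is installed and using the built-in mtcars data set; the variable names are illustrative only and not part of the package.

```r
library(rPref)

# A Skyline preference: low mpg and low hp are both preferred.
p <- low(mpg) * low(hp)

# psel returns the maximal rows themselves ...
maxima <- psel(mtcars, p)

# ... while psel.indices returns only their row indices.
idx <- psel.indices(mtcars, p)

# For a non-grouped data frame, indexing with idx should select the same rows.
same_rows <- setequal(rownames(maxima), rownames(mtcars[idx, ]))

# peval needs a preference with an associated data frame (here set via df =).
p_assoc <- low(mpg, df = mtcars) * low(hp)
maxima_assoc <- peval(p_assoc)
```

Here same_rows compares row names only, so it does not depend on the order in which the maxima are returned.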
Top-k Preference Selection
For a given top value of k, the k best elements and their level values are returned. The level values are determined as follows:
All the maxima of a data set w.r.t. a preference have level 1.
The maxima of the remainder, i.e., the data set without the level 1 maxima, have level 2.
The n-th iteration of "Take the maxima from the remainder" returns tuples of level n.
By default, psel.indices does not return the level values. With show_level = TRUE, this function returns a data frame with the columns '.indices' and '.level'. Note that if none of the top-k values {top, at_least, top_level} is set, then all level values are equal to 1.
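As a rough illustration of the level output described above, the sketch below requests the 5 best tuples of mtcars together with their levels. It assumes rPref is installed; the preference and the value top = 5 are arbitrary choices for the example.

```r
library(rPref)

p <- low(mpg) * low(hp)

# Without any top-k argument, psel.indices returns the indices of the maxima.
idx <- psel.indices(mtcars, p)

# With top = 5 and show_level = TRUE, a data frame with the columns
# '.indices' and '.level' is returned instead.
res <- psel.indices(mtcars, p, top = 5, show_level = TRUE)
names(res)
```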
By definition, a top-k preference selection is non-deterministic. A top-1 query of two equivalent tuples (equivalence according to pref) can return either of these tuples. For example, a top = 1 preference selection on the tuples (a=1, b=1), (a=1, b=2) w.r.t. the low(a) preference can return either the 'b=1' or the 'b=2' tuple.
In contrast, a preference selection using at_least is deterministic: it adds all tuples having the same level as the worst level of the corresponding top-k query. That is, the result is filled with all tuples that are not worse than the top-k result.
A preference selection with top-level-k returns all tuples having level k or better.
If the top or at_least value is greater than the number of elements in df (i.e., nrow(df)), or top_level is greater than the highest level in df, then all elements of df will be returned without further warning.
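The difference between top, at_least and top_level can be tried on the two-tuple example from the text. This is a small sketch assuming rPref is installed; the data frame df and the result names are made up for illustration.

```r
library(rPref)

# The two tuples from the text: equivalent w.r.t. low(a), different in b.
df <- data.frame(a = c(1, 1), b = c(1, 2))

# top = 1 returns exactly one of the two tuples; which one is unspecified.
top1 <- psel(df, low(a), top = 1)

# at_least = 1 also returns every tuple on the same level, so both rows.
al1 <- psel(df, low(a), at_least = 1)

# top_level = 1 returns all tuples of level 1, i.e., the maxima.
lvl1 <- psel(df, low(a), top_level = 1)

# A top value above nrow(df) returns all rows without a warning.
all_rows <- psel(df, low(a), top = 10)

c(nrow(top1), nrow(al1), nrow(lvl1), nrow(all_rows))
```

Code that relies on a specific row being returned by a top = 1 query is therefore fragile; at_least or top_level avoids that dependence.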
Grouped Preference Selection
Using psel it is also possible to perform a preference selection where the maxima are calculated for every group separately. The groups have to be created with group_by from the dplyr package. The preference selection preserves the grouping, i.e., the groups are restored after the preference selection.
For example, if the summarize function from dplyr is applied to psel(group_by(...), pref), the summarizing is done for the set of maxima of each group. This can be used, e.g., to calculate the number of maxima in each group; see the examples below.
A {top, at_least, top_level} preference selection is applied to each group separately. A top = k selection returns the k best tuples for each group. Hence, if there are 3 groups in df, each containing at least 2 elements, and we have top = 2, then 6 tuples will be returned.
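The 3-groups case above can be reproduced with mtcars, which has three cyl groups with more than 2 rows each. The sketch assumes rPref and dplyr are installed; grouped, top2 and counts are illustrative names.

```r
library(rPref)
library(dplyr)

# Three groups: cyl = 4, 6 and 8.
grouped <- group_by(mtcars, cyl)

# The 2 best tuples w.r.t. low(mpg) are selected within each group.
top2 <- psel(grouped, low(mpg), top = 2)

# Since the grouping is preserved, summarise counts per group.
counts <- summarise(top2, n = n())
```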
Parallel Computation
On multi-core machines the preference selection can be run in parallel using a divide-and-conquer approach. Depending on the data set, this may be faster than a single-threaded computation. To activate parallel computation within rPref the following option has to be set:
options(rPref.parallel = TRUE)
If this option is not set, rPref will use single-threaded computation by default.
With the option rPref.parallel.threads the maximum number of threads can be specified. The default is the number of cores on your machine. To set the number of threads to 4, use:
options(rPref.parallel.threads = 4)
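To see whether the parallel mode pays off for a particular data set, both modes can be run side by side. The following sketch assumes rPref is installed and only checks that both modes select the same rows; the thread count of 2 is an arbitrary choice, and timing (e.g., with system.time) is left out because it depends on the machine and the data.

```r
library(rPref)

p <- high(mpg) * high(hp)

# Single-threaded reference result.
options(rPref.parallel = FALSE)
seq_res <- psel(mtcars, p)

# Parallel run, limited to 2 threads.
options(rPref.parallel = TRUE, rPref.parallel.threads = 2)
par_res <- psel(mtcars, p)

# Switch back to single-threaded computation.
options(rPref.parallel = FALSE)

# Both runs should select the same set of rows (row order is not compared).
same_set <- setequal(rownames(seq_res), rownames(par_res))
```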
See Also
See complex_pref on how to construct a Skyline preference.
Examples
# Skyline and top-k/at-least Skyline
psel(mtcars, low(mpg) * low(hp))
psel(mtcars, low(mpg) * low(hp), top = 5)
psel(mtcars, low(mpg) * low(hp), at_least = 5)
# Preference with associated data frame and evaluation
p <- low(mpg, df = mtcars) * (high(cyl) & high(gear))
peval(p)
# Visualizes the Skyline in a plot.
sky1 <- psel(mtcars, high(mpg) * high(hp))
plot(mtcars$mpg, mtcars$hp)
points(sky1$mpg, sky1$hp, lwd=3)
# Grouped preference with dplyr.
library(dplyr)
psel(group_by(mtcars, cyl), low(mpg))
# Returns the size of each maxima group.
summarise(psel(group_by(mtcars, cyl), low(mpg)), n())