TopKSignal {TopKSignal}R Documentation

TopKSignal: A convex optimization tool for signal reconstruction from multiple ranked lists.

Description

A mathematical optimization procedure in combination with statistical bootstrap for the estimation of the latent signals (sometimes called scores) informing the global consensus ranking (often named aggregation ranking). When using TopKSignal in your work please cite: Schimek, M. G. et al. (2024). Effective signal reconstruction from multiple ranked lists via convex optimization. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-023-00991-z. The goal of estimating consensus signals and therefrom consensus ranks (an alternative form of aggregation ranks) across a number of assessors (humans or machines) is achieved via indirect inference. The input rank matrix is fully represented by order constraints. No distance measures or distributional assumptions are involved. The indirect inference procedure is built around a simple signal plus noise model. TopKSignal implements a set of different functions. They permit to construct artificial ranked lists, to derive sets of constraints from an input rank matrix, to run convex optimization (with a quadratic or a linear objective function), to perform bootstrap estimation (standard or Poisson bootstrap), and to produce numerical and graphical output. Different mathematical optimization techniques are available: Optimization with the full set of constraints or with a computationally cheaper restricted set of constraints in combination with either a quadratic or a linear objective function. Different boostrap sample schemes are available: the classical bootstrap and the computationally less demanding Poisson bootstrap.

estimateTheta function

The main function for the estimation of the signals informing the ranks is called estimateTheta(). The required parameters are: (1) a rank matrix, (2) the number of bootstrap samples (500 is recommended), (3) a constant for the support variables \(b>0\), default is 0.1, (4) the type of optimization technique: fullLinear, fullQuadratic, restrictedLinear, and restrictedQuadratic (the latter two recommended), (5) the type of bootstrap sampling scheme: classic.bootstrap and poisson.bootstrap (recommended), and (6) the number of cores for parallel computation. Each bootstrap sample is executed on a dedicated CPU core.

generate.rank.matrix function

The generate.rank.matrix() function requires the user to specify the number of objects (items), called p, and the number of assessors, called n. The function simulates full ranked lists (i.e. no missing assignments) without ties.

violinPlot function

The violin plot displays the bootstrap distribution of the estimated signals along with its means. The deviations from the mean values +/-2 standard errors SE and are shown in the plot. Analyzing the shape of the distribution and the standard error of the signal of each object, it is possible to evaluate its rank stability with respect to all other objects. The violinPlot function requires (1) the result obtained by the estimation procedure and (2) the 'true' (simulated) signals or ground truth (when available).

heatmapPlot function

The heatmap plot allows us to control for specific error patterns associated with the assessors. The heatmap plot displays information about the noises involved in the estimation process. The rows of the noise matrix are ordered by the estimated ranks of the consensus signal values. The columns are ordered by the column error sums. In the plot, the column with the lowest sum is positioned on the left side and the column with the highest sum is positioned on the right side. Hence, assessors positioned on the left show substantial consensus and thus are more reliable than those positioned to the far right. The heatmap plot is also an exploratory tool for the search for a subset of top-ranked objects (notion of top-$k$ objects ? see the package TopKLists on CRAN for details and functions). Please note, beyond exploratory tasks, the noise matrix can serve as input for various inferential purposes such as testing for assessor group differences. The heatmapPlot function requires the estimation results obtained from the estimateTheta function.

elbowPlot function

The elbow plot permits the identification of subsets of objects, e.g. top-$k$ or bottom-$q$ objects. On the x-axis all objects are ordered according to their rank positions. On the y-axis the corresponding estimated signal values are displayed. The idea of the elbow plot is to scan for 'jumps' in the sequence of ordered objects ? i.e. find signal estimates next to each other that are visually much distant - in an exploratory manner. The elbowPlot function requires the estimation results from the estimateTheta function.

Examples

library(TopKSignal)
set.seed(1421)
p = 8
n = 10
input <- generate.rank.matrix(p, n)
rownames(input$R.input) <- c("a","b","c","d","e","f","g","h")
# For the following code Gurobi needs to be installed
## Not run: 
estimatedSignal <- estimateTheta(R.input = input$R.input, num.boot = 50, b = 0.1, 
solver = "gurobi", type = "restrictedQuadratic", bootstrap.type = "poisson.bootstrap",nCore = 1)	

## End(Not run)
data(estimatedSignal)
estimatedSignal

[Package TopKSignal version 1.0 Index]