R: CEMC based rank aggregation

CEMC {TopKLists}

R Documentation

CEMC based rank aggregation

Description

Performs Cross Entropy Monte Carlo simulations for generating combined ranked list using CEMC, taking into account the different spaces of ranked input lists.

Usage

CEMC(input, space = NULL, k=NULL, dm = "k", kp = 0.5, N = NULL, N1 = NULL,
rho = 0.1, e1 = 0.1, e2 = 1, w = 0.5, b = 0, init.m = "p", init.w = 0,
d.w = NULL, input.par = NULL, extra=0)

Arguments

`input`	A list of several `TopKLists`, may have different length
`space`	A list of the same structure as the input list. Contains underlying spaces for the top-k lists. NULL means all lists share a common space, which is taken to be the union of all input lists
`k`	Desired length of combined list
`dm`	Distance measure, "s" for Spearman, "k" for Kendall (p=0)
`kp`	Partial distance used in Kendall's tau distance measure
`N`	Number of samples generated in each iterate
`N1`	Number of samples retained after each iterate
`rho`	Proportion of samples used to estimate a new probability matrix
`e1`	Stopping criterion with respect to the l1-norm of the difference of the two probability matrices between the current and previous iterations
`e2`	Stopping criterion with respect to the difference in the obtimizing criterion (e.g. the generalized Kemeny guideline) between the current and the previous iterations
`w`	Weight of the new probability vector for the next iterate
`b`	Parameter used in blur function - this is for finding starting values for the algorithem
`init.m`	Initialization method, see the function `init.p` for details
`init.w`	Probability matrix initialization. (See Details)
`d.w`	Weights for distances from different input lists
`input.par`	Input parameters in a data.frame
`extra`	Number of additional items to be included in the combined ranked list during the calculation

Details

The algorithm implemented is the Order Explicit Algorithm, which is an iterative procedure to maximize an objective function (either based on Kendall's distance (dm="k") or Spearman's distance (dm="s")).

init.w: probability matrix initialization: (1-init.w) * uniform + init.w * estimated from input lists

Value

A list containing three components:

`TopK`	A vector giving the aggregate ranked list.
`ProbMatrix`	A matrix, with each column represent the probability vector of a multinomial distribution and thus sum to 1.
`input.par`	A vector containing tuning parameters used in the current run. User may edit this vector and use it as input for a more refined analysis.

Author(s)

Jie Ding <jding@jimmy.harvard.edu>, Shili Lin <shili@stat.osu.edu>

References

Lin, S. and Ding, J. (2009). Integration of ranked lists via Cross Entropy Monte Carlo with applications to mRNA and microRNA studies. Biometrics, 65, 9-18.

Examples

#small data set; a larger data example is available in the vignettes
L1=c("chicken","dog","cat")
L2=c(1,"chicken","cat", 2:5)
L3=c("dog","chicken",1:10)
input=list(L1,L2,L3)
space1=c("chicken","dog","cat",1:10)
space=list(space1,space1,space1)
outCEMC=CEMC(input, space) #underlying space-dependent

[Package TopKLists version 1.0.8 Index]