cds {cds}    R Documentation

Constrained Dual Scaling for Successive Categories with Groups

Description

Uses an alternating nonnegative least squares algorithm combined with a k-means-type algorithm to optimize the constrained group dual scaling criterion outlined in the reference. Parallel computation over the random starts of the grouping matrix is supported via the parallel package.

Usage

cds(x, K = 4, q = NULL, eps.ALS = 0.001, eps.G = 1e-07,
  nr.starts.G = 20, nr.starts.a = 5, maxit.ALS = 20, maxit = 50,
  Gstarts = NULL, astarts = NULL, parallel = FALSE, random.G = FALSE,
  times.a.multistart = 1, info.level = 1, mc.preschedule = TRUE,
  seed = NULL, LB = FALSE, reorder.grps = TRUE, rescale.a = TRUE,
  tol = sqrt(.Machine$double.eps), update.G = TRUE)

Arguments

x

an object of class "dsdata" (see cds.sim()), or a matrix (or object coercible to a matrix) containing the data for n individuals on m objects. The data should not yet contain any additional columns for the rating scale.

K

The number of response style groups to look for. If a vector of length greater than one is given, the algorithm is run for each element and a list of class cdslist is returned.

q

The maximum rating (the scale is assumed to be 1:q).

eps.ALS

Numerical convergence criterion for the alternating least squares part of the algorithm (updates for row and column scores).

eps.G

Numerical convergence criterion for the k-means part of the algorithm.

nr.starts.G

Number of random starts for the grouping matrix.

nr.starts.a

Number of random starts for the row scores.

maxit.ALS

Maximum number of iterations for the ALS part of the algorithm. A warning is given if this maximum is reached, although reaching it is often not a concern.

maxit

Maximum number of iterations for the k-means part of the algorithm.

Gstarts

Facility to supply a list of explicit starting values for the grouping matrix G. Each start consists of a two-element list: i, an integer giving the number of the start, and G, the starting configuration as an indicator matrix.

astarts

Supply explicit starts for the a vectors, as a list.

parallel

logical. Should parallelization over starts for the grouping matrix be used?

random.G

logical. Should the k-means part consider the individuals in a random order?

times.a.multistart

The number of times that random starts for the row scores are used. If == 1, then random starts are only used once for each start of the grouping matrix.

info.level

Verbosity of the output. Options are 1, 2, 3 and 4.

mc.preschedule

Argument to mclapply under Unix.

seed

Random seed for random number generators. Only partially implemented.

LB

logical. Load-balancing used in parallelization or not? Windows only.

reorder.grps

logical. Use the Hungarian algorithm to reorder group names so that the trace of the confusion matrix is maximized.

rescale.a

logical. If TRUE, rescale the row scores to have length sqrt(2n) after the algorithm has converged.

tol

Tolerance passed as tol to lsei of the limSolve package. Defaults to sqrt(.Machine$double.eps).

update.G

Logical indicating whether or not to update the G matrix from its starting configuration. Useful when the clustering is known a priori or when no clustering is desired.
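The Gstarts argument above expects a list of starts, each a two-element list with components i and G. The following base R code is a minimal sketch of how such a list might be assembled. The sizes n and K and the helper name make_indicator are hypothetical and chosen only for illustration; the sketch does not call the package itself.

```r
# Hypothetical sizes: n individuals and K groups (illustration only).
n <- 6
K <- 3
set.seed(1)

# Hypothetical helper: turn a vector of group labels into an
# n x K indicator matrix with exactly one 1 in every row.
make_indicator <- function(grp, K) {
  G <- matrix(0, nrow = length(grp), ncol = K)
  G[cbind(seq_along(grp), grp)] <- 1
  G
}

# Two explicit starts, each a two-element list with components i and G.
starts <- lapply(1:2, function(i) {
  grp <- sample(rep_len(seq_len(K), n))  # balanced labels, no empty group
  list(i = i, G = make_indicator(grp, K))
})
```

A list such as starts could then presumably be supplied as Gstarts = starts, provided the number of rows of each G matches the number of individuals in x and the number of columns matches K.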

Details

See the reference for more details.
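As a small illustration of the rescaling controlled by rescale.a, the base R sketch below rescales a vector to length sqrt(2n). It assumes that "length" means the Euclidean norm, and it uses a random stand-in vector rather than actual row scores from the algorithm.

```r
# Hypothetical number of individuals; the row-score vector has 2n entries.
n <- 5
set.seed(42)
a <- runif(2 * n)  # stand-in for a 2n-vector of row scores

# Rescale so that the Euclidean norm equals sqrt(2n)
# (assumption: "length" refers to the Euclidean norm).
a_scaled <- a * sqrt(2 * n) / sqrt(sum(a^2))

# Norm after rescaling; matches sqrt(2 * n) up to rounding error.
norm_after <- sqrt(sum(a_scaled^2))
```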

Value

Object of class ds with elements:

G

Grouping indicator matrix.

K

Number of groups K.

opt.crit

Optimum value of the criterion.

a

The 2n-vector of row scores.

bstar

The m-vector of object scores.

bkmat

The matrix of group-specific boundary scores for the ratings.

alphamat

The estimated spline coefficients for each group.

iter

The number of iterations used for the optimal random start with respect to the grouping matrix.

time.G.start

The number of seconds it took for the algorithm to converge for this optimal random start.

grp

The grouping of the individuals as obtained by the algorithm.

kloss

Loss value from G update (not equivalent to that of ALS updates).

hitrate, confusion

Confusion matrix and hit rate, available if the original data object contained a grouping vector.

loss.G

Optimality criterion values for the random starts of G.

q

The number of ratings in the Likert scale 1:q.

time.total

Total time taken by the algorithm over all random starts.

call

The function call.

data

The input data object.
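To illustrate how a confusion matrix, a hit rate and a reordering of group labels relate to one another, the base R sketch below relabels estimated groups so that the trace of the confusion matrix is maximized. It is not the package's implementation: it replaces the Hungarian algorithm mentioned under reorder.grps with a brute-force search over all permutations, which is only practical for small K. The label vectors truth and est and the helper perms are hypothetical.

```r
# Hypothetical true and estimated group labels for 8 individuals, K = 3.
K <- 3
truth <- c(1, 1, 1, 2, 2, 3, 3, 3)
est   <- c(2, 2, 2, 3, 3, 1, 1, 2)

# Confusion matrix: rows are true groups, columns are estimated groups.
conf <- table(factor(truth, levels = 1:K), factor(est, levels = 1:K))

# Hypothetical helper: all permutations of a vector (brute force).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Choose the column ordering that maximizes the trace.
all_p  <- perms(1:K)
traces <- vapply(all_p, function(p) as.numeric(sum(diag(conf[, p]))),
                 numeric(1))
best   <- all_p[[which.max(traces)]]

# Reordered confusion matrix and the resulting hit rate
# (proportion of individuals on the diagonal).
conf_reordered <- conf[, best]
hitrate <- sum(diag(conf_reordered)) / length(truth)
```

For larger K the number of permutations grows factorially, which is why an assignment method such as the Hungarian algorithm is the more sensible choice in practice.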

Author(s)

Pieter C. Schoonees

References

Schoonees, P.C., Velden, M. van de & Groenen, P.J.F. (2013). Constrained Dual Scaling for Detecting Response Styles in Categorical Data. (EI report series EI 2013-10). Rotterdam: Econometric Institute.

Examples


set.seed(1234)
dat <- cds.sim()
out <- cds(dat)


[Package cds version 1.0.3 Index]