rRegMatch {AcrossTic}    R Documentation

Regular matching with minimum-cost spanning subgraphs

Description

This function matches each observation in X to r others so as to minimize the total distance across all matches. Optionally, it computes the cross-count statistic: the number of matches that join two observations from different classes.
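The cross-count statistic can be illustrated in base R. The sketch below is not the package's internal code. It assumes only a two-column matrix of matched index pairs (as in the matches component described under Value) and a class vector y. The function name cross_count and its weights argument are hypothetical. The optional weights allow fractional matches to be counted as fractions.

```r
# Hypothetical helper: count matched pairs whose endpoints lie in different classes.
# matches: two-column matrix of observation indices, one matched pair per row
# y:       class label for each observation
# weights: optional per-pair weights (1 for every pair by default)
cross_count <- function(matches, y, weights = rep(1, nrow(matches))) {
  # TRUE for each pair whose two observations carry different labels
  crosses <- y[matches[, 1]] != y[matches[, 2]]
  sum(weights[crosses])
}
```

For example, with y <- c(1, 1, 2, 2) and pairs (1, 2), (3, 4), (1, 3), only the last pair crosses classes, so the unweighted count is 1.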

Usage

rRegMatch(X, r, y = NULL, dister = "daisy", dist.args = list(), keep.X = nrow(X) < 100, 
    keep.D = (dister == "treeClust.dist"), relax = (N >= 100), thresh = 1e-6)

Arguments

X

Matrix or data frame of data, or an object inheriting from class "dist" that holds the inter-point distances.

r

Integer number of matches. The matching is "regular" in that every observation is matched to exactly r others (or, if relax=TRUE, every observation is matched to others with weights in [0, 1] that add up to r).
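In the exact case, "regular" means that every observation index appears in exactly r rows of the matches matrix. The base-R sketch below counts those appearances. The helper name match_degrees is hypothetical, and the check assumes that matches lists each matched pair once.

```r
# Hypothetical helper: how many matched pairs each of n observations belongs to.
match_degrees <- function(matches, n) {
  # Every row contributes one appearance to each of its two observations
  tabulate(c(matches), nbins = n)
}

# A 4-cycle is a regular matching with r = 2: every point has exactly 2 partners.
ring <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1))
match_degrees(ring, 4)
```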

y

Vector of class membership indices. This is used to compute the cross-count statistic. Optional.

dister

Function used to compute inter-point distances. It must accept a matrix of data as its first argument, named x. Default: "daisy" (from package cluster). If all the columns are numeric, this produces unweighted Euclidean distances by default.
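A replacement distance function only needs to take the data as its first argument, named x, and return distances. The sketch below uses base R's dist() with Manhattan distance. The name manhattan.dister is hypothetical. Supplying it as dister = "manhattan.dister" assumes that rRegMatch looks the function up by name, as the string default "daisy" suggests. That behaviour is not confirmed here.

```r
# Hypothetical dister: first argument is the data matrix, named x;
# returns an object of class "dist" holding Manhattan (L1) distances.
manhattan.dister <- function(x, ...) {
  dist(x, method = "manhattan", ...)
}

# Assumed usage (not run): rRegMatch(X, r = 3, dister = "manhattan.dister")
```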

dist.args

List of additional arguments to pass to the dister function.
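One plausible way to combine dister and dist.args is to call the distance function with x set to the data and the list entries appended as named arguments. The sketch below assumes that mechanism; it is not taken from the package source. It uses cluster's daisy() with its metric argument. For points (0, 0) and (3, 4), the Manhattan distance is 7, whereas the default Euclidean distance would be 5.

```r
library(cluster)  # provides daisy()

X <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
dist.args <- list(metric = "manhattan")

# Assumed mechanism: call the dister with x = X plus the extra arguments
D <- do.call(daisy, c(list(x = X), dist.args))
as.numeric(D)
```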

keep.X

If TRUE, and X was supplied, keep the X matrix in the output object. Default: TRUE if X was supplied and nrow(X) < 100.

keep.D

If TRUE, keep the distance object in the output. Default: TRUE if the treeClust.dist function is being used to compute the distances (since in that case the distances are random).

relax

If FALSE, solve the exact problem, in which each observation has exactly r non-zero pairings, each with weight 1. If TRUE, solve the relaxed problem, in which each observation has at least r non-zero pairings, each with its own weight between 0 and 1, and the weights add up to r. The exact problem becomes very slow for large samples, which is why, by default, the relaxed problem is solved when there are 100 or more observations.
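To see why the exact problem scales badly, consider the simplest exact case, r = 1. This case is a minimum-weight perfect matching. The number of candidate matchings on N points is (N - 1)(N - 3)...3 * 1, so it grows faster than exponentially. The brute-force search below is for illustration on tiny inputs only. It is not how rRegMatch solves the problem, because rRegMatch hands the problem to an optimization solver. The function name min_perfect_matching is hypothetical.

```r
# Brute-force minimum-weight perfect matching (the exact problem with r = 1).
# Illustration only: the search space explodes as N grows.
min_perfect_matching <- function(D) {
  D <- as.matrix(D)
  n <- nrow(D)
  stopifnot(n %% 2 == 0)          # a perfect matching needs an even N
  best <- list(cost = Inf, pairs = NULL)
  recurse <- function(left, pairs, cost) {
    if (cost >= best$cost) return(invisible(NULL))  # prune: cannot improve
    if (length(left) == 0) {                        # everyone is matched
      best <<- list(cost = cost, pairs = pairs)
      return(invisible(NULL))
    }
    i <- left[1]                   # match the first unmatched point...
    for (j in left[-1]) {          # ...to each remaining candidate in turn
      recurse(setdiff(left, c(i, j)), rbind(pairs, c(i, j)), cost + D[i, j])
    }
  }
  recurse(seq_len(n), NULL, 0)
  best
}

# Two tight pairs far apart: the optimum pairs 1-2 and 3-4, total distance 2.
X <- matrix(c(0, 0, 0, 1, 10, 0, 10, 1), ncol = 2, byrow = TRUE)
res <- min_perfect_matching(dist(X))
```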

thresh

Weights smaller than this are considered to be exactly zero. Default: 1e-6.
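Solvers often return tiny non-zero values where the true weight is 0. The thresholding step can be sketched as follows. The helper name prune_weights is hypothetical, and the sketch assumes only that pairs whose weight falls below thresh are dropped, as the description above states.

```r
# Hypothetical helper: treat weights below thresh as exactly zero and drop those pairs.
prune_weights <- function(pairs, w, thresh = 1e-6) {
  keep <- w >= thresh
  list(matches = pairs[keep, , drop = FALSE], edge.weights = w[keep])
}

pairs <- rbind(c(1, 2), c(1, 3), c(2, 3))
res <- prune_weights(pairs, c(1, 5e-7, 0.5))  # the 5e-7 weight is discarded
```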

Details

This function solves an optimization problem: it finds the set of pairings whose total weight (distance) is minimal, subject to the constraint that every observation is paired with exactly r others (or, in the relaxed problem, with enough others that its pair weights sum to r).
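The problem can be written as follows. This is a sketch in our own notation: d_ij is the distance between observations i and j, w_ij = w_ji is the weight given to pairing them, and N is the number of observations. The package may formulate the problem differently internally.

```latex
\begin{aligned}
\min_{w} \quad & \sum_{i<j} d_{ij}\, w_{ij} \\
\text{subject to} \quad & \sum_{j \neq i} w_{ij} = r, \qquad i = 1, \dots, N, \\
& w_{ij} \in \{0, 1\} \ \text{(exact problem)} \quad \text{or} \quad 0 \le w_{ij} \le 1 \ \text{(relaxed problem)}.
\end{aligned}
```

Summing the N constraints counts every w_ij twice, which gives \sum_{i<j} w_{ij} = rN/2.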

Value

A list of class AcrossTic, with elements:

matches

A two-column matrix, each row giving the indices of one matched pair.

total.dist

Total distance across all matches: the optimal value of the objective in the optimization problem.

status

Status of the result. If the optimum was found, this is a vector of length 1 with name "TM_OPTIMAL_SOLUTION_FOUND" and value 0.

time.required

Time taken to run the optimization, as reported by system.time().

call

The call made to the function, from match.call.

r

The value of r, as supplied at the time of the call.

dister

The value of dister, as supplied at the time of the call.

dist.args

The value of dist.args, as supplied at the time of the call.

X.supplied

Logical indicating whether X was supplied.

X

The X matrix, if it was supplied and keep.X is TRUE.

y

The y vector, as supplied.

edge.weights

Vector of length nrow(matches) giving the weight of each match. For the exact problem (relax = FALSE), each value is equal to 0 or 1. For the relaxed problem (relax = TRUE), each value is between 0 and 1, and the values sum to r * nrow(X) / 2.
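A toy case shows where the total r * nrow(X) / 2 comes from and why the relaxation helps. With three points and r = 1, no exact matching exists, because an odd number of points cannot be split into pairs. A relaxed solution can still give each of the three edges weight 1/2. The values below are a hand-built illustration, not output from rRegMatch.

```r
# Hand-built relaxed solution on 3 points with r = 1: each edge has weight 1/2.
matches <- rbind(c(1, 2), c(1, 3), c(2, 3))
edge.weights <- c(0.5, 0.5, 0.5)
r <- 1
n <- 3

# Total weight touching each observation: should equal r for every point
per.obs <- sapply(seq_len(n), function(i)
  sum(edge.weights[matches[, 1] == i | matches[, 2] == i]))

# Each edge is shared by two observations, so the weights sum to r * n / 2
total <- sum(edge.weights)
```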

cross.sum

Sum of matcher.costs across all matches

cross.count

Number of matches that join two observations from different classes (a weighted count when the match weights are fractional).

nrow.X, ncol.X

Dimensions of the X matrix.

Author(s)

David Ruth and Sam Buttrey

References

David Ruth, "A new multivariate two-sample test using regular minimum-weight spanning subgraphs," J. Stat. Distributions and Applications (2014)

Examples

set.seed (123)
X <- matrix (rnorm (100), 50, 2) # Create data...
y <- rep (c (1, 2), each=25) # ...and class membership
rRegMatch (X, r = 3, y = y)
## Not run: plot (rRegMatch (X, r = 3, y = y)) # to see picture
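A picture of the matching can also be drawn with base graphics, using the matches, X and y components described under Value. The sketch below is not the package's plot method. The function name plot_matches is hypothetical. It colours points by class and joins each matched pair with a segment.

```r
# Hypothetical plotting helper: points coloured by class, one segment per matched pair.
plot_matches <- function(X, matches, y = NULL) {
  col <- if (is.null(y)) 1 else as.integer(factor(y))
  plot(X[, 1], X[, 2], col = col, pch = 19, xlab = "X[, 1]", ylab = "X[, 2]")
  segments(X[matches[, 1], 1], X[matches[, 1], 2],
           X[matches[, 2], 1], X[matches[, 2], 2], col = "grey50")
  invisible(nrow(matches))
}
```

With the example data above (50 rows, so keep.X defaults to TRUE), one could store fit <- rRegMatch(X, r = 3, y = y) and then call plot_matches(fit$X, fit$matches, fit$y).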

[Package AcrossTic version 1.0-3 Index]