R: PRIM for multivariate data

prim.box {prim}

R Documentation

PRIM for multivariate data

Description

PRIM (Patient Rule Induction Method) for multivariate data.

Usage

prim.box(x, y, box.init=NULL, peel.alpha=0.05, paste.alpha=0.01,
     mass.min=0.05, threshold, pasting=TRUE, verbose=FALSE,
     threshold.type=0, y.fun=mean)

prim.hdr(prim, threshold, threshold.type, y.fun=mean)
prim.combine(prim1, prim2, y.fun=mean)

Arguments

`x`	matrix of data values
`y`	vector of response values
`y.fun`	function applied to response y. Default is mean.
`box.init`	initial covering box
`peel.alpha`	peeling quantile tuning parameter
`paste.alpha`	pasting quantile tuning parameter
`mass.min`	minimum mass tuning parameter
`threshold`	threshold tuning parameter(s)
`threshold.type`	threshold direction indicator: 1 = ">= threshold", -1 = "<= threshold", 0 = ">= threshold[1] & <= threshold[2]"
`pasting`	flag for pasting
`verbose`	flag for printing output during execution
`prim`, `prim1`, `prim2`	objects of type `prim`

Details

The data are (\bold{X}_1, Y_1), \dots, (\bold{X}_n, Y_n) where \bold{X}_i is d-dimensional and Y_i is a scalar response. PRIM finds modal (and/or anti-modal) regions in the conditional expectation m(\bold{x}) = \bold{E} (Y | \bold{x}).

In general, Y_i can be real-valued; see vignette("prim"). Here, we focus on the special case of binary Y_i. Let Y_i = 1 when \bold{X}_i \sim F^+ and Y_i = -1 when \bold{X}_i \sim F^-, where F^+ and F^- are different distribution functions. In this set-up, PRIM finds the regions where F^+ and F^- are most different.
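The link between the conditional expectation and the two distributions can be written out explicitly. The following is a sketch under assumptions that are not stated in the original: F^+ and F^- have densities f^+ and f^-, and \pi^+ and \pi^- denote the proportions of the two classes, with \pi^+ + \pi^- = 1. Here \mathbf{x} plays the role of \bold{x} above.

```latex
\begin{align*}
m(\mathbf{x}) &= \mathbf{E}(Y \mid \mathbf{x})
   = P(Y = 1 \mid \mathbf{x}) - P(Y = -1 \mid \mathbf{x}) \\
  &= \frac{\pi^+ f^+(\mathbf{x}) - \pi^- f^-(\mathbf{x})}
          {\pi^+ f^+(\mathbf{x}) + \pi^- f^-(\mathbf{x})}
\end{align*}
```

Under these assumptions m(\mathbf{x}) lies in [-1, 1]. It is close to 1 where the positive class dominates and close to -1 where the negative class dominates.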

The tuning parameters peel.alpha and paste.alpha control the ‘patience’ of PRIM: smaller values involve more patience and larger values less. The peeling steps remove data from a box until either the box mean is smaller than threshold or the box mass is less than mass.min. Pasting is optional and is used to correct any over-peeling. The default values for peel.alpha, paste.alpha and mass.min are taken from Friedman & Fisher (1999).
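The peeling idea can be illustrated with a small base R sketch. This is not the code used by prim.box: the helper peel_once, the toy data and the simplified stopping rule (peel until the box mass falls to mass.min, with no threshold check and no pasting) are assumptions made for illustration only.

```r
# Hypothetical helper, not part of the prim package.
# One simplified peeling step: for each variable, try trimming the lowest or
# the highest peel.alpha fraction of the points currently in the box, and keep
# the candidate trim that leaves the largest value of y.fun(y).
peel_once <- function(x, y, peel.alpha = 0.05, y.fun = mean) {
  best <- list(value = -Inf, keep = rep(TRUE, nrow(x)))
  for (j in seq_len(ncol(x))) {
    lower <- quantile(x[, j], peel.alpha)
    upper <- quantile(x[, j], 1 - peel.alpha)
    candidates <- list(x[, j] >= lower, x[, j] <= upper)
    for (cand in candidates) {
      if (!any(cand)) next
      value <- y.fun(y[cand])
      if (value > best$value) best <- list(value = value, keep = cand)
    }
  }
  best
}

# Toy data (an assumption): labels are +1 on the right half of the unit square.
set.seed(1)
n <- 500
x <- cbind(x1 = runif(n), x2 = runif(n))
y <- ifelse(x[, 1] > 0.5, 1, -1)

mass.min <- 0.05
keep <- rep(TRUE, n)                 # rows of x still inside the box
while (mean(keep) > mass.min) {      # stop once the box mass reaches mass.min
  step <- peel_once(x[keep, , drop = FALSE], y[keep])
  idx <- which(keep)
  keep[idx[!step$keep]] <- FALSE     # drop the peeled points from the box
}

mean(keep)      # final box mass
mean(y[keep])   # final box mean
```

A smaller peel.alpha removes fewer points per step, so more steps are needed to reach the same mass. This is the sense in which smaller values are more patient.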

The type of PRIM estimate is controlled by threshold and threshold.type:

threshold.type=1, search for {m(\bold{x}) \geq threshold}.
threshold.type=-1, search for {m(\bold{x}) \leq threshold}.
threshold.type=0, search for both {m(\bold{x}) \geq threshold[1]} and {m(\bold{x}) \leq threshold[2]}.
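The three cases above can be illustrated on a vector of box means. The helper select_boxes and the example values are hypothetical and are not part of the prim package; the sketch only mirrors the comparisons listed above.

```r
# Hypothetical helper: flag the boxes whose mean satisfies the threshold rule.
select_boxes <- function(y.mean, threshold, threshold.type) {
  if (threshold.type == 1) {
    y.mean >= threshold
  } else if (threshold.type == -1) {
    y.mean <= threshold
  } else if (threshold.type == 0) {
    # both directions: threshold[1] is the upper rule, threshold[2] the lower
    y.mean >= threshold[1] | y.mean <= threshold[2]
  } else {
    stop("threshold.type must be 1, -1 or 0")
  }
}

y.mean <- c(0.62, 0.31, 0.05, -0.12, -0.45)   # made-up box means

select_boxes(y.mean, 0.25, 1)             # TRUE  TRUE FALSE FALSE FALSE
select_boxes(y.mean, -0.3, -1)            # FALSE FALSE FALSE FALSE  TRUE
select_boxes(y.mean, c(0.25, -0.3), 0)    # TRUE  TRUE FALSE FALSE  TRUE
```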

There are two ways of using PRIM. One is to call prim.box with pre-specified threshold(s). This is appropriate when the threshold(s) are known to produce good estimates.

On the other hand, if the user doesn't provide threshold values, then prim.box computes box sequences which cover the data range. These can then be pruned at a later stage. prim.hdr allows the user to try many different threshold values efficiently, without having to recompute the entire PRIM box sequence. prim.combine can be used to join the regions computed from prim.hdr. See the examples below.

Value

– prim.box produces a PRIM estimate, an object of type prim, which is a list with 8 fields:

`x`	list of data matrices
`y`	list of response variable vectors
`y.mean`	list of vectors of box mean for y
`box`	list of matrices of box limits (first row = minima, second row = maxima)
`mass`	vector of box masses (proportion of points inside a box)
`num.class`	total number of PRIM boxes
`num.hdr.class`	total number of PRIM boxes which form the HDR
`ind`	threshold direction indicator: 1 = ">= threshold", -1 = "<= threshold"

The above lists have num.class fields, one for each box.
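The box and mass fields can be related with a short base R sketch. The helper in_box, the example points and the box limits are hypothetical; treating the box limits as inclusive is also an assumption, not something this page states.

```r
# Hypothetical helper: TRUE for each row of x that lies inside the box.
# box follows the layout described above: first row = minima, second row = maxima.
in_box <- function(x, box) {
  x <- as.matrix(x)
  inside <- rep(TRUE, nrow(x))
  for (j in seq_len(ncol(x))) {
    inside <- inside & x[, j] >= box[1, j] & x[, j] <= box[2, j]
  }
  inside
}

x <- cbind(x1 = c(0.1, 0.4, 0.6, 0.9),
           x2 = c(0.2, 0.5, 0.7, 0.3))
box <- rbind(c(0.3, 0.4),    # minima for x1, x2
             c(1.0, 1.0))    # maxima for x1, x2

inside <- in_box(x, box)     # FALSE TRUE TRUE FALSE
mass <- mean(inside)         # proportion of points inside the box: 0.5
```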

– prim.hdr takes a prim object, prunes it using different threshold values and returns another prim object. This is much faster for experimenting with different threshold values than calling prim.box each time.

– prim.combine combines two prim objects into a single prim object. Usually used in conjunction with prim.hdr. See examples below.

Examples

data(quasiflow)
qf <- quasiflow[1:1000,1:2]
qf.label <- quasiflow[1:1000,4]

## using only one command
thr <- c(0.25, -0.3)
qf.prim1 <- prim.box(x=qf, y=qf.label, threshold=thr, threshold.type=0)

## alternative - requires more commands but allows more control
## in intermediate stages
qf.primp <- prim.box(x=qf, y=qf.label, threshold.type=1)
   ## default threshold too low, try higher one

qf.primp.hdr <- prim.hdr(prim=qf.primp, threshold=0.25, threshold.type=1)
qf.primn <- prim.box(x=qf, y=qf.label, threshold=-0.3, threshold.type=-1)
qf.prim2 <- prim.combine(qf.primp.hdr, qf.primn)

plot(qf.prim1, alpha=0.2)   ## orange=x1>x2, blue x2<x1
points(qf[qf.label==1,], cex=0.5)
points(qf[qf.label==-1,], cex=0.5, col=2)

[Package prim version 1.0.21 Index]