roc {analogue} | R Documentation |

Fits Receiver Operating Characteristic (ROC) curves to training set data. Used to determine the critical value of a dissimilarity coefficient that best discriminates between assemblage types in palaeoecological data sets, whilst minimising the false positive fraction (FPF).

    roc(object, groups, k = 1, ...)

    ## Default S3 method:
    roc(object, groups, k = 1, thin = FALSE, max.len = 10000, ...)

    ## S3 method for class 'mat'
    roc(object, groups, k = 1, ...)

    ## S3 method for class 'analog'
    roc(object, groups, k = 1, ...)

| Argument | Description |
|---|---|
| `object` | an R object. |
| `groups` | a vector of group memberships, one entry per sample in the training set data. Can be a factor, and will be coerced to one if the supplied vector is not a factor. |
| `k` | numeric; the |
| `thin` | logical; should the points on the ROC curve be thinned? See Details, below. |
| `max.len` | numeric; length of analogue and non-analogue vectors. Used as the limit to which points on the ROC curve are thinned. |
| `...` | arguments passed to/from other methods. |

A ROC curve is generated from the within-group and between-group dissimilarities.

For each level of the grouping vector (`groups`), the dissimilarities between each group member and its k closest analogues within that group are compared with the k closest dissimilarities between the non-group-member and group-member samples.

If one is able to discriminate between members of different groups on the basis of assemblage dissimilarity, then the dissimilarities between samples within a group will be small compared to the dissimilarities between group members and non-group members.
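The comparison described above can be sketched in base R. This is an illustration only, not the package's code: the toy coordinates, the use of Euclidean distance, and the helper `k_smallest()` are assumptions made for the example, and the reading of "k closest" (for each non-member, its k closest group members) is one plausible interpretation of the text.

```r
## Toy data (hypothetical): 6 samples in two well-separated groups
x <- rbind(c(0, 0), c(1, 0), c(0, 2),
           c(10, 10), c(11, 10), c(10, 13))
groups <- factor(rep(c("A", "B"), each = 3))

## Euclidean distance stands in for any dissimilarity coefficient
D <- as.matrix(dist(x))
k <- 1

## Hypothetical helper: the k smallest values of a vector
k_smallest <- function(v, k) sort(v)[seq_len(min(k, length(v)))]

in_out <- lapply(levels(groups), function(g) {
  members <- which(groups == g)
  others  <- which(groups != g)
  ## within-group: each member versus its k closest fellow members
  within <- unlist(lapply(members, function(i)
    k_smallest(D[i, setdiff(members, i)], k)))
  ## between-group: each non-member versus its k closest group members
  between <- unlist(lapply(others, function(j)
    k_smallest(D[j, members], k)))
  list(within = within, between = between)
})
names(in_out) <- levels(groups)

## If the groups are separable, within-group values are the smaller ones
sapply(in_out, function(z) max(z$within) < min(z$between))
```

With groups this far apart every within-group dissimilarity is smaller than every between-group one, which is the situation in which a dissimilarity threshold discriminates perfectly.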

`thin` is useful for large problems, where the number of analogue and non-analogue distances can conceivably be large and thus overflow the largest number R can work with. This option is also useful to speed up computations for large problems. If `thin == TRUE`, then the larger of the analogue or non-analogue distances is thinned to a maximum length of `max.len`. The smaller set of distances is scaled proportionally. In thinning, we approximate the distribution of distances by taking `max.len` (or a fraction of `max.len` for the smaller set of distances) equally-spaced probability quantiles of the distribution as a new set of distances.
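A minimal sketch of the thinning idea, assuming simulated distances and a hypothetical helper `thin_to()`; it shows how equally-spaced probability quantiles can stand in for a long vector of distances, and is not the package's internal code.

```r
set.seed(42)
## Hypothetical distance vectors of very different lengths
analogue_d     <- rexp(2000)
non_analogue_d <- rexp(50000, rate = 0.2)
max.len <- 10000

## Shrink factor: the larger set is cut to max.len, the smaller one
## is reduced by the same proportion
n_big  <- max(length(analogue_d), length(non_analogue_d))
shrink <- min(1, max.len / n_big)

## Hypothetical helper: replace d by n equally-spaced probability quantiles
thin_to <- function(d, n) {
  if (n >= length(d)) return(d)
  unname(quantile(d, probs = seq(0, 1, length.out = n)))
}

thinned_in  <- thin_to(analogue_d,     round(length(analogue_d) * shrink))
thinned_out <- thin_to(non_analogue_d, round(length(non_analogue_d) * shrink))

c(length(thinned_in), length(thinned_out))
```

Because the quantiles at probabilities 0 and 1 are the minimum and maximum, the thinned vectors keep the range of the original distances while approximating their distribution with far fewer points.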

A list with two components: (i) `statistics`, a summary of ROC statistics for each level of `groups` and a combined ROC analysis, and (ii) `roc`, a list of ROC objects, one per level of `groups`. For the latter, each ROC object is a list with the following components:

| Component | Description |
|---|---|
| `TPF` | The true positive fraction. |
| `FPE` | The false positive error. |
| `optimal` | The optimal dissimilarity value, assessed where the slope of the ROC curve is maximal. |
| `AUC` | The area under the ROC curve. |
| `se.fit` | Standard error of the AUC estimate. |
| `n.in` | numeric; the number of samples within the current group. |
| `n.out` | numeric; the number of samples not in the current group. |
| `p.value` | The p-value of a Wilcoxon rank sum test on the two sets of dissimilarities. This is also known as a Mann-Whitney test. |
| `roc.points` | The unique dissimilarities at which the ROC curve was evaluated. |
| `max.roc` | numeric; the position along the ROC curve at which the slope of the ROC curve is maximal. This is the index of this point on the curve. |
| `prior` | numeric, length 2. Vector of observed prior probabilities of true analogue and true non-analogues in the group. |
| `analogue` | a list with components |
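To make several of these components concrete, the following base R sketch computes analogous quantities by hand for two small, made-up sets of dissimilarities. It assumes the usual relationship between the Mann-Whitney statistic and the area under the ROC curve; the input vectors are hypothetical, and the standard error and optimal dissimilarity are not reproduced here.

```r
## Hypothetical dissimilarities: within-group (analogue) and
## between-group (non-analogue)
within  <- c(0.10, 0.15, 0.20, 0.30, 0.45)
between <- c(0.25, 0.40, 0.50, 0.60, 0.70, 0.80)

## Evaluate the curve at the unique observed dissimilarities
roc.points <- sort(unique(c(within, between)))
## Fraction of true analogues accepted at each cut-off
TPF <- sapply(roc.points, function(cc) mean(within <= cc))
## Fraction of non-analogues wrongly accepted at each cut-off
FPE <- sapply(roc.points, function(cc) mean(between <= cc))

n.in  <- length(within)
n.out <- length(between)

## Wilcoxon rank sum (Mann-Whitney) test on the two sets
wt <- wilcox.test(between, within)
p.value <- wt$p.value

## AUC from the Mann-Whitney statistic: the proportion of
## (between, within) pairs in which the between value is larger
AUC <- unname(wt$statistic) / (n.in * n.out)

## Cross-check by direct enumeration of all pairs
AUC_direct <- mean(outer(between, within, ">"))
```

An AUC close to 1 means that almost every between-group dissimilarity exceeds almost every within-group one, so a single cut-off separates analogues from non-analogues well.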

Gavin L. Simpson, based on code from Thomas Lumley to optimise the calculation of the ROC curve.

Brown, C.D., and Davis, H.T. (2006) Receiver operating characteristics
curves and related decision measures: A tutorial. *Chemometrics
and Intelligent Laboratory Systems* **80**, 24–38.

Gavin, D.G., Oswald, W.W., Wahl, E.R. and Williams, J.W. (2003) A
statistical approach to evaluating distance metrics and analog
assignments for pollen records. *Quaternary Research*
**60**, 356–367.

Henderson, A.R. (1993) Assessing test accuracy and its clinical
consequences: a primer for receiver operating characteristic curve
analysis. *Annals of Clinical Biochemistry* **30**,
834–846.

`mat` for fitting of MAT models. `bootstrap.mat` and `mcarlo` for alternative methods for selecting critical values of analogue-ness for dissimilarity coefficients.

    ## load the example data
    data(swapdiat, swappH, rlgh)

    ## merge training and test set on columns
    dat <- join(swapdiat, rlgh, verbose = TRUE)

    ## extract the merged data sets and convert to proportions
    swapdiat <- dat[[1]] / 100
    rlgh <- dat[[2]] / 100

    ## fit an analogue matching (AM) model using the squared chord distance
    ## measure - need to keep the training set dissimilarities
    swap.ana <- analog(swapdiat, rlgh, method = "SQchord",
                       keep.train = TRUE)

    ## fit the ROC curve to the SWAP diatom data using the AM results
    ## Generate a grouping for the SWAP lakes
    METHOD <- if (getRversion() < "3.1.0") {"ward"} else {"ward.D"}
    clust <- hclust(as.dist(swap.ana$train), method = METHOD)
    grps <- cutree(clust, 12)

    ## fit the ROC curve
    swap.roc <- roc(swap.ana, groups = grps)
    swap.roc

    ## draw the ROC curve
    plot(swap.roc, 1)

    ## draw the four default diagnostic plots
    layout(matrix(1:4, ncol = 2))
    plot(swap.roc)
    layout(1)

[Package *analogue* version 0.17-6 Index]