roc {analogue} | R Documentation |

## ROC curve analysis

### Description

Fits Receiver Operator Characteristic (ROC) curves to training set data. Used to determine the critical value of a dissimilarity coefficient that best discriminates between assemblage-types in palaeoecological data sets, whilst minimising the false positive error rate (FPF).

### Usage

```
roc(object, groups, k = 1, ...)
## Default S3 method:
roc(object, groups, k = 1, thin = FALSE,
    max.len = 10000, ...)
## S3 method for class 'mat'
roc(object, groups, k = 1, ...)
## S3 method for class 'analog'
roc(object, groups, k = 1, ...)
```

### Arguments

`object` |
an R object. |

`groups` |
a vector of group memberships, one entry per sample in the training set data. Can be a factor, and will be coerced to one if the supplied vector is not a factor. |

`k` |
numeric; the number of closest analogues to use when calculating the ROC curves. See Details, below. |

`thin` |
logical; should the points on the ROC curve be thinned? See Details, below. |

`max.len` |
numeric; length of analogue and non-analogue vectors. Used as the limit to which the points on the ROC curve are thinned. |

`...` |
arguments passed to/from other methods. |

### Details

A ROC curve is generated from the within-group and between-group dissimilarities.

For each level of the grouping vector (`groups`), the dissimilarities between each group member and its k closest analogues within that group are compared with the k closest dissimilarities between the non-group member and group member samples.

If one is able to discriminate between members of different groups on the basis of assemblage dissimilarity, then the dissimilarities between samples within a group will be small compared to the dissimilarities between group members and non-group members.
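The comparison described above can be sketched in base R. This is one possible reading of the description, not the code used by `roc()`: the simulated data, the helper `k_closest()` and the use of Euclidean distance are all assumptions made for illustration.

```r
## Simulated data: two well-separated groups of 10 samples (hypothetical)
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
groups <- factor(rep(c("A", "B"), each = 10))
d <- as.matrix(dist(x))   # Euclidean dissimilarities between all samples

## Hypothetical helper: for one level of 'groups', collect the k smallest
## within-group and between-group dissimilarities
k_closest <- function(d, groups, level, k = 1) {
    inside  <- which(groups == level)
    outside <- which(groups != level)
    ## within group: each member versus the other members (self excluded)
    within <- unlist(lapply(inside, function(i) {
        sort(d[i, setdiff(inside, i)])[seq_len(k)]
    }))
    ## between groups: each non-member versus the members of the group
    between <- unlist(lapply(outside, function(i) {
        sort(d[i, inside])[seq_len(k)]
    }))
    list(within = within, between = between)
}

res <- k_closest(d, groups, level = "A", k = 1)
summary(res$within)    # small when the groups are well separated
summary(res$between)   # large by comparison
```

When the grouping is informative, the two summaries barely overlap, which is the situation in which a critical dissimilarity value can separate analogues from non-analogues.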

`thin` is useful for large problems, where the number of analogue and non-analogue distances can conceivably be large and thus overflow the largest number R can work with. This option is also useful to speed up computations for large problems. If `thin == TRUE`, then the larger of the analogue or non-analogue distances is thinned to a maximum length of `max.len`. The smaller set of distances is scaled proportionally. In thinning, we approximate the distribution of distances by taking `max.len` (or a fraction of `max.len` for the smaller set of distances) equally-spaced probability quantiles of the distribution as a new set of distances.
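A minimal sketch of quantile-based thinning in base R follows. The function `thin_distances()` and its rounding rule for the smaller set are assumptions for illustration; the internals of `roc()` may differ in detail.

```r
## Hypothetical helper: thin two sets of distances to equally-spaced
## probability quantiles, keeping their relative sizes
thin_distances <- function(analogue, non_analogue, max.len = 10000) {
    n <- c(length(analogue), length(non_analogue))
    ## nothing to do if both sets are already short enough
    if (max(n) <= max.len) {
        return(list(analogue = analogue, non_analogue = non_analogue))
    }
    ## larger set gets max.len points, smaller set a proportional share
    ## (assumed rounding rule, with at least two points kept)
    target <- pmax(2, round(max.len * n / max(n)))
    thin_one <- function(x, len) {
        unname(quantile(x, probs = seq(0, 1, length.out = len)))
    }
    list(analogue     = thin_one(analogue, target[1]),
         non_analogue = thin_one(non_analogue, target[2]))
}

set.seed(1)
a <- rexp(5000)                # simulated analogue distances
b <- rexp(50000, rate = 0.2)   # simulated non-analogue distances
thinned <- thin_distances(a, b, max.len = 1000)
lengths(thinned)
```

Because the probabilities run from 0 to 1, the minimum and maximum of each set are retained, so the range of dissimilarities over which the ROC curve is evaluated is unchanged by thinning.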

### Value

A list with two components: (i) `statistics`, a summary of ROC statistics for each level of `groups` and a combined ROC analysis, and (ii) `roc`, a list of ROC objects, one per level of `groups`. For the latter, each ROC object is a list with the following components:

`TPF` |
The true positive fraction. |

`FPE` |
The false positive error. |

`optimal` |
The optimal dissimilarity value, assessed where the slope of the ROC curve is maximal. |

`AUC` |
The area under the ROC curve. |

`se.fit` |
Standard error of the AUC estimate. |

`n.in` |
numeric; the number of samples within the current group. |

`n.out` |
numeric; the number of samples not in the current group. |

`p.value` |
The p-value of a Wilcoxon rank sum test on the two sets of dissimilarities. This is also known as a Mann-Whitney test. |

`roc.points` |
The unique dissimilarities at which the ROC curve was evaluated. |

`max.roc` |
numeric; the position along the ROC curve at which the slope of the ROC curve is maximal. This is the index of this point on the curve. |

`prior` |
numeric, length 2. Vector of observed prior probabilities of true analogues and true non-analogues in the group. |

`analogue` |
a list with components |
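How the true positive fraction, false positive error, AUC and Wilcoxon test relate to one another can be illustrated with a small base R sketch. The simulated `within` and `between` dissimilarities and all variable names are assumptions for illustration; this is not the code `roc()` uses to fill these components.

```r
## Simulated dissimilarities (hypothetical): within-group values are small,
## between-group values are large
set.seed(7)
within  <- rnorm(50,  mean = 1, sd = 0.5)
between <- rnorm(200, mean = 3, sd = 0.5)

## Evaluate the curve using each unique dissimilarity as a cut-off:
## a pair is called an analogue if its dissimilarity is <= the cut-off
cuts <- sort(unique(c(within, between)))
TPF <- sapply(cuts, function(cc) mean(within  <= cc))  # true positive fraction
FPE <- sapply(cuts, function(cc) mean(between <= cc))  # false positive error

## Area under the curve by the trapezoidal rule, starting at the origin
fpe <- c(0, FPE)
tpf <- c(0, TPF)
auc_trap <- sum(diff(fpe) * (head(tpf, -1) + tail(tpf, -1)) / 2)

## The same area from the Wilcoxon rank sum (Mann-Whitney) statistic
wt <- wilcox.test(between, within)
auc_mw <- unname(wt$statistic) / (length(within) * length(between))

c(trapezoid = auc_trap, mann_whitney = auc_mw)
wt$p.value
```

The two AUC estimates agree because the Mann-Whitney statistic counts the pairs in which a between-group dissimilarity exceeds a within-group one, which is exactly the area the ROC curve encloses.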

### Author(s)

Gavin L. Simpson, based on code from Thomas Lumley to optimise the calculation of the ROC curve.

### References

Brown, C.D., and Davis, H.T. (2006) Receiver operating characteristics
curves and related decision measures: A tutorial. *Chemometrics
and Intelligent Laboratory Systems* **80**, 24–38.

Gavin, D.G., Oswald, W.W., Wahl, E.R. and Williams, J.W. (2003) A
statistical approach to evaluating distance metrics and analog
assignments for pollen records. *Quaternary Research*
**60**, 356–367.

Henderson, A.R. (1993) Assessing test accuracy and its clinical
consequences: a primer for receiver operating characteristic curve
analysis. *Annals of Clinical Biochemistry* **30**,
834–846.

### See Also

`mat` for fitting of MAT models. `bootstrap.mat` and `mcarlo` for alternative methods for selecting critical values of analogue-ness for dissimilarity coefficients.

### Examples

```
## load the package and the example data
library(analogue)
data(swapdiat, swappH, rlgh)
## merge training and test set on columns
dat <- join(swapdiat, rlgh, verbose = TRUE)
## extract the merged data sets and convert to proportions
swapdiat <- dat[[1]] / 100
rlgh <- dat[[2]] / 100
## fit an analogue matching (AM) model using the squared chord distance
## measure - need to keep the training set dissimilarities
swap.ana <- analog(swapdiat, rlgh, method = "SQchord",
                   keep.train = TRUE)
## fit the ROC curve to the SWAP diatom data using the AM results
## Generate a grouping for the SWAP lakes
METHOD <- if (getRversion() < "3.1.0") {"ward"} else {"ward.D"}
clust <- hclust(as.dist(swap.ana$train), method = METHOD)
grps <- cutree(clust, 12)
## fit the ROC curve
swap.roc <- roc(swap.ana, groups = grps)
swap.roc
## draw the ROC curve
plot(swap.roc, 1)
## draw the four default diagnostic plots
layout(matrix(1:4, ncol = 2))
plot(swap.roc)
layout(1)
```

*analogue* version 0.17-6