logitreg {analogue}

R Documentation

Logistic regression models for assessing analogues/non-analogues

Description

Fits logistic regression models to each level of groups to model the probability of two samples being analogues, conditional upon the dissimilarity between the two samples.

Usage

logitreg(object, groups, k = 1, ...)

## Default S3 method:
logitreg(object, groups, k = 1,
         biasReduced = FALSE, ...)

## S3 method for class 'analog'
logitreg(object, groups, k = 1, ...)

## S3 method for class 'logitreg'
summary(object, p = 0.9, ...)

Arguments

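`object`	for `logitreg`; a full dissimilarity matrix or, for the `analog` method, an object of class `"analog"` fitted with the training set dissimilarities retained (`keep.train = TRUE`). For `summary.logitreg`, an object of class `"logitreg"`, the result of a call to `logitreg`.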
`groups`	factor (or object that can be coerced to one) containing the group membership for each sample in `object`.
`k`	numeric; the `k` closest analogues to use in the model fitting.
`biasReduced`	logical; should Firth's method for bias reduced logistic regression be used to fit the models? If `TRUE`, model fits are performed via `brglm`. The default, `FALSE`, indicates that models will be fitted via the standard `glm` function.
`p`	numeric; the probability at which to predict the corresponding dissimilarity (the "dose" in the terminology of `dose.p`).
`...`	arguments passed to other methods. These arguments are passed on to `glm` or `brglm`; see their respective help pages for details. Note that `logitreg` sets the `formula`, `data`, and `family` arguments internally, so these cannot be specified by the user.

Details

Fits logistic regression models to each level of groups to model the probability of two samples being analogues (i.e. in the same group), conditional upon the dissimilarity between the two samples.

This function can be seen as a way of directly modelling the probability that two sites are analogues, conditional upon dissimilarity. The same question can be addressed less directly using roc and bayesF.

Often, the number of true analogues in the training set is small, both in absolute terms and as a proportion of comparisons. Logistic regression is known to suffer from a small-sample bias. Firth's method of bias reduction is a general solution to this problem and is implemented in logitreg through the brglm package of Ioannis Kosmidis.
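To make the idea concrete, the following is a minimal sketch in base R using simulated data. It is not the code used by logitreg: the way the analogue and non-analogue dissimilarities are selected here (the k smallest within-group and out-of-group dissimilarities for each sample) is an assumption made for illustration, as are all object names (X, grp, D, k_smallest, fit_group).

```r
## Illustrative sketch only: simulated data, base R, NOT the internals of logitreg()
set.seed(1)

## 60 hypothetical samples in three groups, described by two variables
n_per <- 20
grp <- factor(rep(c("A", "B", "C"), each = n_per))
n <- length(grp)
shift <- c(A = 0, B = 1.5, C = 3)[as.character(grp)]
X <- cbind(rnorm(n, mean = shift), rnorm(n, mean = shift))

## full dissimilarity matrix (Euclidean distance, purely for illustration)
D <- as.matrix(dist(X))

## number of closest samples to keep (assumed selection rule, see lead-in)
k <- 2

## the k smallest values of a vector; sort() drops NA by default
k_smallest <- function(x, k) sort(x)[seq_len(k)]

## build the 0/1 response for one group and fit the logit model
fit_group <- function(g) {
    inside <- grp == g
    within <- D[inside, inside]
    diag(within) <- NA                      # drop self-comparisons
    d_in  <- as.vector(apply(within, 1, k_smallest, k = k))
    d_out <- as.vector(apply(D[inside, !inside], 1, k_smallest, k = k))
    dat <- data.frame(analogue = rep(c(1, 0), c(length(d_in), length(d_out))),
                      Dij = c(d_in, d_out))
    glm(analogue ~ Dij, data = dat, family = binomial)
}

## one model per level of the grouping factor, named after the levels
fits <- lapply(setNames(levels(grp), levels(grp)), fit_group)
sapply(fits, coef)
```

Each element of fits is an ordinary glm object, so the usual extractor functions (coef, fitted, predict) apply to it.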

Value

logitreg returns an object of class "logitreg"; a list whose components are objects returned by glm (or by brglm if biasReduced = TRUE). See glm for further details on the returned objects.

The components of this list take their names from groups.

For summary.logitreg an object of class "summary.logitreg", a data frame with summary statistics of the model fits. The components of this data frame are:

`In`, `Out`	The number of analogue and non-analogue dissimilarities analysed in each group.
`Est.(Dij)`, `Std.Err`	Coefficient and its standard error for dissimilarity from the logit model.
`Z-value`, `p-value`	Wald statistic and associated p-value for each logit model.
`Dij(p=?)`, `Std.Err(Dij)`	The dissimilarity at which the posterior probability of two samples being analogues is equal to `p`, and its standard error. These are computed using `dose.p`.
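The dissimilarity reported in `Dij(p=?)` is the value of Dij at which the fitted logit model predicts probability `p`. A minimal sketch of that calculation on simulated data follows; the data and the object names (mod, by_hand, via_dose_p) are hypothetical, and the model is a plain glm fit rather than a component of a "logitreg" object. It compares inverting the linear predictor by hand with `dose.p` from the MASS package.

```r
## Illustrative sketch only: simulated data, not output from logitreg()
library(MASS)   # provides dose.p()

set.seed(7)
Dij <- runif(300, min = 0, max = 2)                  # hypothetical dissimilarities
analogue <- rbinom(300, size = 1, prob = plogis(3 - 4 * Dij))

mod <- glm(analogue ~ Dij, family = binomial)

## invert the linear predictor: logit(p) = b0 + b1 * Dij
p <- 0.9
b <- coef(mod)
by_hand <- unname((qlogis(p) - b[1]) / b[2])

## the same quantity, with a standard error, from dose.p()
via_dose_p <- dose.p(mod, cf = 1:2, p = p)

by_hand
via_dose_p
```

The standard error returned by `dose.p` is stored in the "SE" attribute of its result.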

Note

The function may generate warnings from function glm.fit. These should be investigated and not simply ignored.

If the warning concerns fitted probabilities being numerically 0 or 1, then check the fitted values of each of the models; these may well be numerically 0 or 1. Heed the warning in glm and read the reference cited therein, as the warning may indicate problems with the fitted models, such as (quasi-)complete separation.
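As a hedged illustration of how such warnings might be investigated, the sketch below fits a plain glm to a deliberately separated toy data set, records the warnings instead of letting them scroll past, and then inspects the fitted values. The data and the object names (msgs, mod) are hypothetical; with a "logitreg" object the same checks would be applied to each model in turn.

```r
## Illustrative sketch only: toy data with complete separation
Dij <- c(0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5)
analogue <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)   # all analogues have the smaller Dij

## collect warning messages raised during fitting
msgs <- character(0)
mod <- withCallingHandlers(
    glm(analogue ~ Dij, family = binomial),
    warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)

## which warnings were raised?
msgs

## fitted probabilities at (numerically) 0 or 1 point to separation
range(fitted(mod))

## very large coefficients and standard errors are a further symptom
summary(mod)$coefficients
```

When separation is the cause, refitting with `biasReduced = TRUE` is one option to consider, since Firth's method was introduced above for the related small-sample problem.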

Author(s)

Gavin L. Simpson

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27-38.

Examples

## load the example data
data(swapdiat, swappH, rlgh)

## merge training and test set on columns
dat <- join(swapdiat, rlgh, verbose = TRUE)

## extract the merged data sets and convert to proportions
swapdiat <- dat[[1]] / 100
rlgh <- dat[[2]] / 100

## fit an analogue matching (AM) model using the squared chord distance
## measure - need to keep the training set dissimilarities
swap.ana <- analog(swapdiat, rlgh, method = "SQchord",
                   keep.train = TRUE)

## generate a grouping for the SWAP lakes by hierarchical clustering
## of the training set dissimilarities
METHOD <- if (getRversion() < "3.1.0") {"ward"} else {"ward.D"}
clust <- hclust(as.dist(swap.ana$train), method = METHOD)
grps <- cutree(clust, 6)

## fit the logit models to the analog object
swap.lrm <- logitreg(swap.ana, grps)
swap.lrm

## summary statistics
summary(swap.lrm)

## plot the fitted logit curves
plot(swap.lrm, conf.type = "polygon")

## extract fitted posterior probabilities for training samples
## for the individual groups
fit <- fitted(swap.lrm)
head(fit)

## compute posterior probabilities of analogue-ness for the rlgh
## samples. Here we take the dissimilarities between fossil and
## training samples from the `swap.ana` object rather than
## recomputing them
pred <- predict(swap.lrm, newdata = swap.ana$analogs)
head(pred)

## Bias reduction
## fit the logit models to the analog object
swap.brlrm <- logitreg(swap.ana, grps, biasReduced = TRUE)
summary(swap.brlrm)

[Package analogue version 0.17-6 Index]