R: Confidence Intervals for Interest Measures for Association...

confint {arules}

R Documentation

Confidence Intervals for Interest Measures for Association Rules

Description

Defines a method to compute confidence intervals for interest measures for association rules.

Usage

## S3 method for class 'rules'
confint(
  object,
  parm = "oddsRatio",
  level = 0.95,
  measure = NULL,
  side = c("two.sided", "lower", "upper"),
  method = NULL,
  replications = 1000,
  smoothCounts = 0,
  transactions = NULL,
  ...
)

Arguments

`object`	an object of class rules.
`parm`, `measure`	name of the interest measures (see `interestMeasure()`). `measure` can be used instead of `parm`.
`level`	the confidence level required.
`side`	Should a two-sided confidence interval or a one-sided limit be returned? Lower returns an interval with only a lower limit and upper returns an interval with only an upper limit.
`method`	method to construct the confidence interval. The available methods depends on the measure and the most common method is used by default.
`replications`	number of replications for method `"simulation"`. Ignored for other methods.
`smoothCounts`	pseudo count for addaptive smoothing (Laplace smoothing). Often a pseudo counts of .5 is used for smoothing (see Detail Section).
`transactions`	if the rules object does not contain sufficient quality information, then a set of transactions to calculate the confidence interval for can be specified.
`...`	Additional parameters are ignored with a warning.

Details

This method creates a contingency table for each rule and then constructs a confidence interval for the specified measures.

Fast confidence interval approximations are currently available for the measures "support", "count", "confidence", "lift", "oddsRatio", and "phi". For all other measures, bootstrap sampling from a multinomial distribution is used.

Haldan-Anscombe correction (Haldan, 1940; Anscombe, 1956) to avoids issues with zero counts can be specified by smoothCounts = 0.5. Here .5 is added to each count in the contingency table.

Value

Returns a matrix with with one row for each rule and the two columns named "LL" and "UL" with the interval boundaries. The matrix has the following additional attributes:

`measure`	the interest measure.
`level`	the confidence level
`side`	the confidence level
`smoothCounts`	used count smoothing.
`method`	name of the method to create the interval
`desc`	description of the used method to calculate the confidence interval. The mentioned references can be found below.

Author(s)

Michael Hahsler

References

Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association, 22 (158): 209-212. doi:10.1080/01621459.1927.10502953

Clopper, C.; Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika, 26 (4): 404-413. doi:10.1093/biomet/26.4.404

Doob, J. L. (1935). "The Limiting Distributions of Certain Statistics". Annals of Mathematical Statistics, 6: 160-169. doi:10.1214/aoms/1177732594

Fisher, R.A. (1962). "Confidence limits for a cross-product ratio". Australian Journal of Statistics, 4, 41.

Woolf, B. (1955). "On estimating the relation between blood group and diseases". Annals of Human Genetics, 19, 251-253.

Haldane, J.B.S. (1940). "The mean and variance of the moments of chi-squared when used as a test of homogeneity, when expectations are small". Biometrika, 29, 133-134.

Anscombe, F.J. (1956). "On estimating binomial response relations". Biometrika, 43, 461-464.

Examples

data("Income")

# mine some rules with the consequent "language in home=english"
rules <- apriori(Income, parameter = list(support = 0.5),
  appearance = list(rhs = "language in home=english"))

# calculate the confidence interval for the rules' odds ratios.
# note that we use Haldane-Anscombe correction (with smoothCounts = .5)
# to avoid issues with 0 counts in the contingency table.
ci <- confint(rules, "oddsRatio",  smoothCounts = .5)
ci

# We add the odds ratio (with Haldane-Anscombe correction)
# and the confidence intervals to the quality slot of the rules.
quality(rules) <- cbind(
  quality(rules),
  oddsRatio = interestMeasure(rules, "oddsRatio", smoothCounts = .5),
  oddsRatio = ci)

rules <- sort(rules, by = "oddsRatio")
inspect(rules)

# use confidence intervals for lift to find rules with a lift significantly larger then 1.
# We set the confidence level to 95%, create a one-sided interval and check
# if the interval does not cover 1 (i.e., the lower limit is larger than 1).
ci <- confint(rules, "lift", level = 0.95, side = "lower")
ci

inspect(rules[ci[, "LL"] > 1])

[Package arules version 1.7-7 Index]