kleinberg {bursts}R Documentation

Detects periods of increased activity in a stream of events

Description

This function attempts to identify ‘bursts’ of activity in a series of discrete events that take place at known times, based on an infinite hidden Markov model. An optimal state sequence is computed that balances the total trasition cost against the probability of the observed event timing.

Usage

kleinberg(offsets, s = 2, gamma = 1)

Arguments

offsets

a vector containing the time offsets (numeric or Date) of the relevant events.

s

the base of the exponent used to determine event frequencies in a given state.

gamma

a coefficient that modifies the cost of a transition to a higher state.

Details

The model is a hidden Markov process in which, after each event, the state of the system probablistically determines how much time will pass until the next event occurs. While the system is in state i, the gaps between events are assumed to be drawn from an exponential distribution with expected value proportional to s ** -i. The value of s may be modified; higher values increase the strictness of the algorithm's criterion for how dramatic an increase of activity has to be to be considered a ‘burst.’

The cost of a state change is proportional to the increase in state number; this proportion can be modified by setting the parameter gamma. Higher values mean roughly that bursts must be sustained over longer periods of time in order for the algorithm to recognize them.

Note that the algorithm will not work if there are two events that occur at the same time.

Value

Returns a data frame of class ‘bursts.’ Each row represents a (maximal) interval of time in which the system was at or above a given level of activity. The first row always indicates a period at level 1+ lasting from the time of the first event to the time of the last; subsequent rows always indicate levels greater than 1, and thus represent ‘bursts.’

The ‘start’ time of a burst is defined as the time of the event that precedes the state change. This is so that the time interval of a burst contains the first unusually short gap between events, rather than beginning immediately after it.

As per Kleinberg, if the system goes through a burst that increases the level by more than 1, a separate row is created for each level that it goes through.

Author(s)

Jeff Binder

References

Kleinberg, J. (2003). "Bursty and Hierarchical Structure in Streams." Data Mining and Knowledge Discovery 7: 373-397. <doi:10.1023/A:1024940629314>

http://www.cs.cornell.edu/home/kleinber/kdd02.html

See Also

plot.bursts

Examples

offsets <- c(seq(0, 400, 100), seq(410, 450, 5), seq(451, 470, 2),
             seq(480, 600, 5), 700, seq(710, 800, 5), 900, 1000)
bursts <- kleinberg(offsets)
plot.bursts(bursts)

[Package bursts version 1.0-2 Index]