vlmc {VLMC}    R Documentation
Fit a Variable Length Markov Chain (VLMC)
Description
Fit a Variable Length Markov Chain (VLMC) to a discrete time series in two steps:
First, a large Markov chain is generated containing the context states of the time series (all of them if threshold.gen = 1). In the second step, many states of the MC are collapsed by pruning the corresponding context tree.
Currently, the “alphabet” may contain at most 95 different “character”s.
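The first (“generate”) step amounts to tabulating, for every position of the series, the symbols that precede it (its context, most recent symbol first) and keeping the contexts that occur often enough. The base-R sketch below is illustrative only: context_counts() is a hypothetical helper, not part of VLMC, and it assumes that threshold.gen acts as a minimum count, so that a threshold of 1 keeps every observed context.

```r
## Hypothetical helper (not part of VLMC): count the contexts of length
## `depth` in a discrete series and keep those seen at least `threshold`
## times (assumed analogue of threshold.gen).
context_counts <- function(x, depth, threshold = 2) {
  x <- as.character(x)
  n <- length(x)
  if (n <= depth) return(integer(0))
  # context of position i: x[i-1], x[i-2], ..., x[i-depth] (most recent first)
  ctx <- vapply((depth + 1):n,
                function(i) paste(x[(i - 1):(i - depth)], collapse = ""),
                character(1))
  tab <- table(ctx)
  tab <- tab[tab >= threshold]          # drop rare contexts
  setNames(as.integer(tab), names(tab))
}

## the series dt1 from the Examples
f1 <- c(1, 0, 0, 0)
f2 <- rep(1:0, 2)
dt1 <- c(f1, f1, f2, f1, f2, f2, f1)
lapply(1:3, function(d) context_counts(dt1, d))
```

For dt1, the depth-1 contexts are “0” (17 times) and “1” (10 times); longer contexts split these counts further, which is what makes the initial tree large before pruning.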
Usage
vlmc(dts,
     cutoff.prune = qchisq(alpha.c, df = max(.1, alpha.len - 1), lower.tail = FALSE)/2,
     alpha.c = 0.05,
     threshold.gen = 2,
     code1char = TRUE, y = TRUE, debug = FALSE, quiet = FALSE,
     dump = 0, ctl.dump = c(width.ct = 1 + log10(n), nmax.set = -1))
is.vlmc(x)
## S3 method for class 'vlmc'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
dts: a discrete “time series”; can be a numeric, character or factor vector.
cutoff.prune: non-negative number; the cutoff used for pruning; defaults to half the upper alpha.c quantile of the χ² distribution with alpha.len - 1 degrees of freedom (see Usage).
alpha.c: number in (0,1) used to specify the default of cutoff.prune.
threshold.gen: integer (default 2); threshold used when generating the initial large Markov chain; with threshold.gen = 1, all context states are generated.
code1char: logical; if true (default), the data …
y: logical; if true (default), the data are returned as component y of the result.
debug: logical; should debugging info be printed to stderr?
quiet: logical; if true, don't print some warnings.
dump: integer in …
ctl.dump: integer of length 2, say …
x: a fitted "vlmc" object; for is.vlmc(), any R object.
digits: integer giving the number of significant digits for printing numbers.
...: potentially further arguments [Generic].
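The default of cutoff.prune can be evaluated on its own in base R. default_cutoff() below is a hypothetical wrapper around the expression from the Usage section, handy for seeing how alpha.c and the alphabet size translate into a cutoff.

```r
## Hypothetical wrapper: the default cutoff.prune from the Usage section,
## i.e. half the upper alpha.c quantile of a chi-squared distribution with
## alpha.len - 1 degrees of freedom (df floored at 0.1).
default_cutoff <- function(alpha.len, alpha.c = 0.05)
  qchisq(alpha.c, df = max(.1, alpha.len - 1), lower.tail = FALSE) / 2

default_cutoff(2)                  # binary alphabet: 1.920729
default_cutoff(3)                  # df = 2, equals -log(0.05) = 2.995732
default_cutoff(2, alpha.c = 0.01)  # smaller alpha.c gives a larger cutoff
```

Since cutoff.prune = 0 gives the least pruning (see Note), a smaller alpha.c, and hence a larger cutoff, should lead to more pruning and a smaller fitted tree.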
Value
A "vlmc" object, basically a list with components
nobs: length of the data series when fit (formerly named …).
threshold.gen, cutoff.prune: the arguments (or their defaults).
alpha.len: the alphabet size.
alpha: the alphabet used, as one string.
size: a named integer vector of length (>=) 4, giving characteristic sizes of the fitted VLMC. Its named components are …
vlmc.vec: integer vector, containing (an encoding of) the fitted VLMC tree.
y: if …
call: the call used to fit the VLMC.
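The alpha component stores the alphabet as a single string with one character per symbol, so nchar(alpha) equals alpha.len (the Examples check the same thing with strsplit()). encode1char() below is a hypothetical sketch of such a one-character coding; drawing the codes from the printable ASCII characters is an assumption chosen only to match the limit of 95 symbols mentioned in the Examples, not a description of how vlmc() codes its data.

```r
## Hypothetical sketch (not VLMC's internal coding): map each distinct value
## of a discrete series to one printable ASCII character and return the
## alphabet as one string, like the `alpha` component.
encode1char <- function(x) {
  stopifnot(!anyNA(x))                                    # NA is not handled here
  ascii <- strsplit(rawToChar(as.raw(32:126)), "")[[1]]   # 95 printable chars
  pool  <- c(as.character(0:9), letters, LETTERS)         # readable codes first
  pool  <- c(pool, setdiff(ascii, pool))                  # still 95 in total
  lev <- sort(unique(x))
  if (length(lev) > length(pool))
    stop("alphabet too large: ", length(lev), " > ", length(pool), " symbols")
  code <- pool[seq_along(lev)]
  list(alpha     = paste(code, collapse = ""),
       alpha.len = length(lev),
       coded     = paste(code[match(x, lev)], collapse = ""))
}

e <- encode1char(c(0, 1, 1, 0, 1))
e$alpha                          # "01"
nchar(e$alpha) == e$alpha.len    # TRUE
```

As in the last Example, a series with 100 distinct values exceeds the 95 available codes and gives an error.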
Note
Set cutoff.prune = 0 and threshold.gen = 1 to get a “perfect fit”, i.e., a VLMC which perfectly re-predicts the data (apart from the first observation). Note that even with cutoff.prune = 0 some pruning may happen, namely for all (terminal) nodes with δ = 0.
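The Note refers to a node statistic δ. Assuming the Kullback-Leibler-type form described by Mächler and Bühlmann (2004), δ for a terminal node wu with parent w is the sum over next symbols x of N(wux) log(P(x | wu) / P(x | w)), where N are counts and P the estimated transition probabilities; it is 0 exactly when the node predicts the same next-symbol distribution as its parent. prune_delta() below is a hypothetical helper computing this from count vectors; it is a sketch, not VLMC's code.

```r
## Hypothetical helper (assumed form of delta, not VLMC's code):
## `child` and `parent` are the next-symbol counts at node wu and at its
## parent w, in the same symbol order.
prune_delta <- function(child, parent) {
  p_child  <- child / sum(child)
  p_parent <- parent / sum(parent)
  keep <- child > 0   # zero counts contribute 0; parent counts are then > 0 too
  sum(child[keep] * log(p_child[keep] / p_parent[keep]))
}

prune_delta(c(2, 4), c(10, 20))   # same proportions as the parent: 0
prune_delta(c(5, 0), c(5, 5))     # 5 * log(2) = 3.465736
```

Under the assumed rule that a node is pruned when δ does not exceed cutoff.prune, the first node would be pruned even with cutoff.prune = 0, in line with the Note.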
Author(s)
Martin Maechler
References
Bühlmann P. and Wyner A. (1999) Variable Length Markov Chains. Annals of Statistics 27(2), 480–513.
Mächler M. and Bühlmann P. (2004) Variable Length Markov Chains: Methodology, Computing, and Software. J. Computational and Graphical Statistics 13(2), 435–455.
Mächler M. (2004) VLMC — Implementation and R interface; working paper.
See Also
draw.vlmc
,
entropy
, simulate.vlmc
for “VLMC bootstrapping”.
Examples
f1 <- c(1,0,0,0)
f2 <- rep(1:0,2)
(dt1 <- c(f1,f1,f2,f1,f2,f2,f1))
(vlmc.dt1 <- vlmc(dt1))
vlmc(dt1, dump = 1,
ctl.dump = c(wid = 3, nmax = 20), debug = TRUE)
(vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1))
data(presidents)
dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA
table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level
levels(dpres)#-> make the alphabet -> warning
vlmc.pres <- vlmc(dpres, debug = TRUE)
vlmc.pres
## alphabet and its length:
vlmc.pres$alpha
stopifnot(
  length(print(strsplit(vlmc.pres$alpha, NULL)[[1]])) == vlmc.pres$alpha.len
)
## Larger alphabets (up to 95 letters) can now be used:
set.seed(7); it <- sample(40, 20000, replace=TRUE)
v40 <- vlmc(it)
v40
## even larger alphabets now give an error:
il <- sample(100, 10000, replace=TRUE)
ee <- tryCatch(vlmc(il), error= function(e)e)
stopifnot(inherits(ee, "error"))