predict.vlmc {VLMC}    R Documentation

Prediction of VLMC for (new) Series

Description

Compute predictions from a fitted VLMC object for each element (except the first) of another discrete time series. By default, a matrix of prediction probabilities is returned. The argument type selects other kinds of prediction, such as the most probable "class" or "response", the context length (tree "depth"), or an "ID" of the corresponding context.

Usage

## S3 method for class 'vlmc'
predict(object, newdata,
         type = c("probs", "class","response", "id.node", "depth", "ALL"),
         se.fit = FALSE,
         allow.subset = TRUE, check.alphabet=TRUE,
         ...)
## S3 method for class 'vlmc'
fitted(object, ...)

Arguments

object

typically the result of vlmc(..).

newdata

a discrete “time series”, a numeric, character or factor, as the dts argument of vlmc(.).

type

character indicating the type of prediction required; the options are given in the Usage section above, see also the Value section below. The default "probs" returns a matrix of prediction probabilities, whereas "class" or "response" give the corresponding most probable class. The value of this argument can be abbreviated.

se.fit

a switch indicating whether standard errors are required.
Not yet supported.

allow.subset

logical; if TRUE, newdata need not contain all of the “alphabet letters” used in the data to which object was fitted.

check.alphabet

logical; if TRUE, the consistency of newdata's alphabet with that of object is checked.

...

(potentially further arguments) required by generic.
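
The abbreviation rule for type can be illustrated with base R's match.arg(), which resolves a unique prefix to the full choice. This is only a minimal sketch, assuming that the method resolves type in this standard way; resolve_type is a hypothetical helper and not part of VLMC.

```r
# Hypothetical helper (not part of VLMC): mimics how a 'type' argument
# with a vector of choices is commonly resolved via match.arg().
resolve_type <- function(type = c("probs", "class", "response",
                                  "id.node", "depth", "ALL")) {
  match.arg(type)
}

resolve_type()        # no value given: the first choice, "probs"
resolve_type("resp")  # unique prefix of "response"
resolve_type("d")     # unique prefix of "depth"
```

An ambiguous or unknown prefix makes match.arg() signal an error, so an abbreviation has to identify exactly one of the choices.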

Value

Depending on the type argument,

"probs"

an n \times m matrix pm of (prediction) probabilities, where n is the length of newdata and m is the alphabet size; every row of pm other than the first sums to 1.

pm[i,k] is \hat P[Y_i = k | Y_{i-1},\dots] (and is therefore NA for i=1). The dimnames of pm are the values of newdata[] and the alphabet letters k.

"class", "response"

the corresponding most probable value of Y[]; as factor for "class" and as integer in 0:(m-1) for type = "response". If there is more than one most probable value, the first one is chosen.

"id.node"

an (integer) “ID” of the current context (= node of the tree representing the VLMC).

"depth"

the context length, i.e., the depth of the Markov chain, at the current observation (of newdata).

"ALL"

an object of class "predict.vlmc", a list with the following components,

ID

integer vector as for type = "id.node",

probs

prediction probability matrix, as above,

flags

integer vector, non-zero for particular states only; mainly useful for debugging.

ctxt

character vector; ctxt[i] is a string giving the context (backwards) for newdata[i], using alphabet letters.

fitted

character with fitted values, i.e., the alphabet letter with the highest probability, using max.col where ties are broken at random.

alpha, alpha.len

the alphabet (single string) and its length.

which has its own print method (print.predict.vlmc).
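
The two tie-breaking rules mentioned above (first maximum for type = "class" or "response", random choice via max.col for the fitted component) can be compared in base R. This is a minimal sketch on a made-up probability matrix; it does not call VLMC, and pm here is an invented example rather than output of predict.

```r
# Toy probability matrix (made-up values): row 1 has a tie between
# columns 1 and 2; row 2 has a unique maximum in column 2.
pm <- rbind(c(0.5, 0.5),
            c(0.2, 0.8))

# Deterministic rule: the first maximal column in each row
first_max <- apply(pm, 1, which.max)
first_col <- max.col(pm, ties.method = "first")

# Default max.col(): ties are broken at random,
# so repeated calls can return either 1 or 2 for row 1
set.seed(1)
random_col <- replicate(200, max.col(pm)[1])
table(random_col)
```

Because of the random rule, the fitted component and the "response" prediction can disagree at positions where several alphabet letters share the highest probability.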

Note

The predict method and its possible arguments may still be developed, and we are considering returning the marginal probabilities instead of NA for the first value(s).

The print method print.predict.vlmc uses fractions from package MASS to display the probabilities Pr[X = j], for j \in \{0,1,\dots\}, as these are rational numbers, shown as fractions of integers.
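
As a small illustration of that display, fractions from MASS can be applied to any numeric vector. This is a sketch on made-up counts, not output of print.predict.vlmc; it assumes the recommended package MASS is installed.

```r
# Made-up counts of the next symbol ("0" or "1") following some context
counts <- c("0" = 3, "1" = 1)
probs  <- counts / sum(counts)   # relative frequencies 0.75 and 0.25

# Rational representation of each probability
fr <- MASS::fractions(probs)
fr
as.character(fr)                 # "3/4" "1/4"
```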

See Also

vlmc and residuals.vlmc. For simulation, simulate.vlmc.

Examples

f1 <- c(1,0,0,0)
f2 <- rep(1:0,2)
(dt2 <- rep(c(f1,f1,f2,f1,f2,f2,f1),2))

(vlmc.dt2c15  <- vlmc(dt2, cutoff = 1.5))
draw(vlmc.dt2c15)

## Fitted Values:
all.equal(predict(vlmc.dt2c15, dt2), predict(vlmc.dt2c15))
(pa2c15 <- predict(vlmc.dt2c15, type = "ALL"))

## Depth = context length  ([1] : NA) :
stopifnot(nchar(pa2c15 $ ctxt)[-1] ==
          predict(vlmc.dt2c15, type = "depth")[-1])

same <- (ff1 <- pa2c15 $ fitted) ==
        (ff2 <- int2alpha(predict(vlmc.dt2c15, type ="response"), alpha="01"))
which(!same) #-> some are different, since max.col() breaks ties at random!

ndt2 <- c(rep(0,6),f1,f1,f2)
predict(vlmc.dt2c15, ndt2, "ALL")

(newdt2 <- sample(dt2, 17))
pm <- predict(vlmc.dt2c15, newdt2, allow.subset = TRUE)
summary(rowSums(pm)) # all 1, except the NA for the first element

predict(vlmc.dt2c15, newdt2, type = "ALL")

data(bnrf1)
(vbnrf <- vlmc(bnrf1EB))
(pA <- predict(vbnrf, bnrf1EB[1:24], type = "ALL"))
 pc <- predict(vbnrf, bnrf1EB[1:24], type = "class")
 pr <- predict(vbnrf, bnrf1EB[1:24], type = "resp")
stopifnot(as.integer  (pc[-1])   == 1 + pr[-1],
          as.character(pc[-1]) == strsplit(vbnrf$alpha,NULL)[[1]][1 + pr[-1]])

##-- Example of a "perfect" fit -- just for illustration:
##   the default, thresh = 2, doesn't fit perfectly (i = 38)
(vlmc.dt2c0th1 <- vlmc(dt2, cutoff = 0, thresh = 1))

## "Fitted" = "Data" (but the first which can't be predicted):
stopifnot(dt2[-1] == predict(vlmc.dt2c0th1,type = "response")[-1])
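
The properties stated in the Value section can also be checked directly on the dt2 series. The following is a minimal sketch that assumes the VLMC package is installed (the code is skipped otherwise); fit, cls and rsp are names chosen here for illustration.

```r
if (requireNamespace("VLMC", quietly = TRUE)) {
  library(VLMC)

  # Same series as above
  f1  <- c(1,0,0,0)
  f2  <- rep(1:0, 2)
  dt2 <- rep(c(f1,f1,f2,f1,f2,f2,f1), 2)
  fit <- vlmc(dt2, cutoff = 1.5)

  pm <- predict(fit, dt2)      # default type = "probs"
  dim(pm)                      # one row per element, one column per letter
  all(is.na(pm[1, ]))          # the first element cannot be predicted
  range(rowSums(pm)[-1])       # the remaining rows are probability vectors

  # "class" is a factor, "response" an integer code starting at 0
  cls <- predict(fit, dt2, type = "class")
  rsp <- predict(fit, dt2, type = "response")
  all(as.integer(cls[-1]) == 1 + rsp[-1])
}
```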

[Package VLMC version 1.4-3-1 Index]