weissData {eglhmm} | R Documentation |
Data from “An Introduction to Discrete-Valued Time Series”
Description
Data sets from the book “An Introduction to Discrete-Valued Time Series” by Christian H. Weiß.
The data sets are named Bovine, Cryptosporidiosis, Downloads, EricssonB_Jul2, FattyLiver, FattyLiver2, goldparticle380, Hanta, InfantEEGsleepstates, IPs, LegionnairesDisease, OffshoreRigcountsAlaska, PriceStability, Strikes and WoodPeweeSong.
Format
Each data set is a data frame with a single column named "y".
- Bovine
  There are 8419 rows. The column "y" is a factor with levels "a", "c", "g", "t", the DNA “bases”. It constitutes the DNA sequence of the bovine leukemia virus.
- Cryptosporidiosis
  There are 365 rows. The column "y" is a numeric (integer) vector. It consists of weekly counts of new infections in Germany in the years 2002 to 2008. The counts vary between 2 and 78.
- Downloads
  There are 267 rows. The column "y" is a numeric (integer) vector. It consists of the daily number of downloads of a TeX editor for the period from June 2006 to February 2007. The counts vary between 0 and 14.
- EricssonB_Jul2
  There are 460 rows. The column "y" is a numeric (integer) vector. It consists of the number of transactions per minute of the Ericsson B stock, between 9:35 and 17:14 on 2 July 2002. The counts vary between 0 and 37.
- FattyLiver
  There are 928 rows. The column "y" is a numeric (binary) vector. The value 1 indicates that “the considered diagnosis cannot be excluded for the current patient; that is, suitable countermeasures are required”, and the value 0 indicates that this is not so. The values refer to different patients, examined sequentially over time.
- FattyLiver2
  There are 449 rows. The column "y" is a numeric (binary) vector as for FattyLiver. (Different examiner, different sequence of patients.)
- goldparticle380
  There are 380 rows. The column "y" is a numeric (integer) vector of counts of gold particles measured in a fixed volume element of a colloidal solution over time. The count values vary because of the Brownian motion of the particles. They vary between 0 and 7.
- Hanta
  There are 52 rows. The column "y" is a numeric (integer) vector consisting of the weekly number of territorial units (out of n = 38 territorial units) with at least one new case of a hantavirus infection, in the year 2011. The numbers vary between 0 and 11.
- InfantEEGsleepstates
  There are 107 rows. The column "y" is a factor with levels "qt", "qh", "tr", "al", "ah", "aw". The level "aw" does not actually appear.
- IPs
  There are 241 rows. The column "y" is a numeric (integer) vector of the counts of different IP addresses registered at a web server within periods of length two minutes, “assumed” to have been observed between 10:00 a.m. and 6:00 p.m. on 29 November 2005. The counts vary between 0 and 8.
- LegionnairesDisease
  There are 365 rows. The column "y" is a numeric (integer) vector of weekly counts of new infections in Germany in the years 2002 to 2008. The counts vary between 0 and 26.
- OffshoreRigcountsAlaska
  There are 417 rows. The column "y" is a numeric (integer) vector of weekly counts of active rotary drilling rigs in Alaska for the period 1990 to 1997. The counts vary between 0 and 6.
- PriceStability
  There are 152 rows. The column "y" is a numeric (integer) vector of monthly counts of countries (out of a group of 17 countries) that showed stable prices (that is, an inflation rate below 2%), in the period from January 2000 to December 2006. The counts vary between 0 and 17.
- Strikes
  There are 108 rows. The column "y" is a numeric (integer) vector of the monthly counts of work stoppages (strikes and lock-outs) of 1000 or more workers in the period 1994 to 2002. The counts vary between 0 and 14.
- WoodPeweeSong
  There are 1327 rows. The column "y" is a factor with levels "1", "2", "3" corresponding to the three different “phrases” of wood pewee song. The time series comprises a sequence of observations of the “morning twilight” song of the wood pewee.
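The InfantEEGsleepstates entry notes that one declared factor level never occurs. A minimal sketch of how such an unused level shows up in base R follows. The data frame toy is a made-up stand-in with the same single-column layout, not the packaged data, and whether to drop the empty level depends on the analysis.

```r
# Hypothetical stand-in: one factor column "y" whose declared levels
# include "aw", which never occurs among the observations.
toy <- data.frame(
  y = factor(c("qt", "qh", "tr", "al", "ah", "qt"),
             levels = c("qt", "qh", "tr", "al", "ah", "aw"))
)

# table() keeps declared levels, so the unused one is reported with count 0.
counts <- table(toy$y)
print(counts)

# droplevels() removes levels that do not occur in the data.
trimmed <- droplevels(toy)
print(levels(trimmed$y))
```

The same checks (str(), table(), levels()) apply unchanged to any of the packaged data frames once they are loaded.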
Details
For detailed information about each of these data sets, see the book cited in the References.
Note that the data sets Cryptosporidiosis and LegionnairesDisease are actually called Cryptosporidiosis_02-08 and LegionnairesDisease_02-08 in the given reference. The “suffixes” were removed since the minus sign causes problems in a variable name in R.
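The problem with the minus sign can be seen in base R alone. The sketch below is illustrative only: the toy data frame and the variable names (book_name, short_name and so on) are assumptions introduced here, and the sub() call merely mimics the renaming; it is not necessarily how the package author produced the shorter names.

```r
# The name used in the book is not a syntactic R name.
book_name <- "Cryptosporidiosis_02-08"

# make.names() replaces the invalid character, so the result differs.
fixed <- make.names(book_name)
is_syntactic <- identical(fixed, book_name)

# Typed without quoting, the name is parsed as a subtraction.
expr <- quote(Cryptosporidiosis_02-08)
operator <- as.character(expr[[1]])

# Backticks make the name usable, but every reference must then be quoted.
`Cryptosporidiosis_02-08` <- data.frame(y = c(2L, 5L, 3L))  # toy values
n_rows <- nrow(`Cryptosporidiosis_02-08`)

# Dropping the suffix gives a name that needs no quoting.
short_name <- sub("_02-08$", "", book_name)
```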
Source
These data sets were kindly provided by Prof. Christian H. Weiß. The package author is also pleased to acknowledge the kind permission granted by Prof. Kurt Brännäs (Professor Emeritus of Economics at Umeå University) to include the Ericsson time series data set (EricssonB_Jul2).
References
Christian H. Weiß (2018). An Introduction to Discrete-Valued Time Series. Chichester: John Wiley & Sons.
Examples
## Not run:
# Default starting values.
fit1 <- hmm(WoodPeweeSong, K = 2, verbose = TRUE)
# EM converges in 6 steps --- suspicious.

# Random starting values for both tpm and Rho.
set.seed(321)
fit2 <- hmm(WoodPeweeSong, K = 2, verbose = TRUE,
            rand.start = list(tpm = TRUE, Rho = TRUE))
# 52 steps --- note the huge difference between fit1$log.like and fit2$log.like!

# Same seed and random starts, but with method = "bf" instead of the default.
set.seed(321)
fit3 <- hmm(WoodPeweeSong, K = 2, verbose = TRUE, method = "bf",
            rand.start = list(tpm = TRUE, Rho = TRUE))
# Log likelihood essentially the same as for fit2.
## End(Not run)