| weissData {eglhmm} | R Documentation |
Data from “An Introduction to Discrete-Valued Time Series”
Description
Data sets from the book “An Introduction to Discrete-Valued Time Series” by Christian H. Weiß.
The data sets are named Bovine, Cryptosporidiosis,
Downloads, EricssonB_Jul2, FattyLiver,
FattyLiver2, goldparticle380,
Hanta, InfantEEGsleepstates, IPs,
LegionnairesDisease, OffshoreRigcountsAlaska,
PriceStability, Strikes and WoodPeweeSong.
Format
Each data set is a data frame with a single column named "y".
- Bovine: There are 8419 rows. The column "y" is a factor with levels "a", "c", "g", "t", the DNA “bases”. It constitutes the DNA sequence of the bovine leukemia virus.
- Cryptosporidiosis: There are 365 rows. The column "y" is a numeric (integer) vector. It consists of weekly counts of new infections in Germany in the years 2002 to 2008. The counts vary between 2 and 78.
- Downloads: There are 267 rows. The column "y" is a numeric (integer) vector. It consists of the daily number of downloads of a TeX editor for the period from June 2006 to February 2007. These counts vary between 0 and 14.
- EricssonB_Jul2: There are 460 rows. The column "y" is a numeric (integer) vector. It consists of the number of transactions per minute of the Ericsson B stock, between 9:35 and 17:14 on 2 July 2002. The counts vary between 0 and 37.
- FattyLiver: There are 928 rows. The column "y" is a numeric (binary) vector. The value 1 indicates that “the considered diagnosis cannot be excluded for the current patient; that is, suitable countermeasures are required”, and the value 0 indicates that this is not so. The values refer to different patients, examined sequentially over time.
- FattyLiver2: There are 449 rows. The column "y" is a numeric (binary) vector as for FattyLiver. (Different examiner, different sequence of patients.)
- goldparticle380: There are 380 rows. The column "y" is a numeric (integer) vector of counts of gold particles measured in a fixed volume element of a colloidal solution over time. The count values vary because of the Brownian motion of the particles. They vary between 0 and 7.
- Hanta: There are 52 rows. The column "y" is a numeric (integer) vector consisting of the weekly number of territorial units (out of n = 38 territorial units) with at least one new case of hantavirus infection, in the year 2011. The numbers vary between 0 and 11.
- InfantEEGsleepstates: There are 107 rows. The column "y" is a factor with levels "qt", "qh", "tr", "al", "ah", "aw". The level "aw" does not actually appear.
- IPs: There are 241 rows. The column "y" is a numeric (integer) vector of the counts of different IP addresses registered at a web server within periods of length two minutes, “assumed” to have been observed between 10:00 a.m. and 6:00 p.m. on 29 November 2005. The counts vary between 0 and 8.
- LegionnairesDisease: There are 365 rows. The column "y" is a numeric (integer) vector of weekly counts of new infections in Germany, in the years 2002 to 2008. The counts vary between 0 and 26.
- OffshoreRigcountsAlaska: There are 417 rows. The column "y" is a numeric (integer) vector of weekly counts of active rotary drilling rigs in Alaska for the period 1990 to 1997. The counts vary between 0 and 6.
- PriceStability: There are 152 rows. The column "y" is a numeric (integer) vector of monthly counts of countries (out of a group of 17 countries) that showed stable prices (that is, an inflation rate below 2%), in the period from January 2000 to December 2006. The counts vary between 0 and 17.
- Strikes: There are 108 rows. The column "y" is a numeric (integer) vector of the monthly counts of work stoppages (strikes and lock-outs) of 1000 or more workers in the period 1994 to 2002. The counts vary between 0 and 14.
- WoodPeweeSong: There are 1327 rows. The column "y" is a factor with levels "1", "2", "3" corresponding to the three different “phrases” of wood pewee song. The time series comprises a sequence of observations of the “morning twilight” song of the wood pewee.
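For the factor-valued series it can be useful to check whether every declared level actually occurs, as with InfantEEGsleepstates, where "aw" never appears. The following is a minimal sketch using base R only. The data frame `toy` is a made-up stand-in with the documented single-column layout, not the real data, and the names `toy`, `lvls`, `counts`, `unused` and `toy_trim` are introduced purely for illustration.

```r
# Hypothetical stand-in with the documented layout: one column "y".
# These values are invented for illustration; they are NOT the real data.
lvls <- c("qt", "qh", "tr", "al", "ah", "aw")
toy <- data.frame(
  y = factor(c("qt", "qh", "tr", "al", "ah", "qt"), levels = lvls)
)

# Tabulate the observations; unused levels show up with a count of zero.
counts <- table(toy$y)
unused <- names(counts)[counts == 0]
print(unused)

# Drop unused levels if a downstream analysis should not see them.
toy_trim <- droplevels(toy)
print(nlevels(toy_trim$y))
```

Whether an unused level should be kept or dropped depends on the analysis; the sketch only shows how to detect it.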
Details
For detailed information about each of these data sets, see the book cited in the References.
Note that the data sets Cryptosporidiosis and LegionnairesDisease
are actually called Cryptosporidiosis_02-08 and
LegionnairesDisease_02-08 in the given reference. The “suffixes”
were removed since the minus sign causes problems in a variable
name in R.
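The problem with the minus sign can be seen with base R alone. The sketch below is illustrative only: the variable names `book_name`, `valid_name` and `is_syntactic` are hypothetical, and the small data frame holds made-up values rather than the real series.

```r
# Name as it appears in the book (per the note above).
book_name <- "Cryptosporidiosis_02-08"

# make.names() returns what R regards as a syntactically valid version.
valid_name <- make.names(book_name)
is_syntactic <- identical(valid_name, book_name)
print(is_syntactic)

# Typed without quoting, the original name would parse as a subtraction:
# Cryptosporidiosis_02 - 08. Backticks allow such a name, but every use
# of the object then needs them too.
`Cryptosporidiosis_02-08` <- data.frame(y = c(2L, 5L))  # made-up values
print(nrow(`Cryptosporidiosis_02-08`))
```

Dropping the suffix avoids the need for backticks altogether.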
Source
These data sets were kindly provided by Prof. Christian
H. Weiß. The package author is also pleased
to acknowledge the kind permission granted by Prof. Kurt
Brännäs (Professor Emeritus of Economics at
Umeå University) to include the Ericsson time series
data set (EricssonB_Jul2).
References
Christian H. Weiß (2018). An Introduction to Discrete-Valued Time Series. Chichester: John Wiley & Sons.
Examples
## Not run:
fit1 <- hmm(WoodPeweeSong,K=2,verbose=TRUE)
# EM converges in 6 steps --- suspicious.
set.seed(321)
fit2 <- hmm(WoodPeweeSong,K=2,verbose=TRUE,rand.start=list(tpm=TRUE,Rho=TRUE))
# 52 steps --- note the huge difference between fit1$log.like and fit2$log.like!
set.seed(321)
fit3 <- hmm(WoodPeweeSong,K=2,verbose=TRUE,method="bf",
            rand.start=list(tpm=TRUE,Rho=TRUE))
# log likelihood essentially the same as for fit2
## End(Not run)